Yatian Pang (庞雅天)

Greetings! My name is Yatian Pang. I'm a final-year PhD student at the National University of Singapore, supervised by Prof. Tay E. H. Francis and working closely with Prof. Li Yuan from Peking University (PKU). I'm currently a research intern on Alibaba's Qwen team, working with Shuai Bai and Dr. Hang Zhang.

My research interests are multi-modal understanding and generation, as well as unified models. My long-term goal is to develop general-purpose AI systems that seamlessly understand and generate content across diverse modalities.

I’m always happy to chat about my work or my field of study. Feel free to send me an email at yatian_pang@u.nus.edu if you're interested.

Email  /  Google Scholar  /  GitHub  /  LinkedIn

Projects
[understanding] Qwen3-VL
Qwen Team, Yatian Pang
2025
[Project Page] [code]

Core contributor to video understanding for Qwen3-VL, with a focus on long-video and streaming-video understanding. Achieved state-of-the-art results on various benchmarks.

[unify] UniWorld
Bin Lin, et al., Yatian Pang, Li Yuan
2025
[arXiv link] [code]

A unified framework for image understanding, generation, and editing, following GPT-4o. Addresses the critical challenge of connecting frozen VLMs with diffusion generators via a novel semantic encoder.

[generation] Open-Sora-Plan
Open-Sora-Plan Team, Yatian Pang
2024
[arXiv link] [code]

One of the earliest open-source efforts to reproduce Sora. Contributed to the architecture for high-resolution, long-duration video generation and open-sourced high-quality video data.

Selected works
[understanding] Video Sparse Attention for Streaming Long Video Understanding
Yatian Pang, et al.
Submission in progress, 2025
[arXiv link (coming soon)]
[unify] Unified Autoregressive Pretraining for Image Generation and Representation Learning
Yatian Pang, et al.
Submission in progress, 2025
[arXiv link (coming soon)]
[generation] Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang, et al.
arXiv, 2024
[arXiv link] [code]
[generation] DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
Yatian Pang, et al.
ICCV 2025
[arXiv link] [code]
[generation] Envision3D: One Image to 3D with Anchor Views Interpolation
Yatian Pang, et al.
arXiv, 2024
[arXiv link]
[understanding] Masked Autoencoders for Point Cloud Self-supervised Learning
Yatian Pang, et al.
ECCV 2022
[arXiv link] [code]
[understanding] MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Bin Lin, et al., Yatian Pang, et al.
IEEE Transactions on Multimedia (TMM), 2024
[arXiv link] [code]
[understanding] LanguageBind: Extending Video-Language Pretraining to N-Modality by Language-Based Semantic Alignment
Bin Zhu, et al., Yatian Pang, et al.
International Conference on Learning Representations (ICLR) 2024
[arXiv link] [code]
Education
National University of Singapore, Singapore
Ph.D.
Aug 2021 - Present
Advisor: Prof. Tay Eng Hock, Francis
Collaborator: Prof. Li Yuan (Peking University)
