Yatian Pang (庞雅天)

Greetings! My name is Yatian Pang. I'm a final-year PhD student at the National University of Singapore, supervised by Prof. Tay E. H. Francis and working closely with Prof. Li Yuan from PKU. I'm currently a research intern on Alibaba's Qwen team, working with Shuai Bai and Dr. Hang Zhang.

My research interests are multi-modal understanding and generation, as well as unified models. My long-term goal is to develop general-purpose AI systems that seamlessly understand and generate content across diverse modalities.

I am ACTIVELY looking for a full-time position starting in 2026. Feel free to send me an email at yatian_pang@u.nus.edu if you're interested.

Email / Google Scholar / GitHub / LinkedIn

Projects
[understanding] Qwen3-VL
Qwen Team, Yatian Pang
2025 (Work in Progress)
[Project Page]

Core contributor to video understanding for Qwen3-VL, with a focus on long-video and streaming-video understanding. Achieved state-of-the-art results on various benchmarks.

[unify] UniWorld
Bin Lin, et al., Yatian Pang, Li Yuan
2025
[arXiv link] [code]

A unified framework for image understanding, generation, and editing, in the style of GPT-4o. Solves the critical challenge of connecting frozen VLMs to diffusion generators via a novel semantic encoder.

[generation] Open-Sora-Plan
Open-Sora-Plan Team, Yatian Pang
2024
[arXiv link] [code]

One of the earliest open-source efforts to reproduce Sora. Contributed to the high-resolution, long-duration video generation architecture and open-sourced high-quality video data.

Selected works
[understanding] Video Sparse Attention for Streaming Long Video Understanding
Yatian Pang, et al.
Submission in progress, 2025
[arXiv link (coming soon)]
[unify] Unified Autoregressive Pretraining for Image Generation and Representation Learning
Yatian Pang, et al.
Submission in progress, 2025
[arXiv link (coming soon)]
[generation] Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang, et al.
arXiv, 2024
[arXiv link] [code]
[generation] DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
Yatian Pang, et al.
ICCV, 2025
[arXiv link] [code]
[generation] Envision3D: One Image to 3D with Anchor Views Interpolation
Yatian Pang, et al.
arXiv, 2024
[arXiv link]
[understanding] Masked Autoencoders for Point Cloud Self-supervised Learning
Yatian Pang, et al.
ECCV, 2022
[arXiv link] [code]
[understanding] MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Bin Lin, et al., Yatian Pang, et al.
IEEE Transactions on Multimedia (TMM), 2024
[arXiv link] [code]
[understanding] LanguageBind: Extending Video-Language Pretraining to N-Modality by Language-Based Semantic Alignment
Bin Zhu, et al., Yatian Pang, et al.
International Conference on Learning Representations (ICLR), 2024
[arXiv link] [code]
Education
National University of Singapore, Singapore
Ph.D.
Aug 2021 - Present
Advisor: Prof. Tay Eng Hock, Francis
Collaborator: Prof. Li Yuan (Peking University)
