Min Zhao (赵敏)

Assistant Professor
School of Artificial Intelligence, Nanjing University
Email: gracezhao1997@gmail.com

[Google Scholar] [GitHub] [WeChat] [Xiaohongshu]

TSAIL Group
[TSAIL GitHub] [TSAIL Homepage]

I am an Assistant Professor at the School of Artificial Intelligence, Nanjing University. Previously, I was a postdoctoral researcher at the TSAIL Group, Department of Computer Science and Technology, Tsinghua University, working under the supervision of Prof. Jun Zhu. I received my Ph.D. from the Institute of Automation, Chinese Academy of Sciences, in 2024. From 2022 to 2024, I was a visiting student at the TSAIL Group, Tsinghua University, where I worked closely with Prof. Jun Zhu, Prof. Chongxuan Li, and Dr. Fan Bao.

My current research focuses on video world models: real-time, interactive, and physics-aware generative video systems. Along this line, I developed the Causal Forcing series (Causal Forcing and Causal Forcing++), which gets autoregressive diffusion distillation right for high-quality, real-time interactive video generation, and I led minWM, the first full-stack open-source framework for real-time interactive video world models. I also curate Awesome Video World Models with AR Diffusion, a community repository that systematizes recent advances and emerging paradigms in this direction.

I am actively recruiting prospective PhD students, master's students, and research interns who are passionate about generative models and video world models. I am also open to academic and industrial collaborations on real-time, physics-aware video world models. Please feel free to reach out.

Selected Publications [Full Publications]

    Video World Models
  • minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models
    Min Zhao*, Hongzhou Zhu*, Bokai Yan*, Zihan Zhou*, Yimin Chen*, Wenqiang Sun, Kaiwen Zheng, Guande He, Xiao Yang, Chongxuan Li, Fan Bao, Jun Zhu
    Technical Report, 2026
    the first full-stack open-source video world model framework
    [Paper] [Code]
  • Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation
    Hongzhou Zhu*, Min Zhao*, Guande He, Chongxuan Li, Jun Zhu
    International Conference on Machine Learning (ICML), 2026
    [Paper] [Code] [Website]
  • Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
    Min Zhao*, Hongzhou Zhu*, Kaiwen Zheng, Zihan Zhou, Bokai Yan, Xinyuan Li, Xiao Yang, Chongxuan Li, Jun Zhu
    Technical Report, 2026
    [Paper] [Code]
  • Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models
    Fan Bao, Chendong Xiang*, Gang Yue*, Guande He*, Hongzhou Zhu*, Kaiwen Zheng*, Min Zhao*, Shilong Liu*, Jun Zhu
    Technical Report, 2024
    the first large-scale text-to-video foundation model comparable to Sora
    [Tech Report] [Try Vidu]
    Long Video and High-Resolution Image Generation
  • RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
    Min Zhao, Guande He, Yixiao Chen, Hongzhou Zhu, Chongxuan Li, Jun Zhu
    International Conference on Machine Learning (ICML), 2025
    [Paper] [Code] [Website]
  • UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers
    Min Zhao, Hongzhou Zhu, Yingze Wang, Bokai Yan, Jintao Zhang, Guande He, Ling Yang, Chongxuan Li, Jun Zhu
    International Conference on Learning Representations (ICLR), 2026
    [Paper] [Code] [Website]
  • UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers
    Min Zhao, Bokai Yan, Xue Yang, Hongzhou Zhu, Jintao Zhang, Shilong Liu, Chongxuan Li, Jun Zhu
    Preprint, 2026
    [Paper] [Code] [Website]
    Controllable Generation
  • EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations
    Min Zhao, Fan Bao, Chongxuan Li, Jun Zhu
    Advances in Neural Information Processing Systems (NeurIPS), 2022
    [Paper] [Code] [Talk]
  • Identifying and Solving Conditional Image Leakage in Image-to-Video Generation
    Min Zhao, Hongzhou Zhu, Chendong Xiang, Kaiwen Zheng, Chongxuan Li, Jun Zhu
    Advances in Neural Information Processing Systems (NeurIPS), 2024
    [Paper] [Code] [Website]
  • ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing
    Min Zhao, Rongzhen Wang, Fan Bao, Chongxuan Li, Jun Zhu
    Science China Information Sciences, 2024
    [Paper] [Code] [Website] [Zhihu]

Mentored Students

  • Hongzhou Zhu (Incoming PhD Student at Tsinghua University, with Prof. Jun Zhu)
  • Bokai Yan (PhD Student at Renmin University of China, with Prof. Chongxuan Li)
  • Xue Yang (Incoming PhD Student at Peking University, with Prof. Siwei Ma)
  • Yu Huang (Incoming PhD Student at Tsinghua University, with Prof. Yinpeng Dong)
  • Jiaming Wu (Undergraduate Student at Tsinghua University, with Prof. Yinpeng Dong)
  • Yingze Wang (Undergraduate Student at Tsinghua University, with Prof. Jun Zhu)
  • Yixiao Chen (Undergraduate Student at Tsinghua University, with Prof. Jun Zhu)

Selected Awards

  • Tsinghua University, Shuimu Tsinghua Scholar, 2024
  • IEEE International Symposium on Biomedical Imaging (ISBI), Best Student Paper Nomination, 2024