Hi, I am currently a graduate student in the Institute of Computational Linguistics at Peking University, advised by Prof. Baobao Chang. Before that, I earned my bachelor’s degree from Tianjin University. I have had the privilege of collaborating with Prof. Minjia Zhang at the University of Illinois Urbana-Champaign and Prof. Junjie Hu at the University of Wisconsin–Madison. I have also engaged in research projects supported by Microsoft Research Asia, gaining exposure to both academic and industrial AI research settings.

I am fortunate to have collaborated with researchers at PKU, UIUC, Wisc, THU, and Microsoft Research Asia.

More information is available in my 🔗 Resume.

My research interests revolve around Foundation Language Models, Multimodal Learning, and Data-centric AI:

  • Data-centric Approaches: Constructing and curating synthetic and structured data to enhance foundation model capabilities while reducing training costs.
  • Multimodal Integration: Improving vision-language models through multimodal instruction tuning, alignment strategies, and hallucination reduction.

My research interests also include neural machine translation and computer vision. I have published 8 papers at top international AI conferences.

🔥 News

  • 2024.09:  🎉🎉 Our paper, “UltraEdit: Instruction-based Fine-Grained Image Editing at Scale”, has been accepted by NeurIPS 2024!

📝 Publications

ICLR 2024

MMICL: Empowering vision-language model with multi-modal in-context learning

Haozhe Zhao, Zefan Cai, Shuzheng Si, Xiaojian Ma, Kaikai An, Liang Chen, Zixuan Liu, Sheng Wang, Wenjuan Han, Baobao Chang

[paper] [code]

  • The Twelfth International Conference on Learning Representations (ICLR), 2024.

NeurIPS 2024

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang

[paper] [code]

  • The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.

NAACL 2024

Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation

Haozhe Zhao, Zefan Cai, Shuzheng Si, Liang Chen, Yufeng He, Kaikai An, Baobao Chang

[paper] [code]

  • 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024.

ECCV 2024, Oral

An image is worth 1/2 tokens after layer 2: Plug-and-play inference acceleration for large vision-language models

Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang

[paper] [code]

  • The 18th European Conference on Computer Vision (ECCV), 2024, Oral.

preprint

Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance

Haozhe Zhao, Shuzheng Si, Liang Chen, Yichi Zhang, Maosong Sun, Minjia Zhang, Baobao Chang

[paper] [code]

  • Completed during an internship at UIUC under the supervision of Prof. Minjia Zhang.

preprint

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Liang Chen*, Zekun Wang*, Shuhuai Ren*, Lei Li*, Haozhe Zhao*, Yunshui Li*, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, Tianyu Liu, Baobao Chang

[paper] [code]

  • We investigate the emerging trend in multimodal intelligence: the paradigm of next-token prediction.

📖 Education

  • 2022.09 - 2025.06 (Expected), M.S. in Software Engineering, Peking University.
  • 2018.09 - 2022.06, Bachelor's degree in Software Engineering, Tianjin University.