I am currently a third-year Ph.D. student at the Institute of Automation, Chinese Academy of Sciences, and a member of the State Key Laboratory of Multimodal Artificial Intelligence Systems, where I am advised by Prof. Chengqing Zong (宗成庆) and Assoc. Prof. Yang Zhao (赵阳).
My research lies at the intersection of Natural Language Processing (NLP) and Multimodal Large Language Models (MLLMs), with a current focus on Video-Guided Machine Translation (VMT). In this line of work, I explore how visual and linguistic signals can be effectively fused to enhance translation performance and efficiency.
Looking ahead, I am particularly interested in expanding my research to video question answering and broader topics in multimodal understanding.
You can find my CV here: Boyu Guan’s Curriculum Vitae. If you’re interested in collaboration or would like to chat, feel free to reach out to me at guanboyu2022[at]ia.ac.cn.
🔥Seeking internship opportunities in NLP and Multimodal LLMs.🔥
📚 Education:
- 2022.09 – 2027.06 (Expected) Ph.D. in Computer Science at the Institute of Automation, Chinese Academy of Sciences, Beijing, China, under the supervision of Prof. Chengqing Zong.
- 2018.09 – 2022.06 B.Sc. in Mathematics, School of Science, Northeastern University, Shenyang, China.
📰 News:
- 2025.03: 👨‍🏫 I will serve as a teaching assistant for the Natural Language Processing course for Ph.D. students at Zhongguancun Academy.
- 2024.12: 🎉🎉 Our paper was accepted to COLING 2025 and selected for an oral presentation! Looking forward to seeing you in Abu Dhabi.
- 2024.09: 👨‍🏫 I will serve as a teaching assistant for the Practical Natural Language Processing course for undergraduate students at the University of Chinese Academy of Sciences (UCAS).
📄 Publications:
Conference Papers

TriFine: A Large-Scale Dataset of Vision-Audio-Subtitle for Tri-Modal Machine Translation and Benchmark with Fine-Grained Annotated Tags
This paper introduces TriFine, the first large-scale vision-audio-subtitle dataset for tri-modal machine translation, annotated with fine-grained tags, and proposes FIAT, a novel translation method that leverages these fine-grained tags to achieve superior translation performance.
💻 Internships
- 2023.02 – 2023.08 Software Engineering Intern, Biren Technology (壁仞科技), Beijing, China.
Worked on migrating and optimizing pre-training and inference pipelines for large language models (LLMs), including LLaMA, LLaMA 2, and ChatGLM. Responsibilities included architecture adaptation and efficiency improvements such as activation checkpointing.