I am currently a third-year Ph.D. student at the Institute of Automation, Chinese Academy of Sciences, and a member of the State Key Laboratory of Multimodal Artificial Intelligence Systems, where I am advised by Prof. Chengqing Zong (宗成庆) and Assoc. Prof. Yang Zhao (赵阳).

My research lies at the intersection of Natural Language Processing (NLP) and Multimodal Large Language Models (MLLMs), with a current focus on Video-Guided Machine Translation (VMT). In this line of work, I explore how visual and linguistic signals can be effectively fused to enhance translation performance and efficiency.

Looking ahead, I am particularly interested in expanding my research to video question answering and broader topics in multimodal understanding.

You can find my CV here: Boyu Guan’s Curriculum Vitae. If you’re interested in collaboration or would like to chat, feel free to reach out to me at guanboyu2022[at]ia.ac.cn.

🔥Seeking internship opportunities in NLP and Multimodal LLMs.🔥

📚 Education:

  • 2022.09 – 2027.06 (Expected) Ph.D. in Computer Science at the Institute of Automation, Chinese Academy of Sciences, Beijing, China, under the supervision of Prof. Chengqing Zong.
  • 2018.09 – 2022.06 B.Sc. in Mathematics, School of Science, Northeastern University, Shenyang, China.

📰 News:

  • 2025.03: 👨‍🏫 I will serve as a teaching assistant for the Natural Language Processing course for Ph.D. students at Zhongguancun Academy.
  • 2024.12: 🎉🎉 Our paper was accepted to COLING 2025 and selected for an oral presentation! Looking forward to seeing you in Abu Dhabi.
  • 2024.09: 👨‍🏫 I will serve as a teaching assistant for the Practical Natural Language Processing course for undergraduate students at the University of Chinese Academy of Sciences (UCAS).

Publications

You can also find my articles on my Google Scholar profile. * Equal contribution. # Corresponding author.

Conference Papers

TriFine: A Large-Scale Dataset of Vision-Audio-Subtitle for Tri-Modal Machine Translation and Benchmark with Fine-Grained Annotated Tags

This paper introduces TriFine, the first large-scale dataset for tri-modal (vision, audio, subtitle) machine translation with fine-grained annotated tags, and proposes FIAT, a novel translation method that leverages this fine-grained information to achieve superior translation performance.

PDF | Code | Dataset | Slides

💻 Internships

  • 2023.02 – 2023.08 Software Engineering Intern, Biren Technology (壁仞科技), Beijing, China.
    Worked on the migration and optimization of pre-training and inference pipelines for large language models (LLMs), including LLaMA, LLaMA2, and ChatGLM. Responsibilities included architecture adaptation and efficiency improvements such as activation checkpointing.