Publications

You can also find my articles on my Google Scholar profile.

Conference Papers


SHIFT: Selected Helpful Informative Frame for Video-guided Machine Translation

SHIFT: Selected Helpful Informative Frame for Video-guided Machine Translation

This paper introduces SHIFT (Selected Helpful Informative Frame for Translation), a lightweight, plug-and-play framework for video-guided machine translation (VMT) that adaptively selects only the most informative video frame—or none when unnecessary—to improve translation quality and efficiency using multimodal large language models (MLLMs).

PDF Code Slides

TriFine: A Large-Scale Dataset of Vision-Audio-Subtitle for Tri-Modal Machine Translation and Benchmark with Fine-Grained Annotated Tags

TriFine: A Large-Scale Dataset of Vision-Audio-Subtitle for Tri-Modal Machine Translation and Benchmark with Fine-Grained Annotated Tags

This paper introduces TriFine, the first large-scale dataset for tri-modal (vision, audio, subtitle) machine translation with fine-grained annotated tags, and proposes a novel translation method FIAT that leverages this fine-grained information to achieve superior translation performance.

PDF Code Dataset Slides