I'm Xingjian Diao, a Ph.D. candidate in Computer Science at Dartmouth College 🌲, co-advised by Prof. Soroush Vosoughi and Prof. Jiang Gui. During my Ph.D., I interned twice at Amazon, working on computer vision and robotics (Summer 2025) and on VLM systems (Summer 2026), and once at Samsung Research America, working on agentic memories (Spring 2026).
Previously, I completed my M.S. in Computer Science at Northwestern University 💜, advised by Prof. Nabil Alshurafa. I received my B.S. in Computer Science from the University of Pittsburgh 💙, graduating cum laude.
My research focuses on multimodal learning for video, audio, and language understanding. I develop methods for multimodal reasoning, efficient multimodal learning, and generative modeling, with the goal of building scalable, generalizable models for question answering, video understanding, and audio-visual reasoning in complex, dynamic real-world settings. Highlights of my work include:
- Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization
  Findings of ACL 2026
  Xingjian Diao, Zheyuan Liu, Chunhui Zhang, Weiyi Wu, Keyi Kong, Lin Shi, Kaize Ding, Soroush Vosoughi, Jiang Gui
- SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models
  EMNLP 2025 (Oral Presentation, top 4.35%)
  Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, Jiang Gui
- ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering
  EMNLP 2025 (Oral Presentation, top 4.35%)
  Xingjian Diao, Weiyi Wu, Keyi Kong, Peijun Qing, Xinwen Xu, Ming Cheng, Soroush Vosoughi, Jiang Gui
- Temporal Working Memory: Query-Guided Temporal Segment Refinement for Enhanced Multimodal Understanding
  Findings of NAACL 2025, Guarini Graduate Student Travel Award (Dartmouth College)
  Xingjian Diao, Chunhui Zhang, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui
- Learning Musical Representations for Music Performance Question Answering
  Findings of EMNLP 2024, BMDS Travel Award (Dartmouth College)
  Xingjian Diao, Chunhui Zhang, Tingxuan Wu, Ming Cheng, Zhongyu Ouyang, Weiyi Wu, Jiang Gui
- Learning Sparsity for Effective and Efficient Music Performance Question Answering
  ACL 2025
  Xingjian Diao, Tianzhen Yang, Chunhui Zhang, Weiyi Wu, Ming Cheng, Jiang Gui
- FT2TF: First-Person Statement Text-To-Talking Face Generation
  WACV 2025
  Xingjian Diao, Ming Cheng, Wayner Barrios, SouYoung Jin

- Amazon Science (Jun 2026 – Sept 2026)
  Applied Scientist Intern, Sunnyvale, CA
  Research on vision-language models.
- Samsung Research America (Mar 2026 – Jun 2026)
  NLP Research Intern, Mountain View, CA
  Research on agentic memories.
- Amazon Science (Jun 2025 – Sept 2025)
  Applied Scientist Intern, Santa Cruz, CA
  Research on computer vision and robotics.

