About Me
I am a second-year Master’s student at Tsinghua University and a member of the CVML Lab, advised by Prof. Chun Yuan.
Before joining Tsinghua, I received my B.S. in Computer Science and Technology from the Central University of Finance and Economics in 2025. My recent research focuses on training stronger multimodal foundation models, in particular multimodal large language models.
Research Interests
- Multimodal large language models
- Controllable video generation and world models
- Multimodal image fusion
- Remote sensing understanding and reasoning
Education
- Tsinghua University, M.S. in Computer Technology, 2025 - present
- Central University of Finance and Economics, B.S. in Computer Science and Technology, 2021 - 2025
Publications
Image Generation
Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach
TPAMI 2026 · IF: 18.6 · Diffusion Transformer · Text-Controlled Image Fusion · Multimodal Segmentation
RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion From The Perspective of Referring Image Segmentation
ICASSP 2026 Oral · Infrared-Visible Fusion · Referring Image Segmentation · Text-Driven Fusion
Two in One: Robust Fusion of Infrared and Visible Images in Rainy Condition
JAS 2026 · IF: 19.2 · Infrared-Visible Fusion · Rain Removal · Robust Perception · Coupled Restoration
Where Fusion Meets Dehazing: A Coupled Framework for Robust Visible-Infrared Image Fusion in Haze
TIP · Under Review · Infrared-Visible Fusion · Image Dehazing · Adverse Weather · Coupled Restoration
Multimodal Understanding
FOVIS: Foveated Vision for Ultra-High-Resolution Remote Sensing Reasoning
Under Review · Remote Sensing Reasoning · Ultra-High Resolution · Foveated Attention
Look Where It Matters: Training-Free Ultra-HR Remote Sensing VQA via Adaptive Zoom Search
arXiv 2025 · Remote Sensing VQA · Ultra-High Resolution · Training-Free · Plug-and-Play
GRASP: Geospatial Pixel Reasoning via Structured Policy Learning
arXiv 2025 · Remote Sensing · Geospatial Pixel Reasoning · Structured Policy Learning
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
arXiv 2025 · Embodied AI · VLA Benchmark · Dynamic Spatial Reasoning · Multi-Step Action
Internship

XPENG Motors
Embodied AI Research Intern · Topic: VLM Pre-training · Mentor: TBD.
July 2026 - Present

Tencent LIGHTSPEED STUDIOS
Research Intern · Topic: Multimodal Large Language Models · Mentor: Shengju Qian.
July 2024 - June 2025
Awards
Meritorious Winner (M Award), Mathematical Contest in Modeling, 2024
Huawei Scholarship, Central University of Finance and Economics, 2023
First-Class Scholarship for Comprehensive Development, Central University of Finance and Economics, 2022 and 2023
Outstanding Academic Scholarship, Central University of Finance and Economics, 2023
