Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Publications

EmbRACE-3K: Embodied Reasoning and Action in Complex Environments

Under Review, arXiv, 2025

A multimodal embodied reasoning benchmark for evaluating models across exploration, dynamic spatial-semantic reasoning, and multi-step task execution.

Recommended citation: M. Lin, W. Huang, Y. Li, Chengjie Jiang, K. Wu, F. Zhong, S. Qian, X. Wang, and X. Qi. EmbRACE-3K: Embodied Reasoning and Action in Complex Environments. arXiv, 2025.

GRASP: Geospatial Pixel Reasoning via Structured Policy Learning

Under Review, arXiv, 2025

A structured policy learning framework for geospatial pixel reasoning, improving language-to-pixel segmentation generalization with reduced reliance on dense mask supervision.

Recommended citation: Chengjie Jiang, Y. Zhou, J. Yan, J. Li, J. Li, Y. Zhou, H. He, and J. Li. GRASP: Geospatial Pixel Reasoning via Structured Policy Learning. arXiv, 2025.

Look Where It Matters: Training-Free Ultra-HR Remote Sensing VQA via Adaptive Zoom Search

Under Review, arXiv, 2025

A training-free pipeline for ultra-high-resolution remote sensing VQA that adaptively zooms into key regions, reducing token and memory cost while improving reasoning efficiency.

Recommended citation: Yunqi Zhou*, Chengjie Jiang*, Chun Yuan, and Jing Li. Look Where It Matters: Training-Free Ultra-HR Remote Sensing VQA via Adaptive Zoom Search. arXiv, 2025.

Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach

Published, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026

A unified Diffusion Transformer framework for semantic and controllable image fusion, supporting multiple fusion tasks and extending image fusion toward text-controllable fusion and multimodal segmentation.

Recommended citation: Jiayang Li*, Chengjie Jiang*, Junjun Jiang, Pengwei Liang, Jiayi Ma, and Liqiang Nie. Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach. IEEE TPAMI, 2026.

RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion From The Perspective of Referring Image Segmentation

Oral, ICASSP, 2026

A referring-image-segmentation perspective on text-driven infrared-visible image fusion, improving interpretability and downstream consistency.

Recommended citation: Siju Ma, Changxiyu Gong, Xiaofeng Fan, Yong Ma, and Chengjie Jiang. RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion From The Perspective of Referring Image Segmentation. ICASSP, 2026.

Two in One: Robust Fusion of Infrared and Visible Images in Rainy Condition

Published, JAS, 2026

A coupled image-fusion and rain-removal framework that improves the robustness of infrared-visible perception in rainy scenes.

Recommended citation: Jing Li, Jiafeng Yan, Chengjie Jiang, and Bin Yang. Two in One: Robust Fusion of Infrared and Visible Images in Rainy Condition. JAS, 2026.

FOVIS: Foveated Vision for Ultra-High-Resolution Remote Sensing Reasoning

Under Review, NeurIPS, 2026

A foveated vision approach for ultra-high-resolution remote sensing reasoning, dynamically selecting key regions for fine-grained attention.

Recommended citation: Y. Zhou, Chengjie Jiang, H. Zheng, X. Wang, S. Xu, Z. Long, L. Shi, X. Fan, and C. Yuan. FOVIS: Foveated Vision for Ultra-High-Resolution Remote Sensing Reasoning. Under review.