Publications

Image Generation

Multimodal Understanding

FOVIS: Foveated Vision for Ultra-High-Resolution Remote Sensing Reasoning

Y. Zhou*, Chengjie Jiang*, H. Zheng, X. Wang, S. Xu, Z. Long, L. Shi, X. Fan, C. Yuan†

Under Review

A foveated vision approach to ultra-high-resolution remote sensing reasoning that dynamically selects key regions for fine-grained attention.

Remote Sensing Reasoning, Ultra-High Resolution, Foveated Attention

GRASP: Geospatial Pixel Reasoning via Structured Policy Learning

Chengjie Jiang, Y. Zhou, J. Yan, J. Li†, J. Li, Y. Zhou, H. He, J. Li

arXiv 2025

A structured policy learning framework for geospatial pixel reasoning that improves the generalization of language-to-pixel segmentation while reducing reliance on dense mask supervision.

Remote Sensing, Geospatial Pixel Reasoning, Structured Policy Learning

EmbRACE-3K: Embodied Reasoning and Action in Complex Environments

M. Lin*, W. Huang*, Y. Li, Chengjie Jiang, K. Wu, F. Zhong, S. Qian†, X. Wang, X. Qi†

arXiv 2025

A multimodal embodied reasoning benchmark for evaluating models across exploration, dynamic spatial-semantic reasoning, and multi-step task execution.

Embodied AI, VLA, Benchmark, Dynamic Spatial Reasoning, Multi-Step Action