Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration

Yue Zhou¹,² Litong Feng³ Mengcheng Lan² Xue Yang⁴ Qingyun Li⁵ Yiping Ke² Jiang Xue⁴ Wayne Zhang³

¹East China Normal University ²Nanyang Technological University ³SenseTime Research ⁴Shanghai Jiao Tong University ⁵Harbin Institute of Technology

📢 Latest Updates

[2025.09.15] We released the benchmark and evaluation code.
[2025.09.08] Accepted by ISPRS JPRS.

Abstract

Mathematical reasoning is critical for tasks such as precise distance and area computations, trajectory estimations, and spatial analysis in unmanned aerial vehicle (UAV) based remote sensing, yet current vision-language models (VLMs) have not been adequately tested in this domain. To address this gap, we introduce AVI-Math, the first benchmark to rigorously evaluate multimodal mathematical reasoning in aerial vehicle imagery, moving beyond simple counting tasks to include domain-specific knowledge in areas such as geometry, logic, and algebra. The dataset comprises 3,773 high-quality vehicle-related questions captured from UAV views, covering 6 mathematical subjects and 20 topics. The data, collected at varying altitudes and from multiple UAV angles, reflects real-world UAV scenarios, ensuring the diversity and complexity of the constructed mathematical problems. In this paper, we benchmark 14 prominent VLMs through a comprehensive evaluation and demonstrate that, despite their success on previous multimodal benchmarks, these models struggle with the reasoning tasks in AVI-Math. Our detailed analysis highlights significant limitations in the mathematical reasoning capabilities of current VLMs and suggests avenues for future research. Furthermore, we explore the use of Chain-of-Thought prompting and fine-tuning techniques, which show promise in addressing the reasoning challenges in AVI-Math. Our findings not only expose the limitations of VLMs in mathematical reasoning but also offer valuable insights for advancing UAV-based trustworthy VLMs in real-world applications.

ARI: arithmetic, CNT: counting, ALG: algebra, STA: statistics, LOG: logic, GEO: geometry.

🏆 Contributions

Benchmark: We introduce AVI-Math, the first multimodal benchmark for mathematical reasoning in UAV imagery, covering six subjects and real-world UAV scenarios.
Analysis: We provide a comprehensive analysis, uncovering the limitations of current VLMs in mathematical reasoning and offering insights for future improvements.
Exploration: We explore the potential of Chain-of-Thought prompting and fine-tuning techniques to enhance VLM performance, providing a 215k-sample instruction set for VLMs to learn domain-specific knowledge in UAV scenarios.

💬 Benchmark

Examples of six mathematical reasoning subjects in AVI-Math.

Please download the dataset first, then use the code in the evaluation directory to run inference and compute the scores.

🔍 Analysis

Accuracy scores on AVI-Math. AVG: average accuracy over the six subjects. FRE: free-form question, CHO: multiple-choice question, T/F: true-or-false question. The highest scores among models in each part and overall are highlighted in blue and red. All results use the original model weights without fine-tuning.
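To make the table's columns concrete, here is a minimal sketch of how per-subject accuracy, per-question-type accuracy, and AVG could be aggregated from graded predictions. It assumes AVG is the unweighted mean of the per-subject accuracies, which is one reading of the caption. The record fields (`subject`, `type`, `correct`) and the function names are hypothetical and are not taken from the released evaluation code.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group: accuracy in percent} for records grouped by `key`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key]] += 1
        hits[rec[key]] += int(rec["correct"])
    return {group: 100.0 * hits[group] / totals[group] for group in totals}


def summarize(records):
    """Per-subject and per-type accuracy, plus AVG as the mean over subjects."""
    per_subject = accuracy_by(records, "subject")
    per_type = accuracy_by(records, "type")
    # Assumption: AVG weights every subject equally, regardless of its size.
    avg = sum(per_subject.values()) / len(per_subject)
    return per_subject, per_type, avg


# Hypothetical graded records, made up for illustration only.
records = [
    {"subject": "CNT", "type": "FRE", "correct": True},
    {"subject": "CNT", "type": "FRE", "correct": False},
    {"subject": "GEO", "type": "CHO", "correct": True},
    {"subject": "LOG", "type": "T/F", "correct": False},
]
per_subject, per_type, avg = summarize(records)
print(per_subject, per_type, avg)
```

Under this reading, a subject with few questions counts as much toward AVG as a subject with many, so AVG can differ from the overall fraction of correctly answered questions.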

🚀 Exploration

Chain-of-Thought and fine-tuning results on various VLMs.
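As a rough illustration of what Chain-of-Thought prompting changes at inference time, the sketch below builds a question prompt with an optional step-by-step instruction and extracts the final answer from a model response. The prompt wording, the `Answer:` marker, and the helper names are assumptions made for this example; the prompts used in the paper and in the evaluation code may differ.

```python
# Assumed CoT instruction; the widely used zero-shot "step by step" phrasing.
COT_SUFFIX = "Let's think step by step, then give the final answer after 'Answer:'."


def build_prompt(question, question_type, choices=None, use_cot=False):
    """Compose a text prompt for one question (FRE, CHO, or T/F)."""
    parts = [question.strip()]
    if question_type == "CHO" and choices:
        letters = "ABCDEFGH"
        parts.extend(f"{letters[i]}. {c}" for i, c in enumerate(choices))
        parts.append("Answer with the letter of the correct option.")
    elif question_type == "T/F":
        parts.append("Answer with True or False.")
    else:
        parts.append("Answer with a number or a short phrase.")
    if use_cot:
        parts.append(COT_SUFFIX)
    return "\n".join(parts)


def extract_final_answer(response):
    """Return the text after the last 'Answer:' marker, or the whole response."""
    marker = "Answer:"
    idx = response.rfind(marker)
    if idx == -1:
        return response.strip()
    return response[idx + len(marker):].strip()


prompt = build_prompt("How many cars are in the image?", "FRE", use_cot=True)
print(prompt)
print(extract_final_answer("There are 3 cars on the left and 2 on the right.\nAnswer: 5"))
```

Separating the reasoning from the final answer with a fixed marker keeps scoring independent of how long the intermediate reasoning is.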

📜 Citation

@ARTICLE{zhou2025avimath,
  author={Zhou, Yue and Feng, Litong and Lan, Mengcheng and Yang, Xue and Li, Qingyun and Ke, Yiping and Jiang, Xue and Zhang, Wayne},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing}, 
  title={Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration}, 
  year={2025},
  volume={},
  number={},
  pages={},
  doi={}
}

Contact

[email protected]
