This is the code for the paper Visual Agentic AI for Spatial Reasoning with a Dynamic API by Damiano Marsili, Rohun Agrawal, Yisong Yue and Georgia Gkioxari.
Project Page | Paper | Dataset | BibTeX
Clone the repo:
git clone https://github.com/damianomarsili/VADAR.git
Set up the environment and download models:
cd VADAR
python -m venv venv
source venv/bin/activate
sh setup.sh
echo YOUR_OPENAI_API_KEY > api.key
Note: This setup assumes CUDA 12.2 and Python 3.10. If you are using a different CUDA version, replace the --index-url in setup.sh with the PyTorch wheel index that matches your CUDA runtime. For example, for CUDA 11.8, use --index-url https://download.pytorch.org/whl/cu118.
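After running setup.sh, a quick check can confirm that the installed PyTorch build matches your CUDA driver. The snippet below is a minimal sketch using only the standard torch API; run it inside the activated virtual environment:

# Sanity check: the PyTorch build should report a CUDA runtime compatible with your driver.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA runtime bundled with PyTorch:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))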
VADAR uses SAM2, UniDepth and GroundingDINO.
For a quick exploration of VADAR's functionality, we have compiled a notebook at demo-notebook/quickstart.ipynb. For evaluating on larger datasets, please refer to the "Evaluating VADAR" section below.
Omni3D-Bench contains 500 (image, question, answer) tuples of diverse real-world scenes sourced from Omni3D. The dataset is released under the Creative Commons Non-Commercial license. View samples from the dataset here.
Omni3D-Bench is hosted on HuggingFace. The benchmark can be accessed with the following code:
from datasets import load_dataset
dataset = load_dataset("dmarsili/Omni3D-Bench")
Additionally, a .zip of the dataset can be downloaded at the above link.
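Once loaded, individual samples can be inspected directly from the returned DatasetDict. The sketch below picks the first available split rather than assuming its name; the field names follow the annotation format described in the next section, so adjust them if your copy differs:

from datasets import load_dataset

dataset = load_dataset("dmarsili/Omni3D-Bench")
split_name = list(dataset.keys())[0]   # pick the first available split
sample = dataset[split_name][0]

print(sample["question"])                       # query text
print(sample["answer_type"], sample["answer"])  # expected answer type and ground-truth answer
sample["image"].show()                          # image decoded as a PIL Image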
Samples in Omni3D-Bench consist of images, questions, and ground-truth answers. The annotations can be loaded as a Python dictionary with the following format:
<!-- annotations.json -->
{
"questions": [
{
"image_index" : str, image ID
"question_index" : str, question ID
"image" : PIL Image, image for query
"question" : str, query
"answer_type" : str, expected answer type - {int, float, str}
"answer" : str|int|float, ground truth response to the query
},
{
...
},
...
]
}
Both Omni3D-Bench and the subset of CLEVR used in the paper can be downloaded with:
sh download_data.sh
You can use a custom dataset by placing it in the data directory. Your dataset folder should contain an images folder and an annotations.json file in the format specified in the "Omni3D-Bench" section above.
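Before running the evaluation, it can help to verify that a custom dataset follows this layout. The sketch below is illustrative: the folder name my_dataset is hypothetical, and it only checks the scalar fields from the annotation format above (the image field is left out, since a local annotations.json cannot store a PIL Image directly):

import json
from pathlib import Path

dataset_dir = Path("data/my_dataset")  # hypothetical custom dataset name
assert (dataset_dir / "images").is_dir(), "missing images/ folder"

with open(dataset_dir / "annotations.json") as f:
    annotations = json.load(f)

# Each entry should provide the fields listed in the format above.
required = {"image_index", "question_index", "question", "answer_type", "answer"}
for q in annotations["questions"]:
    missing = required - q.keys()
    assert not missing, f"question {q.get('question_index')} is missing {missing}"
print(f"{len(annotations['questions'])} questions passed the format check")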
To evaluate VADAR, run the following command:
python evaluate.py --annotations-json data/[DATASET_NAME]/annotations.json --image-pth data/[DATASET_NAME]/images/
Note: If evaluating VADAR on the CLEVR or GQA datasets, add the --dataset clevr or --dataset gqa flag. If omitted, the prompts and API for Omni3D-Bench will be used.
The evaluation script will produce the following files:
results/[timestamp]/
├── signature_generator # signatures generated by Signature Agent
│ ├── image_1_question_2.html
│ ├── image_5_question_8.html
│ ├── image_9_question_14.html
│ └── ...
├── api_generator # method implementations generated by API Agent
│ ├── method_1
│ │ ├── executable_program.py # python implementation of method
│ │ └── result.json # Unit test result
│ ├── method_2
│ │ ├── executable_program.py # python implementation of method
│ │ └── result.json # Unit test result
│ ├── ...
│ └── api.json # JSON of generated API.
├── program_generator # programs generated by Program Agent
│ ├── image_0_question_0.html
│ ├── image_0_question_1.html
│ ├── image_1_question_2.html
│ ├── ...
│ └── programs.json # JSON of generated programs.
├── program_execution # execution log of programs.
│ ├── image_0_questions_0
│ │ ├── executable_program.py # python implementation of query solution
│ │ ├── result.json # JSON of program output
│ │ └── trace.html # Visualization of output trace.
│ ├── image_0_questions_1
│ │ ├── executable_program.py # python implementation of query solution
│ │ ├── result.json # JSON of program output
│ │ └── trace.html # Visualization of output trace.
│ ├── ...
│ └── execution.json # JSON of experiment execution
├── execution.csv # CSV with full execution log.
└── results.txt # Summarized Results
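The per-question outputs can also be aggregated programmatically. The sketch below relies only on the paths shown in the tree above; the timestamp folder name is a placeholder, and the structure of each result.json should be checked against one file from your own run:

import json
from pathlib import Path

run_dir = Path("results/2025-01-01_12-00-00")  # placeholder: use your run's timestamp folder

# Collect the per-question program outputs written during execution.
for result_file in sorted(run_dir.glob("program_execution/*/result.json")):
    output = json.loads(result_file.read_text())
    print(result_file.parent.name, output)

# results.txt holds the summarized metrics for the run.
print((run_dir / "results.txt").read_text())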
See RESULTS.md for detailed VADAR performance on Omni3D-Bench, CLEVR, and GQA, as well as a comparison with other methods.
If you use VADAR or the Omni3D-Bench dataset in your research, please use the following BibTeX entry.
@misc{marsili2025visualagenticaispatial,
title={Visual Agentic AI for Spatial Reasoning with a Dynamic API},
author={Damiano Marsili and Rohun Agrawal and Yisong Yue and Georgia Gkioxari},
year={2025},
eprint={2502.06787},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2502.06787},
}