Under Review
Website | arXiv | Dataset | Data Card
Learning to execute long-horizon mobile manipulation tasks is crucial for advancing robotics in household and workplace settings. However, current approaches are typically data-inefficient, underscoring the need for improved models and for realistically sized benchmarks that evaluate data efficiency. To address this, we introduce the LAMBDA (λ) benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities), which evaluates the data efficiency of models on language-conditioned, long-horizon, multi-room, multi-floor, pick-and-place tasks using a dataset of manageable size that is more feasible to collect. Our benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings. Unlike planner-generated data, these trajectories offer natural variability and are replay-verifiable, ensuring robust learning and evaluation. We leverage LAMBDA to benchmark current end-to-end learning methods and a modular neuro-symbolic approach that combines foundation models with task and motion planning. We find that end-to-end methods, even when pretrained, yield lower success rates, while neuro-symbolic methods perform significantly better and require less data.
More detailed dataset information can be found in the dataset card DataCard.md.
Download the dataset from this Dropbox link.
Code that opens, reads, and displays the dataset contents can be found in this Google Colab notebook.
The simulation dataset comes as a single HDF5 file with the following hierarchy:
sim_dataset.hdf5/
├── data_11:11:28/
│ ├── folder_0
│ ├── folder_1
│ └── folder_2
├── data_11:14:08/
│ ├── folder_0
│ └── ...
└── ...
Under each folder, there are three main NumPy files: `depth_<num>`, `inst_seg_<num>`, and `rgb_<num>`, which correspond to the depth image, instance segmentation image, and RGB image, respectively.
Each folder also has a `metadata` entry containing a dumped JSON that describes additional metadata for each time step. The detailed metadata can be found in the dataset card.
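For a quick look at the contents (the Colab notebook above is the reference loader), here is a minimal sketch that walks the file with h5py; the key prefixes (`rgb_`, `depth_`, `inst_seg_`) and the `metadata` entry name are assumptions based on the layout described above and may need adjusting for your copy.

```python
import json

import h5py
import numpy as np

# Minimal sketch: walk sim_dataset.hdf5 and print the image shapes of the
# first folder of each trajectory. Key names are assumed from the layout above.
with h5py.File("sim_dataset.hdf5", "r") as f:
    for traj_name, traj in f.items():          # e.g. "data_11:11:28"
        for step_name, step in traj.items():   # e.g. "folder_0"
            if not isinstance(step, h5py.Group):
                continue
            rgb = depth = seg = meta = None
            for key in step:
                if key.startswith("rgb_"):
                    rgb = np.asarray(step[key])
                elif key.startswith("depth_"):
                    depth = np.asarray(step[key])
                elif key.startswith("inst_seg_"):
                    seg = np.asarray(step[key])
            # Per-folder metadata, assumed to be stored as a dumped JSON string.
            if "metadata" in step and isinstance(step["metadata"], h5py.Dataset):
                raw = step["metadata"][()]
                meta = json.loads(raw.decode("utf-8") if isinstance(raw, bytes) else raw)
            print(traj_name, step_name,
                  None if rgb is None else rgb.shape,
                  None if depth is None else depth.shape)
            break  # inspect only the first folder per trajectory
```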
Similarly, the real-world dataset comes as a single HDF5 file with the following hierarchy:
real_dataset.hdf5/
└── FloorTrajectories/
├── data_00/
│ ├── folder_10/
│ │ ├── gripper_depth_10
│ │ ├── gripper_image_10
│ │ ├── left_fisheye_depth_10
│ │ ├── left_fisheye_image_10
│ │ ├── right_fisheye_depth_10
│ │ ├── right_fisheye_image_10
│ │ └── metadata
│ └── folder_11/
│ ├── gripper_depth_10
│ ├── gripper_image_10
│ └── ...
├── data_01/
│ └── folder_10/
│ └── ...
└── ...
Note that the right fisheye camera is located on the right side of the robot but points toward the left, so it captures the left half of the combined view, while the left fisheye captures the right half.
The images have the following sizes:
key | shape |
---|---|
gripper_depth_10 | (480, 640) |
gripper_image_10 | (480, 640, 3) |
left_fisheye_depth_10 | (240, 424) |
left_fisheye_image_10 | (640, 480, 3) |
right_fisheye_depth_10 | (240, 424) |
right_fisheye_image_10 | (640, 480, 3) |
The detailed metadata can be found in the dataset card.
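As a rough sketch, the snippet below reads one folder of the real dataset and places the two fisheye RGB views side by side per the note above. The group path and key names are taken from the hierarchy and table (treat them as assumptions for your copy of the file), and no undistortion or precise alignment is attempted.

```python
import h5py
import numpy as np

# Minimal sketch: load one folder of real_dataset.hdf5. The path and key names
# below are assumptions based on the hierarchy and table above.
with h5py.File("real_dataset.hdf5", "r") as f:
    folder = f["FloorTrajectories/data_00/folder_10"]

    gripper_rgb = np.asarray(folder["gripper_image_10"])      # (480, 640, 3)
    gripper_depth = np.asarray(folder["gripper_depth_10"])    # (480, 640)
    left_rgb = np.asarray(folder["left_fisheye_image_10"])    # (640, 480, 3)
    right_rgb = np.asarray(folder["right_fisheye_image_10"])  # (640, 480, 3)

    # The right fisheye sees the left half of the scene and the left fisheye
    # sees the right half, so a rough side-by-side view puts right_rgb first.
    side_by_side = np.concatenate([right_rgb, left_rgb], axis=1)  # (640, 960, 3)
    print(gripper_rgb.shape, gripper_depth.shape, side_by_side.shape)
```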
To collect simulation data:
cd collect_sim
pip install -r sim_reqs.txt
cd custom_ai2thor_lib_code
- Move the files to the ai2thor library folder in the virtual environment
- Collect data (an example invocation is shown after the key list below):
python mani.py --scene "<scene number>" --command "<natural language command>"
Use the following keys to move in the simulator:
- WASD: move the robot base
- J/L: rotate the robot left/right
- I/K: move the robot head up/down
- G: grasp
- R: release
- Up arrow/down arrow: move robot shoulder up/down
- 7/4: move end-effector left/right
- 8/5: move end-effector up/down
- 9/6: move end-effector forward/backward
- Q: end collection and save data
- CTRL+C: restart collection without saving
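An example invocation (the scene number and command here are hypothetical):
python mani.py --scene "3" --command "bring the apple from the kitchen to the sofa"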
To collect real-world data:
cd collect_real
conda create --name <env> --file spot_env.txt
- Create a map using:
python record_env_graph.py
See this for more details on how to record the map.
- Collect data using the map:
python collect_spot_data.py -u <map folder> -t "<natural language command>"
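An example invocation (the map folder and command here are hypothetical):
python collect_spot_data.py -u maps/floor1 -t "take the water bottle to the office desk"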
@misc{lambdabenchmark,
title={{$\lambda$}: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics},
author={Ahmed Jaafar and Shreyas Sundara Raman and Yichen Wei and Sudarshan Harithas and Sofia Juliani and Anneke Wernerfelt and Benedict Quartey and Ifrah Idrees and Jason Xinyu Liu and Stefanie Tellex},
year={2025},
eprint={2412.05313},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2412.05313},
}