A benchmark consisting of language-conditioned long-horizon (room-to-room and floor-to-floor navigation) MoMa tasks that are paired with human-collected expert demonstrations from both simulated and real-world environments.

LAMBDA (λ) Benchmark

Under Review
Website | arXiv | Dataset | Data Card

Sequential timesteps of images from robot trajectories collected in simulation and the real world, along with the natural language command describing each task.

Learning to execute long-horizon mobile manipulation tasks is crucial for advancing robotics in household and workplace settings. However, current approaches are typically data-inefficient, underscoring the need for improved models, which in turn require realistically sized benchmarks to evaluate their data efficiency. To address this, we introduce the LAMBDA (λ) benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities), which evaluates the data efficiency of models on language-conditioned, long-horizon, multi-room, multi-floor, pick-and-place tasks using a dataset of manageable size that is more feasible to collect. Our benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings. Unlike planner-generated data, these trajectories offer natural variability and replay verifiability, ensuring robust learning and evaluation. We leverage LAMBDA to benchmark current end-to-end learning methods and a modular neuro-symbolic approach that combines foundation models with task and motion planning. We find that end-to-end methods, even when pretrained, yield lower success rates, while neuro-symbolic methods perform significantly better and require less data.

Dataset Format

More detailed dataset information can be found in the dataset card DataCard.md.

Download the dataset from this Dropbox link.

Code that opens, reads, and displays the dataset contents can be found in this Google Colab notebook.
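
For a quick check without the notebook, the minimal sketch below (assuming the h5py package and a local copy of either file) walks the HDF5 hierarchy and prints every group and dataset:

    # Minimal sketch: list the contents of a downloaded LAMBDA HDF5 file.
    # Assumes h5py is installed and the path points at your local copy.
    import h5py

    def show(name, obj):
        kind = "group" if isinstance(obj, h5py.Group) else f"dataset {obj.shape}"
        print(name, "-", kind)

    with h5py.File("sim_dataset.hdf5", "r") as f:  # or "real_dataset.hdf5"
        f.visititems(show)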

Sim Dataset

The simulation dataset comes as a single HDF5 file with the following hierarchy:

sim_dataset.hdf5/
├── data_11:11:28/
│   ├── folder_0
│   ├── folder_1
│   └── folder_2
├── data_11:14:08/
│   ├── folder_0
│   └── ...
└── ...

Under each folder, there are three main NumPy arrays: depth_<num>, inst_seg_<num>, and rgb_<num>, which correspond to the depth image, the instance segmentation image, and the RGB image, respectively.

Each folder also has a metadata entry containing a dumped JSON that describes additional per-timestep information. The detailed metadata fields can be found in the dataset card.
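
As an illustration, the sketch below (assuming h5py and NumPy) reads one folder using the names shown in the hierarchy above. That the <num> suffix matches the folder index and that the metadata JSON is stored as a string dataset named metadata are assumptions; consult the dataset card if your file differs:

    # Sketch of reading one folder from the sim dataset.
    # The "rgb_0"/"depth_0"/"inst_seg_0" names and the "metadata" dataset
    # are assumptions based on the hierarchy described above.
    import json
    import h5py
    import numpy as np

    with h5py.File("sim_dataset.hdf5", "r") as f:
        folder = f["data_11:11:28"]["folder_0"]     # one trajectory folder
        rgb = np.array(folder["rgb_0"])             # RGB image
        depth = np.array(folder["depth_0"])         # depth image
        seg = np.array(folder["inst_seg_0"])        # instance segmentation
        raw = folder["metadata"][()]                # dumped JSON (assumed key)
        meta = json.loads(raw.decode() if isinstance(raw, bytes) else raw)
        print(rgb.shape, depth.shape, seg.shape, type(meta))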

Real Dataset

Similarly, the real-world dataset comes as a single HDF5 file with the following hierarchy:

real_dataset.hdf5/
└── FloorTrajectories/
    ├── data_00/
    │   ├── folder_10/
    │   │   ├── gripper_depth_10
    │   │   ├── gripper_image_10
    │   │   ├── left_fisheye_depth_10
    │   │   ├── left_fisheye_image_10
    │   │   ├── right_fisheye_depth_10
    │   │   ├── right_fisheye_image_10
    │   │   └── metadata
    │   └── folder_11/
    │       ├── gripper_depth_10
    │       ├── gripper_image_10
    │       └── ...
    ├── data_01/
    │   └── folder_10/
    │       └── ...
    └── ...

Note that the right fisheye camera is mounted on the right side of the robot but points toward the left, so the right fisheye produces the left half of the combined view and the left fisheye produces the right half.

The images have the following sizes:

key                       shape
gripper_depth_10          (480, 640)
gripper_image_10          (480, 640, 3)
left_fisheye_depth_10     (240, 424)
left_fisheye_image_10     (640, 480, 3)
right_fisheye_depth_10    (240, 424)
right_fisheye_image_10    (640, 480, 3)

The detailed metadata can be found in the dataset card.
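
As an illustration, the sketch below (assuming h5py and NumPy, with key names taken from the hierarchy and table above) loads one timestep and composes a rough front view by placing the right fisheye image on the left, following the camera note above; any rotation or cropping needed to align the two halves is left out:

    # Sketch of reading one timestep from the real (Spot) dataset.
    # Key names follow the hierarchy above; the concatenation order follows
    # the note that the right camera sees the left half of the scene.
    import h5py
    import numpy as np

    with h5py.File("real_dataset.hdf5", "r") as f:
        folder = f["FloorTrajectories"]["data_00"]["folder_10"]
        gripper_rgb = np.array(folder["gripper_image_10"])      # (480, 640, 3)
        left_img = np.array(folder["left_fisheye_image_10"])    # (640, 480, 3)
        right_img = np.array(folder["right_fisheye_image_10"])  # (640, 480, 3)
        front_view = np.concatenate([right_img, left_img], axis=1)
        print(gripper_rgb.shape, front_view.shape)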

Running Data Collection

Simulation (AI2THOR)

  1. cd collect_sim
  2. pip install -r sim_reqs.txt
  3. cd custom_ai2thor_lib_code
  4. Copy the files into the ai2thor library folder inside the virtual environment
  5. Collect data with python mani.py --scene "<scene number>" --command "<natural language command>". Use the following keys to control the robot in the simulator:
  • WASD: move the robot base
  • J/L: rotate the robot left/right
  • I/K: move the robot head up/down
  • G: grasp
  • R: release
  • Up/down arrows: move the robot shoulder up/down
  • 7/4: move the end-effector left/right
  • 8/5: move the end-effector up/down
  • 9/6: move the end-effector forward/backward
  • Q: end collection and save data
  • CTRL+C: restart collection without saving

Real (Spot)

  1. cd collect_real
  2. conda create --name <env> --file spot_env.txt
  3. Create a map using python record_env_graph.py. See this for more details on how to record the map.
  4. Collect data using the map: python collect_spot_data.py -u <map folder> -t "<natural language command>"

BibTeX

   @misc{lambdabenchmark,
      title={{\lambda}: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics}, 
      author={Ahmed Jaafar and Shreyas Sundara Raman and Yichen Wei and Sudarshan Harithas and Sofia Juliani and Anneke Wernerfelt and Benedict Quartey and Ifrah Idrees and Jason Xinyu Liu and Stefanie Tellex},
      year={2025},
      eprint={2412.05313},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2412.05313}, 
    }
