A benchmark consisting of language-conditioned long-horizon (room-to-room and floor-to-floor navigation) MoMa tasks that are paired with human-collected expert demonstrations from both simulated and real-world environments.

LAMBDA (λ) Benchmark

Under Review
Website | arXiv | Dataset | Data Card

Sequential timesteps of images from robot trajectories collected in simulation and the real world, along with the natural language command describing each task.

Learning to execute long-horizon mobile manipulation tasks is crucial for advancing robotics in household and workplace settings. However, current approaches are typically data-inefficient, underscoring the need for improved models, which in turn require realistically sized benchmarks to evaluate their data efficiency. To address this, we introduce the LAMBDA (λ) benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities), which evaluates the data efficiency of models on language-conditioned, long-horizon, multi-room, multi-floor, pick-and-place tasks using a dataset of manageable size that is more feasible to collect. Our benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings. Unlike planner-generated data, these trajectories offer natural variability and replay verifiability, ensuring robust learning and evaluation. We leverage LAMBDA to benchmark current end-to-end learning methods and a modular neuro-symbolic approach that combines foundation models with task and motion planning. We find that end-to-end methods, even when pretrained, yield lower success rates, while neuro-symbolic methods perform significantly better and require less data.

Dataset Format

More detailed dataset information can be found in the dataset card DataCard.md.

Download the dataset from this Dropbox link.

Code that opens, reads, and displays the dataset contents can be found in this Google Colab notebook.
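
For a quick check without the notebook, the minimal sketch below (assuming the h5py package and a local copy of either file) walks the HDF5 hierarchy and prints every group and dataset:

    # Minimal sketch: list the contents of a downloaded LAMBDA HDF5 file.
    # Assumes h5py is installed and the path points at your local copy.
    import h5py

    def show(name, obj):
        kind = "group" if isinstance(obj, h5py.Group) else f"dataset {obj.shape}"
        print(name, "-", kind)

    with h5py.File("sim_dataset.hdf5", "r") as f:  # or "real_dataset.hdf5"
        f.visititems(show)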

Sim Dataset

The simulation dataset comes as a single HDF5 file with the following hierarchy:

sim_dataset.hdf5/
├── data_11:11:28/
│   ├── folder_0
│   ├── folder_1
│   └── folder_2
├── data_11:14:08/
│   ├── folder_0
│   └── ...
└── ...

Under each folder, there are three main NumPy arrays: depth_<num>, inst_seg_<num>, and rgb_<num>, which correspond to the depth image, the instance segmentation image, and the RGB image, respectively.

Each folder also has a metadata entry containing a dumped JSON that describes additional per-timestep information. The detailed metadata fields can be found in the dataset card.
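
As an illustration, the sketch below (assuming h5py and NumPy) reads one folder using the names shown in the hierarchy above. That the <num> suffix matches the folder index and that the metadata JSON is stored as a string dataset named metadata are assumptions; consult the dataset card if your file differs:

    # Sketch of reading one folder from the sim dataset.
    # The "rgb_0"/"depth_0"/"inst_seg_0" names and the "metadata" dataset
    # are assumptions based on the hierarchy described above.
    import json
    import h5py
    import numpy as np

    with h5py.File("sim_dataset.hdf5", "r") as f:
        folder = f["data_11:11:28"]["folder_0"]     # one trajectory folder
        rgb = np.array(folder["rgb_0"])             # RGB image
        depth = np.array(folder["depth_0"])         # depth image
        seg = np.array(folder["inst_seg_0"])        # instance segmentation
        raw = folder["metadata"][()]                # dumped JSON (assumed key)
        meta = json.loads(raw.decode() if isinstance(raw, bytes) else raw)
        print(rgb.shape, depth.shape, seg.shape, type(meta))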

Real Dataset

Similarly, the real-world dataset comes as a single HDF5 file with the following hierarchy:

real_dataset.hdf5/
└── FloorTrajectories/
    ├── data_00/
    │   ├── folder_10/
    │   │   ├── gripper_depth_10
    │   │   ├── gripper_image_10
    │   │   ├── left_fisheye_depth_10
    │   │   ├── left_fisheye_image_10
    │   │   ├── right_fisheye_depth_10
    │   │   ├── right_fisheye_image_10
    │   │   └── metadata
    │   └── folder_11/
    │       ├── gripper_depth_10
    │       ├── gripper_image_10
    │       └── ...
    ├── data_01/
    │   └── folder_10/
    │       └── ...
    └── ...

Note that the right fisheye camera is mounted on the right side of the robot but points toward the left, so the right fisheye produces the left half of the combined view and the left fisheye produces the right half.

The images have the following sizes:

key                       shape
gripper_depth_10          (480, 640)
gripper_image_10          (480, 640, 3)
left_fisheye_depth_10     (240, 424)
left_fisheye_image_10     (640, 480, 3)
right_fisheye_depth_10    (240, 424)
right_fisheye_image_10    (640, 480, 3)

The detailed metadata can be found in the dataset card.
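
As an illustration, the sketch below (assuming h5py and NumPy, with key names taken from the hierarchy and table above) loads one timestep and composes a rough front view by placing the right fisheye image on the left, following the camera note above; any rotation or cropping needed to align the two halves is left out:

    # Sketch of reading one timestep from the real (Spot) dataset.
    # Key names follow the hierarchy above; the concatenation order follows
    # the note that the right camera sees the left half of the scene.
    import h5py
    import numpy as np

    with h5py.File("real_dataset.hdf5", "r") as f:
        folder = f["FloorTrajectories"]["data_00"]["folder_10"]
        gripper_rgb = np.array(folder["gripper_image_10"])      # (480, 640, 3)
        left_img = np.array(folder["left_fisheye_image_10"])    # (640, 480, 3)
        right_img = np.array(folder["right_fisheye_image_10"])  # (640, 480, 3)
        front_view = np.concatenate([right_img, left_img], axis=1)
        print(gripper_rgb.shape, front_view.shape)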

Running Data Collection

Simulation (AI2THOR)

  1. cd collect_sim
  2. pip install -r sim_reqs.txt
  3. cd custom_ai2thor_lib_code
  4. Copy the files into the ai2thor library folder inside the virtual environment
  5. Collect data with python mani.py --scene "<scene number>" --command "<natural language command>". Use the following keys to control the robot in the simulator:
  • WASD: move the robot base
  • J/L: rotate the robot left/right
  • I/K: move the robot head up/down
  • G: grasp
  • R: release
  • Up/down arrows: move the robot shoulder up/down
  • 7/4: move the end-effector left/right
  • 8/5: move the end-effector up/down
  • 9/6: move the end-effector forward/backward
  • Q: end collection and save data
  • CTRL+C: restart collection without saving

Real (Spot)

  1. cd collect_real
  2. conda create --name <env> --file spot_env.txt
  3. Create a map using python record_env_graph.py. See this for more details on how to record the map.
  4. Collect data using the map: python collect_spot_data.py -u <map folder> -t "<natural language command>"

BibTeX

   @misc{lambdabenchmark,
      title={{\lambda}: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics}, 
      author={Ahmed Jaafar and Shreyas Sundara Raman and Yichen Wei and Sudarshan Harithas and Sofia Juliani and Anneke Wernerfelt and Benedict Quartey and Ifrah Idrees and Jason Xinyu Liu and Stefanie Tellex},
      year={2025},
      eprint={2412.05313},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2412.05313}, 
    }
