[Diffusion] add GR00T-N1.7 pipeline with OpenPI serving#3798
|
@Yuxi1000 ptal |
|
Quick review noted. CI checks look good. |
|
Hey, everything looks great. I found a few small, non-critical points.
It looks like line 175 in I noticed that Thank you for your time. |
|
Hi @Yuxi1000, thanks for the review. I have updated Here is how to set it up:

1. Clone MolmoSpaces and add the policy bridge

git clone https://github.com/allenai/molmospaces.git
cd molmospaces
mkdir -p examples/gr00t_openpi
curl -L https://gist.github.com/timzsu/a8ac09797fc3fa29ff1a7af84a48a742/raw/gr00t_openpi_policy.py \
-o examples/gr00t_openpi/gr00t_openpi_policy.py
touch examples/gr00t_openpi/__init__.py

2. Install dependencies and download simulation assets

uv venv .venv
uv pip install -e ".[mujoco]" && uv pip install openpi-client websockets

Set a cache directory and trigger the download of MuJoCo scenes and benchmark JSONs:

export MLSPACES_ASSETS_DIR=$HOME/.cache/molmospaces/gr00t-assets
uv run --no-sync python -m molmo_spaces.molmo_spaces_constants
uv run --no-sync python -c "from molmo_spaces.molmo_spaces_constants import get_resource_manager; get_resource_manager().install_all_for_data_type('benchmarks')"

3. Run the eval

I used the

BENCH=$MLSPACES_ASSETS_DIR/benchmarks/molmospaces-bench-v1/procthor-10k/FrankaPickDroidMiniBench/FrankaPickDroidMiniBench_json_benchmark_20251231
VLLM_OMNI=/path/to/vllm-omni
MUJOCO_GL=egl \
PYOPENGL_PLATFORM=egl \
MUJOCO_EGL_DEVICE_ID=<physical-GPU-index> \
PYTHONPATH=$PWD uv run --no-sync python $VLLM_OMNI/examples/online_serving/gr00t/molmospace_gr00t_eval_demo.py \
--host 127.0.0.1 \
--port 8000 \
--benchmark_dir "$BENCH" \
--output_dir outputs/gr00t/molmospaces \
--max_episodes 1 \
--task_horizon_steps 240

4. Reading results

The script prints a summary line to stdout on exit: Inside that timestamped directory MolmoSpaces writes one |
|
Thanks for your reply. I cloned |
|
Sorry, it was a local file that I forgot to commit. Please find it on my gist (https://gist.github.com/timzsu/a8ac09797fc3fa29ff1a7af84a48a742/raw/gr00t_openpi_policy.py). I have updated the instructions above. |
|
Thank you again for sharing the With the current GitHub + gist setup, I ran 10 episodes on FrankaPickDroidMiniBench and got: These 10 episodes are simply the first 10 episodes from the benchmark, with I uploaded two of the rollouts for a visual reference:

episode_00000000_exo_camera_1_batch_1_of_1_2.mp4
episode_00000000_exo_camera_1_batch_1_of_1.mp4

In both videos the arm moves toward the target in roughly the right direction, but it never quite reaches or grasps the object, so the episodes end up as failures.

To help narrow things down, could you confirm whether the code you’re running locally (MolmoSpaces branch + gr00t_openpi_policy.py) is exactly the same as what’s currently on GitHub + the gist? Also, roughly what success rate do you see on

I’m happy to help debug on my side (e.g., checking observations, video/state wiring, etc.). It would also be great to know what behavior you’re seeing locally so we can tell whether this is a reproducibility gap or an actual regression. |
|
Hi @Yuxi1000, I haven't run it against multiple episodes yet. I will try soon and let you know whether I can reproduce the same failure. |
|
Hi @Yuxi1000, I have reproduced the failure locally, so I think our setup is likely to be the same. After some debugging, I also found that the gripper never closes ( |
|
Hi @timzsu, thanks for the confirmation. |
| position_ids = position_ids[1:] | ||
| else: | ||
| text_position_ids = position_ids[0] | ||
|
|
Was InternVLA e2e tested with these adapter changes? create_causal_mask signature changed (param rename input_embeds → inputs_embeds) and cache_position kwarg removed. Also @check_model_inputs removed from Qwen3VLTextModel.forward.
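One way to keep a caller working across a parameter rename like the one described above is to filter keyword arguments against the callee's signature. The following is only a sketch under stated assumptions: `call_with_supported_kwargs`, `mask_old`, and `mask_new` are hypothetical names, and the two stand-in functions merely mimic the rename (`input_embeds` to `inputs_embeds`) and the dropped `cache_position` kwarg; they are not the real `create_causal_mask`.

```python
import inspect


def call_with_supported_kwargs(func, /, **kwargs):
    """Call func, passing only the keyword arguments its signature accepts."""
    params = inspect.signature(func).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_var_kw:
        # func takes **kwargs itself, so nothing needs filtering.
        return func(**kwargs)
    return func(**{k: v for k, v in kwargs.items() if k in params})


# Hypothetical "before" signature: old parameter name plus cache_position.
def mask_old(input_embeds, cache_position=None):
    return ("old", input_embeds, cache_position)


# Hypothetical "after" signature: renamed parameter, cache_position removed.
def mask_new(inputs_embeds):
    return ("new", inputs_embeds)


# The caller supplies both spellings; each callee receives only what it accepts.
common = dict(input_embeds=1, inputs_embeds=1, cache_position=0)
old_result = call_with_supported_kwargs(mask_old, **common)
new_result = call_with_supported_kwargs(mask_new, **common)
```

A shim like this hides signature drift rather than testing for it, so it would complement an end-to-end run of the affected model, not replace one.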
| self, | ||
| config: Gr00tN1d7Config, | ||
| transformers_loading_kwargs: dict = {"trust_remote_code": True}, | ||
| ): |
dict = {"trust_remote_code": True} as default arg — evaluated once at definition time, shared across all calls. dict = None with a None-guard in the body is safer.
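The shared-default behavior can be reproduced in isolation. This is a minimal sketch with hypothetical function names (`load_shared`, `load_safe`) and a made-up `"calls"` key used only to make the mutation visible; it is not the PR's constructor.

```python
from typing import Optional


def load_shared(name: str, kwargs: dict = {"trust_remote_code": True}) -> dict:
    # The default dict is created once, when the function is defined,
    # so every call without an explicit kwargs mutates the same object.
    kwargs.setdefault("calls", []).append(name)
    return kwargs


def load_safe(name: str, kwargs: Optional[dict] = None) -> dict:
    # None-guard: build a fresh dict on each call when nothing is passed.
    if kwargs is None:
        kwargs = {"trust_remote_code": True}
    kwargs.setdefault("calls", []).append(name)
    return kwargs


shared_first = load_shared("a")
shared_second = load_shared("b")   # same dict object as shared_first

safe_first = load_safe("a")
safe_second = load_safe("b")       # independent dict
```

The original default is only harmful if the body, or anything it passes the dict to, mutates it; the None-guard removes that dependency on callee behavior.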
|
|
||
|
|
||
| class BasicDataCollator: | ||
| def __call__(self, features: list[dict[str, Any]]) -> dict[str, torch.Tensor]: |
BasicDataCollator is exported in __all__ and re-exported through dataio/collator/__init__.py, but nothing imports or instantiates it. Gr00tN1d7Processor uses Gr00tN1d7DataCollator from processing_gr00t_n1d7.py instead. Dead code.
|
|
||
|
|
||
| def get_gr00t_n1d7_post_process_func(od_config: OmniDiffusionConfig): | ||
| del od_config |
del od_config + identity return. Registered in the post-process table but this is a complete no-op. If GR00T never needs post-processing, drop the registration and let the engine skip it.
| return () | ||
|
|
||
| def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]: | ||
| for _ in weights: |
Iterates and discards every weight. With weights_sources = () the engine shouldn't call this, but if it ever does, weights silently vanish. At minimum log a warning.
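A sketch of what "log a warning" could look like, assuming the method should still drain the iterator and report nothing as loaded. `StubWeightLoader` and the logger name are hypothetical, the return value of an empty set is an assumption, and `object` stands in for `torch.Tensor` so the example runs without PyTorch.

```python
import logging
from collections.abc import Iterable

logger = logging.getLogger("gr00t_stub")  # hypothetical logger name


class StubWeightLoader:
    """Hypothetical stand-in for a module whose weights are loaded elsewhere."""

    def load_weights(self, weights: Iterable[tuple[str, object]]) -> set[str]:
        # Drain the iterator, but record the names instead of dropping them silently.
        skipped = [name for name, _ in weights]
        if skipped:
            logger.warning(
                "load_weights ignored %d tensor(s), e.g. %s",
                len(skipped),
                skipped[0],
            )
        # Assumption: no parameter names are reported as loaded.
        return set()
```

With `weights_sources = ()` the warning should never fire in normal operation, so any appearance in the logs would point at an unexpected call path.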
47e3b14 to 439ab31
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
… for GR00T-N1.7 Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
439ab31 to 9f5e89a
4145111 to eb3b6a9
Purpose
Addresses #3553. Adds NVIDIA GR00T-N1.7 as a vLLM-Omni robot policy pipeline that consumes observations from the OpenPI realtime endpoint (added in #3673) and returns action chunks. This PR lands the model port, deploy config, processor/registry wiring, tests, and user-facing docs.
Test Plan
Unit / e2e tests added (run with the repo's standard pytest commands):
Qualitative integration test:
Test Result
Unit / e2e tests
All three pytest files above pass (transformers 5.8.1).
Qualitative DROID pick rollout
In the rollout, the robot successfully accomplishes the task.
episode_00000000_wrist_camera_batch_1_of_1.mp4