[Diffusion] add GR00T-N1.7 pipeline with OpenPI serving #3798

Open
timzsu wants to merge 8 commits into vllm-project:main from timzsu:zsu/gr00t-pipeline

@timzsu
Contributor

@timzsu timzsu commented May 21, 2026

Purpose

Address #3553. Adds NVIDIA GR00T-N1.7 as a vLLM-Omni robot policy pipeline that consumes observations from the OpenPI realtime endpoint (added in #3673) and returns action chunks. Lands the model port, deploy config, processor/registry wiring, tests, and user-facing docs.

Test Plan

Unit / e2e tests added (run with the repo's standard pytest commands):

pytest -xvs tests/diffusion/models/gr00t/test_pipeline.py
pytest -xvs tests/entrypoints/openai_api/test_openpi_serving.py
pytest -xvs tests/e2e/online_serving/test_gr00t_openpi.py

Qualitative integration test:

# Terminal A — start vllm-omni GR00T server
VLLM_WORKER_MULTIPROC_METHOD=spawn \
  uv run --no-sync --with openpi-client \
    vllm serve nvidia/GR00T-N1.7-3B \
      --omni \
      --stage-configs-path vllm_omni/deploy/Gr00tN1d7.yaml \
      --host 127.0.0.1 --port 8000 --disable-log-stats

# Terminal B — run a MuJoCo DROID pick rollout against it
export PYTHONPATH=$PWD MUJOCO_GL=egl PYOPENGL_PLATFORM=egl \
       MLSPACES_ASSETS_DIR=$HOME/.cache/molmospaces/gr00t-assets \
       BENCHMARK_DIR=/tmp/gr00t_molmospaces_droid_benchmark
.venv/bin/python -m molmo_spaces.evaluation.eval_main \
  examples.gr00t_openpi.gr00t_openpi_policy:Gr00tOpenPIEvalConfig \
  --benchmark_dir "$BENCHMARK_DIR" \
  --idx 0 --max_episodes 1 --task_horizon_steps 120 \
  --output_dir /tmp/gr00t_eval --no_wandb

Test Result

Unit / e2e tests

All three pytest files above pass (transformers 5.8.1).

Qualitative DROID pick rollout

In the rollout, the robot successfully accomplishes the task.

episode_00000000_wrist_camera_batch_1_of_1.mp4

@linyueqian
Collaborator

@Yuxi1000 ptal

@hsliuustc0106
Collaborator

Quick review noted. CI checks look good.

@TKONIY TKONIY mentioned this pull request May 22, 2026
15 tasks
@Yuxi1000

Hey, everything looks great. I found a few small, non-critical points.

1. When running test_gr00t_openpi.py, I hit the following error:

TypeError: create_causal_mask() got an unexpected keyword argument 'cache_position'

It looks like line 175 in adapter_qwen3_vl.py calls create_causal_mask with a cache_position argument. I think this happens when transformers==5.9.0 is installed alongside vllm==0.21.0 (transformers==5.8.1 works fine): the function signature no longer accepts this parameter, which causes the e2e test to fail.

2. I noticed that molmospace_gr00t_eval_demo.py imports Gr00tOpenPIEvalConfig / Gr00tOpenPIPolicyConfig from examples.gr00t_openpi.gr00t_openpi_policy, which appears to live in a MolmoSpaces workspace rather than this repo. I wasn't able to locate a public version of that module, so I haven't been able to run the full MolmoSpaces rollout on my end.
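
For the first point, one possible way to tolerate a keyword argument that exists in one library release but not another is to filter keywords against the callee's signature before calling it. The sketch below is illustrative only: `call_with_supported_kwargs` and the two stand-in mask builders are hypothetical names, not code from this PR or from transformers.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(func: Callable[..., Any], **kwargs: Any) -> Any:
    """Call func, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all means every keyword is accepted as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return func(**supported)


# Hypothetical stand-ins for an older and a newer function signature.
def old_mask_builder(attention_mask, cache_position=None):
    return ("old", attention_mask, cache_position)


def new_mask_builder(attention_mask):
    return ("new", attention_mask)


old_result = call_with_supported_kwargs(
    old_mask_builder, attention_mask=[1, 1], cache_position=[0, 1]
)
new_result = call_with_supported_kwargs(
    new_mask_builder, attention_mask=[1, 1], cache_position=[0, 1]
)
```

Simply not passing a legacy argument is the simpler fix when no supported release still requires it. A filter like this mainly helps when several releases must be supported at once.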

Thank you for your time.

@timzsu
Contributor Author

timzsu commented May 24, 2026

Hi @Yuxi1000, thanks for the review. I have updated adapter_qwen3_vl.py to not pass the argument as it is legacy. For the second concern, molmospace_gr00t_eval_demo.py depends on MolmoSpaces, a public evaluation framework. I have run the rollout locally against the vLLM-Omni GR00T server and posted the generated video in the PR description.

Here is how to set it up:

1. Clone MolmoSpaces and add the policy bridge

git clone https://github.com/allenai/molmospaces.git
cd molmospaces
mkdir -p examples/gr00t_openpi
curl -L https://gist.github.com/timzsu/a8ac09797fc3fa29ff1a7af84a48a742/raw/gr00t_openpi_policy.py \
  -o examples/gr00t_openpi/gr00t_openpi_policy.py
touch examples/gr00t_openpi/__init__.py

2. Install dependencies and download simulation assets

uv venv .venv
uv pip install -e ".[mujoco]" && uv pip install openpi-client websockets

Set a cache directory and trigger the download of MuJoCo scenes and benchmark JSONs:

export MLSPACES_ASSETS_DIR=$HOME/.cache/molmospaces/gr00t-assets
uv run --no-sync python -m molmo_spaces.molmo_spaces_constants
uv run --no-sync python -c "from molmo_spaces.molmo_spaces_constants import get_resource_manager; get_resource_manager().install_all_for_data_type('benchmarks')"

3. Run the eval

I used the FrankaPickDroidMiniBench benchmark, and launched the vllm-omni server at localhost:8000.

BENCH=$MLSPACES_ASSETS_DIR/benchmarks/molmospaces-bench-v1/procthor-10k/FrankaPickDroidMiniBench/FrankaPickDroidMiniBench_json_benchmark_20251231
VLLM_OMNI=/path/to/vllm-omni
MUJOCO_GL=egl \
PYOPENGL_PLATFORM=egl \
MUJOCO_EGL_DEVICE_ID=<physical-GPU-index> \
PYTHONPATH=$PWD uv run --no-sync python $VLLM_OMNI/examples/online_serving/gr00t/molmospace_gr00t_eval_demo.py \
  --host 127.0.0.1 \
  --port 8000 \
  --benchmark_dir "$BENCH" \
  --output_dir outputs/gr00t/molmospaces \
  --max_episodes 1 \
  --task_horizon_steps 240

4. Reading results

The script prints a summary line to stdout on exit:

[eval] success=1/1 (100.0%)
[eval] output_dir=outputs/gr00t/molmospaces/Gr00tVllmOmniEvalConfig/<timestamp>

Inside that timestamped directory MolmoSpaces writes one .mp4 per camera per
episode, e.g. house_0/episode_00000000_wrist_camera_batch_1_of_1.mp4 and
house_0/episode_00000000_exo_camera_1_batch_1_of_1.mp4. Note that a single episode may fail due to stochasticity.
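
When running many episodes, it may be convenient to read the summary line programmatically. The following is a minimal sketch that assumes the stdout format is exactly the `[eval] success=N/M (P%)` line shown above; `parse_eval_summary` is a hypothetical helper and not part of MolmoSpaces or vllm-omni.

```python
import re

# Matches lines such as "[eval] success=1/1 (100.0%)".
SUMMARY_RE = re.compile(r"\[eval\] success=(\d+)/(\d+) \((\d+(?:\.\d+)?)%\)")


def parse_eval_summary(line: str) -> tuple[int, int, float]:
    """Return (successes, total, percent) parsed from an eval summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not an eval summary line: {line!r}")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


parsed = parse_eval_summary("[eval] success=1/1 (100.0%)")
```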

@Yuxi1000

Thanks for your reply. I cloned allenai/molmospaces, but examples/gr00t_openpi/gr00t_openpi_policy.py doesn't seem to exist in the current main branch; only examples/add_robot/ is present. Could you point to the specific branch or commit that includes the gr00t_openpi module?

@timzsu
Contributor Author

timzsu commented May 25, 2026

Sorry, it was a local file that I forgot to commit. Please find it on my gist (https://gist.github.com/timzsu/a8ac09797fc3fa29ff1a7af84a48a742/raw/gr00t_openpi_policy.py). I have updated the instructions above.

@Yuxi1000

Thank you again for sharing the gr00t_openpi_policy.py gist and the updated setup steps. I was able to wire everything up and run the MolmoSpaces eval with it.

With the current GitHub + gist setup, I ran 10 episodes on FrankaPickDroidMiniBench and got:


[eval] success=0/10 (0.0%)

These 10 episodes are simply the first 10 episodes from the benchmark, with max_episodes=10 and samples_per_house=2. They cover 7 different houses:


house_0 (val_0_ceiling.xml): “pick up the kettle”

house_1 (val_1_ceiling.xml): “pick up the remote control”

house_10 (val_10_ceiling.xml): “pick up the ladle”; “pick up the tissue”

house_100 (val_100_ceiling.xml): “pick up the tissue”

house_101 (val_101_ceiling.xml): “pick up the spoon”; “pick up the spatula”

house_102 (val_102_ceiling.xml): “pick up the spoon”; “pick up the kettle”

house_104 (val_104_ceiling.xml): “pick up the mug”

I uploaded two of the rollouts for a visual reference:

episode_00000000_exo_camera_1_batch_1_of_1_2.mp4
episode_00000000_exo_camera_1_batch_1_of_1.mp4

In both videos the arm does move toward the target in roughly the right direction, but it never quite reaches or grasps the object, so the episodes end up as failures.

To help narrow things down, could you confirm whether the code you’re running locally (MolmoSpaces branch + gr00t_openpi_policy.py) is exactly the same as what’s currently on GitHub + the gist? Also, roughly what success rate do you see on FrankaPickDroidMiniBench over multiple episodes with this setup?

I’m happy to help debug on my side (e.g., checking observations, video/state wiring, etc.). It would also be great to know what behavior you’re seeing locally so we can tell whether this is a reproducibility gap or an actual regression.

@timzsu
Contributor Author

timzsu commented May 25, 2026

Hi @Yuxi1000, I haven't run it against multiple episodes. I will try it soon and let you know whether I can reproduce the same failure.

@timzsu
Contributor Author

timzsu commented May 26, 2026

Hi @Yuxi1000, I have reproduced the failure locally, so I think our setup is likely to be the same. After some debugging, I also found that the gripper never closes (gripper_position stays near zero). I also reproduced the same failure on the official codebase (https://github.com/Nvidia/Isaac-GR00T), so the vllm-omni integration is likely not the issue. Can you help take a look?

@Yuxi1000

Yuxi1000 commented May 27, 2026

Hi @timzsu, thanks for the confirmation.
If I remember correctly, GR00T depends heavily on post-training. My impression is that the 0/10 result is likely because the model hasn’t been post-trained for this specific benchmark, especially for the gripper channel.
I agree that vllm-omni is likely not the issue. Everything looks great!

@hsliuustc0106 hsliuustc0106 added the ready (label to trigger buildkite CI) and merge-test (label to trigger buildkite merge test CI) labels May 29, 2026
    position_ids = position_ids[1:]
else:
    text_position_ids = position_ids[0]

Collaborator

Was InternVLA e2e tested with these adapter changes? create_causal_mask signature changed (param rename input_embeds → inputs_embeds) and cache_position kwarg removed. Also @check_model_inputs removed from Qwen3VLTextModel.forward.

    self,
    config: Gr00tN1d7Config,
    transformers_loading_kwargs: dict = {"trust_remote_code": True},
):
Collaborator

dict = {"trust_remote_code": True} as default arg — evaluated once at definition time, shared across all calls. dict = None with a None-guard in the body is safer.
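
The pitfall can be seen in a small standalone sketch. `load_shared` and `load_guarded` are hypothetical functions written for illustration, not the constructor from this PR; they only contrast a mutable default with the None-guard pattern suggested above.

```python
from typing import Optional


def load_shared(name: str, loading_kwargs: dict = {"trust_remote_code": True}) -> dict:
    # The default dict is built once at definition time, so this mutation
    # is visible to every later call that relies on the default.
    loading_kwargs.setdefault("seen", []).append(name)
    return loading_kwargs


def load_guarded(name: str, loading_kwargs: Optional[dict] = None) -> dict:
    # A fresh dict is created on each call when the caller passes nothing.
    if loading_kwargs is None:
        loading_kwargs = {"trust_remote_code": True}
    loading_kwargs.setdefault("seen", []).append(name)
    return loading_kwargs


shared_first = load_shared("a")
shared_second = load_shared("b")
guarded_first = load_guarded("a")
guarded_second = load_guarded("b")
```

With the shared default, both calls return the same dict object and the second call sees state left by the first. With the None-guard, each call gets an independent dict.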



class BasicDataCollator:
    def __call__(self, features: list[dict[str, Any]]) -> dict[str, torch.Tensor]:
Collaborator

BasicDataCollator is exported in __all__ and re-exported through dataio/collator/__init__.py, but nothing imports or instantiates it. Gr00tN1d7Processor uses Gr00tN1d7DataCollator from processing_gr00t_n1d7.py instead. Dead code.



def get_gr00t_n1d7_post_process_func(od_config: OmniDiffusionConfig):
    del od_config
Collaborator

del od_config + identity return. Registered in the post-process table but this is a complete no-op. If GR00T never needs post-processing, drop the registration and let the engine skip it.

    return ()

def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
    for _ in weights:
Collaborator

Iterates and discards every weight. With weights_sources = () the engine shouldn't call this, but if it ever does, weights silently vanish. At minimum log a warning.
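
A minimal sketch of the warning suggested here, under stated assumptions: `NoWeightLoader` is a hypothetical stand-in class, `Any` replaces `torch.Tensor` so the snippet runs without torch, and returning an empty set is an assumption about the intended behavior rather than what the PR does.

```python
import logging
from typing import Any, Iterable

logger = logging.getLogger("gr00t_sketch")


class NoWeightLoader:
    """Hypothetical stand-in for a pipeline that loads weights elsewhere."""

    weights_sources: tuple = ()

    def load_weights(self, weights: Iterable[tuple[str, Any]]) -> set[str]:
        # Consume the iterator, but remember what was skipped.
        ignored = [name for name, _ in weights]
        if ignored:
            logger.warning(
                "load_weights ignored %d tensors (first: %s)",
                len(ignored),
                ignored[0],
            )
        # Nothing was loaded, so report no loaded parameter names.
        return set()
```

Calling `NoWeightLoader().load_weights([])` stays silent, while any non-empty input produces a single warning that names the first discarded tensor.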

@hsliuustc0106 hsliuustc0106 removed the ready (label to trigger buildkite CI) and merge-test (label to trigger buildkite merge test CI) labels May 29, 2026
@timzsu timzsu force-pushed the zsu/gr00t-pipeline branch from 47e3b14 to 439ab31 Compare June 1, 2026 07:28
timzsu added 4 commits June 1, 2026 14:35
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
… for GR00T-N1.7

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
@timzsu timzsu force-pushed the zsu/gr00t-pipeline branch from 439ab31 to 9f5e89a Compare June 1, 2026 07:37
timzsu added 3 commits June 1, 2026 19:03
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
@timzsu timzsu force-pushed the zsu/gr00t-pipeline branch from 4145111 to eb3b6a9 Compare June 1, 2026 16:09
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
4 participants