[Diffusion] add GR00T-N1.7 pipeline with OpenPI serving by timzsu · Pull Request #3798 · vllm-project/vllm-omni

timzsu · 2026-05-21T14:07:25Z

Purpose

Address #3553. Adds NVIDIA GR00T-N1.7 as a vLLM-Omni robot policy pipeline that consumes observations from the OpenPI realtime endpoint (added in #3673) and returns action chunks. Lands the model port, deploy config, processor/registry wiring, tests, and user-facing docs.

Test Plan

Unit / e2e tests added (run with the repo's standard pytest commands):

pytest -xvs tests/diffusion/models/gr00t/test_pipeline.py
pytest -xvs tests/entrypoints/openai_api/test_openpi_serving.py
pytest -xvs tests/e2e/online_serving/test_gr00t_openpi.py

Qualitative integration test:

# Terminal A — start vllm-omni GR00T server
VLLM_WORKER_MULTIPROC_METHOD=spawn \
  uv run --no-sync --with openpi-client \
    vllm serve nvidia/GR00T-N1.7-3B \
      --omni \
      --stage-configs-path vllm_omni/deploy/Gr00tN1d7.yaml \
      --host 127.0.0.1 --port 8000 --disable-log-stats

# Terminal B — run a MuJoCo DROID pick rollout against it
export PYTHONPATH=$PWD MUJOCO_GL=egl PYOPENGL_PLATFORM=egl \
       MLSPACES_ASSETS_DIR=$HOME/.cache/molmospaces/gr00t-assets \
       BENCHMARK_DIR=/tmp/gr00t_molmospaces_droid_benchmark
.venv/bin/python -m molmo_spaces.evaluation.eval_main \
  examples.gr00t_openpi.gr00t_openpi_policy:Gr00tOpenPIEvalConfig \
  --benchmark_dir "$BENCHMARK_DIR" \
  --idx 0 --max_episodes 1 --task_horizon_steps 120 \
  --output_dir /tmp/gr00t_eval --no_wandb

Test Result

Unit / e2e tests

All three pytest files above pass (transformers 5.8.1).

Qualitative DROID pick rollout

In the rollout, the robot successfully accomplishes the task.

episode_00000000_wrist_camera_batch_1_of_1.mp4

linyueqian · 2026-05-21T14:49:11Z

@Yuxi1000 ptal

hsliuustc0106 · 2026-05-21T21:09:16Z

Quick review noted. CI checks look good.

Yuxi1000 · 2026-05-23T18:42:17Z

Hi, everything looks great overall. I found two small, non-critical points.

1. When running test_gr00t_openpi.py, I hit the following error:

TypeError: create_causal_mask() got an unexpected keyword argument 'cache_position'

Line 175 in adapter_qwen3_vl.py calls create_causal_mask with a cache_position argument. I think this happens when transformers==5.9.0 is installed alongside vllm==0.21.0 (transformers==5.8.1 works fine): the function signature no longer accepts this parameter, which causes the e2e test to fail.

2. molmospace_gr00t_eval_demo.py imports Gr00tOpenPIEvalConfig / Gr00tOpenPIPolicyConfig from examples.gr00t_openpi.gr00t_openpi_policy, which appears to live in a MolmoSpaces workspace rather than in this repo. I wasn't able to locate a public version of that module, so I haven't been able to run the full MolmoSpaces rollout on my end.

Thank you for your time.
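
For reference, the failure mode in the first point can be reproduced without transformers. The sketch below is only an illustration: mask_fn_old and mask_fn_new are hypothetical stand-ins for the two signatures, not the real create_causal_mask, and call_with_supported_kwargs is a helper invented here. It shows one way a caller could tolerate both signatures by filtering keyword arguments against the callee's signature; simply not passing the legacy argument is the more direct fix.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(func: Callable[..., Any], /, **kwargs: Any) -> Any:
    """Call func, dropping keyword arguments that its signature does not accept."""
    params = inspect.signature(func).parameters
    # If func declares **kwargs itself, everything can be forwarded unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwargs)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    supported = {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in keyword_kinds
    }
    return func(**supported)


# Hypothetical stand-ins for an older and a newer signature (assumption:
# these only mimic the presence or absence of cache_position).
def mask_fn_old(inputs_embeds, attention_mask, cache_position=None):
    return ("old", cache_position)


def mask_fn_new(inputs_embeds, attention_mask):
    return ("new", None)


old_result = call_with_supported_kwargs(
    mask_fn_old, inputs_embeds=1, attention_mask=2, cache_position=3
)
new_result = call_with_supported_kwargs(
    mask_fn_new, inputs_embeds=1, attention_mask=2, cache_position=3
)
```

Calling mask_fn_new directly with cache_position=3 raises the same kind of TypeError as in the report; the helper avoids it by silently dropping the unsupported argument, which is only appropriate when the argument is genuinely optional.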

timzsu · 2026-05-24T13:20:41Z

Hi @Yuxi1000, thanks for the review. I have updated adapter_qwen3_vl.py to not pass the argument as it is legacy. For the second concern, molmospace_gr00t_eval_demo.py depends on MolmoSpaces, a public evaluation framework. I have run the rollout locally against the vLLM-Omni GR00T server and posted the generated video in the PR description.

Here is how to set it up:

1. Clone MolmoSpaces and add the policy bridge

git clone https://github.com/allenai/molmospaces.git
cd molmospaces
mkdir -p examples/gr00t_openpi
curl -L https://gist.github.com/timzsu/a8ac09797fc3fa29ff1a7af84a48a742/raw/gr00t_openpi_policy.py \
  -o examples/gr00t_openpi/gr00t_openpi_policy.py
touch examples/gr00t_openpi/__init__.py

2. Install dependencies and download simulation assets

uv venv .venv
uv pip install -e ".[mujoco]" && uv pip install openpi-client websockets

Set a cache directory and trigger the download of MuJoCo scenes and benchmark JSONs:

export MLSPACES_ASSETS_DIR=$HOME/.cache/molmospaces/gr00t-assets
uv run --no-sync python -m molmo_spaces.molmo_spaces_constants
uv run --no-sync python -c "from molmo_spaces.molmo_spaces_constants import get_resource_manager; get_resource_manager().install_all_for_data_type('benchmarks')"

3. Run the eval

I used the FrankaPickDroidMiniBench benchmark, and launched the vllm-omni server at localhost:8000.

BENCH=$MLSPACES_ASSETS_DIR/benchmarks/molmospaces-bench-v1/procthor-10k/FrankaPickDroidMiniBench/FrankaPickDroidMiniBench_json_benchmark_20251231
VLLM_OMNI=/path/to/vllm-omni
MUJOCO_GL=egl \
PYOPENGL_PLATFORM=egl \
MUJOCO_EGL_DEVICE_ID=<physical-GPU-index> \
PYTHONPATH=$PWD uv run --no-sync python $VLLM_OMNI/examples/online_serving/gr00t/molmospace_gr00t_eval_demo.py \
  --host 127.0.0.1 \
  --port 8000 \
  --benchmark_dir "$BENCH" \
  --output_dir outputs/gr00t/molmospaces \
  --max_episodes 1 \
  --task_horizon_steps 240

4. Reading results

The script prints a summary line to stdout on exit:

[eval] success=1/1 (100.0%)
[eval] output_dir=outputs/gr00t/molmospaces/Gr00tVllmOmniEvalConfig/<timestamp>

Inside that timestamped directory MolmoSpaces writes one .mp4 per camera per
episode, e.g. house_0/episode_00000000_wrist_camera_batch_1_of_1.mp4 and
house_0/episode_00000000_exo_camera_1_batch_1_of_1.mp4. Note that a single episode may fail due to stochasticity.
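
When running several evals, the summary line can be parsed mechanically. This is a minimal sketch that assumes the stdout format is exactly the "[eval] success=K/N (P%)" line shown above; parse_eval_summary and SUMMARY_RE are names made up for this example and are not part of MolmoSpaces or vllm-omni.

```python
import re

# Assumption: the summary line looks like "[eval] success=1/1 (100.0%)".
SUMMARY_RE = re.compile(r"\[eval\] success=(\d+)/(\d+) \((\d+(?:\.\d+)?)%\)")


def parse_eval_summary(stdout_text: str) -> tuple[int, int]:
    """Return (successes, episodes) from the first summary line in stdout_text."""
    match = SUMMARY_RE.search(stdout_text)
    if match is None:
        raise ValueError("no '[eval] success=' line found")
    return int(match.group(1)), int(match.group(2))


sample_stdout = (
    "[eval] success=1/1 (100.0%)\n"
    "[eval] output_dir=outputs/gr00t/molmospaces/Gr00tVllmOmniEvalConfig/<timestamp>\n"
)
successes, episodes = parse_eval_summary(sample_stdout)
```

Because a single episode may fail due to stochasticity, aggregating the parsed counts over multiple runs gives a more meaningful picture than one rollout.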

Yuxi1000 · 2026-05-24T17:58:03Z

Thanks for your reply. I cloned allenai/molmospaces, but examples/gr00t_openpi/gr00t_openpi_policy.py doesn't seem to exist on the current main branch; only examples/add_robot/ is present. Could you point to the specific branch or commit that includes the gr00t_openpi module?

timzsu · 2026-05-25T04:42:25Z

Sorry, it was a local file that I forgot to commit. Please find it on my gist (https://gist.github.com/timzsu/a8ac09797fc3fa29ff1a7af84a48a742/raw/gr00t_openpi_policy.py). I have updated the instructions above.

Yuxi1000 · 2026-05-25T15:19:37Z

Thank you again for sharing the gr00t_openpi_policy.py gist and the updated setup steps. I was able to wire everything up and run the MolmoSpaces eval with it.

With the current GitHub + gist setup, I ran 10 episodes on FrankaPickDroidMiniBench and got:


[eval] success=0/10 (0.0%)

These 10 episodes are simply the first 10 episodes from the benchmark, with max_episodes=10 and samples_per_house=2. They cover 7 different houses:


house_0 (val_0_ceiling.xml): “pick up the kettle”

house_1 (val_1_ceiling.xml): “pick up the remote control”

house_10 (val_10_ceiling.xml): “pick up the ladle”; “pick up the tissue”

house_100 (val_100_ceiling.xml): “pick up the tissue”

house_101 (val_101_ceiling.xml): “pick up the spoon”; “pick up the spatula”

house_102 (val_102_ceiling.xml): “pick up the spoon”; “pick up the kettle”

house_104 (val_104_ceiling.xml): “pick up the mug”

I uploaded two of the rollouts for a visual reference:

episode_00000000_exo_camera_1_batch_1_of_1_2.mp4

episode_00000000_exo_camera_1_batch_1_of_1.mp4

In both videos the arm does move toward the target in roughly the right direction, but it never quite reaches or grasps the object, so the episodes end up as failures.

To help narrow things down, could you confirm whether the code you’re running locally (MolmoSpaces branch + gr00t_openpi_policy.py) is exactly the same as what’s currently on GitHub + the gist? Also, roughly what success rate do you see on FrankaPickDroidMiniBench over multiple episodes with this setup?

I’m happy to help debug on my side (e.g., checking observations, video/state wiring, etc.). It would also be great to know what behavior you’re seeing locally, so we can tell whether this is a reproducibility gap or an actual regression.

timzsu · 2026-05-25T15:23:17Z

Hi @Yuxi1000, I haven't run it over multiple episodes yet. I will try that soon and let you know whether I can reproduce the same failure.

timzsu · 2026-05-26T11:17:54Z

Hi @Yuxi1000, I have reproduced the failure locally, so I think our setup is likely to be the same. After some debugging, I also found that the gripper never closes (gripper_position stays near zero). I also reproduced the same failure on the official codebase (https://github.com/Nvidia/Isaac-GR00T), so the vllm-omni integration is likely not the issue. Can you help take a look?

Yuxi1000 · 2026-05-27T02:56:03Z

Hi @timzsu, thanks for the confirmation.
If I remember correctly, GR00T relies heavily on post-training. My impression is that the 0/10 result is likely because the model hasn’t been post-trained for this specific benchmark, especially for the gripper channel.
I agree that vllm-omni is likely not the issue. Everything looks great!

hsliuustc0106 · 2026-05-29T05:56:04Z

            position_ids = position_ids[1:]
        else:
            text_position_ids = position_ids[0]



Was InternVLA e2e tested with these adapter changes? create_causal_mask signature changed (param rename input_embeds → inputs_embeds) and cache_position kwarg removed. Also @check_model_inputs removed from Qwen3VLTextModel.forward.

hsliuustc0106 · 2026-05-29T05:56:05Z

+        self,
+        config: Gr00tN1d7Config,
+        transformers_loading_kwargs: dict = {"trust_remote_code": True},
+    ):


dict = {"trust_remote_code": True} as default arg — evaluated once at definition time, shared across all calls. dict = None with a None-guard in the body is safer.
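
To make the concern concrete, here is a small self-contained sketch of the shared-default pitfall and the None-guard alternative. SharedDefault and GuardedDefault are hypothetical classes written for this illustration, not the classes in the PR; only the default value {"trust_remote_code": True} is taken from the snippet above.

```python
from typing import Any, Optional


class SharedDefault:
    # Pitfall: the dict literal is evaluated once, when the def statement runs,
    # so every call that omits the argument receives the same dict object.
    def __init__(self, loading_kwargs: dict = {"trust_remote_code": True}):
        self.loading_kwargs = loading_kwargs


class GuardedDefault:
    def __init__(self, loading_kwargs: Optional[dict[str, Any]] = None):
        if loading_kwargs is None:
            loading_kwargs = {"trust_remote_code": True}
        # Copy so that a caller-owned dict is not mutated through this instance.
        self.loading_kwargs = dict(loading_kwargs)


a, b = SharedDefault(), SharedDefault()
a.loading_kwargs["revision"] = "main"  # mutates the dict shared with b
shared_leak = "revision" in b.loading_kwargs

c, d = GuardedDefault(), GuardedDefault()
c.loading_kwargs["revision"] = "main"  # affects only c
guarded_leak = "revision" in d.loading_kwargs
```

Here shared_leak is True and guarded_leak is False. The shared default is harmless as long as nobody mutates the dict, which is why this kind of bug tends to surface late.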

hsliuustc0106 · 2026-05-29T06:01:17Z

+
+
+class BasicDataCollator:
+    def __call__(self, features: list[dict[str, Any]]) -> dict[str, torch.Tensor]:


BasicDataCollator is exported in __all__ and re-exported through dataio/collator/__init__.py, but nothing imports or instantiates it. Gr00tN1d7Processor uses Gr00tN1d7DataCollator from processing_gr00t_n1d7.py instead. Dead code.

hsliuustc0106 · 2026-05-29T06:01:19Z

+
+
+def get_gr00t_n1d7_post_process_func(od_config: OmniDiffusionConfig):
+    del od_config


del od_config + identity return. Registered in the post-process table but this is a complete no-op. If GR00T never needs post-processing, drop the registration and let the engine skip it.

hsliuustc0106 · 2026-05-29T06:01:20Z

+        return ()
+
+    def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
+        for _ in weights:


Iterates and discards every weight. With weights_sources = () the engine shouldn't call this, but if it ever does, weights silently vanish. At minimum log a warning.
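
One possible shape for the suggested warning is sketched below. It is an assumption-laden illustration rather than the PR's code: the class name NoExternalWeights, the logger name, the message text, and the empty-set return value are all made up here, and Any stands in for torch.Tensor so the example runs without torch.

```python
import logging
from collections.abc import Iterable
from typing import Any

logger = logging.getLogger("gr00t_sketch")  # hypothetical logger name


class NoExternalWeights:
    """Stand-in for a pipeline that expects the engine to pass no weights."""

    def load_weights(self, weights: Iterable[tuple[str, Any]]) -> set[str]:
        # Consume the iterator, but remember what was skipped instead of
        # discarding it silently.
        skipped = [name for name, _ in weights]
        if skipped:
            logger.warning(
                "load_weights received %d unexpected tensor(s); ignoring them (first: %s)",
                len(skipped),
                skipped[0],
            )
        # Assumption: nothing was loaded, so no parameter names are reported.
        return set()


loaded = NoExternalWeights().load_weights([("backbone.weight", 0)])
```

With an empty iterable the method stays quiet, so the normal weights_sources = () path produces no log noise.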

timzsu requested review from Gaohan123, Isotr0py, RuixiangMa, SamitHuang, ZJY0516, ZeldaHuang, david6666666, hsliuustc0106, linyueqian, lishunyang12, princepride, tzhouam, wtomin, yenuo26, yuanheng-zhao and ywang96 as code owners May 21, 2026 14:07

This was referenced May 21, 2026

[Entrypoint] Add realtime OpenPI robot serving API #3673

Merged

[RFC]: Integrate NVIDIA Isaac GR00T #3553

Open

TKONIY mentioned this pull request May 22, 2026

[RFC]: World Model Support #1987

Open

15 tasks

hsliuustc0106 added the ready and merge-test labels May 29, 2026

hsliuustc0106 reviewed May 29, 2026

hsliuustc0106 removed the ready and merge-test labels May 29, 2026

timzsu force-pushed the zsu/gr00t-pipeline branch from 47e3b14 to 439ab31 June 1, 2026 07:28

timzsu added 4 commits June 1, 2026 14:35

feat(gr00t): add GR00T-N1.7 pipeline with OpenPI serving

e2e2195

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>

feat(gr00t): add OpenPI client, MolmoSpaces eval, and server launcher for GR00T-N1.7

f4d1c41

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>

remove the legacy cache_position argument

bb2964a

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>

Trim training-related code.

9f5e89a

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>

timzsu force-pushed the zsu/gr00t-pipeline branch from 439ab31 to 9f5e89a June 1, 2026 07:37

timzsu added 3 commits June 1, 2026 19:03

fix(gr00t): deterministic inference, vLLM FA, e2e precision test

f840a4e

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>

refactor(gr00t): trim training infra and flatten configs

257ac6f

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>

feat(gr00t): clean up GR00T-N1.7 serving code

eb3b6a9

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>

timzsu force-pushed the zsu/gr00t-pipeline branch from 4145111 to eb3b6a9 June 1, 2026 16:09

trim docstrings

18fd4be

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>

timzsu requested a review from hsliuustc0106 June 2, 2026 03:20

yicwang mentioned this pull request Jun 4, 2026

[New Model]: Add π0 / π0.5 VLA model support #4136

Open

1 task




4 participants
