[Entrypoint] Add realtime OpenPI robot serving API by TKONIY · Pull Request #3673 · vllm-project/vllm-omni

TKONIY · 2026-05-17T20:04:03Z

This PR adds a standalone realtime OpenPI-compatible robot serving endpoint at /v1/realtime/robot/openpi.

It includes:

WebSocket msgpack request/response handling compatible with OpenPI clients.
Policy server metadata handshake from policy_server_config.
Per-connection session/reset handling.
Generic forwarding to AsyncOmni.generate() and action extraction from multimodal_output["actions"].
Unit tests for connection handling, config discovery, request construction, and generic action extraction.

This PR intentionally contains only the serving API layer and does not include DreamZero model implementation code.
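The action extraction step listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `FakeOmniOutput` and `extract_actions` are hypothetical names, and the result object is assumed to expose a `multimodal_output` dict. The single-result check mirrors the tightening described in a later commit on this branch.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeOmniOutput:
    """Hypothetical stand-in for one result produced by AsyncOmni.generate()."""
    multimodal_output: Optional[dict] = None


def extract_actions(results: list) -> Any:
    """Return the actions from exactly one result, or raise ValueError."""
    if len(results) != 1:
        raise ValueError(f"expected exactly one result, got {len(results)}")
    output = results[0].multimodal_output
    if not output or "actions" not in output:
        raise ValueError("result has no multimodal_output['actions']")
    # The serving layer does not interpret the actions; it only forwards them.
    return output["actions"]


actions = extract_actions([FakeOmniOutput({"actions": [[0.1, 0.2]]})])
```

Keeping the extraction this generic is what lets each model pipeline decide what the actions actually contain.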

timzsu

The contract looks good overall. There is one blocking issue for GR00T. PTAL :)

timzsu

LGTM. I have built #3798 on top of it to integrate GR00T.

amy-why-3459 · 2026-05-22T01:18:50Z



+@router.websocket("/v1/realtime/robot/openpi")
+async def realtime_robot_openpi(websocket: WebSocket):


Can the interface be unified with the standard OpenAI interface v1/realtime? And can different connections be used internally for different models?

TKONIY · 2026-05-22

These two APIs seem totally different, except that both are WebSocket endpoints exchanging dicts. Therefore I don't think it is necessary to unify them.

I am not sure I understand your question correctly. Could you explain it a bit more? For example, what is the specific case?

QiuMike · 2026-05-28

What's the difference? The actions are updated at runtime, and the model returns the corresponding video frames, am I right?

At least, we should define a unified robot API (if it really is not similar to realtime video), so that other world models can reuse it.

fhfuih · 2026-05-28

One problem is that OpenAI's /realtime only supports omni understanding (VL) and TTS. So even if we reuse that API, we still need to define extra WebSocket payload formats for it.
UX-wise, users have to learn our custom interface anyway, regardless of the endpoint. And it may confuse users about the actual/typical usage of OpenAI's /realtime.

Plus, engineering-wise, our current /realtime implementation depends on vLLM's base implementation, and its extensibility is unknown. The code may end up messier, or the extension may be so large that it looks like a complete re-implementation.

matchyc · 2026-05-22T03:29:07Z

This PR fits the standard OpenPI serving paradigm; compatibility has been verified on evaluation tasks.

fhfuih · 2026-05-27T02:51:33Z

This PR adds a standalone realtime OpenPI-compatible robot serving endpoint at /v1/realtime/robot/openpi.

Is there a link to OpenPI's official documentation (about their endpoint design)? And is it only an endpoint thing? What types of models and modalities does it officially support? I'd appreciate it if you could attach it here for our reference 😄

Maybe related: In #3632 and #3737, I designed and implemented another endpoint purely for real-time diffusion video generation (and, in the future, for changing the prompt during real-time video generation). It is a custom endpoint. So I wonder whether this robot protocol is already commonly used by the community and supports real-time video generation.

wtomin

This PR LGTM. Also cc @fake0fan for comments.

fhfuih · 2026-05-28T03:37:32Z

Hi @TKONIY saw you reacted to my comment, so could you help attach the link to OpenPI's official documentation? 😂

matchyc · 2026-05-28T06:16:29Z

FYI: The OpenPI serving paradigm is more of a wire schema than a static protocol between models and real robots/simulation, and it is used by the Pi-family models and DreamZero. As for the last question, as far as I know, world models that support real/simulated robots have no unified API; however, this schema is used by downstream communities like molmospace. The common pipeline is that environments send images/prompts/states and the model returns actions; no video is transported (even if videos are generated inside the model).

Unifying the video generation and robot-related APIs would seem to reuse only the WebSocket transport, since different payload keys would still have to be added, so I am not sure it is a feasible idea. For example, there may later be a gRPC-based endpoint to support more robot policies based on lerobot.
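The request/response flow described above (metadata first, then one observation in and one action reply out) can be sketched in plain Python. This is only an illustration under stated assumptions: `StubPolicyConnection` and `constant_policy` are hypothetical names, the payload keys `image`, `prompt`, `state`, and `action_dim` are placeholders, and the real endpoint exchanges msgpack-encoded messages over a WebSocket rather than plain dicts.

```python
from typing import Any, Callable


class StubPolicyConnection:
    """Hypothetical model of one connection: handshake, then request/reply."""

    def __init__(self, metadata: dict, policy: Callable[[dict], Any]):
        self._metadata = metadata
        self._policy = policy

    def open(self) -> dict:
        # Handshake: the server sends policy metadata before any inference.
        return dict(self._metadata)

    def step(self, observation: dict) -> dict:
        # One observation in, one action reply out; no video frames are returned.
        return {"actions": self._policy(observation)}


def constant_policy(observation: dict) -> list:
    # Hypothetical policy: emit one zero action per state dimension.
    return [[0.0] * len(observation["state"])]


conn = StubPolicyConnection({"action_dim": 2}, constant_policy)
metadata = conn.open()
reply = conn.step({"image": b"", "prompt": "pick up the cube", "state": [0.1, 0.2]})
```

Any video the model generates internally stays inside `policy`; only the actions cross the connection.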

fake0fan

Overall, I think the serving API code here looks good to me.

And my understanding is that once DreamZero is fully supported end-to-end, we should be able to use these APIs in practice pretty soon, right?

TKONIY · 2026-05-28T10:49:02Z

Thanks @fhfuih, @QiuMike, @fake0fan. I will try to address your concerns together here. I also point to @matchyc's reply above, which covers some of the detailed design considerations of this API.

This is already a "unified" robot API from OpenPI. Although it is not an official API, its format and interface are reused by most of the robotics community and its projects (see [RFC]: Robotics Evaluation Interface Integrations #3554). It should be easy to reuse across robotics models, as @timzsu has done in [Diffusion] add GR00T-N1.7 pipeline with OpenPI serving #3798 and [RFC]: Integrate NVIDIA Isaac GR00T #3553, and as @TKONIY has done in [Diffusion] DreamZero world model integration with CFG parallel + OpenPI serving #2162.
Also, @matchyc has done a lot of work surveying vLLM's integration with the robotics community. We have opened a dedicated RFC to track the integration direction and progress: [RFC]: Robotics Evaluation Interface Integrations #3554. Discussion is welcome, and we can keep improving the design of the OpenPI API or develop a new API.
As for other world models, such as interactive/streaming video generation, I have not yet had time to survey them. I don't doubt that, in theory and eventually, our API could be merged into a carefully crafted extension of the OpenAI chat-completion and realtime APIs. But I would suggest we first prioritize a simple, widely used API format for each specific task (e.g., robotics).

Add a generic OpenPI-compatible robot policy websocket endpoint at /v1/realtime/robot/openpi. The endpoint performs msgpack request handling, policy-server metadata handshake, session/reset tracking, and forwards observations to AsyncOmni with robot-specific extra_args.

Keep model behavior out of the serving layer by requiring policy_server_config from model config and extracting actions from multimodal_output['actions']. This lets robot policy pipelines implement their own transforms and state without coupling the OpenPI protocol code to a specific model.

Add unit coverage for payload validation, missing optional openpi-client dependency, per-connection session state, reset handling, policy_server_config discovery, request construction, and generic actions extraction.

Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
Co-authored-by: Meng <meng_chen99@163.com>
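As a rough illustration of the per-connection session state and reset handling this commit mentions, the sketch below keeps one state object per connection. It is not the PR's implementation: `ConnectionSession`, the `reset` payload key, and the reply fields are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionSession:
    """Hypothetical per-connection state, created when a websocket is accepted."""
    episode: int = 0
    history: list = field(default_factory=list)

    def handle(self, payload: dict) -> dict:
        # Assumed convention: a truthy "reset" key starts a new episode.
        if payload.get("reset"):
            self.episode += 1
            self.history.clear()
            return {"status": "reset", "episode": self.episode}
        # Otherwise record the observation for this connection only.
        self.history.append(payload)
        return {"status": "ok", "episode": self.episode, "step": len(self.history)}


first = ConnectionSession()
second = ConnectionSession()
step_reply = first.handle({"state": [0.0]})
reset_reply = first.handle({"reset": True})
```

Because each connection owns its own object, a reset on one connection cannot clear another connection's state.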

Move the OpenPI protocol implementation out of the OpenAI realtime package while keeping the public websocket route unchanged. Make engine request ids unique per inference so robot session ids remain state keys only, and tighten action extraction to require a single result with multimodal_output.

Co-authored-by: Yangshen Deng <yangshen.d@outlook.com>
Signed-off-by: Meng <meng_chen99@163.com>
Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
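The split between session ids and engine request ids can be pictured with a small sketch. The names `session_state`, `new_request_id`, and `run_inference`, and the `openpi-` id prefix, are hypothetical; the only point taken from the commit message is that every inference gets a fresh request id while state stays keyed by the session id.

```python
import uuid

# Hypothetical in-memory store: robot session id -> policy state.
session_state: dict = {}


def new_request_id(session_id: str) -> str:
    """Build an engine request id that is unique for every inference call."""
    return f"openpi-{session_id}-{uuid.uuid4().hex}"


def run_inference(session_id: str) -> str:
    # State is looked up by session id only; the request id is never a state key.
    state = session_state.setdefault(session_id, {"steps": 0})
    state["steps"] += 1
    return new_request_id(session_id)


first_id = run_inference("episode-a")
second_id = run_inference("episode-a")
```

Two calls in the same session share one state entry but never reuse a request id, so the engine does not see duplicate ids from a long-lived connection.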

Replace the pickle-based websocket test serialization mock with a JSON bytes mock that can handle ndarray outputs, and apply the import ordering produced by ruff.

Co-authored-by: Meng <meng_chen99@163.com>
Signed-off-by: Yangshen Deng <yangshen.d@outlook.com>
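A JSON bytes mock of this kind might look like the sketch below. It is an assumption about the general shape, not the test code in this PR: `packb`, `unpackb`, and `FakeArray` are hypothetical names, and ndarray-like values are detected by duck typing on `tolist()` so the example runs without NumPy.

```python
import json


def _to_jsonable(obj):
    # ndarray-like objects expose tolist(); anything else is rejected.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def packb(obj) -> bytes:
    """Serialize a payload to JSON bytes, converting array-like values to lists."""
    return json.dumps(obj, default=_to_jsonable).encode("utf-8")


def unpackb(data: bytes):
    """Decode JSON bytes back into plain Python objects."""
    return json.loads(data.decode("utf-8"))


class FakeArray:
    """Minimal stand-in for a NumPy array in this example."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


round_trip = unpackb(packb({"actions": FakeArray([1.0, 2.0])}))
```

Unlike pickle, this never executes code while decoding, at the cost of arrays coming back as plain lists.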

fhfuih · 2026-05-28T16:52:47Z

Thanks @matchyc @TKONIY for the extensive explanation! Now I understand that there is no conventional API endpoint naming, nor a conventional websocket payload format, but more like a conventional user flow design.

After a careful reading, I now see that this pure API implementation is model-agnostic and modality-agnostic. So, LGTM to merge this endpoint, since other world-model development depends on it. My previous question on potential endpoint reuse seems irrelevant.

One last minor comment on my side (but forgivable since other people already approved this PR 😁):
This implementation is not quite "real-time"/streaming/duplex in terms of data flow and interaction (as explicitly mentioned in a code comment: one input -> wait for one output; the WebSocket is only for session persistence). So the endpoint name /v1/realtime/robot/openpi doesn't fit perfectly (unless there is a plan to add streaming input/output or duplex to this endpoint, given any observed trend of world models supporting streaming at the pipeline level). Furthermore, for OpenAI users, /v1/realtime is the familiar OpenAI-compatible real-time endpoint, so a subpath of it would intuitively be expected to be real-time as well. I would imagine something like /v1/openpi or /v1/robot/openpi would work as well (depending on whether the "robot use case" needs special attention and whether there are other common protocols for it).

TKONIY · 2026-05-28T17:31:31Z

@hsliuustc0106 @wtomin @amy-why-3459 PTAL on the url path.

amy-why-3459 · 2026-05-29T05:00:10Z

@Gaohan123 @hsliuustc0106 I think this PR is ready and can be merged.

Gaohan123

In the follow-up PRs, where will the data preprocessing and postprocessing logic live?

TKONIY · 2026-06-01T00:22:29Z

We have surveyed the robotics community and different models. The conclusion is that data processing seems to be specific to the model and dataset together, not to the dataset alone. Therefore, we currently put this logic inside each model pipeline instead of the serving interface.

In the future, when we have gained more experience supporting new robotics models, we can try to introduce a new abstraction layer for it if there is an opportunity for unification.

Ref:

Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

TKONIY requested review from tzhouam and yenuo26 as code owners May 17, 2026 20:04

TKONIY mentioned this pull request May 17, 2026

[Diffusion] DreamZero world model integration with CFG parallel + OpenPI serving #2162

Merged

TKONIY force-pushed the feature/openpi_serving branch from e7138cc to c065b01 Compare May 17, 2026 20:08

TKONIY mentioned this pull request May 17, 2026

[RFC]: World Model Support #1987

Open

15 tasks

TKONIY force-pushed the feature/openpi_serving branch from c065b01 to d6343f3 Compare May 17, 2026 21:31

Gaohan123 added this to the v0.22.0 milestone May 18, 2026

timzsu suggested changes May 19, 2026

Comment thread vllm_omni/entrypoints/openpi/serving.py

TKONIY force-pushed the feature/openpi_serving branch from d6343f3 to 12e27af Compare May 19, 2026 12:43

Srinivasoo7 mentioned this pull request May 19, 2026

[RFC]: Reinforcement Learning for World Models #3747

Open

26 tasks

timzsu mentioned this pull request May 21, 2026

[Diffusion] add GR00T-N1.7 pipeline with OpenPI serving #3798

Open

5 tasks

timzsu approved these changes May 21, 2026

amy-why-3459 reviewed May 22, 2026

wtomin self-requested a review May 26, 2026 02:09

wtomin reviewed May 26, 2026

Comment thread vllm_omni/entrypoints/openpi/connection.py

wtomin reviewed May 26, 2026

Comment thread vllm_omni/entrypoints/openai/realtime/robot/openpi_serving.py Outdated

wtomin approved these changes May 28, 2026

amy-why-3459 approved these changes May 28, 2026

fake0fan reviewed May 28, 2026

TKONIY and others added 3 commits May 28, 2026 10:53

TKONIY force-pushed the feature/openpi_serving branch from 97d8711 to b8e89a2 Compare May 28, 2026 10:54

TKONIY mentioned this pull request May 28, 2026

[RFC]: Robotics Evaluation Interface Integrations #3554

Open

1 task

Merge branch 'main' into feature/openpi_serving

bd66851

Gaohan123 added the ready label (label to trigger buildkite CI) May 31, 2026

Gaohan123 reviewed May 31, 2026

amy-why-3459 added 3 commits June 1, 2026 10:55

fix ut

770cfda

Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

Merge branch 'main' into feature/openpi_serving

0fa0017

Merge branch 'main' into feature/openpi_serving

aa1da43

Gaohan123 merged commit 634af60 into vllm-project:main Jun 1, 2026
6 of 8 checks passed

yicwang mentioned this pull request Jun 4, 2026

[New Model]: Add π0 / π0.5 VLA model support #4136

Open

1 task


