Add `/v1/chat/completions/batch` endpoint for batched chat completions by MatejRojec · Pull Request #38011 · vllm-project/vllm

MatejRojec · 2026-03-24T14:16:28Z

Purpose

This PR adds a new /v1/chat/completions/batch endpoint to vLLM's OpenAI compatible API. The existing /v1/chat/completions endpoint only accepts a single conversation per request and remains unchanged. The new endpoint accepts messages as a list[list[...]].

This is useful for applications that need to process multiple independent prompts with structured output in a single round trip, for example, extracting structured data from multiple documents simultaneously, reducing HTTP
overhead and simplifying result handling since all outputs arrive in one response.

Test Plan

I tested 4 cases:
0. Single conversation (non-batched): verified existing /v1/chat/completions behaviour is unchanged

Batched plain text: sent 2 conversations in one request, verified 2 choices returned with correct index (0 and 1) and non-empty content
Batched with json_schema: sent 2 conversations with a JSON schema constraint, verified each choice parses to valid JSON matching the schema
Batched with regex constraint: sent 2 conversations with (yes|no) regex, verified each choice contains only yes or no
Batched book summary extraction: sent 2 conversations extracting author, num_pages, short_summary, and long_summary from two books in a single request, verified both choices return correctly structured JSON

Test Result

=== Example 1a: single conversation (standard endpoint) ===
  [0] The capital of Japan is Tokyo.

=== Example 1b: batched plain text (2 conversations) ===
  [0] The capital of France is Paris.
  [1] The capital of Japan is Tokyo.

=== Example 2: batch with regex constraint (yes|no) ===
  [0] yes
  [1] no

=== Example 3: batch with json_schema ===
  [0] {'name': 'Alice', 'age': 30}
  [1] {'name': 'Bob', 'age': 25}

=== Example 4: batch book summaries ===
  [0] {'author': 'George Orwell', 'num_pages': 328, 'short_summary': 'A dystopian novel set in a totalitarian society ruled by Big Brother, following Winston Smith as he secretly rebels against the oppressive Party that surveils and controls every aspect of life.', 'long_summary': '1984 is a dystopian novel by George Orwell published in 1949. The story is set in a totalitarian society ruled by Big Brother, where the Party controls every aspect of life. Winston Smith, a low-ranking Party member, secretly rebels against the oppressive regime. The novel explores themes of surveillance, truth, and individuality in a society where the Party enforces its control through mind control and propaganda. '}
  [1] {'author': 'Douglas Adams', 'num_pages': 193, 'short_summary': 'A comedic science fiction novel following Arthur Dent, an ordinary Englishman who is whisked off Earth moments before it is demolished to make way for a hyperspace bypass, and his subsequent absurd adventures across the universe.', 'long_summary': "A humorous take on science fiction, the novel follows Arthur Dent, an ordinary British man, as he's unexpectedly transported to the alien planet of Trillian along with his human family. Here, they encounter a diverse cast of characters, including an eccentric alien named Zaphod Beeblebrox and his hapless wife Ford Prefect, who are all embroiled in a grand cosmic mystery. As the story unfolds, it becomes clear that the universe is far stranger and more chaotic than anything Arthur could have imagined, and his journey is one of self-discovery and unexpected friendships. "}

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

mergify · 2026-03-24T14:17:08Z

Documentation preview: https://vllm--38011.org.readthedocs.build/en/38011/

gemini-code-assist

Code Review

This pull request introduces support for batched chat completion requests, which is a valuable feature. The implementation is mostly solid, with new examples, tests, and protocol/serving logic changes. I've found two issues that should be addressed: one critical issue regarding the handling of the n parameter in batched requests which could lead to malformed responses, and a high-severity bug in the echo=True functionality for these new batched requests. After these are fixed, the PR should be in good shape.

github-actions · 2026-03-24T14:31:02Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

mergify · 2026-03-24T15:49:56Z

Documentation preview: https://vllm--38011.org.readthedocs.build/en/38011/

mergify · 2026-03-24T15:49:57Z

Hi @MatejRojec, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

DarkLight1337 · 2026-03-25T03:44:22Z

This is outside of OpenAI API spec so we will not support this for Chat Completions API to avoid bloating the existing functionality and make the implementation even more complicated. If you want this feature, it would be better to define a separate endpoint and use it.

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

…ns-api

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

MatejRojec · 2026-03-25T22:07:26Z

I had to make some small changes, because some of the tests were failing and some import errors. I have tested the code again now and all looks good to me.

vllm-project#38011) Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

vllm-project#38011) Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com> Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>

vllm-project#38011) Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

vllm-project#38011) Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com> Signed-off-by: Rishi Puri <riship@nvidia.com>

vllm-project#38011) Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

MatejRojec requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang, njhill, robertgshaw2-redhat and russellb as code owners March 24, 2026 14:16

MatejRojec closed this Mar 24, 2026

mergify Bot added documentation Improvements or additions to documentation frontend labels Mar 24, 2026

gemini-code-assist Bot reviewed Mar 24, 2026

View reviewed changes

Comment thread vllm/entrypoints/openai/chat_completion/protocol.py Outdated

Comment thread vllm/entrypoints/openai/chat_completion/serving.py Outdated

MatejRojec reopened this Mar 24, 2026

MatejRojec marked this pull request as draft March 24, 2026 15:50

MatejRojec force-pushed the feature/add-batch-requests-to-chat-completions-api branch 2 times, most recently from 234ec82 to 7e98641 Compare March 24, 2026 16:05

MatejRojec marked this pull request as ready for review March 24, 2026 16:15

MatejRojec marked this pull request as draft March 24, 2026 16:27

MatejRojec marked this pull request as ready for review March 24, 2026 18:56

MatejRojec changed the title ~~Add batch requests to vllm api~~ Add batched messages support to /v1/chat/completions Mar 24, 2026

MatejRojec mentioned this pull request Mar 24, 2026

[Feature]: Add /v1/chat/completions/batch endpoint for batched chat completions with structured output support #37976

Closed

1 task

MatejRojec marked this pull request as draft March 25, 2026 07:34

MatejRojec marked this pull request as ready for review March 25, 2026 08:07

MatejRojec marked this pull request as draft March 25, 2026 08:08

MatejRojec changed the title ~~Add batched messages support to /v1/chat/completions~~ Add /v1/chat/completions/batch endpoint for batched chat completions Mar 25, 2026

MatejRojec marked this pull request as ready for review March 25, 2026 08:31

DarkLight1337 approved these changes Mar 25, 2026

View reviewed changes

DarkLight1337 enabled auto-merge (squash) March 25, 2026 17:13

Fix schemathesis failures for batch chat completion endpoint

eddd17b

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

auto-merge was automatically disabled March 25, 2026 18:36
Head branch was pushed to by a user without write access

MatejRojec added 3 commits March 25, 2026 21:21

Merge branch 'main' into feature/add-batch-requests-to-chat-completio…

434a529

…ns-api

Fix ProcessorInputs import path in batch_serving.py

2dc0065

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

Fix EngineInput import in batch_serving.py

cc69644

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

DarkLight1337 approved these changes Mar 26, 2026

View reviewed changes

DarkLight1337 merged commit 2908094 into vllm-project:main Mar 26, 2026
50 checks passed

MatejRojec deleted the feature/add-batch-requests-to-chat-completions-api branch March 26, 2026 08:05

RhizoNymph pushed a commit to RhizoNymph/vllm that referenced this pull request Mar 26, 2026

Add /v1/chat/completions/batch endpoint for batched chat completions (

8f9082b

vllm-project#38011) Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026

Add /v1/chat/completions/batch endpoint for batched chat completions (

76f8940

vllm-project#38011) Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026

Add /v1/chat/completions/batch endpoint for batched chat completions (

a1f03df

vllm-project#38011) Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

ZhanqiuHu mentioned this pull request Apr 27, 2026

[CI Bug 2026-04-27] Entrypoints Integration: test_openapi_stateless returns 501 for 'file' type on batch endpoint ZhanqiuHu/vllm-ci-watch#46

Open

SoluMilken mentioned this pull request Apr 27, 2026

[CI Failure]: Batch chat endpoint returns 501 for file content #41013

Closed

3 tasks

haosdent mentioned this pull request Apr 28, 2026

[CI] Return HTTP 400 for unsupported chat content part type #41121

Merged

raviguptaamd mentioned this pull request Apr 29, 2026

[ROCm] Enable DBO (Dynamic Batch Optimization) on ROCm #34726

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add `/v1/chat/completions/batch` endpoint for batched chat completions#38011

Add `/v1/chat/completions/batch` endpoint for batched chat completions#38011
DarkLight1337 merged 24 commits intovllm-project:mainfrom
MatejRojec:feature/add-batch-requests-to-chat-completions-api

MatejRojec commented Mar 24, 2026 •

edited

Loading

Uh oh!

mergify Bot commented Mar 24, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Mar 24, 2026

Uh oh!

mergify Bot commented Mar 24, 2026

Uh oh!

mergify Bot commented Mar 24, 2026

Uh oh!

DarkLight1337 commented Mar 25, 2026 •

edited

Loading

Uh oh!

MatejRojec commented Mar 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

MatejRojec commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

mergify Bot commented Mar 24, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Mar 24, 2026

Uh oh!

mergify Bot commented Mar 24, 2026

Uh oh!

mergify Bot commented Mar 24, 2026

Uh oh!

DarkLight1337 commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MatejRojec commented Mar 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MatejRojec commented Mar 24, 2026 •

edited

Loading

DarkLight1337 commented Mar 25, 2026 •

edited

Loading