
Add /v1/chat/completions/batch endpoint for batched chat completions#38011

Merged
DarkLight1337 merged 24 commits into vllm-project:main from MatejRojec:feature/add-batch-requests-to-chat-completions-api
Mar 26, 2026

Conversation

@MatejRojec
Contributor

@MatejRojec MatejRojec commented Mar 24, 2026

Purpose

This PR adds a new /v1/chat/completions/batch endpoint to vLLM's OpenAI-compatible API. The existing /v1/chat/completions endpoint accepts only a single conversation per request and remains unchanged. The new endpoint accepts messages as a list[list[...]].

This is useful for applications that need to process multiple independent prompts with structured output in a single round trip, for example, extracting structured data from multiple documents simultaneously. This reduces HTTP overhead and simplifies result handling, since all outputs arrive in one response.
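As a sketch of the request shape (the payload layout follows the PR description above; the model name and server address are placeholders, not taken from this PR):

```python
import json

# Hypothetical payload for the new /v1/chat/completions/batch endpoint.
# Unlike /v1/chat/completions, "messages" holds a list of conversations
# (list[list[message]]); all results come back in a single response.
payload = {
    "model": "my-model",  # placeholder model name
    "messages": [
        [{"role": "user", "content": "What is the capital of France?"}],
        [{"role": "user", "content": "What is the capital of Japan?"}],
    ],
}
encoded = json.dumps(payload).encode()

# Sending it would look like this (requires a running vLLM server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions/batch",
#     data=encoded,
#     headers={"Content-Type": "application/json"},
# )
# body = json.loads(urllib.request.urlopen(req).read())

print(len(payload["messages"]))  # 2 independent conversations
```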

Test Plan

I tested a baseline plus four batched cases:
0. Single conversation (non-batched): verified existing /v1/chat/completions behaviour is unchanged

  1. Batched plain text: sent 2 conversations in one request, verified 2 choices returned with correct index (0 and 1) and non-empty content
  2. Batched with json_schema: sent 2 conversations with a JSON schema constraint, verified each choice parses to valid JSON matching the schema
  3. Batched with regex constraint: sent 2 conversations with (yes|no) regex, verified each choice contains only yes or no
  4. Batched book summary extraction: sent 2 conversations extracting author, num_pages, short_summary, and long_summary from two books in a single request, verified both choices return correctly structured JSON
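A minimal check for the regex case can be sketched as follows (the response shape here, a single choices list with per-conversation index fields, is my reading of the PR description, not verbatim from the implementation):

```python
import re

# Hypothetical response shape: one "choices" list covering both
# conversations, with "index" identifying the conversation (0 and 1).
response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "yes"}},
        {"index": 1, "message": {"role": "assistant", "content": "no"}},
    ]
}

# With a (yes|no) guided-decoding constraint, every choice must be
# exactly "yes" or "no".
pattern = re.compile(r"^(yes|no)$")
for choice in sorted(response["choices"], key=lambda c: c["index"]):
    content = choice["message"]["content"]
    assert pattern.match(content), content
    print(f'[{choice["index"]}] {content}')
```

This prints `[0] yes` and `[1] no`, mirroring Example 2 in the test results.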

Test Result

=== Example 1a: single conversation (standard endpoint) ===
  [0] The capital of Japan is Tokyo.

=== Example 1b: batched plain text (2 conversations) ===
  [0] The capital of France is Paris.
  [1] The capital of Japan is Tokyo.

=== Example 2: batch with regex constraint (yes|no) ===
  [0] yes
  [1] no

=== Example 3: batch with json_schema ===
  [0] {'name': 'Alice', 'age': 30}
  [1] {'name': 'Bob', 'age': 25}

=== Example 4: batch book summaries ===
  [0] {'author': 'George Orwell', 'num_pages': 328, 'short_summary': 'A dystopian novel set in a totalitarian society ruled by Big Brother, following Winston Smith as he secretly rebels against the oppressive Party that surveils and controls every aspect of life.', 'long_summary': '1984 is a dystopian novel by George Orwell published in 1949. The story is set in a totalitarian society ruled by Big Brother, where the Party controls every aspect of life. Winston Smith, a low-ranking Party member, secretly rebels against the oppressive regime. The novel explores themes of surveillance, truth, and individuality in a society where the Party enforces its control through mind control and propaganda. '}
  [1] {'author': 'Douglas Adams', 'num_pages': 193, 'short_summary': 'A comedic science fiction novel following Arthur Dent, an ordinary Englishman who is whisked off Earth moments before it is demolished to make way for a hyperspace bypass, and his subsequent absurd adventures across the universe.', 'long_summary': "A humorous take on science fiction, the novel follows Arthur Dent, an ordinary British man, as he's unexpectedly transported to the alien planet of Trillian along with his human family. Here, they encounter a diverse cast of characters, including an eccentric alien named Zaphod Beeblebrox and his hapless wife Ford Prefect, who are all embroiled in a grand cosmic mystery. As the story unfolds, it becomes clear that the universe is far stranger and more chaotic than anything Arthur could have imagined, and his journey is one of self-discovery and unexpected friendships. "}
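For Example 3, a schema along these lines would accept both outputs (the schema and the response_format wiring are illustrative assumptions in the usual OpenAI structured-output style, not copied from the PR; note the results above are printed as Python dict reprs, while the wire format is JSON):

```python
import json

# Hypothetical schema matching Example 3's outputs.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "person", "schema": schema},
}

# The JSON strings the two choices would carry on the wire.
outputs = ['{"name": "Alice", "age": 30}', '{"name": "Bob", "age": 25}']
parsed = [json.loads(o) for o in outputs]
for p in parsed:
    # Spot-check the required keys and their types by hand.
    assert set(schema["required"]) <= set(p)
    assert isinstance(p["name"], str) and isinstance(p["age"], int)
print(parsed[0]["name"], parsed[1]["age"])  # Alice 25
```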

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify
Contributor

mergify Bot commented Mar 24, 2026

Documentation preview: https://vllm--38011.org.readthedocs.build/en/38011/

@mergify mergify Bot added the documentation (Improvements or additions to documentation) and frontend labels Mar 24, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for batched chat completion requests, which is a valuable feature. The implementation is mostly solid, with new examples, tests, and protocol/serving logic changes. I've found two issues that should be addressed: one critical issue regarding the handling of the n parameter in batched requests which could lead to malformed responses, and a high-severity bug in the echo=True functionality for these new batched requests. After these are fixed, the PR should be in good shape.

Comment thread vllm/entrypoints/openai/chat_completion/protocol.py Outdated
Comment thread vllm/entrypoints/openai/chat_completion/serving.py Outdated
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@MatejRojec MatejRojec reopened this Mar 24, 2026
@mergify
Contributor

mergify Bot commented Mar 24, 2026

Documentation preview: https://vllm--38011.org.readthedocs.build/en/38011/

@mergify
Contributor

mergify Bot commented Mar 24, 2026

Hi @MatejRojec, the pre-commit checks have failed. Please run:

uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@MatejRojec MatejRojec marked this pull request as draft March 24, 2026 15:50
@MatejRojec MatejRojec force-pushed the feature/add-batch-requests-to-chat-completions-api branch 2 times, most recently from 234ec82 to 7e98641 March 24, 2026 16:05
@MatejRojec MatejRojec marked this pull request as ready for review March 24, 2026 16:15
@MatejRojec MatejRojec marked this pull request as draft March 24, 2026 16:27
@MatejRojec MatejRojec marked this pull request as ready for review March 24, 2026 18:56
@MatejRojec MatejRojec changed the title Add batch requests to vllm api Add batched messages support to /v1/chat/completions Mar 24, 2026
@DarkLight1337
Member

DarkLight1337 commented Mar 25, 2026

This is outside of the OpenAI API spec, so we will not support it in the Chat Completions API, to avoid bloating the existing functionality and making the implementation even more complicated. If you want this feature, it would be better to define a separate endpoint and use that.

@MatejRojec MatejRojec marked this pull request as draft March 25, 2026 07:34
@MatejRojec MatejRojec marked this pull request as ready for review March 25, 2026 08:07
@MatejRojec MatejRojec marked this pull request as draft March 25, 2026 08:08
@MatejRojec MatejRojec changed the title Add batched messages support to /v1/chat/completions Add /v1/chat/completions/batch endpoint for batched chat completions Mar 25, 2026
@MatejRojec MatejRojec marked this pull request as ready for review March 25, 2026 08:31
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 25, 2026 17:13
Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
auto-merge was automatically disabled March 25, 2026 18:36

Head branch was pushed to by a user without write access

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
@MatejRojec
Contributor Author

I had to make some small changes because some of the tests were failing and there were some import errors. I have tested the code again and everything looks good to me.

@DarkLight1337 DarkLight1337 merged commit 2908094 into vllm-project:main Mar 26, 2026
50 checks passed
@MatejRojec MatejRojec deleted the feature/add-batch-requests-to-chat-completions-api branch March 26, 2026 08:05
RhizoNymph pushed a commit to RhizoNymph/vllm that referenced this pull request Mar 26, 2026
Add /v1/chat/completions/batch endpoint for batched chat completions (vllm-project#38011)

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
nithinvc pushed a commit to nithinvc/vllm that referenced this pull request Mar 27, 2026
Add /v1/chat/completions/batch endpoint for batched chat completions (vllm-project#38011)

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
Add /v1/chat/completions/batch endpoint for batched chat completions (vllm-project#38011)

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
puririshi98 pushed a commit to puririshi98/vllm that referenced this pull request Apr 7, 2026
Add /v1/chat/completions/batch endpoint for batched chat completions (vllm-project#38011)

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
Signed-off-by: Rishi Puri <riship@nvidia.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
Add /v1/chat/completions/batch endpoint for batched chat completions (vllm-project#38011)

Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>

Labels

documentation (Improvements or additions to documentation), frontend, ready (ONLY add when PR is ready to merge/full CI is needed)


2 participants