Fix rejected request handling for embeddings #1

Merged
samstokes merged 2 commits into release-v0.11.2+eve from sam/fix-rejection-for-embeddings on Dec 12, 2025

Conversation

Collaborator

@samstokes samstokes commented Dec 9, 2025

Purpose

PR vllm-project#27064, which adds max-waiting-queue-length, does not work for pooling/embedding requests. If one of those requests is rejected because the waiting queue is full, it triggers an assertion failure in the output processor. That failure crashes the output handler and marks the engine as dead, so subsequent /health requests return 503. In k8s this leads to the pod being replaced, which is an expensive way to recover from a temporary overload.

This PR extends the rejected-request handling added in the above PR to cover rejected pooling requests, so that only the rejected requests return 503 instead of poisoning the server.
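As an illustration of the behavior described above, here is a minimal sketch of a bounded waiting queue that turns away a single request when full while leaving the rest of the system untouched. The class `BoundedWaitingQueue` and its methods are hypothetical names made up for this example; they are not taken from vLLM or from this PR's diff.

```python
from collections import deque
from typing import Deque


class BoundedWaitingQueue:
    """Hypothetical stand-in for a scheduler's waiting queue (not vLLM code)."""

    def __init__(self, max_waiting: int) -> None:
        self.max_waiting = max_waiting
        self.waiting: Deque[str] = deque()

    def try_enqueue(self, request_id: str) -> bool:
        """Return True if queued, False if rejected because the queue is full."""
        if len(self.waiting) >= self.max_waiting:
            # Reject only this request; nothing is raised, so other
            # requests and health checks are unaffected.
            return False
        self.waiting.append(request_id)
        return True

    def pop_next(self) -> str:
        """Remove and return the oldest waiting request."""
        return self.waiting.popleft()
```

The point of the sketch is that a rejection is an ordinary return value for the caller to translate into a per-request error, rather than an exceptional state that takes down shared machinery.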

To do that correctly, this extends PoolingOutput with a finish_reason field.
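A rough sketch of what carrying a finish_reason on a pooling result could look like is shown below. `PoolingOutputSketch`, the `"rejected"` string, and `is_rejected` are assumptions for illustration only; the real PoolingOutput fields and the actual finish-reason value used by the PR are not shown on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PoolingOutputSketch:
    """Illustrative stand-in, not the real PoolingOutput class."""

    data: List[float] = field(default_factory=list)
    # Assumption: None means the request completed normally.
    finish_reason: Optional[str] = None


def is_rejected(output: PoolingOutputSketch) -> bool:
    """True when the output marks a rejected request (hypothetical value)."""
    return output.finish_reason == "rejected"
```

With a field like this, a rejected request can flow through the same output path as a successful one and be distinguished at the API layer, instead of being represented as an unexpected empty result.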

Test Plan

Tested manually using local test scripts.

Test Result

Sending a flood of embedding requests with --max-waiting-queue-length set returns 200 until the queue reaches its maximum, then returns 503 until the queue drains again (e.g. with appropriate backoff from the client), then returns 200 again.
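The local test scripts are not included in this PR. As a sketch of the "appropriate backoff from the client" mentioned above, a client could retry on 503 with exponentially growing delays. The function `post_with_backoff` and its parameters are hypothetical; `send` stands for any callable that performs one request and returns its HTTP status code.

```python
import time
from typing import Callable


def post_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns something other than 503 or attempts run out."""
    status = send()
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        # Wait base_delay, then 2x, 4x, ... before each retry.
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

Passing `sleep` as a parameter is only a convenience so the retry logic can be exercised without real waiting.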

samstokes and others added 2 commits December 5, 2025 14:59
- Add finish_reason field to PoolingOutput and PoolingRequestOutput classes
- Update output processor to pass finish_reason through pipeline
- Add _handle_pooling_error_finish_reason helper to serving engine
- Update all pooling endpoints to check finish_reason and return 503 for rejections
- Fixes crash when pooling/embedding requests are rejected due to full queue
- Ensures rejected pooling requests return proper 503 errors instead of 200 with empty data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
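
The commit message above names a helper, _handle_pooling_error_finish_reason, but its body is not shown on this page. The following is only a guess at the general shape of such a check, mapping a finish reason to an HTTP status. The function name `status_for_pooling_finish_reason` and the `"rejected"` value are assumptions, not the PR's actual code.

```python
from http import HTTPStatus
from typing import Optional

# Assumption: the string used to mark a queue-full rejection.
REJECTED = "rejected"


def status_for_pooling_finish_reason(finish_reason: Optional[str]) -> HTTPStatus:
    """Map a pooling finish reason to the HTTP status an endpoint might return."""
    if finish_reason == REJECTED:
        # Queue was full: signal temporary overload for this request only.
        return HTTPStatus.SERVICE_UNAVAILABLE
    return HTTPStatus.OK
```

An endpoint using a check like this would return 503 for the rejected request rather than 200 with empty data, which matches the last bullet of the commit message.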
@samstokes samstokes changed the base branch from custom-v0.11.2-pr27064 to release-v0.11.2+eve December 10, 2025 00:10
@samstokes samstokes marked this pull request as ready for review December 10, 2025 01:31
@samstokes samstokes requested a review from kevinzwang December 10, 2025 01:32
@samstokes samstokes changed the title from "Sam/fix rejection for embeddings" to "Fix rejected request handling for embeddings" Dec 10, 2025
@samstokes samstokes requested a review from srilman December 10, 2025 01:32
@samstokes samstokes merged commit 7e31dda into release-v0.11.2+eve Dec 12, 2025
2 participants