[Bugfix] Add error handling for FINISHED_ERROR in OpenAIServing#37148

Merged
DarkLight1337 merged 1 commit into vllm-project:main from chaunceyjiang:finished_error
Mar 16, 2026
Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Mar 16, 2026

Purpose

PR #26813 introduced a FINISHED_ERROR finish reason for P/D (prefill/decode disaggregation) and converted it into an HTTP 500 error. However, PR #31164 removed this handling, which caused a large number of logs like the following to appear. This may lead users to mistakenly believe that vLLM itself has encountered an internal error.

(APIServer pid=134834) ERROR:    Exception in ASGI application
(APIServer pid=134834) Traceback (most recent call last):
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 416, in run_asgi
(APIServer pid=134834)     result = await app(  # type: ignore[func-returns-value]
(APIServer pid=134834)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
(APIServer pid=134834)     return await self.app(scope, receive, send)
(APIServer pid=134834)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/applications.py", line 1160, in __call__
(APIServer pid=134834)     await super().__call__(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/applications.py", line 107, in __call__
(APIServer pid=134834)     await self.middleware_stack(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 186, in __call__
(APIServer pid=134834)     raise exc
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 164, in __call__
(APIServer pid=134834)     await self.app(scope, receive, _send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/middleware/cors.py", line 87, in __call__
(APIServer pid=134834)     await self.app(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 177, in __call__
(APIServer pid=134834)     raise exc
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 175, in __call__
(APIServer pid=134834)     await self.app(scope, receive, send_wrapper)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
(APIServer pid=134834)     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
(APIServer pid=134834)     raise exc
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
(APIServer pid=134834)     await app(scope, receive, sender)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
(APIServer pid=134834)     await self.app(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/routing.py", line 716, in __call__
(APIServer pid=134834)     await self.middleware_stack(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/routing.py", line 736, in app
(APIServer pid=134834)     await route.handle(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/routing.py", line 290, in handle
(APIServer pid=134834)     await self.app(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/routing.py", line 130, in app
(APIServer pid=134834)     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
(APIServer pid=134834)     raise exc
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
(APIServer pid=134834)     await app(scope, receive, sender)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/routing.py", line 116, in app
(APIServer pid=134834)     response = await f(request)
(APIServer pid=134834)                ^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/routing.py", line 670, in app
(APIServer pid=134834)     raw_response = await run_endpoint_function(
(APIServer pid=134834)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/routing.py", line 324, in run_endpoint_function
(APIServer pid=134834)     return await dependant.call(**values)
(APIServer pid=134834)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/utils.py", line 95, in wrapper
(APIServer pid=134834)     return handler_task.result()
(APIServer pid=134834)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/utils.py", line 116, in wrapper
(APIServer pid=134834)     return await func(*args, **kwargs)
(APIServer pid=134834)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/openai/chat_completion/api_router.py", line 55, in create_chat_completion
(APIServer pid=134834)     generator = await handler.create_chat_completion(request, raw_request)
(APIServer pid=134834)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/openai/chat_completion/serving.py", line 346, in create_chat_completion
(APIServer pid=134834)     return await self.chat_completion_full_generator(
(APIServer pid=134834)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/openai/chat_completion/serving.py", line 1304, in chat_completion_full_generator
(APIServer pid=134834)     self._raise_if_error(output.finish_reason, request_id)
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/openai/engine/serving.py", line 601, in _raise_if_error
(APIServer pid=134834)     raise GenerationError("Internal server error")
(APIServer pid=134834) vllm.entrypoints.openai.engine.protocol.GenerationError: Internal server error
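The fix re-introduces handling so the exception never escapes into the ASGI stack. A minimal, hypothetical sketch of that conversion follows; `GenerationError`, `create_error_response`, and `chat_completion_full` are self-contained stand-ins mirroring the names in the traceback, not the actual vLLM implementation:

```python
# Hypothetical sketch: names mirror the traceback above but are
# redefined here so the example runs on its own.


class GenerationError(Exception):
    """Raised when the engine finishes a request with FINISHED_ERROR."""


def create_error_response(message: str, code: int = 500) -> dict:
    # Body shape matches the client response shown later in this thread.
    return {
        "error": {
            "message": message,
            "type": "InternalServerError",
            "param": None,
            "code": code,
        }
    }


def chat_completion_full(finish_reason: str) -> tuple[int, dict]:
    """Convert a FINISHED_ERROR finish reason into a 500 JSON response
    instead of letting the exception escape to the ASGI middleware."""
    try:
        if finish_reason == "error":
            raise GenerationError("Internal server error")
        return 200, {"choices": []}
    except GenerationError as exc:
        # Caught here: one ERROR log line, no "Exception in ASGI
        # application" traceback from uvicorn/starlette.
        return 500, create_error_response(str(exc))
```

With this in place, a failed request yields a single 500 response rather than the full middleware traceback shown above.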

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify

mergify bot commented Mar 16, 2026

Hi @chaunceyjiang, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an issue where FINISHED_ERROR from the engine was causing unhandled exceptions and noisy logs. The approach of re-introducing error handling to convert GenerationError into a proper HTTP 500 error response is correct. The changes in chat_completion and completion endpoints, along with the new tests, are well-implemented. However, I've found a critical issue in the responses endpoint where the generated error response is not being returned, which would defeat the purpose of the fix for that endpoint.
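The issue flagged for the responses endpoint can be illustrated with a hypothetical before/after pair; `make_error_response` and the handler names are invented for this sketch and do not come from the PR diff:

```python
# Hypothetical illustration of the flagged bug: building an error
# response without returning it lets control fall through to the
# success path.


def make_error_response(message: str) -> dict:
    return {
        "error": {
            "message": message,
            "type": "InternalServerError",
            "param": None,
            "code": 500,
        }
    }


def responses_handler_buggy(generation_failed: bool) -> dict:
    if generation_failed:
        make_error_response("Internal server error")  # built, then discarded
    return {"status": "completed"}  # error case falls through to success


def responses_handler_fixed(generation_failed: bool) -> dict:
    if generation_failed:
        return make_error_response("Internal server error")  # now returned
    return {"status": "completed"}
```

In the buggy variant the caller sees a successful response even when generation failed, which is exactly why the missing `return` defeats the fix for that endpoint.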

@DarkLight1337
Member

DarkLight1337 commented Mar 16, 2026

Had no idea this was a thing, can you add a code comment explaining why this is needed? I really thought GenerationError indicated a genuine internal error.

@andyxning
Contributor

Had no idea this was a thing, can you add a code comment explaining why this is needed? I really thought GenerationError indicated a genuine internal error.

+1.

@andyxning
Contributor

Btw, please take a look at pr #37157.

An exception handler exists, but the exception is still handled by ServerErrorMiddleware, and "Exception in ASGI application" is logged in that middleware.

@chaunceyjiang
Collaborator Author

Test

vllm serve /mnt/data3/models/Qwen/Qwen3.5-35B-A3B --enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3 
...
...
(APIServer pid=3561573) ERROR 03-16 18:08:11 [serving.py:597] Request chatcmpl-99261bbff102270e failed with an internal error during generation
(APIServer pid=3561573) INFO:     127.0.0.1:33610 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=3561573) INFO 03-16 18:08:18 [loggers.py:259] Engine 000: Avg prompt throughput: 1.8 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
Client response:
{"error":{"message":"Internal server error","type":"InternalServerError","param":null,"code":500}}

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang
Collaborator Author

An exception handler exists, but the exception is still handled by ServerErrorMiddleware, and "Exception in ASGI application" is logged in that middleware.

@andyxning PTAL.

I believe this is necessary. When request.stream = true, the GenerationError will be caught by _convert_generation_error_to_streaming_response, so no stack trace appears in the logs.

Therefore, I think the behavior should be consistent regardless of whether request.stream is true or false.
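The consistency argument can be sketched as follows. This is a hypothetical illustration, not vLLM code: `run_request` simulates a request that ends in FINISHED_ERROR, and the streaming branch stands in for what `_convert_generation_error_to_streaming_response` does:

```python
# Hypothetical sketch: both the streaming and non-streaming paths
# catch GenerationError, so neither produces an unhandled ASGI
# traceback in the logs.


class GenerationError(Exception):
    pass


def error_json(message: str) -> dict:
    return {
        "error": {
            "message": message,
            "type": "InternalServerError",
            "param": None,
            "code": 500,
        }
    }


def run_request(stream: bool):
    """Simulate a request whose generation ends in FINISHED_ERROR."""
    try:
        raise GenerationError("Internal server error")
    except GenerationError as exc:
        if stream:
            # Streaming path: surface the error as a final SSE event,
            # then terminate the stream cleanly.
            return ["data: " + repr(error_json(str(exc))), "data: [DONE]"]
        # Non-streaming path: plain HTTP 500 JSON body, matching
        # what the streaming client already observes.
        return error_json(str(exc))
```

Either way the client gets a well-formed error and the server logs stay quiet, which is the consistency this PR restores for the non-streaming case.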

Member

@DarkLight1337 DarkLight1337 left a comment


This is cleaner, thanks

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 16, 2026 14:23
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 16, 2026
@andyxning
Contributor

/lgtm

@DarkLight1337 DarkLight1337 merged commit 6682c23 into vllm-project:main Mar 16, 2026
48 checks passed
@chaunceyjiang chaunceyjiang deleted the finished_error branch March 17, 2026 08:30
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026

Labels

bug (Something isn't working) · frontend · ready (ONLY add when PR is ready to merge/full CI is needed)

3 participants