[Bugfix] Add error handling for FINISHED_ERROR in OpenAIServing#37148

Merged
DarkLight1337 merged 1 commit into vllm-project:main from chaunceyjiang:finished_error
Mar 16, 2026
Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Mar 16, 2026

Purpose

PR #26813 introduced a FINISHED_ERROR finish reason for P/D (prefill/decode disaggregation) and converted it into an HTTP 500 error. However, PR #31164 removed this handling, which caused a large number of logs like the following to appear. This may lead users to mistakenly believe that vLLM itself has encountered an internal error.

(APIServer pid=134834) ERROR:    Exception in ASGI application
(APIServer pid=134834) Traceback (most recent call last):
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 416, in run_asgi
(APIServer pid=134834)     result = await app(  # type: ignore[func-returns-value]
(APIServer pid=134834)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
(APIServer pid=134834)     return await self.app(scope, receive, send)
(APIServer pid=134834)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/applications.py", line 1160, in __call__
(APIServer pid=134834)     await super().__call__(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/applications.py", line 107, in __call__
(APIServer pid=134834)     await self.middleware_stack(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 186, in __call__
(APIServer pid=134834)     raise exc
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 164, in __call__
(APIServer pid=134834)     await self.app(scope, receive, _send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/middleware/cors.py", line 87, in __call__
(APIServer pid=134834)     await self.app(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 177, in __call__
(APIServer pid=134834)     raise exc
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 175, in __call__
(APIServer pid=134834)     await self.app(scope, receive, send_wrapper)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
(APIServer pid=134834)     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
(APIServer pid=134834)     raise exc
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
(APIServer pid=134834)     await app(scope, receive, sender)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
(APIServer pid=134834)     await self.app(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/routing.py", line 716, in __call__
(APIServer pid=134834)     await self.middleware_stack(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/routing.py", line 736, in app
(APIServer pid=134834)     await route.handle(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/routing.py", line 290, in handle
(APIServer pid=134834)     await self.app(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/routing.py", line 130, in app
(APIServer pid=134834)     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
(APIServer pid=134834)     raise exc
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
(APIServer pid=134834)     await app(scope, receive, sender)
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/routing.py", line 116, in app
(APIServer pid=134834)     response = await f(request)
(APIServer pid=134834)                ^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/routing.py", line 670, in app
(APIServer pid=134834)     raw_response = await run_endpoint_function(
(APIServer pid=134834)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/venv/lib/python3.12/site-packages/fastapi/routing.py", line 324, in run_endpoint_function
(APIServer pid=134834)     return await dependant.call(**values)
(APIServer pid=134834)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/utils.py", line 95, in wrapper
(APIServer pid=134834)     return handler_task.result()
(APIServer pid=134834)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/utils.py", line 116, in wrapper
(APIServer pid=134834)     return await func(*args, **kwargs)
(APIServer pid=134834)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/openai/chat_completion/api_router.py", line 55, in create_chat_completion
(APIServer pid=134834)     generator = await handler.create_chat_completion(request, raw_request)
(APIServer pid=134834)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/openai/chat_completion/serving.py", line 346, in create_chat_completion
(APIServer pid=134834)     return await self.chat_completion_full_generator(
(APIServer pid=134834)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/openai/chat_completion/serving.py", line 1304, in chat_completion_full_generator
(APIServer pid=134834)     self._raise_if_error(output.finish_reason, request_id)
(APIServer pid=134834)   File "/mnt/data4/jxy/vllm/vllm/entrypoints/openai/engine/serving.py", line 601, in _raise_if_error
(APIServer pid=134834)     raise GenerationError("Internal server error")
(APIServer pid=134834) vllm.entrypoints.openai.engine.protocol.GenerationError: Internal server error
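The fix re-introduces handling so the exception never escapes into the ASGI stack. A minimal, hypothetical sketch of that conversion follows; `GenerationError`, `create_error_response`, and `chat_completion_full` are self-contained stand-ins mirroring the names in the traceback, not the actual vLLM implementation:

```python
# Hypothetical sketch: names mirror the traceback above but are
# redefined here so the example runs on its own.


class GenerationError(Exception):
    """Raised when the engine finishes a request with FINISHED_ERROR."""


def create_error_response(message: str, code: int = 500) -> dict:
    # Body shape matches the client response shown later in this thread.
    return {
        "error": {
            "message": message,
            "type": "InternalServerError",
            "param": None,
            "code": code,
        }
    }


def chat_completion_full(finish_reason: str) -> tuple[int, dict]:
    """Convert a FINISHED_ERROR finish reason into a 500 JSON response
    instead of letting the exception escape to the ASGI middleware."""
    try:
        if finish_reason == "error":
            raise GenerationError("Internal server error")
        return 200, {"choices": []}
    except GenerationError as exc:
        # Caught here: one ERROR log line, no "Exception in ASGI
        # application" traceback from uvicorn/starlette.
        return 500, create_error_response(str(exc))
```

With this in place, a failed request yields a single 500 response rather than the full middleware traceback shown above.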

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify

mergify bot commented Mar 16, 2026

Hi @chaunceyjiang, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an issue where FINISHED_ERROR from the engine was causing unhandled exceptions and noisy logs. The approach of re-introducing error handling to convert GenerationError into a proper HTTP 500 error response is correct. The changes in chat_completion and completion endpoints, along with the new tests, are well-implemented. However, I've found a critical issue in the responses endpoint where the generated error response is not being returned, which would defeat the purpose of the fix for that endpoint.
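The issue flagged for the responses endpoint can be illustrated with a hypothetical before/after pair; `make_error_response` and the handler names are invented for this sketch and do not come from the PR diff:

```python
# Hypothetical illustration of the flagged bug: building an error
# response without returning it lets control fall through to the
# success path.


def make_error_response(message: str) -> dict:
    return {
        "error": {
            "message": message,
            "type": "InternalServerError",
            "param": None,
            "code": 500,
        }
    }


def responses_handler_buggy(generation_failed: bool) -> dict:
    if generation_failed:
        make_error_response("Internal server error")  # built, then discarded
    return {"status": "completed"}  # error case falls through to success


def responses_handler_fixed(generation_failed: bool) -> dict:
    if generation_failed:
        return make_error_response("Internal server error")  # now returned
    return {"status": "completed"}
```

In the buggy variant the caller sees a successful response even when generation failed, which is exactly why the missing `return` defeats the fix for that endpoint.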

@DarkLight1337
Member

DarkLight1337 commented Mar 16, 2026

Had no idea this was a thing, can you add a code comment explaining why this is needed? I really thought GenerationError indicated a genuine internal error.

@andyxning
Contributor

Had no idea this was a thing, can you add a code comment explaining why this is needed? I really thought GenerationError indicated a genuine internal error.

+1.

@andyxning
Contributor

Btw, please take a look at pr #37157.

An exception handler exists, but the exception is still handled by ServerErrorMiddleware, and "Exception in ASGI application" is logged in that middleware.

@chaunceyjiang
Collaborator Author

Test

vllm serve /mnt/data3/models/Qwen/Qwen3.5-35B-A3B --enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3 
...
...
(APIServer pid=3561573) ERROR 03-16 18:08:11 [serving.py:597] Request chatcmpl-99261bbff102270e failed with an internal error during generation
(APIServer pid=3561573) INFO:     127.0.0.1:33610 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=3561573) INFO 03-16 18:08:18 [loggers.py:259] Engine 000: Avg prompt throughput: 1.8 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
Client response:
{"error":{"message":"Internal server error","type":"InternalServerError","param":null,"code":500}}

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang
Collaborator Author

An exception handler exists, but the exception is still handled by ServerErrorMiddleware, and "Exception in ASGI application" is logged in that middleware.

@andyxning PTAL.

I believe this is necessary. When request.stream = true, the GenerationError will be caught by _convert_generation_error_to_streaming_response, so no stack trace appears in the logs.

Therefore, I think the behavior should be consistent regardless of whether request.stream is true or false.
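The consistency argument can be sketched as follows. This is a hypothetical illustration, not vLLM code: `run_request` simulates a request that ends in FINISHED_ERROR, and the streaming branch stands in for what `_convert_generation_error_to_streaming_response` does:

```python
# Hypothetical sketch: both the streaming and non-streaming paths
# catch GenerationError, so neither produces an unhandled ASGI
# traceback in the logs.


class GenerationError(Exception):
    pass


def error_json(message: str) -> dict:
    return {
        "error": {
            "message": message,
            "type": "InternalServerError",
            "param": None,
            "code": 500,
        }
    }


def run_request(stream: bool):
    """Simulate a request whose generation ends in FINISHED_ERROR."""
    try:
        raise GenerationError("Internal server error")
    except GenerationError as exc:
        if stream:
            # Streaming path: surface the error as a final SSE event,
            # then terminate the stream cleanly.
            return ["data: " + repr(error_json(str(exc))), "data: [DONE]"]
        # Non-streaming path: plain HTTP 500 JSON body, matching
        # what the streaming client already observes.
        return error_json(str(exc))
```

Either way the client gets a well-formed error and the server logs stay quiet, which is the consistency this PR restores for the non-streaming case.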

Member

@DarkLight1337 DarkLight1337 left a comment


This is cleaner, thanks

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 16, 2026 14:23
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 16, 2026
@andyxning
Contributor

/lgtm

@DarkLight1337 DarkLight1337 merged commit 6682c23 into vllm-project:main Mar 16, 2026
48 checks passed
@chaunceyjiang chaunceyjiang deleted the finished_error branch March 17, 2026 08:30
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026

Labels

bug (Something isn't working) · frontend · ready (ONLY add when PR is ready to merge/full CI is needed)

3 participants