
[openai api] log exception in exception handler (1/N)#31164

Merged
DarkLight1337 merged 1 commit into vllm-project:main from andyxning:log_http_exception_in_handler
Mar 5, 2026

Conversation

@andyxning
Contributor

@andyxning andyxning commented Dec 22, 2025

Since this PR is already complicated enough (it is XXXL), I will make another PR to refactor _convert_generation_error_to_streaming_response.

Purpose

  1. Log exceptions in the exception handlers to improve debuggability by recording the exact exception stack trace.
  2. Refactor the try...except blocks around create_error_response to simplify the code and improve readability.

Example:

(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] HTTPException caught
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] Traceback (most recent call last):
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/data00/vllm/vllm/entrypoints/openai/api_server.py", line 756, in create_chat_completion
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] generator = await handler.create_chat_completion(request, raw_request)
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/data00/vllm/vllm/entrypoints/openai/serving_chat.py", line 373, in create_chat_completion
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] return await self.chat_completion_full_generator(
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/data00/vllm/vllm/entrypoints/openai/serving_chat.py", line 1330, in chat_completion_full_generator
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] logprobs = self._create_chat_logprobs(
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/data00/vllm/vllm/entrypoints/openai/serving_chat.py", line 1664, in _create_chat_logprobs
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] step_top_logprobs = top_logprobs[i]
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] ~~~~~~~~~~~~^^^
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] IndexError: list index out of range
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673]
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] The above exception was the direct cause of the following exception:
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673]
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] Traceback (most recent call last):
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/root/miniconda3/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] await app(scope, receive, sender)
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/root/miniconda3/lib/python3.12/site-packages/fastapi/routing.py", line 111, in app
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] response = await f(request)
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] ^^^^^^^^^^^^^^^^
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/root/miniconda3/lib/python3.12/site-packages/fastapi/routing.py", line 391, in app
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] raw_response = await run_endpoint_function(
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/root/miniconda3/lib/python3.12/site-packages/fastapi/routing.py", line 290, in run_endpoint_function
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] return await dependant.call(**values)
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/data00/vllm/vllm/entrypoints/utils.py", line 100, in wrapper
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] return handler_task.result()
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/data00/vllm/vllm/entrypoints/utils.py", line 121, in wrapper
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] return await func(*args, **kwargs)
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] File "/data00/vllm/vllm/entrypoints/openai/api_server.py", line 758, in create_chat_completion
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] raise HTTPException(
(APIServer pid=7971) ERROR 12-22 16:43:06 [api_server.py:1673] fastapi.exceptions.HTTPException: 500: list index out of range
(APIServer pid=7971) INFO: 127.0.0.1:36332 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
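The handler-side logging described above can be sketched in plain Python. This is a minimal illustration, not vLLM's actual code: the handler and logger names are invented, and the `log_error_stack` flag stands in for `args.log_error_stack` from the PR summary. The key point is that `logger.exception()` called while an exception is being handled records the full traceback, producing output like the log above.

```python
import io
import logging

# Capture log output in a buffer so the demo is self-contained.
logger = logging.getLogger("api_server_demo")
buf = io.StringIO()
logger.addHandler(logging.StreamHandler(buf))


def handle(exc: Exception, log_error_stack: bool = True) -> dict:
    """Sketch of an exception handler guarded by a log_error_stack flag."""
    if log_error_stack:
        # logger.exception logs at ERROR level and appends the traceback
        # of the exception currently being handled.
        logger.exception("HTTPException caught")
    return {"error": str(exc)}


try:
    raise IndexError("list index out of range")
except IndexError as e:
    response = handle(e)
```

After this runs, `buf.getvalue()` contains the "HTTPException caught" message followed by the full `Traceback (most recent call last):` block, which is exactly what the guarded handlers in this PR emit.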

Test Plan

NA

Test Result

NA

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Note

Improves debuggability and simplifies logging configuration.

  • Add conditional stack-trace logging (when args.log_error_stack is true) in FastAPI handlers for HTTPException and RequestValidationError, and in launcher runtime exception handler
  • Remove log_error_stack parameter/usage across serving classes (OpenAIServing, chat/completion/responses/embedding/pooling/classify/score/tokenize/transcription/translation); stop manual traceback printing in serving layer
  • Update app state initialization to no longer pass log_error_stack; keep structured error responses unchanged and avoid duplicate logs in messages error path

Written by Cursor Bugbot for commit c19e018427f080d3ed27e387e2aeefde1e3de64a. This will update automatically on new commits. Configure here.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request modifies exception handling and logging across vllm/entrypoints/launcher.py and vllm/entrypoints/openai/api_server.py. In launcher.py, a logger.exception call was added to the runtime_exception_handler for RuntimeError, EngineDeadError, and EngineGenerateError, and the exception parameter was explicitly named exc. In api_server.py, a logger.exception call was removed from an except block handling create_messages errors, while new logger.exception calls were added to the http_exception_handler and validation_exception_handler for HTTPException and RequestValidationError respectively.

@mergify

mergify bot commented Dec 24, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @andyxning.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 24, 2025
@andyxning andyxning force-pushed the log_http_exception_in_handler branch from 1c67435 to 81c268f Compare December 24, 2025 08:41
@mergify mergify bot removed the needs-rebase label Dec 24, 2025
@andyxning andyxning force-pushed the log_http_exception_in_handler branch from 81c268f to 7dc44f9 Compare December 24, 2025 08:44
@andyxning
Contributor Author

@aarnphm @chaunceyjiang

@robertgshaw2-redhat
Collaborator

I don't think this can be on by default --- there could be a lot of logs from users creating malformed requests

@andyxning
Contributor Author

andyxning commented Dec 30, 2025

How about adding a new environment variable, VLLM_LOG_API_SERVER_EXCEPTION, to control this logic? By default it will be false and exception logging will be disabled, but it can be enabled by setting the variable.

@andyxning
Contributor Author

This is quite useful when debugging.

@andyxning andyxning force-pushed the log_http_exception_in_handler branch from a52e66b to 2a099ba Compare December 30, 2025 03:17
@andyxning
Contributor Author

@robertgshaw2-redhat PTAL.

@andyxning
Contributor Author

Friendly ping. @robertgshaw2-redhat

Member

@DarkLight1337 DarkLight1337 left a comment

Isn't this controlled by FrontendArgs.log_error_stack already?

@andyxning
Contributor Author

andyxning commented Jan 4, 2026

Thanks for the tip. Some APIs do not correctly handle exceptions in their handlers, so the exception is never logged. The code block below shows that the exception is recorded with print. I have tested running vLLM with logging to stderr (the default behavior), and the exception is not recorded.

if self.log_error_stack:
    exc_type, _, _ = sys.exc_info()
    if exc_type is not None:
        traceback.print_exc()
    else:
        traceback.print_stack()

Will dig into this later.
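The problem with the snippet above can be demonstrated directly: `traceback.print_exc()` writes straight to `sys.stderr` and bypasses the logging system entirely, so once stderr is redirected or log output is routed through handlers and formatters, the stack trace is lost. A self-contained sketch (the redirection here simulates, not reproduces, vLLM's log routing):

```python
import io
import sys
import traceback

captured = io.StringIO()
try:
    raise ValueError("boom")
except ValueError:
    old_stderr = sys.stderr
    sys.stderr = captured      # simulate stderr being redirected elsewhere
    traceback.print_exc()      # goes only to the (redirected) stderr
    sys.stderr = old_stderr
```

The traceback ends up in `captured`, not in any logger. Replacing `print_exc()` with `logger.exception()`, as this PR does, routes the trace through the configured logging handlers instead.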

@andyxning andyxning force-pushed the log_http_exception_in_handler branch 2 times, most recently from 3f0c514 to fc56b85 Compare January 4, 2026 13:23
@andyxning
Contributor Author

Only OpenAI methods implemented as OpenAIServing subclasses can use create_error_response. Some debug endpoints, such as /scale_elastic_ep, do not inherit from OpenAIServing.

@andyxning
Contributor Author

Guard logger.exception with log_error_stack.

@DarkLight1337
Member

I am ok with this change but let's have a second opinion from @njhill @robertgshaw2-redhat

@andyxning andyxning force-pushed the log_http_exception_in_handler branch from fc56b85 to 92187e8 Compare January 4, 2026 13:48
@andyxning andyxning requested a review from noooop as a code owner January 4, 2026 13:48
@mergify

mergify bot commented Jan 4, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @andyxning.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 4, 2026
@andyxning andyxning force-pushed the log_http_exception_in_handler branch from 92187e8 to e2188f6 Compare January 4, 2026 13:49
@andyxning andyxning force-pushed the log_http_exception_in_handler branch 3 times, most recently from 5b1cb62 to 6ac7c29 Compare March 2, 2026 05:52
Comment on lines 34 to 36

-try:
-    generator = await handler.create_classify(request, raw_request)
-except Exception as e:
-    generator = handler.create_error_response(e)
+generator = await handler.create_classify(request, raw_request)
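The change above drops the per-endpoint try/except and relies on a single, centralized error path. A framework-agnostic sketch of the pattern (the `dispatch` wrapper and response shape are illustrative, not vLLM's actual API):

```python
def create_error_response(exc: Exception) -> dict:
    """Illustrative stand-in for vLLM's structured error response."""
    return {"object": "error", "message": str(exc)}


def dispatch(endpoint, *args):
    """Centralized wrapper: endpoints no longer catch exceptions themselves."""
    try:
        return endpoint(*args)
    except Exception as e:  # one handler instead of a try/except per endpoint
        return create_error_response(e)


def create_classify(request):
    # Endpoint body stays free of error-handling boilerplate.
    raise IndexError("list index out of range")


result = dispatch(create_classify, {})
```

Each endpoint shrinks to a single call, and the error-to-response translation lives in one place, which is the direction the pooling-entrypoints refactor below also takes.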

Collaborator

@noooop noooop Mar 2, 2026

awesome work!

I am currently refactoring the pooling entrypoints (#35604), which may conflict with this PR. Below are three options.

  1. This PR does not modify the pooling entrypoints; after [Frontend][1/n] Improve pooling entrypoints | classify. #35604 lands, another PR will make the modifications for the pooling entrypoints.
  2. [Frontend][1/n] Improve pooling entrypoints | classify. #35604 waits for this PR to land.
  3. First merge [Frontend][1/n] Improve pooling entrypoints | classify. #35604, then refactor the exception handler of the pooling entrypoints, followed by refactoring the other parts of this PR.

I prefer option 3 because this PR has a large scope and is difficult to land. If we only modify the pooling entrypoints, it might be simpler. (Also, the exception handler in #35604 is handled in one place, and I believe it is the direction for future API server optimization.)

Collaborator

Sorry, I am currently refactoring the pooling entrypoints #35604, which may conflict with this PR. Please do not modify the vllm/entrypoints/pooling/ section for the time being to avoid conflicts.

I have factored out a create_error_response helper in vllm/entrypoints/utils.py, which you can use.

https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/utils.py#L305

@andyxning
Contributor Author

/cc @njhill

@mergify

mergify bot commented Mar 3, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @andyxning.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 3, 2026
Member

@njhill njhill left a comment

Thanks @andyxning, looks like a good simplification.

serving._raise_if_error(None, "test-request-id") # should not raise


@pytest.mark.asyncio
Member

Ask Claude or Codex to add it?

@andyxning andyxning changed the title [openai api] log exception in exception handler (1/3) [openai api] log exception in exception handler (1/N) Mar 4, 2026
@andyxning andyxning force-pushed the log_http_exception_in_handler branch from 1d6ceef to 7cf20e9 Compare March 4, 2026 03:14
@mergify mergify bot removed the needs-rebase label Mar 4, 2026
Comment on lines 43 to 45

from ...utils import create_error_response
from .io_processor import PoolingIOProcessor

Collaborator

awesome!

@mergify

mergify bot commented Mar 4, 2026

Hi @andyxning, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@andyxning andyxning force-pushed the log_http_exception_in_handler branch from 7cf20e9 to 3a14559 Compare March 4, 2026 03:24
@andyxning
Contributor Author

All comments except the unit-test one have been addressed. PTAL. @DarkLight1337 @njhill @noooop

Other changes will be made in follow-up PRs.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 4, 2026 03:53
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 4, 2026
Contributor Author

Changing BadRequestError to InternalServerError is because NotImplementedError inherits from RuntimeError. RuntimeError has been removed from create_error_response, so NotImplementedError will now result in a 501 HTTP response status code.


Contributor Author

HTTPStatus.BAD_REQUEST.value: {"model": ErrorResponse},
HTTPStatus.NOT_FOUND.value: {"model": ErrorResponse},
HTTPStatus.INTERNAL_SERVER_ERROR.value: {"model": ErrorResponse},
HTTPStatus.NOT_IMPLEMENTED.value: {"model": ErrorResponse},
Contributor Author

Adding NOT_IMPLEMENTED is because NotImplementedError inherits from RuntimeError. RuntimeError has been removed from create_error_response, so NotImplementedError now results in a 501 HTTP response status code, which is NOT_IMPLEMENTED.

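The subclass relationship driving this can be checked directly, and the resulting status mapping sketched. The mapping below is illustrative only (vLLM's create_error_response covers more exception types); the point is that once RuntimeError is no longer special-cased, NotImplementedError must be mapped to 501 explicitly:

```python
from http import HTTPStatus

# The fact that motivates the change: NotImplementedError IS a RuntimeError.
assert issubclass(NotImplementedError, RuntimeError)

# Hypothetical exception-to-status mapping; vLLM's real table differs.
STATUS_BY_EXC = {
    ValueError: HTTPStatus.BAD_REQUEST,
    NotImplementedError: HTTPStatus.NOT_IMPLEMENTED,
}


def status_for(exc: Exception) -> HTTPStatus:
    for exc_type, status in STATUS_BY_EXC.items():
        if isinstance(exc, exc_type):
            return status
    # Anything unmapped, including plain RuntimeError, falls through to 500.
    return HTTPStatus.INTERNAL_SERVER_ERROR
```

With this table, a NotImplementedError yields 501 while a bare RuntimeError now yields 500, which is why the OpenAPI response models above gained a NOT_IMPLEMENTED entry.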


Contributor Author

return strategy.filter(no_invalid_types)


def customized_not_a_server_error(
Contributor Author

@andyxning andyxning Mar 5, 2026

New changes. By default, any server error causes a failure in schemathesis. We need to customize this to allow the /v1/chat/completions/render and /v1/chat/completions OpenAPI endpoints to return a 501 status code.
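The allow-list logic behind such a check can be sketched without the schemathesis machinery. The function name mirrors `customized_not_a_server_error` from the diff, but the signature is simplified and illustrative (a real schemathesis check receives response and case objects):

```python
# Endpoints where a 501 response is expected rather than a server error.
ALLOW_501 = {"/v1/chat/completions", "/v1/chat/completions/render"}


def customized_not_a_server_error(status_code: int, path: str) -> bool:
    """Return True if the response should pass the check."""
    if status_code == 501 and path in ALLOW_501:
        return True               # 501 is acceptable for allow-listed paths
    return status_code < 500      # any other 5xx is a real server error
```

Everything outside the allow-list keeps the default behavior of the stock not_a_server_error check: any 5xx fails.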

Signed-off-by: Andy Xie <andy.xning@gmail.com>
reasoning_parser,
)
except GenerationError as e:
return self._convert_generation_error_to_response(e)
Collaborator

This should not be removed. This is a custom error defined by vLLM and should not produce an error log.
@DarkLight1337 @andyxning @noooop @njhill

see #37148

Contributor Author

Will investigate later.


Labels

frontend ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants