
[openapi server] log exception in exception handler (2/N) #36201

Merged
vllm-bot merged 1 commit into vllm-project:main from andyxning:log_http_exception_in_handler
Mar 11, 2026

Conversation

@andyxning
Contributor

@andyxning andyxning commented Mar 6, 2026

Purpose

This is a follow-up PR to #31164.

  1. Refactor the models-related exception handling to match the other OpenAPI endpoints.
  2. Move create_error_response to vllm.entrypoints.utils.create_error_response.
  3. Change the OpenAPI response status code for the handler is None case from BadRequestError to NotImplementedError.
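The combined effect of these items can be sketched as follows. This is a minimal illustration with hypothetical names; the real helper lives at vllm.entrypoints.utils.create_error_response, and the actual field layout is taken from the JSON bodies shown later in this conversation:

```python
import logging
from http import HTTPStatus

logger = logging.getLogger("vllm.entrypoints")

# Hypothetical stand-in mirroring the JSON error bodies shown in this PR;
# the real helper is vllm.entrypoints.utils.create_error_response.
def create_error_response(message: str,
                          err_type: str = "BadRequestError",
                          status_code: HTTPStatus = HTTPStatus.BAD_REQUEST) -> dict:
    return {
        "error": {
            "message": message,
            "type": err_type,
            "param": None,
            "code": status_code.value,
        }
    }

def handle_exception(exc: Exception) -> dict:
    # Log first (the point of this PR series), then build the response body.
    logger.exception("Error in API handler: %s", exc)
    if isinstance(exc, NotImplementedError):
        # handler-is-None cases now surface as 501 rather than 400.
        return create_error_response(str(exc), "NotImplementedError",
                                     HTTPStatus.NOT_IMPLEMENTED)
    return create_error_response(str(exc), "InternalServerError",
                                 HTTPStatus.INTERNAL_SERVER_ERROR)
```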

Test Plan

NA

Test Result

NA


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors exception handling by centralizing it in vllm/entrypoints/utils.py, which is a positive step towards better maintainability. My feedback focuses on improving the robustness of this new centralized logic. The current implementation relies on string matching on exception messages, which is fragile. I've suggested using custom exception types instead, which would make the error handling more explicit and resilient. I also pointed out an instance where overly broad exception catching could lead to mis-classifying server-side errors as client errors, and recommended more granular handling.

Comment on lines +638 to +640
 except Exception as e:
     operation = "translation" if is_translation else "transcription"
-    return ErrorResponse(
-        error=ErrorInfo(
-            message=f"Failed to process {operation}: {str(e)}",
-            type="BadRequestError",
-            code=HTTPStatus.BAD_REQUEST.value,
-        )
-    )
+    raise Exception(f"Failed to process {operation}: {str(e)}") from e
Contributor

high

Catching a broad Exception and re-raising it to be classified as a BadRequestError can be problematic. This try block covers operations like downloading from a URL, which can fail for various reasons (e.g., network issues, server-side problems with the URL host) that are not client errors. Classifying all such failures as a 400 Bad Request can be misleading and hide the true cause of the error. It would be better to have more granular exception handling to differentiate between client errors (like an invalid URL) and server-side or transient errors, and map them to appropriate HTTP status codes (e.g., 4xx vs 5xx).
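One way to realize the suggested granularity is a small classifier keyed on exception type. This is a hedged sketch only; the mapping below is illustrative, and the actual exception types raised by the audio download/processing path would need to be confirmed against the vLLM code:

```python
from http import HTTPStatus

# Illustrative mapping, not vLLM's actual behavior: distinguish client
# errors from upstream/transient failures instead of returning 400 for all.
def classify_audio_failure(exc: Exception) -> HTTPStatus:
    if isinstance(exc, ValueError):
        # e.g. a malformed URL or unsupported audio format: a client error.
        return HTTPStatus.BAD_REQUEST
    if isinstance(exc, (ConnectionError, TimeoutError)):
        # Network/upstream failure while fetching the URL: not the client's fault.
        return HTTPStatus.BAD_GATEWAY
    # Anything else is an unexpected server-side error.
    return HTTPStatus.INTERNAL_SERVER_ERROR
```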

Comment on lines +330 to +337
elif "No adapter found" in str(exc):
    err_type = "NotFoundError"
    status_code = HTTPStatus.NOT_FOUND
    param = None
elif "translation" in str(exc) or "transcription" in str(exc):
    err_type = "BadRequestError"
    status_code = HTTPStatus.BAD_REQUEST
    param = None
Contributor

high

Using string matching on exception messages to determine error types is brittle and can lead to incorrect error classification. If the underlying error messages change, this logic will break. A more robust approach is to use custom exception types.

For example:

  1. For the 'No adapter found' case, a specific LoRAAdapterNotFoundError could be raised from the LoRA loading logic.
  2. For transcription/translation failures, a custom AudioProcessingError could be raised from run_batch.py.

Then, you could check for these specific exception types here:

# from vllm.lora.exception import LoRAAdapterNotFoundError
# from vllm.entrypoints.openai.run_batch import AudioProcessingError

# ...

elif isinstance(exc, LoRAAdapterNotFoundError):
    err_type = "NotFoundError"
    status_code = HTTPStatus.NOT_FOUND
    param = None
elif isinstance(exc, AudioProcessingError):
    err_type = "BadRequestError"
    status_code = HTTPStatus.BAD_REQUEST
    param = None

This would make the error handling more explicit, maintainable, and resilient to changes.

Contributor Author

Will do.

@noooop
Collaborator

noooop commented Mar 6, 2026

Can create_error_response here be replaced with NotImplementedError? If yes, please do it.

if handler is None:
    error_response = create_error_response(
        message="The model does not support Classification API"
    )
    return JSONResponse(
        content=error_response.model_dump(),
        status_code=error_response.error.code,
    )

@andyxning andyxning force-pushed the log_http_exception_in_handler branch from d4d8bf8 to 9c1bfd8 Compare March 6, 2026 03:59
@andyxning andyxning requested a review from jeejeelee as a code owner March 6, 2026 03:59
@andyxning andyxning force-pushed the log_http_exception_in_handler branch from 9c1bfd8 to 9c412c1 Compare March 6, 2026 04:12
@andyxning andyxning requested a review from NickLucche as a code owner March 6, 2026 04:12
try:
    await self.engine_client.add_lora(lora_request)
except Exception as e:
    error_type = "BadRequestError"
Contributor Author

A plain Exception is now reported as InternalServerError instead of BadRequestError, i.e., the HTTP response status code changes from 400 to 500.

Contributor Author

For example, if the --enable-lora arg is not set, the exception raised while loading a LoRA adapter will be:

{"error":{"message":"Call to add_lora method failed: LoRA is not enabled. Use --enable-lora to enable LoRA.","type":"InternalServerError","param":null,"code":500}}

@andyxning andyxning force-pushed the log_http_exception_in_handler branch 2 times, most recently from 8da82a3 to e943ed1 Compare March 9, 2026 02:57
Comment on lines +469 to +472
class AudioProcessingError(Exception):
    """Exception raised when audio processing encounters an error.

    This exception is used to handle various error conditions that may occur
Collaborator

I now think it's better to put exceptions in vllm/exceptions.py rather than keeping them separate.

Contributor Author

This code has been deleted.

@andyxning andyxning force-pushed the log_http_exception_in_handler branch from e943ed1 to b17623b Compare March 9, 2026 03:26
@andyxning andyxning requested a review from mgoin as a code owner March 9, 2026 03:26
@andyxning
Contributor Author

> Can create_error_response here be replaced with NotImplementedError? If yes, please do it.

@noooop I have refactored all the occurrences of handler is None in the API from returning an error message to raising a NotImplementedError exception. Thus, the response changes from

{
    "error": {
        "message": "The model does not support Chat Completions API",
        "type": "BadRequestError",
        "param": null,
        "code": 400
    }
}

to

{
    "error": {
        "message": "The model does not support Chat Completions API",
        "type": "NotImplementedError",
        "param": null,
        "code": 501
    }
}
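The shape of this refactor can be sketched as follows. Names and signatures here are illustrative, not vLLM's actual API: the route raises NotImplementedError instead of building a 400 body inline, and a shared handler maps the exception type to the 501 body:

```python
from http import HTTPStatus

# Illustrative route: raise instead of constructing an error response inline.
def classify_endpoint(handler, request):
    if handler is None:
        raise NotImplementedError("The model does not support Classification API")
    return handler(request)

# A shared exception handler then maps the exception type to the JSON body.
def to_error_body(exc: Exception) -> dict:
    status = (HTTPStatus.NOT_IMPLEMENTED
              if isinstance(exc, NotImplementedError)
              else HTTPStatus.INTERNAL_SERVER_ERROR)
    return {
        "error": {
            "message": str(exc),
            "type": type(exc).__name__,
            "param": None,
            "code": status.value,
        }
    }
```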

@noooop
Collaborator

noooop commented Mar 9, 2026

> I have refactored all the occurrences of handler is None in the API from returning an error message to raising a NotImplementedError exception. […]

I think at least for pooling entrypoints, this change is acceptable. Since the pooling routers are now only mounted when the model supports the corresponding pooling tasks, this exception should hardly ever be triggered.

def register_pooling_api_routers(
    app: FastAPI, supported_tasks: tuple["SupportedTask", ...]
):
    from vllm.entrypoints.pooling.pooling.api_router import router as pooling_router

    app.include_router(pooling_router)

    if "classify" in supported_tasks:
        from vllm.entrypoints.pooling.classify.api_router import (
            router as classify_router,
        )

        app.include_router(classify_router)

    if "embed" in supported_tasks:
        from vllm.entrypoints.pooling.embed.api_router import router as embed_router

        app.include_router(embed_router)

    # Score/rerank endpoints are available for:
    # - "score" task (cross-encoder models)
    # - "embed" task (bi-encoder models)
    # - "token_embed" task (late interaction models like ColBERT)
    if any(t in supported_tasks for t in ("score", "embed", "token_embed")):
        from vllm.entrypoints.pooling.score.api_router import router as score_router

        app.include_router(score_router)

@mergify

mergify bot commented Mar 9, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @andyxning.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 9, 2026
@andyxning andyxning force-pushed the log_http_exception_in_handler branch from b17623b to f1c17b0 Compare March 9, 2026 03:35
@mergify mergify bot removed the needs-rebase label Mar 9, 2026
@andyxning andyxning force-pushed the log_http_exception_in_handler branch from f1c17b0 to f085cfe Compare March 9, 2026 03:47
@andyxning
Contributor Author

/cc @DarkLight1337 @noooop

@andyxning andyxning force-pushed the log_http_exception_in_handler branch 2 times, most recently from 4e44d3c to 6bbf10d Compare March 10, 2026 03:25
@andyxning andyxning force-pushed the log_http_exception_in_handler branch 2 times, most recently from 99c8d63 to 5a44de0 Compare March 10, 2026 03:55
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 10, 2026 03:58
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 10, 2026
auto-merge was automatically disabled March 10, 2026 04:02

Head branch was pushed to by a user without write access

@andyxning andyxning force-pushed the log_http_exception_in_handler branch from 5a44de0 to 203ed93 Compare March 10, 2026 04:02
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 10, 2026 04:08
auto-merge was automatically disabled March 10, 2026 09:34

Head branch was pushed to by a user without write access

@andyxning andyxning force-pushed the log_http_exception_in_handler branch 2 times, most recently from d76e966 to a298288 Compare March 10, 2026 09:50
@noooop noooop enabled auto-merge (squash) March 10, 2026 10:45
Signed-off-by: Andy Xie <andy.xning@gmail.com>
auto-merge was automatically disabled March 10, 2026 11:46

Head branch was pushed to by a user without write access

@andyxning andyxning force-pushed the log_http_exception_in_handler branch from a298288 to 25bdf2e Compare March 10, 2026 11:46
@andyxning
Contributor Author

@DarkLight1337 The CI failures seem unrelated to this PR. Can you help force-merge it?

@vllm-bot vllm-bot merged commit fe714dd into vllm-project:main Mar 11, 2026
59 of 61 checks passed
@andyxning andyxning deleted the log_http_exception_in_handler branch March 11, 2026 03:27
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026

Labels

frontend ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants