feat: catch Trtllm engine exceptions #3544
base: main
Signed-off-by: [email protected] <[email protected]>
Pull Request Overview
This PR enhances error handling in the TensorRT-LLM request handlers to properly distinguish between different types of errors and respond appropriately, including graceful shutdowns for fatal engine errors.
- Adds exception handling to differentiate between per-request errors (RequestError), fatal engine errors (generic exceptions), and client cancellations (CancelledError)
- Implements graceful shutdown mechanism for fatal errors while maintaining service availability for per-request errors
- Creates comprehensive test coverage for all three error scenarios using mocking to avoid heavy dependencies
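The three cases listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `RequestError` is a local stand-in class, and `generate_with_error_handling` and `on_fatal` are hypothetical names. According to the sequence diagrams in this PR, the real fatal path calls the runtime's shutdown, engine cleanup, and `os._exit(1)`; here that is reduced to a callback so the sketch stays runnable anywhere.

```python
import asyncio
import logging


class RequestError(Exception):
    """Local stand-in for the engine's per-request error type (assumption)."""


async def generate_with_error_handling(engine_stream, on_fatal):
    """Yield chunks from engine_stream, mapping failures to the three cases."""
    try:
        async for chunk in engine_stream:
            yield chunk
    except asyncio.CancelledError:
        # Client cancellation: stop quietly, no error payload, no shutdown.
        logging.debug("request cancelled by client")
        return
    except RequestError as exc:
        # Per-request failure: report it to the client, keep serving.
        yield {"finish_reason": "error", "error": str(exc)}
    except Exception as exc:
        # Anything else is treated as a fatal engine error:
        # report it, then hand off to the (hypothetical) shutdown callback.
        yield {"finish_reason": "error", "error": str(exc)}
        await on_fatal(exc)
```

Swallowing `CancelledError` mirrors the "stop silently" behavior described in the review; whether to re-raise it instead is a design choice that depends on how the surrounding task is managed.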
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
| --- | --- |
| components/src/dynamo/trtllm/request_handlers/handler_base.py | Enhanced error handling with specific exception handling and graceful shutdown mechanism |
| components/src/dynamo/trtllm/main.py | Updated handler configuration to pass runtime reference for shutdown capability |
| components/src/dynamo/trtllm/test_handler_base.py | Added comprehensive test suite covering all error handling scenarios |
Walkthrough
Propagates a DistributedRuntime through main to request handling, adds graceful shutdown and robust error handling in HandlerBase.generate_locally, and introduces tests validating RequestError, RuntimeError, and CancelledError behaviors, including runtime shutdown, engine cleanup, and process exit on fatal errors.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Handler as HandlerBase
participant Engine
participant Runtime as DistributedRuntime
participant OS as Process
Note over Handler: Normal generation flow
Client->>Handler: generate_locally(ctx)
Handler->>Engine: generate(...)
loop Streaming
Engine-->>Handler: outputs (tokens, status)
Handler-->>Client: delta tokens / status
end
opt Finish without reason
Handler-->>Client: final chunk (finish_reason="unknown")
end
sequenceDiagram
autonumber
participant Client
participant Handler as HandlerBase
participant Engine
participant Runtime as DistributedRuntime
participant OS as Process
Note over Handler: Error handling and shutdown
Client->>Handler: generate_locally(ctx)
Handler->>Engine: generate(...)
alt RequestError
Engine-->>Handler: raise RequestError
Handler-->>Client: final error payload (no shutdown)
else CancelledError
Engine-->>Handler: raise CancelledError
Handler-->>Client: stop silently (no error, no shutdown)
else Other Exception
Engine-->>Handler: raise Exception
Handler-->>Client: error payload
Handler->>Runtime: shutdown()
Handler->>Engine: cleanup()
Handler->>OS: os._exit(1)
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
components/src/dynamo/trtllm/main.py (2 hunks)
components/src/dynamo/trtllm/request_handlers/handler_base.py (5 hunks)
components/src/dynamo/trtllm/test_handler_base.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (2)
components/src/dynamo/trtllm/request_handlers/handler_base.py (5)
- components/src/dynamo/trtllm/test_handler_base.py (1)
  - RequestError (32-35)
- components/src/dynamo/trtllm/engine.py (2)
  - llm (36-39)
  - cleanup (26-33)
- lib/bindings/python/src/dynamo/_core.pyi (1)
  - DistributedRuntime (34-64)
- components/src/dynamo/trtllm/multimodal_processor.py (2)
  - get_stop_response (245-259)
  - create_response_chunk (210-243)
- components/src/dynamo/trtllm/utils/disagg_utils.py (2)
  - DisaggregatedParamsCodec (21-64)
  - encode (47-64)
components/src/dynamo/trtllm/test_handler_base.py (1)
- components/src/dynamo/trtllm/request_handlers/handler_base.py (4)
  - HandlerBase (84-367)
  - RequestHandlerConfig (61-81)
  - DisaggregationStrategy (55-57)
  - generate_locally (176-367)
🪛 GitHub Actions: Copyright Checks
components/src/dynamo/trtllm/test_handler_base.py
[error] 1-1: Copyright header check failed. Missing or invalid header detected by copyright-check.ps1. Ensure file has SPDX header per policy. (Command: pwsh /workspace/.github/workflows/copyright-check.ps1)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3544/merge) by tzulingk.
components/src/dynamo/trtllm/main.py
[error] 1-1: Black: reformatted main.py. Run 'black' to reformat and commit changes.
components/src/dynamo/trtllm/request_handlers/handler_base.py
[error] 363-363: Ruff: Do not use bare 'except' (E722).
components/src/dynamo/trtllm/test_handler_base.py
[error] 1-1: check-shebang-scripts-are-executable: has a shebang but is not marked executable. Run 'chmod +x components/src/dynamo/trtllm/test_handler_base.py'.
[error] 73-73: Ruff: Module level import not at top of file (E402).
[error] 73-73: Ruff: Do not use bare 'except' (E722).
[error] 1-1: Isort: files were modified by this hook.
[error] 1-1: Pre-commit: check-shebang-scripts-are-executable failed. Some scripts need to be made executable.
🪛 Ruff (0.13.3)
components/src/dynamo/trtllm/request_handlers/handler_base.py
170-170: Do not catch blind exception: Exception
(BLE001)
171-171: Use logging.exception instead of logging.error. Replace with exception
(TRY400)
363-363: Do not use bare except
(E722)
363-364: try-except-pass detected, consider logging the exception
(S110)
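As a hedged illustration of what the E722, S110, and TRY400 findings ask for, a best-effort cleanup step could name the exception class and log the traceback instead of silently passing. The helper name `cleanup_quietly` and the log message are hypothetical; the actual code at line 363 of handler_base.py is not shown on this page.

```python
import logging

logger = logging.getLogger(__name__)


def cleanup_quietly(cleanup):
    """Run a best-effort cleanup callable without a bare `except:`.

    Returns True if cleanup succeeded, False if it raised.
    """
    try:
        cleanup()
    except Exception:
        # Not a bare except (E722) and not try-except-pass (S110):
        # logger.exception logs at ERROR level with the traceback,
        # which is what TRY400 suggests over logger.error.
        logger.exception("cleanup failed; continuing shutdown")
        return False
    return True
```

Catching `Exception` rather than using a bare `except:` also lets `KeyboardInterrupt` and `SystemExit` propagate, which is usually the desired behavior during shutdown.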
components/src/dynamo/trtllm/test_handler_base.py
1-1: Shebang is present but file is not executable
(EXE001)
122-122: Unused function argument: self
(ARG001)
164-164: Unused lambda argument: args
(ARG005)
164-164: Unused lambda argument: kwargs
(ARG005)
197-197: Unused lambda argument: args
(ARG005)
197-197: Unused lambda argument: kwargs
(ARG005)
231-231: Unused lambda argument: args
(ARG005)
231-231: Unused lambda argument: kwargs
(ARG005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang
- GitHub Check: Build and Test - dynamo
Walkthrough
Adds runtime-aware shutdown handling to HandlerBase, updates RequestHandlerConfig to accept a runtime, wires the runtime from main, and introduces tests validating behavior for RequestError, generic exceptions, and cancellation. The generation loop now includes expanded finish/stop handling, error emission guards, and normalization for missing finish reasons.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant C as Client
participant H as HandlerBase
participant E as TRT-LLM Engine
participant R as DistributedRuntime
C->>H: Request (start generation)
H->>E: generate() iterator
E-->>H: first chunk / tokens
alt Normal flow
H-->>C: deltas / finish with reason
else RequestError (handled)
E-->>H: raises RequestError
H-->>C: error finish_reason (service stays up)
else CancelledError (client cancel)
E-->>H: raises asyncio.CancelledError
H-->>C: cancel acknowledged (no shutdown)
else Unexpected Exception
E-->>H: raises RuntimeError/Exception
H-->>C: error chunk (service restarting)
note right of H: Initiate graceful shutdown
H->>R: runtime.shutdown() (if available)
H->>E: engine cleanup
H->>H: os._exit(1)
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
components/src/dynamo/trtllm/main.py (2 hunks)
components/src/dynamo/trtllm/request_handlers/handler_base.py (5 hunks)
components/src/dynamo/trtllm/test_handler_base.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (2)
components/src/dynamo/trtllm/request_handlers/handler_base.py (5)
- components/src/dynamo/trtllm/test_handler_base.py (1)
  - RequestError (35-38)
- components/src/dynamo/trtllm/engine.py (2)
  - llm (36-39)
  - cleanup (26-33)
- lib/bindings/python/src/dynamo/_core.pyi (1)
  - DistributedRuntime (34-64)
- components/src/dynamo/trtllm/multimodal_processor.py (2)
  - get_stop_response (245-259)
  - create_response_chunk (210-243)
- components/src/dynamo/trtllm/utils/disagg_utils.py (2)
  - DisaggregatedParamsCodec (21-64)
  - encode (47-64)
components/src/dynamo/trtllm/test_handler_base.py (2)
- components/src/dynamo/trtllm/request_handlers/handler_base.py (4)
  - HandlerBase (84-367)
  - RequestHandlerConfig (61-81)
  - DisaggregationStrategy (55-57)
  - generate_locally (176-367)
- components/src/dynamo/trtllm/engine.py (1)
  - llm (36-39)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3544/merge) by tzulingk.
components/src/dynamo/trtllm/request_handlers/handler_base.py
[error] 363-363: ruff: Do not use bare 'except' (E722).
components/src/dynamo/trtllm/main.py
[error] 1-1: Black formatting check failed. File reformatted; run 'black' to apply formatting changes.
components/src/dynamo/trtllm/test_handler_base.py
[error] 1-1: Check-shebang-scripts-are-executable: has a shebang but is not marked executable! If it is supposed to be executable, run 'chmod +x components/src/dynamo/trtllm/test_handler_base.py'.
[error] 76-76: E402: Module level import not at top of file
[error] 78-78: E402: Module level import not at top of file
🪛 Ruff (0.13.3)
components/src/dynamo/trtllm/request_handlers/handler_base.py
170-170: Do not catch blind exception: Exception
(BLE001)
171-171: Use logging.exception instead of logging.error. Replace with exception
(TRY400)
363-363: Do not use bare except
(E722)
363-364: try-except-pass detected, consider logging the exception
(S110)
components/src/dynamo/trtllm/test_handler_base.py
1-1: Shebang is present but file is not executable
(EXE001)
125-125: Unused function argument: self
(ARG001)
167-167: Unused lambda argument: args
(ARG005)
167-167: Unused lambda argument: kwargs
(ARG005)
200-200: Unused lambda argument: args
(ARG005)
200-200: Unused lambda argument: kwargs
(ARG005)
234-234: Unused lambda argument: args
(ARG005)
234-234: Unused lambda argument: kwargs
(ARG005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: vllm (amd64)
- GitHub Check: sglang
- GitHub Check: Build and Test - dynamo
Overview:
This PR adds proper exception handling for TensorRT-LLM engine errors in the request handlers, ensuring the service responds appropriately to different error types.
Details:
Catch trtllm engine exceptions
- HandlerBase.generate_locally() now distinguishes between:
  - Per-request errors (RequestError): Send error response to client, keep service running
  - Fatal engine errors (Exception): Send error response, then trigger graceful shutdown
  - Client cancellations (CancelledError): Handle gracefully without error response
Add test cases
- test_handler_base.py with comprehensive tests for all three error scenarios
- Tests exercise the actual HandlerBase logic
- RequestError does NOT trigger shutdown (service remains available)
- CancelledError is handled silently without shutdown
Test Implementation Note:
The test file (test_handler_base.py) uses a pre-import mocking strategy to avoid heavy dependencies like PyTorch and TensorRT-LLM. Before importing handler_base, we inject MagicMock objects into sys.modules for all heavy dependencies. This allows us to test the actual HandlerBase error handling logic without requiring a full TensorRT-LLM installation. The imports are deliberately placed after the mocking setup (with # noqa: E402 to suppress linter warnings) because Python would otherwise attempt to import the real, unavailable packages when handler_base.py is loaded. This approach ensures the tests can run in any CI/CD environment without GPU or TensorRT dependencies while still validating the critical error handling paths.
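The pre-import mocking strategy described above can be sketched in a few lines. The module names `heavy_pkg` and `heavy_pkg.submodule` and the imported name `Thing` are hypothetical placeholders; the actual list of mocked modules in test_handler_base.py is not shown on this page.

```python
import sys
from unittest.mock import MagicMock

# Hypothetical names standing in for the heavy dependencies
# (the real test mocks packages such as PyTorch and TensorRT-LLM).
HEAVY_MODULES = ["heavy_pkg", "heavy_pkg.submodule"]

# Register a mock for each module before anything tries to import it.
# setdefault leaves a real module in place if it is already loaded.
for name in HEAVY_MODULES:
    sys.modules.setdefault(name, MagicMock())

# These imports must come after the mocking setup, so they resolve
# to the MagicMock entries instead of triggering a real import.
import heavy_pkg  # noqa: E402
from heavy_pkg.submodule import Thing  # noqa: E402
```

Because the import system consults `sys.modules` first, both the package and each dotted submodule need their own entry; attribute access on the resulting mocks then returns further `MagicMock` objects.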
Where should the reviewer start?
components/src/dynamo/trtllm/request_handlers/handler_base.py
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
DIS-703