Skip to content

Conversation

tzulingk
Copy link
Contributor

@tzulingk tzulingk commented Oct 10, 2025

Overview:

This PR adds proper exception handling for TensorRT-LLM engine errors in the request handlers, ensuring the service responds appropriately to different error types.

Details:

Catch trtllm engine exceptions

  • Enhanced error handling in HandlerBase.generate_locally() to distinguish between:
    • Per-request errors (RequestError): Send error response to client, keep service running
    • Fatal engine errors (Exception): Send error response, then trigger graceful shutdown
    • Client cancellations (CancelledError): Handle gracefully without error response

Add test cases

  • Created test_handler_base.py with comprehensive tests for all three error scenarios
  • Tests use mocking to avoid heavy dependencies while testing real HandlerBase logic
  • Structured as pytest test class for automatic test discovery
  • Verifies that:
    • RequestError does NOT trigger shutdown (service remains available)
    • Generic exceptions DO trigger graceful shutdown with cleanup
    • CancelledError is handled silently without shutdown

Test Implementation Note:
The test file (test_handler_base.py) uses a pre-import mocking strategy to avoid heavy dependencies like PyTorch and TensorRT-LLM. Before importing handler_base, we inject MagicMock objects into sys.modules for all heavy dependencies. This allows us to test the actual HandlerBase error handling logic without requiring a full TensorRT-LLM installation. The imports are deliberately placed after the mocking setup (with # noqa: E402 to suppress linter warnings) because Python would otherwise attempt to import the real, unavailable packages when handler_base.py is loaded. This approach ensures the tests can run in any CI/CD environment without GPU or TensorRT dependencies while still validating the critical error handling paths.

Where should the reviewer start?

components/src/dynamo/trtllm/request_handlers/handler_base.py

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

DIS-703

Summary by CodeRabbit

  • New Features

    • Graceful, coordinated shutdown/restart on unexpected errors to improve reliability.
    • Enhanced token streaming with clearer stop/finish handling, including PREFILL mode and multimodal completions.
  • Bug Fixes

    • Client cancellations no longer produce extraneous errors or trigger shutdown.
    • Per-request errors are returned to clients without affecting service availability.
    • Guarded against empty responses; an error is emitted when no output is produced.
    • Missing finish reasons are normalized to “unknown” with warnings.
  • Tests

    • Added comprehensive tests for error handling and shutdown behavior.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances error handling in the TensorRT-LLM request handlers to properly distinguish between different types of errors and respond appropriately, including graceful shutdowns for fatal engine errors.

  • Adds exception handling to differentiate between per-request errors (RequestError), fatal engine errors (generic exceptions), and client cancellations (CancelledError)
  • Implements graceful shutdown mechanism for fatal errors while maintaining service availability for per-request errors
  • Creates comprehensive test coverage for all three error scenarios using mocking to avoid heavy dependencies

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
components/src/dynamo/trtllm/request_handlers/handler_base.py Enhanced error handling with specific exception handling and graceful shutdown mechanism
components/src/dynamo/trtllm/main.py Updated handler configuration to pass runtime reference for shutdown capability
components/src/dynamo/trtllm/test_handler_base.py Added comprehensive test suite covering all error handling scenarios

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

coderabbitai bot commented Oct 10, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Propagates a DistributedRuntime through main to request handling, adds graceful shutdown and robust error handling in HandlerBase.generate_locally, and introduces tests validating RequestError, RuntimeError, and CancelledError behaviors, including runtime shutdown, engine cleanup, and process exit on fatal errors.

Changes

Cohort / File(s) Summary
Runtime propagation in main
components/src/dynamo/trtllm/main.py
Passes runtime into constructor and RequestHandlerConfig, enabling runtime-aware shutdown.
HandlerBase error handling and shutdown
components/src/dynamo/trtllm/request_handlers/handler_base.py
Adds runtime to RequestHandlerConfig; wraps generate path in try/except; handles CancelledError, RequestError, and generic exceptions; yields error payloads; introduces _initiate_shutdown to orderly stop runtime, cleanup engine, and exit.
Tests for HandlerBase behaviors
components/src/dynamo/trtllm/test_handler_base.py
New tests with heavy dependency mocking to verify: no-shutdown on RequestError/CancelledError, shutdown+cleanup+exit on fatal errors, and emission ordering of initial output then error responses.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Handler as HandlerBase
  participant Engine
  participant Runtime as DistributedRuntime
  participant OS as Process

  Note over Handler: Normal generation flow
  Client->>Handler: generate_locally(ctx)
  Handler->>Engine: generate(...)
  loop Streaming
    Engine-->>Handler: outputs (tokens, status)
    Handler-->>Client: delta tokens / status
  end
  opt Finish without reason
    Handler-->>Client: final chunk (finish_reason="unknown")
  end
Loading
sequenceDiagram
  autonumber
  participant Client
  participant Handler as HandlerBase
  participant Engine
  participant Runtime as DistributedRuntime
  participant OS as Process

  Note over Handler: Error handling and shutdown
  Client->>Handler: generate_locally(ctx)
  Handler->>Engine: generate(...)
  alt RequestError
    Engine-->>Handler: raise RequestError
    Handler-->>Client: final error payload (no shutdown)
  else CancelledError
    Engine-->>Handler: raise CancelledError
    Handler-->>Client: stop silently (no error, no shutdown)
  else Other Exception
    Engine-->>Handler: raise Exception
    Handler-->>Client: error payload
    Handler->>Runtime: shutdown()
    Handler->>Engine: cleanup()
    Handler->>OS: os._exit(1)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I twitch my ears at runtime’s call,
Stream tokens, then catch a fall—
If storms arise, I fold the show,
Shut down burrows, out we go.
For soft requests, I calmly stay,
A gentle hop, then back to hay. 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.75% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly describes the primary change—adding handling for TensorRT-LLM engine exceptions—and uses clear, concise phrasing aligned with the repository’s conventional “feat:” prefix. It accurately reflects the main feature without extraneous details or jargon.
Description Check ✅ Passed The pull request description includes the required sections — Overview, Details, Where should the reviewer start, and Related Issues — matching the repository’s template structure. The content is clear and explains the changes and testing approach thoroughly. However, the Related Issues entry does not use a recognized action keyword or the exact formatting specified in the template.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a2a820 and 4c6f286.

📒 Files selected for processing (3)
  • components/src/dynamo/trtllm/main.py (2 hunks)
  • components/src/dynamo/trtllm/request_handlers/handler_base.py (5 hunks)
  • components/src/dynamo/trtllm/test_handler_base.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
components/src/dynamo/trtllm/request_handlers/handler_base.py (5)
components/src/dynamo/trtllm/test_handler_base.py (1)
  • RequestError (32-35)
components/src/dynamo/trtllm/engine.py (2)
  • llm (36-39)
  • cleanup (26-33)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (34-64)
components/src/dynamo/trtllm/multimodal_processor.py (2)
  • get_stop_response (245-259)
  • create_response_chunk (210-243)
components/src/dynamo/trtllm/utils/disagg_utils.py (2)
  • DisaggregatedParamsCodec (21-64)
  • encode (47-64)
components/src/dynamo/trtllm/test_handler_base.py (1)
components/src/dynamo/trtllm/request_handlers/handler_base.py (4)
  • HandlerBase (84-367)
  • RequestHandlerConfig (61-81)
  • DisaggregationStrategy (55-57)
  • generate_locally (176-367)
🪛 GitHub Actions: Copyright Checks
components/src/dynamo/trtllm/test_handler_base.py

[error] 1-1: Copyright header check failed. Missing or invalid header detected by copyright-check.ps1. Ensure file has SPDX header per policy. (Command: pwsh /workspace/.github/workflows/copyright-check.ps1)

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3544/merge) by tzulingk.
components/src/dynamo/trtllm/main.py

[error] 1-1: Black: reformatted main.py. Run 'black' to reformat and commit changes.

components/src/dynamo/trtllm/request_handlers/handler_base.py

[error] 363-363: Ruff: Do not use bare 'except' (E722).

components/src/dynamo/trtllm/test_handler_base.py

[error] 1-1: check-shebang-scripts-are-executable: has a shebang but is not marked executable. Run 'chmod +x components/src/dynamo/trtllm/test_handler_base.py'.


[error] 73-73: Ruff: Module level import not at top of file (E402).


[error] 73-73: Ruff: Do not use bare 'except' (E722).


[error] 1-1: Isort: files were modified by this hook.


[error] 1-1: Pre-commit: check-shebang-scripts-are-executable failed. Some scripts need to be made executable.

🪛 Ruff (0.13.3)
components/src/dynamo/trtllm/request_handlers/handler_base.py

170-170: Do not catch blind exception: Exception

(BLE001)


171-171: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


363-363: Do not use bare except

(E722)


363-364: try-except-pass detected, consider logging the exception

(S110)

components/src/dynamo/trtllm/test_handler_base.py

1-1: Shebang is present but file is not executable

(EXE001)


122-122: Unused function argument: self

(ARG001)


164-164: Unused lambda argument: args

(ARG005)


164-164: Unused lambda argument: kwargs

(ARG005)


197-197: Unused lambda argument: args

(ARG005)


197-197: Unused lambda argument: kwargs

(ARG005)


231-231: Unused lambda argument: args

(ARG005)


231-231: Unused lambda argument: kwargs

(ARG005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: sglang
  • GitHub Check: Build and Test - dynamo

Copy link
Contributor

coderabbitai bot commented Oct 10, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds runtime-aware shutdown handling to HandlerBase, updates RequestHandlerConfig to accept a runtime, wires the runtime from main, and introduces tests validating behavior for RequestError, generic exceptions, and cancellation. The generation loop now includes expanded finish/stop handling, error emission guards, and normalization for missing finish reasons.

Changes

Cohort / File(s) Summary of Changes
Runtime wiring
components/src/dynamo/trtllm/main.py
Passes runtime into RequestHandlerConfig; minor formatting around kv_cache_config.event_buffer_max_size when metrics/events are enabled.
Graceful shutdown and error handling
components/src/dynamo/trtllm/request_handlers/handler_base.py
Adds runtime to config and stores on HandlerBase; introduces _initiate_shutdown(error) to coordinate runtime shutdown, engine cleanup, and forced exit; augments generate_locally with structured handling for RequestError, generic exceptions, and asyncio.CancelledError; expands token emission logic, finish/stop normalization, and guards for empty outputs.
Tests for handler base
components/src/dynamo/trtllm/test_handler_base.py
New tests mocking dependencies to verify: no-shutdown on RequestError, shutdown and exit on generic error, and no-shutdown on cancellation; provides helper constructors and a mock generator to simulate control-flow paths.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Client
    participant H as HandlerBase
    participant E as TRT-LLM Engine
    participant R as DistributedRuntime

    C->>H: Request (start generation)
    H->>E: generate() iterator
    E-->>H: first chunk / tokens

    alt Normal flow
        H-->>C: deltas / finish with reason
    else RequestError (handled)
        E-->>H: raises RequestError
        H-->>C: error finish_reason (service stays up)
    else CancelledError (client cancel)
        E-->>H: raises asyncio.CancelledError
        H-->>C: cancel acknowledged (no shutdown)
    else Unexpected Exception
        E-->>H: raises RuntimeError/Exception
        H-->>C: error chunk (service restarting)
        note right of H: Initiate graceful shutdown
        H->>R: runtime.shutdown() (if available)
        H->>E: engine cleanup
        H->>H: os._exit(1)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I twitch my ears at tokens’ flow,
When storms arise, I’m first to know—
I thump the ground: “Shut down with grace!”
Then hop back up, a steady pace.
Errors nibbled, queues aligned,
Carrots logged, requests refined. 🥕

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description follows the template’s structure with Overview, Details, and reviewer start sections, but the Related Issues section does not include an action keyword like Closes or Fixes and is missing the bullet syntax and issue link formatting. Please update the Related Issues section to include an action keyword and bullet point, for example "- closes DIS-703", to match the template format.
Docstring Coverage ⚠️ Warning Docstring coverage is 68.75% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly indicates the feature to catch exceptions from the TensorRT-LLM engine, which is the main change in this PR. It uses the project’s conventional commit prefix and remains concise and specific.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a2a820 and 93feac5.

📒 Files selected for processing (3)
  • components/src/dynamo/trtllm/main.py (2 hunks)
  • components/src/dynamo/trtllm/request_handlers/handler_base.py (5 hunks)
  • components/src/dynamo/trtllm/test_handler_base.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
components/src/dynamo/trtllm/request_handlers/handler_base.py (5)
components/src/dynamo/trtllm/test_handler_base.py (1)
  • RequestError (35-38)
components/src/dynamo/trtllm/engine.py (2)
  • llm (36-39)
  • cleanup (26-33)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (34-64)
components/src/dynamo/trtllm/multimodal_processor.py (2)
  • get_stop_response (245-259)
  • create_response_chunk (210-243)
components/src/dynamo/trtllm/utils/disagg_utils.py (2)
  • DisaggregatedParamsCodec (21-64)
  • encode (47-64)
components/src/dynamo/trtllm/test_handler_base.py (2)
components/src/dynamo/trtllm/request_handlers/handler_base.py (4)
  • HandlerBase (84-367)
  • RequestHandlerConfig (61-81)
  • DisaggregationStrategy (55-57)
  • generate_locally (176-367)
components/src/dynamo/trtllm/engine.py (1)
  • llm (36-39)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3544/merge) by tzulingk.
components/src/dynamo/trtllm/request_handlers/handler_base.py

[error] 363-363: ruff: Do not use bare 'except' (E722).

components/src/dynamo/trtllm/main.py

[error] 1-1: Black formatting check failed. File reformatted; run 'black' to apply formatting changes.

components/src/dynamo/trtllm/test_handler_base.py

[error] 1-1: Check-shebang-scripts-are-executable: has a shebang but is not marked executable! If it is supposed to be executable, run 'chmod +x components/src/dynamo/trtllm/test_handler_base.py'.


[error] 76-76: E402: Module level import not at top of file


[error] 78-78: E402: Module level import not at top of file

🪛 Ruff (0.13.3)
components/src/dynamo/trtllm/request_handlers/handler_base.py

170-170: Do not catch blind exception: Exception

(BLE001)


171-171: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


363-363: Do not use bare except

(E722)


363-364: try-except-pass detected, consider logging the exception

(S110)

components/src/dynamo/trtllm/test_handler_base.py

1-1: Shebang is present but file is not executable

(EXE001)


125-125: Unused function argument: self

(ARG001)


167-167: Unused lambda argument: args

(ARG005)


167-167: Unused lambda argument: kwargs

(ARG005)


200-200: Unused lambda argument: args

(ARG005)


200-200: Unused lambda argument: kwargs

(ARG005)


234-234: Unused lambda argument: args

(ARG005)


234-234: Unused lambda argument: kwargs

(ARG005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: vllm (amd64)
  • GitHub Check: sglang
  • GitHub Check: Build and Test - dynamo

@tzulingk tzulingk enabled auto-merge (squash) October 10, 2025 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant