Skip to content

Conversation

@tzulingk
Copy link
Contributor

@tzulingk tzulingk commented Sep 19, 2025

Overview:

Trtllm health check payload use bos_token_id

Details:

Get bos_token_id from tokenizer.
Default token id 1 isn’t model‑agnostic. Use the engine/tokenizer BOS id at call‑site to prevent OOV or unintended tokens.

Where should the reviewer start?

components/backends/trtllm/src/dynamo/trtllm/health_check.py: get bos_token_id

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

DIS-649

Summary by CodeRabbit

  • New Features
    • Health checks now derive the BOS token from the active tokenizer, improving multi-model compatibility.
  • Bug Fixes
    • Replaces hardcoded BOS token ID with model-specific value to prevent false health-check failures on certain models.
  • Chores
    • Added detailed logging around BOS token retrieval to aid diagnostics.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 19, 2025

Walkthrough

Adds BOS token ID extraction from the tokenizer for TRT-LLM health checks. Updates the health-check payload to use the derived BOS ID instead of a hardcoded value and passes the tokenizer when constructing the payload in main initialization. Introduces logging around BOS retrieval and fallback.

Changes

Cohort / File(s) Summary of Changes
Health-check BOS handling
components/backends/trtllm/src/dynamo/trtllm/health_check.py
Added helper _get_bos_token_id_from_tokenizer(tokenizer) -> int. Updated TrtllmHealthCheckPayload.__init__(self, tokenizer=None) to derive bos_token_id and set default_payload.token_ids = [bos_token_id]. Added module-level logger and debug logs. Updated docstrings.
Main health-check construction
components/backends/trtllm/src/dynamo/trtllm/main.py
Now constructs payload with tokenizer: TrtllmHealthCheckPayload(tokenizer=tokenizer).to_dict(). No other control-flow changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Operator
  participant Main as trtllm/main.py
  participant Tokenizer as Tokenizer
  participant HC as TrtllmHealthCheckPayload
  participant Logger as Logger

  Operator->>Main: Initialize TRT-LLM
  Main->>Tokenizer: Create tokenizer(model)
  Main->>HC: new(tokenizer=Tokenizer)
  activate HC
  HC->>HC: _get_bos_token_id_from_tokenizer(tokenizer)
  alt BOS ID available
    HC->>Logger: debug("Using BOS token id: <id>")
  else Failure/missing
    HC->>Logger: debug("Falling back to BOS id: 1")
  end
  HC->>Main: to_dict() with token_ids: [bos_id]
  deactivate HC
  Main->>Operator: Proceed with health check using payload
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • feat: Trtllm canary health check #3082 — Earlier changes to TRT-LLM health-check payload introducing token_ids default; this PR extends it by deriving BOS from the tokenizer and wiring it through main.

Poem

A whisk of ears, I peek and see,
The BOS now fetched quite eagerly.
No longer “1” by fiat decree—
We ask the tokenizer, courteously.
Logs softly thump like paws in glee,
Health checks hop along, error-free. 🐇✨

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title "feat: Trtllm health check payload use bos_token_id" concisely describes the primary change — switching the Trtllm health-check payload to use the tokenizer's BOS token id instead of a hard-coded default. It directly matches the modifications in components/backends/trtllm/src/dynamo/trtllm/health_check.py and the call-site update in main.py, and is clear enough for a teammate scanning history. The phrasing is slightly awkward but not misleading.
Description Check ✅ Passed The PR description follows the repository template by providing Overview, Details, and Where to start, and it references a related issue (DIS-649). The Details clearly explain the change to derive the BOS id from the tokenizer and the reviewer start file is called out, but the Related Issues entry does not use the template's action keyword or a GitHub issue number format (for example "Closes #123") which would improve traceability. Overall the description is mostly complete and actionable for reviewers.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (2)

17-49: Harden BOS-id extraction: avoid getattr (ruff B009), narrow exceptions (BLE001), validate non-negative IDs, and warn on fallback

Current code uses getattr and a broad except Exception. Prefer direct attribute access with targeted exceptions, ensure IDs are non-negative, and log fallback at warning level so misconfigurations surface.

Apply:

 def _get_bos_token_id_from_tokenizer(tokenizer) -> int:
-    if tokenizer is None:
-        return 1
-
-    try:
-        if hasattr(tokenizer, "tokenizer"):
-            inner_tokenizer = getattr(tokenizer, "tokenizer")
-            bos_token_id = getattr(inner_tokenizer, "bos_token_id", None)
-            if bos_token_id is not None:
-                logger.info(
-                    f"Using model's BOS token ID for health check: {bos_token_id}"
-                )
-                return int(bos_token_id)
-    except Exception as e:
-        logger.debug(f"Failed to get BOS token from tokenizer: {e}")
-
-    logger.debug("Using default BOS token ID (1) for health check")
-    return 1
+    if tokenizer is None:
+        logger.warning("Tokenizer is None; using default BOS token ID (1) for health check")
+        return 1
+
+    # Prefer direct attribute access with targeted exceptions
+    try:
+        inner_tokenizer = tokenizer.tokenizer  # may raise AttributeError
+        bos = inner_tokenizer.bos_token_id     # may raise AttributeError
+    except AttributeError:
+        bos = None
+
+    if isinstance(bos, int) and bos >= 0:
+        logger.info("Using model's BOS token ID for health check: %d", bos)
+        return bos
+
+    logger.warning("Using default BOS token ID (1) for health check")
+    return 1

40-43: Use parameterized logging instead of f-strings

Switch to logger formatting args to avoid unnecessary string interpolation on disabled levels.

Example already reflected in the diff above:

-logger.info(
-    f"Using model's BOS token ID for health check: {bos_token_id}"
-)
+logger.info("Using model's BOS token ID for health check: %d", bos)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5915a69 and 21f0d94.

📒 Files selected for processing (2)
  • components/backends/trtllm/src/dynamo/trtllm/health_check.py (2 hunks)
  • components/backends/trtllm/src/dynamo/trtllm/main.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
components/backends/trtllm/src/dynamo/trtllm/main.py (2)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (1)
  • TrtllmHealthCheckPayload (51-91)
lib/bindings/python/src/dynamo/health_check.py (1)
  • to_dict (86-96)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (2)
lib/bindings/python/src/dynamo/health_check.py (1)
  • HealthCheckPayload (61-96)
lib/llm/src/tokenizers.rs (1)
  • tokenizer (340-342)
🪛 Ruff (0.12.2)
components/backends/trtllm/src/dynamo/trtllm/health_check.py

37-37: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


44-44: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - sglang
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Build and Test - trtllm
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (2)

71-71: Good: payload now uses BOS-derived token_id

This aligns the health check with model-specific tokenization.


58-67: Constructor order LGTM — add unit tests for tokenizer behavior

Default payload set before super().init; optional tokenizer keeps API backward-compatible.

  • Add tests: tokenizer=None → bos_token_id falls back to 1.
  • Add tests: tokenizer with bos_token_id >= 0 → payload uses that ID.

File: components/backends/trtllm/src/dynamo/trtllm/health_check.py

components/backends/trtllm/src/dynamo/trtllm/main.py (1)

321-321: Passing tokenizer into health-check payload is correct and low-risk

tokenizer is initialized earlier; helper handles None. Change is localized.

@tzulingk tzulingk merged commit 7b43b3a into main Sep 23, 2025
13 checks passed
@tzulingk tzulingk deleted the tzulingk/bos_trtllm branch September 23, 2025 02:54
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
kylehh pushed a commit that referenced this pull request Sep 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants