feat: Trtllm health check payload use bos_token_id #3145
Walkthrough
Adds BOS token ID extraction from the tokenizer for TRT-LLM health checks. Updates the health-check payload to use the derived BOS ID instead of a hardcoded value and passes the tokenizer when constructing the payload in main initialization. Introduces logging around BOS retrieval and fallback.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Operator
participant Main as trtllm/main.py
participant Tokenizer as Tokenizer
participant HC as TrtllmHealthCheckPayload
participant Logger as Logger
Operator->>Main: Initialize TRT-LLM
Main->>Tokenizer: Create tokenizer(model)
Main->>HC: new(tokenizer=Tokenizer)
activate HC
HC->>HC: _get_bos_token_id_from_tokenizer(tokenizer)
alt BOS ID available
HC->>Logger: debug("Using BOS token id: <id>")
else Failure/missing
HC->>Logger: debug("Falling back to BOS id: 1")
end
HC->>Main: to_dict() with token_ids: [bos_id]
deactivate HC
Main->>Operator: Proceed with health check using payload
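The flow in the diagram can be sketched in Python as follows. This is a minimal illustration, not the PR's implementation: `HealthCheckPayloadSketch` is a hypothetical stand-in for `TrtllmHealthCheckPayload`, the tokenizer is a stub built from `SimpleNamespace`, and `token_ids` is the only payload field taken from the diagram (the real payload likely carries more).

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

DEFAULT_BOS_TOKEN_ID = 1  # fallback ID named in the walkthrough


class HealthCheckPayloadSketch:
    """Hypothetical stand-in for TrtllmHealthCheckPayload; not the real class."""

    def __init__(self, tokenizer=None):
        self.bos_token_id = self._get_bos_token_id(tokenizer)

    @staticmethod
    def _get_bos_token_id(tokenizer) -> int:
        # The wrapper is assumed to expose the underlying tokenizer as `.tokenizer`.
        inner = getattr(tokenizer, "tokenizer", None)
        bos = getattr(inner, "bos_token_id", None)
        if isinstance(bos, int) and bos >= 0:
            logger.debug("Using BOS token id: %d", bos)
            return bos
        logger.debug("Falling back to BOS id: %d", DEFAULT_BOS_TOKEN_ID)
        return DEFAULT_BOS_TOKEN_ID

    def to_dict(self) -> dict:
        # Only `token_ids` comes from the diagram; other fields are omitted here.
        return {"token_ids": [self.bos_token_id]}


# Stub tokenizer whose BOS ID differs from the default (value is illustrative).
wrapped = SimpleNamespace(tokenizer=SimpleNamespace(bos_token_id=2))
payload = HealthCheckPayloadSketch(tokenizer=wrapped).to_dict()
fallback = HealthCheckPayloadSketch().to_dict()
```

With the stub above, `payload` carries the tokenizer's BOS ID, while `fallback` carries the default because no tokenizer was supplied.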
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (2)
17-49: Harden BOS-id extraction: avoid getattr (ruff B009), narrow exceptions (BLE001), validate non-negative IDs, and warn on fallback
Current code uses getattr and a broad except Exception. Prefer direct attribute access with targeted exceptions, ensure IDs are non-negative, and log fallback at warning level so misconfigurations surface.
Apply:
 def _get_bos_token_id_from_tokenizer(tokenizer) -> int:
-    if tokenizer is None:
-        return 1
-
-    try:
-        if hasattr(tokenizer, "tokenizer"):
-            inner_tokenizer = getattr(tokenizer, "tokenizer")
-            bos_token_id = getattr(inner_tokenizer, "bos_token_id", None)
-            if bos_token_id is not None:
-                logger.info(
-                    f"Using model's BOS token ID for health check: {bos_token_id}"
-                )
-                return int(bos_token_id)
-    except Exception as e:
-        logger.debug(f"Failed to get BOS token from tokenizer: {e}")
-
-    logger.debug("Using default BOS token ID (1) for health check")
-    return 1
+    if tokenizer is None:
+        logger.warning("Tokenizer is None; using default BOS token ID (1) for health check")
+        return 1
+
+    # Prefer direct attribute access with targeted exceptions
+    try:
+        inner_tokenizer = tokenizer.tokenizer  # may raise AttributeError
+        bos = inner_tokenizer.bos_token_id  # may raise AttributeError
+    except AttributeError:
+        bos = None
+
+    if isinstance(bos, int) and bos >= 0:
+        logger.info("Using model's BOS token ID for health check: %d", bos)
+        return bos
+
+    logger.warning("Using default BOS token ID (1) for health check")
+    return 1
40-43: Use parameterized logging instead of f-strings
Switch to logger formatting args to avoid unnecessary string interpolation on disabled levels.
Example already reflected in the diff above:
-logger.info(
-    f"Using model's BOS token ID for health check: {bos_token_id}"
-)
+logger.info("Using model's BOS token ID for health check: %d", bos)
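To illustrate why the reviewer prefers formatting args, here is a small standalone sketch. `CountingRepr` and the logger name `health_check_demo` are hypothetical helpers introduced only for this demonstration. The counter records how often the object is rendered to a string while DEBUG logging is disabled.

```python
import logging


class CountingRepr:
    """Hypothetical helper that counts how often it is rendered to a string."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "bos"


logger = logging.getLogger("health_check_demo")
logger.setLevel(logging.WARNING)  # DEBUG records are dropped at this level

eager = CountingRepr()
# The f-string is evaluated before logger.debug() is even called.
logger.debug(f"Using BOS token ID: {eager}")

lazy = CountingRepr()
# The logging module formats the message only if the record is actually emitted.
logger.debug("Using BOS token ID: %s", lazy)
```

After running this, `eager.calls` is 1 and `lazy.calls` is 0: the f-string pays the formatting cost even though nothing is logged.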
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (2 hunks)
components/backends/trtllm/src/dynamo/trtllm/main.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (2)
components/backends/trtllm/src/dynamo/trtllm/main.py (2)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (1)
TrtllmHealthCheckPayload (51-91)
lib/bindings/python/src/dynamo/health_check.py (1)
to_dict (86-96)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (2)
lib/bindings/python/src/dynamo/health_check.py (1)
HealthCheckPayload (61-96)
lib/llm/src/tokenizers.rs (1)
tokenizer (340-342)
🪛 Ruff (0.12.2)
components/backends/trtllm/src/dynamo/trtllm/health_check.py
37-37: Do not call getattr with a constant attribute value. It is not any safer than normal property access.
Replace getattr with attribute access
(B009)
44-44: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - sglang
- GitHub Check: Build and Test - vllm
- GitHub Check: Build and Test - trtllm
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/backends/trtllm/src/dynamo/trtllm/health_check.py (2)
71-71: Good: payload now uses BOS-derived token_id
This aligns the health check with model-specific tokenization.
58-67: Constructor order LGTM; add unit tests for tokenizer behavior
Default payload set before super().__init__; optional tokenizer keeps API backward-compatible.
- Add tests: tokenizer=None → bos_token_id falls back to 1.
- Add tests: tokenizer with bos_token_id >= 0 → payload uses that ID.
File: components/backends/trtllm/src/dynamo/trtllm/health_check.py
components/backends/trtllm/src/dynamo/trtllm/main.py (1)
321-321: Passing tokenizer into health-check payload is correct and low-risk
tokenizer is initialized earlier; helper handles None. Change is localized.
Signed-off-by: [email protected] <[email protected]> Signed-off-by: Jason Zhou <[email protected]>
Signed-off-by: [email protected] <[email protected]> Signed-off-by: Kyle H <[email protected]>
Overview:
Use the tokenizer's bos_token_id in the TRT-LLM health check payload.
Details:
Get bos_token_id from the tokenizer.
The default token ID 1 isn't model-agnostic. Using the engine/tokenizer BOS ID at the call site prevents out-of-vocabulary (OOV) or unintended tokens.
Where should the reviewer start?
components/backends/trtllm/src/dynamo/trtllm/health_check.py: get bos_token_id
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
DIS-649
Summary by CodeRabbit