fix(sglang): use external_trace_header API for distributed tracing#5346
ishandhanani merged 2 commits into main
Replace internal SGLang trace propagation with the public external_trace_header parameter.

This simplifies trace context propagation by:
- Using SGLang's official external_trace_header API parameter
- Removing dependency on internal sglang.srt.tracing module
- Eliminating base64 encoding/decoding complexity
- Aligning with W3C trace context standard (traceparent header)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Walkthrough
Refactors trace propagation across sglang request handlers by replacing a complex base64-encoded JSON approach with a simpler external_trace_header parameter mechanism.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 0
🧹 Nitpick comments (1)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)
122-124: Consider extracting trace header computation to reduce duplication.

The trace header logic is duplicated in both branches. Since `trace_header` doesn't depend on `serving_mode`, it could be computed once before the conditional.

♻️ Suggested refactor

```diff
         )
 
+        trace_header = self._get_trace_header(context) if self.enable_trace else None
+
         if self.serving_mode == DisaggregationMode.DECODE:
             # Check if bootstrap_info is pre-computed in the request (from frontend)
             bootstrap_info = request.get("bootstrap_info")
@@ -119,10 +121,6 @@
                     f"room={bootstrap_info['bootstrap_room']}"
                 )
 
-            trace_header = (
-                self._get_trace_header(context) if self.enable_trace else None
-            )
-
             decode = await self.engine.async_generate(
                 **input_param,
                 sampling_params=sampling_params,
@@ -137,10 +135,6 @@
                 else:
                     async for out in self._process_text_stream(decode, context):
                         yield out
         else:
-            trace_header = (
-                self._get_trace_header(context) if self.enable_trace else None
-            )
-
             agg = await self.engine.async_generate(
```

Also applies to: 144-146
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
components/src/dynamo/sglang/request_handlers/handler_base.py
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-19T07:32:44.210Z
Learnt from: ishandhanani
Repo: ai-dynamo/dynamo PR: 0
File: :0-0
Timestamp: 2025-09-19T07:32:44.210Z
Learning: The skip_tokenizer_init=True path in SGLang backend bypasses tokenization but has array slicing overhead in _process_token_stream that creates O(n) memory copying on every stream chunk, potentially causing quadratic behavior for long sequences.
Applied to files:
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
🧬 Code graph analysis (2)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)
components/src/dynamo/sglang/request_handlers/handler_base.py (1)
_get_trace_header(143-156)
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (1)
components/src/dynamo/sglang/request_handlers/handler_base.py (1)
_get_trace_header(143-156)
🔇 Additional comments (3)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)
126-135: LGTM!

The `external_trace_header` parameter is correctly passed to `async_generate()` in both disaggregated and aggregated modes, properly propagating trace context.

Also applies to: 148-154
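The pattern this comment approves, combined with the refactor suggested in the nitpick, can be sketched in isolation. The following is a minimal illustration, not the handler code from this PR: `StubEngine`, `StubHandler`, the dict-based `context`, and the exact keyword arguments other than `external_trace_header` are hypothetical stand-ins, and the real `_get_trace_header` may return a different type.

```python
import asyncio
from typing import Any, Dict, List, Optional


class StubEngine:
    """Hypothetical stand-in for the SGLang engine; records the kwargs it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    async def async_generate(self, **kwargs: Any) -> str:
        self.calls.append(kwargs)
        return "ok"


class StubHandler:
    """Hypothetical handler that computes the trace header once for both modes."""

    def __init__(self, engine: StubEngine, enable_trace: bool, disaggregated: bool) -> None:
        self.engine = engine
        self.enable_trace = enable_trace
        self.disaggregated = disaggregated

    def _get_trace_header(self, context: Dict[str, str]) -> Optional[str]:
        # Assumption: the context exposes trace_id and span_id as hex strings.
        trace_id = context.get("trace_id")
        span_id = context.get("span_id")
        if not trace_id or not span_id:
            return None
        return f"00-{trace_id}-{span_id}-01"

    async def generate(self, request: Dict[str, Any], context: Dict[str, str]) -> str:
        # Computed once, before branching on the serving mode.
        trace_header = self._get_trace_header(context) if self.enable_trace else None

        if self.disaggregated:
            return await self.engine.async_generate(
                input_ids=request["token_ids"],
                bootstrap_room=request.get("bootstrap_room"),
                external_trace_header=trace_header,
            )
        return await self.engine.async_generate(
            input_ids=request["token_ids"],
            external_trace_header=trace_header,
        )
```

With tracing disabled, the sketch passes `external_trace_header=None` in both modes, so the engine call has the same shape whether or not a trace context is available.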
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (1)
116-127: LGTM!

The trace header extraction and propagation via `external_trace_header` is cleanly integrated. The implementation is consistent with the decode handler and properly conditional on `enable_trace`.

components/src/dynamo/sglang/request_handlers/handler_base.py (1)
143-156: Clean implementation of W3C traceparent header generation.

The method correctly constructs the W3C Trace Context `traceparent` header format (`00-trace_id-parent_id-trace_flags`), which is the exact format SGLang's `external_trace_header` parameter expects. The hardcoded flags value `01` indicates sampled=true, appropriate for traces being actively propagated.
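To make the `traceparent` layout concrete, here is a small standalone sketch of how such a value can be assembled and sanity-checked. It is not the `_get_trace_header` method from this PR: the function name `build_traceparent`, its `sampled` parameter, and the validation steps are assumptions added for illustration, based on the W3C Trace Context field rules (32 lowercase hex characters for the trace ID, 16 for the parent ID, neither all zeros).

```python
import re
from typing import Optional

# Field shapes from the W3C Trace Context specification (version 00).
_TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")
_PARENT_ID_RE = re.compile(r"[0-9a-f]{16}")


def build_traceparent(
    trace_id: Optional[str],
    parent_id: Optional[str],
    sampled: bool = True,
) -> Optional[str]:
    """Return a traceparent value, or None if either ID is missing or malformed."""
    if not trace_id or not parent_id:
        return None
    if not _TRACE_ID_RE.fullmatch(trace_id) or not _PARENT_ID_RE.fullmatch(parent_id):
        return None
    # All-zero IDs are invalid under the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # Flags 01 marks the trace as sampled; 00 leaves the sampled bit unset.
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"
```

Returning `None` for missing or malformed IDs lets a caller skip propagation instead of sending a header that a downstream tracer would reject.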
When spawn_prefill_task uses tokio::spawn, the spawned task loses the current span context. This causes distributed tracing context to be lost, preventing trace headers from being properly propagated.

Changes:
- Add tracing::Instrument import
- Capture current span and use .instrument(span) to propagate trace context to the spawned prefill task
- Add prefill_routing span to track prefill routing timing
- Add kv_find_best_match span to track KV worker selection time

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Session Worklog
Context
User had an existing PR #5122 (
Analysis
Compared branches and found:
Implementation
Performance Verification
Confirmed
Files Changed
Commits
/ok to test 0198fbd
Summary

- Replace internal SGLang trace propagation (`trace_set_remote_propagate_context`) with the public `external_trace_header` parameter
- Remove dependency on the internal `sglang.srt.tracing` module

Background

This PR extracts the still-needed portions from #5122 after #5283 merged the Rust TCP tracing support. The Python SGLang changes use SGLang's official `external_trace_header` API instead of the internal tracing module, which is cleaner and more maintainable.

Changes

- `handler_base.py`: Replace `_propagate_trace_context_to_sglang()` with `_get_trace_header()`
- `decode_handler.py`: Pass `external_trace_header` to `async_generate()` calls
- `prefill_handler.py`: Pass `external_trace_header` to `async_generate()` calls