Skip to content

fix(sglang): use external_trace_header API for distributed tracing#5346

Merged
ishandhanani merged 2 commits intomainfrom
ishan/sglang-external-trace-header
Jan 12, 2026
Merged

fix(sglang): use external_trace_header API for distributed tracing#5346
ishandhanani merged 2 commits intomainfrom
ishan/sglang-external-trace-header

Conversation

@ishandhanani
Copy link
Contributor

@ishandhanani ishandhanani commented Jan 12, 2026

Summary

  • Replace internal SGLang trace propagation (trace_set_remote_propagate_context) with the public external_trace_header parameter
  • Remove dependency on internal sglang.srt.tracing module
  • Simplify trace context by using W3C traceparent header directly

Background

This PR extracts the still-needed portions from #5122 after #5283 merged the Rust TCP tracing support. The Python SGLang changes use SGLang's official external_trace_header API instead of the internal tracing module, which is cleaner and more maintainable.

Changes

  • handler_base.py: Replace _propagate_trace_context_to_sglang() with _get_trace_header()
  • decode_handler.py: Pass external_trace_header to async_generate() calls
  • prefill_handler.py: Pass external_trace_header to async_generate() calls
Screenshot 2026-01-13 at 12 05 52 AM

Replace internal SGLang trace propagation with the public external_trace_header
parameter. This simplifies trace context propagation by:

- Using SGLang's official external_trace_header API parameter
- Removing dependency on internal sglang.srt.tracing module
- Eliminating base64 encoding/decoding complexity
- Aligning with W3C trace context standard (traceparent header)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ishandhanani ishandhanani requested review from a team as code owners January 12, 2026 06:35
@copy-pr-bot
Copy link

copy-pr-bot bot commented Jan 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added fix backend::sglang Relates to the sglang backend labels Jan 12, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 12, 2026

Walkthrough

Refactors trace propagation across sglang request handlers by replacing a complex base64-encoded JSON approach with a simpler external_trace_header parameter mechanism. Removes _propagate_trace_context_to_sglang() method and introduces _get_trace_header() returning a dict with traceparent key, eliminating encoding overhead.

Changes

Cohort / File(s) Summary
Core trace handler implementation
components/src/dynamo/sglang/request_handlers/handler_base.py
Removes _propagate_trace_context_to_sglang() method; adds _get_trace_header() returning Optional[Dict[str, str]]. Removes imports for base64, json, and sglang_trace. Simplifies trace context from encoded JSON payload to plain traceparent dict.
Request handler integrations
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py, components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
Updates trace handling to call _get_trace_header(context) and pass result via new external_trace_header parameter to async_generate(). Removes direct trace propagation calls. Applied consistently across disaggregated and aggregated modes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Trace paths simplified, no more encoding schemes,
Headers flow directly through our generation dreams,
Base64 gone, complexity shed,
A cleaner hop forward instead! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: replacing internal SGLang trace propagation with the public external_trace_header API for distributed tracing.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The pull request description includes all required sections: Overview (Summary), Details (Changes and Background), and Related Issues context.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)

122-124: Consider extracting trace header computation to reduce duplication.

The trace header logic is duplicated in both branches. Since trace_header doesn't depend on serving_mode, it could be computed once before the conditional.

♻️ Suggested refactor
         )

+        trace_header = self._get_trace_header(context) if self.enable_trace else None
+
         if self.serving_mode == DisaggregationMode.DECODE:
             # Check if bootstrap_info is pre-computed in the request (from frontend)
             bootstrap_info = request.get("bootstrap_info")
@@ -119,10 +121,6 @@
                 f"room={bootstrap_info['bootstrap_room']}"
             )

-            trace_header = (
-                self._get_trace_header(context) if self.enable_trace else None
-            )
-
             decode = await self.engine.async_generate(
                 **input_param,
                 sampling_params=sampling_params,
@@ -137,10 +135,6 @@
             else:
                 async for out in self._process_text_stream(decode, context):
                     yield out
         else:
-            trace_header = (
-                self._get_trace_header(context) if self.enable_trace else None
-            )
-
             agg = await self.engine.async_generate(

Also applies to: 144-146

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c94d097 and ab315ee.

📒 Files selected for processing (3)
  • components/src/dynamo/sglang/request_handlers/handler_base.py
  • components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
  • components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-19T07:32:44.210Z
Learnt from: ishandhanani
Repo: ai-dynamo/dynamo PR: 0
File: :0-0
Timestamp: 2025-09-19T07:32:44.210Z
Learning: The skip_tokenizer_init=True path in SGLang backend bypasses tokenization but has array slicing overhead in _process_token_stream that creates O(n) memory copying on every stream chunk, potentially causing quadratic behavior for long sequences.

Applied to files:

  • components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
🧬 Code graph analysis (2)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)
components/src/dynamo/sglang/request_handlers/handler_base.py (1)
  • _get_trace_header (143-156)
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (1)
components/src/dynamo/sglang/request_handlers/handler_base.py (1)
  • _get_trace_header (143-156)
🔇 Additional comments (3)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)

126-135: LGTM!

The external_trace_header parameter is correctly passed to async_generate() in both disaggregated and aggregated modes, properly propagating trace context.

Also applies to: 148-154

components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (1)

116-127: LGTM!

The trace header extraction and propagation via external_trace_header is cleanly integrated. The implementation is consistent with the decode handler and properly conditional on enable_trace.

components/src/dynamo/sglang/request_handlers/handler_base.py (1)

143-156: Clean implementation of W3C traceparent header generation.

The method correctly constructs the W3C Trace Context traceparent header format (00-trace_id-parent_id-trace_flags), which is the exact format SGLang's external_trace_header parameter expects. The hardcoded flags value 01 indicates sampled=true, appropriate for traces being actively propagated.

When spawn_prefill_task uses tokio::spawn, the spawned task loses the
current span context. This causes distributed tracing context to be
lost, preventing trace headers from being properly propagated.

Changes:
- Add tracing::Instrument import
- Capture current span and use .instrument(span) to propagate trace
  context to the spawned prefill task
- Add prefill_routing span to track prefill routing timing
- Add kv_find_best_match span to track KV worker selection time

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ishandhanani ishandhanani requested a review from a team as a code owner January 12, 2026 06:59
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jan 12, 2026
@github-actions github-actions bot added the router Relates to routing, KV-aware routing, etc. label Jan 12, 2026
@ishandhanani
Copy link
Contributor Author

Session Worklog

Context

User had an existing PR #5122 (ishan/tracingupdate branch) for distributed tracing improvements. A separate PR #5283 was merged that added TCP tracing support with headers in the wire protocol. Task was to identify what portions of #5122 were still needed.

Analysis

Compared branches and found:

  1. Rust changes in fix: distributed tracing propagation for TCP transport and SGLang integration #5122 were NOT needed - PR fix: distributed tracing propagation for TCP transport #5283 already merged better versions:

    • logging.rs: Main has optimized static TRACE_PROPAGATOR, the old branch created new propagators each time
    • tcp_client.rs tests: Main has proper header reading code
  2. Python SGLang changes WERE still needed:

    • handler_base.py: Replace internal trace_set_remote_propagate_context with public external_trace_header API
    • decode_handler.py / prefill_handler.py: Pass external_trace_header to async_generate()
  3. Additional Rust fix discovered:

    • Commit ade6a7e3e on original branch fixed span context loss in spawn_prefill_task
    • When tokio::spawn is used, the spawned task loses current span context
    • Fix: Capture span with tracing::Span::current() and use .instrument(span)

Implementation

  1. Created new branch ishan/sglang-external-trace-header from main
  2. Applied Python changes for external_trace_header API
  3. Cherry-picked Rust tracing fix (resolved merge conflicts manually due to code divergence)
  4. Added prefill_routing and kv_find_best_match spans for visibility
  5. Verified compilation with cargo check -p dynamo-llm
  6. Rebuilt dynamo and tested with disaggregated serving

Performance Verification

Confirmed .instrument() pattern is safe for high concurrency:

  • tracing::Span::current() - thread-local read, O(1)
  • .instrument(span) - zero-cost when tracing disabled
  • Same pattern used in http_endpoint.rs, shared_tcp_endpoint.rs, push_endpoint.rs
  • Spans use static names with no dynamic field allocation

Files Changed

File Lines Change
handler_base.py -20/+8 Simplified trace header generation
decode_handler.py +4/-3 Use external_trace_header param
prefill_handler.py +2/-3 Use external_trace_header param
prefill_router.rs +66/-47 Span propagation + new spans

Commits

  1. ab315ee8a - fix(sglang): use external_trace_header API for distributed tracing
  2. 0198fbde8 - fix(tracing): propagate span context in prefill background task

@ishandhanani
Copy link
Contributor Author

/ok to test 0198fbd

@ishandhanani ishandhanani enabled auto-merge (squash) January 12, 2026 16:07
@ishandhanani ishandhanani merged commit cbbaa6b into main Jan 12, 2026
41 checks passed
@ishandhanani ishandhanani deleted the ishan/sglang-external-trace-header branch January 12, 2026 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend::sglang Relates to the sglang backend fix router Relates to routing, KV-aware routing, etc. size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants