SGLang Tracing: Supports propagating trace headers through sgl.Engine…#15814
Conversation
Summary of ChangesHello @sufeng-buaa, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances SGLang's tracing capabilities by enabling explicit propagation of trace context when the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces support for propagating trace headers through sgl.Engine by adding an external_trace_header parameter to generate, async_generate, encode, and async_encode methods. The changes are well-implemented and correctly propagate the trace context down to the tracing logic. I've also pointed out a pre-existing but related issue regarding tracing for embedding requests that could be addressed to make the tracing functionality more robust.
| # Propagates trace context via Engine.encode/async_encode | ||
| external_trace_header: Optional[Dict] = None |
There was a problem hiding this comment.
Renaming trace_context to external_trace_header is a good change for consistency with GenerateReqInput.
However, I noticed a related pre-existing issue. TokenizedEmbeddingReqInput is missing the trace_context field, which TokenizedGenerateReqInput has. This will likely cause an AttributeError in tokenizer_manager.py inside the _send_one_request method when it tries to set tokenized_obj.trace_context for embedding requests. Since this PR is focused on improving tracing, it would be a good opportunity to fix this to ensure tracing works correctly for embedding requests as well.
You could fix this by adding trace_context: Optional[Dict] = None to the TokenizedEmbeddingReqInput class.
There was a problem hiding this comment.
I will fix the issue you mentioned in #13152.
5a5c421 to
16e497b
Compare
|
/tag-run-ci-label |
… calls Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
16e497b to
701ca46
Compare
|
Relevant tests have passed. I will test this with Dynamo and then merge |
#15814) Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
sgl-project#15814) Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
| # tracing context | ||
| trace_context: Optional[Dict] = None | ||
| # Propagates trace context via Engine.encode/async_encode | ||
| external_trace_header: Optional[Dict] = None |
Motivation
Dynamo directly invokes SGLang through
sgl.Engine, and thus cannot propagatetrace contextviarequest headersas in the HTTP API approach. Therefore,trace contextmust be explicitly propagated through thesgl.Engineinterface.Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist