
SGLang Tracing: Add trace-level, trace-module, and unify tracing/request-stage-metrics#13152

Closed
sufeng-buaa wants to merge 53 commits into sgl-project:main from openanolis:sufeng-buaa/unify-trace-metric

Conversation

@sufeng-buaa
Collaborator

@sufeng-buaa sufeng-buaa commented Nov 12, 2025

Motivation

This PR is a response to #10916. For details on the motivation and visual output, please refer to the issue.

To reduce span overhead, we added the trace-level feature. To support broader use cases beyond request tracing, we introduced trace-module.

Modifications

  1. Refactored tracing package from global-state functions to a class-based design with instance storage. This facilitates integration with request stage metrics and provides a hook for future dynamic instrumentation.
  2. Implemented a wrapper class "SglangStageContext" that internally aggregates the trace context and metric collector to uniformly collect timestamps and route them to different export paths based on configuration.
  3. Added a trace-level mechanism that assigns a level to each RequestStage, helping reduce excessive trace data in production environments.
  4. Added a trace-module mechanism to extend the trace package beyond request tracing, enabling its use in other modules such as hicache.
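The unified routing in item 2 can be sketched roughly as follows. This is a minimal illustration, not the actual SGLang implementation; the class and method names below (`StageRecorder`, `record_stage`) are hypothetical stand-ins for `SglangStageContext`:

```python
class StageRecorder:
    """Minimal sketch of a per-request stage recorder that collects one
    (start, end) timestamp pair per stage and fans it out to whichever
    export paths are enabled by configuration."""

    def __init__(self, tracing_enabled: bool, metrics_enabled: bool):
        self.tracing_enabled = tracing_enabled
        self.metrics_enabled = metrics_enabled
        self.spans = []    # stand-in for an OTLP span exporter
        self.metrics = []  # stand-in for a stage-duration histogram

    def record_stage(self, name: str, start_ns: int, end_ns: int) -> None:
        # Timestamps are collected once, then routed to each enabled sink,
        # so tracing and request-stage metrics never diverge.
        if self.tracing_enabled:
            self.spans.append((name, start_ns, end_ns))
        if self.metrics_enabled:
            self.metrics.append((name, end_ns - start_ns))
```

The point of the wrapper is that instrumentation sites call a single API; whether a timestamp becomes a span, a metric, or both is decided once at configuration time.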

I thought about unifying TimeStat too, but it would require too many changes, so I gave up on that. I may push a draft patch later.

Instrumentation Overhead Evaluation

The overhead of each instrumentation point remains almost unchanged compared to before. See #9962 and #10804.

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @sufeng-buaa, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the tracing and metrics collection system to enhance flexibility, reduce overhead, and broaden its applicability. By transitioning to a class-based RequestTimeRecorder and introducing granular control via trace levels and modules, the system can now more efficiently capture performance data and trace execution flows across various components, including cross-process and cross-node interactions. This foundational change aims to provide richer observability while allowing users to tailor the verbosity of tracing to their specific needs.

Highlights

  • Tracing System Refactor: The tracing package has been refactored from global-state functions to a class-based design, specifically introducing the RequestTimeRecorder class. This change centralizes trace context and metric collection, allowing for more flexible integration and future dynamic instrumentation.
  • Unified Metrics and Tracing: A new RequestTimeRecorder wrapper class is implemented to uniformly aggregate trace context and metric collection. This class routes timestamps to different export paths based on configuration, simplifying how both tracing and request-stage metrics are handled.
  • Trace Level Mechanism: A trace level mechanism has been added, allowing users to assign different levels (1 to 3) to each RequestStage. This enables more granular control over the amount of trace data collected, helping to reduce overhead in production environments by only capturing necessary details.
  • Trace Module Mechanism: A trace module mechanism is introduced to extend the tracing package's applicability beyond just request tracing. This allows other modules, such as hicache, to leverage the tracing framework, making it more versatile.
  • Command-Line Argument Changes: The --enable-trace command-line argument has been replaced with --trace-level (an integer from 0-3) and a new --trace-module argument to specify which module to trace (e.g., 'request'). This provides more precise control over tracing activation and scope.
  • Documentation Updates: The documentation for production request tracing (docs/references/production_request_trace.md) has been updated to reflect the new --trace-level and --trace-module options, as well as the revised API for marking request stages and propagating trace contexts using the RequestTimeRecorder.
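The trace-level gating described above might work along these lines. This is a hypothetical sketch; the stage names and their level assignments are illustrative, not taken from the PR:

```python
from enum import IntEnum

class TraceLevel(IntEnum):
    OFF = 0      # --trace-level 0: tracing disabled
    BASIC = 1    # coarse request lifecycle stages only
    DETAIL = 2   # plus scheduler/batch stages
    VERBOSE = 3  # everything, including fine-grained internals

# Illustrative per-stage level assignments (hypothetical stage names).
STAGE_LEVELS = {
    "request_received": TraceLevel.BASIC,
    "schedule_batch": TraceLevel.DETAIL,
    "kv_cache_alloc": TraceLevel.VERBOSE,
}

def should_trace(stage: str, configured_level: int) -> bool:
    """A stage is emitted only when tracing is on and the stage's own
    level does not exceed the configured --trace-level."""
    level = STAGE_LEVELS.get(stage, TraceLevel.VERBOSE)
    return 0 < level <= configured_level
```

Under this scheme, `--trace-level 1` in production keeps only the coarse lifecycle spans, while `--trace-level 3` restores full instrumentation for debugging.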
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant refactoring of the tracing system in sglang. It replaces the global tracing functions with a class-based design centered around RequestTimeRecorder, unifying tracing with request stage metrics. It also adds --trace-level and --trace-module for more granular control over tracing, replacing the old --enable-trace flag. The changes are extensive, touching documentation, server arguments, and core scheduler logic. My review focused on ensuring the new API is used consistently, the documentation is accurate, and the refactoring is sound. I've identified a few issues in the documentation that need correction and a critical typo in a function name that would lead to a runtime error. Overall, this is a solid enhancement to the project's observability features.

@sufeng-buaa sufeng-buaa force-pushed the sufeng-buaa/unify-trace-metric branch from c12a958 to e75d844 Compare November 13, 2025 04:34
@sufeng-buaa
Collaborator Author

All feedback from Bot Assist has been addressed.

@zhanghaotong
Contributor

Hi~ I'm running your code with the following command:

python -m sglang.launch_server --trace-level 3 --otlp-traces-endpoint 0.0.0.0:4317  --model-path /mnt/modelops/models/Qwen3-8B/ --host 0.0.0.0 --log-level info  --port 8001

However, I forgot to install the OpenTelemetry packages. As a result, the engine crashed with the error shown below:
[screenshot of the crash traceback omitted]
And perhaps we should explicitly check for the required OpenTelemetry dependencies when tracing is enabled, and raise a clear error to inform users if they are missing?

@sufeng-buaa
Collaborator Author

Hi~ I'm running your code with the following command:

python -m sglang.launch_server --trace-level 3 --otlp-traces-endpoint 0.0.0.0:4317  --model-path /mnt/modelops/models/Qwen3-8B/ --host 0.0.0.0 --log-level info  --port 8001

However, I forgot to install the OpenTelemetry packages. As a result, the engine crashed with the error shown in the screenshot. And perhaps we should explicitly check for the required OpenTelemetry dependencies when tracing is enabled, and raise a clear error to inform users if they are missing?

I did forget to verify the case where OpenTelemetry is not installed but tracing is enabled. I'll fix it as soon as possible.



@dataclass
class SglangTraceEvent:
Collaborator

@ShangmingCai ShangmingCai Nov 13, 2025


Suggested change
class SglangTraceEvent:
class SGLangTraceEvent:

nit: we should probably use the correct uppercase and lowercase of SGLang.

Collaborator Author


I have renamed all 'Sglang***' to 'SGLang***'

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Nov 13, 2025
@sufeng-buaa
Collaborator Author

Hi~ I'm running your code with the following command:

python -m sglang.launch_server --trace-level 3 --otlp-traces-endpoint 0.0.0.0:4317  --model-path /mnt/modelops/models/Qwen3-8B/ --host 0.0.0.0 --log-level info  --port 8001

However, I forgot to install the OpenTelemetry packages. As a result, the engine crashed with the error shown in the screenshot. And perhaps we should explicitly check for the required OpenTelemetry dependencies when tracing is enabled, and raise a clear error to inform users if they are missing?

Fixed

@sufeng-buaa
Collaborator Author

Could you please take a look at my code when you have time, @fzyzcjy? The other reviewers still seem very busy.

Collaborator

@ShangmingCai ShangmingCai left a comment


LGTM. Let's wait for the CI and see if other reviewers have any other comments.

global get_cur_time_ns
if not opentelemetry_imported:
tracing_enabled = False
logger.warning("opentelemetry package is not installed!!! audo disable tracing")
Contributor


There’s a minor typo: “audo” should be “auto”.

More importantly, I’m wondering whether we should raise an exception and terminate the main process to clearly alert users when required dependencies are missing—ensuring they understand that tracing will not function. We’ve encountered this issue repeatedly: users forget to install the necessary dependencies and later contact us confused about why no traces are being uploaded.

Moreover, I found that both vLLM and TensorRT-LLM adopt this approach—they raise exceptions in such scenarios to prevent silent failures.
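A fail-fast check along the lines discussed here might look like the following. This is only a sketch: the function name and the exact module probed are assumptions, not the PR's actual fix:

```python
def check_tracing_dependencies(trace_level: int) -> None:
    """Raise a clear error at startup when tracing is requested but the
    OpenTelemetry packages are missing, instead of crashing later or
    silently dropping traces."""
    if trace_level <= 0:
        return  # tracing disabled; nothing to verify
    try:
        # Probe one module from the opentelemetry-sdk distribution.
        import opentelemetry.sdk.trace  # noqa: F401
    except ImportError as e:
        raise RuntimeError(
            "--trace-level > 0 requires the OpenTelemetry packages; "
            "install them (e.g. `pip install opentelemetry-sdk "
            "opentelemetry-exporter-otlp`) or set --trace-level 0."
        ) from e
```

Failing at argument-parsing time, as vLLM and TensorRT-LLM do, surfaces the misconfiguration immediately rather than after the server has started serving traffic.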

Collaborator Author


That makes sense, I'll make the changes.

@stmatengss
Collaborator

/rerun-failed-ci

@sufeng-buaa
Collaborator Author

/rerun-failed-ci

@sufeng-buaa
Collaborator Author

Deprecated. Please see the new PR: #17862


Labels

  • dependencies: Pull requests that update a dependency file
  • documentation: Improvements or additions to documentation
  • run-ci


8 participants