Improve metrics, observability, and PD deploy tooling #24521
Merged
- Add finer-grained queueing time histogram buckets (sub-100ms)
- Add uncached prompt tokens histogram metric
- Move positional_embed_overrides field after custom_logit_processor
- Include cached_input_len in ReqTimeStats log prefix
- Fix decode_throughput to use (completion_tokens - 1)
- Remove unused prefill_run_batch timing fields and methods
- Guard spec tracing calls behind tracing_enable flag
- Format entry_time as wallclock in duration strings
- Simplify stdout log target to use default logger
- Add video_data to request logger skip lists
- Reset fwd_occupancy in device timer window reset
- Rename total_retractions to num_retractions in meta_info
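A minimal sketch of the decode_throughput fix, assuming the first completion token is produced by the prefill forward pass rather than by a decode step, so only completion_tokens - 1 tokens should be attributed to decode time. The function name and its arguments are hypothetical and do not reproduce sglang's actual code.

```python
def decode_throughput(completion_tokens: int, decode_seconds: float) -> float:
    """Hypothetical helper: tokens per second generated by the decode phase.

    Assumption: the first output token comes from prefill, so only
    completion_tokens - 1 tokens are attributed to decode_seconds.
    """
    decode_tokens = completion_tokens - 1
    if decode_tokens <= 0 or decode_seconds <= 0:
        # No decode steps ran (or no time was measured): avoid dividing by zero.
        return 0.0
    return decode_tokens / decode_seconds


# 101 completion tokens, 2.0 s of decode after the first token
print(decode_throughput(101, 2.0))  # 50.0
```

Counting all tokens would report 101 / 2.0 = 50.5 tokens/s here. The overstatement grows for short completions: a 2-token completion with 0.02 s of decode would read 100 tokens/s instead of 50.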
Author (contributor): /tag-and-rerun-ci
Contributor
Code Review
This pull request refactors request time statistics, observability metrics, and logging utilities, including a new uncached prompt tokens histogram and the addition of video_data to the request logger's skip lists. One critical issue was identified: the metrics collector uses an undefined variable, cached_tokens, which would raise a NameError at runtime.
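A minimal, stdlib-only sketch of how an uncached prompt tokens observation might be recorded, with cached_tokens bound explicitly as a parameter so the name always exists when it is used. SimpleHistogram, observe_uncached_prompt_tokens, and the bucket bounds are hypothetical stand-ins, not the sglang metrics collector or the prometheus_client API.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleHistogram:
    """Hypothetical stand-in for a Prometheus-style histogram (stdlib only)."""

    buckets: List[float]  # sorted upper bounds; larger values land in the +Inf slot
    counts: List[int] = field(init=False)

    def __post_init__(self) -> None:
        # Per-bucket (non-cumulative) counts; the final slot is the +Inf bucket.
        self.counts = [0] * (len(self.buckets) + 1)

    def observe(self, value: float) -> None:
        # bisect_left picks the first bound >= value, i.e. "le" (less-or-equal) semantics.
        self.counts[bisect.bisect_left(self.buckets, value)] += 1


def observe_uncached_prompt_tokens(
    hist: SimpleHistogram, prompt_tokens: int, cached_tokens: int
) -> int:
    # cached_tokens is an explicit parameter here; reading a name that is never
    # bound in the enclosing scope is what raises NameError at runtime.
    uncached = max(prompt_tokens - cached_tokens, 0)  # clamp defensively at 0
    hist.observe(uncached)
    return uncached


hist = SimpleHistogram(buckets=[128, 512, 2048])
observe_uncached_prompt_tokens(hist, prompt_tokens=1000, cached_tokens=900)  # 100 uncached
observe_uncached_prompt_tokens(hist, prompt_tokens=4096, cached_tokens=0)  # 4096 uncached
print(hist.counts)  # [1, 0, 0, 1]
```

A real Prometheus histogram exposes cumulative counts per `le` bound; the per-bucket counts above are kept only to make the bucketing easy to read.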
Author (contributor): /rerun-stage stage-c-test-8-gpu-h20
Contributor: ✅ Triggered
Author (contributor): /rerun-stage stage-c-test-8-gpu-h20
Contributor: ✅ Triggered
ltcs11 added a commit to ltcs11/sglang that referenced this pull request on May 7, 2026
* main: (894 commits)
  [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (sgl-project#22715)
  [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (sgl-project#24268)
  propagate pytest exit code from test __main__ entries (sgl-project#24487)
  [R3] Avoid implicit CUDA sync in routed experts DP slicing (sgl-project#24550)
  Add ChatCompletionRequest-style support to /v1/tokenize (sgl-project#23981)
  Support Triton MLA FP8 KV cache (sgl-project#20479)
  [diffusion] chore: align LTX-2 with official (sgl-project#24313)
  Expand support matrix for pypi wheel release (sgl-project#24565)
  [codex] Optimize Z-Image packed QKV (sgl-project#24117)
  [Misc] Fix breaking weight checker test (sgl-project#24553)
  [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (sgl-project#24420)
  ci: bump test_mimo_models.py est_time 330 → 610 (sgl-project#24551)
  [CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (sgl-project#24279)
  Improve metrics, observability, and PD deploy tooling (sgl-project#24521)
  Fix diffusion fallback guards and validation (sgl-project#23335)
  [PD] Prevent update_status to Failed from cleared entries (sgl-project#24539)
  [CP] Register KV cache allgather buffer with symmetric memory (sgl-project#24040)
  Support getting checksums in weight checker (sgl-project#24537)
  Refactor buffer patterns in weight checker (sgl-project#24538)
  Add unit and end-to-end tests for weight checker (sgl-project#24536)
  ...

# Conflicts:
#   python/sglang/srt/managers/scheduler.py
#   python/sglang/srt/model_executor/model_runner.py
cursor (bot) pushed a commit that referenced this pull request on May 8, 2026
Same root cause as the previous commit: after #24521 (b859f7f), _create_logger_with_handler emits '[YYYY-MM-DD HH:MM:SS] {json...}', so the 'startswith("{")' filter at lines 193 and 230 discards every event. As a result, TestRequestLoggerJson.test_logging and test_openai_chat_logging fail with 'AssertionError: False is not true : request.received event not found in stdout' on nightly-amd-1-gpu-unit (confirmed in run 25534700106, log lines 3689 / 3712 / 3722; the actual stdout did contain the events, just under the new prefix).

Strip the prefix with the same _LOG_PREFIX_RE pattern introduced in #24521's test_log_utils.py.

Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com>
cursor (bot) pushed a commit that referenced this pull request on May 8, 2026
Same root cause as the previous two commits: after #24521 (b859f7f), _create_logger_with_handler emits '[YYYY-MM-DD HH:MM:SS] {json...}', so the 'startswith("{")' check at line 67 never matches and _find_log_events yields nothing. TestSchedulerStatusLogger.test_scheduler_status_dump therefore fails with 'AssertionError: 0 not greater than 0 : scheduler.status event not found' on nightly-amd-1-gpu-unit (confirmed in run 25534700106, log lines 4136 / 4153 / 4165, with events=[] dumped right above).

Strip the prefix with the same _LOG_PREFIX_RE pattern.

Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com>
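The failure mode described in these commits can be reproduced with a small stdlib sketch: a JSON log line behind a '[YYYY-MM-DD HH:MM:SS] ' prefix no longer starts with "{", so a naive filter drops it, while stripping the prefix first recovers it. LOG_PREFIX_RE, strip_log_prefix, find_log_events, and the 'event' field name are hypothetical illustrations, not the actual _LOG_PREFIX_RE or _find_log_events helpers in sglang's tests.

```python
import json
import re
from typing import Any, Dict, Iterable, Iterator

# Hypothetical pattern for the '[YYYY-MM-DD HH:MM:SS] ' prefix described above;
# the real _LOG_PREFIX_RE in test_log_utils.py may be written differently.
LOG_PREFIX_RE = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\] ")


def strip_log_prefix(line: str) -> str:
    """Drop a leading timestamp prefix if present; otherwise return the line unchanged."""
    return LOG_PREFIX_RE.sub("", line, count=1)


def find_log_events(lines: Iterable[str], event: str) -> Iterator[Dict[str, Any]]:
    """Yield JSON records whose (assumed) 'event' field equals `event`."""
    for raw in lines:
        line = strip_log_prefix(raw.strip())
        if not line.startswith("{"):
            continue  # plain-text output, not a JSON log record
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed or truncated JSON
        if record.get("event") == event:
            yield record


# Sample stdout lines, made up for illustration.
stdout_lines = [
    '[2026-05-08 12:00:00] {"event": "request.received", "rid": "abc"}',
    "[2026-05-08 12:00:01] plain-text status line",
]

# Old-style filter on the raw line: the prefixed event is silently skipped.
print([line for line in stdout_lines if line.startswith("{")])  # []
# Prefix-aware filter recovers it.
print(list(find_log_events(stdout_lines, "request.received")))
# [{'event': 'request.received', 'rid': 'abc'}]
```

Anchoring the pattern with `^` and stripping at most one prefix keeps unprefixed JSON lines working unchanged, so a filter like this tolerates both the old and the new log format.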
LLThomas pushed a commit to LLThomas/sglang that referenced this pull request on May 8, 2026
Summary
Original commits
b6e4fd639
Test plan