Improve metrics, observability, and PD deploy tooling by merrymercy · Pull Request #24521 · sgl-project/sglang

merrymercy · 2026-05-06T11:06:11Z

Summary

Add finer-grained queueing time histogram buckets (sub-100ms granularity)
Add uncached prompt tokens histogram metric
Move positional_embed_overrides field after custom_logit_processor for field ordering consistency
Include cached_input_len in ReqTimeStats log prefix
Fix decode_throughput calculation to use (completion_tokens - 1) for accurate per-token throughput
Remove unused prefill_run_batch timing fields and methods
Guard spec tracing calls behind tracing_enable flag to avoid overhead when tracing is disabled
Format entry_time as wallclock timestamp in duration strings
Simplify stdout log target to use default logger
Add video_data to request logger skip lists
Reset fwd_occupancy in device timer window reset
Rename total_retractions to num_retractions in meta_info
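
The decode_throughput change in the list above can be illustrated with a minimal sketch. It assumes that the first completion token is produced at the end of prefill, so only `completion_tokens - 1` tokens are attributed to the decode phase. The function name and arguments here are hypothetical and are not taken from the PR diff.

```python
def decode_throughput(completion_tokens: int, decode_seconds: float) -> float:
    """Return tokens per second over the decode phase only.

    Assumption: the first completion token comes out of prefill, so the
    decode phase accounts for (completion_tokens - 1) tokens.
    """
    decode_tokens = completion_tokens - 1
    # Guard against single-token completions and zero-length timings.
    if decode_tokens <= 0 or decode_seconds <= 0:
        return 0.0
    return decode_tokens / decode_seconds


if __name__ == "__main__":
    # 101 completion tokens, 2 seconds of decoding: 100 decode tokens.
    print(decode_throughput(101, 2.0))
```

Dividing by `completion_tokens` instead would slightly overstate throughput, and the error is most visible for short completions.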

Original commits

b6e4fd639

Test plan

Verify metrics endpoint includes new uncached_prompt_tokens_histogram
Verify ReqTimeStats log output format
Run existing unit tests
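
The first check in the test plan could be scripted roughly as below. This is a sketch that scans Prometheus text-format output for a sample whose metric name contains `uncached_prompt_tokens_histogram`. The sample text, including the `sglang:` name prefix and the label values, is made up for illustration; the real endpoint output may differ.

```python
def has_metric(metrics_text: str, name_fragment: str) -> bool:
    """Return True if any sample line's metric name contains name_fragment."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        metric_name = line.split("{", 1)[0].split(" ", 1)[0]
        if name_fragment in metric_name:
            return True
    return False


# Hypothetical scrape output, not captured from a real server.
SAMPLE = """\
# TYPE sglang:uncached_prompt_tokens_histogram histogram
sglang:uncached_prompt_tokens_histogram_bucket{le="128"} 3
sglang:uncached_prompt_tokens_histogram_count 3
"""

if __name__ == "__main__":
    print(has_metric(SAMPLE, "uncached_prompt_tokens_histogram"))
```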


merrymercy · 2026-05-06T11:06:19Z

/tag-and-rerun-ci

gemini-code-assist

Code Review

This pull request refactors request time statistics, observability metrics, and logging utilities, including the addition of an uncached prompt tokens histogram and the inclusion of video data in request logging. A critical issue was identified in the metrics collector where the use of an undefined variable, cached_tokens, would result in a runtime NameError.
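
For context, here is a minimal sketch of how an uncached prompt tokens value might be derived and bucketed. It passes `cached_tokens` in explicitly, which avoids the kind of undefined-name error the review describes. The bucket bounds and function names are hypothetical, and the counts are per-bucket rather than cumulative.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds; the PR's actual buckets are not shown here.
BUCKETS = [0, 128, 512, 2048, 8192]


def uncached_prompt_tokens(prompt_tokens: int, cached_tokens: int) -> int:
    """Prompt tokens that were not served from the prefix cache."""
    return max(prompt_tokens - cached_tokens, 0)


def observe(counts: list, value: int) -> None:
    """Increment the first bucket whose upper bound is >= value.

    counts has len(BUCKETS) + 1 slots; the last slot is the overflow bucket.
    """
    counts[bisect_left(BUCKETS, value)] += 1


if __name__ == "__main__":
    counts = [0] * (len(BUCKETS) + 1)
    observe(counts, uncached_prompt_tokens(prompt_tokens=1000, cached_tokens=872))
    observe(counts, uncached_prompt_tokens(prompt_tokens=9000, cached_tokens=0))
    print(counts)
```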

merrymercy · 2026-05-06T12:10:47Z

/rerun-stage stage-c-test-8-gpu-h20

github-actions · 2026-05-06T12:11:22Z

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies). View workflow run


merrymercy · 2026-05-06T12:27:33Z

/rerun-stage stage-c-test-8-gpu-h20

github-actions · 2026-05-06T12:28:04Z

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies). View workflow run

* main: (894 commits)
  [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (sgl-project#22715)
  [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (sgl-project#24268)
  propagate pytest exit code from test __main__ entries (sgl-project#24487)
  [R3] Avoid implicit CUDA sync in routed experts DP slicing (sgl-project#24550)
  Add ChatCompletionRequest-style support to /v1/tokenize (sgl-project#23981)
  Support Triton MLA FP8 KV cache (sgl-project#20479)
  [diffusion] chore: align LTX-2 with official (sgl-project#24313)
  Expand support matrix for pypi wheel release (sgl-project#24565)
  [codex] Optimize Z-Image packed QKV (sgl-project#24117)
  [Misc] Fix breaking weight checker test (sgl-project#24553)
  [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (sgl-project#24420)
  ci: bump test_mimo_models.py est_time 330 → 610 (sgl-project#24551)
  [CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (sgl-project#24279)
  Improve metrics, observability, and PD deploy tooling (sgl-project#24521)
  Fix diffusion fallback guards and validation (sgl-project#23335)
  [PD] Prevent update_status to Failed from cleared entries (sgl-project#24539)
  [CP] Register KV cache allgather buffer with symmetric memory (sgl-project#24040)
  Support getting checksums in weight checker (sgl-project#24537)
  Refactor buffer patterns in weight checker (sgl-project#24538)
  Add unit and end-to-end tests for weight checker (sgl-project#24536)
  ...

# Conflicts:
#   python/sglang/srt/managers/scheduler.py
#   python/sglang/srt/model_executor/model_runner.py

Same root cause as the previous commit: after #24521 (b859f7f), _create_logger_with_handler emits '[YYYY-MM-DD HH:MM:SS] {json...}', so the 'startswith("{")' filter at lines 193 and 230 discards every event, and TestRequestLoggerJson.test_logging / test_openai_chat_logging fail with 'AssertionError: False is not true : request.received event not found in stdout' on nightly-amd-1-gpu-unit (confirmed in run 25534700106, log lines 3689 / 3712 / 3722; the actual stdout did contain the events, just under the new prefix). Strip the prefix with the same _LOG_PREFIX_RE pattern introduced in #24521's test_log_utils.py.

Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com>

Same root cause as the previous two commits: after #24521 (b859f7f), _create_logger_with_handler emits '[YYYY-MM-DD HH:MM:SS] {json...}', so the 'startswith("{")' check at line 67 never matches and _find_log_events yields nothing. TestSchedulerStatusLogger.test_scheduler_status_dump fails with 'AssertionError: 0 not greater than 0 : scheduler.status event not found' on nightly-amd-1-gpu-unit (confirmed in run 25534700106, log lines 4136 / 4153 / 4165, with events=[] dumped right above). Strip the prefix with the same _LOG_PREFIX_RE pattern.

Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com>
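
The prefix-stripping fix described in these commit messages can be sketched as follows. The regex below is an assumed stand-in for `_LOG_PREFIX_RE`, written only from the '[YYYY-MM-DD HH:MM:SS] {json...}' format quoted above; the real pattern is not shown in this thread. The `find_log_events` helper and the `"event"` key are likewise hypothetical.

```python
import json
import re

# Assumed pattern for the "[YYYY-MM-DD HH:MM:SS] " prefix; not the project's own regex.
LOG_PREFIX_RE = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\]\s*")


def find_log_events(stdout: str, event_name: str) -> list[dict]:
    """Collect JSON log records whose (assumed) "event" field equals event_name."""
    events = []
    for line in stdout.splitlines():
        # Drop the timestamp prefix if present; lines without it are unchanged.
        payload = LOG_PREFIX_RE.sub("", line.strip())
        if not payload.startswith("{"):
            continue
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # Not a JSON log line after all.
        if record.get("event") == event_name:
            events.append(record)
    return events


# Made-up stdout mixing prefixed JSON, bare JSON, and plain text.
SAMPLE_STDOUT = "\n".join(
    [
        '[2026-05-06 12:00:00] {"event": "request.received", "rid": "a"}',
        "plain text line",
        '{"event": "request.received", "rid": "b"}',
        '[2026-05-06 12:00:01] {"event": "scheduler.status"}',
    ]
)

if __name__ == "__main__":
    print(find_log_events(SAMPLE_STDOUT, "request.received"))
```

Stripping the prefix before the `startswith("{")` check keeps such a parser working for both the old bare-JSON output and the new prefixed output.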


merrymercy requested review from Ying1123, fzyzcjy, hnyls2002, sufeng-buaa and xiezhq-hermann as code owners May 6, 2026 11:06

github-actions Bot added the run-ci label May 6, 2026

gemini-code-assist Bot reviewed May 6, 2026


Comment thread python/sglang/srt/observability/metrics_collector.py

merrymercy added 3 commits May 6, 2026 04:23

Fix black formatting in req_time_stats.py

f5ea8e8

Fix stdout logger in log_utils to use explicit StreamHandler

c039c6c

Use standard log format in log_utils and update test to parse prefix

f312b8e

Enable metrics and request time stats logging in disagg different TP …

d742621

…tests

merrymercy added the high priority label May 6, 2026

merrymercy merged commit b859f7f into main May 6, 2026
164 of 212 checks passed

merrymercy deleted the sync-metrics-observability-improvements branch May 6, 2026 18:27

bingxche mentioned this pull request May 8, 2026

[AMD] fix ci: strip [asctime] prefix in JSON-log parsing tests broken by #24521 #24679

Open

4 tasks

LLThomas pushed a commit to LLThomas/sglang that referenced this pull request May 8, 2026

Improve metrics, observability, and PD deploy tooling (sgl-project#24521)

2e09405

merrymercy merged 5 commits into main from sync-metrics-observability-improvements
