Improve metrics, observability, and PD deploy tooling #24521

Merged: merrymercy merged 5 commits into main from sync-metrics-observability-improvements on May 6, 2026

@merrymercy
Contributor

Summary

  • Add finer-grained queueing time histogram buckets (sub-100ms granularity)
  • Add uncached prompt tokens histogram metric
  • Move positional_embed_overrides field after custom_logit_processor for field ordering consistency
  • Include cached_input_len in ReqTimeStats log prefix
  • Fix decode_throughput calculation to use (completion_tokens - 1) for accurate per-token throughput
  • Remove unused prefill_run_batch timing fields and methods
  • Guard spec tracing calls behind tracing_enable flag to avoid overhead when tracing is disabled
  • Format entry_time as wallclock timestamp in duration strings
  • Simplify stdout log target to use default logger
  • Add video_data to request logger skip lists
  • Reset fwd_occupancy in device timer window reset
  • Rename total_retractions to num_retractions in meta_info
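
The decode_throughput item can be illustrated with a minimal sketch. It assumes the first completion token is produced by the prefill step, so only completion_tokens - 1 tokens are generated during the decode interval; that rationale, the function name, and the zero-guard are assumptions for illustration, not code from this PR.

```python
def decode_throughput(completion_tokens: int, decode_seconds: float) -> float:
    """Tokens per second over the decode phase (hypothetical helper).

    Assumes the first completion token comes from prefill, so the decode
    interval covers only (completion_tokens - 1) tokens.
    """
    decode_tokens = completion_tokens - 1
    if decode_tokens <= 0 or decode_seconds <= 0:
        # Nothing was decoded, or no time elapsed: report zero instead of dividing.
        return 0.0
    return decode_tokens / decode_seconds


if __name__ == "__main__":
    # 11 completion tokens over 2 seconds of decode: 10 decoded tokens.
    print(decode_throughput(11, 2.0))
```

Dividing by completion_tokens instead would overstate throughput most for short completions, where the single prefill-produced token is a large share of the total.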

Original commits

  • b6e4fd639

Test plan

  • Verify metrics endpoint includes new uncached_prompt_tokens_histogram
  • Verify ReqTimeStats log output format
  • Run existing unit tests
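
The first test-plan item could be checked along these lines. This is a sketch that assumes the metrics endpoint returns the Prometheus text exposition format, in which a histogram is exported as `_bucket`, `_sum`, and `_count` series. The helper name is hypothetical, and the match is by suffix because the exported name prefix is not stated in this PR.

```python
def histogram_series_present(metrics_text: str, name: str) -> bool:
    """Return True if _bucket, _sum and _count series exist for `name`.

    Hypothetical helper: matches by suffix so that an unknown exporter
    prefix in front of `name` does not matter.
    """
    suffixes = ("_bucket", "_sum", "_count")
    found = set()
    for line in metrics_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The sample name ends at the first '{' (labels) or space (value).
        sample_name = line.split("{", 1)[0].split(" ", 1)[0]
        for suffix in suffixes:
            if sample_name.endswith(name + suffix):
                found.add(suffix)
    return len(found) == len(suffixes)
```

In use, the text passed in would be the body fetched from the server's metrics endpoint, and `name` would be `uncached_prompt_tokens_histogram`.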

- Add finer-grained queueing time histogram buckets (sub-100ms)
- Add uncached prompt tokens histogram metric
- Move positional_embed_overrides field after custom_logit_processor
- Include cached_input_len in ReqTimeStats log prefix
- Fix decode_throughput to use (completion_tokens - 1)
- Remove unused prefill_run_batch timing fields and methods
- Guard spec tracing calls behind tracing_enable flag
- Format entry_time as wallclock in duration strings
- Simplify stdout log target to use default logger
- Add video_data to request logger skip lists
- Reset fwd_occupancy in device timer window reset
- Rename total_retractions to num_retractions in meta_info
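
Why sub-100ms buckets matter for queueing time can be seen in a small standard-library sketch. The bucket bounds below are hypothetical (the values chosen in this PR are not listed here), and the helper only mimics the usual "value falls in the first bucket whose upper bound is >= value" rule.

```python
from bisect import bisect_left

# Hypothetical upper bounds in seconds; not the values used in the PR.
COARSE_BUCKETS = [0.1, 0.5, 1.0, 5.0]
FINE_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.5, 1.0, 5.0]


def bucket_counts(samples, bounds):
    """Per-bucket (non-cumulative) counts, with a trailing +Inf bucket."""
    counts = [0] * (len(bounds) + 1)
    for value in samples:
        # bisect_left returns the index of the first bound >= value.
        counts[bisect_left(bounds, value)] += 1
    return counts


if __name__ == "__main__":
    queue_times = [0.003, 0.02, 0.06, 0.09, 2.0]
    print(bucket_counts(queue_times, COARSE_BUCKETS))
    print(bucket_counts(queue_times, FINE_BUCKETS))
```

With the coarse bounds, every sample under 100 ms lands in the same bucket, so a shift from 3 ms to 90 ms of queueing would be invisible; the finer bounds separate them.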
@merrymercy
Contributor Author

/tag-and-rerun-ci

@github-actions github-actions Bot added the run-ci label May 6, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors request time statistics, observability metrics, and logging utilities, including the addition of an uncached prompt tokens histogram and the inclusion of video data in request logging. A critical issue was identified in the metrics collector where the use of an undefined variable, cached_tokens, would result in a runtime NameError.
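
The kind of failure the review describes can be reproduced in isolation. The sketch below is not the code from metrics_collector.py; both functions are hypothetical, and it assumes that "uncached prompt tokens" means prompt tokens minus cached tokens. It shows that Python accepts a reference to an undefined name at definition time and raises NameError only when the line runs.

```python
def observe_uncached(prompt_tokens: int, cached_tokens: int) -> int:
    """Hypothetical fix shape: take the cached count as an explicit argument."""
    return max(prompt_tokens - cached_tokens, 0)


def buggy_observe(prompt_tokens: int) -> int:
    """Hypothetical bug shape: refers to a name that is never defined."""
    # This definition is accepted; the NameError only appears when called.
    return prompt_tokens - cached_tokens_not_defined_anywhere


if __name__ == "__main__":
    print(observe_uncached(100, 30))
    try:
        buggy_observe(100)
    except NameError as exc:
        print(f"NameError: {exc}")
```

Because the error surfaces only on the code path that records the metric, a unit test that actually exercises that path is the simplest guard against it.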

Comment thread python/sglang/srt/observability/metrics_collector.py
@merrymercy
Contributor Author

/rerun-stage stage-c-test-8-gpu-h20

@github-actions
Contributor

github-actions Bot commented May 6, 2026

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies). View workflow run

@merrymercy
Contributor Author

/rerun-stage stage-c-test-8-gpu-h20

@github-actions
Contributor

github-actions Bot commented May 6, 2026

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies). View workflow run

@merrymercy merrymercy merged commit b859f7f into main May 6, 2026
164 of 212 checks passed
@merrymercy merrymercy deleted the sync-metrics-observability-improvements branch May 6, 2026 18:27
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 7, 2026
* main: (894 commits)
  [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (sgl-project#22715)
  [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (sgl-project#24268)
  propagate pytest exit code from test __main__ entries (sgl-project#24487)
  [R3] Avoid implicit CUDA sync in routed experts DP slicing (sgl-project#24550)
  Add ChatCompletionRequest-style support to /v1/tokenize (sgl-project#23981)
  Support Triton MLA FP8 KV cache (sgl-project#20479)
  [diffusion] chore: align LTX-2 with official (sgl-project#24313)
  Expand support matrix for pypi wheel release (sgl-project#24565)
  [codex] Optimize Z-Image packed QKV (sgl-project#24117)
  [Misc] Fix breaking weight checker test (sgl-project#24553)
  [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (sgl-project#24420)
  ci: bump test_mimo_models.py est_time 330 → 610 (sgl-project#24551)
  [CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (sgl-project#24279)
  Improve metrics, observability, and PD deploy tooling (sgl-project#24521)
  Fix diffusion fallback guards and validation (sgl-project#23335)
  [PD] Prevent update_status to Failed from cleared entries (sgl-project#24539)
  [CP] Register KV cache allgather buffer with symmetric memory (sgl-project#24040)
  Support getting checksums in weight checker (sgl-project#24537)
  Refactor buffer patterns in weight checker (sgl-project#24538)
  Add unit and end-to-end tests for weight checker (sgl-project#24536)
  ...

# Conflicts:
#	python/sglang/srt/managers/scheduler.py
#	python/sglang/srt/model_executor/model_runner.py
cursor Bot pushed a commit that referenced this pull request May 8, 2026
Same root cause as the previous commit: after #24521 (b859f7f)
_create_logger_with_handler emits '[YYYY-MM-DD HH:MM:SS] {json...}',
so the 'startswith("{")' filter at lines 193 and 230 discards every
event and TestRequestLoggerJson.test_logging /
test_openai_chat_logging fail with
'AssertionError: False is not true : request.received event not found
in stdout' on nightly-amd-1-gpu-unit (confirmed in run 25534700106
log lines 3689 / 3712 / 3722; the actual stdout did contain the
events, just under the new prefix).

Strip the prefix with the same _LOG_PREFIX_RE pattern introduced in
#24521's test_log_utils.py.

Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com>
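
The fix described in this commit message can be sketched as follows. The regular expression is an assumption reconstructed from the '[YYYY-MM-DD HH:MM:SS] ' prefix quoted above; the actual _LOG_PREFIX_RE in test_log_utils.py is not shown here, and parse_json_events is a hypothetical helper name.

```python
import json
import re

# Assumed pattern for the '[YYYY-MM-DD HH:MM:SS] ' prefix; the real
# _LOG_PREFIX_RE may differ.
_LOG_PREFIX_RE = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\] ")


def parse_json_events(stdout: str) -> list:
    """Collect JSON log events from stdout, with or without the prefix."""
    events = []
    for line in stdout.splitlines():
        # Strip the timestamp prefix first, so the '{' check still works.
        payload = _LOG_PREFIX_RE.sub("", line.strip())
        if not payload.startswith("{"):
            continue
        try:
            events.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # looks like JSON but is not; ignore it
    return events
```

Stripping the prefix before the `startswith("{")` check keeps the filter working for both the old bare-JSON output and the new prefixed output.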
cursor Bot pushed a commit that referenced this pull request May 8, 2026
Same root cause as the previous two commits: after #24521 (b859f7f)
_create_logger_with_handler emits '[YYYY-MM-DD HH:MM:SS] {json...}',
so the 'startswith("{")' check at line 67 never matches and
_find_log_events yields nothing. TestSchedulerStatusLogger.
test_scheduler_status_dump fails with
'AssertionError: 0 not greater than 0 : scheduler.status event not
found' on nightly-amd-1-gpu-unit (confirmed in run 25534700106 log
lines 4136 / 4153 / 4165, with events=[] dumped right above).

Strip the prefix with the same _LOG_PREFIX_RE pattern.

Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com>
LLThomas pushed a commit to LLThomas/sglang that referenced this pull request May 8, 2026