Fix Grafana dashboard metrics naming to match Prometheus colon format #20821

Open
karanb192 wants to merge 1 commit into sgl-project:main from karanb192:fix/grafana-metrics-naming

@karanb192

Summary

  • Updates all 14 metric references in the Grafana dashboard JSON (examples/monitoring/grafana/dashboards/json/sglang-dashboard.json) from underscore format (sglang_metric_name) to colon namespace format (sglang:metric_name)
  • This aligns the dashboard queries with the Prometheus metrics actually exported since PR #15303 ([Fix] A followup fix for TRTLLM BF16 MoE), which switched the metric names to colon namespacing
  • Without this fix, all Grafana dashboard panels show "No data"
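The rename described above can be sketched as a small script. This is only an illustration, not the method used in this PR: the helper names `to_colon_format` and `rewrite_exprs` are hypothetical, and the sketch assumes the dashboard stores its PromQL queries in string fields named `expr`, as Grafana's Prometheus targets normally do.

```python
import re

# Match the old "sglang_" prefix only where it starts an identifier, so a
# name such as "my_sglang_metric" is left untouched.
_OLD_PREFIX = re.compile(r"(?<![A-Za-z0-9_:])sglang_")


def to_colon_format(expr: str) -> str:
    """Rewrite sglang_<name> to sglang:<name> inside one PromQL expression."""
    return _OLD_PREFIX.sub("sglang:", expr)


def rewrite_exprs(node):
    """Return a copy of a parsed dashboard with every "expr" string rewritten."""
    if isinstance(node, dict):
        return {
            key: to_colon_format(value)
            if key == "expr" and isinstance(value, str)
            else rewrite_exprs(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rewrite_exprs(item) for item in node]
    return node


# Hypothetical panel, shaped like a Grafana Prometheus target.
panel = {"targets": [{"expr": "avg(sglang_cache_hit_rate)", "refId": "A"}]}
updated = rewrite_exprs(panel)
```

The pattern would also rewrite a quoted label value that begins with `sglang_`, so the resulting diff is worth reviewing by hand before committing.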

Affected Metrics (6 metrics, 14 references)

| Old Name (dashboard) | New Name (Prometheus) |
| --- | --- |
| sglang_e2e_request_latency_seconds_* | sglang:e2e_request_latency_seconds_* |
| sglang_time_to_first_token_seconds_* | sglang:time_to_first_token_seconds_* |
| sglang_num_running_reqs | sglang:num_running_reqs |
| sglang_gen_throughput | sglang:gen_throughput |
| sglang_cache_hit_rate | sglang:cache_hit_rate |
| sglang_num_queue_reqs | sglang:num_queue_reqs |

Fixes #20752

Test Plan

  • Deploy SGLang with Prometheus + Grafana monitoring (examples/monitoring/)
  • Import the updated Grafana dashboard JSON
  • Verify all panels display data correctly (no more "No data")
  • Confirm metric names in dashboard match python/sglang/srt/metrics/collector.py definitions
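The last check in the test plan could be partly automated. The sketch below is an assumption-laden illustration: it compares the dashboard against the six metric names listed in this PR rather than parsing collector.py, and the helpers `metric_names`, `stale_metrics`, and `unknown_metrics` are hypothetical.

```python
import re

# Base names taken from the "Affected Metrics" table in this PR.
EXPECTED = (
    "sglang:e2e_request_latency_seconds",
    "sglang:time_to_first_token_seconds",
    "sglang:num_running_reqs",
    "sglang:gen_throughput",
    "sglang:cache_hit_rate",
    "sglang:num_queue_reqs",
)

_METRIC = re.compile(r"sglang[:_][A-Za-z0-9_]+")


def metric_names(node):
    """Collect every sglang metric name used in "expr" strings of a dashboard."""
    names = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                names.update(_METRIC.findall(value))
            else:
                names |= metric_names(value)
    elif isinstance(node, list):
        for item in node:
            names |= metric_names(item)
    return names


def stale_metrics(dashboard):
    """Names still using the old underscore prefix."""
    return sorted(n for n in metric_names(dashboard) if n.startswith("sglang_"))


def unknown_metrics(dashboard):
    """Colon-format names that match none of the expected base names."""
    return sorted(
        n for n in metric_names(dashboard)
        if n.startswith("sglang:") and not n.startswith(EXPECTED)
    )
```

To run it against the real file, load the dashboard with `json.load` and pass the resulting dict to both functions; two empty lists mean every query uses a known colon-format name.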

Since PR sgl-project#15303, Prometheus metrics use colon namespace format
(sglang:metric_name) instead of underscore format (sglang_metric_name).
Update all 14 metric references in the Grafana dashboard JSON to use
the colon format so dashboard panels display data correctly.

Fixes sgl-project#20752

Signed-off-by: karanb192 <karan@example.com>
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@karanb192
Author

Hi maintainers, could you please add the run-ci label to trigger CI? Thank you!

Development

Successfully merging this pull request may close these issues.

[Bug] Grafana dashboard metrics naming out of sync with Prometheus metrics (underscore vs colon format)