Conversation
Add two always-on KV-canary observability add-ons, both pure observers of `CanaryDeviceState` counters and the sweep orchestrator (nothing depends on their output):

- Periodic stats logging (`python/sglang/srt/kv_canary/runner/stats_logger.py`): a `PeriodicCanaryStatsLogger` that prints, every N forward steps, the number of protected tokens, sweep passes, cumulative violations and the count of active launch tags. It is driven once per outer step via a delayed D2H handler so the read never blocks the forward. Gated by `CanaryConfig.stats_print_every_n_steps` (env `SGLANG_KV_CANARY_STATS_PRINT_EVERY_N_STEPS`, 0 disables).
- Kernel-run-counter health check (`python/sglang/srt/kv_canary/runner/health_checker.py`): a `KernelRunCounterHealthChecker` that watches the per-tag kernel run counters and warns when a canary kernel stops advancing (e.g. a tag silently goes un-launched), catching wiring regressions that would otherwise pass silently. Unit test in `test/registered/kv_canary/test_self_unit_runner_health.py`.

Both are constructed alongside the other per-step runners in `CanaryManager` and ticked once per outer step.
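To make the two observers described above concrete, here is a minimal sketch of the tick-driven pattern. It is not the PR's code: the class names `SketchStatsLogger` and `SketchRunCounterHealthChecker`, the `read_stats` callable, and the `stall_ticks` threshold are all hypothetical, and a plain Python callable stands in for the delayed D2H read. Only the environment variable name and the "0 disables" rule come from the description.

```python
import logging
import os
from typing import Callable, Dict, List

logger = logging.getLogger("kv_canary_sketch")

# Env name taken from the PR description; everything else below is illustrative.
ENV_NAME = "SGLANG_KV_CANARY_STATS_PRINT_EVERY_N_STEPS"


def read_print_interval(default: int = 0) -> int:
    """Parse the logging interval from the environment; 0 disables logging."""
    return int(os.environ.get(ENV_NAME, default))


class SketchStatsLogger:
    """Hypothetical stand-in: emits one stats line every N ticks."""

    def __init__(self, every_n_steps: int, read_stats: Callable[[], Dict[str, int]]):
        self.every_n_steps = every_n_steps
        # Assumption: a plain callable replaces the delayed D2H handler.
        self.read_stats = read_stats
        self.step = 0
        self.lines: List[str] = []  # kept so callers and tests can inspect output

    def tick(self) -> None:
        if self.every_n_steps <= 0:
            return  # disabled
        self.step += 1
        if self.step % self.every_n_steps == 0:
            stats = self.read_stats()
            fields = " ".join(f"{k}={v}" for k, v in sorted(stats.items()))
            line = f"step={self.step} {fields}"
            self.lines.append(line)
            logger.info(line)


class SketchRunCounterHealthChecker:
    """Hypothetical stand-in: flags tags whose run counter stopped advancing."""

    def __init__(self, stall_ticks: int = 3):
        self.stall_ticks = stall_ticks  # assumed threshold, not from the PR
        self.last: Dict[str, int] = {}
        self.stalled_for: Dict[str, int] = {}

    def tick(self, counters: Dict[str, int]) -> List[str]:
        """Return the tags that reached the stall threshold on this tick."""
        warned: List[str] = []
        for tag, value in counters.items():
            if tag in self.last and value == self.last[tag]:
                self.stalled_for[tag] = self.stalled_for.get(tag, 0) + 1
            else:
                self.stalled_for[tag] = 0
            self.last[tag] = value
            # Warn once, on the tick where the threshold is first reached.
            if self.stalled_for[tag] == self.stall_ticks:
                logger.warning("kernel run counter for tag %r stopped advancing", tag)
                warned.append(tag)
        return warned
```

In this sketch both objects only read the values handed to them and return or log a result, which mirrors the "pure observer" property the description claims: nothing else would need to consume their output.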
/tag-and-rerun-ci
✅
| | SHA | Tree |
|---|---|---|
| Rebased source (tag verify/rebased/20260531T015946Z) | 34bc9ecc86ed | c643a57b595e822a34161d550bb4043d0c06addb |
| PR head (tom/pr_chain/tom/kv_canary_revert_reversed/add-periodic-kv-canary-stats-logging-and-kernel-run-counter-health-check) | 24a51e3faf13 | c643a57b595e822a34161d550bb4043d0c06addb |
| upstream/main | 7dd19ae3d8ba | 35c87847a6bdf8cec6fa63a2eeea9740061a68d6 |
Reproduce locally (the rebase tag persists after this run):
git fetch upstream 24a51e3faf13dd35365e39542677cb4496c93dcd
REB_TREE=$(git rev-parse 'verify/rebased/20260531T015946Z^{tree}')
PR_TREE=$(git rev-parse '24a51e3faf13dd35365e39542677cb4496c93dcd^{tree}')
MAIN_TREE=$(git rev-parse 'upstream/main^{tree}')
echo "REB_TREE = $REB_TREE"
echo "PR_TREE = $PR_TREE"
echo "MAIN_TREE = $MAIN_TREE"

Generated by single_commit_pr_chain.py verify-rebased.
CI States
Latest PR Test (Base): 🚫 Run #26700542488
Latest PR Test (Extra): ❌ Run #26700542417