Skip to content

Revert "fix(trtllm_mha): clamp page_table to k_cache page range"#7

Open
pyc96 wants to merge 1 commit into
pyc/sota-gemma4-mtp-patch-e-rootcausefrom
pyc/sota-gemma4-mtp-revert-clamp
Open

Revert "fix(trtllm_mha): clamp page_table to k_cache page range"#7
pyc96 wants to merge 1 commit into
pyc/sota-gemma4-mtp-patch-e-rootcausefrom
pyc/sota-gemma4-mtp-revert-clamp

Conversation

@pyc96
Copy link
Copy Markdown
Owner

@pyc96 pyc96 commented May 23, 2026

Summary

Revert the defensive clamp from #5. It is no longer
needed now that #6 (Patch E) eliminates the source of OOB
page_table values entirely.

Stacked on #6.

Why revert

The clamp was a safety net that traded correctness for liveness:
when it actually triggered, it replaced an OOB page index with the
LAST valid SWA page, producing slightly wrong attention values for
that position and lowering MTP draft acceptance (1.67 instead of
2.84 on 26B summarization).

Patch E (#6) fixes the actual root cause — the
FROZEN_KV_MTP draft backend not swapping its SWA-aware attributes
when token_to_kv_pool is swapped to the target's SWA pool — so
OOB values never occur and the clamp never fires. It's now dead
code that adds one .clamp() per kernel call for no benefit.

Verification (Gemma-4-E4B-IT, trtllm_mha, MTP, summarization 8 k/1 k × 80, 1× B200)

metric Patch E (with clamp) After revert (Patch E only)
outcome OK OK
trap events (PR #3) 0 0
output tok/s 4022 4049 (within noise)
accept length 2.13 2.13 (identical)
TPOT median 9.99 ms 9.94 ms
benchmark duration 19.89 s 19.76 s

Same results. The clamp truly was dead code.

Safety net

If a future code change ever reintroduces an OOB page_table value,
the opt-in bounds-check trap from #3
(SGLANG_TRTLLM_MHA_DEBUG=1) will still catch it with a
deterministic Python exception + dump for triage.


CI States

Latest PR Test (Base): ❌ Missing run-ci label -- add it to run CI tests.
Latest PR Test (Extra): ❌ Blocked -- run-ci is required first.

This reverts commit 5547e41.

PR #5 (the clamp) is no longer needed because PR
#6 (Patch E) eliminates the source of OOB page_table
values entirely.  The clamp's only side-effect was a known quality
limitation -- when the clamp actually triggered, it replaced an OOB
page index with the LAST valid SWA page, producing slightly wrong
attention values for that position and lowering MTP draft acceptance.
With Patch E in place those OOB values never occur and the clamp
never fires, so it's dead code that adds one .clamp() per kernel call
for no benefit.

Verified after this revert (Gemma-4-E4B-IT + trtllm_mha + MTP +
summarization 8 k/1 k x 80 on 1x B200):

  outcome:        OK (zero trap events from PR #3 debug)
  accept length:  matches the pre-revert PR #6 run
  TPOT:           matches the pre-revert PR #6 run

If a future code change reintroduces an OOB page_table value, the
opt-in bounds-check trap from PR #3
(SGLANG_TRTLLM_MHA_DEBUG=1) will still catch it with a deterministic
Python exception + dump for triage.

Co-authored-by: Claude
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant