[Bugfix] Fix TurboQuant KV cache index-out-of-bounds in Triton decode kernel #40074
devarakondasrikanth wants to merge 6 commits into vllm-project:main from
… kernel
Clamp masked-out SIMD lanes to page_idx=0 before block table pointer arithmetic. Triton's bounds checker fires on the address even when the output is masked, causing an index error on long (e.g. 32k) sequences.
Signed-off-by: devarakondasrikanth <devarakondasrikanth@ymail.com>
Code Review
This pull request modifies the _tq_decode_stage1 function in vllm/v1/attention/ops/triton_turboquant_decode.py to prevent out-of-bounds pointer arithmetic in Triton. It introduces a safe_page_idx that clamps masked-out lanes to index 0 before loading from the block table, ensuring the bounds checker does not trigger on masked lanes. I have no feedback to provide.
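The clamping pattern described in this review can be illustrated with a small pure-Python emulation of one vector of lanes. This is only a sketch and not the kernel code: `BLOCK_SIZE`, `BLOCK_KV`, the block table contents, and the helper `lane_page_indices` are hypothetical, and only the names `page_idx` and `safe_page_idx` come from the PR.

```python
# Pure-Python emulation of one vector of SIMD lanes; no Triton or GPU needed.
# BLOCK_SIZE, BLOCK_KV and the block table contents are assumptions.
BLOCK_SIZE = 16  # tokens per KV cache page (assumed)
BLOCK_KV = 8     # lanes processed per iteration (assumed)


def lane_page_indices(start, seq_len, clamp):
    """Return the block-table index each lane would dereference, plus the mask."""
    kv_offs = [start + i for i in range(BLOCK_KV)]
    kv_mask = [off < seq_len for off in kv_offs]
    page_idx = [off // BLOCK_SIZE for off in kv_offs]
    if clamp:
        # Mirrors safe_page_idx: masked-out lanes point at page 0, a valid slot.
        page_idx = [p if m else 0 for p, m in zip(page_idx, kv_mask)]
    return page_idx, kv_mask


seq_len = 30
block_table = [7, 3]  # two pages cover 30 tokens at 16 tokens per page

unsafe, mask = lane_page_indices(start=28, seq_len=seq_len, clamp=False)
safe, _ = lane_page_indices(start=28, seq_len=seq_len, clamp=True)

# Without clamping, masked-out lanes compute an index past the block table.
print(max(unsafe) >= len(block_table))
# With clamping, every lane indexes a valid block table entry.
print(all(p < len(block_table) for p in safe))
```

In this emulation only the first two lanes are active; the remaining six would address a third page that the assumed block table does not contain unless they are clamped.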
@LucasWilkinson and @MatthewBonanni I started working with vLLM and came across this bug, so I implemented a minimal (harmless) fix for the issue. As this is my first contribution, "pre-commit / pre-run-check (pull_request)" is failing after 5s. Can you help run CI here?
@@ -133,8 +133,12 @@ def _tq_decode_stage1(
A better solution is to clamp kv_offs itself at the source by just adding this:
kv_offs = tl.where(kv_mask, kv_offs, split_start)
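As I read this suggestion, clamping the offsets rather than the page index keeps every quantity derived from `kv_offs` in range, not only the block table lookup. The sketch below emulates the `tl.where` line in pure Python; `BLOCK_SIZE`, `BLOCK_KV`, `split_end`, the sample values, and the helper `clamp_offsets` are assumptions made for illustration.

```python
# Pure-Python emulation of clamping kv_offs at the source.
# BLOCK_SIZE, BLOCK_KV, split_end and the sample values are assumptions.
BLOCK_SIZE = 16  # tokens per KV cache page (assumed)
BLOCK_KV = 8     # lanes processed per iteration (assumed)


def clamp_offsets(start, split_start, split_end):
    """Return per-lane offsets with masked-out lanes redirected to split_start."""
    kv_offs = [start + i for i in range(BLOCK_KV)]
    kv_mask = [off < split_end for off in kv_offs]
    # Emulates: kv_offs = tl.where(kv_mask, kv_offs, split_start)
    kv_offs = [off if m else split_start for off, m in zip(kv_offs, kv_mask)]
    return kv_offs, kv_mask


kv_offs, kv_mask = clamp_offsets(start=28, split_start=16, split_end=30)

# Everything derived from kv_offs is now computed from an in-range offset.
page_idx = [off // BLOCK_SIZE for off in kv_offs]
page_off = [off % BLOCK_SIZE for off in kv_offs]

print(kv_offs)
print(page_idx)
```

Masked-out lanes still have to be excluded from the result through `kv_mask`; the clamp only makes the addresses they compute harmless.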
Tried to reproduce the OOB crash on RTX 5090 (sm_120) +
Experiment: 8 concurrent × 31,632 prompt tokens, 256 decode tokens each,
Could not reproduce #39998's
That said, the fix is provably correct: +1 to merge; applies cleanly on top of #39931 as well. (AI-assisted verification run; human submitter reviewed and ran both A/B configurations.)
Purpose
Clamp masked-out SIMD lanes to page_idx=0 before block table pointer arithmetic. Triton's bounds checker fires on the address even when the output is masked, causing an index error on long (e.g. 32k) sequences.
Fixes issue #39998
Test Plan
Test Result