[Bugfix] Fix CPU backend crash in KV cache block zeroing #37550
bigPYJ1151 merged 3 commits into vllm-project:main
Conversation
Override _zero_block_ids in CPUModelRunner with a pure PyTorch implementation to avoid calling the Triton kernel that fails when Triton has no active GPU driver. Closes vllm-project#37546 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>
Code Review
The changes effectively address the reported bug by providing a CPU-specific implementation for zeroing KV cache blocks. This prevents the TypeError that occurred when GPU-specific kernels were invoked on CPU-only environments. The solution is straightforward and uses standard PyTorch operations, ensuring compatibility and correctness for the CPU backend.
vllm/v1/worker/cpu_model_runner.py
Outdated
```python
if not block_ids:
    return
for kv_cache in self.kv_caches:
    # CPU attention backend shape: (2, num_blocks, heads, block_sz, head_sz)
    # block_dim = 1
    kv_cache[:, block_ids].zero_()
```
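The original proposed override zeroes the selected blocks along dim 1 in place. A minimal pure-Python analogue of that indexing (illustration only; the real code operates on a torch tensor via `kv_cache[:, block_ids].zero_()`, and the nested-list cache here is a stand-in):

```python
# Pure-Python sketch of zeroing KV cache blocks along dim 1.
# kv_cache is modeled as (2, num_blocks, block_size) nested lists,
# standing in for the torch tensor used by the CPU backend.

def zero_block_ids(kv_cache, block_ids):
    """Zero the entries of the given block ids (dim 1) in place."""
    if not block_ids:
        return
    for plane in kv_cache:       # key/value planes (dim 0, size 2)
        for b in block_ids:      # selected blocks (dim 1)
            plane[b] = [0.0] * len(plane[b])

cache = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]] for _ in range(2)]
zero_block_ids(cache, [1])
print(cache[0][1])  # block 1 zeroed in both planes; other blocks untouched
```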
For the CPU attention backend the zeroing is not required. Unlike FlashAttention, the CPU attention kernel assigns -INF to the logits of invalid positions, so invalid KV cache entries do not affect the computation. Therefore `_zero_block_ids` can simply be a no-op (`pass`).
vllm/csrc/cpu/cpu_attn_impl.hpp
Lines 1129 to 1134 in 35141a7
@bigPYJ1151 you're right. Updated `_zero_block_ids` to be a no-op, since the CPU attention backend already handles invalid positions by assigning -inf to their logits.
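The final fix agreed on above amounts to a method override. A minimal sketch (the stub class names here are hypothetical; the real classes live under `vllm/v1/worker/` and the GPU path dispatches to the Triton kernel `_zero_kv_blocks_kernel`):

```python
# Sketch: CPUModelRunner overrides the GPU runner's _zero_block_ids
# with a no-op, because the CPU attention kernel masks invalid
# positions with -inf, so stale KV blocks never affect outputs.

class GPUModelRunnerStub:
    def _zero_block_ids(self, block_ids):
        # In the real GPU runner this launches a Triton kernel,
        # which raises on hosts without an active GPU driver.
        raise RuntimeError("Triton kernel path requires a GPU driver")

class CPUModelRunnerStub(GPUModelRunnerStub):
    def _zero_block_ids(self, block_ids):
        # No-op: invalid KV cache positions get -inf logits on CPU.
        pass

runner = CPUModelRunnerStub()
runner._zero_block_ids([0, 1, 2])  # safe on CPU-only hosts, returns None
```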
…tions CPU attention backend assigns -INF to logits at invalid KV cache positions, so zeroing is unnecessary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>
…t#37550) Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>
Override `_zero_block_ids` in `CPUModelRunner` with a pure PyTorch implementation to avoid calling the Triton GPU kernel (`_zero_kv_blocks_kernel`), which crashes on CPU nodes without an active GPU driver. `CPUModelRunner` lacked a CPU-safe fallback, which caused a `TypeError: 'function' object is not subscriptable` on the first inference request for all models using the CPU backend. Closes #37546
Test plan
Use pure PyTorch (`tensor.zero_()`) to replace the Triton kernel path only for the CPU backend.