[Fix] prefix cache hit rate == 0 bug with gpt-oss style models #33524
heheda12345 merged 3 commits into vllm-project:main
Conversation
Code Review
This pull request fixes a prefix cache hit rate bug affecting gpt-oss style models when EAGLE speculative decoding is enabled. The core change in vllm/v1/core/kv_cache_coordinator.py identifies these simple hybrid models and bypasses the iterative convergence loop for the cache hit length, which was incorrectly applying the EAGLE block-dropping logic multiple times. This is a targeted and effective fix. The accompanying changes in tests/v1/core/test_prefix_caching.py are substantial, refactoring existing tests for better structure and adding thorough new test cases for the EAGLE-enabled hybrid model scenario. The overall implementation is sound and well tested.
heheda12345 left a comment:
LGTM! Thank you very much.
Purpose
With the same purpose as PR #33270, this PR is another simple workaround for issue #32802.
This PR detects GPT-oss style models, which consist of one full-attention group and one SWA group, and handles them as a special case in which the convergence-check while loop is unnecessary. This addresses the EAGLE spiral block drop bug, and also improves efficiency slightly, since the while loop is not needed for such simple hybrid models anyway.
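The special case described above can be sketched roughly as follows. This is a hypothetical illustration, not the code from this PR: the function names (`is_simple_hybrid`, `longest_cache_hit`) and group-type labels are assumptions modeled loosely on vLLM's v1 KV-cache coordinator.

```python
# Hypothetical sketch of the special-case check: for a "simple hybrid"
# model (exactly one full-attention group and one sliding-window group),
# the agreed cache hit length is just the minimum over groups, so the
# iterative convergence loop -- and the repeated EAGLE last-block-drop
# it caused -- can be skipped entirely.

def is_simple_hybrid(group_types: list[str]) -> bool:
    """True for gpt-oss style models: one full-attn + one SWA group."""
    return sorted(group_types) == ["full_attention", "sliding_window"]

def longest_cache_hit(group_types: list[str], hit_lengths: list[int]) -> int:
    """Return the prefix-cache hit length agreed on by all groups."""
    if is_simple_hybrid(group_types):
        # Single pass: take the shortest hit; no while loop, so the
        # caller applies the EAGLE block-drop adjustment exactly once.
        return min(hit_lengths)
    # General case (omitted): iterate, re-querying each group with the
    # shrunken bound, until every group agrees on the same hit length.
    raise NotImplementedError("general convergence loop not sketched here")
```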
However, it is worth noting that for more complicated models with multiple attention groups, this PR does not fully address the EAGLE spiral block drop issue either. A general fix cannot simply cache the hit_blocks list returned by each attention type, because SWA and Mamba-style attention do not satisfy the downward-closed property (a cache hit at token j does not imply a cache hit at token i for i < j). So we need more fundamental changes there.
Fortunately, we don't have such complex models yet, so this is not a huge issue for now.
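To make the downward-closed property concrete, here is a toy illustration (not vLLM code) of why sliding-window attention violates it: SWA at token j only needs the last W tokens, so a later token can be servable from cache even though earlier tokens' blocks were already evicted.

```python
# Toy model of sliding-window attention cache coverage. A "hit" at token j
# requires only the tokens inside the window [j - W + 1, j] to be cached,
# so a hit at a later token does not imply a hit at an earlier one.

def swa_can_serve(cached_tokens: set[int], j: int, window: int) -> bool:
    """True if SWA at token j can be served from cache (window size W)."""
    return all(t in cached_tokens
               for t in range(max(0, j - window + 1), j + 1))

# Suppose tokens 0-3 were evicted but tokens 4-9 remain cached (W = 4):
cached = set(range(4, 10))
assert swa_can_serve(cached, 9, window=4)       # hit at j = 9
assert not swa_can_serve(cached, 5, window=4)   # miss at i = 5 < 9
```

This is exactly the shape of counterexample that rules out caching per-type hit_blocks lists in a general fix.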
Test Plan
The test case is adapted from PR #33270, with the complicated cases that enable EAGLE for complex models with multiple attention groups removed.
pytest -q tests/v1/core/test_prefix_caching.py
Test Result
Passed.
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.