fix: hicache with eagle didn't manage the draft model's kv cache.#17338

Closed
cicirori wants to merge 5 commits into main from fix_hicache_with_spec

@cicirori
Collaborator

@cicirori cicirori commented Jan 19, 2026

Motivation

When both speculative decoding and HiCache are enabled, the draft model's KV cache is not managed by the HiRadix cache.

If the draft model's original KV cache has been evicted but the target model hits HiCache, the draft model ends up drafting from effectively random KV cache contents.

Modifications

Add dedicated L2 HiCache support for the draft model KV cache.
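
To illustrate the idea, here is a minimal toy sketch, not the code in this PR. It assumes a device tier (L1) and a host tier (L2) modeled as plain dictionaries, and every name in it (`TwoTierKV`, `PairedHiCache`, `manage_draft`) is hypothetical. It only shows why backing up the target KV alone leaves the draft model with nothing valid to restore on a HiCache hit.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierKV:
    """Toy KV pool: a device tier (L1) backed by a host tier (L2)."""
    device: dict = field(default_factory=dict)
    host: dict = field(default_factory=dict)

    def write(self, key, value):
        self.device[key] = value

    def backup(self, key):
        # Copy the entry from device to host so it survives eviction.
        self.host[key] = self.device[key]

    def evict(self, key):
        self.device.pop(key, None)

    def load_back(self, key):
        # Restore from host if a backup exists; report whether it worked.
        if key in self.host:
            self.device[key] = self.host[key]
            return True
        return False


class PairedHiCache:
    """Hypothetical wrapper holding target and draft KV for the same prefix."""

    def __init__(self, manage_draft: bool):
        self.target = TwoTierKV()
        self.draft = TwoTierKV()
        self.manage_draft = manage_draft  # False models the behavior before the fix

    def insert(self, prefix, target_kv, draft_kv):
        self.target.write(prefix, target_kv)
        self.draft.write(prefix, draft_kv)

    def evict_to_host(self, prefix):
        self.target.backup(prefix)
        if self.manage_draft:
            self.draft.backup(prefix)
        # Device memory is freed for both models either way.
        self.target.evict(prefix)
        self.draft.evict(prefix)

    def hit(self, prefix):
        """Return (target_ok, draft_ok) after trying to restore the prefix."""
        target_ok = prefix in self.target.device or self.target.load_back(prefix)
        draft_ok = prefix in self.draft.device or (
            self.manage_draft and self.draft.load_back(prefix)
        )
        return target_ok, draft_ok
```

With `manage_draft=False`, a hit after eviction restores the target KV but not the draft KV. In this toy model that shows up as `draft_ok` being `False`; in a real system the draft model would instead read whatever occupies its reused slots.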

Accuracy Tests

  • Before the fix
    When HiCache was triggered, the acceptance length dropped for the same request.

  • After the fix
    The acceptance length stays almost the same.
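
For readers unfamiliar with the metric, the sketch below assumes that "acceptance length" means the mean number of tokens committed per verification step in speculative decoding. That definition and the function name are assumptions for illustration; they are not taken from this PR.

```python
def mean_acceptance_length(accepted_per_step):
    """Mean number of tokens accepted per verify step.

    `accepted_per_step` is a sequence of per-step accepted-token counts
    (a hypothetical input format chosen for this sketch).
    """
    if not accepted_per_step:
        raise ValueError("need at least one verification step")
    return sum(accepted_per_step) / len(accepted_per_step)
```

Under this definition, a draft model reading invalid KV cache would propose tokens the target model rejects more often, which lowers the mean.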

See #16964.

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@cicirori
Collaborator Author

/tag-run-ci-label

@cicirori cicirori changed the title from "fix: eagle with hicache didn't manage the draft model's kv cache." to "fix: hicache with eagle didn't manage the draft model's kv cache." Jan 19, 2026
@cicirori
Collaborator Author

/tag-and-rerun-ci

@xiezhq-hermann
Collaborator

@ispobock should the draft model just re-compute when the cache is already evicted for the draft model?

@hnyls2002
Collaborator

/rerun-stage stage-b-test-large-1-gpu

@cicirori
Collaborator Author

/rerun-stage stage-b-test-large-1-gpu

@github-actions
Contributor

✅ Triggered stage-b-test-large-1-gpu to run independently (skipping dependencies).

@github-actions
Contributor

🔗 View workflow run

Labels

bug (Something isn't working), hicache (Hierarchical Caching for SGLang), high priority, run-ci, speculative-decoding

5 participants