Fix eagle radix cache by ispobock · Pull Request #10846 · sgl-project/sglang

ispobock · 2025-09-24T04:26:46Z

Motivation

python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --speculative-algo EAGLE3 \
    --speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B  \
    --speculative-num-steps 2 --speculative-eagle-topk 1 --speculative-num-draft-tokens 3 \
    --dtype float16
    
cd benchmark/mtbench
python3 bench_sglang_eagle.py --parallel 1 --num-questions 10

main:

w/o radix cache:
#questions: 10, Throughput: 269.86 token/s, Acceptance length: 2.39

w/ radix cache:
#questions: 10, Throughput: 255.74 token/s, Acceptance length: 2.26

this PR:

w/o radix cache:
#questions: 10, Throughput: 269.45 token/s, Acceptance length: 2.39

w/ radix cache:
#questions: 10, Throughput: 270.85 token/s, Acceptance length: 2.39
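
For readers comparing the numbers above, here is a small sketch that computes the relative throughput change from enabling the radix cache on each branch. The throughput figures are copied from the results above. The dictionary layout and the `relative_change` helper are illustrative only and are not part of the benchmark script.

```python
# Throughput (token/s) copied from the benchmark results above.
results = {
    "main": {"no_radix": 269.86, "radix": 255.74},
    "this PR": {"no_radix": 269.45, "radix": 270.85},
}


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline` (illustrative helper)."""
    return (value - baseline) / baseline * 100.0


for branch, r in results.items():
    delta = relative_change(r["no_radix"], r["radix"])
    print(f"{branch}: radix cache changes throughput by {delta:+.2f}%")
```

On main, enabling the radix cache costs roughly 5% throughput and lowers the acceptance length from 2.39 to 2.26. With this PR the two configurations are within about 0.5% of each other and the acceptance length matches.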

compatibility test:

page size 16
page size 16 + chunked prefill 64
page size 2 + chunked prefill 64
page size 1 + chunked prefill 64
HiCache (this fix is not adapted for HiCache, but doesn't break current HiCache)
multiple prompts in one batch (chunked prefill corner case, ref: Fix spec filter batch when target extend #10991)


JustinTong0323

Awesome!

ispobock · 2025-09-24T10:16:56Z

The page_size > 1 case still has some issues; I will fix it later.

python/sglang/srt/mem_cache/radix_cache.py

ispobock · 2025-09-27T17:19:10Z

@xiezhq-hermann @merrymercy This PR is ready for review.

xiezhq-hermann · 2025-09-28T05:31:16Z

python/sglang/srt/mem_cache/radix_cache.py

        if value is None:
            value = torch.tensor(key.token_ids, dtype=torch.int64)
+
+        if self.is_eagle:


hiradix (and other trees like swa) override the insert function. Would that be a problem, since the eagle worker shares the same tree?

Yes. In the current design, we need to adapt this change to other trees such as swa and hiradix if they override these functions. This PR only makes the main radix tree ready; HiCache and swa need extra work and testing to make them ready.
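
The override concern can be illustrated with a minimal sketch. The class names below are hypothetical stand-ins, not the real sglang classes, and the bigram conversion inside the `is_eagle` branch is an assumption based on the "bigram keys" mentioned later in this thread. The diff only shows that `insert` gains an `if self.is_eagle:` branch.

```python
from typing import List


class ToyRadixCache:
    """Hypothetical stand-in for the main tree; only `insert` is modeled."""

    def __init__(self, is_eagle: bool = False):
        self.is_eagle = is_eagle
        self.stored: List[list] = []

    def insert(self, key: List[int]) -> list:
        if self.is_eagle:
            # Assumed eagle-specific step: pair each token with its successor.
            key = list(zip(key, key[1:]))
        self.stored.append(key)
        return key


class ToyOverridingCache(ToyRadixCache):
    """Re-implements `insert` from scratch, so the eagle branch is skipped."""

    def insert(self, key: List[int]) -> list:
        self.stored.append(key)
        return key


class ToyAdaptedCache(ToyRadixCache):
    """Adds its own bookkeeping, then delegates to the parent `insert`."""

    def insert(self, key: List[int]) -> list:
        return super().insert(key)


print(ToyRadixCache(is_eagle=True).insert([1, 2, 3]))
print(ToyOverridingCache(is_eagle=True).insert([1, 2, 3]))
print(ToyAdaptedCache(is_eagle=True).insert([1, 2, 3]))
```

In this toy model, a subclass that replaces `insert` wholesale silently loses the eagle handling, while one that delegates to the parent keeps it. That is why trees overriding these functions need their own adaptation.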

xiezhq-hermann · 2025-09-28T05:32:17Z

python/sglang/srt/mem_cache/radix_cache.py

+
        return self._insert_helper(self.root_node, key, value)

    def cache_finished_req(self, req: Req):


While hiradix does not, the swa tree overrides this implementation as well.

The swa cache inherits from BaseRadixCache, so it seems all the changes need to be implemented again for it. HiCache inherits from RadixCache, so we only need to adapt it, with fewer overrides. But for HiCache, my main concern is that the effective chunked prefill size changes slightly: if the chunked prefill size is 64, only 63 bigram keys are actually inserted into the tree. That may be inefficient for block-based cache offloading.
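
The 64 versus 63 count follows from pairing adjacent tokens. Below is a minimal sketch, assuming a bigram key is simply the pair (token i, token i+1). The helper name `to_bigram_keys` is hypothetical and is not claimed to match the function used in the PR.

```python
from typing import List, Tuple


def to_bigram_keys(token_ids: List[int]) -> List[Tuple[int, int]]:
    """Pair each token with its successor (hypothetical helper for illustration)."""
    return [(token_ids[i], token_ids[i + 1]) for i in range(len(token_ids) - 1)]


chunk = list(range(64))          # one chunked-prefill chunk of 64 token ids
keys = to_bigram_keys(chunk)     # n tokens give n - 1 adjacent pairs
print(len(chunk), len(keys))     # 64 63

# With a page size of 16 (one of the sizes in the compatibility test),
# 63 keys fill 3 full pages and leave 15 keys in a partial page.
full_pages, leftover = divmod(len(keys), 16)
print(full_pages, leftover)      # 3 15
```

Because the key count is one less than the chunk size, it no longer lines up with page or block boundaries, which is the efficiency concern raised for block-based offloading.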

My understanding is that what we are doing is primarily resolving the conflict with eagle workers, since they share the same radix tree but have their own pool, rather than adding hicache support for eagle workers (i.e., having eagle workers fetch kv caches from host memory), which seems unnecessary and potentially complicated. Is that correct?

I agree that the kv cache for the eagle worker does not need to be stored in host memory, since it's only one layer. If we use HiCache only for the target model, can we still share the kv indices between the target and draft pools?

Yes, I think it should be fine. I just wanted to confirm that we are aligned on this.

cc: @merrymercy

ispobock added 3 commits September 23, 2025 04:18

fix eagle radix cache

c2e27b4

format

6c0a58e

fix

a44351b

ispobock requested review from Ying1123, hnyls2002, merrymercy and xiezhq-hermann as code owners September 24, 2025 04:26

sglang-bot added the run-ci label Sep 24, 2025

JustinTong0323 approved these changes Sep 24, 2025


ispobock changed the title ~~Fix eagle radix cache~~ [Do not merge] Fix eagle radix cache Sep 24, 2025

ispobock added 5 commits September 25, 2025 03:45

fix page size>1

b188f44

format

0bb18da

fix chunked prefill page size 1

e680778

add ut

d6f0206

add ut

a401aef

xiezhq-hermann self-assigned this Sep 25, 2025

xiezhq-hermann reviewed Sep 26, 2025


fix hicache ci

74ce777

ispobock changed the title ~~[Do not merge] Fix eagle radix cache~~ Fix eagle radix cache Sep 26, 2025

ispobock added 3 commits September 27, 2025 06:45

update test

aaeea64

fix memory leak issue

0da4297

Merge branch 'main' into ke/eagle-radix-cache

7397449

ispobock assigned merrymercy Sep 27, 2025

xiezhq-hermann reviewed Sep 28, 2025


Merge branch 'main' into ke/eagle-radix-cache

520d404

ispobock merged commit 91847e3 into main Sep 30, 2025
124 of 150 checks passed

ispobock deleted the ke/eagle-radix-cache branch September 30, 2025 15:00

This was referenced Oct 4, 2025

EAGLE cache fix for HiCache #11215

Merged

[Feature] Support mamba radix cache v0 #11214

Merged

EAGLE cache fix for SWARadixCache #11231

Merged

ch-tiger1 pushed a commit to ch-tiger1/sglang that referenced this pull request Oct 9, 2025

Fix eagle radix cache (sgl-project#10846)

73f6648

