Separate allocation logic from scheduler #11313
Conversation
    backup_state: bool = False,
):
    allocator = tree_cache.token_to_kv_pool_allocator
    evict_from_tree_cache(tree_cache, num_tokens)
Why evict proactively here?
this is actually evict_if_needed
if self.token_to_kv_pool_allocator.available_size() < num_tokens:
if self.tree_cache is not None:
self.tree_cache.evict(num_tokens)
Maybe we can rename it a bit; it actually checks for availability and evicts only if needed.
In what case would we want to evict nodes regardless of availability?
I also feel the eviction policy is non-trivial, so evict_from_tree_cache would encapsulate that complex logic away from the upper-level code.
The current self.tree_cache.evict does evict nodes to meet the requirement regardless of availability, and I think we can keep the eviction policy under the hood, like what we have now.
)

# Allocate memory
if self.token_to_kv_pool_allocator.page_size == 1:
Hi, I find that batch.token_to_kv_pool_allocator.page_size does not always equal batch.tree_cache.page_size; the paged config then takes the wrong allocation path, which breaks a lot of cases.
Could you let me know when the page sizes are different? We can add tests to capture this in the future.
#11313 (comment) suggests the page sizes should be the same and this is just a bug.
But the question is why we need to access the same page size from different places; this echoes the suggestion in #11645 (comment).
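One way to address both points above is a hypothetical accessor that reads page_size from a single place and verifies the allocator and tree cache agree, so a mismatch fails loudly instead of silently taking the wrong allocation branch. The function name get_page_size is an illustration, not code from this PR.

```python
from types import SimpleNamespace


def get_page_size(batch):
    # Single source of truth: read page_size from the allocator, then
    # assert the tree cache agrees so any divergence surfaces immediately.
    page_size = batch.token_to_kv_pool_allocator.page_size
    assert batch.tree_cache.page_size == page_size, (
        f"page_size mismatch: allocator={page_size}, "
        f"tree_cache={batch.tree_cache.page_size}"
    )
    return page_size
```

Callers would branch on get_page_size(batch) == 1 instead of reaching into two different objects for what should be one value.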
def extend(reqs, model_runner):
    # Create dummy tree_cache for benchmarks (no prefix caching, just allocation)
    dummy_tree_cache = SimpleNamespace(
        page_size=1,
Why hard-code this to 1 instead of page_size=model_runner.server_args.page_size?
Thanks for pointing that out. Will fix this.
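The fix discussed above could be as simple as the sketch below: build the benchmark placeholder from server_args instead of a literal 1. The helper name make_dummy_tree_cache is hypothetical; the PR constructs the SimpleNamespace inline.

```python
from types import SimpleNamespace


def make_dummy_tree_cache(model_runner):
    # Benchmark placeholder: no prefix caching, only the fields the
    # allocation path reads. page_size is taken from server_args
    # rather than hard-coded to 1, as suggested in the review.
    return SimpleNamespace(page_size=model_runner.server_args.page_size)
```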
Motivation
Preparation for mem_cache V2.
The allocation logic is moved to mem_cache/ from schedule_batch.py. Ideally, scheduling code should only interact with tree_cache in V1 and memory_manager in V2.

Modifications

- mem_cache/common.py holds the allocation functions operating on the allocator and tree cache: alloc_for_extend and alloc_for_decode
- In prepare_for_decode, the increment of seqlen is moved after alloc_for_decode for clarity
- In prepare_for_extend, some allocation-needed fields are set before alloc_for_extend
- In bench_one_batch.py, create a dummy tree_cache as the placeholder

Accuracy Tests
Benchmarking and Profiling
Checklist