
Separate allocation logic from scheduler #11313

Merged: cctry merged 22 commits into main from shiyang/mem_v2/alloc on Oct 11, 2025

Conversation

@cctry (Collaborator) commented Oct 8, 2025

Motivation

Preparation for mem_cache V2.
The allocation logic is moved from schedule_batch.py into mem_cache/. Ideally, scheduling code should interact only with tree_cache in V1 and memory_manager in V2.

Modifications

  • create mem_cache/common.py for allocation functions that operate on the allocator and the tree cache
  • refactor allocation into two wrappers, alloc_for_extend and alloc_for_decode
    • in prepare_for_decode, the sequence-length increment is moved after alloc_for_decode for clarity
    • in prepare_for_extend, the fields needed for allocation are set before alloc_for_extend
  • for spec decode, the allocation functions are unchanged; they will be reworked after spec V2
  • in bench_one_batch.py, create a dummy tree_cache as a placeholder
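The decode-side ordering described above can be sketched as follows. The names alloc_for_decode and prepare_for_decode come from this PR, but the bodies, fields, and fake objects here are illustrative assumptions, not the actual implementation:

```python
from types import SimpleNamespace


def alloc_for_decode(tree_cache, num_tokens):
    # Illustrative body: evict on shortfall, then take one KV slot per
    # running request from the allocator owned by the tree cache.
    allocator = tree_cache.token_to_kv_pool_allocator
    if allocator.available_size() < num_tokens:
        tree_cache.evict(num_tokens)
    return allocator.alloc(num_tokens)


def prepare_for_decode(batch, tree_cache):
    # Allocate first; only afterwards bump the sequence lengths, the
    # ordering this PR adopts for clarity.
    batch.out_cache_loc = alloc_for_decode(tree_cache, len(batch.seq_lens))
    batch.seq_lens = [s + 1 for s in batch.seq_lens]


# Fake allocator / tree cache standing in for the real classes:
fake_tree_cache = SimpleNamespace(
    token_to_kv_pool_allocator=SimpleNamespace(
        available_size=lambda: 10, alloc=lambda n: list(range(n))
    ),
    evict=lambda n: None,
)
batch = SimpleNamespace(seq_lens=[3, 5])
prepare_for_decode(batch, fake_tree_cache)
```

The point of the wrapper is that scheduling code never touches the allocator directly; it hands the tree cache a token count and gets back cache locations.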

Accuracy Tests

Benchmarking and Profiling

Checklist

@cctry cctry marked this pull request as ready for review October 8, 2025 01:50
@cctry cctry requested a review from zhyncs as a code owner October 8, 2025 18:40
@cctry cctry self-assigned this Oct 8, 2025
@xiezhq-hermann xiezhq-hermann self-assigned this Oct 9, 2025
@cctry cctry merged commit b36afed into main Oct 11, 2025
193 of 234 checks passed
@cctry cctry deleted the shiyang/mem_v2/alloc branch October 11, 2025 00:38
        backup_state: bool = False,
    ):
        allocator = tree_cache.token_to_kv_pool_allocator
        evict_from_tree_cache(tree_cache, num_tokens)
Collaborator:
Why evict proactively here?

Collaborator Author:

This is effectively evict_if_needed:

    if self.token_to_kv_pool_allocator.available_size() < num_tokens:
        if self.tree_cache is not None:
            self.tree_cache.evict(num_tokens)

Collaborator:

Maybe we can rename it a bit; it actually checks availability and evicts only if needed.

Collaborator Author:

In what case would we want to evict nodes regardless of availability?

I also feel the eviction policy is non-trivial, so evict_from_tree_cache can encapsulate that complexity away from the upper-level code.

Collaborator:

The current self.tree_cache.evict does evict nodes to meet the requested amount regardless of availability, and I think we can keep the eviction policy under the hood, as we do now.
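Under the behavior the thread converges on, evict_from_tree_cache might look like the sketch below: check availability first and evict only on shortfall, with the policy itself staying inside tree_cache.evict. This is a hypothetical body, not the code merged in this PR:

```python
from types import SimpleNamespace


def evict_from_tree_cache(tree_cache, num_tokens):
    # Hypothetical helper: evict only when the allocator cannot cover
    # the request; which nodes get evicted is decided by the tree cache.
    if tree_cache is None:
        return
    allocator = tree_cache.token_to_kv_pool_allocator
    if allocator.available_size() < num_tokens:
        tree_cache.evict(num_tokens)


# Fake objects standing in for the real allocator and tree cache:
evicted = []
fake_tree_cache = SimpleNamespace(
    token_to_kv_pool_allocator=SimpleNamespace(available_size=lambda: 3),
    evict=evicted.append,
)
evict_from_tree_cache(fake_tree_cache, 2)  # enough space: no eviction
evict_from_tree_cache(fake_tree_cache, 5)  # shortfall: evict
```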

        )

        # Allocate memory
        if self.token_to_kv_pool_allocator.page_size == 1:
Collaborator:

Hi, I find that batch.token_to_kv_pool_allocator.page_size does not always equal batch.tree_cache.page_size, so the paged config takes the wrong allocation path, which breaks a lot of cases.

Collaborator Author:

Could you let me know when the page sizes differ? We can add tests to capture this in the future.

Collaborator (@airMeng, Oct 21, 2025):

#11313 (comment) suggests the page sizes should be the same and this is just a bug.

But the question is why we need to access the same page size from different places; this echoes the suggestion in #11645 (comment).
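One way to act on that suggestion would be a small guard that reads the page size once and fails loudly on disagreement, instead of silently branching down the wrong allocation path. check_page_size is a hypothetical helper, not something in this PR:

```python
from types import SimpleNamespace


def check_page_size(batch):
    # Hypothetical guard: the two page_size fields are supposed to be
    # equal, so verify that and return a single value for callers.
    alloc_ps = batch.token_to_kv_pool_allocator.page_size
    tc_ps = batch.tree_cache.page_size
    assert alloc_ps == tc_ps, f"page_size mismatch: {alloc_ps} vs {tc_ps}"
    return alloc_ps


# Fake batch with matching page sizes:
batch = SimpleNamespace(
    token_to_kv_pool_allocator=SimpleNamespace(page_size=16),
    tree_cache=SimpleNamespace(page_size=16),
)
page_size = check_page_size(batch)
```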

def extend(reqs, model_runner):
    # Create dummy tree_cache for benchmarks (no prefix caching, just allocation)
    dummy_tree_cache = SimpleNamespace(
        page_size=1,
Collaborator:

Why hard-code this to 1 instead of page_size=model_runner.server_args.page_size?

Collaborator Author (@cctry, Oct 20, 2025):

Thanks for pointing this out. Will fix.
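The agreed fix might look like the sketch below: take page_size from server_args rather than hard-coding 1. Beyond page_size, the fields on the dummy are assumptions about what the allocation path touches, not the exact fields in bench_one_batch.py:

```python
from types import SimpleNamespace


def make_dummy_tree_cache(model_runner):
    # Placeholder tree cache for benchmarks: no prefix caching, only
    # allocation. page_size now follows the server configuration.
    return SimpleNamespace(
        page_size=model_runner.server_args.page_size,
        token_to_kv_pool_allocator=model_runner.token_to_kv_pool_allocator,
        evict=lambda num_tokens: None,  # nothing to evict in a benchmark
    )


# Fake model_runner for illustration:
fake_runner = SimpleNamespace(
    server_args=SimpleNamespace(page_size=8),
    token_to_kv_pool_allocator=object(),
)
dummy = make_dummy_tree_cache(fake_runner)
```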

@merrymercy merrymercy mentioned this pull request Oct 23, 2025
1 task
lpc0220 pushed a commit to lpc0220/sglang that referenced this pull request Oct 29, 2025

9 participants