Separate allocation logic from scheduler #11313
Conversation
    backup_state: bool = False,
):
    allocator = tree_cache.token_to_kv_pool_allocator
    evict_from_tree_cache(tree_cache, num_tokens)
Why evict proactively here?
this is actually evict_if_needed
if self.token_to_kv_pool_allocator.available_size() < num_tokens:
if self.tree_cache is not None:
self.tree_cache.evict(num_tokens)
Maybe we can rename it a bit; it actually checks for availability and evicts only if needed.
In what case would we want to evict nodes regardless of availability?
I also feel the eviction policy is non-trivial, so evict_from_tree_cache would encapsulate that complex logic away from the upper-level code.
The current self.tree_cache.evict does evict nodes to meet the requirement regardless of availability, and I think we can keep the eviction policy under the hood, like what we have now.
)

# Allocate memory
if self.token_to_kv_pool_allocator.page_size == 1:
Hi, I find that batch.token_to_kv_pool_allocator.page_size does not always equal batch.tree_cache.page_size; the paged config then takes the wrong allocation path, which breaks a lot of cases.
Could you let me know when the page sizes are different? We can add tests to capture this in the future.
#11313 (comment) suggests the page sizes should be the same and this is just a bug.
But the question is why we need to access the same page size from different places; this echoes the suggestion in #11645 (comment).
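One way to address both points above is a hypothetical accessor that reads page_size from a single place and verifies the allocator and tree cache agree, so a mismatch fails loudly instead of silently taking the wrong allocation branch. The function name get_page_size is an illustration, not code from this PR.

```python
from types import SimpleNamespace


def get_page_size(batch):
    # Single source of truth: read page_size from the allocator, then
    # assert the tree cache agrees so any divergence surfaces immediately.
    page_size = batch.token_to_kv_pool_allocator.page_size
    assert batch.tree_cache.page_size == page_size, (
        f"page_size mismatch: allocator={page_size}, "
        f"tree_cache={batch.tree_cache.page_size}"
    )
    return page_size
```

Callers would branch on get_page_size(batch) == 1 instead of reaching into two different objects for what should be one value.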
def extend(reqs, model_runner):
    # Create dummy tree_cache for benchmarks (no prefix caching, just allocation)
    dummy_tree_cache = SimpleNamespace(
        page_size=1,
Why hard-code this to 1 instead of page_size=model_runner.server_args.page_size?
Thanks for pointing that out. Will fix this.
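The fix discussed above could be as simple as the sketch below: build the benchmark placeholder from server_args instead of a literal 1. The helper name make_dummy_tree_cache is hypothetical; the PR constructs the SimpleNamespace inline.

```python
from types import SimpleNamespace


def make_dummy_tree_cache(model_runner):
    # Benchmark placeholder: no prefix caching, only the fields the
    # allocation path reads. page_size is taken from server_args
    # rather than hard-coded to 1, as suggested in the review.
    return SimpleNamespace(page_size=model_runner.server_args.page_size)
```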
Motivation
Preparation for mem_cache V2.
The allocation logic is moved to mem_cache/ from schedule_batch.py. Ideally, scheduling code should only interact with tree_cache in V1 and memory_manager in V2.

Modifications

- mem_cache/common.py holds the allocation functions operating on the allocator and tree cache: alloc_for_extend and alloc_for_decode
- In prepare_for_decode, the increment of seqlen is moved after alloc_for_decode for clarity
- In prepare_for_extend, some allocation-needed fields are set before alloc_for_extend
- In bench_one_batch.py, create a dummy tree_cache as the placeholder

Accuracy Tests
Benchmarking and Profiling
Checklist