Conversation
@Ying1123 Could you help resolve the conflicts? Thanks!
I think this is a bit confusing style-wise. If we define the allocator to be in charge of the layout-agnostic memory operations, and define a separate memory pool class for the underlying layouts, then we should use allocators consistently in the scheduler and only use the memory pool in lower-level code.
Re @xiezhq-hermann: Let us merge this first to reduce the code divergence. Feel free to refactor it later with a better design.
How about `token_to_kv_indices_pool`? 😂
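To make the layering discussed above concrete, here is a minimal sketch of the split: the scheduler talks only to the allocator, which manages free token slots, while the memory pool owns the actual per-layer KV layout. All class and method names here are illustrative assumptions for this sketch, not SGLang's real API, and the buffers are plain Python lists standing in for GPU tensors.

```python
class TokenToKVPool:
    """Owns the underlying KV cache layout: one buffer per layer."""

    def __init__(self, size: int, num_layers: int, head_dim: int):
        # Placeholder buffers; a real pool would hold GPU tensors.
        self.kv_buffer = [
            [[0.0] * head_dim for _ in range(size)] for _ in range(num_layers)
        ]

    def get_kv_buffer(self, layer_id: int):
        return self.kv_buffer[layer_id]


class TokenToKVPoolAllocator:
    """Layout-agnostic slot management: the only class the scheduler touches."""

    def __init__(self, pool: TokenToKVPool, size: int):
        self.pool = pool
        self.free_slots = list(range(size))

    def alloc(self, need: int):
        # Return None when out of memory; the caller must evict or retract.
        if need > len(self.free_slots):
            return None
        out, self.free_slots = self.free_slots[:need], self.free_slots[need:]
        return out

    def free(self, slots):
        self.free_slots.extend(slots)

    def available_size(self) -> int:
        return len(self.free_slots)
```

With this split, the scheduler never needs to know whether the pool stores MHA or MLA layouts; swapping the layout only changes `TokenToKVPool`.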
test.py:

```python
import openai

# Chat completion
start = time.time()
```
Thanks very much.
```python
self.disable_radix_cache = True
self.chunked_prefill_size = -1
if self.max_running_requests is None:
    self.max_running_requests = 32
```
This setting may affect throughput, especially for throughput-oriented models like DeepSeek. I tried request rate 16 on the ShareGPT dataset; with this limit the TTFT is higher and the throughput is lower. We may need to figure out a solution that enables larger batch sizes.
We will verify it for TP 16.
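To illustrate why the cap above can hurt throughput: new requests are admitted from the wait queue only while the running batch is below `max_running_requests`, so even when KV-cache memory would allow a larger batch, admission stops at the cap. This is a simplified sketch of that admission step, not SGLang's actual scheduler; the function name is made up for illustration.

```python
from collections import deque


def admit_requests(waiting: deque, running: list, max_running_requests: int):
    """Move requests from the wait queue into the running batch, up to the cap."""
    while waiting and len(running) < max_running_requests:
        running.append(waiting.popleft())
    return running
```

With `max_running_requests = 32` and 40 queued requests, 8 requests stay waiting regardless of free memory, which raises TTFT for those requests and lowers aggregate throughput.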
@mpjlu Could you help test the latest commit again?
| class BaseTokenToKVPool: | ||
| class TokenToKVPoolAllocator: | ||
| """A memory pool that maps a token location to its kv cache data.""" |
How about "A memory pool that stores free slots in the kv cache data"?
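The suggested docstring describes the allocator as holding *free slots* rather than a token-to-data mapping. A minimal sketch of that semantics might look like the following; this is an illustrative assumption, not the actual SGLang implementation, and reserving slot 0 as a padding location is likewise an assumption for this sketch.

```python
class TokenToKVPoolAllocator:
    """A memory pool that stores free slots in the kv cache data."""

    def __init__(self, size: int):
        # Slot 0 is reserved here as a padding/dummy location (assumption).
        self.free_slots = list(range(1, size + 1))

    def alloc(self, need: int):
        # Hand out `need` free slot indices, or None if not enough remain.
        if need > len(self.free_slots):
            return None
        selected, self.free_slots = self.free_slots[:need], self.free_slots[need:]
        return selected

    def free(self, indices):
        # Return slot indices to the free list for reuse.
        self.free_slots.extend(indices)
```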
Co-authored-by: Ke Bao <ISPObaoke@163.com>
Prefix caching and chunked prefill will be compatible with EAGLE speculative decoding after this PR.
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Sehoon Kim <kssteven418@gmail.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>