Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
The page_size>1 still have some issue, will fix it later. |
|
@xiezhq-hermann @merrymercy This PR is ready for review. |
| if value is None: | ||
| value = torch.tensor(key.token_ids, dtype=torch.int64) | ||
|
|
||
| if self.is_eagle: |
There was a problem hiding this comment.
hiradix (and other trees like swa) override the insert function, would that be a problem since eagle worker shared the same tree?
There was a problem hiding this comment.
Yes, in current design, we need to adapt this change to other trees like swa and hiradix if they override these functions. This PR just makes the main radix tree ready. HiCache and swa need extra work and test to make them ready.
|
|
||
| return self._insert_helper(self.root_node, key, value) | ||
|
|
||
| def cache_finished_req(self, req: Req): |
There was a problem hiding this comment.
while hiradix does not, swa tree override this implementation as well
There was a problem hiding this comment.
The swa cache inherits from BaseRadixCache, so it seems all the changes should be implemented again on it. HiCache is from RadixCache, we just need to do some adaptation on it with less override. But for HiCache, the main thing I'm concerning is that the chunked prefill size is a little changed. If the chunked prefill size is 64, actually only 63 bigram keys are inserted to the tree. Maybe it's not efficient for cache offloading with block.
There was a problem hiding this comment.
My understanding is what we are doing is primarily to resolve conflict with eagle workers since it shares the same radix tree but has its own pool, but not to have hicache support for eagle workers, i.e., eagle workers to fetch kv caches from host memory, which seems unnecessary and potentially complicated. Is it correct?
There was a problem hiding this comment.
I agree that the kv cache for eagle worker is unnecessary to store into host memory since it's only one layer. If we use HiCache only for target model, can we still share the kv indices between target and draft pool?
There was a problem hiding this comment.
yes I think it should be fine just wanted to confirm that we are aligned on this
Motivation
main:
this PR:
compatibility test: