[Core] Optimize evictor-v2 performance #7193
Thanks for the contribution!
Looks good to me, although the NeuralMagic folks have a better understanding of the prefix caching paths. cc @robertgshaw2-neuralmagic
Looks pretty reasonable to me, and the tests also passed. I will go ahead and merge this. Thanks again @xiaobochen123 for the contribution!
Signed-off-by: Alvant <[email protected]>
When AutoPrefixCache is used, block_manager_v2 performs worse than v1.
The self.free_table in evictor_v2::LRUEvictor is an OrderedDict, which remembers the order in which keys were first inserted, so entries with larger timestamps sit at the end.
V2 is slower than V1 because V2 iterates over the entire free_table in evict.
V2 also has an 'update' operation, which breaks this order: a block whose timestamp is refreshed stays at its original position. We can therefore move the block to the end on update, which keeps the block with the lowest timestamp at the start, so evict no longer needs to scan the whole table.
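
The idea above can be illustrated with a minimal sketch. This is not the vLLM implementation: the class `SketchLRUEvictor`, the helper `evict_by_full_scan`, and their signatures are hypothetical names chosen for illustration. The sketch assumes timestamps passed to `add` and `update` never decrease, and it ignores any secondary tie-breaking criteria a real evictor may apply.

```python
from collections import OrderedDict


def evict_by_full_scan(free_table):
    """Hypothetical baseline: find the oldest block by scanning every entry, O(n)."""
    if not free_table:
        raise ValueError("No usable cache memory left")
    oldest = min(free_table, key=free_table.get)
    free_table.pop(oldest)
    return oldest


class SketchLRUEvictor:
    """Hypothetical stand-in for an LRU evictor; not vLLM's actual class."""

    def __init__(self):
        # Maps block_id -> last_accessed timestamp, oldest entry first.
        self.free_table = OrderedDict()

    def add(self, block_id, last_accessed):
        # A newly freed block is appended at the end.
        self.free_table[block_id] = last_accessed

    def update(self, block_id, last_accessed):
        # Refresh the timestamp and move the block to the end, so the
        # table stays ordered by timestamp (assuming timestamps never decrease).
        self.free_table[block_id] = last_accessed
        self.free_table.move_to_end(block_id)

    def evict(self):
        if not self.free_table:
            raise ValueError("No usable cache memory left")
        # The oldest block is at the front, so no scan is needed.
        block_id, _ = self.free_table.popitem(last=False)
        return block_id


evictor = SketchLRUEvictor()
evictor.add(1, 1.0)
evictor.add(2, 2.0)
evictor.add(3, 3.0)
evictor.update(1, 4.0)   # block 1 becomes the most recently used
first_evicted = evictor.evict()
print(first_evicted)     # prints 2
```

Without the `move_to_end` call in `update`, block 1 would remain at the front with a timestamp of 4.0, and popping the first entry would evict the wrong block; that is why an unordered table forces the full scan shown in `evict_by_full_scan`.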