
[Core] Optimize evictor-v2 performance #7193

Merged
1 commit merged into vllm-project:main on Aug 6, 2024

Conversation

@xiaobochen123 (Contributor) commented on Aug 6, 2024

With automatic prefix caching enabled, block_manager_v2 performs worse than v1.

  • Model: llama-3.1-8b on an H800 GPU
  • Workload: 3510 prompts from the MMLU dataset
from vllm import LLM, SamplingParams

llm = LLM(
    model=path,
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.8,
    max_num_seqs=512,
    enable_prefix_caching=True,
    use_v2_block_manager=XXXX,  # toggled between True and False for the comparison
)

sampling_params = SamplingParams(temperature=1.0, max_tokens=1)

mmlu_dataset = [...]  # 3510 cases from MMLU

outputs = llm.generate(
    sampling_params=sampling_params,
    prompt_token_ids=mmlu_dataset,
)

[image: benchmark results comparing block_manager_v1 and block_manager_v2]

The self.free_table in evictor_v2::LRUEvictor is an OrderedDict, which remembers the order in which keys were first inserted, so the blocks with the largest timestamps end up at the end.

The reason V2 is slower than V1 is that evict() in V2 walks through the entire free_table to find the block to evict.

V2 also has an update() method, which breaks that insertion order. By moving the block to the end of the OrderedDict on update, the block with the lowest timestamp stays at the front, so eviction no longer needs a full scan.
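
A minimal sketch of the idea (the class and method names below are illustrative, not the exact vLLM code): keep the OrderedDict in least-recently-used order by moving a block to the end whenever it is updated, so evict() can simply pop the first entry instead of scanning the whole table.

from collections import OrderedDict
from dataclasses import dataclass

@dataclass
class BlockMetaData:
    # Minimal stand-in for the per-block info kept by the evictor.
    content_hash: int
    num_hashed_tokens: int
    last_accessed: float  # timestamp of the most recent access

class LRUEvictorSketch:
    # Keeps free blocks in access order: the least recently used block
    # is always at the front of the OrderedDict.
    def __init__(self) -> None:
        self.free_table: "OrderedDict[int, BlockMetaData]" = OrderedDict()

    def add(self, block_id: int, meta: BlockMetaData) -> None:
        self.free_table[block_id] = meta

    def update(self, block_id: int, now: float) -> None:
        # Refresh the timestamp and move the block to the end so the
        # table stays ordered by last access time.
        self.free_table[block_id].last_accessed = now
        self.free_table.move_to_end(block_id)

    def evict(self) -> int:
        # The least recently used block is the first entry; no need to
        # scan the whole free_table for the minimum timestamp.
        block_id, _ = self.free_table.popitem(last=False)
        return block_id

With this invariant, eviction is O(1); before the change, eviction had to walk the whole free_table to find the smallest last-accessed timestamp, which is what made V2 slower than V1 under prefix caching.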


github-actions bot commented Aug 6, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run the other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao (Member)

thanks for the contribution!

cc @cadedaniel @zhuohan123

@cadedaniel (Collaborator)

Looks good to me, although the NeuralMagic folks have a better understanding of the prefix caching paths. cc @robertgshaw2-neuralmagic

@youkaichao (Member)

Looks pretty reasonable to me, and the tests passed as well. I will go ahead and merge this.

thanks again @xiaobochen123 for the contribution!

@youkaichao merged commit 660470e into vllm-project:main on Aug 6, 2024
28 checks passed
sfc-gh-mkeralapura pushed a commit to sfc-gh-mkeralapura/vllm that referenced this pull request Aug 12, 2024
kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024