Add tests for chunked prefill and prefix cache with causal pooling models#26526

Merged
noooop merged 4 commits into vllm-project:main from maxdebayser:test_pooler_chunked_prefill on Oct 13, 2025
Conversation

@maxdebayser
Contributor

@maxdebayser maxdebayser commented Oct 9, 2025

Addresses: #23436

This test uses Qwen3-Embedding-0.6B, which has causal attention and LAST pooling and therefore supports both features. To verify the behavior, the test creates an interceptor Pooler that checks whether prompt processing is done in one go or piecewise. For prefix caching, it checks whether the number of tokens it has seen is less than expected.
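The interception idea can be sketched roughly as follows. This is a simplified, self-contained illustration, not the actual code from the PR: `ProbePooler` and `DummyPooler` are hypothetical names, and the real test hooks into vLLM's pooling path rather than plain lists of tokens.

```python
class DummyPooler:
    """Stand-in for a real pooler; returns the last hidden state (LAST pooling)."""

    def __call__(self, hidden_states):
        return hidden_states[-1]


class ProbePooler:
    """Wraps a pooler and records how many tokens each forward pass processed."""

    def __init__(self, inner):
        self.inner = inner
        self.chunk_sizes = []

    def __call__(self, hidden_states):
        # Record the chunk size before delegating to the wrapped pooler.
        self.chunk_sizes.append(len(hidden_states))
        return self.inner(hidden_states)

    def saw_chunked_prefill(self):
        # Chunked prefill shows up as more than one forward pass per prompt.
        return len(self.chunk_sizes) > 1

    def tokens_seen(self):
        # With a prefix-cache hit, this total is less than the prompt length.
        return sum(self.chunk_sizes)
```

A chunked-prefill test would then feed the prompt in pieces and assert `saw_chunked_prefill()`; a prefix-cache test would compare `tokens_seen()` against the full prompt length.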

@ArkVex, this issue has been open since August so I finished the implementation here but I've added you as co-author. Can you take a look and see if you agree with the changes?

cc: @noooop

…dels

This test uses Qwen3-Embedding-0.6B which has causal attention
and LAST pooling, and therefore supports these features.
To make these verifications, the test creates an interceptor
Pooler which verifies if the prompt processing is done in
one go or piecewise. For prefix caching it verifies if
the number of tokens that it has seen is less than expected.

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Ayush Singh <ayush1009208@gmail.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds end-to-end tests for chunked prefill and prefix caching with pooling models. The tests use a wrapper to intercept calls and verify the behavior. The implementation is mostly correct, but I found one issue in the prefix cache test where a hardcoded value is used, making the test brittle. My review provides a suggestion to make the test more robust by dynamically getting the value from the configuration.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@noooop noooop added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 13, 2025
@noooop
Collaborator

noooop commented Oct 13, 2025

e2e testing is now running on CPU LOL

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@noooop noooop enabled auto-merge (squash) October 13, 2025 15:52
Collaborator

@noooop noooop left a comment


Thanks for this fascinating work

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
auto-merge was automatically disabled October 13, 2025 18:56

Head branch was pushed to by a user without write access

@maxdebayser
Contributor Author

I've added an annotation to skip the test if CPU is detected; let me know if that's ok. The tests also run on GPU in the "V1 Test e2e + engine" step. There was a successful execution here: https://buildkite.com/vllm/ci/builds/34649/steps/canvas?jid=0199de40-8310-4e84-9cf7-624a5d1922b0
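A skip annotation of this kind typically looks like the sketch below. The `current_platform.is_cpu()` helper here is a stand-in emulating vLLM's platform check so the snippet is self-contained; the exact guard used in the PR may differ.

```python
import pytest


class _Platform:
    """Stand-in for a platform-detection helper (hypothetical)."""

    def is_cpu(self) -> bool:
        # Pretend a GPU is present for this sketch.
        return False


current_platform = _Platform()


@pytest.mark.skipif(current_platform.is_cpu(),
                    reason="pooling e2e tests require a GPU")
def test_chunked_prefill_pooling():
    # Placeholder body; the real test drives the engine end to end.
    assert True
```

With `skipif`, the test is collected but skipped on CPU-only runners instead of failing.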

@noooop noooop merged commit d8bebb0 into vllm-project:main Oct 13, 2025
21 checks passed
@ArkVex
Contributor

ArkVex commented Oct 14, 2025

@maxdebayser thanks for the PR. I sure have a lot to learn; I will come back with another good pull request for sure. Thank you for giving me a chance.

@maxdebayser
Contributor Author

No problem, @ArkVex. Thanks for getting this PR started. I had to bring it home for time reasons, but I hope you'll pick up other issues to work on. It's a steep learning curve for sure, but a rewarding one.

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1
