Add tests for chunked prefill and prefix cache with causal pooling models by maxdebayser · Pull Request #26526 · vllm-project/vllm

maxdebayser · 2025-10-09T19:28:40Z

Addresses: #23436

This test uses Qwen3-Embedding-0.6B, which has causal attention and LAST pooling and therefore supports chunked prefill and prefix caching. To verify both features, the test creates an interceptor Pooler that checks whether the prompt is processed in one go or piecewise. For prefix caching, it checks that the pooler has seen fewer tokens than the full prompt would require.
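The interceptor idea can be sketched in plain Python. This is a toy model, not the test from this PR: `RecordingPooler`, `last_pool`, and `run_prefill` are hypothetical names, and the hidden states are plain floats rather than tensors. It only shows how recording the size of each call distinguishes one-shot from piecewise prompt processing.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


def last_pool(hidden_states: Sequence[float]) -> float:
    """LAST pooling: keep only the state of the final token."""
    return hidden_states[-1]


@dataclass
class RecordingPooler:
    """Hypothetical interceptor: wraps a pooling callable and records
    how many tokens arrive on each call."""

    inner: Callable[[Sequence[float]], float]
    chunk_sizes: List[int] = field(default_factory=list)

    def __call__(self, hidden_states: Sequence[float]) -> float:
        self.chunk_sizes.append(len(hidden_states))
        return self.inner(hidden_states)


def run_prefill(
    pooler: RecordingPooler, states: Sequence[float], max_chunk: int
) -> Optional[float]:
    """Feed the states to the pooler in chunks of at most max_chunk tokens.

    With LAST pooling, the result of the final chunk is the final result.
    """
    result = None
    for start in range(0, len(states), max_chunk):
        result = pooler(states[start:start + max_chunk])
    return result
```

A check would then compare `chunk_sizes` against the prompt length: a single entry equal to the prompt length means the prompt was processed in one go, while several smaller entries mean it was processed piecewise.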

@ArkVex, this issue has been open since August so I finished the implementation here but I've added you as co-author. Can you take a look and see if you agree with the changes?

cc: @noooop

…dels Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com>

gemini-code-assist

Code Review

This pull request adds end-to-end tests for chunked prefill and prefix caching with pooling models. The tests use a wrapper to intercept calls and verify the behavior. The implementation is mostly correct, but I found one issue in the prefix cache test where a hardcoded value is used, making the test brittle. My review provides a suggestion to make the test more robust by dynamically getting the value from the configuration.
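To illustrate why a hardcoded block size makes such a check brittle, here is a minimal sketch of the arithmetic. It assumes that the prefix cache reuses only whole blocks; the function names are hypothetical and are not taken from the test file. In a real test, the block size would be read from the engine configuration rather than written as a literal.

```python
def expected_cached_tokens(shared_prefix_len: int, block_size: int) -> int:
    """Tokens that can be served from the cache, assuming only whole
    blocks of the shared prefix are reused."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (shared_prefix_len // block_size) * block_size


def tokens_seen_on_repeat(prompt_len: int, block_size: int) -> int:
    """Tokens still processed when the same prompt is sent a second time,
    under the whole-block assumption above."""
    return prompt_len - expected_cached_tokens(prompt_len, block_size)
```

For a 50-token prompt, this model gives 2 uncached tokens with a block size of 16 but 18 with a block size of 32, so an assertion written against the wrong block size would check the wrong number.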

tests/v1/e2e/test_pooling_chunked_prefill.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


tests/v1/e2e/test_pooling_chunked_prefill.py

tests/v1/e2e/test_pooling_chunked_prefill.py

noooop · 2025-10-13T04:00:49Z

e2e testing is now running on CPU LOL

noooop

Thanks for this fascinating work

maxdebayser · 2025-10-13T18:57:46Z

I've added an annotation to skip the test if a CPU is detected; let me know if that's OK. The tests also run on GPU in the "V1 Test e2e + engine" step. There was a successful execution here: https://buildkite.com/vllm/ci/builds/34649/steps/canvas?jid=0199de40-8310-4e84-9cf7-624a5d1922b0

…dels (vllm-project#26526) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com> Signed-off-by: 1994 <1994@users.noreply.github.com>

…dels (vllm-project#26526) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>

ArkVex · 2025-10-14T21:06:24Z

@maxdebayser, thanks for the PR. I sure have a lot to learn. I will come back with another good pull request for sure. Thank you for giving me a chance.

maxdebayser · 2025-10-15T13:24:25Z

No problem, @ArkVex. Thanks for getting this PR started. I had to bring it home for time reasons, but I hope you'll pick up other issues to work on. It sure is a steep learning curve, but it's rewarding.

…dels (vllm-project#26526) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com> Signed-off-by: bbartels <benjamin@bartels.dev>

…dels (vllm-project#26526) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com>

…dels (vllm-project#26526) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

mergify bot added the v1 label Oct 9, 2025

maxdebayser mentioned this pull request Oct 9, 2025

test_chunked_prefill_pooler refrencing #23436 #24114

Closed

gemini-code-assist bot reviewed Oct 9, 2025

tests/v1/e2e/test_pooling_chunked_prefill.py (outdated, resolved)

chatgpt-codex-connector bot reviewed Oct 9, 2025

tests/v1/e2e/test_pooling_chunked_prefill.py (outdated, resolved)

fix hardcoded (and wrong) block size

bb33a63

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

noooop reviewed Oct 9, 2025

tests/v1/e2e/test_pooling_chunked_prefill.py (resolved)

noooop added the ready label Oct 13, 2025

Merge branch 'upstream_main' into test_pooler_chunked_prefill

9ef6aa6

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

noooop enabled auto-merge (squash) October 13, 2025 15:52

noooop approved these changes Oct 13, 2025

skip test on CPU

5bf42d5

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

auto-merge was automatically disabled October 13, 2025 18:56
Head branch was pushed to by a user without write access

noooop merged commit d8bebb0 into vllm-project:main Oct 13, 2025
21 checks passed

noooop merged 4 commits into vllm-project:main from maxdebayser:test_pooler_chunked_prefill

3 participants
