Fix test_kv_sharing_fast_prefill flakiness #22038
Conversation
Code Review (Gemini)
This pull request effectively addresses the flakiness of test_kv_sharing_fast_prefill by enforcing deterministic execution. The method of disabling multiprocessing and setting seeds is appropriate for achieving reproducibility.
My primary feedback concerns the new cleanup function. While it solves the immediate memory release issue, its approach of directly manipulating deep internal state of the LLM engine is fragile and creates a maintenance risk. I've added a comment recommending the creation of a proper teardown API on the LLM class as a more robust long-term solution. The current implementation is acceptable as a temporary measure to fix the test, but this technical debt should be addressed.
@DarkLight1337 is there a recommended way to release GPU memory between non-multiprocessing LLM instances? I tried the solutions in: but none of them worked, and the only thing that worked was a hacky solution (see the Gemini comment above) that involves deleting references to the model and KV caches from the model runner.
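For reference, the hacky cleanup looks roughly like the sketch below; the attribute path into the engine internals is hypothetical and version-dependent, and the point is only that the model and KV cache references must be dropped before garbage collection can release the GPU memory:

```python
import gc

import torch


def release_gpu_memory(llm) -> None:
    """Best-effort release of GPU memory held by an in-process LLM instance.

    Note: the attribute path below is a hypothetical example; the real path
    depends on the vLLM version and engine architecture.
    """
    model_runner = llm.llm_engine.model_executor.driver_worker.model_runner

    # Drop the references keeping the weights and KV caches alive so the
    # CUDA caching allocator can actually return the memory.
    del model_runner.model
    del model_runner.kv_caches

    gc.collect()
    torch.cuda.empty_cache()


# Usage: the caller must also drop its own reference before collecting.
# release_gpu_memory(llm)
# del llm
# gc.collect()
# torch.cuda.empty_cache()
```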
CI test failure seems unrelated: |
Can you merge from main to fix CI? |
Purpose
Fixes #22104
This unit test, added in #21590, checks for exact outputs from two `LLM` instances. However, due to inherent randomness in vLLM (e.g. scheduling), the test sometimes fails, making it flaky. Following vLLM's reproducibility guide, this PR makes it less flaky.

The key was to turn off multiprocessing with `VLLM_ENABLE_V1_MULTIPROCESSING=0` so that scheduling is deterministic. However, this requires further changes to release GPU memory for the second `LLM` instance, since unlike with multiprocessing there is no separate worker process that gets killed to free the memory.

Follow-up required: the `enforce_eager=False` test case had to be disabled, as deleting the model and KV cache references doesn't seem to remove all references, so the memory doesn't get freed.
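For illustration, a minimal sketch of the determinism setup described above (the model name, prompt, and sampling settings are placeholders, not the ones used in the actual test):

```python
import os

# Set before importing vllm so the V1 engine runs in-process instead of in a
# separate worker, which makes scheduling deterministic between instances.
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"

from vllm import LLM, SamplingParams

# A fixed seed plus greedy (temperature=0) sampling keeps outputs reproducible
# across runs. "<model-under-test>" is a placeholder model name.
llm = LLM(model="<model-under-test>", seed=0, enforce_eager=True)
outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=16),
)
print(outputs[0].outputs[0].text)
```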
Test Plan

Ran `pytest tests/v1/e2e/test_kv_sharing_fast_prefill.py::test_kv_sharing_fast_prefill` 10 times.
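A sketch of how the repetition can be scripted (assuming pytest is installed and this is run from the vLLM repo root):

```python
import subprocess

TEST = "tests/v1/e2e/test_kv_sharing_fast_prefill.py::test_kv_sharing_fast_prefill"

# Run each iteration in a fresh process so GPU state from one run cannot leak
# into the next, then count how many runs failed.
failures = sum(
    subprocess.run(["pytest", "-q", TEST]).returncode != 0 for _ in range(10)
)
print(f"{failures}/10 runs failed")
```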
Test Result

Before this PR, some runs would fail. After this PR, all 10 runs pass.