[FIX_FOR_VLLM_CUSTOM=a78b842d0e85d287176031334f4721cd96b6e47d] Fix offloading_connector test flush assertion for load transfers#1468
Conversation
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates KV offload test utilities and scheduler tests to correctly account for “flush” behavior in both store and load transfer paths, especially under async scheduling.
Changes:
- Extend transfer parsing to record flushed GPU blocks for both store-flush and load-flush scenarios.
- Adjust scheduler tests’ expected flushed GPU block indexes depending on sync vs async scheduling.
- Add an assertion to ensure pending jobs are drained after request completion.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/unit_tests/kv_offload/offloading_connector/utils.py | Updates _parse_transfers to handle load-flush transfers in addition to store-flush. |
| tests/unit_tests/kv_offload/offloading_connector/test_scheduler.py | Updates expected flush behavior for async scheduling and adds a pending-jobs drain assertion. |
88f030e to
0d5d656
Compare
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
0d5d656 to
e8f321d
Compare
e8f321d to
2850195
Compare
Upstream PR #42611 changed get_flushed_transfers() to return both store and load flushes. The vllm-gaudi test assumed only store flushes, causing isInstance assertion failures. Mirror the upstream fix: handle both flush types in _parse_transfers and update test expectations. Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
2850195 to
98e5637
Compare
✅ CI PassedAll checks passed successfully against the following vllm commit: |
…floading_connector test flush assertion for load transfers (vllm-project#1468) Upstream vLLM PR vllm-project/vllm#42611 ("Flush all pending jobs on last step") changed \`get_flushed_transfers()\` to return both store and load flushes. The vllm-gaudi copy of the offloading_connector unit tests assumed only store flushes, causing: 1. \`AssertionError\` in \`utils.py\` \`_parse_transfers\` (\`isinstance(src_spec, GPULoadStoreSpec)\` assert fails on load flushes) 2. \`flushed_gpu_block_indexes\` mismatch in \`test_scheduler\` tests **Fix**: Mirror the upstream change — replace the assert with an \`if/else\` handling both store and load flush types, and add \`expected_flushed_gpu_block_indexes\` to affected tests. Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com> Signed-off-by: 12010486 <silvia.colabrese@intel.com>
Upstream vLLM PR vllm-project/vllm#42611 ("Flush all pending jobs on last step") changed `get_flushed_transfers()` to return both store and load flushes. The vllm-gaudi copy of the offloading_connector unit tests assumed only store flushes, causing:
Fix: Mirror the upstream change — replace the assert with an `if/else` handling both store and load flush types, and add `expected_flushed_gpu_block_indexes` to affected tests.