[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset #37335
… before cache reset Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
This pull request aims to stabilize the test_cpu_offloading test, which was intermittently failing on ROCm due to a race condition with asynchronous GPU-to-CPU offloading. The changes introduce a robust waiting mechanism, _wait_for_prefix_cache_reset, that polls until reset_prefix_cache() succeeds, ensuring the prefix cache is cleared only after the async offload is in a ready state. Additionally, timeouts have been increased for ROCm to accommodate slower operations under CI load, and a hard timeout has been added to event collection to prevent test hangs. The logic appears sound and the changes should effectively address the reported test flakiness. I found no high or critical issues in this pull request.
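A minimal sketch of the polling idea described above. It assumes only what the PR states: the reset call returns a bool, `True` on success and `False` while an async GPU-to-CPU offload still holds blocks. `wait_for_prefix_cache_reset`, `reset_fn`, `timeout_s`, and `poll_interval_s` are hypothetical names for illustration, not the helper actually added in the PR.

```python
import time
from typing import Callable


def wait_for_prefix_cache_reset(
    reset_fn: Callable[[], bool],
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.1,
) -> None:
    """Retry reset_fn until it returns True, or raise after timeout_s.

    Hypothetical sketch: reset_fn stands in for the reset_prefix_cache()
    call, which returns False while an async offload still holds blocks.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if reset_fn():
            return  # Reset succeeded; the prefix cache was cleared.
        if time.monotonic() >= deadline:
            # Fail loudly instead of silently ignoring a False return.
            raise TimeoutError(
                f"prefix cache reset did not succeed within {timeout_s}s"
            )
        # Give the async offload time to finish and free its blocks.
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_reset() -> bool:
        # Simulates an offload that frees its blocks on the third attempt.
        calls["n"] += 1
        return calls["n"] >= 3

    wait_for_prefix_cache_reset(flaky_reset, timeout_s=5.0, poll_interval_s=0.01)
    print(calls["n"])  # 3
```

Raising `TimeoutError` is the key design choice here: a reset that never succeeds fails at the reset step itself, rather than surfacing later as an unexplained empty event list.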
…e cache reset (vllm-project#37335) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…e cache reset (vllm-project#37335) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
…e cache reset (vllm-project#37335) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
…e cache reset (vllm-project#37335) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>
`test_cpu_offloading[TRITON_ATTN-48]` was intermittently failing because `reset_prefix_cache()` was called while the async GPU-to-CPU offload was still in progress. The call returned `False`, which was silently ignored, so the GPU prefix cache was never actually cleared, no new CPU stored events were produced, and `assert subscriber.get_new_cpu_stored_events()` failed with an empty list.

- Add `_wait_for_prefix_cache_reset()`, which retries with a timeout until blocks are freed and the reset succeeds.
- Use longer timeouts on ROCm, where async offloads are slower under CI load.
- Pass `max_num_seqs=1` on ROCm to reduce batch variance.

Test plan
`pytest -s -v tests/v1/kv_offload/test_cpu_offloading.py`

cc @kenroche
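The ROCm-specific adjustments and the hard timeout on event collection mentioned in the review could look roughly like this sketch. The timeout values are placeholders (the PR does not state them), `is_rocm` is assumed to come from the test's platform check, and the `queue.Queue` stands in for however the test's subscriber actually receives events. `pick_timeout`, `engine_overrides`, and `collect_events` are hypothetical names.

```python
import queue
import time
from typing import Any

# Placeholder values: the PR says ROCm gets longer timeouts but not how long.
DEFAULT_TIMEOUT_S = 10.0
ROCM_TIMEOUT_S = 30.0


def pick_timeout(is_rocm: bool) -> float:
    """Use a longer timeout on ROCm, where async offloads run slower in CI."""
    return ROCM_TIMEOUT_S if is_rocm else DEFAULT_TIMEOUT_S


def engine_overrides(is_rocm: bool) -> dict[str, Any]:
    """Extra engine kwargs: max_num_seqs=1 on ROCm to reduce batch variance."""
    return {"max_num_seqs": 1} if is_rocm else {}


def collect_events(
    events: "queue.Queue[Any]", min_events: int, timeout_s: float
) -> list[Any]:
    """Collect items until min_events arrive or a hard deadline passes.

    Returns whatever arrived in time, so a caller's assert fails on a short
    or empty list instead of the test blocking forever.
    """
    deadline = time.monotonic() + timeout_s
    collected: list[Any] = []
    while len(collected) < min_events:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # Hard deadline reached.
        try:
            # Block for at most the time left before the deadline.
            collected.append(events.get(timeout=remaining))
        except queue.Empty:
            break  # Nothing more arrived before the deadline.
    return collected


if __name__ == "__main__":
    q: "queue.Queue[Any]" = queue.Queue()
    q.put("cpu_stored_event")
    print(collect_events(q, min_events=2, timeout_s=0.1))  # ['cpu_stored_event']
```

Bounding each `get` by the time remaining, rather than by a fixed per-call timeout, keeps the total wait under the deadline no matter how many events trickle in.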