Skip to content

[Tests] Replace flaky sleep with polling in test_background_cancel#32986

Merged
robertgshaw2-redhat merged 2 commits intovllm-project:mainfrom
sjhddh:tests/replace-sleep-with-polling
Jan 24, 2026
Merged

[Tests] Replace flaky sleep with polling in test_background_cancel#32986
robertgshaw2-redhat merged 2 commits intovllm-project:mainfrom
sjhddh:tests/replace-sleep-with-polling

Conversation

@sjhddh
Copy link
Copy Markdown
Contributor

@sjhddh sjhddh commented Jan 23, 2026

Summary

Replace the fixed 0.5s sleep (marked with # FIXME: This test can be flaky) with a proper polling loop that waits for the response status to change before attempting to cancel.

Changes

  • Poll with 0.1s intervals until status changes from "queued" (max 5s)
  • Use proper assertions in the post-cancel verification loop
  • Remove FIXME comment as the flakiness is addressed

Motivation

  • Fixed sleeps are unreliable: too short on slow machines, wasteful on fast ones
  • Polling adapts to actual server response times
  • Reduces test flakiness in CI

Test Plan

  • Run pytest tests/v1/entrypoints/openai/serving_responses/test_stateful.py::test_background_cancel multiple times
  • Verify test passes consistently

Risk

  • Low risk: only changes test synchronization logic
  • No changes to production code

@mergify mergify bot added the v1 label Jan 23, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the reliability of test_background_cancel by replacing a fixed sleep with a polling mechanism. This is a great change to address test flakiness. I've suggested a further refinement to the polling loop to use a monotonic clock for more accurate timeout handling, which will make the test even more robust.

Comment on lines +74 to +83
max_wait_seconds = 5.0
poll_interval = 0.1
elapsed = 0.0
while elapsed < max_wait_seconds:
await asyncio.sleep(poll_interval)
elapsed += poll_interval
response = await client.responses.retrieve(response.id)
if response.status != "queued":
# Started processing or completed - try to cancel
break
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current polling loop uses elapsed += poll_interval to track time, but this doesn't account for the time spent in the client.responses.retrieve(response.id) call. This can cause the effective timeout to be significantly longer than max_wait_seconds if the API call is slow, potentially slowing down tests in CI.

Using a monotonic clock like asyncio.get_running_loop().time() provides a more accurate and robust timeout mechanism. I've also reordered the loop to check the condition before sleeping, which is slightly more efficient.

Suggested change
max_wait_seconds = 5.0
poll_interval = 0.1
elapsed = 0.0
while elapsed < max_wait_seconds:
await asyncio.sleep(poll_interval)
elapsed += poll_interval
response = await client.responses.retrieve(response.id)
if response.status != "queued":
# Started processing or completed - try to cancel
break
loop = asyncio.get_running_loop()
start_time = loop.time()
max_wait_seconds = 5.0
poll_interval = 0.1
while loop.time() - start_time < max_wait_seconds:
response = await client.responses.retrieve(response.id)
if response.status != "queued":
# Started processing or completed - try to cancel
break
await asyncio.sleep(poll_interval)

Replace the fixed 0.5s sleep (which was marked as flaky) with a proper
polling loop that waits for the response status to change before
attempting to cancel.

This makes the test more deterministic:
- Poll with 0.1s intervals until status changes from "queued"
- Use proper assertions in the post-cancel verification loop
- Remove FIXME comment as the flakiness is addressed

Signed-off-by: 7. Sun <jhao.sun@gmail.com>
@sjhddh sjhddh force-pushed the tests/replace-sleep-with-polling branch from fcf361f to 20e45e4 Compare January 23, 2026 23:24
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) January 24, 2026 15:04
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 24, 2026
@robertgshaw2-redhat
Copy link
Copy Markdown
Collaborator

Thanks for your contribution

@robertgshaw2-redhat robertgshaw2-redhat merged commit cd775bd into vllm-project:main Jan 24, 2026
21 checks passed
ms1design pushed a commit to ms1design/vllm that referenced this pull request Jan 24, 2026
…llm-project#32986)

Signed-off-by: 7. Sun <jhao.sun@gmail.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
cwazai pushed a commit to cwazai/vllm that referenced this pull request Jan 25, 2026
…llm-project#32986)

Signed-off-by: 7. Sun <jhao.sun@gmail.com>
Signed-off-by: 陈建华 <1647430658@qq.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants