[Tests] Replace flaky sleep with polling in test_background_cancel#32986
Conversation
There was a problem hiding this comment.
Code Review
This pull request improves the reliability of test_background_cancel by replacing a fixed sleep with a polling mechanism. This is a great change to address test flakiness. I've suggested a further refinement to the polling loop to use a monotonic clock for more accurate timeout handling, which will make the test even more robust.
| max_wait_seconds = 5.0 | ||
| poll_interval = 0.1 | ||
| elapsed = 0.0 | ||
| while elapsed < max_wait_seconds: | ||
| await asyncio.sleep(poll_interval) | ||
| elapsed += poll_interval | ||
| response = await client.responses.retrieve(response.id) | ||
| if response.status != "queued": | ||
| # Started processing or completed - try to cancel | ||
| break |
There was a problem hiding this comment.
The current polling loop uses elapsed += poll_interval to track time, but this doesn't account for the time spent in the client.responses.retrieve(response.id) call. This can cause the effective timeout to be significantly longer than max_wait_seconds if the API call is slow, potentially slowing down tests in CI.
Using a monotonic clock like asyncio.get_running_loop().time() provides a more accurate and robust timeout mechanism. I've also reordered the loop to check the condition before sleeping, which is slightly more efficient.
| max_wait_seconds = 5.0 | |
| poll_interval = 0.1 | |
| elapsed = 0.0 | |
| while elapsed < max_wait_seconds: | |
| await asyncio.sleep(poll_interval) | |
| elapsed += poll_interval | |
| response = await client.responses.retrieve(response.id) | |
| if response.status != "queued": | |
| # Started processing or completed - try to cancel | |
| break | |
| loop = asyncio.get_running_loop() | |
| start_time = loop.time() | |
| max_wait_seconds = 5.0 | |
| poll_interval = 0.1 | |
| while loop.time() - start_time < max_wait_seconds: | |
| response = await client.responses.retrieve(response.id) | |
| if response.status != "queued": | |
| # Started processing or completed - try to cancel | |
| break | |
| await asyncio.sleep(poll_interval) |
Replace the fixed 0.5s sleep (which was marked as flaky) with a proper polling loop that waits for the response status to change before attempting to cancel. This makes the test more deterministic: - Poll with 0.1s intervals until status changes from "queued" - Use proper assertions in the post-cancel verification loop - Remove FIXME comment as the flakiness is addressed Signed-off-by: 7. Sun <jhao.sun@gmail.com>
fcf361f to
20e45e4
Compare
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
Thanks for your contribution |
…llm-project#32986) Signed-off-by: 7. Sun <jhao.sun@gmail.com> Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
…llm-project#32986) Signed-off-by: 7. Sun <jhao.sun@gmail.com> Signed-off-by: 陈建华 <1647430658@qq.com>
…llm-project#32986) Signed-off-by: 7. Sun <jhao.sun@gmail.com>
Summary
Replace the fixed 0.5s sleep (marked with
# FIXME: This test can be flaky) with a proper polling loop that waits for the response status to change before attempting to cancel.Changes
Motivation
Test Plan
pytest tests/v1/entrypoints/openai/serving_responses/test_stateful.py::test_background_cancelmultiple timesRisk