[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation delays by AndreasKaratzas · Pull Request #35052 · vllm-project/vllm

AndreasKaratzas · 2026-02-22T07:49:19Z

test_multi_chunk_streaming and test_empty_commit_does_not_crash_engine in entrypoints/openai/test_realtime_validation.py (Entrypoints Integration Test - API Server 1) intermittently fail with TimeoutError on ROCm builds.

The root cause is that aiter modules are JIT-compiled on the first inference request. The original test timeouts (30-60s) are insufficient when this compilation happens during the test's critical path.

Fix

Add a warm-up step to test_multi_chunk_streaming: send a small audio chunk before the real transcription to absorb JIT compilation latency, with a generous 360s timeout.
Increase the first-request timeout in test_empty_commit_does_not_crash_engine from 30s to 360s since the empty commit triggers the first inference (and thus JIT compilation).
Wait for session.updated after session.update to avoid racing the server's session setup.
Preserve the non-final input_audio_buffer.commit before sending audio, as it is required by the protocol to start a transcription session.
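
The warm-up idea in the list above can be sketched as follows. This is a minimal, self-contained illustration and not the code from this PR: `FakeRealtimeServer`, `send_chunk`, `run_test`, and the timeout constants are hypothetical stand-ins, and the simulated JIT delay is scaled down from minutes to fractions of a second.

```python
import asyncio

# Hypothetical timeouts, scaled down from the 360s / 30-60s values in the PR
WARMUP_TIMEOUT = 2.0   # generous: may include one-time JIT compilation
STEADY_TIMEOUT = 0.2   # tight: compilation should already be done


class FakeRealtimeServer:
    """Stand-in for a server whose first inference triggers JIT compilation."""

    def __init__(self, jit_delay: float = 0.5) -> None:
        self.jit_delay = jit_delay
        self.compiled = False

    async def send_chunk(self, chunk: bytes) -> str:
        if not self.compiled:
            # One-time cost paid by whichever request arrives first
            await asyncio.sleep(self.jit_delay)
            self.compiled = True
        await asyncio.sleep(0.01)  # steady-state latency
        return f"transcribed {len(chunk)} bytes"


async def run_test(server: FakeRealtimeServer, warm_up: bool) -> str:
    if warm_up:
        # Small throwaway request absorbs the compilation latency
        await asyncio.wait_for(server.send_chunk(b"\x00" * 16), WARMUP_TIMEOUT)
    # The request under test keeps its tight timeout
    return await asyncio.wait_for(server.send_chunk(b"\x00" * 1024), STEADY_TIMEOUT)


def outcome(warm_up: bool) -> str:
    try:
        return asyncio.run(run_test(FakeRealtimeServer(), warm_up))
    except asyncio.TimeoutError:
        return "timeout"
```

In this toy model, `outcome(False)` times out because the compilation delay lands inside the tight timeout, while `outcome(True)` succeeds because the warm-up request pays that cost under the generous one.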

dosubot · 2026-02-22T07:49:28Z

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.

gemini-code-assist

Code Review

This pull request addresses intermittent test timeouts on ROCm builds by introducing a warm-up step and increasing timeouts to account for JIT compilation delays. The changes are logical and directly address the issue. However, I've found a recurring issue in the implementation of waiting for the session.updated event. The current try...except TimeoutError: pass pattern is duplicated in three places and can hide bugs or lead to flaky tests by silently ignoring timeouts. My review comments suggest a more robust implementation that explicitly fails the test on timeout, which aligns with the goal of preventing race conditions.

gemini-code-assist · 2026-02-22T07:50:37Z

tests/entrypoints/openai/test_realtime_validation.py

+            try:
+                while True:
+                    event = await receive_event(ws, timeout=5.0)
+                    if event["type"] == "session.updated":
+                        break
+            except TimeoutError:
+                pass


The try...except TimeoutError: pass block is problematic. It will silently ignore a timeout if the session.updated event is not received within 5 seconds. This defeats the purpose of waiting for the event to avoid race conditions, as stated in the pull request description. If a timeout occurs, the test will proceed, potentially leading to flaky failures or hiding underlying issues.

To make the test more robust, the TimeoutError should be handled by explicitly failing the test. This ensures that the absence of the session.updated event is caught and reported. This pattern is repeated elsewhere in the file and should be fixed in all locations.

Suggested change

-            try:
-                while True:
-                    event = await receive_event(ws, timeout=5.0)
-                    if event["type"] == "session.updated":
-                        break
-            except TimeoutError:
-                pass
+            try:
+                while True:
+                    event = await receive_event(ws, timeout=5.0)
+                    if event["type"] == "session.updated":
+                        break
+            except TimeoutError:
+                pytest.fail("Timed out waiting for session.updated event.")
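
Because the review notes that the same loop appears in three places, one option is to factor it into a helper that raises on timeout. The sketch below is an assumption about how that could look, not code from the PR: `wait_for_event` is a hypothetical name, and `receive_event` is replaced by a queue-backed stub so the example runs on its own. It also uses one overall deadline, since a per-event timeout inside `while True` restarts every time an unrelated event arrives.

```python
import asyncio
import time


async def receive_event(ws: asyncio.Queue, timeout: float) -> dict:
    """Hypothetical stub: the real test helper reads from a WebSocket."""
    return await asyncio.wait_for(ws.get(), timeout)


async def wait_for_event(ws: asyncio.Queue, event_type: str,
                         timeout: float = 5.0) -> dict:
    """Return the first event of `event_type`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"Timed out waiting for {event_type} event.")
        try:
            event = await receive_event(ws, timeout=remaining)
        except asyncio.TimeoutError:
            raise TimeoutError(
                f"Timed out waiting for {event_type} event.") from None
        if event["type"] == event_type:
            return event


async def demo() -> tuple:
    # Case 1: the awaited event arrives after an unrelated one
    ws = asyncio.Queue()
    ws.put_nowait({"type": "session.created"})
    ws.put_nowait({"type": "session.updated"})
    found = await wait_for_event(ws, "session.updated", timeout=1.0)

    # Case 2: nothing arrives, so the helper raises instead of passing silently
    try:
        await wait_for_event(asyncio.Queue(), "session.updated", timeout=0.1)
        timed_out = False
    except TimeoutError:
        timed_out = True
    return found["type"], timed_out


result = asyncio.run(demo())
```

Inside a pytest test, the uncaught `TimeoutError` would fail the test on its own, so no explicit `pytest.fail` call is needed at each call site.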

gemini-code-assist · 2026-02-22T07:50:37Z

tests/entrypoints/openai/test_realtime_validation.py

+            try:
+                while True:
+                    event = await receive_event(ws, timeout=5.0)
+                    if event["type"] == "session.updated":
+                        break
+            except TimeoutError:
+                pass


As with the previous occurrence, this try...except TimeoutError: pass block can lead to flaky tests or hide bugs by not ensuring the session.updated event is received. The test should explicitly fail if a timeout occurs to ensure reliability.

Suggested change

-            try:
-                while True:
-                    event = await receive_event(ws, timeout=5.0)
-                    if event["type"] == "session.updated":
-                        break
-            except TimeoutError:
-                pass
+            try:
+                while True:
+                    event = await receive_event(ws, timeout=5.0)
+                    if event["type"] == "session.updated":
+                        break
+            except TimeoutError:
+                pytest.fail("Timed out waiting for session.updated event.")

gemini-code-assist · 2026-02-22T07:50:37Z

tests/entrypoints/openai/test_realtime_validation.py

+            try:
+                while True:
+                    event = await receive_event(ws, timeout=5.0)
+                    if event["type"] == "session.updated":
+                        break
+            except TimeoutError:
+                pass


This is the third instance of the problematic try...except TimeoutError: pass pattern. To ensure test reliability and prevent race conditions, the test should fail explicitly if the session.updated event is not received within the timeout period.

Suggested change

-            try:
-                while True:
-                    event = await receive_event(ws, timeout=5.0)
-                    if event["type"] == "session.updated":
-                        break
-            except TimeoutError:
-                pass
+            try:
+                while True:
+                    event = await receive_event(ws, timeout=5.0)
+                    if event["type"] == "session.updated":
+                        break
+            except TimeoutError:
+                pytest.fail("Timed out waiting for session.updated event.")

Good suggestion. I would not fail the test necessarily, but I added a warning.
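
A warning-based variant might look like the sketch below. The follow-up commits are not shown on this page, so this is an assumption about the general shape rather than the merged code; `wait_or_warn` and the queue-backed `receive_event` stub are hypothetical.

```python
import asyncio
import warnings


async def receive_event(ws: asyncio.Queue, timeout: float) -> dict:
    """Hypothetical stub standing in for the WebSocket-backed test helper."""
    return await asyncio.wait_for(ws.get(), timeout)


async def wait_or_warn(ws: asyncio.Queue, timeout: float = 5.0) -> bool:
    """Wait for session.updated; warn (rather than fail) if it never arrives."""
    try:
        while True:
            event = await receive_event(ws, timeout=timeout)
            if event["type"] == "session.updated":
                return True
    except asyncio.TimeoutError:
        # Surface the missed event without aborting the test
        warnings.warn("session.updated not received; continuing anyway",
                      stacklevel=2)
        return False


async def demo() -> bool:
    # Empty queue: no event ever arrives, so the timeout path is taken
    return await wait_or_warn(asyncio.Queue(), timeout=0.1)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    received = asyncio.run(demo())
```

The trade-off is that the test keeps running after a missed event, but pytest lists the warning in its summary, and running with `-W error` would turn it into a failure.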

AndreasKaratzas · 2026-02-22T07:59:23Z

Motivation: https://buildkite.com/vllm/amd-ci/builds/5158/steps/canvas?sid=019c7f00-2d73-4581-8dd4-1dc9a30917fc&tab=output

Inspired by: #34922

… delays (vllm-project#35052) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

… delays (vllm-project#35052) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>

[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation…

02086e1

… delays Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas requested review from DarkLight1337, NickLucche, aarnphm and robertgshaw2-redhat as code owners February 22, 2026 07:49

mergify bot added the "rocm" (Related to AMD ROCm) label Feb 22, 2026

github-project-automation bot added this to AMD Feb 22, 2026

github-project-automation bot moved this to Todo in AMD Feb 22, 2026

gemini-code-assist bot reviewed Feb 22, 2026

AndreasKaratzas added 2 commits February 22, 2026 01:55

Added warning instead of silent pass

32f22c2

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Added warning instead of silent pass

a3b7619

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

DarkLight1337 approved these changes Feb 22, 2026

DarkLight1337 enabled auto-merge (squash) February 22, 2026 08:40

github-actions bot added the "ready" (ONLY add when PR is ready to merge/full CI is needed) label Feb 22, 2026

DarkLight1337 merged commit dd8c3a7 into vllm-project:main Feb 22, 2026
16 checks passed

github-project-automation bot moved this from Todo to Done in AMD Feb 22, 2026

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Feb 22, 2026

[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation…

64f6b6d

… delays (vllm-project#35052) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas deleted the akaratza_entrypoints_api_server_i branch February 22, 2026 21:21

jmamou pushed a commit to jmamou/vllm that referenced this pull request Feb 23, 2026

[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation…

3cb2543

… delays (vllm-project#35052) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026

[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation…

78ba904

… delays (vllm-project#35052) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026

[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation…

bc2fecd

… delays (vllm-project#35052) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026

[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation…

a7217a7

… delays (vllm-project#35052) Signed-off-by: Andreas Karatzas <akaratza@amd.com>


DarkLight1337 merged 3 commits into vllm-project:main from ROCm:akaratza_entrypoints_api_server_i


2 participants
