[ROCm][CI] Fix engine teardown and text normalization to stabilize voxtral test#37138
Conversation
…ation Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
This pull request introduces several important fixes to stabilize the voxtral tests on ROCm. The changes include adding ROCm-specific engine arguments to ensure deterministic outputs, refactoring text normalization into a shared helper function, and improving assertion messages for better debugging. A key part of this PR is the fix for an engine teardown issue that caused OOM errors, which is addressed by converting fixtures to yield fixtures with cleanup logic.
My review focuses on the correctness of this new teardown logic. I've identified a critical issue in the shutdown sequence for the synchronous engine fixture that will likely prevent the resource leak from being fixed. Please see my detailed comment.
…xtral test (vllm-project#37138) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Fixes two independent failures in `test_voxtral_realtime_forward` and `test_voxtral_realtime_generator` observed on ROCm CI runs.

### ROCm non-determinism (`test_voxtral_realtime_forward`)

- Added `ROCM_ENGINE_KWARGS` to `tests/utils.py` as the Python-API equivalent of the existing `ROCM_EXTRA_ARGS` (CLI flags).
- Set `max_num_seqs=1` and `enable_prefix_caching=False` on ROCm via `**ROCM_ENGINE_KWARGS` in `ENGINE_CONFIG`, eliminating the batch variance and prefix-cache interference that caused non-deterministic outputs even at `temperature=0.0`.
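The kwargs-merging step above can be sketched as follows. Only `ROCM_ENGINE_KWARGS`, `max_num_seqs`, and `enable_prefix_caching` come from the PR; the `make_engine_config` helper, the ROCm check, and the base config values are illustrative stand-ins, not the actual `tests/utils.py` code:

```python
# Python-API counterpart of the CLI-flag list ROCM_EXTRA_ARGS (sketch).
ROCM_ENGINE_KWARGS = {
    "max_num_seqs": 1,               # one sequence per step -> no batch variance
    "enable_prefix_caching": False,  # no prefix-cache interference between runs
}

def make_engine_config(on_rocm: bool) -> dict:
    """Hypothetical helper: spread the ROCm kwargs into the engine config."""
    config = {"dtype": "bfloat16"}   # placeholder base config, illustrative only
    if on_rocm:
        # Mirrors **ROCM_ENGINE_KWARGS in ENGINE_CONFIG from the PR.
        config = {**config, **ROCM_ENGINE_KWARGS}
    return config
```

Keeping the ROCm-only keys in a single dict keeps the CUDA path untouched while making the ROCm overrides greppable in one place.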
### Missing text normalization (`test_voxtral_realtime_forward`)

- The model sometimes transcribes "OBS" as "a base hit" and "oh, my" as "oh my", which are acoustically valid alternatives.
- `test_voxtral_realtime_generator` already handled this inline. Extracted the logic into a shared `_normalize()` helper and applied it to both tests to eliminate the drift.
### Engine teardown / OOM (`test_voxtral_realtime_generator`)

- The `engine` fixture had no teardown, so the `LLM` still held GPU memory when the async engine's subprocess tried to start, causing `Engine core initialization failed` with `Failed core proc(s): {}`.
- Added `llm.llm_engine.shutdown()` and `torch.cuda.empty_cache()` after the test.
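The yield-fixture teardown pattern can be sketched as follows. `FakeEngine` is a stand-in so the shape is runnable without a GPU; in the actual fixture the teardown lines are `llm.llm_engine.shutdown()` and `torch.cuda.empty_cache()`:

```python
class FakeEngine:
    """Stand-in for vllm.LLM so the pattern runs without a GPU (sketch)."""
    def __init__(self):
        self.shut_down = False

    def shutdown(self):
        self.shut_down = True

# In the real test file this generator is decorated with @pytest.fixture.
def engine():
    llm = FakeEngine()
    yield llm        # the test body runs while the generator is suspended here
    # Teardown: resumes after the test finishes, so GPU memory is released
    # before the async engine's subprocess starts.
    llm.shutdown()   # real fixture: llm.llm_engine.shutdown()
    #                  real fixture also calls: torch.cuda.empty_cache()
```

pytest drives the generator one step past the `yield` once the test completes, which is what guarantees the ordering: sync engine freed first, async engine started second.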
### Assertion clarity (both tests)

- Replaced `assert texts == EXPECTED_TEXT` with per-element assertions that print got/expected values on failure, removing the need for `-vv` to see the actual diff.

cc @kenroche
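The per-element assertion style described in the last section might look like this sketch (the helper name and message format are illustrative, not the PR's exact code):

```python
def assert_transcripts_equal(texts: list[str], expected: list[str]) -> None:
    """Compare transcripts element-wise so a failure pinpoints the index
    and shows got/expected without needing pytest -vv.  Sketch only."""
    assert len(texts) == len(expected), (
        f"got {len(texts)} outputs, expected {len(expected)}"
    )
    for i, (got, want) in enumerate(zip(texts, expected)):
        assert got == want, f"output {i}: got {got!r}, expected {want!r}"
```

With a single `texts == EXPECTED_TEXT` comparison, a failing run only reports that two lists differ; the element-wise form puts the offending index and both strings directly in the assertion message.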