[ROCm][CI] Fix Assertion Logic For test_gpt_oss#35806
Merged
DarkLight1337 merged 1 commit into vllm-project:main on Mar 3, 2026
Conversation
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Contributor
Code Review
This pull request addresses a flaky test, test_gpt_oss, which was failing in the AMD CI. The original assertion incorrectly rejected valid accuracy scores that were higher than the expected value plus a tolerance. The fix correctly adjusts the assertion to check only if the measured accuracy meets a minimum threshold (expected_accuracy - rtol), which is the standard practice for such tests. Additionally, an unused general importlib import was replaced with the more specific importlib.util. The changes are correct and well-implemented.
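The import cleanup mentioned above can be sketched as follows (a minimal illustration, not the exact vLLM test code; checking for the "quark" package is an assumption based on the PR description):

```python
import importlib.util

# Prefer the specific importlib.util over a bare "import importlib":
# find_spec reports whether a package is installed without importing it,
# which is how tests typically decide whether to skip.
# "quark" is assumed here to be the package gating this test in CI.
QUARK_AVAILABLE = importlib.util.find_spec("quark") is not None
```

A test can then be skipped when `QUARK_AVAILABLE` is false, rather than failing at import time.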
BowenBao (Contributor) approved these changes on Mar 2, 2026 and left a comment:
LGTM, thanks!
The original threshold might be a bit tight on the upper end.
DarkLight1337 approved these changes on Mar 3, 2026
Copilot AI pushed a commit to machov/vllm that referenced this pull request on Mar 10, 2026
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request on Mar 12, 2026
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request on Mar 18, 2026
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
After #35658 was merged, we saw the Quantized Models Test start failing in AMD CI:
https://buildkite.com/vllm/amd-ci/builds/5661/steps/canvas?sid=019cad58-e61d-4bf9-8028-1a6a6a7ac897&tab=output
As it turns out, this test was previously skipped because quark was not installed in our CI builds. Now that it is installed, the test runs, and it exposed that the test was flaky: its assertion did not accept a measured accuracy higher than the expected accuracy. Before this PR, the test sometimes failed with errors like
AssertionError: Expected: 0.89 | Measured: 0.913151364764268 (e.g. you can observe this with pytest -v -s models/quantization/test_gpt_oss.py::test_gpt_oss_attention_quantization[amd/gpt-oss-20b-WFP8-AFP8-KVFP8-0.89-1]). Clearly, 0.91 is a perfectly valid accuracy score, so we should accept it.
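The assertion fix described above can be sketched as follows (a minimal, hypothetical illustration of the logic; the actual function and variable names in the vLLM test may differ):

```python
def assert_accuracy(measured: float, expected: float, rtol: float = 0.01) -> None:
    """Check that a measured accuracy meets the minimum acceptable threshold."""
    # Old, flaky check: also rejected scores ABOVE expected + rtol,
    # so a better-than-expected run (0.913 vs. expected 0.89) failed:
    #     assert expected - rtol <= measured <= expected + rtol
    #
    # Fixed check: only enforce the lower bound.
    assert measured >= expected - rtol, (
        f"Expected: {expected} | Measured: {measured}"
    )

# A higher-than-expected score now passes instead of raising AssertionError.
assert_accuracy(measured=0.913151364764268, expected=0.89, rtol=0.01)
```

This one-sided check is standard for accuracy regression tests: a model scoring better than expected is never a failure, only scoring below the tolerance band is.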