[CI][BugFix][AMD] Add check for model_config being None and update conftest.py to load AITER if available to fix Kernels MoE Test % N by rasmith · Pull Request #33952 · vllm-project/vllm

rasmith · 2026-02-06T00:06:09Z

Purpose

A previous PR broke many tests (over 30), and another PR fixed one test in the Kernels MoE Test %N group. However, when the tests are run as a group using

pytest -sv kernels/moe

the first test that runs does not load the AITER ops, and the subsequent tests then run without the AITER ops loaded as well.
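A minimal sketch of how this kind of ordering dependence can arise, assuming (not confirmed by the PR) that the decision to bind AITER ops is made once, on first use, and then cached for the rest of the pytest process. `AiterOpsRegistry` and `ensure_initialized` are hypothetical names for illustration, not vLLM APIs.

```python
import os


class AiterOpsRegistry:
    """Hypothetical stand-in for a module that decides once whether to
    bind AITER function pointers, based on VLLM_ROCM_USE_AITER."""

    def __init__(self) -> None:
        self._initialized = False
        self.ops_loaded = False

    def ensure_initialized(self) -> None:
        if self._initialized:
            # The decision is cached: later env var changes are ignored.
            return
        self._initialized = True
        self.ops_loaded = os.environ.get("VLLM_ROCM_USE_AITER", "0") == "1"


# Simulate a test session: the first test runs with AITER disabled...
registry = AiterOpsRegistry()
os.environ["VLLM_ROCM_USE_AITER"] = "0"
registry.ensure_initialized()

# ...so a later test that enables AITER still sees no ops loaded.
os.environ["VLLM_ROCM_USE_AITER"] = "1"
registry.ensure_initialized()
print(registry.ops_loaded)  # False
```

Running a single test in isolation hides the problem, because that test is then the first (and only) one to trigger initialization.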

This PR loads the ops in vllm._aiter_ops but then ensures that VLLM_ROCM_USE_AITER=0 when tests run. This makes the function pointers in vllm._aiter_ops available to tests that need them, while tests that do not want to use AITER, and may depend on VLLM_ROCM_USE_AITER=0, still run properly.

In the context of testing, it seems reasonable to load the AITER ops on ROCm whenever AITER is available, so I added a function to conftest.py that does this; with it, the entire group now passes.
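A minimal sketch of the conftest.py approach described above. Only the name `use_aiter_if_available` appears in the PR diff; the body below is a hypothetical reconstruction. It assumes that importing `vllm._aiter_ops` binds the function pointers as a side effect, and it takes `on_rocm` as a parameter instead of querying the platform (in vLLM that check would be `current_platform.is_rocm()`). The `lib_name` and `ops_module` parameters are illustrative additions so the sketch can be exercised without AITER installed.

```python
import importlib
import importlib.util
import os


def use_aiter_if_available(
    on_rocm: bool,
    lib_name: str = "aiter",
    ops_module: str = "vllm._aiter_ops",
) -> bool:
    """Hypothetical sketch: load the AITER op wrappers when running on ROCm
    with the AITER library installed, then pin VLLM_ROCM_USE_AITER=0 so
    tests that expect AITER to be disabled by default still pass.

    Returns True if the ops module was imported.
    """
    loaded = False
    if on_rocm and importlib.util.find_spec(lib_name) is not None:
        # Assumption: importing the module registers its function pointers.
        importlib.import_module(ops_module)
        loaded = True
    # Always force AITER off for the test session, whether or not ops loaded.
    os.environ["VLLM_ROCM_USE_AITER"] = "0"
    return loaded
```

In a real conftest.py this would likely be called once at import time or from a session-scoped fixture, so that every test in `kernels/moe` sees the same state regardless of execution order.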

A previous PR introduced a check in vllm/config/vllm.py that crashes when the VllmConfig's model_config is None, so I added a guard that verifies model_config is not None before that check runs.
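A minimal sketch of the guard pattern, using hypothetical stand-in dataclasses rather than vLLM's real `VllmConfig` and `ModelConfig`; the `dtype` attribute and the `model_dtype_or_default` helper are illustrative only, since the PR does not show which attribute the original check reads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:  # hypothetical stand-in for vLLM's ModelConfig
    dtype: str = "auto"


@dataclass
class VllmConfig:  # hypothetical stand-in; kernel tests may omit the model
    model_config: Optional[ModelConfig] = None


def model_dtype_or_default(cfg: VllmConfig, default: str = "auto") -> str:
    # Guard against a bare config with no model attached, which would
    # otherwise raise AttributeError on cfg.model_config.dtype.
    if cfg.model_config is not None:
        return cfg.model_config.dtype
    return default
```

Kernel-level tests often construct a config without any model, so attribute access on `model_config` needs this kind of explicit None check.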

Test Plan

pytest -sv kernels/moe

Test Result

1950 passed, 5399 skipped, 8 warnings



mergify · 2026-02-06T00:11:23Z

Hi @rasmith, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

gemini-code-assist

Code Review

This pull request introduces two important fixes. First, it resolves a potential AttributeError in vllm/config/vllm.py by adding a necessary null check for cfg.model_config before accessing its attributes. This is a good defensive programming practice that prevents crashes. Second, it addresses a CI failure for MoE tests on ROCm by centralizing the AITER op loading logic into tests/conftest.py. This ensures that the test environment is set up correctly for all tests in the suite, improving the reliability of the CI pipeline. The corresponding cleanup in test_rocm_aiter_topk.py is also appropriate. The changes are well-implemented and clearly explained. Overall, this is a solid contribution that improves both code robustness and test stability.


robertgshaw2-redhat · 2026-02-06T00:19:58Z

tests/conftest.py

 from torch._inductor.utils import fresh_cache


+def use_aiter_if_available():


hm, not sure if this is a good idea. I think many tests explicitly set or don't set this env var

I updated it so that only the function pointers are loaded (as was happening before), while the environment variable is set to 0.

@robertgshaw2-redhat I could also do this in _aiter_ops.py, which also works. Basically, the functions get loaded but the env var will be unset:

@@ -36,7 +36,7 @@ def is_aiter_found_and_supported() -> bool:
     Checks: platform (ROCm), device arch (gfx9), library existence, and VLLM_ROCM_USE_AITER env variable.
     """
-    if current_platform.is_rocm() and IS_AITER_FOUND and envs.VLLM_ROCM_USE_AITER:
+    if current_platform.is_rocm() and IS_AITER_FOUND:
         from vllm.platforms.rocm import on_gfx9
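A minimal sketch of the decoupling this diff proposes, written as pure functions over booleans so it runs anywhere. The function names are hypothetical; the inputs mirror the checks in the docstring (ROCm platform, gfx9 arch, AITER library found, env var). Loading the ops no longer depends on `VLLM_ROCM_USE_AITER`, while actually dispatching to AITER kernels still does.

```python
def should_load_aiter_ops(is_rocm: bool, aiter_found: bool, on_gfx9: bool) -> bool:
    # After the change: function pointers load whenever the platform,
    # arch, and library checks pass, regardless of VLLM_ROCM_USE_AITER.
    return is_rocm and aiter_found and on_gfx9


def should_use_aiter(
    is_rocm: bool, aiter_found: bool, on_gfx9: bool, use_aiter_env: bool
) -> bool:
    # Dispatching to AITER kernels remains opt-in via the env var.
    return should_load_aiter_ops(is_rocm, aiter_found, on_gfx9) and use_aiter_env
```

With this split, a test can call the AITER wrappers directly even when the environment variable is 0, without changing which backend the rest of vLLM selects.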


tjtanaa · 2026-02-06T16:40:45Z

@rasmith can you check if this PR #33749 resolves the issue that this PR is trying to address? We have worked out a way to resolve the log regression without complicating the imports.

rasmith · 2026-02-06T17:17:18Z

@rasmith can you check if this PR #33749 resolves the issue that this PR is trying to address? We have worked out a way to resolve the log regression without complicating the imports.

@tjtanaa Yes, it will cause vllm._aiter_ops functions to always get loaded if on ROCm and the aiter library is available, even if VLLM_ROCM_USE_AITER is 0, which is what should happen IMO.

Closing this PR and opening this one to fix the rest of the issues in the test group.

gshtras and others added 12 commits January 29, 2026 21:50

Fixing the skinny gemm dispatch logic. Weights can be padded for it to work

9737831

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

Added weight padding into fp8 skinny gemm tests

5c1f660

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

add default_vllm_config so tests pass

515e0c5

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

Merge branch 'main' of github.com:rasmith/vllm

a965c65

Merge branch 'main' of github.com:rasmith/vllm

e8bc3c2

Merge branch 'main' of github.com:rasmith/vllm

4cecd78

enable aiter in tests

cd2f3b6

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

Merge branch 'main' of github.com:rasmith/vllm

1eb74e2

merge main

7b09163

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

typo

32caeb5

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

undo mori

afe4b5c

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

update comment

1cfd671

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

rasmith requested review from ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, tjtanaa, tlrmchlsmth, yewentao256 and youkaichao as code owners February 6, 2026 00:06

mergify bot added the rocm and bug labels Feb 6, 2026

github-project-automation bot added this to AMD Feb 6, 2026

github-project-automation bot moved this to Todo in AMD Feb 6, 2026

gemini-code-assist bot reviewed Feb 6, 2026


rasmith added 2 commits February 6, 2026 00:12

pre-commit

57d5660

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

fix typo

9d0c5f8

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

robertgshaw2-redhat reviewed Feb 6, 2026


only load ops and ensure env var not set when tests run

09d804c

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

rasmith closed this Feb 6, 2026

github-project-automation bot moved this from Todo to Done in AMD Feb 6, 2026

rasmith wants to merge 15 commits into vllm-project:main from rasmith:rasmith_fix_test_triton_moe_ptpc_fp8