[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars #32837

Merged: DarkLight1337 merged 1 commit into vllm-project:main from ROCm:fix_deprecated_env_vars on Jan 22, 2026

mawong-amd (Contributor) commented Jan 22, 2026

Purpose

This PR fixes various issues on AMD ROCm caused by the deprecation of environment variables in #32812.

Firstly, because VLLM_V1_USE_PREFILL_DECODE_ATTENTION is deprecated, RocmPlatform.get_attn_backend_cls now calls get_current_vllm_config instead to determine whether the ROCM_ATTN backend should be used. However, there are cases where the current vLLM config is not yet set, and in those cases the call fails with an error.
For instance, this causes a test regression in
v1/attention/test_rocm_attention_backends_selection.py::test_standard_attention_backend_selection
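
To illustrate the failure mode and one way to guard against it, here is a minimal, self-contained sketch. It is not the code from this PR: `AttentionConfig`, `set_current_config`, `get_current_config`, `select_backend`, and the `DEFAULT_BACKEND` name are all hypothetical stand-ins, and the sketch assumes the config getter raises when nothing has been set. Only the `ROCM_ATTN` backend name comes from the description above.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class AttentionConfig:
    # Hypothetical stand-in for the config field that replaced the env var.
    use_prefill_decode_attention: bool = False


# Hypothetical module-level "current config"; None means "not yet set".
_current_config: Optional[AttentionConfig] = None


@contextmanager
def set_current_config(config: AttentionConfig) -> Iterator[None]:
    """Install a config for the duration of the with-block, then restore."""
    global _current_config
    previous = _current_config
    _current_config = config
    try:
        yield
    finally:
        _current_config = previous


def get_current_config() -> AttentionConfig:
    """Return the current config; assumed to raise when none is set."""
    if _current_config is None:
        raise RuntimeError("current config is not set")
    return _current_config


def select_backend() -> str:
    """Pick a backend, falling back to a default when no config is set."""
    try:
        config = get_current_config()
    except RuntimeError:
        # No config yet (e.g. a unit test calling the selector directly).
        return "DEFAULT_BACKEND"  # hypothetical name
    if config.use_prefill_decode_attention:
        return "ROCM_ATTN"
    return "DEFAULT_BACKEND"
```

With this guard, calling `select_backend()` outside any `set_current_config(...)` block returns the default instead of raising, which is the situation the regressed test exercises.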

Secondly, because VLLM_ATTENTION_BACKEND is deprecated, the required ROCM_ATTN backend is no longer passed correctly to the NIXL accuracy tests, so those tests fail.
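
As a rough sketch of the idea behind the second fix, a test harness can read a `ROCM_ATTN=1` environment variable and turn it into an explicit backend argument, rather than relying on the deprecated variable. The helper `backend_args` and the flag name in `HYPOTHETICAL_BACKEND_FLAG` are assumptions made for illustration; the actual change lives in the shell scripts touched by this PR and may look different.

```python
import os
from typing import List, Mapping, Optional

# Assumption: placeholder flag name, not confirmed by this PR.
HYPOTHETICAL_BACKEND_FLAG = "--attention-backend"


def backend_args(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return extra CLI arguments selecting ROCM_ATTN when ROCM_ATTN=1."""
    if env is None:
        env = os.environ
    if env.get("ROCM_ATTN", "0") == "1":
        return [HYPOTHETICAL_BACKEND_FLAG, "ROCM_ATTN"]
    # Leave backend selection to the platform default.
    return []
```

Keeping the opt-in as a plain environment variable lets the same script serve both the default run and the ROCm-specific run.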

The aforementioned test regressions can be seen on this AMD CI nightly build under the following failing test groups:

  1. V1 Test attention (H100)
  2. NixlConnector PD accuracy tests (Distributed)
  3. DP EP NixlConnector PD accuracy tests (Distributed)

Test Plan

Run the following

  1. pytest -sv tests/v1/attention/test_rocm_attention_backends_selection.py -k test_standard_attention_backend_selection
  2. ROCM_ATTN=1 bash v1/kv_connector/nixl_integration/config_sweep_accuracy_test.sh
  3. DP_EP=1 ROCM_ATTN=1 bash v1/kv_connector/nixl_integration/config_sweep_accuracy_test.sh

as part of the

  1. V1 Test attention (H100)
  2. NixlConnector PD accuracy tests (Distributed)
  3. DP EP NixlConnector PD accuracy tests (Distributed)

test groups in AMD CI.
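
For running the commands above from a Python driver, one option is to split the leading `NAME=value` assignments from each command and pass them through the `env` argument of `subprocess.run`, so the overrides apply only to that child process. This is a sketch under that assumption; `parse_command` and `run` are hypothetical helpers and are not part of the vLLM repository.

```python
import os
import shlex
import subprocess
from typing import Dict, List, Tuple


def parse_command(line: str) -> Tuple[Dict[str, str], List[str]]:
    """Split leading NAME=value tokens from the rest of the command."""
    tokens = shlex.split(line)
    overrides: Dict[str, str] = {}
    while tokens and "=" in tokens[0]:
        name, value = tokens[0].split("=", 1)
        if not name.isidentifier():
            break  # not an assignment, so it belongs to the command
        overrides[name] = value
        tokens = tokens[1:]
    return overrides, tokens


def run(line: str) -> None:
    """Run one command with its overrides layered over the parent env."""
    overrides, argv = parse_command(line)
    subprocess.run(argv, env={**os.environ, **overrides}, check=True)
```

For example, `run("ROCM_ATTN=1 bash v1/kv_connector/nixl_integration/config_sweep_accuracy_test.sh")` would launch the script with `ROCM_ATTN=1` set only for that invocation.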

Test Result

The tests now pass.



Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
mergify bot added the ci/build, rocm, v1, bug, and kv-connector labels Jan 22, 2026

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses regressions caused by the deprecation of certain environment variables, particularly VLLM_V1_USE_PREFILL_DECODE_ATTENTION and VLLM_ATTENTION_BACKEND. The changes ensure that the ROCM_ATTN backend is correctly selected even when the vLLM configuration might not be fully initialized, and update the CI/CD scripts to pass the ROCM_ATTN flag as an environment variable. The modifications make the attention backend selection logic more robust and ensure the accuracy tests run as expected.

mawong-amd changed the title from "[Hardware][AMD][Bugfix] Fix regressions from deprecated env vars" to "[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars" Jan 22, 2026

tjtanaa (Collaborator) left a comment

LGTM. Thank you for the quick fix.

tjtanaa added the ready label Jan 22, 2026

mawong-amd (Contributor, Author) commented

The failing Quantization Test is due to PTPC FP8 and is unrelated to this PR. It would be fixed by #32813, but it is going to be deprecated soon by #32700 anyway.

DarkLight1337 merged commit c517d8c into vllm-project:main Jan 22, 2026 (57 of 60 checks passed)
mawong-amd deleted the fix_deprecated_env_vars branch January 22, 2026 17:04
monajafi-amd pushed a commit to monajafi-amd/vllm that referenced this pull request Jan 23, 2026
…llm-project#32837)

Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
cwazai pushed a commit to cwazai/vllm that referenced this pull request Jan 25, 2026
…llm-project#32837)

Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Signed-off-by: 陈建华 <1647430658@qq.com>
lapy pushed a commit to lapy/vllm that referenced this pull request Jan 27, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels: bug, ci/build, kv-connector, ready, rocm, v1
