
[Bugfix][ROCm]: Allow gpt_oss_mxfp4 quantization method on rocm #39754

Merged
gshtras merged 1 commit into vllm-project:main from ROCm:reenable_gpt_oss_mxfp4_rocm on Apr 14, 2026

Conversation

@Rohan138 Rohan138 (Contributor) commented Apr 14, 2026

Purpose

#39604 added the gpt_oss_mxfp4 quantization method, but did not add it to the supported_quantization list that rocm.py maintains. Running openai/gpt-oss-120b on ROCm (e.g. via vllm serve --model openai/gpt-oss-120b, or the vllm bench latency command below) therefore fails with the following error:

VLLM_ROCM_USE_AITER=1 vllm bench latency --model openai/gpt-oss-120b --batch-size 32 --input-len 16 --output-len 16 --num-iters-warmup 1 --num-iters 3 --attention-backend ROCM_AITER_UNIFIED_ATTN
WARNING 04-14 00:24:08 [gpt_oss_triton_kernels_moe.py:58] Using legacy triton_kernels on ROCm
INFO 04-14 00:24:09 [utils.py:233] non-default args: {'enable_prefix_caching': False, 'enable_lora': None, 'reasoning_parser_plugin': '', 'attention_backend': 'ROCM_AITER_UNIFIED_ATTN', 'model': 'openai/gpt-oss-120b'}
INFO 04-14 00:24:26 [model.py:554] Resolved architecture: GptOssForCausalLM
Parse safetensors files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:02<00:00,  5.54it/s]
INFO 04-14 00:24:29 [model.py:1685] Using max model len 131072
[aiter] start build [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_aiter_enum
[aiter] finish build [module_aiter_enum], cost 25.4s 
[aiter] import [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/module_aiter_enum.so
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 33, in <module>
    sys.exit(load_entry_point('vllm', 'console_scripts', 'vllm')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/entrypoints/cli/main.py", line 75, in main
    args.dispatch_function(args)
  File "/home/ropotdar/Desktop/vllm/vllm/entrypoints/cli/benchmark/latency.py", line 21, in cmd
    main(args)
  File "/home/ropotdar/Desktop/vllm/vllm/benchmarks/latency.py", line 87, in main
    llm = LLM.from_engine_args(engine_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/entrypoints/llm.py", line 413, in from_engine_args
    return cls(**vars(engine_args))
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/entrypoints/llm.py", line 381, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/v1/engine/llm_engine.py", line 163, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/engine/arg_utils.py", line 1584, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/engine/arg_utils.py", line 1432, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
  Value error, gpt_oss_mxfp4 quantization is currently not supported in rocm. [type=value_error, input_value=ArgsKwargs((), {'model': ...nderer_num_workers': 1}), input_type=ArgsKwargs]
    For further information visit https://errors.pydantic.dev/2.13/v/value_error

This PR fixes the error; a sketch of the relevant check and the fix follows below. As an aside, ROCm seems to be the only platform that maintains such a list. Do we know if we still want it here? I think it makes more sense for the quantization method itself to specify its supported platform(s), rather than keeping this list at the platform level.
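For illustration, here is a minimal, self-contained sketch of the failing check and the fix. The names SUPPORTED_QUANTIZATION_ROCM and verify_quantization are illustrative stand-ins, not the exact upstream identifiers; in vLLM the list lives in vllm/platforms/rocm.py and is consulted during ModelConfig validation:

```python
# Illustrative sketch only: names are hypothetical stand-ins for the
# platform-level list maintained in vllm/platforms/rocm.py.
SUPPORTED_QUANTIZATION_ROCM = [
    "awq",
    "gptq",
    "fp8",
    "gpt_oss_mxfp4",  # the entry this PR adds; absent before the fix
]

def verify_quantization(method: str) -> None:
    """Reject quantization methods the platform does not list as supported."""
    if method not in SUPPORTED_QUANTIZATION_ROCM:
        raise ValueError(
            f"{method} quantization is currently not supported in rocm."
        )

verify_quantization("gpt_oss_mxfp4")  # raised ValueError before this PR
```

And a hedged sketch of the alternative design raised above, where each quantization config declares its own platform support instead of every platform curating a list (class and attribute names here are hypothetical, not existing vLLM API):

```python
# Hypothetical alternative: the quantization method declares where it runs.
class GptOssMxFp4Config:
    supported_platforms = frozenset({"cuda", "rocm"})

def verify_quantization_v2(config) -> None:
    current = "rocm"  # would come from vllm.platforms in practice
    if current not in config.supported_platforms:
        raise ValueError(f"{type(config).__name__} is unsupported on {current}")

verify_quantization_v2(GptOssMxFp4Config())  # passes: "rocm" is declared
```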

cc @gshtras @tjtanaa @mgoin @zyongye

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@Rohan138 Rohan138 requested a review from tjtanaa as a code owner April 14, 2026 00:43
@mergify mergify Bot added labels gpt-oss (Related to GPT-OSS models), rocm (Related to AMD ROCm), and bug (Something isn't working) on Apr 14, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 14, 2026
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds "gpt_oss_mxfp4" to the list of supported quantization formats for the ROCm platform in vllm/platforms/rocm.py. No review comments were provided for this change, and I have no feedback to provide.

@github-project-automation github-project-automation Bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Apr 14, 2026
@gshtras gshtras enabled auto-merge (squash) April 14, 2026 15:42
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 14, 2026
@gshtras gshtras merged commit 23f3760 into vllm-project:main Apr 14, 2026
52 of 53 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD Apr 14, 2026
zxd1997066 pushed a commit to zxd1997066/vllm that referenced this pull request Apr 15, 2026
…lm-project#39754)

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…lm-project#39754)

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>

Labels

bug (Something isn't working), gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants