
[Bugfix][ROCm]: Allow gpt_oss_mxfp4 quantization method on rocm #39754

Merged
gshtras merged 1 commit into vllm-project:main from ROCm:reenable_gpt_oss_mxfp4_rocm on Apr 14, 2026

Conversation

@Rohan138 Rohan138 (Contributor) commented Apr 14, 2026

Purpose

#39604 added the gpt_oss_mxfp4 quantization method, but did not add it to the supported_quantization list that rocm.py maintains. Running openai/gpt-oss-120b on ROCm (e.g. via vllm serve --model openai/gpt-oss-120b, or the vllm bench latency command below) therefore fails with the following error:

VLLM_ROCM_USE_AITER=1 vllm bench latency --model openai/gpt-oss-120b --batch-size 32 --input-len 16 --output-len 16 --num-iters-warmup 1 --num-iters 3 --attention-backend ROCM_AITER_UNIFIED_ATTN
WARNING 04-14 00:24:08 [gpt_oss_triton_kernels_moe.py:58] Using legacy triton_kernels on ROCm
INFO 04-14 00:24:09 [utils.py:233] non-default args: {'enable_prefix_caching': False, 'enable_lora': None, 'reasoning_parser_plugin': '', 'attention_backend': 'ROCM_AITER_UNIFIED_ATTN', 'model': 'openai/gpt-oss-120b'}
INFO 04-14 00:24:26 [model.py:554] Resolved architecture: GptOssForCausalLM
Parse safetensors files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:02<00:00,  5.54it/s]
INFO 04-14 00:24:29 [model.py:1685] Using max model len 131072
[aiter] start build [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/build/module_aiter_enum
[aiter] finish build [module_aiter_enum], cost 25.4s 
[aiter] import [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/module_aiter_enum.so
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 33, in <module>
    sys.exit(load_entry_point('vllm', 'console_scripts', 'vllm')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/entrypoints/cli/main.py", line 75, in main
    args.dispatch_function(args)
  File "/home/ropotdar/Desktop/vllm/vllm/entrypoints/cli/benchmark/latency.py", line 21, in cmd
    main(args)
  File "/home/ropotdar/Desktop/vllm/vllm/benchmarks/latency.py", line 87, in main
    llm = LLM.from_engine_args(engine_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/entrypoints/llm.py", line 413, in from_engine_args
    return cls(**vars(engine_args))
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/entrypoints/llm.py", line 381, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/v1/engine/llm_engine.py", line 163, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/engine/arg_utils.py", line 1584, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ropotdar/Desktop/vllm/vllm/engine/arg_utils.py", line 1432, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
  Value error, gpt_oss_mxfp4 quantization is currently not supported in rocm. [type=value_error, input_value=ArgsKwargs((), {'model': ...nderer_num_workers': 1}), input_type=ArgsKwargs]
    For further information visit https://errors.pydantic.dev/2.13/v/value_error

This PR fixes the error; a sketch of the relevant check and the fix follows below. As an aside, ROCm seems to be the only platform that maintains such a list. Do we know if we still want it here? I think it makes more sense for the quantization method itself to specify its supported platform(s), rather than keeping this list at the platform level.
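For illustration, here is a minimal, self-contained sketch of the failing check and the fix. The names SUPPORTED_QUANTIZATION_ROCM and verify_quantization are illustrative stand-ins, not the exact upstream identifiers; in vLLM the list lives in vllm/platforms/rocm.py and is consulted during ModelConfig validation:

```python
# Illustrative sketch only: names are hypothetical stand-ins for the
# platform-level list maintained in vllm/platforms/rocm.py.
SUPPORTED_QUANTIZATION_ROCM = [
    "awq",
    "gptq",
    "fp8",
    "gpt_oss_mxfp4",  # the entry this PR adds; absent before the fix
]

def verify_quantization(method: str) -> None:
    """Reject quantization methods the platform does not list as supported."""
    if method not in SUPPORTED_QUANTIZATION_ROCM:
        raise ValueError(
            f"{method} quantization is currently not supported in rocm."
        )

verify_quantization("gpt_oss_mxfp4")  # raised ValueError before this PR
```

And a hedged sketch of the alternative design raised above, where each quantization config declares its own platform support instead of every platform curating a list (class and attribute names here are hypothetical, not existing vLLM API):

```python
# Hypothetical alternative: the quantization method declares where it runs.
class GptOssMxFp4Config:
    supported_platforms = frozenset({"cuda", "rocm"})

def verify_quantization_v2(config) -> None:
    current = "rocm"  # would come from vllm.platforms in practice
    if current not in config.supported_platforms:
        raise ValueError(f"{type(config).__name__} is unsupported on {current}")

verify_quantization_v2(GptOssMxFp4Config())  # passes: "rocm" is declared
```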

cc @gshtras @tjtanaa @mgoin @zyongye

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@Rohan138 Rohan138 requested a review from tjtanaa as a code owner April 14, 2026 00:43
@mergify mergify Bot added labels gpt-oss (Related to GPT-OSS models), rocm (Related to AMD ROCm), and bug (Something isn't working) on Apr 14, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 14, 2026
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds "gpt_oss_mxfp4" to the list of supported quantization formats for the ROCm platform in vllm/platforms/rocm.py. No review comments were provided for this change, and I have no feedback to provide.

@github-project-automation github-project-automation Bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Apr 14, 2026
@gshtras gshtras enabled auto-merge (squash) April 14, 2026 15:42
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 14, 2026
@gshtras gshtras merged commit 23f3760 into vllm-project:main Apr 14, 2026
52 of 53 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD Apr 14, 2026
zxd1997066 pushed a commit to zxd1997066/vllm that referenced this pull request Apr 15, 2026
…lm-project#39754)

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…lm-project#39754)

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>

Labels

bug (Something isn't working), gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants