
[ROCm][Quantization] fallback trust_remote_code=True in Quark config for some cases #37408

Closed

xuebwang-amd wants to merge 2 commits into vllm-project:main from xuebwang-amd:xuebin_trust_remote_code_issue_in_quark

Conversation

@xuebwang-amd (Contributor) commented Mar 18, 2026

Purpose

Model: amd/MiniMax-M2.1-MXFP4
Transformers: 4.57.6

Error message:

```
... ...
(APIServer pid=295080)   File "/workspace/xuebwang/vllm/vllm/engine/arg_utils.py", line 1928, in create_engine_config
(APIServer pid=295080)     config = VllmConfig(
(APIServer pid=295080)              ^^^^^^^^^^^
(APIServer pid=295080)   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=295080)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=295080) pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(APIServer pid=295080)   Value error, The repository /workspace/amd/MiniMax-M2.1-MXFP4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at /workspace/amd/MiniMax-M2.1-MXFP4 .
(APIServer pid=295080)  You can inspect the repository content at https://hf.co//workspace/amd/MiniMax-M2.1-MXFP4.
(APIServer pid=295080) Please pass the argument `trust_remote_code=True` to allow custom code to be run. [type=value_error, input_value=ArgsKwargs((), {'model_co... 'shutdown_timeout': 0}), input_type=ArgsKwargs]
```

Test Plan & Result

After the fix:

vllm (pretrained=/workspace/amd/MiniMax-M2.1-MXFP4,tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.9,enforce_eager=True,trust_remote_code=True,max_model_len=32768), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|    Tasks     |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|--------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_platinum|      3|flexible-extract|     5|exact_match|↑  |0.9603|±  |0.0056|
|              |       |strict-match    |     5|exact_match|↑  |0.9570|±  |0.0058|

Signed-off-by: xuebwang-amd <xuebwang@amd.com>
@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds a fallback mechanism to load model configurations that require trust_remote_code=True. While this improves compatibility with certain models, it introduces a security risk by potentially executing remote code without user awareness. I've added a critical comment to suggest logging a warning when this fallback is triggered to ensure users are informed about the remote code execution.

Comment thread: vllm/model_executor/layers/quantization/quark/quark.py
```python
quant_dtype = quant_config["global_quant_config"]["weight"]["dtype"]
model_type = self.hf_config.model_type
if quant_dtype == "fp4" and model_type == "deepseek_v3":
    self.dynamic_mxfp4_quant = True
```
@hongxiayang (Collaborator) commented Mar 20, 2026

It seems the whole purpose of the overridden function maybe_update_config is to set dynamic_mxfp4_quant to True for the deepseek_v3 model family, which is very model-specific.

Would it be possible to guard the whole block of code that calls get_config so it runs only for the deepseek_v3 type of model, without impacting other models, like the one you are having the issue with?
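For illustration, a minimal sketch of the guard being suggested here. The attribute and method names follow the diff above, but the class structure and the assumption that the model type is known before the config reload are hypothetical, not the actual vLLM implementation:

```python
# Illustrative sketch only: QuarkConfigSketch stands in for vLLM's
# QuarkConfig, and `model_type` is assumed to be available up front.

class QuarkConfigSketch:
    def __init__(self, model_type: str, quant_config: dict):
        self.model_type = model_type
        self.quant_config = quant_config
        self.dynamic_mxfp4_quant = False

    def maybe_update_config(self) -> None:
        # Guard first: only the deepseek_v3 family needs this path, so
        # other models (e.g. minimax_m2) never hit the extra config load
        # and its trust_remote_code requirements.
        if self.model_type != "deepseek_v3":
            return
        weight_dtype = self.quant_config["global_quant_config"]["weight"]["dtype"]
        if weight_dtype == "fp4":
            self.dynamic_mxfp4_quant = True


cfg = QuarkConfigSketch(
    "minimax_m2",
    {"global_quant_config": {"weight": {"dtype": "fp4"}}},
)
cfg.maybe_update_config()
print(cfg.dynamic_mxfp4_quant)  # False: minimax_m2 skips the deepseek-only path
```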

@xuebwang-amd (Contributor, Author):

Thanks for the review! Yes, from the model perspective, more exceptional cases can be collected.


```python
self.hf_config = get_config(
    model=model_name,
    trust_remote_code=True,
```
A Collaborator commented:

--trust-remote-code needs to be explicitly provided if it is required, doesn't it, to avoid security risks?
Why would you override this silently?

@xuebwang-amd (Contributor, Author):

Thanks for the review! Yes, --trust-remote-code is needed in the CLI.
Please see updated details #37408 (comment).

@functionstackx

Hi everyone, thanks for this PR. I want to use MXFP4 MiniMax M2.5 but am unfortunately running into this issue.

@hongxiayang (Collaborator):

check this PR: #37698

@xuebwang-amd (Contributor, Author) commented Mar 22, 2026

Here is a detailed write-up with re-validation for this PR.

  • Model: amd/MiniMax-M2.1-MXFP4
  • Transformers version: 4.57

Root cause

It is not a CLI propagation issue: --trust-remote-code is present on the command line, but it is overridden with trust_remote_code=False during the internal metadata load (QuarkConfig.maybe_update_config()), which can fail for MiniMax-M2 on Transformers 4.57.6.

Note:

  • In transformers v4.57, AutoConfig mapping includes minimax, but not minimax_m2.
  • In transformers v5.2.0, mapping includes both minimax and minimax_m2 (MiniMaxM2Config).

Minimal compatibility fix of this PR

To keep risk low, the patch is intentionally minimal and Quark-local:

  • strict load first (trust_remote_code=False)
  • retry with trust_remote_code=True only for the known trust/custom-config failure (e.g., amd/MiniMax-M2.1-MXFP4 + transformers v4.57)
  • re-raise unrelated exceptions unchanged

No broad fallback is kept in vllm/transformers_utils/config.py, so trust behavior outside this Quark path remains unchanged (global trust policy remains).
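The strict-first, narrow-fallback flow above can be sketched as follows. This is illustrative only: get_config here is a local stub standing in for vLLM's config loader, and the error-message matching is an assumption, not the exact condition used in the patch:

```python
def get_config(model: str, trust_remote_code: bool) -> dict:
    # Stub standing in for the real loader: simulates Transformers 4.57
    # refusing a repo with custom code unless trust_remote_code=True.
    if not trust_remote_code:
        raise ValueError(
            f"The repository {model} contains custom code which must be "
            "executed to correctly load the model. Please pass the argument "
            "`trust_remote_code=True` to allow custom code to be run."
        )
    return {"model": model, "trust_remote_code": True}


def load_hf_config_with_fallback(model_name: str) -> dict:
    # 1. Strict load first (trust_remote_code=False).
    try:
        return get_config(model=model_name, trust_remote_code=False)
    except ValueError as e:
        # 2. Retry with trust_remote_code=True only for the known
        #    trust/custom-config failure ...
        if "trust_remote_code" in str(e) or "custom code" in str(e):
            return get_config(model=model_name, trust_remote_code=True)
        # 3. ... and re-raise unrelated exceptions unchanged.
        raise


cfg = load_hf_config_with_fallback("amd/MiniMax-M2.1-MXFP4")
print(cfg["trust_remote_code"])  # True: the narrow fallback kicked in
```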

@xuebwang-amd (Contributor, Author):

Closing since #37698 is merged.


Labels

rocm Related to AMD ROCm

Projects

Status: Done


4 participants