
[ROCm][Quantization] fallback trust_remote_code=True in Quark config for some cases #37408

Closed

xuebwang-amd wants to merge 2 commits into vllm-project:main from xuebwang-amd:xuebin_trust_remote_code_issue_in_quark

Conversation

@xuebwang-amd (Contributor) commented Mar 18, 2026

Purpose

Model: amd/MiniMax-M2.1-MXFP4
Transformers: 4.57.6

Error message:

```
... ...
(APIServer pid=295080)   File "/workspace/xuebwang/vllm/vllm/engine/arg_utils.py", line 1928, in create_engine_config
(APIServer pid=295080)     config = VllmConfig(
(APIServer pid=295080)              ^^^^^^^^^^^
(APIServer pid=295080)   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=295080)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=295080) pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(APIServer pid=295080)   Value error, The repository /workspace/amd/MiniMax-M2.1-MXFP4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at /workspace/amd/MiniMax-M2.1-MXFP4 .
(APIServer pid=295080)  You can inspect the repository content at https://hf.co//workspace/amd/MiniMax-M2.1-MXFP4.
(APIServer pid=295080) Please pass the argument `trust_remote_code=True` to allow custom code to be run. [type=value_error, input_value=ArgsKwargs((), {'model_co... 'shutdown_timeout': 0}), input_type=ArgsKwargs]
```

Test Plan & Result

After the fix:

vllm (pretrained=/workspace/amd/MiniMax-M2.1-MXFP4,tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.9,enforce_eager=True,trust_remote_code=True,max_model_len=32768), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|    Tasks     |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|--------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_platinum|      3|flexible-extract|     5|exact_match|↑  |0.9603|±  |0.0056|
|              |       |strict-match    |     5|exact_match|↑  |0.9570|±  |0.0058|

Signed-off-by: xuebwang-amd <xuebwang@amd.com>
@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds a fallback mechanism to load model configurations that require trust_remote_code=True. While this improves compatibility with certain models, it introduces a security risk by potentially executing remote code without user awareness. I've added a critical comment to suggest logging a warning when this fallback is triggered to ensure users are informed about the remote code execution.

Comment thread: vllm/model_executor/layers/quantization/quark/quark.py
```python
quant_dtype = quant_config["global_quant_config"]["weight"]["dtype"]
model_type = self.hf_config.model_type
if quant_dtype == "fp4" and model_type == "deepseek_v3":
    self.dynamic_mxfp4_quant = True
```
@hongxiayang (Collaborator) commented Mar 20, 2026

It seems the whole purpose of the overridden function maybe_update_config is to set dynamic_mxfp4_quant to True for the deepseek_v3 model family, which is very model-specific.

Would it be possible to guard the whole block of code that calls get_config so it runs only for the deepseek_v3 type of model, without impacting other models, like the one you are having the issue with?
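For illustration, a minimal sketch of the guard being suggested here. The attribute and method names follow the diff above, but the class structure and the assumption that the model type is known before the config reload are hypothetical, not the actual vLLM implementation:

```python
# Illustrative sketch only: QuarkConfigSketch stands in for vLLM's
# QuarkConfig, and `model_type` is assumed to be available up front.

class QuarkConfigSketch:
    def __init__(self, model_type: str, quant_config: dict):
        self.model_type = model_type
        self.quant_config = quant_config
        self.dynamic_mxfp4_quant = False

    def maybe_update_config(self) -> None:
        # Guard first: only the deepseek_v3 family needs this path, so
        # other models (e.g. minimax_m2) never hit the extra config load
        # and its trust_remote_code requirements.
        if self.model_type != "deepseek_v3":
            return
        weight_dtype = self.quant_config["global_quant_config"]["weight"]["dtype"]
        if weight_dtype == "fp4":
            self.dynamic_mxfp4_quant = True


cfg = QuarkConfigSketch(
    "minimax_m2",
    {"global_quant_config": {"weight": {"dtype": "fp4"}}},
)
cfg.maybe_update_config()
print(cfg.dynamic_mxfp4_quant)  # False: minimax_m2 skips the deepseek-only path
```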

@xuebwang-amd (Contributor, Author):

Thanks for the review! Yes, from the model perspective, more exceptional cases can be collected.


```python
self.hf_config = get_config(
    model=model_name,
    trust_remote_code=True,
```
A Collaborator commented:

--trust-remote-code needs to be explicitly provided if it is required, doesn't it, to avoid security risks?
Why would you override this silently?

@xuebwang-amd (Contributor, Author):

Thanks for the review! Yes, --trust-remote-code is needed in the CLI.
Please see updated details #37408 (comment).

@functionstackx

Hi everyone, thanks for this PR. I want to use MXFP4 MiniMax M2.5 but am unfortunately running into this issue.

@hongxiayang (Collaborator):

check this PR: #37698

@xuebwang-amd (Contributor, Author) commented Mar 22, 2026

Here is a detailed write-up with re-validation for this PR.

  • Model: amd/MiniMax-M2.1-MXFP4
  • Transformers version: 4.57

Root cause

It is not a CLI propagation issue: --trust-remote-code is present on the command line, but it is overridden with trust_remote_code=False during the internal metadata load (QuarkConfig.maybe_update_config()), which can fail for MiniMax-M2 on Transformers 4.57.6.

Note:

  • In transformers v4.57, AutoConfig mapping includes minimax, but not minimax_m2.
  • In transformers v5.2.0, mapping includes both minimax and minimax_m2 (MiniMaxM2Config).

Minimal compatibility fix of this PR

To keep risk low, the patch is intentionally minimal and Quark-local:

  • strict load first (trust_remote_code=False)
  • retry with trust_remote_code=True only for the known trust/custom-config failure (e.g., amd/MiniMax-M2.1-MXFP4 + transformers v4.57)
  • re-raise unrelated exceptions unchanged

No broad fallback is kept in vllm/transformers_utils/config.py, so trust behavior outside this Quark path remains unchanged (global trust policy remains).
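The strict-first, narrow-fallback flow above can be sketched as follows. This is illustrative only: get_config here is a local stub standing in for vLLM's config loader, and the error-message matching is an assumption, not the exact condition used in the patch:

```python
def get_config(model: str, trust_remote_code: bool) -> dict:
    # Stub standing in for the real loader: simulates Transformers 4.57
    # refusing a repo with custom code unless trust_remote_code=True.
    if not trust_remote_code:
        raise ValueError(
            f"The repository {model} contains custom code which must be "
            "executed to correctly load the model. Please pass the argument "
            "`trust_remote_code=True` to allow custom code to be run."
        )
    return {"model": model, "trust_remote_code": True}


def load_hf_config_with_fallback(model_name: str) -> dict:
    # 1. Strict load first (trust_remote_code=False).
    try:
        return get_config(model=model_name, trust_remote_code=False)
    except ValueError as e:
        # 2. Retry with trust_remote_code=True only for the known
        #    trust/custom-config failure ...
        if "trust_remote_code" in str(e) or "custom code" in str(e):
            return get_config(model=model_name, trust_remote_code=True)
        # 3. ... and re-raise unrelated exceptions unchanged.
        raise


cfg = load_hf_config_with_fallback("amd/MiniMax-M2.1-MXFP4")
print(cfg["trust_remote_code"])  # True: the narrow fallback kicked in
```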

@xuebwang-amd (Contributor, Author):

Closing since #37698 is merged.


Labels

rocm Related to AMD ROCm

Projects

Status: Done


4 participants