[ROCm][Bugfix] fix exception related to trust_remote_code for MiniMax-M2.1-MXFP4#37698
Conversation
Code Review
This pull request effectively addresses the bug related to trust_remote_code for Quark models by propagating the hf_config from ModelConfig down to maybe_update_config. This avoids re-fetching the configuration with a hardcoded trust_remote_code=False and also adds a performance improvement by skipping logic for non-applicable models. The signature changes across various quantization configs are correctly handled to maintain compatibility. I've added one comment regarding improving the robustness of dictionary access to prevent potential KeyError exceptions.
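The reviewer's point about dictionary access can be illustrated with a minimal, hypothetical snippet; the `quant_cfg` dict and its keys below are stand-ins for illustration, not the actual Quark config schema:

```python
# Stand-in config dict; the keys are illustrative, not the real
# Quark quantization config fields.
quant_cfg = {"quant_method": "quark"}

# Fragile: raises KeyError when "export" is absent.
# kv_group = quant_cfg["export"]["kv_cache_group"]

# Robust: chained .get() with defaults degrades gracefully to None.
kv_group = quant_cfg.get("export", {}).get("kv_cache_group")
print(kv_group)  # None
```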
Hi @hongxiayang, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
cc @dllehr-amd
… amd quark models Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Force-pushed from d86a7a3 to 30d7c71.
cc @tjtanaa
BowenBao left a comment:
LGTM, thanks for the fix, and restricting the dynamic quant for DS.
    revision: The revision of the model
Returns:
"""
# TODO: revision is never passed currently in vllm.py,
Yeah, it should be okay to drop revision.
cc @dllehr-amd
Will do in a follow-up PR.
cc @DarkLight1337 to help review and merge
It seems the failures are not related to this PR.
…-M2.1-MXFP4 (vllm-project#37698) Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com> Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com> Signed-off-by: neweyes <328719365@qq.com>
Purpose
Fix: #38307
Bug Fix: QuarkConfig.maybe_update_config
Problem: The original code called get_config() with a hardcoded trust_remote_code=False for every Quark model. For models whose configuration requires remote code (for example, amd/MiniMax-M2.1-MXFP4), this raised an exception instead of honoring the user's trust_remote_code setting.
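A minimal sketch of the failure mode and of the fix, using a stand-in loader rather than the real transformers/vLLM internals (all names below, including the "minimax_m2" model type, are illustrative assumptions):

```python
def load_config(model_id: str, trust_remote_code: bool) -> dict:
    """Stand-in for fetching an HF config; raises when the repo defines
    its config class in remote code but trust_remote_code is False."""
    requires_remote_code = {"amd/MiniMax-M2.1-MXFP4"}
    if model_id in requires_remote_code and not trust_remote_code:
        raise ValueError(f"{model_id} requires trust_remote_code=True")
    return {"model_type": "minimax_m2"}


def maybe_update_config_buggy(model_id: str) -> dict:
    # Buggy pattern: re-fetches with trust_remote_code hardcoded to
    # False, ignoring whatever the user passed on the command line.
    return load_config(model_id, trust_remote_code=False)


def maybe_update_config_fixed(hf_config: dict) -> dict:
    # Fixed pattern: reuse the hf_config that ModelConfig already
    # loaded with the user's trust_remote_code setting; no re-fetch.
    return hf_config
```

With this shape, the buggy path raises for a remote-code model while the fixed path simply reuses the already-loaded config.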
File Changes
vllm/model_executor/layers/quantization/quark/quark.py: Replaced the get_config() call with the pre-loaded hf_config from ModelConfig, so the configuration is not re-fetched. This also lets the user override trust_remote_code from the command line.
Added an early return for model types outside the _DEEPSEEK_V3_FAMILY_MODEL_TYPES frozenset.
vllm/model_executor/layers/quantization/base_config.py: Extended the base maybe_update_config signature to accept revision and **kwargs.
vllm.py: Passes hf_config, revision, and trust_remote_code from ModelConfig to maybe_update_config, allowing the user to specify trust_remote_code.
Other call sites were updated to match the signature change.
New Test
tests/quantization/test_quark_maybe_update_config.py: three tests using real HF configs, verifying that amd/MiniMax-M2.1-MXFP4 keeps trust_remote_code as False, that amd/DeepSeek-R1-MXFP4-ASQ enables it (True), and that a missing hf_config does not crash.
Test Result
root@node:/home/vllm/tests/quantization# pytest test_quark_maybe_update_config.py
==================================================== test session starts ====================================================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /dockerx/vllm
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.12.1
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 3 items
test_quark_maybe_update_config.py ... [100%]
=============================================== 3 passed, 2 warnings in 4.72s ===============================================
sys:1: DeprecationWarning: builtin type swigvarlink has no module attribute
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.