[Bugfix] Accept file-type-only quant types (IQ2_M, IQ3_XS, ...) in remote GGUF model IDs #19

Merged
Isotr0py merged 1 commit into vllm-project:main from Sunt-ing:fix/accept-file-type-only-quants
Jun 3, 2026

Contributor

@Sunt-ing Sunt-ing commented Jun 3, 2026

Purpose

Port the fix approved at vllm-project/vllm#44218 to the plugin, as @Isotr0py requested there (GGUF support is migrating here per vllm-project/vllm#39612). Fixes vllm-project/vllm#42734.

vllm serve <repo>:UD-IQ2_M (and any repo_id:quant_type reference whose quant is IQ2_M, IQ3_M, IQ3_XS, or MXFP4_MOE) fails with "Repo id must use alphanumeric chars...", because the whole string is treated as a plain repo id instead of a remote GGUF reference.

Root cause. is_valid_gguf_quant_type() only checks GGMLQuantizationType (the tensor quantization enum), but the quant_type in a repo_id:quant_type reference is a GGUF file type (LlamaFileType) used to select the .gguf file. These are two distinct gguf enums, and IQ2_M/IQ3_M/IQ3_XS/MXFP4_MOE exist only in LlamaFileType: they have no GGMLQuantizationType member, and no valid base after suffix stripping (IQ2_M -> IQ2, which doesn't exist). So is_remote_gguf() returns False and the reference is rejected. UD-IQ1_S/UD-IQ1_M work only because IQ1_S/IQ1_M happen to exist in both enums.
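
The mismatch can be reproduced without the gguf package. The sketch below is illustrative only: TensorType and FileType are small stand-in enums (member names taken from this description, integer values are placeholders), and old_is_valid and SIZE_SUFFIXES are hypothetical names that mimic the described behavior, not the plugin's actual code.

```python
from enum import IntEnum


# Stand-in subsets of the two gguf enums. Names follow the PR text;
# the integer values are placeholders, not the real gguf constants.
class TensorType(IntEnum):  # stands in for GGMLQuantizationType
    Q4_K = 1
    IQ1_S = 2
    IQ1_M = 3


class FileType(IntEnum):  # stands in for LlamaFileType
    MOSTLY_Q4_K_M = 1
    MOSTLY_IQ1_S = 2
    MOSTLY_IQ1_M = 3
    MOSTLY_IQ2_M = 4


# Hypothetical suffix list; the real helper may use a different one.
SIZE_SUFFIXES = ("_XXS", "_XS", "_XL", "_S", "_M", "_L")


def old_is_valid(quant_type: str) -> bool:
    """Tensor-enum-only check, as described for the unpatched code."""
    if quant_type in TensorType.__members__:
        return True
    # Retry once with a trailing size suffix removed (Q4_K_M -> Q4_K).
    for suffix in SIZE_SUFFIXES:
        if quant_type.endswith(suffix):
            return quant_type[: -len(suffix)] in TensorType.__members__
    return False


print(old_is_valid("Q4_K_M"))  # True: Q4_K_M -> Q4_K is a tensor type
print(old_is_valid("IQ1_M"))   # True: IQ1_M exists in both enums
print(old_is_valid("IQ2_M"))   # False: neither IQ2_M nor IQ2 is a tensor type
```

IQ2_M is a member of the file-type stand-in (as MOSTLY_IQ2_M), but the tensor-only check never looks there, which is the failure mode described above.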

This matches the upstream conclusion in ggml-org/llama.cpp#23085: the gguf maintainer confirms this is a downstream issue and that general.file_type must be parsed with LlamaFileType. It also cannot be addressed by adding IQ2_M to GGMLQuantizationType upstream, because GGMLQuantizationType.IQ1_M and LlamaFileType.MOSTLY_IQ2_M both equal 29 in the shared int space.
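
To make the shared-integer problem concrete, here is a minimal sketch with stand-in enums. Only the two members named above are modeled, using the value 29 stated in this description; TensorType and FileType are illustrative names, not the gguf classes themselves.

```python
from enum import IntEnum


class TensorType(IntEnum):  # stand-in for GGMLQuantizationType
    IQ1_M = 29              # value as stated in the PR description


class FileType(IntEnum):    # stand-in for LlamaFileType
    MOSTLY_IQ2_M = 29       # value as stated in the PR description


raw = 29  # e.g. an integer read from the general.file_type field

# The same integer decodes to different names depending on the enum used.
print(TensorType(raw).name)  # IQ1_M
print(FileType(raw).name)    # MOSTLY_IQ2_M

# IntEnum members compare as plain ints, so the two are indistinguishable by value.
print(TensorType.IQ1_M == FileType.MOSTLY_IQ2_M)  # True
```

Because the integer alone cannot say which enum it belongs to, the field has to be decoded with the enum that matches its meaning.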

Fix. Accept either enum in is_valid_gguf_quant_type(): a LlamaFileType file type (members are prefixed MOSTLY_) or a GGMLQuantizationType tensor type. The existing suffix handling for extended names (e.g. Q4_K_M -> Q4_K) is preserved. Minimal change, no special-case table, so newly added file types are picked up automatically.
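
The shape of the fix can be sketched as follows. This is not the merged diff: TensorType and FileType are stand-in enums with placeholder values, and is_valid_quant_type and SIZE_SUFFIXES are hypothetical names. Vendor-prefix handling (UD-) is left out, since its location in the real code is not shown here.

```python
from enum import IntEnum


# Stand-in subsets; names follow the PR text, values are placeholders.
class TensorType(IntEnum):  # stands in for GGMLQuantizationType
    Q4_K = 1
    IQ1_S = 2
    IQ1_M = 3


class FileType(IntEnum):  # stands in for LlamaFileType
    MOSTLY_Q4_K_M = 1
    MOSTLY_IQ2_M = 2
    MOSTLY_IQ3_XS = 3
    MOSTLY_IQ3_M = 4
    MOSTLY_MXFP4_MOE = 5


# Hypothetical suffix list; the real helper may use a different one.
SIZE_SUFFIXES = ("_XXS", "_XS", "_XL", "_S", "_M", "_L")


def is_valid_quant_type(quant_type: str) -> bool:
    """Accept a file type (MOSTLY_<name>) or a tensor type."""
    # New: file types are looked up under their MOSTLY_ prefix.
    if f"MOSTLY_{quant_type}" in FileType.__members__:
        return True
    # Unchanged: direct tensor-type lookup.
    if quant_type in TensorType.__members__:
        return True
    # Unchanged: retry with one trailing size suffix removed.
    for suffix in SIZE_SUFFIXES:
        if quant_type.endswith(suffix):
            return quant_type[: -len(suffix)] in TensorType.__members__
    return False


for name in ("IQ2_M", "IQ3_XS", "MXFP4_MOE", "IQ1_M", "IQ9_M", "NOTATYPE"):
    print(name, is_valid_quant_type(name))
```

Looking names up by attribute rather than in a hand-written table is what lets file types added to the enum later be accepted without further changes.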

Test Plan

Unit: extend tests/test_gguf_utils.py::TestIsRemoteGGUF to cover file-type-only quants (IQ2_M/IQ3_M/IQ3_XS/MXFP4_MOE), with and without a vendor prefix (UD-), plus negative cases (IQ9_M, NOTATYPE). Each new assertion was confirmed to fail on the unpatched code and pass after the fix.

Test Result

$ pytest tests/test_gguf_utils.py -q
25 passed

Before the fix, the new cases fail (e.g. is_remote_gguf("unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ2_M") returns False, reproducing the reported "Repo id must use alphanumeric chars..." error); after the fix, all 25 pass. ruff check, ruff format, and typos are clean on the changed files.

AI assistance was used to investigate, reproduce, and draft this change; the author reviewed the diff and validation output.

… in remote GGUF model IDs

The quant_type in a repo_id:quant_type reference is a GGUF file type
(LlamaFileType), not a tensor type (GGMLQuantizationType). File types
such as IQ2_M, IQ3_M, IQ3_XS and MXFP4_MOE have no GGMLQuantizationType
member, so is_valid_gguf_quant_type() rejected them and the whole
reference was treated as a plain repo id, failing with "Repo id must use
alphanumeric chars...". Accept either enum (LlamaFileType members are
prefixed MOSTLY_) so these file-type-only quants are recognized; the
existing extended-suffix handling (e.g. Q4_K_M -> Q4_K) is preserved.

Ports the fix approved at vllm-project/vllm#44218 to the plugin, as
requested by the maintainer since GGUF support is migrating here
(vllm-project/vllm#39612).

Fixes vllm-project/vllm#42734

Signed-off-by: Ting Sun <suntcrick@gmail.com>
Member

@Isotr0py Isotr0py left a comment

Thanks!

@Isotr0py Isotr0py merged commit 69dae43 into vllm-project:main Jun 3, 2026
2 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug]: UD-IQ2_M not supported

2 participants