[Bugfix] Accept file-type-only quant types (IQ2_M, IQ3_XS, ...) in remote GGUF model IDs #19
Merged
Isotr0py merged 1 commit on Jun 3, 2026
… in remote GGUF model IDs

The quant_type in a repo_id:quant_type reference is a GGUF file type (LlamaFileType), not a tensor type (GGMLQuantizationType). File types such as IQ2_M, IQ3_M, IQ3_XS and MXFP4_MOE have no GGMLQuantizationType member, so is_valid_gguf_quant_type() rejected them and the whole reference was treated as a plain repo id, failing with "Repo id must use alphanumeric chars...".

Accept either enum (LlamaFileType members are prefixed MOSTLY_) so these file-type-only quants are recognized; the existing extended-suffix handling (e.g. Q4_K_M -> Q4_K) is preserved.

Ports the fix approved at vllm-project/vllm#44218 to the plugin, as requested by the maintainer since GGUF support is migrating here (vllm-project/vllm#39612).

Fixes vllm-project/vllm#42734

Signed-off-by: Ting Sun <suntcrick@gmail.com>
Purpose
Port the fix approved at vllm-project/vllm#44218 to the plugin, as @Isotr0py requested there (GGUF support is migrating here per #39612). Fixes vllm-project/vllm#42734.
`vllm serve <repo>:UD-IQ2_M` (and any `repo_id:quant_type` reference whose quant is `IQ2_M`, `IQ3_M`, `IQ3_XS`, or `MXFP4_MOE`) fails with `Repo id must use alphanumeric chars...`: the whole string is treated as a plain repo id instead of a remote GGUF reference.

Root cause. `is_valid_gguf_quant_type()` only checks `GGMLQuantizationType` (the tensor quantization enum), but the `quant_type` in a `repo_id:quant_type` reference is a GGUF file type (`LlamaFileType`) used to select the `.gguf` file. These are two distinct gguf enums, and `IQ2_M`/`IQ3_M`/`IQ3_XS`/`MXFP4_MOE` exist only in `LlamaFileType`: they have no `GGMLQuantizationType` member, and no valid base after suffix stripping (`IQ2_M` → `IQ2`, which doesn't exist). So `is_remote_gguf()` returns `False` and the reference is rejected. `UD-IQ1_S`/`UD-IQ1_M` work only because `IQ1_S`/`IQ1_M` happen to exist in both enums.

This matches the upstream conclusion in ggml-org/llama.cpp#23085: the gguf maintainer confirms this is a downstream issue and that `general.file_type` must be parsed with `LlamaFileType`. It also cannot be addressed by adding `IQ2_M` to `GGMLQuantizationType` upstream, because `GGMLQuantizationType.IQ1_M` and `LlamaFileType.MOSTLY_IQ2_M` both equal `29` in the shared int space.

Fix. Accept either enum in `is_valid_gguf_quant_type()`: a `LlamaFileType` file type (members are prefixed `MOSTLY_`) or a `GGMLQuantizationType` tensor type. The existing suffix handling for extended names (e.g. `Q4_K_M` → `Q4_K`) is preserved. Minimal change, no special-case table, so newly added file types are picked up automatically.

Test Plan
Unit: extend `tests/test_gguf_utils.py::TestIsRemoteGGUF` to cover file-type-only quants (`IQ2_M`/`IQ3_M`/`IQ3_XS`/`MXFP4_MOE`), with and without a vendor prefix (`UD-`), plus negative cases (`IQ9_M`, `NOTATYPE`). Each new assertion was confirmed to fail on the unpatched code and pass after the fix.

Test Result
Before the fix the new cases fail (e.g. `is_remote_gguf("unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ2_M")` returns `False`, reproducing the reported `Repo id must use alphanumeric chars...`); after the fix all 25 pass. `ruff check`, `ruff format`, and `typos` are clean on the changed files.

AI assistance was used to investigate, reproduce, and draft this change; the author reviewed the diff and validation output.
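For reviewers who want to see the root cause and the fix side by side, here is a minimal, self-contained sketch. It is not the plugin's code. `TensorType` and `FileType` are hypothetical stand-ins for a small subset of `GGMLQuantizationType` and `LlamaFileType`, with illustrative member values (only the shared `29` is taken from the description above). The function names, the suffix set, and the vendor-prefix stripping are likewise assumptions made for illustration.

```python
from enum import IntEnum


class TensorType(IntEnum):
    """Hypothetical stand-in for a subset of gguf's GGMLQuantizationType."""
    Q4_K = 12
    IQ1_S = 19
    IQ1_M = 29  # same integer as FileType.MOSTLY_IQ2_M below


class FileType(IntEnum):
    """Hypothetical stand-in for a subset of gguf's LlamaFileType."""
    MOSTLY_Q4_K_M = 15
    MOSTLY_IQ3_XS = 22
    MOSTLY_IQ1_S = 24
    MOSTLY_IQ2_M = 29
    MOSTLY_IQ1_M = 31


# Assumed set of size suffixes used by extended names such as Q4_K_M.
SUFFIXES = {"S", "M", "L", "XL", "XS", "XXS"}


def tensor_only_check(name: str) -> bool:
    """Pre-fix behaviour as described above: only the tensor enum is consulted."""
    if name in TensorType.__members__:
        return True
    # Extended names fall back to their base tensor type (Q4_K_M -> Q4_K).
    base, _, suffix = name.rpartition("_")
    return suffix in SUFFIXES and base in TensorType.__members__


def either_enum_check(name: str) -> bool:
    """Post-fix idea: accept a file type (MOSTLY_ prefix) or a tensor type."""
    if f"MOSTLY_{name}" in FileType.__members__:
        return True
    return tensor_only_check(name)


def is_remote_ref(model: str) -> bool:
    """Treat 'repo_id:quant_type' as remote GGUF only if the quant is valid."""
    _repo, sep, quant = model.partition(":")
    if not sep:
        return False
    # Assumption: a vendor prefix such as 'UD-' is dropped before validation.
    return either_enum_check(quant.rsplit("-", 1)[-1])
```

In this sketch `tensor_only_check("IQ2_M")` is false because neither `IQ2_M` nor its stripped base `IQ2` is a tensor type, while `either_enum_check("IQ2_M")` succeeds through `MOSTLY_IQ2_M`. The two stand-in members that share the value `29` show why the integer space cannot be used to tell the enums apart, so lookup has to go by name.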