fix: handle vendor-prefixed GGUF quant types (e.g., UD-Q4_K_XL) #39470

Closed
ianliuy wants to merge 2 commits into vllm-project:main from ianliuy:fix/issue-39198-gguf-vendor-prefix

Conversation

@ianliuy ianliuy commented Apr 10, 2026

Contributor

Summary

Fixes #39198

GGUF model publishers like Unsloth use vendor-prefixed quant type names (e.g., UD-Q4_K_XL for Unsloth Dynamic quantization). These were not recognized by is_valid_gguf_quant_type(), causing the entire GGUF detection chain to fail. The colon separator was never stripped from the model name, and the full string (with :) was passed to HuggingFace APIs, resulting in HFValidationError.

Root Cause

is_valid_gguf_quant_type() accepts only exact GGMLQuantizationType enum member names, optionally followed by a standard size suffix (_M, _S, _L, _XL, _XS, _XXS). Vendor-prefixed types like UD-Q4_K_XL are not in the enum, so the whole detection chain returns False. The :quant suffix is therefore never stripped, and the full model string is passed as-is to HF's hf_hub_download, which rejects the : character.
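
The failure path can be sketched roughly as follows. This is a simplified stand-in, not vLLM's actual code: BASE_TYPES is a small hypothetical subset of the GGMLQuantizationType member names, and is_base_quant_type / split_model_and_quant are hypothetical helpers that only illustrate the "strip the quant suffix only when it validates" behavior described above.

```python
from typing import Optional, Tuple

# Hypothetical subset of GGMLQuantizationType member names (illustration only).
BASE_TYPES = {"F16", "Q4_0", "Q4_K", "Q5_K", "Q8_0"}
# Size suffixes listed in the PR description.
SIZE_SUFFIXES = ("_M", "_S", "_L", "_XL", "_XS", "_XXS")


def is_base_quant_type(name: str) -> bool:
    """Exact member-name match, optionally after removing one size suffix."""
    if name in BASE_TYPES:
        return True
    for suffix in SIZE_SUFFIXES:
        if name.endswith(suffix) and name[: -len(suffix)] in BASE_TYPES:
            return True
    return False


def split_model_and_quant(model: str) -> Tuple[str, Optional[str]]:
    """Split 'repo:quant' only when the quant part validates (assumed behavior)."""
    repo, sep, quant = model.rpartition(":")
    if sep and is_base_quant_type(quant):
        return repo, quant
    # Validation failed: the colon stays in the string handed downstream.
    return model, None


print(split_model_and_quant("org/model-GGUF:Q4_K_M"))
print(split_model_and_quant("org/model-GGUF:UD-Q4_K_XL"))
```

With the pre-fix check, the second call returns the string with the colon still attached, which is the kind of value a repo-id validator would reject.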

Fix

GGML quant type names never contain hyphens - they only use underscores and alphanumerics. A hyphen is therefore a reliable signal of a vendor prefix.

is_valid_gguf_quant_type() now:

  1. Tries the existing base validation (backward compatible)
  2. If that fails and the string contains a hyphen, splits on the first hyphen and validates the remainder

Example: UD-Q4_K_XL -> strip UD- -> Q4_K_XL -> strip suffix _XL -> Q4_K (valid enum member)
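
A minimal sketch of the two-step check, assuming a reduced set of base types. The function names match the ones mentioned in this PR, but the bodies below are an illustration written for this description, not the code in gguf_utils.py; BASE_TYPES is a hypothetical subset of the enum, and the non-empty-prefix rule is inferred from the invalid test cases (UD-, -Q4_K).

```python
# Hypothetical subset of GGMLQuantizationType member names (illustration only).
BASE_TYPES = {"F16", "Q4_K", "Q8_0"}
SIZE_SUFFIXES = ("_M", "_S", "_L", "_XL", "_XS", "_XXS")


def _is_base_gguf_quant_type(name: str) -> bool:
    """Existing behavior: exact member name, optionally with a size suffix."""
    if name in BASE_TYPES:
        return True
    return any(
        name.endswith(s) and name[: -len(s)] in BASE_TYPES for s in SIZE_SUFFIXES
    )


def is_valid_gguf_quant_type(name: str) -> bool:
    # Step 1: unchanged base validation keeps existing names working.
    if _is_base_gguf_quant_type(name):
        return True
    # Step 2: a hyphen signals a vendor prefix; validate what follows it.
    if "-" in name:
        prefix, remainder = name.split("-", 1)
        # Assumption: both an empty prefix and an empty remainder are rejected.
        return bool(prefix) and _is_base_gguf_quant_type(remainder)
    return False


for candidate in ("UD-Q4_K_XL", "UD-F16", "XX-Q4_K_M", "UD-INVALID", "UD-", "-Q4_K"):
    print(candidate, is_valid_gguf_quant_type(candidate))
```

Running the base check first means a plain name such as Q4_K_M never reaches the hyphen branch, which is what keeps the change backward compatible.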

Changes

  • vllm/transformers_utils/gguf_utils.py: Extract _is_base_gguf_quant_type() helper; add vendor-prefix stripping; update error message
  • tests/transformers_utils/test_utils.py: Add tests for vendor-prefixed quant types across all GGUF utility functions

Testing

  • All new vendor-prefix test cases pass (valid: UD-Q4_K_XL, UD-F16, XX-Q4_K_M; invalid: UD-INVALID, UD-, -Q4_K)
  • Existing non-prefixed quant types unaffected
  • Edge cases covered: empty prefix, empty remainder, double-hyphen

GGUF model publishers like Unsloth use vendor-prefixed quant type names
(e.g., UD-Q4_K_XL for Unsloth Dynamic quantization). These were not
recognized by is_valid_gguf_quant_type(), causing the entire GGUF
detection chain to fail. The colon separator was never stripped from
the model name, and the full string (with ':') was passed to HuggingFace
APIs, resulting in HFValidationError.

Fix: Since GGML quant type names never contain hyphens, any hyphen in
the quant string reliably indicates a vendor prefix. The validation
function now strips the prefix (splitting on the first hyphen) before
checking the base type against the GGMLQuantizationType enum.

Fixes vllm-project#39198

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces support for vendor-prefixed GGUF quantization types, such as Unsloth Dynamic (UD-), by updating the validation logic and adding comprehensive unit tests. The is_valid_gguf_quant_type function was refactored to handle hyphenated prefixes, and error messages were updated to reflect this new support. A review comment suggests using rsplit instead of split when isolating the vendor prefix to ensure robustness if a vendor name itself contains a hyphen.

Comment thread vllm/transformers_utils/gguf_utils.py Outdated
# GGML quant type names never contain hyphens, so a hyphen indicates
# a vendor prefix.
if "-" in gguf_quant_type:
    prefix, remainder = gguf_quant_type.split("-", 1)

Contributor

high

Using split("-", 1) only supports vendor prefixes that do not contain hyphens. If a vendor prefix itself contains a hyphen (e.g., MY-VENDOR-Q4_K), this logic will fail to validate the remainder. Since GGML quantization types are guaranteed not to contain hyphens, using rsplit("-", 1) is a more robust way to isolate the quantization type from any vendor prefix.

Suggested change
- prefix, remainder = gguf_quant_type.split("-", 1)
+ prefix, remainder = gguf_quant_type.rsplit("-", 1)

Contributor Author

Good catch! Applied in db8856f. Since GGML quant types never contain hyphens, rsplit correctly isolates the quant type from any multi-part vendor prefix (e.g., MY-VENDOR-Q4_K_XL -> remainder Q4_K_XL).
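
For reference, a small comparison of the two split directions. quant_remainder is a hypothetical helper used only for this illustration; it is not part of the PR.

```python
def quant_remainder(name: str, from_right: bool) -> str:
    """Return the part after the chosen hyphen (hypothetical helper)."""
    parts = name.rsplit("-", 1) if from_right else name.split("-", 1)
    return parts[-1]


# Single-hyphen prefix: both directions agree.
print(quant_remainder("UD-Q4_K_XL", from_right=False))  # Q4_K_XL
print(quant_remainder("UD-Q4_K_XL", from_right=True))   # Q4_K_XL

# Multi-part prefix: only the right split isolates the quant type.
print(quant_remainder("MY-VENDOR-Q4_K_XL", from_right=False))  # VENDOR-Q4_K_XL
print(quant_remainder("MY-VENDOR-Q4_K_XL", from_right=True))   # Q4_K_XL
```

Because the quant type itself never contains a hyphen, the last hyphen is always the boundary between prefix and type.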

rsplit('-', 1) is more robust than split('-', 1) for multi-part vendor
prefixes (e.g., MY-VENDOR-Q4_K_XL). Since GGML quant type names never
contain hyphens, splitting from the right correctly isolates the quant
type regardless of how many hyphens the vendor prefix contains.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mergify Bot commented Apr 10, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ianliuy.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Apr 10, 2026
ianliuy commented Apr 12, 2026

Contributor Author

Closing: this was superseded by #39471, which merged a similar fix (is_nonstandard_gguf_quant_type with rsplit). Same root cause analysis, same approach. Thanks!

@ianliuy ianliuy closed this Apr 12, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: HFValidationError when trying to run a GGUF model with quants
