[Migration] Migrate GGUF quantization support to plugin #39612
Isotr0py wants to merge 34 commits into
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Documentation preview: https://vllm--39612.org.readthedocs.build/en/39612/
Code Review
This pull request removes hardcoded GGUF support from the core vLLM codebase and replaces it with a more extensible ModelFormatHandler architecture. The changes involve deleting GGUF-specific CUDA kernels, documentation, and tests, while refactoring model loaders and layers (Linear, MoE, Embedding) to use generic quantization configuration hooks. I have no feedback to provide.
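To make the idea of a pluggable format handler concrete, here is a minimal sketch of a handler registry. Only the name ModelFormatHandler comes from the review summary above; the method names (`matches`, `quantization`), the registry helpers, and the `.gguf` suffix check are assumptions made for illustration and are not taken from the vLLM code.

```python
from typing import List, Optional, Protocol


class ModelFormatHandler(Protocol):
    """Hypothetical interface; the method names are illustrative only."""

    name: str

    def matches(self, model_ref: str) -> bool:
        """Return True if this handler can load the given model reference."""
        ...

    def quantization(self) -> Optional[str]:
        """Return the quantization method this format implies, if any."""
        ...


# Registry that a plugin could populate at import time (assumed design).
_HANDLERS: List[ModelFormatHandler] = []


def register_handler(handler: ModelFormatHandler) -> None:
    _HANDLERS.append(handler)


def resolve_handler(model_ref: str) -> Optional[ModelFormatHandler]:
    # The first registered handler that claims the reference wins.
    for handler in _HANDLERS:
        if handler.matches(model_ref):
            return handler
    return None


class GGUFHandler:
    """Toy handler standing in for what an external plugin might register."""

    name = "gguf"

    def matches(self, model_ref: str) -> bool:
        return model_ref.endswith(".gguf")

    def quantization(self) -> Optional[str]:
        return "gguf"


register_handler(GGUFHandler())
```

With a design like this, core code only calls `resolve_handler(...)` and never needs to import anything GGUF-specific, which is the kind of decoupling the review summary describes.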
This pull request has merge conflicts that must be resolved before it can be |
Let's include them in this PR
Hi @Isotr0py, the pre-commit checks have failed. Please run:
uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
… in remote GGUF model IDs (#19)

The quant_type in a repo_id:quant_type reference is a GGUF file type (LlamaFileType), not a tensor type (GGMLQuantizationType). File types such as IQ2_M, IQ3_M, IQ3_XS and MXFP4_MOE have no GGMLQuantizationType member, so is_valid_gguf_quant_type() rejected them and the whole reference was treated as a plain repo id, failing with "Repo id must use alphanumeric chars...".

Accept either enum (LlamaFileType members are prefixed MOSTLY_) so these file-type-only quants are recognized; the existing extended-suffix handling (e.g. Q4_K_M -> Q4_K) is preserved.

Ports the fix approved at vllm-project/vllm#44218 to the plugin, as requested by the maintainer since GGUF support is migrating here (vllm-project/vllm#39612).

Fixes vllm-project/vllm#42734

Signed-off-by: Ting Sun <suntcrick@gmail.com>
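The check described in the commit message can be sketched roughly as follows. This is not the plugin's actual code: the two enums are small stand-ins for the ones in the gguf Python package (member values are placeholders), the suffix list is an assumption, and `split_gguf_reference` is a hypothetical helper added only to show how the result might be used.

```python
from enum import Enum, auto
from typing import Optional, Tuple


# Stand-in subsets of the real enums from the gguf package; values are
# placeholders and only a few members are listed.
class GGMLQuantizationType(Enum):
    F16 = auto()
    Q4_0 = auto()
    Q4_K = auto()
    Q8_0 = auto()


class LlamaFileType(Enum):
    MOSTLY_IQ2_M = auto()
    MOSTLY_IQ3_M = auto()
    MOSTLY_IQ3_XS = auto()
    MOSTLY_MXFP4_MOE = auto()


# Assumed set of "extended" suffixes, e.g. Q4_K_M -> Q4_K.
_EXTENDED_SUFFIXES = {"S", "M", "L"}


def is_valid_gguf_quant_type(quant_type: str) -> bool:
    # 1. Tensor types such as Q4_K or Q8_0.
    if quant_type in GGMLQuantizationType.__members__:
        return True
    # 2. File types, whose enum members carry a MOSTLY_ prefix.
    if f"MOSTLY_{quant_type}" in LlamaFileType.__members__:
        return True
    # 3. Extended suffix: strip a trailing _S/_M/_L and retry as a tensor type.
    base, sep, suffix = quant_type.rpartition("_")
    return bool(sep) and suffix in _EXTENDED_SUFFIXES and (
        base in GGMLQuantizationType.__members__
    )


def split_gguf_reference(model: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split 'repo_id:quant_type' if the suffix is valid."""
    repo_id, sep, quant_type = model.rpartition(":")
    if sep and is_valid_gguf_quant_type(quant_type):
        return repo_id, quant_type
    # Otherwise treat the whole string as a plain repo id.
    return model, None
```

Under these assumptions, `split_gguf_reference("org/repo:IQ2_M")` separates the repo id from the quant type instead of passing the whole string on as a repo id, which is the failure mode the commit message describes.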
Purpose
After this PR, GGUF support moves to https://github.com/vllm-project/vllm-gguf-plugin. GGUF models continue to work as before once the plugin is installed.
Test Plan
Test Result