Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel #41567
ArthurZucker merged 87 commits into huggingface:main
Conversation
cc @MekkCyber for quantization
We have begun AutoAWQ deprecation as well.
Hi @Qubitium! Thanks a lot for working on this! Quick question: what do you mean by AutoAWQ being part of GPT-QModel now? Did you integrate the entire library (including the transformers dependency, like AutoAWQ does), or did you just port over the linear layers, kernels, and related components?
We are performing the final CI run in gpt-qmodel, and once that passes I will push the 5.6.0 release ASAP to PyPI so this PR can be ready for merge.
@SunMarc Ready. GPT-QModel v5.6.0 has been released, with wheels currently building slowly for all the Python/Torch versions (may take a few hours): https://github.com/ModelCloud/GPTQModel/releases/tag/v5.6.0
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
MekkCyber left a comment:
Thank you all for your work! I only have two nits.
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
MekkCyber left a comment:
Thank you for iterating, LGTM.
[For maintainers] Suggested jobs to run (before merge): run-slow: autoawq, gptq
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41567&sha=e16e48
Just to be sure, this will be part of transformers v5?
Right now the pip install script already auto-downloads the precompiled gpt-qmodel wheel (125 MB) from GitHub releases, if the env's Python/Torch/CUDA versions match. But yeah, it is painful to install if you have to compile from source, which takes 10-20 minutes.
Yes!
Okay, I must have mismatched values, hence it wasn't downloading the pre-compiled wheels. Is there a way to emit a warning when that happens?
We missed building the torch 2.9.1 wheel for 5.6.0, so that is likely the cause. We will push 5.6.2 to resolve this and other misc setup issues users have reported, plus a log warning when setup cannot match a downloadable wheel.
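To illustrate the environment check discussed above, here is a minimal sketch of how a setup script might decide whether a prebuilt wheel matches the running environment and warn before falling back to a source build. The tag format, the `AVAILABLE_WHEELS` set, and all function names are illustrative assumptions, not GPT-QModel's actual code.

```python
import logging
from typing import Optional

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("setup")

def wheel_tag(py: str, torch: str, cuda: str) -> str:
    """Build a lookup key for a prebuilt wheel, e.g. 'py311-torch2.8-cu124'."""
    return f"py{py}-torch{torch}-cu{cuda}"

# Tags a hypothetical release actually shipped wheels for.
AVAILABLE_WHEELS = {"py311-torch2.8-cu124", "py312-torch2.8-cu124"}

def find_prebuilt_wheel(py: str, torch: str, cuda: str) -> Optional[str]:
    tag = wheel_tag(py, torch, cuda)
    if tag in AVAILABLE_WHEELS:
        return tag
    # The warning requested in the thread: tell the user why they are
    # about to fall back to a 10-20 minute source compile.
    log.warning("no prebuilt wheel for %s; falling back to source build", tag)
    return None
```

For example, `find_prebuilt_wheel("311", "2.9.1", "124")` would log a warning and return `None`, mirroring the missing torch 2.9.1 wheel described above.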
* fully deprecate autogptq
* remove use_cuda and use_exllama toggles (fully deprecated in gptqmodel)
* format
* add `act_group_aware` property
* fix QUANT_TYPE assert
* format
* mod awq import
* remove autoawq fuse support
* remove autoawq.config fuse
* cleanup
* remove awq fuse test
* fix import
* use gptqmodel
* cleanup
* remove get_modules_to_fuse
* mod require_auto_awq -> require_gptqmodel
* convert version to checkpoint_format
* check is_gptqmodel_available
* revert modules_to_not_convert
* pass bits, sym, desc_act
* fix awqconfig init
* fix wrong args
* fix ipex
* mod ipex version check
* cleanup
* fix awq_linear
* remove self.exllama_config = exllama_config
* cleanuo
* Revert "cleanuo" (reverts commit 90019c6)
* update is_trainable
* cleanup
* remove fused
* call hf_select_quant_linear_v2()
* Remove the "version" field from AwqConfig
* Add torch_fused inference
* fix test_gptq test
* fix test_awq
* fix test_awq
* fix AwqConfig
* call hf_select_quant_linear_v2()
* remove auto_awq
* fix typo
* Compatible with legacy field: checkpoint_format
* Compatible with legacy field: checkpoint_format
* format
* CLEANUP
* update test_awq
* fix get_modules_to_not_convert()
* fix test_awq.py::AwqTest::test_quantized_model_exllama
* Apply style fixes
* test_awq.py: added EXPECTED_OUTPUT
* update test_gptq.py
* fix test_awq.py::AwqTest::test_save_pretrained
* use assertEqual() instead of assertTrue()
* fix test_quantized_layers_class()
* remove ExllamaV1 test
* format
* fix get_modules_to_not_convert()
* added EXPECTED_OUTPUT
* remove ExllamaV1 test
* add AwqBackend.AUTO_TRAINABLE
* Update docs/source/zh/llm_tutorial.md (Co-authored-by: Mohamed Mekkouri)
* revert temporary fix

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: LRL2-ModelCloud <lrl2@modelcloud.ai>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Remove autogptq clutter and autogptq-related configs that are not worth adding backward compat for. For reference, see:

- huggingface/transformers#41567
- huggingface/optimum#2385

* fix gptq test
* remove auto_gptq
* call hf_select_quant_linear_v2()
* remove auto_awq
* cleanup
* format
* fix PeftAwqGPUTests
* update GPTQMODEL_MINIMUM_VERSION to 5.6.0
* update GPTQMODEL_MINIMUM_VERSION
* Update minimum version requirement for gptqmodel
* style
* call is_gptqmodel_available()
* call is_gptqmodel_available()
* Update minimum version for gptqmodel to 5.6.12
* When hf_device_map does not exist, infer the device_map
* cleanup

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
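Several of the commits in these lists deal with accepting the legacy config field and mapping it onto `checkpoint_format` ("convert version to checkpoint_format", "Compatible with legacy field: checkpoint_format"). A hedged sketch of that kind of config shim follows; the field names come from the commit titles, but the precedence logic is an illustrative assumption, not the actual implementation.

```python
def normalize_quant_config(cfg: dict) -> dict:
    """Map the legacy 'version' field onto 'checkpoint_format' (illustrative).

    If both fields are present, the newer 'checkpoint_format' is assumed
    to win and the legacy field is dropped.
    """
    cfg = dict(cfg)  # do not mutate the caller's dict
    if "checkpoint_format" not in cfg and "version" in cfg:
        cfg["checkpoint_format"] = cfg.pop("version")
    else:
        cfg.pop("version", None)
    return cfg
```

For example, `normalize_quant_config({"bits": 4, "version": "gemm"})` yields `{"bits": 4, "checkpoint_format": "gemm"}`, so older serialized configs keep loading after the rename.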
Remove autogptq clutter and autogptq-related configs that are not worth adding backward compat for.

GPTQModel has a slight project name change to GPT-QModel (the PyPI package and import name stay the same), as we have now added awq/AutoAWQ into our repo and will be making a PR soon to address AWQ loading using GPT-QModel.

`GPTQConfig` has the most important changes in this PR: the 3 removed properties are all related to kernel selection. These 3 are a hot-potato mess and legacy from autogptq. GPT-QModel uses the unified `backend` property (existing) to select kernels. Compat code was written in 2024 to convert these 3 properties to `backend` behind the scenes, but it is no longer relevant for 2025.

Note on `kernel.QUANT_TYPE` (str): GPT-QModel will return the best-performing kernel for the relevant module, and it may be different per module due to in/out features and other gptq/module properties, in relation to device type + dtype + many other factors. Tests should only assert `kernel.QUANT_TYPE` if the test pins a specific kernel via `backend` selection.
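The per-module kernel choice described above can be sketched as follows. This is a toy illustration of the idea, not GPT-QModel's actual selector (the real `hf_select_quant_linear_v2()` weighs device type, dtype, sym/desc_act, and more); the shape-based rule and all names here are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Module:
    """Minimal stand-in for a quantized linear layer's metadata."""
    name: str
    in_features: int
    out_features: int

def select_kernel(module: Module, backend: str = "auto") -> str:
    """Return a kernel QUANT_TYPE string for one module (hypothetical rules)."""
    if backend != "auto":
        # An explicit backend pins the kernel, so a test asserting
        # QUANT_TYPE is safe in this case.
        return backend
    # Under "auto", the choice can differ per module based on its shapes,
    # which is why asserting QUANT_TYPE without pinning a backend is fragile.
    if module.in_features % 256 == 0 and module.out_features % 256 == 0:
        return "exllama_v2"
    return "torch_fused"

layers = [Module("q_proj", 4096, 4096), Module("lm_head", 4096, 32100)]
print({m.name: select_kernel(m) for m in layers})
```

Two modules in the same model can thus end up on different kernels under `backend="auto"`, which is exactly why the note above says tests should only assert `kernel.QUANT_TYPE` when a specific backend is requested.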