Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel#41567

Merged
ArthurZucker merged 87 commits intohuggingface:mainfrom
Qubitium:gptqmodel
Dec 10, 2025

Conversation

@Qubitium
Contributor

@Qubitium Qubitium commented Oct 14, 2025

Remove autogptq clutter and autogptq-related configs that are not worth adding backward compatibility for.

GPTQModel has had a slight project name change to GPT-QModel, with a hyphen (the PyPI package and import name stay the same), as we have now added awq/AutoAWQ to our repo and will be making a PR soon to address AWQ loading using GPT-QModel.

GPTQConfig has the most important changes in this PR:

# New GPTQConfig Property. Applicable for sister Peft/Optimum PRs
act_group_aware (`bool`, *optional*, defaults to `True`):
    Use GAR (group-aware activation order) during quantization. Has a measurable positive impact on quantization
    quality. Only applicable when `desc_act = False`. Will be forced to `False` when `desc_act = True`.
    
    
# Removed GPTQConfig Properties:
use_cuda_fp16
use_exllama
exllama_config

The three removed properties are all related to kernel selection. They are a hot-potato mess and a legacy of autogptq. GPT-QModel uses the existing unified backend property to select kernels. Compat code was written in 2024 to convert these three properties to backend behind the scenes, but it is no longer relevant in 2025.
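The `act_group_aware`/`desc_act` interaction documented above can be sketched with a plain dataclass. This is a hypothetical stand-in for illustration only, not the real `transformers.GPTQConfig` implementation:

```python
from dataclasses import dataclass

@dataclass
class GPTQConfigSketch:
    # Hypothetical sketch mirroring the documented property interaction;
    # NOT the actual transformers GPTQConfig class.
    bits: int = 4
    desc_act: bool = False
    act_group_aware: bool = True

    def __post_init__(self):
        # Per the PR description: GAR only applies when desc_act is False,
        # so it is forced to False whenever desc_act is True.
        if self.desc_act:
            self.act_group_aware = False

default_cfg = GPTQConfigSketch()              # act_group_aware stays True
legacy_cfg = GPTQConfigSketch(desc_act=True)  # act_group_aware forced to False
```

The point of the forced override is that a config can never silently carry an invalid combination of the two flags.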

Note:

  • Transformers/Optimum/Peft CI tests should never assert on kernel.QUANT_TYPE (str). GPT-QModel will return the best-performing kernel for the relevant module, and it may differ per module due to in/out features and other gptq/module properties in relation to device type + dtype + many other factors.
  • CI tests should only assert on kernel.QUANT_TYPE if the test specifies a specific kernel via backend selection.
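The assertion guidance above can be sketched with stand-in objects. `FakeKernel` and `check_kernel` are hypothetical helpers for illustration; only the attribute name `QUANT_TYPE` comes from the note above:

```python
class FakeKernel:
    # Stand-in for a selected quant kernel; not a real GPT-QModel class.
    def __init__(self, quant_type: str):
        self.QUANT_TYPE = quant_type

def check_kernel(kernel, pinned_backend=None):
    """Assert an exact QUANT_TYPE only when the test pinned a backend."""
    if pinned_backend is None:
        # Auto selection: the chosen kernel may differ per module, so only
        # sanity-check that some kernel was selected at all.
        return isinstance(kernel.QUANT_TYPE, str) and bool(kernel.QUANT_TYPE)
    # Backend explicitly pinned by the test: an exact match is valid.
    return kernel.QUANT_TYPE == pinned_backend

ok_auto = check_kernel(FakeKernel("marlin"))                    # any kernel is fine
ok_pinned = check_kernel(FakeKernel("exllama_v2"), "exllama_v2")  # exact match required
```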

@Rocketknight1
Member

cc @MekkCyber for quantization

@Qubitium Qubitium changed the title [WIP] Fully deprecate AutoGPTQ for GPT-QModel [WIP] Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel Nov 20, 2025
@Qubitium
Contributor Author

We have begun AutoAWQ deprecation as well.

  • Fused-module code has all been removed. AutoAWQ used to do quant-linear-level fusing, but I do not believe this is maintainable or desirable: if SGLang/vLLM adopt Transformers v5 for model loading, they will do their own auto fusing, and the quant module should not interfere with this.

  • IPEX is deprecated by Intel, and we have a new AwqTorchFused kernel (based on the same Intel TorchFused kernel used for GPTQ), so any code/unit tests for IPEX now point to the AwqTorchFused kernel.

@MekkCyber
Contributor

Hi @Qubitium ! Thanks a lot for working on this! Quick question, what do you mean by AutoAWQ being part of GPT-QModel now? Did you integrate the entire library (including the transformers dependency, like AutoAWQ does), or did you just port over the linear layers, kernels, and related components?

@Qubitium
Contributor Author

Qubitium commented Dec 9, 2025

LMK when we can merge !

We are performing a final CI run in gpt-qmodel, and once that passes I will push the 5.6.0 release to PyPI asap so this PR can be ready for merge.

@Qubitium
Contributor Author

Qubitium commented Dec 9, 2025

@SunMarc Ready. GPT-QModel v5.6.0 has been released; wheels are currently building (slowly) for all python/torch versions and may take a few hours: https://github.com/ModelCloud/GPTQModel/releases/tag/v5.6.0

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@MekkCyber MekkCyber left a comment

Thank you all for your work! I only have 2 nits

ZX-ModelCloud and others added 3 commits December 10, 2025 09:25
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@Qubitium Qubitium requested a review from MekkCyber December 10, 2025 06:59
@Qubitium Qubitium requested a review from SunMarc December 10, 2025 10:35
Contributor

@MekkCyber MekkCyber left a comment

Thank you for iterating, lgtm

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: autoawq, gptq

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41567&sha=e16e48

Member

@SunMarc SunMarc left a comment

Thanks! About gptq-model installation, would it be possible at some point to fetch the correct wheels depending on the user's setup (the way pip does for torch)? This is actually quite a barrier for new users. Or could we try to give more instructions?

@ArthurZucker ArthurZucker merged commit 8ebfd84 into huggingface:main Dec 10, 2025
20 of 23 checks passed
@BenjaminBossan
Member

Just to be sure, this will be part of transformers v5?

@Qubitium
Contributor Author

Qubitium commented Dec 11, 2025

Thanks! About gptq-model installation, would it be possible at some point to fetch the correct wheels depending on the user's setup (the way pip does for torch)? This is actually quite a barrier for new users. Or could we try to give more instructions?

Right now the pip install script already auto-downloads the precompiled gpt-qmodel wheel (125 MB) from GitHub releases if the environment's python/torch/cuda versions match. But yeah, it is painful to install if you have to compile from source, which takes 10-20 minutes.
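The decision described above (download a prebuilt wheel only when the environment matches a published build, otherwise compile from source) can be sketched roughly as follows. All names and version tuples here are illustrative assumptions, not the actual install script:

```python
# Hypothetical sketch of the wheel-selection decision; the real install
# script and its build matrix may differ.
def pick_install_strategy(env, available_builds):
    """env and available_builds entries are (python, torch, cuda) tuples."""
    if env in available_builds:
        # Exact environment match: fetch the precompiled wheel.
        return "download-prebuilt-wheel"
    # No matching build published: fall back to a 10-20 minute source build.
    return "compile-from-source"

# Illustrative build matrix (made-up versions).
builds = {("3.11", "2.8.0", "12.4"), ("3.12", "2.8.0", "12.4")}
fast = pick_install_strategy(("3.11", "2.8.0", "12.4"), builds)
slow = pick_install_strategy(("3.11", "2.9.1", "12.4"), builds)
```

This also illustrates the failure mode discussed below: a torch version missing from the build matrix silently forces the slow source-compile path.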

@SunMarc
Member

SunMarc commented Dec 11, 2025

Just to be sure, this will be part of transformers v5?

Yes !

@SunMarc
Member

SunMarc commented Dec 11, 2025

Right now the pip install script already auto-downloads the precompiled gpt-qmodel wheel (125 MB) from GitHub releases if the environment's python/torch/cuda versions match. But yeah, it is painful to install if you have to compile from source, which takes 10-20 minutes.

Okay, I must have had mismatched versions, hence it wasn't downloading the pre-compiled wheels. Is there a way to emit a warning when that happens?

@Qubitium
Contributor Author

Okay, I must have had mismatched versions, hence it wasn't downloading the pre-compiled wheels. Is there a way to emit a warning when that happens?

We missed building the torch 2.9.1 wheel for 5.6.0, so that is likely the cause. Will push 5.6.2 to resolve this plus other misc setup issues that users have reported, and to log a warning when the setup cannot match a downloadable wheel.

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* fully deprecate autogptq

* remove use_cuda and use_exllama toggles are fully deprecated in gptqmodel

* format

* add `act_group_aware` property

* fix QUANT_TYPE assert

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* format

* mod awq import

* remove autoawq fuse support

* remove remove autoawq.config fuse

* cleanup

* remove awq fuse test

* fix import

* use gptqmodel

* cleanup

* remove get_modules_to_fuse

* mod require_auto_awq -> require_gptqmodel

* convert vertion to checkpoint_format

* check is_gptqmodel_available

* revert modules_to_not_convert

* pass bits, sym, desc_act

* fix awqconfig init

* fix wrong args

* fix ipex

* mod ipex version check

* cleanup

* fix awq_linear

* remove self.exllama_config = exllama_config

* cleanuo

* Revert "cleanuo"

This reverts commit 90019c6.

* update is_trainable

* cleanup

* remove fused

* call hf_select_quant_linear_v2()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Remove the "version" field from AwqConfig

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Add torch_fused inferencefix test_gptq test

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix test_awq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix test_awq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix AwqConfig

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* call hf_select_quant_linear_v2()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* remove auto_awq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix typo

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Compatible with legacy field: checkpoint_format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Compatible with legacy field: checkpoint_format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* CLEANUP

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* update test_awq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix get_modules_to_not_convert()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix test_awq.py::AwqTest::test_quantized_model_exllama

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Apply style fixes

* test_awq.py added EXPECTED_OUTPUT

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* update test_gptq.py

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix test_awq.py::AwqTest::test_save_pretrained

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* use assertEqual() instead of assertTrue()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix test_quantized_layers_class()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* remove ExllamaV1 Test

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix get_modules_to_not_convert()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* added EXPECTED_OUTPUT

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* remove ExllamaV1 Test

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* add AwqBackend.AUTO_TRAINABLE

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Update docs/source/zh/llm_tutorial.md

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* revert temporarily fix

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

---------

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: LRL2-ModelCloud <lrl2@modelcloud.ai>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
githubnemo added a commit to huggingface/peft that referenced this pull request Jan 29, 2026
Remove autogptq clutter and autogptq related configs that are not worth adding backward compat.

For reference, see
- huggingface/transformers#41567
- huggingface/optimum#2385


* fix gptq test

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* remove auto_gptq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* call hf_select_quant_linear_v2()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* remove auto_awq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* cleanup

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix PeftAwqGPUTests

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* update GPTQMODEL_MINIMUM_VERSION to 5.6.0

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* update GPTQMODEL_MINIMUM_VERSION

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Update minimum version requirement for gptqmodel

* style

* call is_gptqmodel_available()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* call is_gptqmodel_available()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Update minimum version for gptqmodel to 5.6.12

* When hf_device_map does not exist, infer the device_map

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* cleanup

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

---------

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>