Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel#41567

Merged
ArthurZucker merged 87 commits intohuggingface:mainfrom
Qubitium:gptqmodel
Dec 10, 2025

Conversation

@Qubitium
Contributor

@Qubitium Qubitium commented Oct 14, 2025

Remove autogptq clutter and autogptq-related configs that are not worth adding backward compatibility for.

GPTQModel has had a slight project name change to GPT-QModel, with a hyphen (the PyPI package and import name stay the same), as we have now added awq/AutoAWQ to our repo and will be making a PR soon to address AWQ loading using GPT-QModel.

GPTQConfig has the most important changes in this PR:

# New GPTQConfig Property. Applicable for sister Peft/Optimum PRs
act_group_aware (`bool`, *optional*, defaults to `True`):
    Use GAR (group-aware activation order) during quantization. Has a measurable positive impact on quantization
    quality. Only applicable when `desc_act = False`. Will be forced to `False` when `desc_act = True`.
    
    
# Removed GPTQConfig Properties:
use_cuda_fp16
use_exllama
exllama_config

The three removed properties are all related to kernel selection. They are a hot-potato mess and a legacy of autogptq. GPT-QModel uses the existing unified backend property to select kernels. Compat code was written in 2024 to convert these three properties to backend behind the scenes, but it is no longer relevant in 2025.
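The `act_group_aware`/`desc_act` interaction documented above can be sketched with a plain dataclass. This is a hypothetical stand-in for illustration only, not the real `transformers.GPTQConfig` implementation:

```python
from dataclasses import dataclass

@dataclass
class GPTQConfigSketch:
    # Hypothetical sketch mirroring the documented property interaction;
    # NOT the actual transformers GPTQConfig class.
    bits: int = 4
    desc_act: bool = False
    act_group_aware: bool = True

    def __post_init__(self):
        # Per the PR description: GAR only applies when desc_act is False,
        # so it is forced to False whenever desc_act is True.
        if self.desc_act:
            self.act_group_aware = False

default_cfg = GPTQConfigSketch()              # act_group_aware stays True
legacy_cfg = GPTQConfigSketch(desc_act=True)  # act_group_aware forced to False
```

The point of the forced override is that a config can never silently carry an invalid combination of the two flags.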

Note:

  • Transformers/Optimum/Peft CI tests should never assert on kernel.QUANT_TYPE (str). GPT-QModel will return the best-performing kernel for the relevant module, and it may differ per module due to in/out features and other gptq/module properties in relation to device type + dtype + many other factors.
  • CI tests should only assert on kernel.QUANT_TYPE if the test specifies a specific kernel via backend selection.
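The assertion guidance above can be sketched with stand-in objects. `FakeKernel` and `check_kernel` are hypothetical helpers for illustration; only the attribute name `QUANT_TYPE` comes from the note above:

```python
class FakeKernel:
    # Stand-in for a selected quant kernel; not a real GPT-QModel class.
    def __init__(self, quant_type: str):
        self.QUANT_TYPE = quant_type

def check_kernel(kernel, pinned_backend=None):
    """Assert an exact QUANT_TYPE only when the test pinned a backend."""
    if pinned_backend is None:
        # Auto selection: the chosen kernel may differ per module, so only
        # sanity-check that some kernel was selected at all.
        return isinstance(kernel.QUANT_TYPE, str) and bool(kernel.QUANT_TYPE)
    # Backend explicitly pinned by the test: an exact match is valid.
    return kernel.QUANT_TYPE == pinned_backend

ok_auto = check_kernel(FakeKernel("marlin"))                    # any kernel is fine
ok_pinned = check_kernel(FakeKernel("exllama_v2"), "exllama_v2")  # exact match required
```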

@Rocketknight1
Member

cc @MekkCyber for quantization

@Qubitium Qubitium changed the title [WIP] Fully deprecate AutoGPTQ for GPT-QModel [WIP] Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel Nov 20, 2025
@Qubitium
Contributor Author

We have begun AutoAWQ deprecation as well.

  • Fused-module code has all been removed. AutoAWQ used to do quant-linear-level fusing, but I do not believe this is maintainable or desirable: if SGLang/vLLM adopt Transformers v5 for model loading, they will do their own auto fusing, and the quant module should not interfere with this.

  • IPEX is deprecated by Intel, and we have a new AwqTorchFused kernel (based on the same Intel TorchFused kernel used for GPTQ), so any code/unit tests for IPEX now point to the AwqTorchFused kernel.

@MekkCyber
Contributor

Hi @Qubitium ! Thanks a lot for working on this! Quick question, what do you mean by AutoAWQ being part of GPT-QModel now? Did you integrate the entire library (including the transformers dependency, like AutoAWQ does), or did you just port over the linear layers, kernels, and related components?

@Qubitium
Contributor Author

Qubitium commented Dec 9, 2025

LMK when we can merge !

We are performing a final CI run in gpt-qmodel, and once that passes I will push the 5.6.0 release to PyPI asap so this PR can be ready for merge.

@Qubitium
Contributor Author

Qubitium commented Dec 9, 2025

@SunMarc Ready. GPT-QModel v5.6.0 has been released; wheels are currently building (slowly) for all python/torch versions and may take a few hours: https://github.com/ModelCloud/GPTQModel/releases/tag/v5.6.0

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@MekkCyber MekkCyber left a comment

Thank you all for your work! I only have 2 nits

ZX-ModelCloud and others added 3 commits December 10, 2025 09:25
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@Qubitium Qubitium requested a review from MekkCyber December 10, 2025 06:59
@Qubitium Qubitium requested a review from SunMarc December 10, 2025 10:35
Contributor

@MekkCyber MekkCyber left a comment

Thank you for iterating, lgtm

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: autoawq, gptq

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41567&sha=e16e48

Member

@SunMarc SunMarc left a comment

Thanks! About gptq-model installation, would it be possible at some point to fetch the correct wheels depending on the user's setup (the way pip does for torch)? This is actually quite a barrier for new users. Or could we try to give more instructions?

@ArthurZucker ArthurZucker merged commit 8ebfd84 into huggingface:main Dec 10, 2025
20 of 23 checks passed
@BenjaminBossan
Member

Just to be sure, this will be part of transformers v5?

@Qubitium
Contributor Author

Qubitium commented Dec 11, 2025

Thanks! About gptq-model installation, would it be possible at some point to fetch the correct wheels depending on the user's setup (the way pip does for torch)? This is actually quite a barrier for new users. Or could we try to give more instructions?

Right now the pip install script already auto-downloads the precompiled gpt-qmodel wheel (125 MB) from GitHub releases if the environment's python/torch/cuda versions match. But yeah, it is painful to install if you have to compile from source, which takes 10-20 minutes.
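The decision described above (download a prebuilt wheel only when the environment matches a published build, otherwise compile from source) can be sketched roughly as follows. All names and version tuples here are illustrative assumptions, not the actual install script:

```python
# Hypothetical sketch of the wheel-selection decision; the real install
# script and its build matrix may differ.
def pick_install_strategy(env, available_builds):
    """env and available_builds entries are (python, torch, cuda) tuples."""
    if env in available_builds:
        # Exact environment match: fetch the precompiled wheel.
        return "download-prebuilt-wheel"
    # No matching build published: fall back to a 10-20 minute source build.
    return "compile-from-source"

# Illustrative build matrix (made-up versions).
builds = {("3.11", "2.8.0", "12.4"), ("3.12", "2.8.0", "12.4")}
fast = pick_install_strategy(("3.11", "2.8.0", "12.4"), builds)
slow = pick_install_strategy(("3.11", "2.9.1", "12.4"), builds)
```

This also illustrates the failure mode discussed below: a torch version missing from the build matrix silently forces the slow source-compile path.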

@SunMarc
Member

SunMarc commented Dec 11, 2025

Just to be sure, this will be part of transformers v5?

Yes !

@SunMarc
Member

SunMarc commented Dec 11, 2025

Right now the pip install script already auto-downloads the precompiled gpt-qmodel wheel (125 MB) from GitHub releases if the environment's python/torch/cuda versions match. But yeah, it is painful to install if you have to compile from source, which takes 10-20 minutes.

Okay, I must have had mismatched versions, hence it wasn't downloading the pre-compiled wheels. Is there a way to emit a warning when that happens?

@Qubitium
Contributor Author

Okay, I must have had mismatched versions, hence it wasn't downloading the pre-compiled wheels. Is there a way to emit a warning when that happens?

We missed building the torch 2.9.1 wheel for 5.6.0, so that is likely the cause. Will push 5.6.2 to resolve this plus other misc setup issues that users have reported, and to log a warning when the setup cannot match a downloadable wheel.

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* fully deprecate autogptq

* remove use_cuda and use_exllama toggles are fully deprecated in gptqmodel

* format

* add `act_group_aware` property

* fix QUANT_TYPE assert

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* format

* mod awq import

* remove autoawq fuse support

* remove remove autoawq.config fuse

* cleanup

* remove awq fuse test

* fix import

* use gptqmodel

* cleanup

* remove get_modules_to_fuse

* mod require_auto_awq -> require_gptqmodel

* convert vertion to checkpoint_format

* check is_gptqmodel_available

* revert modules_to_not_convert

* pass bits, sym, desc_act

* fix awqconfig init

* fix wrong args

* fix ipex

* mod ipex version check

* cleanup

* fix awq_linear

* remove self.exllama_config = exllama_config

* cleanuo

* Revert "cleanuo"

This reverts commit 90019c6.

* update is_trainable

* cleanup

* remove fused

* call hf_select_quant_linear_v2()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Remove the "version" field from AwqConfig

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Add torch_fused inferencefix test_gptq test

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix test_awq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix test_awq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix AwqConfig

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* call hf_select_quant_linear_v2()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* remove auto_awq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix typo

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Compatible with legacy field: checkpoint_format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Compatible with legacy field: checkpoint_format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* CLEANUP

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* update test_awq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix get_modules_to_not_convert()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix test_awq.py::AwqTest::test_quantized_model_exllama

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Apply style fixes

* test_awq.py added EXPECTED_OUTPUT

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* update test_gptq.py

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix test_awq.py::AwqTest::test_save_pretrained

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* use assertEqual() instead of assertTrue()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix test_quantized_layers_class()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* remove ExllamaV1 Test

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix get_modules_to_not_convert()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* added EXPECTED_OUTPUT

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* remove ExllamaV1 Test

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* add AwqBackend.AUTO_TRAINABLE

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Update docs/source/zh/llm_tutorial.md

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* revert temporarily fix

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

---------

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: LRL2-ModelCloud <lrl2@modelcloud.ai>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
githubnemo added a commit to huggingface/peft that referenced this pull request Jan 29, 2026
Remove autogptq clutter and autogptq related configs that are not worth adding backward compat.

For reference, see
- huggingface/transformers#41567
- huggingface/optimum#2385


* fix gptq test

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* remove auto_gptq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* call hf_select_quant_linear_v2()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* remove auto_awq

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* cleanup

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* format

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* fix PeftAwqGPUTests

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* update GPTQMODEL_MINIMUM_VERSION to 5.6.0

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* update GPTQMODEL_MINIMUM_VERSION

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Update minimum version requirement for gptqmodel

* style

* call is_gptqmodel_available()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* call is_gptqmodel_available()

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* Update minimum version for gptqmodel to 5.6.12

* When hf_device_map does not exist, infer the device_map

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

* cleanup

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>

---------

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>