
Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel#2932

Merged
githubnemo merged 22 commits into huggingface:main from ZX-ModelCloud:gptqmodel
Jan 29, 2026

Conversation

@ZX-ModelCloud
Contributor

Remove AutoGPTQ clutter and AutoGPTQ-related configs that are not worth keeping backward compatibility for.
See
huggingface/transformers#41567
huggingface/optimum#2385

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@ZX-ModelCloud ZX-ModelCloud changed the title [WIP] Fully deprecate AutoGPTQ for GPT-QModel [WIP] Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel Dec 1, 2025
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@ZX-ModelCloud ZX-ModelCloud marked this pull request as ready for review December 2, 2025 09:07
@ZX-ModelCloud ZX-ModelCloud changed the title [WIP] Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel Dec 2, 2025
@Qubitium
Contributor

Qubitium commented Dec 2, 2025

@BenjaminBossan PR is now synced to the pending Optimum/Transformers PRs. Ready for final review for this portion. All relevant tests pass when paired with the transformers companion PR, pending the GPT-QModel 5.4.4 release (later today).

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@BenjaminBossan
Member

Thanks for all the work @ZX-ModelCloud and @Qubitium. Let's wait for the transformers PR to be merged and then do the final testing on PEFT.

There is a small merge conflict now in the Dockerfile. It's just because we use conda run now, it should be easy to fix. Could you please take care?

@ZX-ModelCloud
Contributor Author

> Thanks for all the work @ZX-ModelCloud and @Qubitium. Let's wait for the transformers PR to be merged and then do the final testing on PEFT.
>
> There is a small merge conflict now in the Dockerfile. It's just because we use conda run now, it should be easy to fix. Could you please take care?

Thank you for your reply. The Dockerfile conflict has been resolved.

@BenjaminBossan
Member

@ZX-ModelCloud @Qubitium The transformers PR is merged, so I think we can proceed with this one. Before I start testing and reviewing, does this PR supersede #2917? And should we have a min version check for v5.6.0?

@Qubitium
Contributor

> @ZX-ModelCloud @Qubitium The transformers PR is merged, so I think we can proceed with this one. Before I start testing and reviewing, does this PR supersede #2917? And should we have a min version check for v5.6.0?

Yes, this PR supersedes the older PR and we should have a min version check for 5.6.0.

@BenjaminBossan
Member

> Yes, this PR supersedes the older PR and we should have a min version check for 5.6.0.

Great, let's add the version check then. I closed the other PR.

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
 def is_gptqmodel_available():
     if importlib.util.find_spec("gptqmodel") is not None:
-GPTQMODEL_MINIMUM_VERSION = packaging.version.parse("2.0.0")
+GPTQMODEL_MINIMUM_VERSION = packaging.version.parse("5.6.0")

Let's update the version.

@BenjaminBossan left a comment

Thank you for the PR. The code changes look good; it's a nice cleanup and simplification.

I did, however, have issues with testing gptqmodel:

1. pypcre

One of the dependencies, pypcre, could not be installed on my system. After some digging, I found that it tries to link to the i386 version of libpcre2 and then the build fails:

'extra_link_args': ['/usr/lib/x86_64-linux-gnu/libpcre2-8.so', '/usr/lib/i386-linux-gnu/libpcre2-8.so.0']

...

/home/name/anaconda3/envs/peft-torch2.9/compiler_compat/ld: /usr/lib/i386-linux-gnu/libpcre2-8.so.0: error adding symbols: file in wrong format

I edited the code in the setup.py to remove this link and then it worked. Is this package really needed for gptqmodel?

2. random-word

More of an indirect issue, but one of your dependencies, random-word, doesn't really seem to be maintained anymore. It pins pytest to < 9.0, which means that when I install gptqmodel, my pytest is downgraded. It would be great if this could be prevented.

3. marlin kernel

The final issue is that I get this error when trying to import gptqmodel:

ImportError: cannot import name 'gptqmodel_marlin_kernels' from 'gptqmodel.utils.marlin'

When I set GPTQMODEL_FORCE_BUILD=1, it says that "PyTorch C++ extension headers are unavailable", but I don't know why. When I try:

$ python -c "from torch.utils import cpp_extension as cpp_ext;print(cpp_ext)"
<module 'torch.utils.cpp_extension' from '/home/name/anaconda3/envs/peft-torch2.9/lib/python3.13/site-packages/torch/utils/cpp_extension.py'>

it works.

Anyway, I'm not even sure if I need marlin kernels, probably gptqmodel could be updated to not hard fail if they're not present.

LMK if you need further details to debug these issues.
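The setup.py workaround described under issue 1 amounts to filtering the i386 paths out of the linker arguments. A minimal sketch of that edit (the helper name is hypothetical, not pypcre's actual code):

```python
def filter_link_args(args):
    # Drop i386 library paths so the x86_64 linker never sees
    # wrong-format objects (mirrors the manual setup.py edit above).
    return [a for a in args if "/i386-linux-gnu/" not in a]


# The extra_link_args reported in the failing build:
extra_link_args = [
    "/usr/lib/x86_64-linux-gnu/libpcre2-8.so",
    "/usr/lib/i386-linux-gnu/libpcre2-8.so.0",
]
print(filter_link_args(extra_link_args))
```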

@Qubitium
Contributor

Qubitium commented Dec 15, 2025

> One of the dependencies, pypcre, could not be installed on my system. After some digging, I found that it tries to link to the i386 version of libpcre2 and then the build fails:
> 'extra_link_args': ['/usr/lib/x86_64-linux-gnu/libpcre2-8.so', '/usr/lib/i386-linux-gnu/libpcre2-8.so.0']
> /home/name/anaconda3/envs/peft-torch2.9/compiler_compat/ld: /usr/lib/i386-linux-gnu/libpcre2-8.so.0: error adding symbols: file in wrong format

This is a serious/blocking bug for pypcre. I am trying to get to the bottom of it. I need to check why the i386 libs are getting included for the linker in an x86_64 env, as this causes ld errors; another user also reported this same error. To reduce the pypcre install barrier, it tries to link to an existing pcre2 before falling back to source download and static compile.

> random-word

Thanks. We did not realize random-word pinned pytest. We are going to check this and remove the package dependency.

@Qubitium
Contributor

Qubitium commented Dec 15, 2025

@BenjaminBossan GPT-QModel will release 5.6.4 today with an updated dependency on pypcre 0.2.8, which fixed the multi-arch bug where both i386 and x86_64 libpcre2 exist in some desktop OS environments. Our CI hosts only use server Ubuntu packages, so we never ran into this issue. Apparently many desktop OS apps require i386 libs and install them by default. The random-word dependency has been removed as well.

@BenjaminBossan
Member

Thanks @Qubitium I will give this version a try once it's on PyPI and will let you know if the error is resolved for me.


@BenjaminBossan left a comment

@Qubitium Thanks for the note. I still had issues with the Marlin kernel. I think the problem stems from here:

https://github.com/ModelCloud/GPTQModel/blob/9a79b62ce32ad8d95c1c9dbbd25ec31869614638/gptqmodel/nn_modules/qlinear/marlin_awq.py#L21

I fixed it locally by adding gptqmodel_marlin_kernels = None to the except clause here:

https://github.com/ModelCloud/GPTQModel/blob/9a79b62ce32ad8d95c1c9dbbd25ec31869614638/gptqmodel/utils/marlin.py#L19-L22

After this "fix", I got gptqmodel working and the tests passed on my machine. Could you please take a look at this issue? Also, for testing purposes, what would I need to do to enable Marlin kernels?
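The local fix described above is the standard guard pattern for optional compiled extensions: bind the name to `None` on import failure so later code can test for availability instead of crashing. A sketch, assuming the kernels module may be absent (the name mirrors the linked code but is used here purely for illustration):

```python
try:
    import gptqmodel_marlin_kernels  # compiled CUDA extension; may not be built
except ImportError:
    # Without this fallback, later references to the name raise
    # "cannot import name ..." instead of degrading gracefully.
    gptqmodel_marlin_kernels = None


def marlin_available():
    # Callers can branch on availability rather than hard-failing at import time.
    return gptqmodel_marlin_kernels is not None
```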



@Qubitium
Contributor

Qubitium commented Dec 16, 2025

> @Qubitium Thanks for the note. I still had issues with the Marlin kernel. I think the problem stems from here:
>
> https://github.com/ModelCloud/GPTQModel/blob/9a79b62ce32ad8d95c1c9dbbd25ec31869614638/gptqmodel/nn_modules/qlinear/marlin_awq.py#L21
>
> I fixed it locally by adding gptqmodel_marlin_kernels = None to the except clause here:
>
> https://github.com/ModelCloud/GPTQModel/blob/9a79b62ce32ad8d95c1c9dbbd25ec31869614638/gptqmodel/utils/marlin.py#L19-L22
>
> After this "fix", I got gptqmodel working and the tests passed on my machine. Could you please take a look at this issue? Also, for testing purposes, what would I need to do to enable Marlin kernels?

Thanks for the pointer. Yes, the AWQ Marlin import was missing the variable init; this is fixed on main. We will get 5.6.8 pushed out today, which also fixes similar kernel import issues for macOS.

To enable the Marlin kernel:

  1. You must have an NVIDIA Ampere-class or newer GPU.
  2. Pass backend = 'marlin' in the quantize config for manual control.
  3. Marlin is auto-selected as the top-priority kernel by default.
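The selection rules described here can be sketched as a simple priority lookup. This is illustrative only: the backend names and ordering below are hypothetical (only "marlin first" is implied by the comment), and this is not gptqmodel's real dispatch code.

```python
# Hypothetical priority order; only 'marlin' coming first is implied above.
KERNEL_PRIORITY = ["marlin", "exllama", "triton", "torch"]


def select_backend(available, requested=None):
    # Manual control: honor an explicitly requested backend if it is present.
    if requested is not None:
        if requested not in available:
            raise ValueError(f"backend {requested!r} is not available")
        return requested
    # Auto selection: the first match in priority order wins.
    for name in KERNEL_PRIORITY:
        if name in available:
            return name
    raise RuntimeError("no supported kernel backend available")
```

For example, on hardware without Marlin support the same call transparently falls through to the next available kernel.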

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan
Member

BenjaminBossan commented Dec 16, 2025

@Qubitium Please call make style and then the PR should be good to be merged.

Ah wait, we shouldn't merge until after the v5 release, since we need huggingface/transformers#41567, right?

@BenjaminBossan BenjaminBossan added the wait-transformers-v5 Don't merge before transformers v5 release. label Dec 16, 2025
@Qubitium
Contributor

@BenjaminBossan make style pushed, but I'm unsure what the style error below is from, or whether it's related to this PR.

ruff check --fix src tests examples docs scripts docker
All checks passed!
ruff format src tests examples docs scripts docker
308 files left unchanged
doc-builder style src/peft tests docs/source --max_len 119
make: doc-builder: No such file or directory
make: *** [Makefile:17: style] Error 127

@BenjaminBossan left a comment

We missed a spot where gptqmodel is imported unconditionally, which leads to the CI failure. Please check.

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@Qubitium
Contributor

@BenjaminBossan We fixed uv compat. 5.6.12 is released to PyPI and prebuilt wheels are being created right now (likely an hour to complete all the wheels). https://github.com/ModelCloud/GPTQModel/actions/runs/20301283813/job/58308710953

Once the wheels are done, you can trigger CI, and both uv and pip should download the prebuilt wheel without having to build the full kernels.

@BenjaminBossan
Member

We don't use uv in PEFT, so I think there is no issue here. Still good that it's fixed.

@BenjaminBossan left a comment

Thanks for the great work here and elsewhere @ZX-ModelCloud @Qubitium. The PR looks good. As mentioned above, let's wait until transformers v5 to merge this PR.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@BenjaminBossan
Member

Not stale, still waiting for transformers v5, I'm sure it's ready SOON :)

@BenjaminBossan
Member

(failing docs build is unrelated)

@ZX-ModelCloud
Contributor Author

The transformers v5.0.0 release is now available, so this PR can be safely merged.

@githubnemo githubnemo merged commit d748229 into huggingface:main Jan 29, 2026
@githubnemo
Collaborator

Thanks! :)

githubnemo pushed a commit to githubnemo/peft that referenced this pull request Jan 30, 2026
Merging PR huggingface#2932 introduced two bugs that led to failing
CPU and GPU pipelines.

Firstly, a merge on main in the PR without a follow-up CI
run re-introduced a deleted AutoGPTQ code branch which is now
removed again.

Secondly, gptqmodel seems, just like EETQ, to need a non-isolated
build environment to find external dependencies like PyTorch
to be installed correctly. As this was not present, the nightly
slow CI wasn't run.
githubnemo added a commit that referenced this pull request Feb 2, 2026
* Fix two issues introduced in AutoGPTQ deprecation

Merging PR #2932 introduced two bugs that led to failing
CPU and GPU pipelines.

Firstly, a merge on main in the PR without a follow-up CI
run re-introduced a deleted AutoGPTQ code branch which is now
removed again.

Secondly, gptqmodel seems, just like EETQ, to need a non-isolated
build environment to find external dependencies like PyTorch
to be installed correctly. As this was not present, the nightly
slow CI wasn't run.

* Make sure that hf-doc-builder has requests

Apparently `hf-doc-builder` doesn't declare its dependency on `requests`
in `setup.py` (it does in `pyproject.toml`). For some reason
`requests` is no longer installed (another dependency probably removed
it), so we're getting CI errors.

* Fix docker build test

The docker test build failed because of escaped values passed
to the next task. Used a workaround to undo shell escaping in the JSON
string without violating security measures.

* Ignore DeprecationWarning for BPE as well

---------

Co-authored-by: nemo <git@ningu.net>

Labels

wait-transformers-v5 Don't merge before transformers v5 release.


5 participants