Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel #2932
githubnemo merged 22 commits into huggingface:main from
Conversation
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@BenjaminBossan PR is now synced to the pending Optimum/Transformers PRs. Ready for final review for this portion. All relevant tests are passing when paired with the transformers companion PR and the pending gpt-qmodel 5.4.4 release (later today).
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Thanks for all the work @ZX-ModelCloud and @Qubitium. Let's wait for the transformers PR to be merged and then do the final testing on PEFT. There is a small merge conflict now in the Dockerfile. It's just because we use …
Thank you for your reply. The Dockerfile conflict has been resolved.
@ZX-ModelCloud @Qubitium The transformers PR is merged, so I think we can proceed with this one. Before I start testing and reviewing, does this PR supersede #2917? And should we have a min version check for v5.6.0?
Yes, this PR supersedes the older PR, and we should have a min version check for 5.6.0.
Great, let's add the version check then. I closed the other PR.
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
src/peft/import_utils.py
Outdated
  def is_gptqmodel_available():
      if importlib.util.find_spec("gptqmodel") is not None:
-         GPTQMODEL_MINIMUM_VERSION = packaging.version.parse("2.0.0")
+         GPTQMODEL_MINIMUM_VERSION = packaging.version.parse("5.6.0")
Let's update the version.
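For context, the gate being bumped here can be sketched as a generic helper; this is a minimal sketch of the pattern, not PEFT's actual implementation, and the helper name and error wording are illustrative:

```python
# Minimal sketch of a version-gated availability check, modeled on the
# is_gptqmodel_available() pattern shown in the diff above. The generic
# helper name and error wording are illustrative, not PEFT's actual code.
import importlib.metadata
import importlib.util

from packaging import version


def is_package_available(name: str, minimum: str) -> bool:
    """Return True if `name` is importable and meets the minimum version."""
    if importlib.util.find_spec(name) is None:
        return False
    installed = version.parse(importlib.metadata.version(name))
    if installed < version.parse(minimum):
        raise ImportError(
            f"Found an incompatible version of {name} (installed: {installed}, "
            f"required: >= {minimum})"
        )
    return True
```

With this shape, a call like `is_package_available("gptqmodel", "5.6.0")` returns False when the package is absent and only raises when an incompatible version is actually installed.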
Thank you for the PR. The code changes look good, it's a nice clean up and simplification.
I did, however, have issues with testing gptqmodel:
1. pypcre
One of the dependencies, pypcre, could not be installed on my system. After some digging, I found that it tries to link against the i386 version of libpcre2, and the build then fails:
'extra_link_args': ['/usr/lib/x86_64-linux-gnu/libpcre2-8.so', '/usr/lib/i386-linux-gnu/libpcre2-8.so.0']
...
/home/name/anaconda3/envs/peft-torch2.9/compiler_compat/ld: /usr/lib/i386-linux-gnu/libpcre2-8.so.0: error adding symbols: file in wrong format
I edited the code in setup.py to remove this link, and then it worked. Is this package really needed for gptqmodel?
2. random-word
More of an indirect issue, but one of your dependencies, random-word, doesn't really seem to be maintained anymore. It pins pytest to < 9.0, which means that when I install gptqmodel, my pytest is downgraded. It would be great if this could be prevented.
3. marlin kernel
The final issue is that I get this error when trying to import gptqmodel:
ImportError: cannot import name 'gptqmodel_marlin_kernels' from 'gptqmodel.utils.marlin'
When I set GPTQMODEL_FORCE_BUILD=1, it says that "PyTorch C++ extension headers are unavailable", but I don't know why. When I try:
$ python -c "from torch.utils import cpp_extension as cpp_ext;print(cpp_ext)"
<module 'torch.utils.cpp_extension' from '/home/name/anaconda3/envs/peft-torch2.9/lib/python3.13/site-packages/torch/utils/cpp_extension.py'>
it works.
Anyway, I'm not even sure if I need marlin kernels, probably gptqmodel could be updated to not hard fail if they're not present.
LMK if you need further details to debug these issues.
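The setup.py workaround described under issue 1 amounts to filtering the linker arguments down to the build machine's architecture. A hypothetical sketch of that idea (this is not pypcre's actual code; the helper and the architecture list are assumptions):

```python
# Hypothetical workaround for the pypcre issue above: drop linker arguments
# that point at libraries built for a foreign architecture (e.g. the i386
# libpcre2 entry that broke this x86_64 build). Not pypcre's actual code.
KNOWN_ARCHES = {"i386", "i686", "x86_64", "aarch64"}


def filter_link_args(link_args, machine):
    """Keep only link args that don't reference a foreign architecture."""
    foreign = KNOWN_ARCHES - {machine}
    return [arg for arg in link_args if not any(a in arg for a in foreign)]
```

Applied to the `extra_link_args` shown above, this would keep the x86_64 entry and drop the i386 one.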
This is a serious/blocking bug for pypcre. I am trying to get to the bottom of this. Need to check why the i386 libs are getting included for the linker in an x86_64 env, as this will cause build failures.
Thanks. We did not realize …
@BenjaminBossan gpt-qmodel will release 5.6.4 today, with an updated dependency on pypcre 0.2.8, which fixed the i386 linking issue.
Thanks @Qubitium I will give this version a try once it's on PyPI and will let you know if the error is resolved for me.
BenjaminBossan
left a comment
@Qubitium Thanks for the note. I still had issues with the Marlin kernel. I think the problem stems from here:
I fixed it locally by adding gptqmodel_marlin_kernels = None to the except clause here:
After this "fix", I got gptqmodel working and the tests passed on my machine. Could you please take a look at this issue? Also, for testing purposes, what would I need to do to enable Marlin kernels?
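The local fix described above is the standard optional-import guard: bind the name to None when the compiled extension can't be imported, rather than letting the ImportError escape. A minimal sketch (the extension module name here is hypothetical, not gptqmodel's actual layout):

```python
# Sketch of the import-guard fix described above: if an optional compiled
# kernel extension fails to import, bind the name to None instead of letting
# the ImportError escape. `_marlin_kernels_ext` is a hypothetical module name.
try:
    import _marlin_kernels_ext as gptqmodel_marlin_kernels  # hypothetical
except ImportError:
    gptqmodel_marlin_kernels = None


def marlin_kernels_available() -> bool:
    """Callers branch on availability instead of crashing at import time."""
    return gptqmodel_marlin_kernels is not None
```

Code paths that need Marlin can then check `marlin_kernels_available()` and fall back to another kernel when it returns False.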
Thanks for the pointer. Yes, the AWQ marlin import was missing the variable init; this is fixed on main. We will get 5.6.8 pushed out today, which also fixes similar kernel import issues for macOS. To enable the marlin kernel:
@Qubitium Ah wait, we shouldn't merge until after the v5 release, since we need huggingface/transformers#41567, right?
@BenjaminBossan
BenjaminBossan
left a comment
We missed a spot where gptqmodel is imported unconditionally, which leads to the CI failure. Please check.
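The usual fix for this kind of CI failure is to defer the optional import into the code path that needs it. A generic sketch of that pattern (not the actual PEFT change; the helper name is illustrative):

```python
# Generic sketch of deferring an optional import, the pattern used to fix
# unconditional-import failures like the one mentioned above. Not the
# actual PEFT change; the helper name is illustrative.
import importlib
import importlib.util


def optional_attr(module_name, attr_name):
    """Import `module_name` only if it is installed; return the attribute or None."""
    if importlib.util.find_spec(module_name.split(".")[0]) is None:
        return None
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)
```

Called with "gptqmodel" and a class name, this returns None on machines without gptqmodel installed instead of failing as soon as the surrounding module is imported.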
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
@BenjaminBossan We fixed uv compat. 5.6.12 is released to PyPI and prebuilt wheels are being created right now (likely taking about an hour to complete all the wheels). https://github.com/ModelCloud/GPTQModel/actions/runs/20301283813/job/58308710953 Once the wheels are done, you can trigger CI, and both uv and pip should download the prebuilt wheel without having to build the full kernels.
We don't use uv in PEFT, so I think there is no issue here. Still, it's good that it's fixed.
BenjaminBossan
left a comment
Thanks for the great work here and elsewhere @ZX-ModelCloud @Qubitium. The PR looks good. As mentioned above, let's wait until transformers v5 to merge this PR.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Not stale, still waiting for transformers v5, I'm sure it's ready SOON :)
(failing docs build is unrelated)
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
The transformers v5.0.0 release is now available, so this PR can be safely merged.
Thanks! :)
* Fix two issues introduced in AutoGPTQ deprecation

  Merging PR #2932 introduced two bugs that led to failing CPU and GPU pipelines. First, a merge of main into the PR without a follow-up CI run re-introduced a deleted AutoGPTQ code branch, which is now removed again. Second, gptqmodel, just like EETQ, seems to need a non-isolated build environment so that external dependencies like PyTorch are found and it is installed correctly. As this was not present, the nightly slow CI wasn't run.

* Make sure that hf-doc-builder has requests

  Apparently `hf-doc-builder` doesn't expose its dependency on `requests` in `setup.py` (it does in `pyproject.toml`). For some reason `requests` is no longer installed (some other dependency probably removed it), so we're getting CI errors.

* Fix docker build test

  The docker test build failed because of escaped values passed to the next task. Used a workaround to undo shell escaping in the JSON string without violating security measures.

* Ignore DeprecationWarning for BPE as well

Co-authored-by: nemo <git@ningu.net>
Remove AutoGPTQ clutter and AutoGPTQ-related configs that are not worth adding backward compatibility for.
See
huggingface/transformers#41567
huggingface/optimum#2385