feature/docker_improvements #4768
Conversation
@Callum17 thanks for the new PR! GPTQ-for-LLaMa is a legacy loader that is only maintained because it is the only way to run GPTQ models on Pascal cards. Earlier this year there were additional compilation requirements to install it, but now it is simply a CUDA 12.1 wheel in requirements.txt. If there is any special treatment of GPTQ-for-LLaMa in the Dockerfile, like cloning a repository or trying to compile it, it should be removed.

About extension requirements: in the one-click installer, they are installed before the web UI requirements. This way, the web UI requirements take precedence and will not break due to an extension. I think that's the best way to do it.

If these two are already okay, let me know and I'll merge the PR.
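A minimal sketch of that install ordering (not the actual installer script; the extension names and the `extensions/<name>/requirements.txt` layout are assumptions for illustration):

```shell
#!/usr/bin/env bash
# Sketch: extension requirements are installed first, the core
# requirements.txt last, so the web UI's pins win on any conflict.
# Extension names here are hypothetical.
BUILD_EXTENSIONS="openai,api"
install_order=()
IFS=',' read -ra exts <<< "$BUILD_EXTENSIONS"
for ext in "${exts[@]}"; do
  install_order+=("extensions/$ext/requirements.txt")   # extensions first
done
install_order+=("requirements.txt")                     # core last
printf 'pip install -r %s\n' "${install_order[@]}"
```

Installing the core requirements last means a later `pip install` can overwrite any conflicting package an extension pulled in earlier.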
…luded in core app requirements.txt
That simplifies things :) I did rebuild and try inference again with GPTQ-for-LLaMa. Do we expect GPTQ-for-LLaMa to be working?
The Dockerfile build will simply fail and give you a list of conflicts if you try to install any incompatible extensions via the optional BUILD_EXTENSIONS ARG. It's probably better to fail at build time than to potentially fail at runtime, where it is harder to catch.
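For reference, a build-arg gate along these lines could look like the following Dockerfile fragment (a sketch only; the paths and install command are assumptions, not the PR's actual Dockerfile):

```dockerfile
# Sketch: gate extension installs behind a build arg so that any
# dependency conflicts surface at image build time.
ARG BUILD_EXTENSIONS=""
# Install requirements for each requested extension; pip resolves the
# combined dependency set here, so incompatible pins fail the build.
RUN if [ -n "$BUILD_EXTENSIONS" ]; then \
      for ext in $(echo "$BUILD_EXTENSIONS" | tr ',' ' '); do \
        pip install -r "extensions/$ext/requirements.txt"; \
      done; \
    fi
```

It would then be enabled with something like `docker build --build-arg BUILD_EXTENSIONS=openai .` (extension name hypothetical).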
@Callum17, I have a ❓
That's expected. GPTQ-for-LLaMa doesn't work with models that use both groupsize and actorder like this one. Since the changes have been tested, let's merge the PR. I appreciate the help with improving the Dockerfile -- it was stuck in March 2023 before this PR. |
Checklist:
This is the cleanup requested in:
#4144 (comment)
Changes
Known issues
i. Some of the core dependencies conflict with extension dependencies, so I've made installing extensions configurable via the BUILD_EXTENSIONS build arg.
These issues should be fixed downstream in the extensions.
Known culprits from extensions (there may be more):
ii. GPTQ-for-Llama is broken by the dependency upgrades. I discovered the conflicts at build time and resolved them by loosening the constraints in the GPTQ-for-Llama package's requirements.txt, but it looks like there were breaking changes that cause a failure at inference time.
The Dockerfile effectively applies the following changes:
oobabooga/GPTQ-for-LLaMa@cuda...Callum17:GPTQ-for-LLaMa:bugfix/text-generation-webui-dependency-conflicts
But it was a naive fix attempt.
Running a GPTQ model with actorder and group_size yielded the following error:
@oobabooga do we want to try to fix the issues with the GPTQ-for-Llama fork? Or are we planning to drop support for it? In that case I'll remove the references to it in this PR.
I'm not actually sure if GPTQ-for-Llama works in the current docker image build.
Testing
Otherwise this Docker build seems fine. Successfully tested inference with Transformers, GGUF, AutoGPTQ, Exllama, Exllamav2.