feature/docker_improvements #4768

Merged
oobabooga merged 13 commits into oobabooga:dev from Callum17:feature/docker_improvements on Nov 30, 2023

Conversation

@Callum17 (Contributor) commented Nov 29, 2023

This is the cleanup requested in #4144 (comment).

Changes

  • dropped venv
  • smaller / simpler Dockerfile definitions
  • ability to set app UID and GID permissions (handy for cloud deployments); see the sketch after this list
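
A minimal sketch of the UID/GID idea, assuming build args named APP_UID/APP_GID (the arg names, defaults, and base image here are illustrative, not necessarily what this PR uses):

```dockerfile
# Illustrative only: arg names, defaults, and base image are assumptions.
FROM ubuntu:22.04

ARG APP_UID=1000
ARG APP_GID=1000

# Create a non-root user with the requested IDs so files created in
# mounted volumes match the host (or cloud) user's ownership.
RUN groupadd -g "${APP_GID}" app && \
    useradd -m -u "${APP_UID}" -g app app

USER app
WORKDIR /home/app/text-generation-webui
```

For example, building with `docker build --build-arg APP_UID=$(id -u) --build-arg APP_GID=$(id -g) .` keeps container-created files owned by the invoking user.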

Known issues

i. Some of the core dependencies conflict with extension dependencies, so I've made extension installation configurable via the BUILD_EXTENSIONS build arg.
These issues should be fixed downstream in the extensions.
Known culprits from extensions (there may be more):

ii. GPTQ-for-Llama is broken by dependency upgrades. I discovered the conflicts at build time and resolved them by loosening the constraints in GPTQ-for-Llama's requirements.txt, but it looks like there were breaking changes that cause failures at inference time.

The Dockerfile effectively applies the following changes:
oobabooga/GPTQ-for-LLaMa@cuda...Callum17:GPTQ-for-LLaMa:bugfix/text-generation-webui-dependency-conflicts

But it was a naive fix attempt.
Running a GPTQ model with actorder and group_size yielded the following error:

 File "/home/app/text-generation-webui/modules/ui_model_menu.py", line 209, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
File "/home/app/text-generation-webui/modules/models.py", line 85, in load_model
output = load_func_map[loader](model_name)
File "/home/app/text-generation-webui/modules/models.py", line 336, in GPTQ_loader
model = modules.GPTQ_loader.load_quantized(model_name)
File "/home/app/text-generation-webui/modules/GPTQ_loader.py", line 141, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, pre_layer)
File "/home/app/.local/lib/python3.10/site-packages/gptq_for_llama/gptq_old/llama_inference_offload.py", line 236, in load_quant
model.load_state_dict(safe_load(checkpoint))
File "/home/app/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    Unexpected key(s) in state_dict: "model.layers.0.self_attn.k_proj.g_idx", "model.layers.0.self_attn.o_proj.g_idx",

@oobabooga do we want to try to fix the issues with the GPTQ-for-Llama fork, or are we planning to drop support for it? In the latter case, I'll remove the references to it in this PR.
I'm not actually sure whether GPTQ-for-Llama works in the current Docker image build.

Testing
Otherwise this Docker build seems fine. Successfully tested inference with Transformers, GGUF, AutoGPTQ, ExLlama, and ExLlamav2.

@oobabooga (Owner) commented

@Callum17 thanks for the new PR! GPTQ-for-LLaMa is a legacy loader that is only maintained because it is the only way to run GPTQ models on Pascal cards. Earlier this year there were additional compilation requirements to install it, but now it is simply a CUDA 12.1 wheel in the requirements.txt. If there is any special treatment of GPTQ-for-LLaMa in the Dockerfile, like cloning a repository or trying to compile it, it should be removed.
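
For illustration, the kind of entry being described is a prebuilt wheel pinned directly in requirements.txt, something like the line below (the URL and version are placeholders, not the real entry):

```text
# hypothetical requirements.txt line; URL and version are illustrative
gptq_for_llama @ https://example.com/wheels/gptq_for_llama-0.1.0+cu121-cp310-cp310-linux_x86_64.whl
```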

About extensions requirements: in the one-click installer, they are installed before the web UI requirements. This way, the web UI takes precedence and will not break due to an extension. I think that's the best way to do it.
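
A minimal sketch of that ordering with plain pip (the one-click installer's actual steps differ; this just shows why the web UI's pins win):

```bash
# Install every extension's requirements first...
for req in extensions/*/requirements.txt; do
    pip install -r "$req"
done

# ...then install the web UI's own requirements last, so any conflicting
# pins are resolved in the web UI's favor.
pip install -r requirements.txt
```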

If these two are already okay, let me know and I'll merge the PR.

@Callum17 (Contributor, Author) commented Nov 29, 2023

> but now it is simply a CUDA 12.1 wheel in the requirements.txt

That simplifies things :)
Added a commit to drop additional Docker build steps for GPTQ-for-LLaMa.

I did rebuild and try inference again with GPTQ-for-LLaMa.
Specifically with https://huggingface.co/TheBloke/LLaMa-7B-GPTQ -b gptq-4bit-32g-actorder_True
It failed with the same error as before. I think the issue is not so much the binaries as interface changes in the other packages.
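
For reference, that branch can be pulled with the repo's download script; treat the exact invocation below as an assumption about the script's interface:

```bash
# Assumed invocation; --branch selects the quantization branch on the HF repo.
python download-model.py TheBloke/LLaMa-7B-GPTQ --branch gptq-4bit-32g-actorder_True
```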

Do we expect GPTQ-for-LLaMa to be working?

> This way, the web UI takes precedence and will not break due to an extension. I think that's the best way to do it.

The Dockerfile build will simply fail and give you a list of conflicts if you try to install any incompatible extensions via the optional BUILD_EXTENSIONS ARG. It is probably better to fail at build time than at runtime, where problems are harder to catch.
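
For example, something like the following, where the extension names are placeholders and the exact value format for BUILD_EXTENSIONS is an assumption:

```bash
# Bake selected extensions in at build time; any dependency conflict then
# surfaces as a build failure instead of a runtime error.
docker build --build-arg BUILD_EXTENSIONS="openai,whisper_stt" -t text-generation-webui .
```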

@mongolu (Contributor) commented Nov 30, 2023

@Callum17, I have a ❓
Why not use the one-click installer in the Dockerfile?
It would make things a lot easier.

@oobabooga (Owner) commented

> Specifically with https://huggingface.co/TheBloke/LLaMa-7B-GPTQ -b gptq-4bit-32g-actorder_True
> It failed with the same error as before. I think the issue is not so much the binaries as interface changes in the other packages.

That's expected. GPTQ-for-LLaMa doesn't work with models that use both groupsize and actorder, like this one.

Since the changes have been tested, let's merge the PR. I appreciate the help with improving the Dockerfile -- it was stuck in March 2023 before this PR.

oobabooga merged commit 88620c6 into oobabooga:dev on Nov 30, 2023
