Move to updated GPTQ with new PTB and C4 eval #541

Closed
Loufe opened this issue Mar 24, 2023 · 12 comments
Labels
enhancement New feature or request

Comments

@Loufe
Contributor

Loufe commented Mar 24, 2023

Description

As visible in the referenced commit below, the GPTQ authors published some improvements to the quantization and those changes are now in QWOP's implementation.

I imagine some testing is required as QWOP didn't implement the changes as a PR with confirmed quality before merging. That said, at least adding a flag to TGW such that --new-eval is passed on to QWOP would be a good place to start for testing.
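
As a starting point, here is a minimal sketch of what such a pass-through could look like on the TGW side, assuming an argparse-style flag; the wiring below is an illustration, not the actual webui code.

```python
# Hypothetical sketch: expose --new-eval in TGW and forward it to the
# GPTQ-for-LLaMa scripts. Everything except the --new-eval flag name
# (which comes from the GPTQ-for-LLaMa readme) is an assumption.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--gptq-bits", type=int, default=0)
parser.add_argument("--new-eval", action="store_true",
                    help="Use GPTQ-for-LLaMa's updated PTB/C4 evaluation preprocessing.")
args = parser.parse_args()

# Extra arguments to forward when invoking the GPTQ-for-LLaMa scripts.
gptq_extra_args = ["--new-eval"] if args.new_eval else []
```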

Additional Context

Added to the readme for GPTQ-for-LLaMA:

  • Changed to support new features proposed by GPTQ.
  • Slightly adjusted preprocessing of C4 and PTB for more realistic evaluations (used in our updated results); can be activated via the flag --new-eval.
  • Two new tricks: --act-order (quantizing columns in order of decreasing activation size) and --true-sequential (performing sequential quantization even within a single Transformer block); a conceptual sketch of the column reordering follows below. These fix GPTQ's strangely bad performance on the 7B model (from 7.15 to 6.09 Wiki2 PPL) and lead to slight improvements on most models/settings in general.

Commits as of early 2023-03-24 by QWOPQWOP200
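
For illustration, here is a conceptual sketch of the --act-order idea described in the quoted readme above, i.e. visiting weight columns in order of decreasing activation size. This is not the GPTQ-for-LLaMa implementation; the activation statistics below are placeholders.

```python
# Conceptual sketch of --act-order (not the actual GPTQ-for-LLaMa code):
# quantize columns in order of decreasing activation size.
import torch

def act_order_permutation(act_size_per_column: torch.Tensor) -> torch.Tensor:
    """Column indices sorted by decreasing activation size."""
    return torch.argsort(act_size_per_column, descending=True)

act_size = torch.rand(4096)              # placeholder per-column activation magnitudes
perm = act_order_permutation(act_size)   # visit columns in this order during quantization
inv_perm = torch.argsort(perm)           # restore the original column order afterwards
```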

Loufe added the enhancement label on Mar 24, 2023
@USBhost
Contributor

USBhost commented Mar 24, 2023

And we need support for batchsize as well. NVM just saw #530

@oobabooga
Owner

oobabooga commented Mar 24, 2023

It seems like, at least for now, llama_inference.py is unchanged.

As @USBhost pointed out, group-size is implemented in #530, but ideally I would like someone to quantize the weights and put a download link somewhere before merging. So far we have had decapoda-research, but the author hasn't updated the repositories in a while.

@USBhost
Contributor

USBhost commented Mar 24, 2023

@oobabooga I could throw up a torrent for everyone. Are safetensors supported for these 4-bit files? I have the resources to make 7b-65b, and I have no problem keeping them updated along with GPTQ.

I could also throw in HF-converted models as well, but I don't know about the legal side of that.

@oobabooga
Owner

oobabooga commented Mar 24, 2023

> I could throw up a torrent for everyone

That would be awesome. Safetensors is ideal because it is safer and loads faster than pt files. There is a PR for this (I haven't tested it yet but it should work): #529
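
A minimal sketch of the loading difference, assuming the safetensors package; the file names are placeholders:

```python
# Sketch only: a .pt checkpoint goes through pickle, while safetensors loads
# raw tensors without executing arbitrary Python, which is the safety (and
# load-speed) argument above. File names are placeholders.
import torch
from safetensors.torch import load_file

state_dict = torch.load("llama-7b-4bit.pt", map_location="cpu")      # pickle-based .pt
state_dict = load_file("llama-7b-4bit.safetensors", device="cpu")    # safetensors
```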

It seems like @qwopqwop200 is using act-order instead of group-size by default now:

https://github.com/qwopqwop200/GPTQ-for-LLaMa#llama

I am not sure if this is compatible with the code in #530.

@USBhost
Contributor

USBhost commented Mar 24, 2023

Give me around 6 hours to get through it all; 65b takes an absurdly long time. Looking at the new "default", it seems to be both act-order and true-sequential. Also, my old 7b batchsize-128 from 2 days ago got wikitext2 6.251303672790527, ptb 9.746664047241211, c4 7.778744220733643.

So before I start, what default do you want me to go with? @oobabooga

@oobabooga
Owner

oobabooga commented Mar 24, 2023

I'd try the latest default with the smallest model first to make sure that the quantization works and that the resulting safetensors can be loaded in the web UI.

@USBhost
Contributor

USBhost commented Mar 24, 2023

Okay.

@USBhost
Contributor

USBhost commented Mar 24, 2023

> I'd try the latest default with the smallest model first to make sure that the quantization works and that the resulting safetensors can be loaded in the web UI.

Loading llama-7b...
Could not find llama-7b-4bit.pt, exiting...

Are safetensors supported?
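
The "Could not find llama-7b-4bit.pt" message suggests the loader only searches for a .pt checkpoint. A hedged sketch of an extension-agnostic lookup (this is not the actual modules/GPTQ_loader.py code; names and layout are placeholders):

```python
# Hedged sketch (not the actual GPTQ_loader.py): accept either a .safetensors
# or a .pt 4-bit checkpoint for a given model.
from pathlib import Path

def find_quantized_checkpoint(model_name: str, models_dir: Path) -> Path | None:
    for suffix in (".safetensors", ".pt"):
        candidate = models_dir / f"{model_name}-4bit{suffix}"
        if candidate.exists():
            return candidate
    return None
```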

@oobabooga
Owner

oobabooga commented Mar 24, 2023

@Loufe
Contributor Author

Loufe commented Mar 24, 2023 via email

@USBhost
Contributor

USBhost commented Mar 24, 2023

remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 4 (delta 3), reused 4 (delta 3), pack-reused 0
Unpacking objects: 100% (4/4), 500 bytes | 500.00 KiB/s, done.
From https://github.com/oobabooga/text-generation-webui
 * branch            refs/pull/529/head -> FETCH_HEAD
Auto-merging modules/GPTQ_loader.py
CONFLICT (content): Merge conflict in modules/GPTQ_loader.py
Automatic merge failed; fix conflicts and then commit the result.

No lol.... time to port this

@USBhost
Contributor

USBhost commented Mar 24, 2023

Traceback (most recent call last):
  File "/UI/text-generation-webui/server.py", line 234, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/UI/text-generation-webui/modules/models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "/UI/text-generation-webui/modules/GPTQ_loader.py", line 69, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits, 128)
  File "/UI/text-generation-webui/repositories/GPTQ-for-LLaMa/llama.py", line 259, in load_quant
    model.load_state_dict(safe_load(checkpoint))
  File "/UI/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
        size mismatch for model.layers.0.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([32, 512]).
        size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 4096]) from checkpoint, the shape in current model is torch.Size([32, 4096]).
        size mismatch for model.layers.0.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([32, 512]).
        size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 4096]) from checkpoint, the shape in current model is torch.Size([32, 4096]).
        size mismatch for model.layers.0.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([32, 512]).
        size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 4096]) from checkpoint, the shape in current model is torch.Size([32, 4096]).
        size mismatch for model.layers.0.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([32, 512]).
        size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 4096]) from checkpoint, the shape in current model is torch.Size([32, 4096]).
        size mismatch for model.layers.0.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([86, 512]).
        etc..................

If I ported it right, here's the error I get.
Edit: I believe I ported it right, as I was able to load my 2-day-old 7b groupsize-128 model fine. Currently redoing 7b as .pt, just to rule out safetensors. The .pt has the same issue. A new 7b groupsize-128 safetensors works, so the issue is currently with act-order.
