Move to updated GPTQ with new PTB and C4 eval #541

Closed
Loufe opened this issue Mar 24, 2023 · 12 comments
Labels
enhancement New feature or request

Comments

@Loufe
Contributor

Loufe commented Mar 24, 2023

Description

As visible in the referenced commit below, the GPTQ authors published some improvements to the quantization and those changes are now in QWOP's implementation.

I imagine some testing is required as QWOP didn't implement the changes as a PR with confirmed quality before merging. That said, at least adding a flag to TGW such that --new-eval is passed on to QWOP would be a good place to start for testing.
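
As a starting point, here is a minimal sketch of what such a pass-through could look like on the TGW side, assuming an argparse-style flag; the wiring below is an illustration, not the actual webui code.

```python
# Hypothetical sketch: expose --new-eval in TGW and forward it to the
# GPTQ-for-LLaMa scripts. Everything except the --new-eval flag name
# (which comes from the GPTQ-for-LLaMa readme) is an assumption.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--gptq-bits", type=int, default=0)
parser.add_argument("--new-eval", action="store_true",
                    help="Use GPTQ-for-LLaMa's updated PTB/C4 evaluation preprocessing.")
args = parser.parse_args()

# Extra arguments to forward when invoking the GPTQ-for-LLaMa scripts.
gptq_extra_args = ["--new-eval"] if args.new_eval else []
```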

Additional Context

Added to the readme for GPTQ-for-LLaMA:

  • Changed to support new features proposed by GPTQ.
  • Slightly adjusted preprocessing of C4 and PTB for more realistic evaluations (used in our updated results); can be activated via the flag --new-eval.
  • Two new tricks: --act-order (quantizing columns in order of decreasing activation size) and --true-sequential (performing sequential quantization even within a single Transformer block); a conceptual sketch of the column reordering follows below. These fix GPTQ's strangely bad performance on the 7B model (from 7.15 to 6.09 Wiki2 PPL) and lead to slight improvements on most models/settings in general.

Commits as of early 2023-03-24 by QWOPQWOP200
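
For illustration, here is a conceptual sketch of the --act-order idea described in the quoted readme above, i.e. visiting weight columns in order of decreasing activation size. This is not the GPTQ-for-LLaMa implementation; the activation statistics below are placeholders.

```python
# Conceptual sketch of --act-order (not the actual GPTQ-for-LLaMa code):
# quantize columns in order of decreasing activation size.
import torch

def act_order_permutation(act_size_per_column: torch.Tensor) -> torch.Tensor:
    """Column indices sorted by decreasing activation size."""
    return torch.argsort(act_size_per_column, descending=True)

act_size = torch.rand(4096)              # placeholder per-column activation magnitudes
perm = act_order_permutation(act_size)   # visit columns in this order during quantization
inv_perm = torch.argsort(perm)           # restore the original column order afterwards
```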

Loufe added the enhancement label on Mar 24, 2023
@USBhost
Contributor

USBhost commented Mar 24, 2023

And we need support for batchsize as well. NVM just saw #530

@oobabooga
Owner

oobabooga commented Mar 24, 2023

It seems like, at least for now, llama_inference.py is unchanged.

As @USBhost pointed out, group-size is implemented in #530, but ideally I would like someone to quantize the weights and put a download link somewhere before merging. So far we have had decapoda-research, but the author hasn't updated the repositories in a while.

@USBhost
Contributor

USBhost commented Mar 24, 2023

@oobabooga I could throw up a torrent for everyone. Are safetensors supported for these 4-bit files? I have the resources to make 7b-65b, and I have no problem keeping them updated along with GPTQ.

I could also throw in HF-converted models as well, but I don't know about the legal side of that.

@oobabooga
Owner

oobabooga commented Mar 24, 2023

> I could throw up a torrent for everyone

That would be awesome. Safetensors is ideal because it is safer and loads faster than pt files. There is a PR for this (I haven't tested it yet but it should work): #529
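
A minimal sketch of the loading difference, assuming the safetensors package; the file names are placeholders:

```python
# Sketch only: a .pt checkpoint goes through pickle, while safetensors loads
# raw tensors without executing arbitrary Python, which is the safety (and
# load-speed) argument above. File names are placeholders.
import torch
from safetensors.torch import load_file

state_dict = torch.load("llama-7b-4bit.pt", map_location="cpu")      # pickle-based .pt
state_dict = load_file("llama-7b-4bit.safetensors", device="cpu")    # safetensors
```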

It seems like @qwopqwop200 is using act-order instead of group-size by default now:

https://github.com/qwopqwop200/GPTQ-for-LLaMa#llama

I am not sure if this is compatible with the code in #530.

@USBhost
Contributor

USBhost commented Mar 24, 2023

Give me around 6 hours to get through it all; 65b takes an absurdly long time. Looking at the new "default", it seems to be both act-order and true-sequential. Also, my old 7b batchsize-128 from 2 days ago got wikitext2 6.251303672790527, ptb 9.746664047241211, c4 7.778744220733643.

So before I start, what default do you want me to go with? @oobabooga

@oobabooga
Owner

oobabooga commented Mar 24, 2023

I'd try the latest default with the smallest model first to make sure that the quantization works and that the resulting safetensors can be loaded in the web UI.

@USBhost
Contributor

USBhost commented Mar 24, 2023

Okay.

@USBhost
Contributor

USBhost commented Mar 24, 2023

> I'd try the latest default with the smallest model first to make sure that the quantization works and that the resulting safetensors can be loaded in the web UI.

Loading llama-7b...
Could not find llama-7b-4bit.pt, exiting...

Are safetensors supported?
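
The "Could not find llama-7b-4bit.pt" message suggests the loader only searches for a .pt checkpoint. A hedged sketch of an extension-agnostic lookup (this is not the actual modules/GPTQ_loader.py code; names and layout are placeholders):

```python
# Hedged sketch (not the actual GPTQ_loader.py): accept either a .safetensors
# or a .pt 4-bit checkpoint for a given model.
from pathlib import Path

def find_quantized_checkpoint(model_name: str, models_dir: Path) -> Path | None:
    for suffix in (".safetensors", ".pt"):
        candidate = models_dir / f"{model_name}-4bit{suffix}"
        if candidate.exists():
            return candidate
    return None
```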

@oobabooga
Owner

oobabooga commented Mar 24, 2023

@Loufe
Contributor Author

Loufe commented Mar 24, 2023 via email

@USBhost
Contributor

USBhost commented Mar 24, 2023

remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 4 (delta 3), reused 4 (delta 3), pack-reused 0
Unpacking objects: 100% (4/4), 500 bytes | 500.00 KiB/s, done.
From https://github.com/oobabooga/text-generation-webui
 * branch            refs/pull/529/head -> FETCH_HEAD
Auto-merging modules/GPTQ_loader.py
CONFLICT (content): Merge conflict in modules/GPTQ_loader.py
Automatic merge failed; fix conflicts and then commit the result.

No lol.... time to port this

@USBhost
Contributor

USBhost commented Mar 24, 2023

Traceback (most recent call last):
  File "/UI/text-generation-webui/server.py", line 234, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/UI/text-generation-webui/modules/models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "/UI/text-generation-webui/modules/GPTQ_loader.py", line 69, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits, 128)
  File "/UI/text-generation-webui/repositories/GPTQ-for-LLaMa/llama.py", line 259, in load_quant
    model.load_state_dict(safe_load(checkpoint))
  File "/UI/text-generation-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
        size mismatch for model.layers.0.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([32, 512]).
        size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 4096]) from checkpoint, the shape in current model is torch.Size([32, 4096]).
        size mismatch for model.layers.0.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([32, 512]).
        size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 4096]) from checkpoint, the shape in current model is torch.Size([32, 4096]).
        size mismatch for model.layers.0.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([32, 512]).
        size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 4096]) from checkpoint, the shape in current model is torch.Size([32, 4096]).
        size mismatch for model.layers.0.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([32, 512]).
        size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 4096]) from checkpoint, the shape in current model is torch.Size([32, 4096]).
        size mismatch for model.layers.0.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([86, 512]).
        etc..................

If I ported it right, here's the error I get.
Edit: I believe I ported it right, as I was able to load my 2-day-old 7b groupsize-128 model fine. Currently redoing 7b as .pt, just to rule out safetensors. The .pt has the same issue. A new 7b groupsize-128 safetensors works, so the issue is currently with act-order.
