Move to updated GPTQ with new PTB and C4 eval #541
Comments
And we need support for batch size as well. NVM, just saw #530
It seems like, at least for now, llama_inference.py is unchanged. As @USBhost pointed out, group-size is implemented here: #530, but ideally I would like someone to quantize the weights and put a download link somewhere before merging. So far we have had decapoda-research, but the author hasn't updated the repositories in a while.
@oobabooga I could put up a torrent for everyone. Are safetensors supported for these 4-bit files? I have the resources to make 7B-65B, and I have no problem keeping them updated along with GPTQ. I could also throw in HF-converted models, but I don't know about the legal side of that.
That would be awesome. Safetensors is ideal because it is safer and loads faster than .pt files. There is a PR for this (I haven't tested it yet, but it should work): #529. It seems like @qwopqwop200 is using https://github.com/qwopqwop200/GPTQ-for-LLaMa#llama, and I am not sure if this is compatible with the code in #530.
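For reference, a minimal sketch of why safetensors is preferable when loading: it avoids pickle entirely and memory-maps tensors, unlike torch.load on a .pt file. The helper name and example filename below are hypothetical, not the web UI's actual loader.

```python
# Minimal sketch (hypothetical helper, assumed filenames): loading a 4-bit
# checkpoint saved as safetensors vs. a pickled .pt file. safetensors never
# unpickles arbitrary code and memory-maps tensors, so it is safer and faster.
import torch
from safetensors.torch import load_file

def load_quant_state(path: str) -> dict:
    """Return a state dict from either a .safetensors or a .pt checkpoint."""
    if path.endswith(".safetensors"):
        return load_file(path, device="cpu")  # no pickle involved
    # .pt fallback: torch.load unpickles, so only use it on trusted files
    return torch.load(path, map_location="cpu")

# Example (hypothetical filename):
# state_dict = load_quant_state("llama-7b-4bit-128g.safetensors")
```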
Give me around 6 hours to get through it all; 65B takes an absurdly long time. Looking at the new "default", it seems to be both act-order and true-sequential. Also, my old 7B group-size-128 quant from 2 days ago got wikitext2 6.251303672790527, ptb 9.746664047241211, c4 7.778744220733643. So before I start, what default do you want me to go with? @oobabooga
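For context, a rough sketch of the kind of quantization invocation being discussed, wrapped in Python here; the flag spellings (--act-order, --true-sequential, --new-eval, --save_safetensors) and paths are assumptions based on the GPTQ-for-LLaMa README at the time and should be checked against the repo's llama.py.

```python
# Sketch only: invoking GPTQ-for-LLaMa's quantization script with the
# "new default" options discussed above (act-order + true-sequential).
# Flag names and paths are assumptions from the repo README at the time;
# verify against llama.py before running.
import subprocess

cmd = [
    "python", "llama.py", "models/llama-7b-hf", "c4",  # model path is hypothetical
    "--wbits", "4",
    "--groupsize", "128",
    "--act-order",           # activation-order quantization
    "--true-sequential",     # quantize layers sequentially
    "--new-eval",            # report PTB/C4 scores with the updated eval
    "--save_safetensors", "llama-7b-4bit-128g.safetensors",
]
subprocess.run(cmd, check=True)
```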
I'd try the latest default with the smallest model first to make sure that the quantization works and that the resulting safetensors can be loaded in the web UI.
Okay.
Loading llama-b7... Could not find llama-b7-4bit.pt, exiting...
Are safetensors supported?
Can you try this pull request code? https://github.com/oobabooga/text-generation-webui/pull/529/files
That's awesome! Thanks USB. I think you'll need to check out PR 529 as Ooba mentioned, as the safetensors support hasn't been merged yet.
No lol... time to port this.
If I ported it right, here's the error I get.
Description
As visible in the commit referenced below, the GPTQ authors published some improvements to the quantization, and those changes are now in QWOP's implementation.
I imagine some testing is required, as QWOP didn't implement the changes as a PR with confirmed quality before merging. That said, at least adding a flag to TGW so that --new-eval is passed on to QWOP's code would be a good place to start for testing; a rough sketch of what that could look like follows.
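As an illustration only, here is a minimal sketch of exposing such a flag and forwarding it to the evaluation code; the parser wiring, dataset names, and the commented get_loaders call are assumptions for illustration, not text-generation-webui's or GPTQ-for-LLaMa's actual code.

```python
# Hypothetical sketch: exposing a --new-eval flag on the web UI side and
# forwarding it to GPTQ-for-LLaMa's dataset loading. The parser wiring and
# dataset names are assumptions for illustration, not the projects' real code.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--new-eval", action="store_true",
                    help="Use the updated PTB and C4 evaluation sets in GPTQ-for-LLaMa")
args = parser.parse_args()

# Wherever the GPTQ code picks an eval dataset (illustrative only):
dataset_name = "ptb-new" if args.new_eval else "ptb"
# testloader = get_loaders(dataset_name, model=model_path)  # GPTQ-for-LLaMa's datautils helper
```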
Additional Context
Added to the README for GPTQ-for-LLaMa:
Commits as of early 2023-03-24 by QWOPQWOP200