Add SpinQuant to generate.py #1069
Conversation
No need to import the large Hadamard matrices required for SpinQuant unless it is actually used.
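A minimal sketch of the lazy-import idea behind this change, assuming a deferred import inside the quantization branch (the `spinquant` flag value, module path, and function name are assumptions for illustration, not necessarily the PR's exact code):

```python
# Sketch: defer the SpinQuant import so its large precomputed Hadamard
# matrices are only loaded when SpinQuant is actually requested.

def maybe_apply_spinquant(model, quantization: str):
    if quantization == "spinquant":
        # Deferred import: the heavy Hadamard tables load only on this path.
        from torchao.prototype.spinquant import apply_spinquant  # assumed path
        apply_spinquant(model)
    return model
```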
Thanks, any results we can show?
@jerryzh168 I'm fixing a torch.compile issue first related to the Hadamard transform used in SpinQuant; after that I'll post some benchmark results here. If you want, we can keep this PR open and I'll push the changes here.
SpinQuant now also works with torch.compile. Benchmark results (llama-2-7b, tested on an A100):

- Baseline + torch.compile
- Spinquant (R4) + torch.compile
- Spinquant (R1+R2+R4) + torch.compile

NB: R1 and R2 are fused into the linear weights before inference takes place, so it is expected that they do not lead to additional overhead at inference time.
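To make the NB concrete: a rotation can be folded into a linear layer's weight offline, so the rotated model runs at exactly the same cost as the original. A generic sketch of the fusion, not the actual SpinQuant code:

```python
import torch

torch.manual_seed(0)
d = 8
linear = torch.nn.Linear(d, d, bias=False)

# Random orthogonal matrix standing in for R1/R2 (Q factor of a QR decomposition).
R, _ = torch.linalg.qr(torch.randn(d, d))

# Fuse the rotation into the weight: W @ (R @ x) == (W @ R) @ x,
# so storing W @ R removes the runtime rotation entirely.
fused = torch.nn.Linear(d, d, bias=False)
fused.weight.data = linear.weight.data @ R

x = torch.randn(d)
assert torch.allclose(linear(R @ x), fused(x), atol=1e-5)
```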
Results without torch.compile:

- Baseline
- Spinquant (R4)
Thanks @tobiasvanderwerff, may I know which model you tested on, llama-2-7b?

Yep, llama-2-7b, I'll add that to the benchmark.
Can you add benchmark numbers for R1+R2 as well? I think R4 is only for activation quantization.
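For context on why R4 targets activations: SpinQuant applies an online Hadamard rotation to activations to spread outlier energy across channels before quantization. A rough sketch of the effect (a simplified illustration, not torchao's implementation; assumes the last dim is a power of 2):

```python
import torch
from scipy.linalg import hadamard  # dense Hadamard matrix; fine for a small demo

def rotate_activations(x: torch.Tensor) -> torch.Tensor:
    # Normalized Hadamard rotation along the last dimension.
    n = x.shape[-1]
    H = torch.tensor(hadamard(n), dtype=x.dtype, device=x.device) / n ** 0.5
    return x @ H

x = torch.randn(4, 128)
x[:, 0] *= 50.0  # inject an outlier channel
x_rot = rotate_activations(x)

# The rotation spreads the outlier across all channels, shrinking the
# dynamic range a per-tensor int quantizer has to cover.
print(x.abs().max().item(), x_rot.abs().max().item())
```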
Would be good to add this info into a README file inside the spinquant dir.

Ready to merge?

Yep, this is ready @jerryzh168
torchao/_models/llama/generate.py

eval.py and generate.py (no need to import the large Hadamard matrices required for SpinQuant otherwise)