Bitpackingv2 #307
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/307
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit d9a94c8 with merge base 6a380a3 (): This comment was automatically generated by Dr. CI and updates every 15 minutes.
I left some feedback here and in CUDA MODE. My worry is that this code will be slow, so it will need some benchmarks, and losing torch.compile support would be a big loss.
Per the CUDA MODE chat, let's just fix the merge conflict and make sure performance doesn't regress on some microbenchmarks relative to the code in main.
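The reviewers ask for microbenchmarks showing no regression against main. Below is a minimal sketch of such a check using only the standard-library `timeit` module: run a baseline and a candidate on identical inputs, confirm the outputs match, and report best-of-N times and their ratio. `compare_impls` is a hypothetical helper, not part of torchao, and its `==` check suits plain Python values. For CUDA tensors, `torch.utils.benchmark.Timer` is a better fit because it takes care of warmup and device synchronization.

```python
import timeit
from typing import Any, Callable, Dict, Sequence


def compare_impls(
    baseline: Callable[..., Any],
    candidate: Callable[..., Any],
    args: Sequence[Any],
    repeat: int = 5,
    number: int = 200,
) -> Dict[str, float]:
    """Time two implementations on identical inputs (best of `repeat` runs)."""
    # Check correctness first: a faster but wrong result is not an improvement.
    # Note: this uses ==, which suits plain Python values, not tensors.
    if baseline(*args) != candidate(*args):
        raise AssertionError("candidate output differs from baseline")
    t_base = min(timeit.repeat(lambda: baseline(*args), repeat=repeat, number=number))
    t_cand = min(timeit.repeat(lambda: candidate(*args), repeat=repeat, number=number))
    # A ratio above 1.0 means the candidate is slower than the baseline.
    return {"baseline_s": t_base, "candidate_s": t_cand, "ratio": t_cand / t_base}


if __name__ == "__main__":
    data = list(range(10_000))

    def loop_sum(xs):
        total = 0
        for x in xs:
            total += x
        return total

    result = compare_impls(loop_sum, sum, (data,))
    print(f"candidate/baseline time ratio: {result['ratio']:.2f}")
```

Taking the minimum over repeats filters out noise from other processes; a ratio noticeably above 1.0 flags a slowdown worth investigating.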
A few minor pieces of feedback; comments would be nice too.
* untested unified pack/unpack
* tests written, issues fixed
* removed conversion
* works with compile + use pytest params
* added hqq int4 fp16 mixed matmul benchmark for pack
* added more repeats for benchmark and removed unused vars
* added 1 more benchmark and now tests pass
* added order to unpack and updated tests
* removed main code and added a text example
* added example
* organized benchmarks
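The commit list mentions a unified pack/unpack and an order argument for unpacking. Below is a minimal plain-Python sketch of what such a pair can look like, assuming unsigned values `bits` wide (1, 2 or 4) packed into 8-bit containers. The function names and the `msb_first` flag are hypothetical, not the PR's API, and the PR itself operates on PyTorch tensors; lists of ints are used here only so the bit layout is easy to follow.

```python
from typing import List


def _shifts(bits: int, msb_first: bool) -> List[int]:
    """Bit offset of each slot in an 8-bit container, in element order."""
    per_byte = 8 // bits
    offsets = [j * bits for j in range(per_byte)]
    # msb_first: element 0 goes into the highest-order bits of the container.
    return offsets[::-1] if msb_first else offsets


def pack(values: List[int], bits: int, msb_first: bool = True) -> List[int]:
    """Pack unsigned `bits`-wide integers into a list of bytes (0..255)."""
    if bits not in (1, 2, 4):
        raise ValueError("bits must divide 8: use 1, 2 or 4")
    per_byte = 8 // bits
    if len(values) % per_byte:
        raise ValueError(f"length must be a multiple of {per_byte}")
    mask = (1 << bits) - 1
    shifts = _shifts(bits, msb_first)
    packed = []
    for start in range(0, len(values), per_byte):
        byte = 0
        for shift, v in zip(shifts, values[start:start + per_byte]):
            if not 0 <= v <= mask:
                raise ValueError(f"{v} does not fit in {bits} bits")
            byte |= v << shift
        packed.append(byte)
    return packed


def unpack(packed: List[int], bits: int, msb_first: bool = True) -> List[int]:
    """Inverse of pack(): must be called with the same `bits` and `msb_first`."""
    mask = (1 << bits) - 1
    shifts = _shifts(bits, msb_first)
    return [(byte >> shift) & mask for byte in packed for shift in shifts]
```

For example, `pack([1, 2, 3, 0], bits=2)` gives `[108]` (`0b01101100`), while `msb_first=False` gives `[57]` (`0b00111001`). Because the layouts differ, unpacking must use the same order as packing.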
Improved the code structure of the pack/unpack functions
Now supports ternary values for BitNet applications
Now supports packing along any dimension for tensors of any size
Now supports choosing the element order when packing/unpacking, so lower-indexed elements can be placed in either the highest-order or the lowest-order bits of the container
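For the ternary (BitNet-style) case and for packing along a chosen dimension, one common approach, sketched here as an assumption rather than the PR's method, is to shift {-1, 0, 1} to {0, 1, 2} so each value fits in 2 bits, then pack along `dim` of a 2-D nested list: rows are packed directly for `dim=1`, and for `dim=0` the matrix is transposed, packed, and transposed back. All names below are hypothetical, and plain Python stands in for the tensor code so the bit layout stays visible.

```python
from typing import List

Matrix = List[List[int]]


def _pack_1d(values: List[int], msb_first: bool) -> List[int]:
    """Pack 2-bit unsigned values, four per byte."""
    if len(values) % 4:
        raise ValueError("length along the packed dim must be a multiple of 4")
    shifts = [6, 4, 2, 0] if msb_first else [0, 2, 4, 6]
    # The slots never overlap, so summing the shifted values equals OR-ing them.
    return [
        sum(v << s for v, s in zip(values[i:i + 4], shifts))
        for i in range(0, len(values), 4)
    ]


def _unpack_1d(packed: List[int], msb_first: bool) -> List[int]:
    shifts = [6, 4, 2, 0] if msb_first else [0, 2, 4, 6]
    return [(b >> s) & 0b11 for b in packed for s in shifts]


def _transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def pack_ternary(m: Matrix, dim: int = 1, msb_first: bool = True) -> Matrix:
    """Pack a 2-D matrix of {-1, 0, 1} into bytes along `dim` (0 or 1)."""
    if any(v not in (-1, 0, 1) for row in m for v in row):
        raise ValueError("ternary values must be -1, 0 or 1")
    shifted = [[v + 1 for v in row] for row in m]  # {-1, 0, 1} -> {0, 1, 2}
    if dim == 1:
        return [_pack_1d(row, msb_first) for row in shifted]
    if dim == 0:
        return _transpose([_pack_1d(c, msb_first) for c in _transpose(shifted)])
    raise ValueError("dim must be 0 or 1 for a 2-D input")


def unpack_ternary(p: Matrix, dim: int = 1, msb_first: bool = True) -> Matrix:
    """Inverse of pack_ternary(); needs the same `dim` and `msb_first`."""
    if dim == 1:
        rows = [_unpack_1d(row, msb_first) for row in p]
    elif dim == 0:
        rows = _transpose([_unpack_1d(c, msb_first) for c in _transpose(p)])
    else:
        raise ValueError("dim must be 0 or 1 for a 2-D input")
    return [[v - 1 for v in row] for row in rows]  # {0, 1, 2} -> {-1, 0, 1}
```

With this encoding, `[[-1, 0, 1, 1], [1, 1, 0, -1]]` packs along `dim=1` to `[[26], [164]]`. Storing each trit in 2 bits leaves one of the four codes unused; a base-3 encoding (5 trits per byte, since 3^5 = 243 ≤ 256) is denser but needs division and modulo instead of shifts to unpack.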