[StaticQuant] Update how block_size is calculated with Observers #815
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/815
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit de0b7fd with merge base 848e123.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 82265e7 to 2579f23 (compare)
Force-pushed 591472c to 155e41c (compare)
Looks great, thanks for adding docs and the additional test! I had one comment inline; please address it before merging.
Force-pushed 155e41c to 7caa703 (compare)
stack-info: PR: #815, branch: drisspg/stack/10
Force-pushed 7caa703 to de0b7fd (compare)
…ytorch#815)

Wrap the attempt to load a model in `try {} catch (std::runtime_error) {}` and attempt to create the model on GPU first, since attempting to load a CPU model on CUDA destroys the CUDA context (bugs/fixes against PyTorch are coming, tracked in pytorch/pytorch#126547).

Also, fix two bugs in the repo:
- Initialize `Tokenizer::initialized_` to false
- Change the name of the tokenizer file in a workflow from `tokenizer.bin` to `tokenizer.model`

Fixes pytorch/torchchat#709

Test plan:
```
python3 torchchat.py export --checkpoint-path checkpoints/stories15M/model.pth --output-dso-path model_cpu.so --device cpu
python3 torchchat.py export --checkpoint-path checkpoints/stories15M/model.pth --output-dso-path model.so
./cmake-out/aoti_run ./model.so -z checkpoints/stories15M/tokenizer.model
./cmake-out/aoti_run ./model_cpu.so -z checkpoints/stories15M/tokenizer.model
```
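The GPU-first-with-fallback idea from the referenced commit can be sketched as follows. This is a minimal illustration only, not the actual torchchat/aoti_run code: `Device`, `Model`, `load_model`, and `load_with_fallback` are hypothetical stand-ins for the runner's real types. The point is simply that the CUDA attempt happens first and a `std::runtime_error` triggers a retry on CPU.

```cpp
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the runner's real model/loader types.
enum class Device { CUDA, CPU };

struct Model {
  Device device;
};

// Hypothetical loader: throws std::runtime_error when the shared object was
// not exported for the requested device.
std::unique_ptr<Model> load_model(const std::string& path, Device device) {
  if (path.find("_cpu") != std::string::npos && device == Device::CUDA) {
    throw std::runtime_error("model was exported for CPU");
  }
  return std::make_unique<Model>(Model{device});
}

// Attempt the CUDA load first and fall back to CPU when it throws, mirroring
// the try/catch pattern described in the commit message.
std::unique_ptr<Model> load_with_fallback(const std::string& path) {
  try {
    return load_model(path, Device::CUDA);
  } catch (const std::runtime_error& e) {
    std::cerr << "CUDA load failed (" << e.what() << "), retrying on CPU\n";
    return load_model(path, Device::CPU);
  }
}

int main() {
  auto model = load_with_fallback("./model_cpu.so");
  std::cout << "loaded on "
            << (model->device == Device::CUDA ? "CUDA" : "CPU") << "\n";
  return 0;
}
```

Catching `std::runtime_error` (rather than a broader `catch (...)`) keeps the fallback limited to the load-failure case the commit describes, so unrelated errors still propagate.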
Stacked PRs:
[StaticQuant] Update how block_size is calculated with Observers