[float8] improve eager numerics for dynamic scales and gets on par with torch.compile #904
Commits on Sep 19, 2024
- 6bf0f5c
- 553687f: leave torch.linalg.vector_norm for another PR
- 19a592d
- 218290e
- 24ec914
Commits on Sep 21, 2024
- c099486
- b93ffc8
- ebff416
- 8978ab2
Commits on Sep 22, 2024
- f17dc12
Commits on Sep 26, 2024
- 511c751
- 9becda1: Add tutorial for trainable tensor subclass (pytorch#908)
  Summary: The new tutorial provides an example of how to implement a trainable tensor subclass that wraps quantized data. This extends the existing `MyDTypeTensor` with a few necessary steps to ensure proper gradient updates, namely:
  1. Define a differentiable constructor
  2. Define the backward pass for ops of interest (e.g. torch.nn.functional.linear)
  3. Handle special ops used by the optimizer (e.g. aten.add, aten.add_)
  Test Plan: python tutorials/developer_api_guide/my_trainable_tensor_subclass.py
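To make step 1 above concrete, here is a minimal sketch of a differentiable quantizing constructor built on torch.autograd.Function. It is not the tutorial's code: the class name `ToMyQuantized` and the int8 scheme are hypothetical stand-ins for the real `MyDTypeTensor` dtype, and gradients are simply passed straight through.

```python
import torch


class ToMyQuantized(torch.autograd.Function):
    """Quantize in forward, pass gradients straight through in backward."""

    @staticmethod
    def forward(ctx, float_tensor: torch.Tensor) -> torch.Tensor:
        # int8 symmetric quantization as a stand-in for the tutorial's real dtype
        scale = float_tensor.abs().amax() / 127.0
        int_data = torch.clamp(torch.round(float_tensor / scale), -128, 127).to(torch.int8)
        # dequantize so the result can flow through ordinary autograd ops downstream
        return int_data.to(torch.float32) * scale

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
        # straight-through estimator: gradient w.r.t. the original float tensor
        return grad_output


if __name__ == "__main__":
    weight = torch.randn(4, 4, requires_grad=True)
    loss = ToMyQuantized.apply(weight).sum()
    loss.backward()
    print(weight.grad)  # gradients reach the original float weight
```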
- e4fdca9: Introducing 1-bit quantization for Llama in torchchat (pytorch#910)
  Differential Revision: D63052325. Pull Request resolved: pytorch#911
- 0cd4d37
- 014558d: [float8] fix typo in bitwise_identical unit test (pytorch#918)
- 3267402: Adding example for quantized tensor + tensor parallelism (pytorch#785)
  Summary: This PR adds an example of how a quantized tensor subclass can work with DTensor (https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/README.md). The end goal is to rewrite https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/llama2.py with a normal llama2 implementation and show that with DTensor + AffineQuantizedTensor + torch.compile we can get on-par performance with the custom tensor parallel implementation.
  Test Plan: torchrun --standalone --nnodes=1 --nproc-per-node=4 tutorials/developer_api_guide/tensor_parallel.py
  Squashed commits: tensor parallel file; use DTensor.from instead of distribute_tensor; implementing aten.slice.Tensor (WIP); working; some shape fixes and use of more quant primitive ops; add rowwise test; make rowwise sharding work; compile still not working yet; fake tensor didn't pick up shape changes from transpose; backend='eager'; change transpose to non-inplace op; add error message; works now with torch nightly; remove print; ruff; clean up; fix device id.
  Co-authored-by: Ke Wen <[email protected]>
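As a rough illustration of the DTensor pattern this example is built around, here is a hedged sketch (not tutorials/developer_api_guide/tensor_parallel.py, and using a plain float weight rather than AffineQuantizedTensor) that shards a linear weight across a one-dimensional device mesh and computes each rank's slice of the output locally. It assumes a multi-GPU torchrun launch and the torch.distributed._tensor import path current as of these commits.

```python
# Hypothetical sketch; launch with e.g.:
#   torchrun --standalone --nnodes=1 --nproc-per-node=4 tp_sketch.py
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh, DTensor, Shard

dist.init_process_group("nccl")
rank, world_size = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(rank)

# One-dimensional mesh over all ranks.
mesh = DeviceMesh("cuda", list(range(world_size)))

# Each rank holds its shard of a (1024, 1024) weight, split along dim 0
# (column-parallel linear: each rank produces a slice of the output features).
local_shard = torch.randn(1024 // world_size, 1024, device="cuda")
weight = DTensor.from_local(local_shard, mesh, [Shard(0)])

# Replicated input; the matmul against the local shard yields this rank's output slice.
x = torch.randn(8, 1024, device="cuda")
local_out = torch.nn.functional.linear(x, weight.to_local())
print(f"rank {rank}: local output shape {tuple(local_out.shape)}")

dist.destroy_process_group()
```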
- 1e07eff
- ebdeed0: Add workaround to recover the perf for quantized vit in torch.compile (pytorch#926)
  Summary: We recently found a perf drop in quantized vit due to pytorch#898 (comment). This PR adds a temporary fix until we figure out the longer-term fix. Ideally we should figure out why the tensor subclass check fails in torch.compile (https://github.com/pytorch/pytorch/blob/e4d294221b140fdbb49a64f297bc60c9fcc2f80e/torch/nn/modules/activation.py#L1286) and fix that.
  Test Plan: python tutorials/quantize_vit/run_vit_b_quant.py
- 09ffa22: clean up device checks in float8 unit test files (pytorch#923)
  Summary: While working on rowwise scaling I noticed that some of the CUDA device capability checks in the test files did not make sense; this cleans them up.
  Test Plan: tests pass on my H100 CI; it should skip fewer tests now, since CI only has CUDA capability 8 and 9.
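For context, a capability-gated skip of the kind being cleaned up might look like the sketch below. It is illustrative only, not the repo's actual test helpers: the helper, the test class, and the 8.9 threshold (Ada/Hopper-class float8 hardware) are assumptions, and the test body is a trivial placeholder.

```python
import unittest

import torch


def cuda_capability_at_least(major: int, minor: int = 0) -> bool:
    """True when a CUDA device with at least the given compute capability is present."""
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (major, minor)


class TestFloat8Kernels(unittest.TestCase):
    @unittest.skipIf(
        not cuda_capability_at_least(8, 9),
        "float8 kernels need compute capability 8.9+ (e.g. L4/RTX 4090 or H100)",
    )
    def test_float8_cast(self):
        x = torch.randn(16, 16, device="cuda").to(torch.float8_e4m3fn)
        self.assertEqual(x.dtype, torch.float8_e4m3fn)


if __name__ == "__main__":
    unittest.main()
```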
- 0b8dd85: [low-bit optim] Change 8-bit and FP8 optim block size from 2048 to 256 to match new bnb v0.44 (pytorch#927)
- 87faf04
- 3a9fdb0
- fc6c393: Remove two if statements in fp8 padding (pytorch#935)
  Reviewed By: vkuzo. Differential Revision: D63051205. Pull Request resolved: pytorch#935. Approved by: https://github.com/vkuzo
- 0043ace: [Distributed] Improve sharding example (pytorch#937)
  Squashed commits: improve sharding example; add comment.
- ab3435c: Add composable QAT quantizer (pytorch#938)
  Summary: This is a utility for users who wish to apply multiple QAT quantizers to their models. In the near future, we expect to add an embedding QAT quantizer that composes with the existing linear QAT quantizers.
  Test Plan: python test/quantization/test_qat.py -k test_composable_qat_quantizer
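The composition idea can be sketched generically. This is not torchao's actual quantizer API; it assumes a minimal two-step (prepare/convert) interface purely to show what applying multiple QAT quantizers to one model means.

```python
from typing import List

import torch.nn as nn


class QATQuantizer:
    """Assumed minimal interface: prepare() inserts fake-quant, convert() lowers it."""

    def prepare(self, model: nn.Module) -> nn.Module:
        raise NotImplementedError

    def convert(self, model: nn.Module) -> nn.Module:
        raise NotImplementedError


class ComposedQATQuantizer(QATQuantizer):
    """Applies each child quantizer in order, so that e.g. a linear QAT quantizer
    and a (future) embedding QAT quantizer can be used together on one model."""

    def __init__(self, quantizers: List[QATQuantizer]):
        self.quantizers = quantizers

    def prepare(self, model: nn.Module) -> nn.Module:
        for quantizer in self.quantizers:
            model = quantizer.prepare(model)
        return model

    def convert(self, model: nn.Module) -> nn.Module:
        for quantizer in self.quantizers:
            model = quantizer.convert(model)
        return model
```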
- a05a40f: resolve conflict with latest main
  Differential Revision: D63048850. Pull Request resolved: pytorch#912
- 334891b: Differential Revision: D62394341. Pull Request resolved: pytorch#897
- c706139: Add compile tests to test suite (pytorch#906)
  Summary: This is a follow-up PR addressing pytorch#839 (comment). We can add more compiler-related tests in the future. Next: refactor a bit to use the quantize_ API directly; use the test suite in existing API tests.
  Test Plan: python torchao/testing/utils.py
  Squashed commits: rename; add result check.
- 93554c0: Fix up CMakeLists and reorganize some code locations
  Differential Revision: D62711903. Pull Request resolved: pytorch#948
- efd9bb9: [float8] all-reduce amax on dp mesh instead of global pg (pytorch#933)
  Squashed commits: all-reduce amax on dp mesh instead of global pg; linter; improve comments; move hp tensor inside if; several further linter passes.
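The change in the lead commit can be sketched as follows: reduce the local amax with MAX over only the data-parallel sub-mesh's process group rather than the default (global) group. This is a hedged sketch assuming a 2-D ('dp', 'tp') device mesh on four ranks, not the actual float8 scaling code.

```python
# Hypothetical sketch; launch with torchrun on 4 ranks (2 dp x 2 tp).
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())
mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))

x = torch.randn(1024, device="cuda")
amax = x.abs().amax()

# Before: reduced over every rank (the default, global process group).
# dist.all_reduce(amax, op=dist.ReduceOp.MAX)

# After: reduced only over the data-parallel dimension of the mesh,
# i.e. the ranks that actually hold replicas of this tensor.
dist.all_reduce(amax, op=dist.ReduceOp.MAX, group=mesh.get_group("dp"))

scale = torch.finfo(torch.float8_e4m3fn).max / torch.clamp(amax, min=1e-12)
dist.destroy_process_group()
```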
- 85126cc: int8 dynamic quant + bsr support (pytorch#821)
  This PR adds int8 dynamic quant + BSR support. Changes:
  * Use i8i8 -> bf16 matmul to maintain accuracy
  * Added a block sparse layout type to AffineQuantizedTensor + check/impl
  * Cleaned up the benchmark.py script and added a single-line `benchmark.sh` file for acceleration numbers
  * Updated eval.py and added a single-line `evaluate.sh` file for accuracy numbers
  * Lots of lint formatting and README updates
  * torch.compile now working and correct
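The block-sparse half of this can be illustrated with stock PyTorch (a sketch under assumptions; torchao's actual layout, int8 quantization, and dispatch code are more involved): zero out most 64x64 blocks of a weight, convert it to BSR so only the surviving blocks are stored, and check the round-trip.

```python
import torch

# Weight whose zeros are clustered into 64x64 blocks, so BSR can skip them entirely.
blocksize = 64
weight = torch.randn(1024, 1024)
block_mask = torch.rand(1024 // blocksize, 1024 // blocksize) > 0.8  # keep ~20% of blocks
weight = weight * block_mask.repeat_interleave(blocksize, 0).repeat_interleave(blocksize, 1)

# Store only the non-zero blocks.
weight_bsr = weight.to_sparse_bsr(blocksize)
print(weight_bsr.values().shape)  # (num_kept_blocks, 64, 64)

# Round-trip check against the dense weight; real kernels (e.g. the ones this PR
# benchmarks) operate on the BSR form directly instead of densifying.
x = torch.randn(8, 1024)
print(torch.allclose(x @ weight.t(), x @ weight_bsr.to_dense().t()))
```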
- a5a426e: fixing some issues with our support for 70/405B models (pytorch#941)
  Summary: the download and convert scripts needed to be updated alongside the model.py config files.
  Test Plan: python generate.py --checkpoint_path ../../../checkpoints/meta-llama/Meta-Llama-3.1-70B/model.pth
- e7270f1
- 352685c: Differential Revision: D62711909. Pull Request resolved: pytorch#953
- 168cfe9
- 5900c3e
- 37e1479
- 2efde49: better comment on why upcasting
- 8c04f4f
- 04b229b: move unit test to test_compile
- 8b7c2ef
Commits on Sep 27, 2024
- 9346afd: float64 upcasting after allreduce
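A hedged sketch of the ordering the commit title describes (illustrative only, not the repo's actual scaling helpers): keep the local amax and the all-reduce in the tensor's own dtype, and upcast to float64 only afterwards when turning amax into a scale.

```python
import torch
import torch.distributed as dist


def amax_to_scale_after_allreduce(x: torch.Tensor, float8_dtype=torch.float8_e4m3fn) -> torch.Tensor:
    # Local amax in the tensor's own dtype, so the collective stays cheap.
    amax = x.abs().amax()
    if dist.is_initialized():
        dist.all_reduce(amax, op=dist.ReduceOp.MAX)
    # Upcast only after the collective (as the commit title describes), so the
    # division below happens in float64.
    amax = amax.to(torch.float64)
    scale = torch.finfo(float8_dtype).max / torch.clamp(amax, min=1e-12)
    return scale.to(torch.float32)


# Single-process usage (no process group initialized): acts as a plain amax -> scale helper.
weight = torch.randn(512, 512, dtype=torch.bfloat16)
print(amax_to_scale_after_allreduce(weight))
```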
Commits on Sep 30, 2024
- 3d0da20