Commit
float8: remove unneeded kernel for scale generation (#616)
Summary:

The code that creates a float8 scale was launching an unnecessary extra GPU kernel by calling `torch.empty`; this commit removes that call. There is no performance impact, but it makes debugging easier by reducing log size and simplifying GPU traces.

Test Plan:

```
// extract trace of a linear fwd+bwd with
python benchmarks/float8/profile_linear_float8.py ~/local/tmp/test
// verify that the GPU kernel creating an empty scale tensor is no longer there

// unit tests pass
./test/float8/test_everything.sh
```

Reviewers:

Subscribers:

Tasks:

Tags: