Commit
float8: remove unneeded kernel for scale generation
Summary:

The code that creates a float8 scale launched an unnecessary extra GPU kernel by calling `torch.empty`. This commit removes that call.

Test Plan:

```
// extract trace of a linear fwd+bwd with
python benchmarks/float8/profile_linear_float8.py ~/local/tmp/test
// verify that the GPU kernel creating an empty scale tensor is no longer there

// unit tests pass
./test/float8/test_everything.sh
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 3dd0c3bbf08e9e03321599b68334cf6dcc88f77b
ghstack-comment-id: 2272205849
Pull Request resolved: #616
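For illustration only, here is a minimal sketch of the kind of change the summary describes. It is not the code from this commit: the function names, the `EPS` floor, and the `amax`-based scale formula are assumptions made for the example. The first variant preallocates an output with `torch.empty_like` and copies into it; the second returns the computed result directly, so the separate buffer and the copy into it are never needed.

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in torch.float8_e4m3fn
EPS = 1e-12       # assumed floor to avoid dividing by zero (hypothetical value)


def scale_with_prealloc(amax: torch.Tensor) -> torch.Tensor:
    """Hypothetical 'before' shape: allocate a buffer, then copy into it."""
    scale = torch.empty_like(amax, dtype=torch.float32)
    res = E4M3_MAX / torch.clamp(amax, min=EPS)
    scale.copy_(res)  # extra work that exists only to fill the buffer
    return scale


def scale_direct(amax: torch.Tensor) -> torch.Tensor:
    """Hypothetical 'after' shape: return the computed tensor itself."""
    res = E4M3_MAX / torch.clamp(amax, min=EPS)
    return res.to(torch.float32)


amax = torch.tensor([0.5, 2.0, 448.0])
before = scale_with_prealloc(amax)
after = scale_direct(amax)
```

Both variants should produce the same values, so a change of this shape can be checked with the existing unit tests plus a profiler trace, as in the test plan above.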