Enable float8 CI on sm89 #587
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/587
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 01f0ba1 with merge base 013cce3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
So question here
EDIT: OK, that's a lot of test failures. Mind opening a tracker for them so we can assign them to the right owners?
I noticed the fp8 test is not being triggered here so instead what I did was
So you can go ahead and make the updates here from PR #603; I won't land 603.
For GitHub Actions, you can't iterate fast because you need to wait on CI. I'd recommend either copy-pasting as much as possible, so you minimize the surface area for typos, or running tiny experiments where you temporarily delete all jobs in your branch except the one you're interested in iterating on. It makes iterating far more pleasant.
Also, regarding pricing, compare g5.12x prices vs g6.4x prices: https://aws.amazon.com/ec2/instance-types/g6/ and https://aws.amazon.com/ec2/instance-types/g5/
So you could reduce our CI costs by about 4x just by changing the default machine type, but we can do this in a future PR.
Branch updated from 26371be to 0d948fb.
The current CI/CD pipeline skipped float8 tests as they weren't compatible with the A10G GPUs. This PR adds a new CI job which runs on NVIDIA L4 Tensor Core GPUs for float8 tests.
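The hardware gate behind this split can be sketched as a compute-capability check. This is a minimal illustration, not the project's actual skip logic: `supports_float8` and `MIN_FLOAT8_CAPABILITY` are hypothetical names, and the sm89 floor is an assumption taken from this PR's title. On a real machine, the `(major, minor)` tuple could come from `torch.cuda.get_device_capability()`.

```python
from typing import Tuple

# Assumed minimum CUDA compute capability for running the float8 tests.
# (8, 9) corresponds to sm89, the architecture named in this PR's title.
MIN_FLOAT8_CAPABILITY: Tuple[int, int] = (8, 9)


def supports_float8(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability meets the sm89 floor."""
    return tuple(capability) >= MIN_FLOAT8_CAPABILITY


# Compute capabilities of the two GPUs discussed in this PR.
A10G = (8, 6)  # sm86: float8 tests are skipped on this GPU
L4 = (8, 9)    # sm89: float8 tests can run on this GPU

if __name__ == "__main__":
    for name, cap in (("A10G", A10G), ("L4", L4)):
        action = "run" if supports_float8(cap) else "skip"
        print(f"{name} (sm{cap[0]}{cap[1]}): {action} float8 tests")
```

Taking the capability as a plain tuple keeps the check testable on machines without a GPU.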
Fixes Issue: #575
Test Plan: Compare the logs of the Regression Test and Float8 Test jobs to confirm that tests skipped in Regression Test are now run by the new Float8 Test CI job.
Some of the test cases that were skipped in CUDA Nightly in Regression Tests now run in Float8 Test.
Screenshot of logs of CUDA Nightly
Screenshot of logs of Float8 test
For full list of passed tests: https://gist.github.com/jainapurva/a5ddd17c809219151485de8d1708d078
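The log comparison in the test plan could be automated along these lines. This is a rough sketch that assumes both jobs produce pytest verbose output (one `path::test OUTCOME` line per test); `parse_outcomes` and `newly_running` are hypothetical helper names, and the sample log lines are made up for illustration, not taken from the actual CI runs.

```python
import re
from typing import Dict, Set

# Matches pytest verbose result lines such as:
#   test/float8/test_base.py::test_cast PASSED   [ 50%]
RESULT_LINE = re.compile(
    r"^(?P<test>\S+::\S+)\s+(?P<outcome>PASSED|FAILED|SKIPPED|ERROR|XFAIL|XPASS)\b"
)


def parse_outcomes(log_text: str) -> Dict[str, str]:
    """Map each test ID found in a verbose pytest log to its outcome."""
    outcomes: Dict[str, str] = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            outcomes[match.group("test")] = match.group("outcome")
    return outcomes


def newly_running(regression_log: str, float8_log: str) -> Set[str]:
    """Return tests skipped in the regression log that pass in the float8 log."""
    before = parse_outcomes(regression_log)
    after = parse_outcomes(float8_log)
    return {
        test
        for test, outcome in before.items()
        if outcome == "SKIPPED" and after.get(test) == "PASSED"
    }


if __name__ == "__main__":
    # Hypothetical log excerpts; real logs would be read from the CI artifacts.
    regression = (
        "test/float8/test_base.py::test_cast SKIPPED (unsupported GPU) [ 50%]\n"
        "test/quantization/test_quant.py::test_int8 PASSED [100%]\n"
    )
    float8 = "test/float8/test_base.py::test_cast PASSED [100%]\n"
    for test in sorted(newly_running(regression, float8)):
        print(test)
```

A test that is skipped in both jobs, or missing from the float8 log, is not reported, so the result lists only the coverage the new job actually adds.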