enable torch.compile for mxfp8_cublas recipe #1841
Open
vkuzo wants to merge 24 commits into main from gh/vkuzo/53/head
+102 −9
Conversation
Stack from ghstack (oldest at bottom):

🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1841
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures
As of commit cb824b2 with merge base 4a5ab2d. NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
vkuzo added a commit that referenced this pull request on Mar 5, 2025
Summary: This PR enables `MXLinear` with `mxfp8_cublas` recipe to use
torch.compile.

The current approach is a short term workaround until
pytorch/pytorch#148461 is done. Since we can't use e8m0 in torchinductor
or triton yet, we create a custom op wrapper around `torch._scaled_mm`
which takes `uint8` scales and does the cast to e8m0 inside the wrapper,
where torchinductor can't see it.

Test Plan:

```
// this now works (although performance is not ideal due to #1788)
python benchmarks/float8/profile_lowp_training.py ~/local/tmp/20250305_test --mx_recipe_name mxfp8_cublas

// we can also uncomment the hardware check and run the unit test
pytest test/prototype/mx_formats -s -k test_linear_compile
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 033d817549f80d7d0d8cf549f748411cc1f3ac6a
ghstack-comment-id: 2701679811
Pull Request resolved: #1841
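To make the workaround concrete, here is a minimal sketch of the custom-op pattern the commit message describes. This is not the PR's actual code: the op namespace and names below are hypothetical, it assumes a PyTorch recent enough to have `torch.library.custom_op` and the `torch.float8_e8m0fnu` dtype, and it glosses over `torch._scaled_mm`'s layout requirements (row-/column-major operands, blocked scale layout).

```
# Hedged sketch of the custom-op wrapper pattern (hypothetical names, not the
# PR's code). The op body is opaque to torchinductor, so the uint8 -> e8m0
# reinterpretation never appears in the traced graph.
import torch
from torch.library import custom_op

@custom_op("mylib::mx_scaled_mm", mutates_args=())
def mx_scaled_mm(
    a: torch.Tensor,           # fp8 activation
    b: torch.Tensor,           # fp8 weight
    a_scale_u8: torch.Tensor,  # per-block scales, carried as uint8
    b_scale_u8: torch.Tensor,  # per-block scales, carried as uint8
) -> torch.Tensor:
    # The e8m0 cast happens inside the wrapper, where inductor can't see it.
    # uint8 and float8_e8m0fnu are both one byte, so view() reinterprets bits.
    a_scale = a_scale_u8.view(torch.float8_e8m0fnu)
    b_scale = b_scale_u8.view(torch.float8_e8m0fnu)
    return torch._scaled_mm(a, b, a_scale, b_scale, out_dtype=torch.bfloat16)

@mx_scaled_mm.register_fake
def _(a, b, a_scale_u8, b_scale_u8):
    # Shape/dtype propagation so torch.compile can trace through the op
    # without running the real kernel.
    return a.new_empty(a.shape[0], b.shape[1], dtype=torch.bfloat16)
```

Because torch.compile only sees the fake registration, the unsupported e8m0 dtype stays entirely inside the op boundary, which is what sidesteps the missing inductor/triton support.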
vkuzo added a commit that referenced this pull request on Mar 5, 2025
Summary: This PR enables `MXLinear` with `mxfp8_cublas` recipe to use
torch.compile.

The current approach is a short term workaround until
pytorch/pytorch#147873 is done. Since we can't use e8m0 in torchinductor
or triton yet, we create a custom op wrapper around `torch._scaled_mm`
which takes `uint8` scales and does the cast to e8m0 inside the wrapper,
where torchinductor can't see it.

Test Plan:

```
// this now works (although performance is not ideal due to #1788)
python benchmarks/float8/profile_lowp_training.py ~/local/tmp/20250305_test --mx_recipe_name mxfp8_cublas

// we can also uncomment the hardware check and run the unit test
pytest test/prototype/mx_formats -s -k test_linear_compile
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: f3ebd12edcb746b8abf992d00711ce2bdbb7fcf2
ghstack-comment-id: 2701679811
Pull Request resolved: #1841
vkuzo added a commit that referenced this pull request on Mar 5, 2025
Summary: This PR enables `MXLinear` with `mxfp8_cublas` recipe to use
torch.compile.

The current approach is a short term workaround until
pytorch/pytorch#147873 is done. Since we can't use e8m0 in torchinductor
or triton yet, we create a custom op wrapper around `torch._scaled_mm`
which takes `uint8` scales and does the cast to e8m0 inside the wrapper,
where torchinductor can't see it.

Test Plan:

```
// this now works (although performance is not ideal due to #1788)
python benchmarks/float8/profile_lowp_training.py ~/local/tmp/20250305_test --mx_recipe_name mxfp8_cublas

// we can also uncomment the hardware check and run the unit test
pytest test/prototype/mx_formats -s -k test_linear_compile
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: e5687e308db0a54c6083c58cfec5cc49626622f1
ghstack-comment-id: 2701679811
Pull Request resolved: #1841
Labels
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
topic: performance: Use this tag if this PR improves the performance of a feature.
Summary:

This PR enables `MXLinear` with the `mxfp8_cublas` recipe to use torch.compile.

The current approach is a short-term workaround until pytorch/pytorch#148461 is done. Since we can't use e8m0 in torchinductor or triton yet, we create a custom op wrapper around `torch._scaled_mm` which takes `uint8` scales and does the cast to e8m0 inside the wrapper, where torchinductor can't see it.

Test Plan:

```
// this now works (although performance is not ideal due to #1788)
python benchmarks/float8/profile_lowp_training.py ~/local/tmp/20250305_test --mx_recipe_name mxfp8_cublas

// we can also uncomment the hardware check and run the unit test
pytest test/prototype/mx_formats -s -k test_linear_compile
```

Reviewers:

Subscribers:

Tasks:

Tags:
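For completeness, a hedged sketch of what the test plan exercises end to end. The import paths, `MXLinearConfig.from_recipe_name`, and `swap_linear_with_mx_linear` are assumptions about the prototype API that may differ across torchao versions, and the `mxfp8_cublas` recipe requires a GPU with native mxfp8 `torch._scaled_mm` support (hence the hardware check the test plan mentions).

```
# Hedged usage sketch (API names assumed; check torchao.prototype.mx_formats
# for the exact entry points in your version). Needs a GPU with native
# mxfp8 support for the cublas-backed recipe.
import torch
import torch.nn as nn
from torchao.prototype.mx_formats.config import MXLinearConfig
from torchao.prototype.mx_formats.mx_linear import swap_linear_with_mx_linear

m = nn.Sequential(nn.Linear(256, 256, bias=False)).cuda().to(torch.bfloat16)
config = MXLinearConfig.from_recipe_name("mxfp8_cublas")  # assumed helper
swap_linear_with_mx_linear(m, config=config)

m = torch.compile(m)  # works after this PR's custom-op workaround
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16, requires_grad=True)
m(x).sum().backward()
```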