@Aya-ZIbra

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2080

This diff generalizes the work in D85155388, based on Gefei's diff D85631781.

Compared to D85631781, we avoid register warp shuffling by using 32b TMEM atoms.

This diff supports:

  1. Different dtypes (fp8, bf16)
  2. Different mtiles (128, 64)

Reviewed By: v0i0

Differential Revision: D85893883

@netlify

netlify bot commented Oct 31, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 96eead8
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/690550ff8a2a35000835ed86
😎 Deploy Preview https://deploy-preview-5075--pytorch-fbgemm-docs.netlify.app

@meta-cla meta-cla bot added the cla signed label Oct 31, 2025
@meta-codesync

meta-codesync bot commented Oct 31, 2025

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85893883.

@meta-codesync

meta-codesync bot commented Nov 1, 2025

This pull request has been merged in ecf2ac9.
