@Aya-ZIbra

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2079

Compile-time static/const mapping utilities for:

  1. constexpr value -> constexpr value
  2. constexpr value -> type

Useful when developing template-heavy cutlass code.

Differential Revision: D85893168
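
The two kinds of mapping listed above can be illustrated with a minimal C++17 sketch. This is not the code added by the PR, whose API is not shown in this conversation. Every name here (`ValueEntry`, `StaticValueMap`, `TypeEntry`, `StaticTypeMap`, `lookup_type_t`, `kTileToWarps`, `AccumFor`) and the example key/value pairs are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// --- 1. constexpr value -> constexpr value ---------------------------------
// Hypothetical key/value pair used by the map below.
template <typename K, typename V>
struct ValueEntry {
  K key;
  V value;
};

// Hypothetical fixed-size map; an aggregate, so it can be brace-initialized.
template <typename K, typename V, std::size_t N>
struct StaticValueMap {
  ValueEntry<K, V> entries[N];

  // Linear scan that is usable in constant expressions.
  // Returns `fallback` when the key is absent.
  constexpr V get(K key, V fallback) const {
    for (std::size_t i = 0; i < N; ++i) {
      if (entries[i].key == key) {
        return entries[i].value;
      }
    }
    return fallback;
  }
};

// --- 2. constexpr value -> type --------------------------------------------
// Hypothetical entry binding a compile-time key to a type.
template <auto Key, typename T>
struct TypeEntry {
  static constexpr auto key = Key;
  using type = T;
};

// Primary template: reached only when no entry matched. It has no `type`
// member, so looking up an unknown key is a compile error.
template <auto Key, typename... Entries>
struct StaticTypeMap {};

// Recursive case: pick the first entry if its key matches, otherwise defer
// to the remaining entries. Only the selected branch is instantiated.
template <auto Key, typename First, typename... Rest>
struct StaticTypeMap<Key, First, Rest...> {
  using type = typename std::conditional_t<
      Key == First::key, First, StaticTypeMap<Key, Rest...>>::type;
};

template <auto Key, typename... Entries>
using lookup_type_t = typename StaticTypeMap<Key, Entries...>::type;

// --- Illustrative usage (all values are made up) ---------------------------
inline constexpr StaticValueMap<int, int, 2> kTileToWarps{{{64, 2}, {128, 4}}};
static_assert(kTileToWarps.get(64, -1) == 2);
static_assert(kTileToWarps.get(32, -1) == -1);  // absent key -> fallback

template <int Tile>
using AccumFor =
    lookup_type_t<Tile, TypeEntry<64, float>, TypeEntry<128, double>>;
static_assert(std::is_same_v<AccumFor<128>, double>);
```

In this sketch the value map returns a fallback for a missing key, while the type map rejects a missing key at compile time. Selecting between `First` and the remaining entries before taking `::type` keeps the unselected branch from being instantiated, which is what lets the empty primary template act as the error case.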

Aya-ZIbra and others added 2 commits October 30, 2025 17:45
Summary:

X-link: facebookresearch/FBGEMM#2078

Change QtileSize to 64. I see a good improvement: more than 20% in most configurations, with speedups ranging from about 1.09x to 1.34x in the table below.
For correctness, this also changes the TMEM atoms and introduces a warp sync for row stats.

Perf:
```
(Batch, SeqLenQ, SeqLenKV, MaxLenKV, HeadQ, HeadKV, HeadD)	cutlass_blackwell_fmha_decode-gbps			Improvement with Qtile = 64
(16, 1, 256, 256, 8, 1, 128)	238.2206209			1.31463193
(16, 1, 512, 512, 8, 1, 128)	410.8838061			1.315872068
(16, 1, 1024, 1024, 8, 1, 128)	660.5696208			1.335567769
(16, 1, 2048, 2048, 8, 1, 128)	916.5460174			1.310093116
(16, 1, 4096, 4096, 8, 1, 128)	1133.690174			1.258896694
(16, 1, 8192, 8192, 8, 1, 128)	1271.341515			1.229311967
(32, 1, 256, 256, 8, 1, 128)	468.9034945			1.295635241
(32, 1, 512, 512, 8, 1, 128)	799.2689835			1.280831124
(32, 1, 1024, 1024, 8, 1, 128)	1285.452285			1.293538886
(32, 1, 2048, 2048, 8, 1, 128)	1797.074701			1.269787171
(32, 1, 4096, 4096, 8, 1, 128)	2210.946865			1.229703361
(32, 1, 8192, 8192, 8, 1, 128)	2498.665399			1.212166122
(64, 1, 256, 256, 8, 1, 128)	893.9747894			1.302172409
(64, 1, 512, 512, 8, 1, 128)	1493.150844			1.274679551
(64, 1, 1024, 1024, 8, 1, 128)	2309.825211			1.220419935
(64, 1, 2048, 2048, 8, 1, 128)	3012.271892			1.159444905
(64, 1, 4096, 4096, 8, 1, 128)	3552.001019			1.089389445
(64, 1, 8192, 8192, 8, 1, 128)	4348.016208			1.131298153
(128, 1, 256, 256, 8, 1, 128)	1549.388365			1.233405251
(128, 1, 512, 512, 8, 1, 128)	2480.52007			1.210676964
(128, 1, 1024, 1024, 8, 1, 128)	3360.125922			1.145674899
(128, 1, 2048, 2048, 8, 1, 128)	4103.461192			1.093136854
(128, 1, 4096, 4096, 8, 1, 128)	4783.429328			1.095583284

```

Reviewed By: jianyuh, v0i0

Differential Revision: D85155388
netlify bot commented Oct 31, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit cf4d1c2
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/690406d370d39f0007d8c68f
😎 Deploy Preview https://deploy-preview-5073--pytorch-fbgemm-docs.netlify.app
@meta-cla meta-cla bot added the cla signed label Oct 31, 2025
meta-codesync bot commented Oct 31, 2025

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85893168.

Aya-ZIbra pushed a commit to Aya-ZIbra/FBGEMM that referenced this pull request Oct 31, 2025
Reviewed By: jianyuh

Differential Revision: D85893168
@meta-codesync meta-codesync bot closed this in 44d0f95 Oct 31, 2025
meta-codesync bot commented Oct 31, 2025

This pull request has been merged in 44d0f95.