Support head dim 256 #329

Closed
woct0rdho wants to merge 2 commits into thu-ml:main from woct0rdho:head_dim_256

@woct0rdho woct0rdho commented Dec 13, 2025

Head dim 256 is needed for models like AuraFlow. New models based on AuraFlow have been emerging recently, and I'd like to help this architecture get adopted.

In transpose_pad_permute_cuda, we need to reduce the device-side CTA_SIZE to 32 so that the assert CTA_SIZE * HEAD_DIM <= 8192 is satisfied when HEAD_DIM is 256 (32 * 256 = 8192, exactly at the limit).
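To make the arithmetic concrete, here is a minimal host-side sketch of the constraint. Only the bound of 8192 and the names CTA_SIZE and HEAD_DIM come from the assert quoted above; the helpers `fits` and `max_cta_size` and the constant `kMaxElemsPerCta` are hypothetical names made up for illustration, and the power-of-two search is an assumption rather than how the kernel actually picks its tile size.

```cpp
#include <cstdint>

// Upper bound on CTA_SIZE * HEAD_DIM, taken from the assert quoted in this PR.
// The name kMaxElemsPerCta is hypothetical.
constexpr std::uint32_t kMaxElemsPerCta = 8192;

// Hypothetical helper: does this (CTA_SIZE, HEAD_DIM) pair satisfy the assert?
constexpr bool fits(std::uint32_t cta_size, std::uint32_t head_dim) {
    return cta_size * head_dim <= kMaxElemsPerCta;
}

// Hypothetical helper: largest power-of-two CTA_SIZE that satisfies the
// assert for a given HEAD_DIM. Returns 0 if no CTA size can fit.
// Assumption: CTA sizes are restricted to powers of two.
constexpr std::uint32_t max_cta_size(std::uint32_t head_dim) {
    if (head_dim == 0 || head_dim > kMaxElemsPerCta) return 0;
    std::uint32_t cta = 1;
    while (cta * 2 * head_dim <= kMaxElemsPerCta) cta *= 2;
    return cta;
}

// With HEAD_DIM = 256, a CTA_SIZE of 32 sits exactly at the limit ...
static_assert(fits(32, 256), "32 * 256 = 8192 satisfies the bound");
// ... while doubling it to 64 would give 16384 and trip the assert.
static_assert(!fits(64, 256), "64 * 256 = 16384 exceeds the bound");
static_assert(max_cta_size(256) == 32, "largest power-of-two CTA_SIZE for head dim 256");
```

The checks run at compile time, so a configuration that violates the bound fails the build instead of failing at kernel launch.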

I've tested on RTX 3090 (sm86) and RTX 4090 (sm89). sm90 is currently not supported because of a constraint on the WGMMA size. I'm still putting this PR up in case anyone needs it.

2 participants