Add support for Mistral Large 3 inference with Flashinfer MoE #33174
Changes from all commits: 487151d, 389f2ea, 186aecc, 4e5484d, 026b4a6, 463b08d, eb1dd0f, 56830ee, a135959, 6fa035d
New file (`@@ -0,0 +1,147 @@`), a tuned MoE kernel config:

```json
{
  "triton_version": "3.4.0",
```
> **Member:** I will note that these triton versions do seem out of date for modern torch+triton. It should be `triton==3.5.1` for what we use on main.

> **Contributor (author):** Good catch, this was generated a while ago. I'll see if anything changes if I run the benchmark in the current environment.

> **Contributor (author):** Would it be OK to leave these as they are? It would take quite a bit of time to regenerate. Also, please keep in mind that the older ones are for the per-tensor FP8 quantization, which is only used in the Eagle draft model. The main model uses blockwise quantization, and those configurations are newer (3.5.x).
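The thread above hinges on whether a config tuned under one Triton release is trustworthy under a newer one. As a minimal sketch (a hypothetical helper, not part of vLLM), one could compare the `"triton_version"` recorded in the config against the currently installed compiler:

```python
# Hypothetical helper (not vLLM code): flag a tuned MoE config whose
# recorded "triton_version" is older than the Triton version in use.
import json


def parse_version(v: str) -> tuple[int, ...]:
    """Turn a version string like "3.4.0" into (3, 4, 0) for comparison."""
    return tuple(int(part) for part in v.split("."))


def config_is_stale(config_text: str, current_triton: str) -> bool:
    """True if the config was benchmarked under an older Triton release."""
    recorded = json.loads(config_text).get("triton_version", "0")
    return parse_version(recorded) < parse_version(current_triton)


# The config in this PR records 3.4.0; main uses triton==3.5.1.
print(config_is_stale('{"triton_version": "3.4.0"}', "3.5.1"))  # True
```

This only detects version skew; whether the tuned block sizes actually regress under the newer compiler still requires rerunning the benchmark, as the author notes.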
```json
  "1": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 32,
    "BLOCK_SIZE_K": 256,
    "GROUP_SIZE_M": 64,
    "num_warps": 4,
    "num_stages": 3
  },
  "2": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 64,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 5
  },
  "4": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 5
  },
  "8": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 5
  },
  "16": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 3
  },
  "24": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 256,
    "BLOCK_SIZE_K": 256,
    "GROUP_SIZE_M": 1,
    "num_warps": 8,
    "num_stages": 2
  },
  "32": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 3
  },
  "48": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 256,
    "GROUP_SIZE_M": 1,
    "num_warps": 8,
    "num_stages": 3
  },
  "64": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 4
  },
  "96": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 4
  },
  "128": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 256,
    "BLOCK_SIZE_K": 256,
    "GROUP_SIZE_M": 1,
    "num_warps": 8,
    "num_stages": 2
  },
  "256": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 256,
    "BLOCK_SIZE_K": 256,
    "GROUP_SIZE_M": 1,
    "num_warps": 8,
    "num_stages": 2
  },
  "512": {
    "BLOCK_SIZE_M": 16,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 3
  },
  "1024": {
    "BLOCK_SIZE_M": 64,
    "BLOCK_SIZE_N": 256,
    "BLOCK_SIZE_K": 64,
    "GROUP_SIZE_M": 1,
    "num_warps": 8,
    "num_stages": 5
  },
  "1536": {
    "BLOCK_SIZE_M": 64,
    "BLOCK_SIZE_N": 256,
    "BLOCK_SIZE_K": 64,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 5
  },
  "2048": {
    "BLOCK_SIZE_M": 64,
    "BLOCK_SIZE_N": 128,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 4,
    "num_stages": 3
  },
  "3072": {
    "BLOCK_SIZE_M": 128,
    "BLOCK_SIZE_N": 256,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 8,
    "num_stages": 4
  },
  "4096": {
    "BLOCK_SIZE_M": 128,
    "BLOCK_SIZE_N": 256,
    "BLOCK_SIZE_K": 128,
    "GROUP_SIZE_M": 1,
    "num_warps": 8,
    "num_stages": 4
  }
}
```
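The numeric keys in the config above are benchmarked token counts, and a runtime request rarely hits one exactly. A simplified sketch of how such a file can be consumed (this mirrors the nearest-key heuristic used for tuned configs but is a stand-in, not the actual vLLM lookup code):

```python
# Sketch: pick the tuning entry whose benchmarked token count is
# closest to the actual number of tokens. Not vLLM's real code.
import json

CONFIG_JSON = """
{
  "triton_version": "3.4.0",
  "16": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
         "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3},
  "1024": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 256, "BLOCK_SIZE_K": 64,
           "GROUP_SIZE_M": 1, "num_warps": 8, "num_stages": 5}
}
"""


def pick_config(configs: dict, num_tokens: int) -> dict:
    """Return the entry whose numeric key is nearest to num_tokens."""
    # Skip metadata keys such as "triton_version".
    numeric_keys = [k for k in configs if k.isdigit()]
    best_key = min(numeric_keys, key=lambda k: abs(int(k) - num_tokens))
    return configs[best_key]


configs = json.loads(CONFIG_JSON)
print(pick_config(configs, 20))   # nearest benchmarked key is "16"
print(pick_config(configs, 900))  # nearest benchmarked key is "1024"
```

Only two entries are inlined here for brevity; against the full file, every batch size from 1 to 4096 falls within a factor of ~1.5 of some benchmarked key.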