
benchmarks: add MFU% column to benchmark output #2377

Merged
tridao merged 1 commit into Dao-AILab:main from Johnsonms:add-mfu-benchmark on Mar 22, 2026

Conversation

Collaborator

@Johnsonms Johnsonms commented Mar 20, 2026

Adds get_peak_flops() for known NVIDIA GPUs (B300, B200, H200, H100, A100, etc.) and shows ms/TFLOPS/MFU% per cell in the benchmark table.
H100 Before:
[screenshot]

H100 After:
[screenshot]

B200 Before:
[screenshot]

B200 After:
[screenshot]

B300 Before:
[screenshot]

B300 After:
[screenshot]

- Add MFU% column to benchmark output
- Add dtype parameter to get_peak_flops to correctly scale peak FLOPS
  for FP8 (2x), FP32 (0.5x), and FP16/BF16 (1x, identical throughput)
- Fix H200 (989 TFLOPS) and H20 (148 TFLOPS) values
- Add H100 NVL (835 TFLOPS), L40S (362 TFLOPS), B300 (3.5 PFLOPS),
  GB200/GB300 (2.5 PFLOPS) entries
- Add source URLs and sparsity notes from NVIDIA datasheets
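A minimal sketch of the approach described above: a `get_peak_flops()` lookup keyed on the GPU name, a dtype multiplier (FP8 2x, FP32 0.5x, FP16/BF16 1x), and the MFU% formula used per table cell. The function signatures, the dict layout, and the A100/B200 entries are illustrative assumptions, not the exact code merged in this PR; the H200, H20, and B300 numbers are the dense (non-sparsity) values stated in the PR description.

```python
# Dense (non-sparsity) BF16/FP16 tensor-core peak, in FLOPS.
# H200/H20/B300 values are from the PR text; A100/B200 are assumed here.
# Note: longer names are listed before their prefixes ("H200" before "H20")
# so that substring matching below picks the right entry.
_PEAK_FLOPS_BF16 = {
    "H100": 989e12,
    "H200": 989e12,   # fixed in this PR
    "H20": 148e12,    # fixed in this PR
    "A100": 312e12,
    "B200": 2.25e15,
    "B300": 3.5e15,   # added in this PR
}

def get_peak_flops(device_name: str, dtype: str = "bf16") -> float:
    """Peak dense FLOPS for a known GPU, scaled by dtype throughput."""
    base = next(v for k, v in _PEAK_FLOPS_BF16.items() if k in device_name)
    if dtype in ("fp16", "bf16"):
        return base          # 1x: FP16 and BF16 have identical throughput
    if dtype == "fp8":
        return base * 2.0    # FP8 runs at 2x the BF16 rate
    if dtype == "fp32":
        return base * 0.5    # FP32 runs at 0.5x the BF16 rate
    raise ValueError(f"unsupported dtype: {dtype}")

def mfu_percent(flops: float, time_ms: float,
                device_name: str, dtype: str = "bf16") -> float:
    """MFU% = achieved FLOPS / peak FLOPS * 100."""
    achieved = flops / (time_ms * 1e-3)  # convert ms to seconds
    return 100.0 * achieved / get_peak_flops(device_name, dtype)
```

For example, a kernel doing 1e12 FLOPs in 2 ms on an H100 achieves 500 TFLOPS, which this sketch reports as roughly 50.6% MFU against the 989 TFLOPS BF16 peak.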
@Johnsonms Johnsonms marked this pull request as ready for review March 21, 2026 05:17
@tridao tridao merged commit 3cafddf into Dao-AILab:main Mar 22, 2026
@Johnsonms Johnsonms deleted the add-mfu-benchmark branch March 22, 2026 14:41
zhuochenKIDD pushed a commit to zhuochenKIDD/flash-attention that referenced this pull request Mar 25, 2026