[nvidia backend] Replace cvt instructions with bitwise operations in s8->bf16 conversions #4563

chsigg · 2024-08-23T11:07:54Z

Hopper has very low throughput for conversion instructions, which causes these operations to quickly become an ALU bottleneck. Restating them in terms of bitwise ops and SIMD bf16 instructions increases throughput significantly and translates to meaningful speedups (e.g. 10% end-to-end on one matmul I was looking at).
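To make the idea concrete, here is a small Python emulation of one way an s8->bf16 conversion can avoid `cvt` entirely. It is a sketch under stated assumptions, not the code from this PR. It assumes a specific bit trick: each int8 is split into its low 7 bits and its sign bit, each part is OR-ed into the bf16 pattern for 128.0 (`0x4300`), and one subtraction recovers the value. It also assumes 32-bit registers holding two bf16 lanes, mimicking bf16x2 SIMD. All names (`s8x4_to_bf16`, `LOW7`, `SIGN`, `BIAS`) are hypothetical.

```python
import struct

# Hypothetical per-lane masks for a 32-bit word holding two 16-bit lanes.
LOW7 = 0x007F007F  # low 7 bits of the int8 in each lane
SIGN = 0x00800080  # int8 sign bit in each lane (lands on the bf16 exponent LSB)
BIAS = 0x43004300  # bf16 128.0 in both lanes


def bf16_to_float(bits):
    """Decode a 16-bit bf16 pattern: bf16 is the upper half of an fp32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def s8x2_to_bf16x2(pair):
    """Convert two int8 values zero-extended into the 16-bit lanes of a word."""
    # Lane value 128 + low7: in the binade [128, 256) one mantissa step is 1.0.
    a = (pair & LOW7) | BIAS
    # Lane value 128.0 if the sign bit is clear, 256.0 (0x4380) if it is set.
    b = (pair & SIGN) | BIAS
    # a - b = low7 - 128 * sign, i.e. the two's-complement value. This emulates
    # a two-lane bf16 subtract using Python floats; it is exact because every
    # result is an integer in [-128, 127], which bf16 represents exactly.
    return [bf16_to_float(a >> s) - bf16_to_float(b >> s) for s in (0, 16)]


def s8x4_to_bf16(word):
    """Convert four int8 values packed little-endian in a 32-bit word."""
    byte = [(word >> (8 * i)) & 0xFF for i in range(4)]
    # Spread bytes into 16-bit lanes (plain shifts here; on a GPU this
    # would typically be a byte-permute step).
    lo = byte[0] | (byte[1] << 16)
    hi = byte[2] | (byte[3] << 16)
    return s8x2_to_bf16x2(lo) + s8x2_to_bf16x2(hi)


if __name__ == "__main__":
    vals = [-128, -1, 0, 127]
    word = int.from_bytes(struct.pack("<4b", *vals), "little")
    print(s8x4_to_bf16(word))  # [-128.0, -1.0, 0.0, 127.0]
```

The only arithmetic per pair of lanes is one subtraction. Everything else is AND/OR with constants, which is the kind of work that sidesteps the low-throughput conversion path described above.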

ThomasRaoux

LGTM

…s8->bf16 conversions (triton-lang#4563) Hopper has very low throughput for conversion instructions, which causes these operations to quickly become an ALU bottleneck. Restating them in terms of bitwise ops and SIMD bf16 instructions increases throughput significantly and translates to meaningful speedups (e.g. 10% end-to-end on one matmul I was looking at). Co-authored-by: Adam Paszke <[email protected]>

chsigg requested a review from ptillet as a code owner August 23, 2024 11:07

chsigg force-pushed the export_cl666252733 branch from 56f40ca to f5fb50f Compare August 26, 2024 10:44

chsigg requested a review from ThomasRaoux August 26, 2024 11:02

chsigg mentioned this pull request Aug 26, 2024

Replace cvt instructions with bitwise operations in s8->bf16 conversions openxla/triton#9

Closed

ThomasRaoux approved these changes Aug 26, 2024

chsigg merged commit 241e89c into triton-lang:main Aug 28, 2024

jlebar mentioned this pull request Sep 3, 2024

Build LLVMAarch64CodeGen if CMAKE_OSX_ARCHITECTURES is arm64. #4637

Merged

3 participants