
[Release-2.2.x] Use max clock for analytical calculations of peak flops #2870

Merged
malfet merged 1 commit into triton-lang:release/2.2.x from bertmaher:release/2.2.x on Jan 3, 2024

Conversation

@bertmaher
Collaborator

Because we read the current clock, our analytical peak-flops calculations can vary while we're evaluating different configs. It turns out the choice of config is very sensitive to the clock, so a slight throttling can make us reject very good configs in favor of very bad ones.

A reproducer can be found here:
https://gist.github.com/bertmaher/8ff5e9631666846fff55d81326cacb4d

```
$ python thermal_throttle.py
chosen config BLOCK_M: 128, BLOCK_N: 256, BLOCK_K: 32, SPLIT_K: 1, num_warps: 8, num_ctas: 1, num_stages: 3, enable_warp_specialization: False, enable_persistent: False
tflops/s: 107.92460196062149

$ python thermal_throttle.py --preheat
chosen config BLOCK_M: 32, BLOCK_N: 32, BLOCK_K: 32, SPLIT_K: 1, num_warps: 2, num_ctas: 1, num_stages: 6, enable_warp_specialization: False, enable_persistent: False
tflops/s: 39.29629633970286
```
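
To illustrate why the clock source matters, here is a minimal toy sketch of the failure mode described above. It is not Triton's actual performance model: the function names, the SM count, the per-clock throughput, the utilization figures, and the clock readings are all hypothetical values chosen for illustration. It only shows how re-reading a fluctuating clock for each candidate can flip the ranking, while a fixed max clock keeps it stable.

```python
# Toy model only: every constant and name here is an illustrative assumption,
# not a value or API taken from Triton.
NUM_SMS = 108                    # hypothetical SM count
FLOPS_PER_CLOCK_PER_SM = 2048    # hypothetical per-SM throughput per cycle


def peak_tflops(clock_mhz):
    """Analytical peak throughput in TFLOP/s for a given SM clock."""
    return clock_mhz * 1e6 * NUM_SMS * FLOPS_PER_CLOCK_PER_SM / 1e12


def estimated_time_ms(total_flops, utilization, clock_mhz):
    """Estimated run time if a config reaches `utilization` of peak."""
    achievable_flops_per_s = peak_tflops(clock_mhz) * 1e12 * utilization
    return total_flops / achievable_flops_per_s * 1e3


def pick_config(configs, total_flops, read_clock_mhz):
    """Rank configs by estimated time; the clock is re-read for each config."""
    estimates = {
        name: estimated_time_ms(total_flops, utilization, read_clock_mhz())
        for name, utilization in configs.items()
    }
    return min(estimates, key=estimates.get)


# Hypothetical candidates: name -> fraction of peak the config can reach.
CONFIGS = {"large_tile": 0.9, "small_tile": 0.7}
TOTAL_FLOPS = 2 * 4096**3  # flops for a 4096 x 4096 x 4096 matmul

# "Current clock": throttled while the first config is evaluated,
# recovered by the time the second one is evaluated.
current_clock = iter([1000.0, 1410.0]).__next__


def max_clock():
    """"Max clock": the same value on every read."""
    return 1410.0


choice_with_current_clock = pick_config(CONFIGS, TOTAL_FLOPS, current_clock)
choice_with_max_clock = pick_config(CONFIGS, TOTAL_FLOPS, max_clock)

print("current clock:", choice_with_current_clock)
print("max clock:", choice_with_max_clock)
```

In this toy setup the throttled reading penalizes only the config that happened to be evaluated during the dip, so the weaker config is selected; with a constant clock, every candidate is scored against the same peak and the stronger config wins.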

@bertmaher bertmaher requested a review from ptillet as a code owner January 3, 2024 21:09
@malfet malfet changed the title Use max clock for analytical calculations of peak flops (#2801) [Release-2.2.x] Use max clock for analytical calculations of peak flops Jan 3, 2024
@malfet malfet merged commit 54bba97 into triton-lang:release/2.2.x Jan 3, 2024
pingzhuu pushed a commit to siliconflow/triton that referenced this pull request Apr 2, 2024
…ps (triton-lang#2870)

Cherry-pick of triton-lang#2801 into release/2.2.x