gfx1030 and gfx1035 benchmark results
- pt231 with old aotriton
- pt240 with old aotriton
Signed-off-by: Mika Laitio <[email protected]>
Showing 12 changed files with 242 additions and 0 deletions.
9 changes: 9 additions & 0 deletions
benchmarks/results/rocm_sdk_612/pytorch_231/gfx1030/20240807_192355_cpu_vs_gpu_simple.txt
@@ -0,0 +1,9 @@
Benchmarking CPU and GPUs
Pytorch version: 2.3.1
ROCM HIP version: 6.1.40093-6a0232ced
Device: cpu-16
'CPU time: 31.879 sec
Device: AMD Radeon RX 6800
'GPU time: 0.336 sec
Benchmark ready
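For a quick reading of the result above, the CPU-to-GPU ratio can be computed directly from the two timing lines. The sketch below is only an illustration: `parse_simple_result` is a hypothetical helper, not part of the benchmark scripts, and it assumes the `CPU time: ... sec` / `GPU time: ... sec` line format shown in this file (including the stray leading quote).

```python
import re

# Sample text copied from the gfx1030 / PyTorch 2.3.1 result file above.
SAMPLE = """\
Benchmarking CPU and GPUs
Pytorch version: 2.3.1
ROCM HIP version: 6.1.40093-6a0232ced
Device: cpu-16
'CPU time: 31.879 sec
Device: AMD Radeon RX 6800
'GPU time: 0.336 sec
Benchmark ready
"""

# Matches lines such as "'CPU time: 31.879 sec"; the leading quote is optional.
TIME_RE = re.compile(r"^'?(CPU|GPU) time:\s*([0-9.]+)\s*sec", re.MULTILINE)


def parse_simple_result(text):
    """Hypothetical helper: return {'CPU': seconds, 'GPU': seconds}."""
    return {kind: float(value) for kind, value in TIME_RE.findall(text)}


times = parse_simple_result(SAMPLE)
speedup = times["CPU"] / times["GPU"]
print(f"GPU speedup over CPU: {speedup:.1f}x")
```

The same helper should work on the other `cpu_vs_gpu_simple.txt` files in this commit, since they share the line format.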
51 changes: 51 additions & 0 deletions
benchmarks/results/rocm_sdk_612/pytorch_231/gfx1030/20240807_192355_pytorch_dot_products.txt
@@ -0,0 +1,51 @@
Pytorch version: 2.3.1
dot product calculation test
tensor([[[ 0.9455, -0.6972, 0.6711, -0.6345, 0.4092, -0.7703, 0.0519,
           0.0941],
         [ 0.9826, -0.8131, 0.9687, 0.2898, 0.4251, -0.4728, -0.3721,
           0.3235],
         [ 0.9581, -0.4088, 0.8697, 1.8528, 0.7221, 0.2384, -0.9122,
           0.6484]],

        [[-0.3769, 1.5725, -0.1406, 1.0675, 0.4376, -0.1497, 0.1996,
          -0.6237],
         [ 0.0979, 1.1114, 0.4869, 0.9094, 0.1796, -0.7216, -0.2395,
           0.1582],
         [ 0.1717, 1.1014, 0.3889, 1.0819, 0.0485, -0.6566, 0.5223,
          -0.4733]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention amd Memory pytorch backends
Device: AMD Radeon RX 6800
Default cuda:0 benchmark:
25619.943 microseconds, 0.025619942622142844 sec
Math cuda:0 benchmark:
26119.020 microseconds, 0.026119020103942604 sec
Flash Attention cuda:0 benchmark:
Flash Attention cuda:0 is not supported. See warnings for reasons.
Memory Efficient cuda:0 benchmark:
Memory Efficient cuda:0 is not supported. See warnings for reasons.
Device: cpu-16
Default cpu benchmark:
29142019.198 microseconds, 29.142019197985064 sec
Math cpu benchmark:
32081706.194 microseconds, 32.08170619397424 sec
Flash Attention cpu benchmark:
28997640.633 microseconds, 28.997640632966068 sec
Memory Efficient cpu benchmark:
Memory Efficient cpu is not supported. See warnings for reasons.
Summary

Pytorch version: 2.3.1
ROCM HIP version: 6.1.40093-6a0232ced
Device: AMD Radeon RX 6800
Default cuda:0: 25619.943 ms
Math cuda:0: 26119.020 ms
Flash Attention cuda:0: -1.000 ms
Memory Efficient cuda:0: -1.000 ms

Device: cpu-16
Default cpu: 29142019.198 ms
Math cpu: 32081706.194 ms
Flash Attention cpu: 28997640.633 ms
Memory Efficient cpu: -1.000 ms
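Note that the Summary lines are labelled `ms`, yet they repeat the figures that the per-backend lines report in microseconds (for example `25619.943 microseconds, 0.0256... sec`), and `-1.000` appears for every backend reported as not supported. A minimal sketch of a parser that normalizes both, assuming the summary line format above; `parse_summary` is a hypothetical helper and not part of the benchmark scripts.

```python
import re

# Summary section copied from the gfx1030 / PyTorch 2.3.1 result file above.
SUMMARY = """\
Device: AMD Radeon RX 6800
Default cuda:0: 25619.943 ms
Math cuda:0: 26119.020 ms
Flash Attention cuda:0: -1.000 ms
Memory Efficient cuda:0: -1.000 ms

Device: cpu-16
Default cpu: 29142019.198 ms
Math cpu: 32081706.194 ms
Flash Attention cpu: 28997640.633 ms
Memory Efficient cpu: -1.000 ms
"""

# "<backend> <device>: <value> ms", e.g. "Flash Attention cuda:0: -1.000 ms".
LINE_RE = re.compile(r"^(.+?) (cuda:\d+|cpu): (-?[0-9.]+) ms$")


def parse_summary(text):
    """Hypothetical helper: map (backend, device) to seconds, or None if unsupported.

    Assumption: the values are microseconds despite the "ms" label, as the
    per-backend benchmark lines in the same file indicate.
    """
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skips "Device: ..." headers and blank lines
        backend, device, raw = match.group(1), match.group(2), float(match.group(3))
        results[(backend, device)] = None if raw < 0 else raw / 1_000_000
    return results


summary = parse_summary(SUMMARY)
for (backend, device), seconds in summary.items():
    shown = "not supported" if seconds is None else f"{seconds:.6f} sec"
    print(f"{backend} on {device}: {shown}")
```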
9 changes: 9 additions & 0 deletions
benchmarks/results/rocm_sdk_612/pytorch_231/gfx1035/20240807_202419_cpu_vs_gpu_simple.txt
@@ -0,0 +1,9 @@
Benchmarking CPU and GPUs
Pytorch version: 2.3.1
ROCM HIP version: 6.1.40093-e0d934acc
Device: cpu-16
'CPU time: 30.518 sec
Device: AMD Radeon Graphics
'GPU time: 0.433 sec
Benchmark ready
51 changes: 51 additions & 0 deletions
benchmarks/results/rocm_sdk_612/pytorch_231/gfx1035/20240807_202419_pytorch_dot_products.txt
@@ -0,0 +1,51 @@
Pytorch version: 2.3.1
dot product calculation test
tensor([[[ 0.0689, 0.2939, 0.7003, 0.1261, 1.0665, 1.0418, 0.4118,
           0.1683],
         [-0.5260, 1.1823, 2.0751, 0.5867, 1.0490, 0.8184, -0.4882,
           0.5669],
         [-0.2747, 0.7315, 1.5438, 0.5053, 0.9644, 0.6745, -0.0424,
           0.4179]],

        [[-0.0273, 0.0843, 1.3197, -0.2101, 0.2967, -0.0361, 1.1482,
          -1.6203],
         [-0.1350, 0.4006, 1.6299, -0.0231, -1.1513, -0.6066, 0.9720,
          -0.6136],
         [-0.1251, 0.3714, 1.5972, -0.0524, -1.0149, -0.5489, 0.9840,
          -0.6991]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention amd Memory pytorch backends
Device: AMD Radeon Graphics
Default cuda:0 benchmark:
220142.783 microseconds, 0.22014278299957368 sec
Math cuda:0 benchmark:
221367.763 microseconds, 0.22136776300021666 sec
Flash Attention cuda:0 benchmark:
Flash Attention cuda:0 is not supported. See warnings for reasons.
Memory Efficient cuda:0 benchmark:
Memory Efficient cuda:0 is not supported. See warnings for reasons.
Device: cpu-16
Default cpu benchmark:
29291631.042 microseconds, 29.291631042000063 sec
Math cpu benchmark:
33746072.789 microseconds, 33.74607278900021 sec
Flash Attention cpu benchmark:
29435830.007 microseconds, 29.435830006999822 sec
Memory Efficient cpu benchmark:
Memory Efficient cpu is not supported. See warnings for reasons.
Summary

Pytorch version: 2.3.1
ROCM HIP version: 6.1.40093-e0d934acc
Device: AMD Radeon Graphics
Default cuda:0: 220142.783 ms
Math cuda:0: 221367.763 ms
Flash Attention cuda:0: -1.000 ms
Memory Efficient cuda:0: -1.000 ms

Device: cpu-16
Default cpu: 29291631.042 ms
Math cpu: 33746072.789 ms
Flash Attention cpu: 29435830.007 ms
Memory Efficient cpu: -1.000 ms
9 changes: 9 additions & 0 deletions
benchmarks/results/rocm_sdk_612/pytorch_241/gfx1035/20240807_194210_cpu_vs_gpu_simple.txt
@@ -0,0 +1,9 @@
Benchmarking CPU and GPUs
Pytorch version: 2.4.1-rc1
ROCM HIP version: 6.1.40093-e0d934acc
Device: cpu-16
'CPU time: 26.592 sec
Device: AMD Radeon Graphics
'GPU time: 0.704 sec
Benchmark ready
51 changes: 51 additions & 0 deletions
benchmarks/results/rocm_sdk_612/pytorch_241/gfx1035/20240807_194210_pytorch_dot_products.txt
@@ -0,0 +1,51 @@
Pytorch version: 2.4.1-rc1
dot product calculation test
tensor([[[ 1.0907, 0.9275, -0.5733, 1.0613, 0.8276, 0.3546, -0.1415,
           0.5558],
         [-0.1007, 1.3336, 1.4849, 0.4857, -0.4688, -1.1529, -1.4936,
          -0.2379],
         [ 0.3383, 1.1955, 0.4283, 0.9084, 0.0747, -0.4756, -0.9940,
           0.2775]],

        [[ 1.3691, -0.0811, 0.9837, 0.0884, 0.1267, -0.6009, 1.4974,
           1.0018],
         [ 2.2687, -1.3112, 1.4178, -0.9110, -0.6768, 0.7281, 0.2645,
           0.0655],
         [ 1.4861, -0.5155, 1.0940, -0.1645, -0.1212, -0.2863, 1.0591,
           0.6412]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention amd Memory pytorch backends
Device: AMD Radeon Graphics
Default cuda:0 benchmark:
221673.677 microseconds, 0.2216736770000125 sec
Math cuda:0 benchmark:
219790.358 microseconds, 0.21979035800001157 sec
Flash Attention cuda:0 benchmark:
Flash Attention cuda:0 is not supported. See warnings for reasons.
Memory Efficient cuda:0 benchmark:
Memory Efficient cuda:0 is not supported. See warnings for reasons.
Device: cpu-16
Default cpu benchmark:
29583910.213 microseconds, 29.583910212999996 sec
Math cpu benchmark:
33457111.382 microseconds, 33.45711138200002 sec
Flash Attention cpu benchmark:
29748300.851 microseconds, 29.74830085100001 sec
Memory Efficient cpu benchmark:
Memory Efficient cpu is not supported. See warnings for reasons.
Summary

Pytorch version: 2.4.1-rc1
ROCM HIP version: 6.1.40093-e0d934acc
Device: AMD Radeon Graphics
Default cuda:0: 221673.677 ms
Math cuda:0: 219790.358 ms
Flash Attention cuda:0: -1.000 ms
Memory Efficient cuda:0: -1.000 ms

Device: cpu-16
Default cpu: 29583910.213 ms
Math cpu: 33457111.382 ms
Flash Attention cpu: 29748300.851 ms
Memory Efficient cpu: -1.000 ms
9 changes: 9 additions & 0 deletions
benchmarks/results/rocm_sdk_612/pytorch_241/gfx103x/20240807_190756_cpu_vs_gpu_simple.txt
@@ -0,0 +1,9 @@
Benchmarking CPU and GPUs
Pytorch version: 2.4.1-rc1
ROCM HIP version: 6.1.40093-2b15d6049
Device: cpu-16
'CPU time: 38.230 sec
Device: AMD Radeon RX 6800
'GPU time: 0.347 sec
Benchmark ready
51 changes: 51 additions & 0 deletions
benchmarks/results/rocm_sdk_612/pytorch_241/gfx103x/20240807_190756_pytorch_dot_products.txt
@@ -0,0 +1,51 @@
Pytorch version: 2.4.1-rc1
dot product calculation test
tensor([[[ 0.0377, -0.4681, -0.5734, 1.1364, -0.2186, -0.7082, 0.5863,
          -0.2069],
         [-0.3001, -0.7798, -0.3014, 1.4297, -0.1664, -1.0023, 0.7803,
          -0.3309],
         [ 0.7447, -0.7807, 0.2690, 1.2052, 0.2125, -0.5166, 0.4624,
           0.2819]],

        [[ 0.7447, -0.0688, 0.0293, -0.5658, -0.2677, 0.1797, -0.5611,
           0.6280],
         [ 1.9031, -0.0560, -0.3467, 0.3541, 0.0653, -1.1427, 0.3638,
           1.4265],
         [ 2.2887, -0.1055, -0.4986, 0.6864, 0.1769, -1.6577, 0.6076,
           1.8052]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention amd Memory pytorch backends
Device: AMD Radeon RX 6800
Default cuda:0 benchmark:
25666.238 microseconds, 0.02566623799066292 sec
Math cuda:0 benchmark:
26148.054 microseconds, 0.026148054201621564 sec
Flash Attention cuda:0 benchmark:
Flash Attention cuda:0 is not supported. See warnings for reasons.
Memory Efficient cuda:0 benchmark:
Memory Efficient cuda:0 is not supported. See warnings for reasons.
Device: cpu-16
Default cpu benchmark:
29144372.636 microseconds, 29.144372635986656 sec
Math cpu benchmark:
32452335.613 microseconds, 32.452335612964816 sec
Flash Attention cpu benchmark:
28638718.318 microseconds, 28.638718318019528 sec
Memory Efficient cpu benchmark:
Memory Efficient cpu is not supported. See warnings for reasons.
Summary

Pytorch version: 2.4.1-rc1
ROCM HIP version: 6.1.40093-2b15d6049
Device: AMD Radeon RX 6800
Default cuda:0: 25666.238 ms
Math cuda:0: 26148.054 ms
Flash Attention cuda:0: -1.000 ms
Memory Efficient cuda:0: -1.000 ms

Device: cpu-16
Default cpu: 29144372.636 ms
Math cpu: 32452335.613 ms
Flash Attention cpu: 28638718.318 ms
Memory Efficient cpu: -1.000 ms
2 changes: 2 additions & 0 deletions
benchmarks/results/rocm_sdk_612/pytorch_241/gfx103x/notes.txt
@@ -0,0 +1,2 @@
gfx1030 results with hacked pytorch from 20240805
- older aotriton version than the one which has gfx110x tuning fix
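To put the four `Default cuda:0` timings from this commit side by side, the sketch below hard-codes the microsecond values from the result files above and computes two simple comparisons: how much slower gfx1035 is than gfx1030 on each PyTorch version, and how much each GPU's time changed between 2.3.1 and 2.4.1-rc1. The table and the helper names `ratio` and `percent_change` are illustrative assumptions, and these are single runs, so small differences may be noise.

```python
# Default cuda:0 timings in microseconds, copied from the result files in this commit.
# Per notes.txt, the gfx103x directory holds gfx1030 (RX 6800) results.
DEFAULT_GPU_US = {
    ("gfx1030", "2.3.1"): 25619.943,
    ("gfx1030", "2.4.1-rc1"): 25666.238,
    ("gfx1035", "2.3.1"): 220142.783,
    ("gfx1035", "2.4.1-rc1"): 221673.677,
}


def ratio(slow, fast):
    """How many times longer `slow` took than `fast`."""
    return slow / fast


def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


for version in ("2.3.1", "2.4.1-rc1"):
    r = ratio(DEFAULT_GPU_US[("gfx1035", version)], DEFAULT_GPU_US[("gfx1030", version)])
    print(f"PyTorch {version}: gfx1035 takes {r:.2f}x as long as gfx1030")

for gpu in ("gfx1030", "gfx1035"):
    change = percent_change(DEFAULT_GPU_US[(gpu, "2.3.1")], DEFAULT_GPU_US[(gpu, "2.4.1-rc1")])
    print(f"{gpu}: {change:+.2f}% from 2.3.1 to 2.4.1-rc1")
```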
File renamed without changes.
File renamed without changes.
File renamed without changes.