gfx1030 and gfx1035 benchmark results
- pt231 with old aotriton
- pt240 with old aotriton

Signed-off-by: Mika Laitio <[email protected]>
lamikr committed Aug 8, 2024
1 parent 380e018 commit 319b639
Showing 12 changed files with 242 additions and 0 deletions.
@@ -0,0 +1,9 @@
Benchmarking CPU and GPUs
Pytorch version: 2.3.1
ROCM HIP version: 6.1.40093-6a0232ced
Device: cpu-16
'CPU time: 31.879 sec
Device: AMD Radeon RX 6800
'GPU time: 0.336 sec
Benchmark ready
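The CPU-versus-GPU speedup implied by a log like this is simply CPU time divided by GPU time. A minimal Python sketch, assuming the line format shown above (it tolerates the leading apostrophe on the timing lines); `parse_cpu_gpu_times` and `gpu_speedup` are hypothetical helpers, not part of the benchmark scripts:

```python
import re

# Matches lines such as "'CPU time: 31.879 sec" (the leading apostrophe is optional).
_TIME_RE = re.compile(r"^'?(CPU|GPU) time:\s*([\d.]+)\s*sec\s*$")


def parse_cpu_gpu_times(log: str) -> dict[str, float]:
    """Return {"CPU": seconds, "GPU": seconds} parsed from a simple benchmark log."""
    times: dict[str, float] = {}
    for line in log.splitlines():
        m = _TIME_RE.match(line.strip())
        if m:
            times[m.group(1)] = float(m.group(2))
    return times


def gpu_speedup(log: str) -> float:
    """CPU time divided by GPU time; a value above 1 means the GPU finished faster."""
    t = parse_cpu_gpu_times(log)
    return t["CPU"] / t["GPU"]


if __name__ == "__main__":
    rx6800_pt231 = "\n".join([
        "Device: cpu-16",
        "'CPU time: 31.879 sec",
        "Device: AMD Radeon RX 6800",
        "'GPU time: 0.336 sec",
    ])
    print(f"{gpu_speedup(rx6800_pt231):.1f}x")
```

For the RX 6800 (gfx1030) run with PyTorch 2.3.1 this prints 94.9x.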

@@ -0,0 +1,51 @@
Pytorch version: 2.3.1
dot product calculation test
tensor([[[ 0.9455, -0.6972, 0.6711, -0.6345, 0.4092, -0.7703, 0.0519,
0.0941],
[ 0.9826, -0.8131, 0.9687, 0.2898, 0.4251, -0.4728, -0.3721,
0.3235],
[ 0.9581, -0.4088, 0.8697, 1.8528, 0.7221, 0.2384, -0.9122,
0.6484]],

[[-0.3769, 1.5725, -0.1406, 1.0675, 0.4376, -0.1497, 0.1996,
-0.6237],
[ 0.0979, 1.1114, 0.4869, 0.9094, 0.1796, -0.7216, -0.2395,
0.1582],
[ 0.1717, 1.1014, 0.3889, 1.0819, 0.0485, -0.6566, 0.5223,
-0.4733]]], device='cuda:0')
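The "dot product calculation test" output is a 2×3×8 tensor, which is consistent with calling PyTorch's `torch.nn.functional.scaled_dot_product_attention` on random query, key and value tensors of that shape (as in PyTorch's SDPA tutorial); the script itself is not part of this commit, so treat that as an assumption. The "Math" backend evaluates the reference formula softmax(QKᵀ/√d)·V. A dependency-free Python sketch of that formula for a single batch element (helper names `softmax` and `sdpa` are hypothetical; no masking, dropout or causal option):

```python
import math


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(q: list[list[float]], k: list[list[float]], v: list[list[float]]) -> list[list[float]]:
    """softmax(Q K^T / sqrt(d)) V for one batch element.

    q: L x d, k: S x d, v: S x dv; returns an L x dv list of lists.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # Scaled similarity of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Attention output: weighted sum of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    for row in sdpa(q, k, v):
        print([round(x, 4) for x in row])
```

The fused Flash Attention and Memory Efficient kernels compute the same result without materializing the full score matrix, which is why they can be faster but also why they need dedicated per-architecture support.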

Benchmarking cuda and cpu with Default, Math, Flash Attention and Memory pytorch backends
Device: AMD Radeon RX 6800
Default cuda:0 benchmark:
25619.943 microseconds, 0.025619942622142844 sec
Math cuda:0 benchmark:
26119.020 microseconds, 0.026119020103942604 sec
Flash Attention cuda:0 benchmark:
Flash Attention cuda:0 is not supported. See warnings for reasons.
Memory Efficient cuda:0 benchmark:
Memory Efficient cuda:0 is not supported. See warnings for reasons.
Device: cpu-16
Default cpu benchmark:
29142019.198 microseconds, 29.142019197985064 sec
Math cpu benchmark:
32081706.194 microseconds, 32.08170619397424 sec
Flash Attention cpu benchmark:
28997640.633 microseconds, 28.997640632966068 sec
Memory Efficient cpu benchmark:
Memory Efficient cpu is not supported. See warnings for reasons.
Summary

Pytorch version: 2.3.1
ROCM HIP version: 6.1.40093-6a0232ced
Device: AMD Radeon RX 6800
Default cuda:0: 25619.943 µs
Math cuda:0: 26119.020 µs
Flash Attention cuda:0: -1.000 µs
Memory Efficient cuda:0: -1.000 µs

Device: cpu-16
Default cpu: 29142019.198 µs
Math cpu: 32081706.194 µs
Flash Attention cpu: 28997640.633 µs
Memory Efficient cpu: -1.000 µs
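The Summary block repeats the per-backend timings from the detailed lines: each number is the microsecond figure printed above (25619.943 microseconds = 0.0256 sec for Default on cuda:0), and -1.000 is a sentinel for a backend that is not supported on that device. A minimal parsing sketch under those assumptions; `parse_summary` and `fastest_backend` are hypothetical names, and the unit label after the number is ignored:

```python
import re
from typing import Optional

# "<backend> <device>: <value> <unit>", e.g. "Default cuda:0: 25619.943 ms".
_LINE_RE = re.compile(
    r"^(Default|Math|Flash Attention|Memory Efficient) (\S+): (-?[\d.]+) \S+$"
)


def parse_summary(text: str) -> dict[tuple[str, str], Optional[float]]:
    """Map (backend, device) to seconds, or None when the run reported -1 (unsupported)."""
    results: dict[tuple[str, str], Optional[float]] = {}
    for line in text.splitlines():
        m = _LINE_RE.match(line.strip())
        if not m:
            continue  # headers, device lines, blank lines
        backend, device, value = m.group(1), m.group(2), float(m.group(3))
        # Assumption: the number is in microseconds, matching the detailed lines.
        results[(backend, device)] = None if value < 0 else value / 1e6
    return results


def fastest_backend(
    results: dict[tuple[str, str], Optional[float]], device: str
) -> tuple[str, float]:
    """Return (backend, seconds) for the fastest supported backend on `device`."""
    timed = [(t, b) for (b, d), t in results.items() if d == device and t is not None]
    best_time, best_backend = min(timed)
    return best_backend, best_time


if __name__ == "__main__":
    summary = "\n".join([
        "Default cuda:0: 25619.943 ms",
        "Math cuda:0: 26119.020 ms",
        "Flash Attention cuda:0: -1.000 ms",
        "Default cpu: 29142019.198 ms",
        "Flash Attention cpu: 28997640.633 ms",
    ])
    parsed = parse_summary(summary)
    print(fastest_backend(parsed, "cuda:0"))
    print(fastest_backend(parsed, "cpu"))
```

On the RX 6800 run above, Default is the fastest supported backend on cuda:0, while Flash Attention is the fastest on the CPU.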

@@ -0,0 +1,9 @@
Benchmarking CPU and GPUs
Pytorch version: 2.3.1
ROCM HIP version: 6.1.40093-e0d934acc
Device: cpu-16
'CPU time: 30.518 sec
Device: AMD Radeon Graphics
'GPU time: 0.433 sec
Benchmark ready

@@ -0,0 +1,51 @@
Pytorch version: 2.3.1
dot product calculation test
tensor([[[ 0.0689, 0.2939, 0.7003, 0.1261, 1.0665, 1.0418, 0.4118,
0.1683],
[-0.5260, 1.1823, 2.0751, 0.5867, 1.0490, 0.8184, -0.4882,
0.5669],
[-0.2747, 0.7315, 1.5438, 0.5053, 0.9644, 0.6745, -0.0424,
0.4179]],

[[-0.0273, 0.0843, 1.3197, -0.2101, 0.2967, -0.0361, 1.1482,
-1.6203],
[-0.1350, 0.4006, 1.6299, -0.0231, -1.1513, -0.6066, 0.9720,
-0.6136],
[-0.1251, 0.3714, 1.5972, -0.0524, -1.0149, -0.5489, 0.9840,
-0.6991]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention and Memory pytorch backends
Device: AMD Radeon Graphics
Default cuda:0 benchmark:
220142.783 microseconds, 0.22014278299957368 sec
Math cuda:0 benchmark:
221367.763 microseconds, 0.22136776300021666 sec
Flash Attention cuda:0 benchmark:
Flash Attention cuda:0 is not supported. See warnings for reasons.
Memory Efficient cuda:0 benchmark:
Memory Efficient cuda:0 is not supported. See warnings for reasons.
Device: cpu-16
Default cpu benchmark:
29291631.042 microseconds, 29.291631042000063 sec
Math cpu benchmark:
33746072.789 microseconds, 33.74607278900021 sec
Flash Attention cpu benchmark:
29435830.007 microseconds, 29.435830006999822 sec
Memory Efficient cpu benchmark:
Memory Efficient cpu is not supported. See warnings for reasons.
Summary

Pytorch version: 2.3.1
ROCM HIP version: 6.1.40093-e0d934acc
Device: AMD Radeon Graphics
Default cuda:0: 220142.783 µs
Math cuda:0: 221367.763 µs
Flash Attention cuda:0: -1.000 µs
Memory Efficient cuda:0: -1.000 µs

Device: cpu-16
Default cpu: 29291631.042 µs
Math cpu: 33746072.789 µs
Flash Attention cpu: 29435830.007 µs
Memory Efficient cpu: -1.000 µs

@@ -0,0 +1,9 @@
Benchmarking CPU and GPUs
Pytorch version: 2.4.1-rc1
ROCM HIP version: 6.1.40093-e0d934acc
Device: cpu-16
'CPU time: 26.592 sec
Device: AMD Radeon Graphics
'GPU time: 0.704 sec
Benchmark ready

@@ -0,0 +1,51 @@
Pytorch version: 2.4.1-rc1
dot product calculation test
tensor([[[ 1.0907, 0.9275, -0.5733, 1.0613, 0.8276, 0.3546, -0.1415,
0.5558],
[-0.1007, 1.3336, 1.4849, 0.4857, -0.4688, -1.1529, -1.4936,
-0.2379],
[ 0.3383, 1.1955, 0.4283, 0.9084, 0.0747, -0.4756, -0.9940,
0.2775]],

[[ 1.3691, -0.0811, 0.9837, 0.0884, 0.1267, -0.6009, 1.4974,
1.0018],
[ 2.2687, -1.3112, 1.4178, -0.9110, -0.6768, 0.7281, 0.2645,
0.0655],
[ 1.4861, -0.5155, 1.0940, -0.1645, -0.1212, -0.2863, 1.0591,
0.6412]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention and Memory pytorch backends
Device: AMD Radeon Graphics
Default cuda:0 benchmark:
221673.677 microseconds, 0.2216736770000125 sec
Math cuda:0 benchmark:
219790.358 microseconds, 0.21979035800001157 sec
Flash Attention cuda:0 benchmark:
Flash Attention cuda:0 is not supported. See warnings for reasons.
Memory Efficient cuda:0 benchmark:
Memory Efficient cuda:0 is not supported. See warnings for reasons.
Device: cpu-16
Default cpu benchmark:
29583910.213 microseconds, 29.583910212999996 sec
Math cpu benchmark:
33457111.382 microseconds, 33.45711138200002 sec
Flash Attention cpu benchmark:
29748300.851 microseconds, 29.74830085100001 sec
Memory Efficient cpu benchmark:
Memory Efficient cpu is not supported. See warnings for reasons.
Summary

Pytorch version: 2.4.1-rc1
ROCM HIP version: 6.1.40093-e0d934acc
Device: AMD Radeon Graphics
Default cuda:0: 221673.677 µs
Math cuda:0: 219790.358 µs
Flash Attention cuda:0: -1.000 µs
Memory Efficient cuda:0: -1.000 µs

Device: cpu-16
Default cpu: 29583910.213 µs
Math cpu: 33457111.382 µs
Flash Attention cpu: 29748300.851 µs
Memory Efficient cpu: -1.000 µs
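Comparing two runs on the same device comes down to a relative change per backend. The sketch below uses a hypothetical helper `percent_change` with values copied from the gfx1035 ("AMD Radeon Graphics") summaries for PyTorch 2.3.1 and 2.4.1-rc1, and it rejects the -1.000 sentinel instead of producing a meaningless number:

```python
def percent_change(before_us: float, after_us: float) -> float:
    """Relative change in runtime, in percent; negative means the second run was faster."""
    if before_us <= 0 or after_us <= 0:
        # -1.000 in the summaries marks an unsupported backend.
        raise ValueError("cannot compare an unsupported (-1) result")
    return (after_us - before_us) / before_us * 100.0


# gfx1035 cuda:0 summary values in microseconds, copied from the logs.
PT231 = {"Default": 220142.783, "Math": 221367.763}
PT241_RC1 = {"Default": 221673.677, "Math": 219790.358}

if __name__ == "__main__":
    for backend in PT231:
        change = percent_change(PT231[backend], PT241_RC1[backend])
        print(f"{backend}: {change:+.2f}%")
```

For these two runs both the Default and Math backends differ by less than 1% on cuda:0 (about +0.70% and -0.71%).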

@@ -0,0 +1,9 @@
Benchmarking CPU and GPUs
Pytorch version: 2.4.1-rc1
ROCM HIP version: 6.1.40093-2b15d6049
Device: cpu-16
'CPU time: 38.230 sec
Device: AMD Radeon RX 6800
'GPU time: 0.347 sec
Benchmark ready

@@ -0,0 +1,51 @@
Pytorch version: 2.4.1-rc1
dot product calculation test
tensor([[[ 0.0377, -0.4681, -0.5734, 1.1364, -0.2186, -0.7082, 0.5863,
-0.2069],
[-0.3001, -0.7798, -0.3014, 1.4297, -0.1664, -1.0023, 0.7803,
-0.3309],
[ 0.7447, -0.7807, 0.2690, 1.2052, 0.2125, -0.5166, 0.4624,
0.2819]],

[[ 0.7447, -0.0688, 0.0293, -0.5658, -0.2677, 0.1797, -0.5611,
0.6280],
[ 1.9031, -0.0560, -0.3467, 0.3541, 0.0653, -1.1427, 0.3638,
1.4265],
[ 2.2887, -0.1055, -0.4986, 0.6864, 0.1769, -1.6577, 0.6076,
1.8052]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention and Memory pytorch backends
Device: AMD Radeon RX 6800
Default cuda:0 benchmark:
25666.238 microseconds, 0.02566623799066292 sec
Math cuda:0 benchmark:
26148.054 microseconds, 0.026148054201621564 sec
Flash Attention cuda:0 benchmark:
Flash Attention cuda:0 is not supported. See warnings for reasons.
Memory Efficient cuda:0 benchmark:
Memory Efficient cuda:0 is not supported. See warnings for reasons.
Device: cpu-16
Default cpu benchmark:
29144372.636 microseconds, 29.144372635986656 sec
Math cpu benchmark:
32452335.613 microseconds, 32.452335612964816 sec
Flash Attention cpu benchmark:
28638718.318 microseconds, 28.638718318019528 sec
Memory Efficient cpu benchmark:
Memory Efficient cpu is not supported. See warnings for reasons.
Summary

Pytorch version: 2.4.1-rc1
ROCM HIP version: 6.1.40093-2b15d6049
Device: AMD Radeon RX 6800
Default cuda:0: 25666.238 µs
Math cuda:0: 26148.054 µs
Flash Attention cuda:0: -1.000 µs
Memory Efficient cuda:0: -1.000 µs

Device: cpu-16
Default cpu: 29144372.636 µs
Math cpu: 32452335.613 µs
Flash Attention cpu: 28638718.318 µs
Memory Efficient cpu: -1.000 µs

2 changes: 2 additions & 0 deletions benchmarks/results/rocm_sdk_612/pytorch_241/gfx103x/notes.txt
@@ -0,0 +1,2 @@
gfx1030 results with a hacked PyTorch build from 20240805
- uses an older aotriton version than the one that has the gfx110x tuning fix
