gfx1030 and gfx1035 benchmark results

- pt231 with old aotriton
- pt240 with old aotriton

Signed-off-by: Mika Laitio <[email protected]>
lamikr · Aug 8, 2024 · 319b639
1 parent 380e018

Showing 12 changed files with 242 additions and 0 deletions.
diff --git a/benchmarks/results/rocm_sdk_612/pytorch_231/gfx1030/20240807_192355_cpu_vs_gpu_simple.txt b/benchmarks/results/rocm_sdk_612/pytorch_231/gfx1030/20240807_192355_cpu_vs_gpu_simple.txt
@@ -0,0 +1,9 @@
+Benchmarking CPU and GPUs
+Pytorch version: 2.3.1
+ROCM HIP version: 6.1.40093-6a0232ced
+       Device: cpu-16
+    CPU time: 31.879 sec
+       Device: AMD Radeon RX 6800
+    GPU time: 0.336 sec
+Benchmark ready
+
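The simple CPU-vs-GPU report above gives one wall-clock time per device. A minimal Python sketch, assuming the report text is available as a string in the layout shown (the names `REPORT`, `parse_times` and `gpu_speedup` are illustrative and not part of the benchmark scripts), that extracts both times and computes how many times faster the GPU run was:

```python
import re
from typing import Dict

# Report text in the layout of the gfx1030 / PyTorch 2.3.1 result above
# (diff "+" prefixes removed).
REPORT = """\
Benchmarking CPU and GPUs
Pytorch version: 2.3.1
ROCM HIP version: 6.1.40093-6a0232ced
       Device: cpu-16
    'CPU time: 31.879 sec
       Device: AMD Radeon RX 6800
    'GPU time: 0.336 sec
Benchmark ready
"""

# Matches "CPU time: 31.879 sec" or "GPU time: 0.336 sec",
# with or without an optional leading apostrophe.
TIME_RE = re.compile(r"'?(CPU|GPU) time:\s*([0-9.]+)\s*sec")


def parse_times(report: str) -> Dict[str, float]:
    """Return {'CPU': seconds, 'GPU': seconds} parsed from a report."""
    return {kind: float(value) for kind, value in TIME_RE.findall(report)}


def gpu_speedup(report: str) -> float:
    """CPU time divided by GPU time."""
    times = parse_times(report)
    return times["CPU"] / times["GPU"]


if __name__ == "__main__":
    print(f"GPU speedup: {gpu_speedup(REPORT):.1f}x")
```

The same function can be applied unchanged to the other three `cpu_vs_gpu_simple` files in this commit.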
diff --git a/benchmarks/results/rocm_sdk_612/pytorch_231/gfx1030/20240807_192355_pytorch_dot_products.txt b/benchmarks/results/rocm_sdk_612/pytorch_231/gfx1030/20240807_192355_pytorch_dot_products.txt
@@ -0,0 +1,51 @@
+Pytorch version: 2.3.1
+dot product calculation test
+tensor([[[ 0.9455, -0.6972,  0.6711, -0.6345,  0.4092, -0.7703,  0.0519,
+           0.0941],
+         [ 0.9826, -0.8131,  0.9687,  0.2898,  0.4251, -0.4728, -0.3721,
+           0.3235],
+         [ 0.9581, -0.4088,  0.8697,  1.8528,  0.7221,  0.2384, -0.9122,
+           0.6484]],
+
+        [[-0.3769,  1.5725, -0.1406,  1.0675,  0.4376, -0.1497,  0.1996,
+          -0.6237],
+         [ 0.0979,  1.1114,  0.4869,  0.9094,  0.1796, -0.7216, -0.2395,
+           0.1582],
+         [ 0.1717,  1.1014,  0.3889,  1.0819,  0.0485, -0.6566,  0.5223,
+          -0.4733]]], device='cuda:0')
+
+Benchmarking cuda and cpu with Default, Math, Flash Attention and Memory Efficient pytorch backends
+Device: AMD Radeon RX 6800
+    Default cuda:0 benchmark:
+        25619.943 microseconds, 0.025619942622142844 sec
+    Math cuda:0 benchmark:
+        26119.020 microseconds, 0.026119020103942604 sec
+    Flash Attention cuda:0 benchmark:
+    Flash Attention cuda:0 is not supported. See warnings for reasons.
+    Memory Efficient cuda:0 benchmark:
+    Memory Efficient cuda:0 is not supported. See warnings for reasons.
+Device: cpu-16
+    Default cpu benchmark:
+        29142019.198 microseconds, 29.142019197985064 sec
+    Math cpu benchmark:
+        32081706.194 microseconds, 32.08170619397424 sec
+    Flash Attention cpu benchmark:
+        28997640.633 microseconds, 28.997640632966068 sec
+    Memory Efficient cpu benchmark:
+    Memory Efficient cpu is not supported. See warnings for reasons.
+Summary
+
+Pytorch version: 2.3.1
+ROCM HIP version: 6.1.40093-6a0232ced
+Device: AMD Radeon RX 6800
+               Default cuda:0:            25619.943 us
+                  Math cuda:0:            26119.020 us
+       Flash Attention cuda:0:               -1.000 us
+      Memory Efficient cuda:0:               -1.000 us
+
+Device: cpu-16
+                  Default cpu:         29142019.198 us
+                     Math cpu:         32081706.194 us
+          Flash Attention cpu:         28997640.633 us
+         Memory Efficient cpu:               -1.000 us
+
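Each per-backend timing line above reports the same duration twice, once in microseconds and once in seconds, and a backend that could not run prints "is not supported" instead of a timing line. A minimal sketch, assuming the per-backend section keeps exactly this layout (the names `RUNS`, `parse_runs` and `percent_slower` are illustrative), that collects the timings, cross-checks the two units, and compares the Math backend against Default:

```python
import re
from typing import Dict, Optional

# Per-backend section of the gfx1030 / PyTorch 2.3.1 result above
# (diff "+" prefixes removed).
RUNS = """\
Device: AMD Radeon RX 6800
    Default cuda:0 benchmark:
        25619.943 microseconds, 0.025619942622142844 sec
    Math cuda:0 benchmark:
        26119.020 microseconds, 0.026119020103942604 sec
    Flash Attention cuda:0 benchmark:
    Flash Attention cuda:0 is not supported. See warnings for reasons.
    Memory Efficient cuda:0 benchmark:
    Memory Efficient cuda:0 is not supported. See warnings for reasons.
"""

HEADER_RE = re.compile(r"^\s*(.+?) benchmark:\s*$")
TIMING_RE = re.compile(r"^\s*([0-9.]+) microseconds, ([0-9.]+) sec\s*$")


def parse_runs(text: str) -> Dict[str, Optional[float]]:
    """Map 'Backend device' to seconds, or None when no timing line follows."""
    results: Dict[str, Optional[float]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            results[current] = None  # unsupported unless a timing line follows
            continue
        timing = TIMING_RE.match(line)
        if timing and current is not None:
            micros, secs = float(timing.group(1)), float(timing.group(2))
            # Both columns describe the same duration; the microsecond
            # column is rounded to 3 decimals, hence the tolerance.
            if abs(micros / 1e6 - secs) > 1e-6:
                raise ValueError(f"inconsistent timing line: {line!r}")
            results[current] = secs
    return results


def percent_slower(runs: Dict[str, Optional[float]], name: str, baseline: str) -> float:
    """How much slower (in percent) backend `name` was than `baseline`."""
    value, base = runs[name], runs[baseline]
    if value is None or base is None:
        raise ValueError("both backends must have a timing")
    return (value / base - 1.0) * 100.0


if __name__ == "__main__":
    runs = parse_runs(RUNS)
    for name, secs in runs.items():
        shown = "not supported" if secs is None else f"{secs:.6f} s"
        print(f"{name:>24}: {shown}")
    print(f"Math vs Default: {percent_slower(runs, 'Math cuda:0', 'Default cuda:0'):+.2f} %")
```

Treating a missing timing line as `None` keeps unsupported backends distinct from real measurements, rather than mixing the `-1.000` placeholder into any averages.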
diff --git a/benchmarks/results/rocm_sdk_612/pytorch_231/gfx1035/20240807_202419_cpu_vs_gpu_simple.txt b/benchmarks/results/rocm_sdk_612/pytorch_231/gfx1035/20240807_202419_cpu_vs_gpu_simple.txt
@@ -0,0 +1,9 @@
+Benchmarking CPU and GPUs
+Pytorch version: 2.3.1
+ROCM HIP version: 6.1.40093-e0d934acc
+       Device: cpu-16
+    CPU time: 30.518 sec
+       Device: AMD Radeon Graphics
+    GPU time: 0.433 sec
+Benchmark ready
+
diff --git a/benchmarks/results/rocm_sdk_612/pytorch_231/gfx1035/20240807_202419_pytorch_dot_products.txt b/benchmarks/results/rocm_sdk_612/pytorch_231/gfx1035/20240807_202419_pytorch_dot_products.txt
@@ -0,0 +1,51 @@
+Pytorch version: 2.3.1
+dot product calculation test
+tensor([[[ 0.0689,  0.2939,  0.7003,  0.1261,  1.0665,  1.0418,  0.4118,
+           0.1683],
+         [-0.5260,  1.1823,  2.0751,  0.5867,  1.0490,  0.8184, -0.4882,
+           0.5669],
+         [-0.2747,  0.7315,  1.5438,  0.5053,  0.9644,  0.6745, -0.0424,
+           0.4179]],
+
+        [[-0.0273,  0.0843,  1.3197, -0.2101,  0.2967, -0.0361,  1.1482,
+          -1.6203],
+         [-0.1350,  0.4006,  1.6299, -0.0231, -1.1513, -0.6066,  0.9720,
+          -0.6136],
+         [-0.1251,  0.3714,  1.5972, -0.0524, -1.0149, -0.5489,  0.9840,
+          -0.6991]]], device='cuda:0')
+
+Benchmarking cuda and cpu with Default, Math, Flash Attention and Memory Efficient pytorch backends
+Device: AMD Radeon Graphics
+    Default cuda:0 benchmark:
+        220142.783 microseconds, 0.22014278299957368 sec
+    Math cuda:0 benchmark:
+        221367.763 microseconds, 0.22136776300021666 sec
+    Flash Attention cuda:0 benchmark:
+    Flash Attention cuda:0 is not supported. See warnings for reasons.
+    Memory Efficient cuda:0 benchmark:
+    Memory Efficient cuda:0 is not supported. See warnings for reasons.
+Device: cpu-16
+    Default cpu benchmark:
+        29291631.042 microseconds, 29.291631042000063 sec
+    Math cpu benchmark:
+        33746072.789 microseconds, 33.74607278900021 sec
+    Flash Attention cpu benchmark:
+        29435830.007 microseconds, 29.435830006999822 sec
+    Memory Efficient cpu benchmark:
+    Memory Efficient cpu is not supported. See warnings for reasons.
+Summary
+
+Pytorch version: 2.3.1
+ROCM HIP version: 6.1.40093-e0d934acc
+Device: AMD Radeon Graphics
+               Default cuda:0:           220142.783 us
+                  Math cuda:0:           221367.763 us
+       Flash Attention cuda:0:               -1.000 us
+      Memory Efficient cuda:0:               -1.000 us
+
+Device: cpu-16
+                  Default cpu:         29291631.042 us
+                     Math cpu:         33746072.789 us
+          Flash Attention cpu:         29435830.007 us
+         Memory Efficient cpu:               -1.000 us
+
diff --git a/benchmarks/results/rocm_sdk_612/pytorch_241/gfx1035/20240807_194210_cpu_vs_gpu_simple.txt b/benchmarks/results/rocm_sdk_612/pytorch_241/gfx1035/20240807_194210_cpu_vs_gpu_simple.txt
@@ -0,0 +1,9 @@
+Benchmarking CPU and GPUs
+Pytorch version: 2.4.1-rc1
+ROCM HIP version: 6.1.40093-e0d934acc
+       Device: cpu-16
+    CPU time: 26.592 sec
+       Device: AMD Radeon Graphics
+    GPU time: 0.704 sec
+Benchmark ready
+
diff --git a/benchmarks/results/rocm_sdk_612/pytorch_241/gfx1035/20240807_194210_pytorch_dot_products.txt b/benchmarks/results/rocm_sdk_612/pytorch_241/gfx1035/20240807_194210_pytorch_dot_products.txt
@@ -0,0 +1,51 @@
+Pytorch version: 2.4.1-rc1
+dot product calculation test
+tensor([[[ 1.0907,  0.9275, -0.5733,  1.0613,  0.8276,  0.3546, -0.1415,
+           0.5558],
+         [-0.1007,  1.3336,  1.4849,  0.4857, -0.4688, -1.1529, -1.4936,
+          -0.2379],
+         [ 0.3383,  1.1955,  0.4283,  0.9084,  0.0747, -0.4756, -0.9940,
+           0.2775]],
+
+        [[ 1.3691, -0.0811,  0.9837,  0.0884,  0.1267, -0.6009,  1.4974,
+           1.0018],
+         [ 2.2687, -1.3112,  1.4178, -0.9110, -0.6768,  0.7281,  0.2645,
+           0.0655],
+         [ 1.4861, -0.5155,  1.0940, -0.1645, -0.1212, -0.2863,  1.0591,
+           0.6412]]], device='cuda:0')
+
+Benchmarking cuda and cpu with Default, Math, Flash Attention and Memory Efficient pytorch backends
+Device: AMD Radeon Graphics
+    Default cuda:0 benchmark:
+        221673.677 microseconds, 0.2216736770000125 sec
+    Math cuda:0 benchmark:
+        219790.358 microseconds, 0.21979035800001157 sec
+    Flash Attention cuda:0 benchmark:
+    Flash Attention cuda:0 is not supported. See warnings for reasons.
+    Memory Efficient cuda:0 benchmark:
+    Memory Efficient cuda:0 is not supported. See warnings for reasons.
+Device: cpu-16
+    Default cpu benchmark:
+        29583910.213 microseconds, 29.583910212999996 sec
+    Math cpu benchmark:
+        33457111.382 microseconds, 33.45711138200002 sec
+    Flash Attention cpu benchmark:
+        29748300.851 microseconds, 29.74830085100001 sec
+    Memory Efficient cpu benchmark:
+    Memory Efficient cpu is not supported. See warnings for reasons.
+Summary
+
+Pytorch version: 2.4.1-rc1
+ROCM HIP version: 6.1.40093-e0d934acc
+Device: AMD Radeon Graphics
+               Default cuda:0:           221673.677 us
+                  Math cuda:0:           219790.358 us
+       Flash Attention cuda:0:               -1.000 us
+      Memory Efficient cuda:0:               -1.000 us
+
+Device: cpu-16
+                  Default cpu:         29583910.213 us
+                     Math cpu:         33457111.382 us
+          Flash Attention cpu:         29748300.851 us
+         Memory Efficient cpu:               -1.000 us
+
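Both gfx1035 result files were produced with the same ROCm HIP build (6.1.40093-e0d934acc), one with PyTorch 2.3.1 and one with 2.4.1-rc1, so their timings can be set side by side. A small sketch, with the microsecond figures copied by hand from the two files (the dictionary and function names are illustrative), that computes the percent change per backend. Each figure comes from a single benchmark file, so the result says nothing about run-to-run variance.

```python
from typing import Dict

# Microsecond figures copied from the two gfx1035 result files above.
# Backends reported as "not supported" are left out.
PT231_GFX1035: Dict[str, float] = {
    "Default cuda:0": 220142.783,
    "Math cuda:0": 221367.763,
    "Default cpu": 29291631.042,
    "Math cpu": 33746072.789,
    "Flash Attention cpu": 29435830.007,
}
PT241_GFX1035: Dict[str, float] = {
    "Default cuda:0": 221673.677,
    "Math cuda:0": 219790.358,
    "Default cpu": 29583910.213,
    "Math cpu": 33457111.382,
    "Flash Attention cpu": 29748300.851,
}


def relative_change(old: float, new: float) -> float:
    """Percent change from old to new; positive means the new run was slower."""
    return (new - old) / old * 100.0


def compare(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Percent change for every backend present in both result sets."""
    return {name: relative_change(old[name], new[name]) for name in old if name in new}


if __name__ == "__main__":
    for name, pct in compare(PT231_GFX1035, PT241_GFX1035).items():
        print(f"{name:>22}: {pct:+.2f} %")
```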
diff --git a/benchmarks/results/rocm_sdk_612/pytorch_241/gfx103x/20240807_190756_cpu_vs_gpu_simple.txt b/benchmarks/results/rocm_sdk_612/pytorch_241/gfx103x/20240807_190756_cpu_vs_gpu_simple.txt
@@ -0,0 +1,9 @@
+Benchmarking CPU and GPUs
+Pytorch version: 2.4.1-rc1
+ROCM HIP version: 6.1.40093-2b15d6049
+       Device: cpu-16
+    CPU time: 38.230 sec
+       Device: AMD Radeon RX 6800
+    GPU time: 0.347 sec
+Benchmark ready
+
diff --git a/benchmarks/results/rocm_sdk_612/pytorch_241/gfx103x/20240807_190756_pytorch_dot_products.txt b/benchmarks/results/rocm_sdk_612/pytorch_241/gfx103x/20240807_190756_pytorch_dot_products.txt
@@ -0,0 +1,51 @@
+Pytorch version: 2.4.1-rc1
+dot product calculation test
+tensor([[[ 0.0377, -0.4681, -0.5734,  1.1364, -0.2186, -0.7082,  0.5863,
+          -0.2069],
+         [-0.3001, -0.7798, -0.3014,  1.4297, -0.1664, -1.0023,  0.7803,
+          -0.3309],
+         [ 0.7447, -0.7807,  0.2690,  1.2052,  0.2125, -0.5166,  0.4624,
+           0.2819]],
+
+        [[ 0.7447, -0.0688,  0.0293, -0.5658, -0.2677,  0.1797, -0.5611,
+           0.6280],
+         [ 1.9031, -0.0560, -0.3467,  0.3541,  0.0653, -1.1427,  0.3638,
+           1.4265],
+         [ 2.2887, -0.1055, -0.4986,  0.6864,  0.1769, -1.6577,  0.6076,
+           1.8052]]], device='cuda:0')
+
+Benchmarking cuda and cpu with Default, Math, Flash Attention and Memory Efficient pytorch backends
+Device: AMD Radeon RX 6800
+    Default cuda:0 benchmark:
+        25666.238 microseconds, 0.02566623799066292 sec
+    Math cuda:0 benchmark:
+        26148.054 microseconds, 0.026148054201621564 sec
+    Flash Attention cuda:0 benchmark:
+    Flash Attention cuda:0 is not supported. See warnings for reasons.
+    Memory Efficient cuda:0 benchmark:
+    Memory Efficient cuda:0 is not supported. See warnings for reasons.
+Device: cpu-16
+    Default cpu benchmark:
+        29144372.636 microseconds, 29.144372635986656 sec
+    Math cpu benchmark:
+        32452335.613 microseconds, 32.452335612964816 sec
+    Flash Attention cpu benchmark:
+        28638718.318 microseconds, 28.638718318019528 sec
+    Memory Efficient cpu benchmark:
+    Memory Efficient cpu is not supported. See warnings for reasons.
+Summary
+
+Pytorch version: 2.4.1-rc1
+ROCM HIP version: 6.1.40093-2b15d6049
+Device: AMD Radeon RX 6800
+               Default cuda:0:            25666.238 us
+                  Math cuda:0:            26148.054 us
+       Flash Attention cuda:0:               -1.000 us
+      Memory Efficient cuda:0:               -1.000 us
+
+Device: cpu-16
+                  Default cpu:         29144372.636 us
+                     Math cpu:         32452335.613 us
+          Flash Attention cpu:         28638718.318 us
+         Memory Efficient cpu:               -1.000 us
+
diff --git a/benchmarks/results/rocm_sdk_612/pytorch_241/gfx103x/notes.txt b/benchmarks/results/rocm_sdk_612/pytorch_241/gfx103x/notes.txt
@@ -0,0 +1,2 @@
+gfx1030 results with hacked pytorch from 20240805
+- older aotriton version than the one which has gfx110x tuning fix
diff --git a/...241/20240807_175108_cpu_vs_gpu_simple.txt → ...10x/20240807_175108_cpu_vs_gpu_simple.txt b/...241/20240807_175108_cpu_vs_gpu_simple.txt → ...10x/20240807_175108_cpu_vs_gpu_simple.txt
diff --git a/.../20240807_175108_pytorch_dot_products.txt → .../20240807_175108_pytorch_dot_products.txt b/.../20240807_175108_pytorch_dot_products.txt → .../20240807_175108_pytorch_dot_products.txt
diff --git a/...hmarks/rocm_sdk_612/pytorch_241/notes.txt → ...esults/rocm_sdk_612/pytorch_241/notes.txt b/...hmarks/rocm_sdk_612/pytorch_241/notes.txt → ...esults/rocm_sdk_612/pytorch_241/notes.txt
@@ -0,0 +1,2 @@
+gfx1030 results with hacked pytorch from 20240805
+- older aotriton version than the one which has gfx110x tuning fix