add 3 more benchmarks #2364
Compute Benchmarks level_zero run (with params: ):

Compute Benchmarks level_zero run ():

Summary
Total 77 benchmarks in mean. (result is better)

Performance change in benchmark groups
Relative perf in group api (6): 94.825%
Relative perf in group memory (3): 102.717%
Relative perf in group miscellaneous (1): 99.890%
Relative perf in group multithread (8): 99.574%
Relative perf in group Runtime (8): 103.058%
Relative perf in group MicroBench (14): 100.113%
Relative perf in group Pattern (10): 100.042%
Relative perf in group ScalarProduct (6): 100.044%
Relative perf in group USM (7): 99.530%
Relative perf in group VectorAddition (3): 99.740%
Relative perf in group Polybench (3): 100.086%
Relative perf in group Kmeans (1): 100.012%
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Relative perf in group MolecularDynamics (1): 103.846%
Relative perf in group llama.cpp (6): 100.719%
Relative perf in group Velocity-Bench (6): cannot calculate
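
To skim a summary like the one above, the per-group lines can be parsed mechanically. The following is a minimal sketch, not part of the benchmark tooling: the helper names `parse_group_summary` and `groups_below` and the 98% cut-off are hypothetical, and the sketch assumes only the line shape shown in this comment ("Relative perf in group NAME (COUNT): VALUE").

```python
import re

# Matches lines such as "Relative perf in group api (6): 94.825%".
# The line shape is taken from the summary above; nothing else is assumed.
LINE_RE = re.compile(
    r"Relative perf in group (?P<name>.+?) \((?P<count>\d+)\): (?P<value>.+)$"
)


def parse_group_summary(text):
    """Return a list of (group, benchmark_count, percent_or_None) tuples.

    Groups reported as "cannot calculate" are kept with a value of None.
    """
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        raw = match.group("value").strip()
        percent = float(raw.rstrip("%")) if raw.endswith("%") else None
        rows.append((match.group("name"), int(match.group("count")), percent))
    return rows


def groups_below(rows, threshold=98.0):
    """Names of groups whose relative perf is under `threshold`.

    The default threshold is an arbitrary, hypothetical cut-off.
    """
    return [name for name, _, pct in rows if pct is not None and pct < threshold]


SAMPLE = """\
Relative perf in group api (6): 94.825%
Relative perf in group memory (3): 102.717%
Relative perf in group LinearRegressionCoeff (1): cannot calculate
"""

rows = parse_group_summary(SAMPLE)
print(groups_below(rows))  # prints ['api']
```

Keeping "cannot calculate" groups as `None` rather than dropping them makes it easy to list which groups had no usable comparison.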
Details

Benchmark details - environment, command, output...

api_overhead_benchmark_sycl SubmitKernel out of order
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

api_overhead_benchmark_sycl SubmitKernel in order
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

miscellaneous_benchmark_sycl VectorSum
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

api_overhead_benchmark_ur SubmitKernel out of order
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

api_overhead_benchmark_ur SubmitKernel in order
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output: TestCase,Mean,Median,StdDev,Min,Max,Type

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
Output: ['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.273299',
'0.273271', '0.272456', '0.272456 0.273180 0.273271 0.273401 0.274184', '0.000616', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_IndependentDAGTaskThroughput_SingleTaskEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768 Output:['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.262816', '0.262387', '0.253176', '0.253176 0.260528 0.262387 0.263326 0.274661', '0.007729', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_IndependentDAGTaskThroughput_BasicParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768 Output:['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.273175', '0.273405', '0.270476', '0.270476 0.271232 0.273405 0.274709 0.276054', '0.002332', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_IndependentDAGTaskThroughput_NDRangeParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768 Output:['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.276156', '0.270529', '0.270016', '0.270016 0.270392 
0.270529 0.271280 0.298563', '0.012534', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_DAGTaskThroughput_BasicParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680 Output:['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.731921', '1.731725', '1.729209', '1.729209 1.730248 1.731725 1.731873 1.736552', '0.002812', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_DAGTaskThroughput_NDRangeParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680 Output:['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.677170', '1.676520', '1.675415', '1.675415 1.675698 1.676520 1.678950 1.679270', '0.001820', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_DAGTaskThroughput_HierarchicalParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680 Output:['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.702001', '1.702231', '1.700349', '1.700349 1.701259 1.702231 1.702753 1.703414', '0.001214', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 
'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_DAGTaskThroughput_SingleTaskEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680 Output:['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.678307', '1.678318', '1.675849', '1.675849 1.676296 1.678318 1.679879 1.681195', '0.002286', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] MicroBench_HostDeviceBandwidth_1D_H2D_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004369', '0.004326', '0.004289', '0.004289 0.004314 0.004326 0.004455 0.004460', '0.000082', '29.141890', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_1D_D2H_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004667', '0.004650', '0.004629', '0.004629 0.004636 0.004650 0.004706 0.004713', '0.000040', '27.001672', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_1D_H2D_ContiguousEnvironment 
Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004672', '0.004363', '0.004301', '0.004301 0.004316 0.004363 0.004385 0.005993', '0.000740', '29.063022', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_3D_H2D_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004543', '0.004552', '0.004483', '0.004483 0.004545 0.004552 0.004564 0.004572', '0.000035', '27.884333', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_2D_D2H_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617443', '0.617464', '0.617339', '0.617339 0.617459 0.617464 0.617475 0.617479', '0.000059', '0.202482', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_3D_D2H_ContiguousEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 
--output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618081', '0.618073', '0.618023', '0.618023 0.618046 0.618073 0.618123 0.618140', '0.000050', '0.202258', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_3D_H2D_ContiguousEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004435', '0.004442', '0.004355', '0.004355 0.004381 0.004442 0.004467 0.004529', '0.000069', '28.703010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_3D_D2H_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617447', '0.617434', '0.617427', '0.617427 0.617432 0.617434 0.617465 0.617475', '0.000022', '0.202453', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_2D_D2H_ContiguousEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 
Output:['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618119', '0.618141', '0.617996', '0.617996 0.618111 0.618141 0.618166 0.618182', '0.000074', '0.202267', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_1D_D2H_ContiguousEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004468', '0.004412', '0.004395', '0.004395 0.004399 0.004412 0.004519 0.004615', '0.000097', '28.442006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_2D_H2D_ContiguousEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004565', '0.004557', '0.004446', '0.004446 0.004520 0.004557 0.004629 0.004672', '0.000089', '28.115179', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_2D_H2D_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 
'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004491', '0.004541', '0.004383', '0.004383 0.004391 0.004541 0.004557 0.004584', '0.000096', '28.519806', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_LocalMem_int32_4096Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000 Output:['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.029904', '0.029881', '0.029862', '0.029862 0.029878 0.029881 0.029910 0.029988', '0.000050', '10448.042887', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000'] MicroBench_LocalMem_fp32_4096Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000 Output:['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.029853', '0.029847', '0.029805', '0.029805 0.029811 0.029847 0.029876 0.029927', '0.000050', '10468.030333', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000'] Pattern_Reduction_Hierarchical_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000 Output:['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016963', '0.017019', '0.016736', '0.016736 0.016958 0.017019 0.017023 0.017081', '0.000134', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 
'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_Reduction_NDRange_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000 Output:['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016828', '0.016831', '0.016604', '0.016604 0.016793 0.016831 0.016950 0.016963', '0.000145', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_NDRange_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 Output:['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003755', '0.003749', '0.003737', '0.003737 0.003742 0.003749 0.003766 0.003783', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_NDRange_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 Output:['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003756', '0.003749', '0.003731', '0.003731 0.003746 0.003749 0.003760 0.003795', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_Hierarchical_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 
Output:['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.009957', '0.009955', '0.009943', '0.009943 0.009947 0.009955 0.009956 0.009985', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_NDRange_int64Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 Output:['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005446', '0.005443', '0.005434', '0.005434 0.005443 0.005443 0.005444 0.005465', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_Hierarchical_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 Output:['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010306', '0.010309', '0.010272', '0.010272 0.010300 0.010309 0.010317 0.010332', '0.000022', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_Hierarchical_int64Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 Output:['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011293', '0.011299', '0.011268', '0.011268 0.011271 0.011299 0.011313 0.011316', '0.000023', 'N/A', 
'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_Hierarchical_int16Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011802', '0.011802', '0.011784', '0.011784 0.011798 0.011802 0.011806 0.011820', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_NDRange_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002168', '0.002165', '0.002162', '0.002162 0.002164 0.002165 0.002165 0.002183', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_NDRange_int16Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002275', '0.002271', '0.002268', '0.002268 0.002270 0.002271 0.002275 0.002293', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] 
Pattern_SegmentedReduction_Hierarchical_int64Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011785', '0.011772', '0.011757', '0.011757 0.011768 0.011772 0.011782 0.011845', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_NDRange_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002173', '0.002171', '0.002170', '0.002170 0.002170 0.002171 0.002172 0.002183', '0.000006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_NDRange_int64Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002349', '0.002348', '0.002341', '0.002341 0.002345 0.002348 0.002353 0.002359', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_Hierarchical_int32Environment 
Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011594', '0.011591', '0.011574', '0.011574 0.011589 0.011591 0.011595 0.011618', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_Hierarchical_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011593', '0.011595', '0.011572', '0.011572 0.011589 0.011595 0.011597 0.011612', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Allocation_latency_fp32_deviceEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000 Output:['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000064', '0.000069', '0.000046', '0.000046 0.000052 0.000069 0.000069 0.000085', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Allocation_latency_fp32_hostEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 
--output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000 Output:['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037514', '0.037628', '0.037325', '0.037325 0.037359 0.037628 0.037629 0.037629', '0.000158', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Allocation_latency_fp32_sharedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000 Output:['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000059', '0.000055', '0.000050', '0.000050 0.000051 0.000055 0.000065 0.000072', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetchEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192 Output:['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001073', '0.001043', '0.001041', '0.001041 0.001041 0.001043 0.001082 0.001157', '0.000050', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetchEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192 Output:['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data 
Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001207', '0.001207', '0.001202', '0.001202 0.001206 0.001207 0.001207 0.001211', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetchEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192 Output:['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001958', '0.001647', '0.001642', '0.001642 0.001642 0.001647 0.001652 0.003207', '0.000698', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetchEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192 Output:['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001815', '0.001798', '0.001792', '0.001792 0.001796 0.001798 0.001805 0.001886', '0.000040', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] VectorAddition_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000 Output:['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001462', '0.001449', '0.001440', '0.001440 0.001449 0.001449 0.001474 0.001500', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 
'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] VectorAddition_int64Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000 Output:['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003064', '0.003060', '0.003053', '0.003053 0.003058 0.003060 0.003068 0.003079', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] VectorAddition_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000 Output:['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001468', '0.001462', '0.001452', '0.001452 0.001457 0.001462 0.001464 0.001504', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Polybench_2mmEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/2mm.csv --size=512 Output:['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001218', '0.001218', '0.001211', '0.001211 0.001212 0.001218 0.001221 0.001229', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Polybench_3mmEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/3mm.csv --size=512 Output:['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 
'256', '512', '0.001732', '0.001727', '0.001722', '0.001722 0.001724 0.001727 0.001737 0.001749', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Polybench_AtaxEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Atax.csv --size=8192 Output:['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006794', '0.006849', '0.006699', '0.006699 0.006702 0.006849 0.006851 0.006871', '0.000086', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Kmeans_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Kmeans.csv --size=700000000 Output:['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016060', '0.016049', '0.016045', '0.016045 0.016047 0.016049 0.016079 0.016080', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] LinearRegressionCoeff_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/LinearRegressionCoeff.csv --size=1638400000 Output:['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.858873', '0.856672', '0.841709', '0.841709 0.841853 0.856672 0.863435 0.890694', '0.020140', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] MolecularDynamicsEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 
--output=/home/pmdk/bench_workdir/MolecularDynamics.csv --size=8196 Output:['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000033', '0.000026', '0.000025', '0.000025 0.000026 0.000026 0.000030 0.000058', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] llama.cpp Prompt Processing Batched 128Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts llama.cpp Text Generation Batched 128Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts llama.cpp Text Generation Batched 512Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf 
Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts llama.cpp Text Generation Batched 256Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts llama.cpp Prompt Processing Batched 256Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts llama.cpp Prompt Processing Batched 512Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf 
Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts |
Compute Benchmarks level_zero_v2 run (with params: --compare baseline-v2):
c49546c to 804d200
Compute Benchmarks level_zero_v2 run (--compare baseline-v2):
Summary
No diffs to calculate performance change (result is better)
Performance change in benchmark groups
Relative perf in group api (6): cannot calculate
Relative perf in group memory (3): cannot calculate
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group multithread (8): cannot calculate
Relative perf in group Runtime (8): cannot calculate
Relative perf in group MicroBench (14): cannot calculate
Relative perf in group Pattern (10): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (7): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (3): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
Relative perf in group llama.cpp (6): cannot calculate
Relative perf in group Velocity-Bench (6): cannot calculate
DetailsBenchmark details - environment, command, output...api_overhead_benchmark_sycl SubmitKernel out of orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl SubmitKernel in orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 
1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type miscellaneous_benchmark_sycl VectorSumEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur 
MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1Environment 
Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel out of orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel in orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type Runtime_IndependentDAGTaskThroughput_BasicParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768 Output:['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.190922', '0.186652', 
'0.182696', '0.182696 0.183532 0.186652 0.186689 0.215041', '0.013603', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_IndependentDAGTaskThroughput_HierarchicalParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768 Output:['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.197491', '0.190006', '0.189277', '0.189277 0.189470 0.190006 0.190348 0.228352', '0.017257', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_IndependentDAGTaskThroughput_NDRangeParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768 Output:['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.186533', '0.186364', '0.185129', '0.185129 0.185568 0.186364 0.187079 0.188525', '0.001341', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_IndependentDAGTaskThroughput_SingleTaskEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768 Output:['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.190598', '0.181620', '0.176535', '0.176535 0.177970 
0.181620 0.183571 0.233294', '0.024032', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_DAGTaskThroughput_NDRangeParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680 Output:['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.300368', '1.301018', '1.294351', '1.294351 1.296758 1.301018 1.304569 1.305147', '0.004747', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_DAGTaskThroughput_HierarchicalParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680 Output:['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.325413', '1.323247', '1.314688', '1.314688 1.321380 1.323247 1.326673 1.341077', '0.009785', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_DAGTaskThroughput_BasicParallelForEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680 Output:['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.346098', '1.346561', '1.339927', '1.339927 1.344470 1.346561 1.348534 1.350998', '0.004210', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 
'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Runtime_DAGTaskThroughput_SingleTaskEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680 Output:['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.277738', '1.276034', '1.270503', '1.270503 1.275233 1.276034 1.276356 1.290565', '0.007549', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] MicroBench_HostDeviceBandwidth_1D_H2D_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004349', '0.004417', '0.004218', '0.004218 0.004246 0.004417 0.004427 0.004438', '0.000108', '29.634238', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_2D_H2D_ContiguousEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004445', '0.004458', '0.004377', '0.004377 0.004451 0.004458 0.004464 0.004474', '0.000039', '28.561192', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_3D_H2D_ContiguousEnvironment 
Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004461', '0.004486', '0.004349', '0.004349 0.004446 0.004486 0.004503 0.004523', '0.000069', '28.743700', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_3D_H2D_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004509', '0.004483', '0.004473', '0.004473 0.004482 0.004483 0.004508 0.004601', '0.000053', '27.943058', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_1D_D2H_ContiguousEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.003671', '0.003677', '0.003621', '0.003621 0.003626 0.003677 0.003709 0.003723', '0.000047', '34.518439', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_1D_D2H_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 
--output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.003705', '0.003707', '0.003592', '0.003592 0.003634 0.003707 0.003792 0.003803', '0.000094', '34.803905', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_2D_H2D_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004467', '0.004470', '0.004416', '0.004416 0.004463 0.004470 0.004474 0.004513', '0.000035', '28.308121', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_3D_D2H_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617413', '0.617413', '0.617393', '0.617393 0.617401 0.617413 0.617423 0.617434', '0.000016', '0.202464', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_3D_D2H_ContiguousEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 
Output:['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618141', '0.618144', '0.618104', '0.618104 0.618122 0.618144 0.618161 0.618173', '0.000028', '0.202231', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_2D_D2H_StridedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617436', '0.617438', '0.617385', '0.617385 0.617401 0.617438 0.617449 0.617506', '0.000047', '0.202467', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_1D_H2D_ContiguousEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004657', '0.004351', '0.004252', '0.004252 0.004349 0.004351 0.004369 0.005963', '0.000732', '29.396064', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_HostDeviceBandwidth_2D_D2H_ContiguousEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512 Output:['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 
'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618138', '0.618130', '0.618120', '0.618120 0.618124 0.618130 0.618134 0.618184', '0.000026', '0.202226', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000'] MicroBench_LocalMem_int32_4096Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000 Output:['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.029885', '0.029879', '0.029780', '0.029780 0.029863 0.029879 0.029933 0.029968', '0.000072', '10476.654538', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000'] MicroBench_LocalMem_fp32_4096Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000 Output:['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.029860', '0.029857', '0.029831', '0.029831 0.029849 0.029857 0.029879 0.029882', '0.000021', '10459.077401', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000'] Pattern_Reduction_NDRange_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000 Output:['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016856', '0.016825', '0.016701', '0.016701 0.016774 0.016825 0.016940 0.017040', '0.000135', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 
'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_Reduction_Hierarchical_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000 Output:['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.017090', '0.017079', '0.017030', '0.017030 0.017078 0.017079 0.017117 0.017146', '0.000044', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_Hierarchical_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 Output:['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010321', '0.010320', '0.010312', '0.010312 0.010315 0.010320 0.010321 0.010337', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_NDRange_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 Output:['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003809', '0.003798', '0.003792', '0.003792 0.003796 0.003798 0.003815 0.003842', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_NDRange_int64Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 
Output:['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005499', '0.005497', '0.005485', '0.005485 0.005489 0.005497 0.005500 0.005526', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_Hierarchical_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 Output:['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.009970', '0.009961', '0.009926', '0.009926 0.009955 0.009961 0.009999 0.010007', '0.000033', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_Hierarchical_int64Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 Output:['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011349', '0.011362', '0.011313', '0.011313 0.011317 0.011362 0.011364 0.011387', '0.000032', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] ScalarProduct_NDRange_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000 Output:['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003813', '0.003809', '0.003805', '0.003805 0.003808 0.003809 0.003814 0.003831', '0.000010', 'N/A', 'N/A', 
'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_NDRange_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002166', '0.002166', '0.002164', '0.002164 0.002165 0.002166 0.002166 0.002171', '0.000003', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_Hierarchical_int16Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011793', '0.011793', '0.011783', '0.011783 0.011793 0.011793 0.011798 0.011800', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_NDRange_int64Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002346', '0.002344', '0.002341', '0.002341 0.002343 0.002344 0.002349 0.002355', '0.000006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] 
Pattern_SegmentedReduction_NDRange_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002163', '0.002161', '0.002161', '0.002161 0.002161 0.002161 0.002163 0.002169', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_Hierarchical_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011595', '0.011595', '0.011588', '0.011588 0.011594 0.011595 0.011597 0.011602', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_NDRange_int16Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002255', '0.002252', '0.002251', '0.002251 0.002251 0.002252 0.002257 0.002264', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_Hierarchical_int32Environment 
Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011599', '0.011598', '0.011591', '0.011591 0.011592 0.011598 0.011605 0.011610', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Pattern_SegmentedReduction_Hierarchical_int64Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000 Output:['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011778', '0.011777', '0.011755', '0.011755 0.011761 0.011777 0.011777 0.011818', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Allocation_latency_fp32_sharedEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000 Output:['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000067', '0.000065', '0.000063', '0.000063 0.000064 0.000065 0.000071 0.000073', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Allocation_latency_fp32_hostEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 
--output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000 Output:['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037373', '0.037388', '0.037308', '0.037308 0.037354 0.037388 0.037391 0.037423', '0.000044', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetchEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192 Output:['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001613', '0.001603', '0.001600', '0.001600 0.001602 0.001603 0.001608 0.001651', '0.000022', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetchEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192 Output:['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001027', '0.001012', '0.001005', '0.001005 0.001007 0.001012 0.001049 0.001064', '0.000027', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetchEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192 Output:['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 
'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001621', '0.001316', '0.001307', '0.001307 0.001310 0.001316 0.001325 0.002846', '0.000685', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetchEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192 Output:['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001169', '0.001168', '0.001157', '0.001157 0.001167 0.001168 0.001173 0.001178', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] VectorAddition_int32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000 Output:['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001509', '0.001495', '0.001488', '0.001488 0.001491 0.001495 0.001519 0.001554', '0.000028', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] VectorAddition_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000 Output:['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001508', '0.001494', '0.001487', '0.001487 0.001487 0.001494 0.001525 0.001548', '0.000027', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 
'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] VectorAddition_int64Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000 Output:['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003114', '0.003113', '0.003104', '0.003104 0.003107 0.003113 0.003119 0.003126', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Polybench_2mmEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/2mm.csv --size=512 Output:['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001214', '0.001212', '0.001204', '0.001204 0.001209 0.001212 0.001221 0.001225', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Polybench_3mmEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/3mm.csv --size=512 Output:['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001815', '0.001812', '0.001804', '0.001804 0.001810 0.001812 0.001818 0.001834', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Polybench_AtaxEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Atax.csv --size=8192 Output:['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006866', '0.006880', '0.006814', '0.006814 0.006843 0.006880 0.006882 0.006909', 
'0.000037', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] Kmeans_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/Kmeans.csv --size=700000000 Output:['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016052', '0.016050', '0.016045', '0.016045 0.016047 0.016050 0.016056 0.016061', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] LinearRegressionCoeff_fp32Environment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/LinearRegressionCoeff.csv --size=1638400000 Output:['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.691469', '0.686779', '0.686624', '0.686624 0.686706 0.686779 0.686985 0.710250', '0.010500', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] MolecularDynamicsEnvironment Variables:Command:/home/pmdk/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/pmdk/bench_workdir/MolecularDynamics.csv --size=8196 Output:['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000033', '0.000027', '0.000025', '0.000025 0.000027 0.000027 0.000031 0.000054', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A'] llama.cpp Text Generation Batched 512Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model 
/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts llama.cpp Text Generation Batched 256Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts llama.cpp Prompt Processing Batched 256Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts llama.cpp Text Generation Batched 128Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf 
Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts llama.cpp Prompt Processing Batched 128Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts llama.cpp Prompt Processing Batched 512Environment Variables:Command:/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf Output:build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts |
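A note for readers of the sycl-bench rows above: each `Output:` line is printed as a Python-style list of strings, so its summary statistics can be cross-checked directly. The sketch below is only an illustration. The field positions (index 14 holding the five per-run samples, indices 11 to 13 and 15 holding what look like mean, median, min and standard deviation) are inferred from the rows shown in this thread, not taken from sycl-bench documentation, and `summarize` is a hypothetical helper name.

```python
import ast
import statistics

# One row copied verbatim from the benchmark output above.
ROW = (
    "['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', "
    "'Intel(R) Data Center GPU Max 1100', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', "
    "'0.003705', '0.003707', '0.003592', "
    "'0.003592 0.003634 0.003707 0.003792 0.003803', "
    "'0.000094', '34.803905', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']"
)

# Assumption: index 14 is the space-separated list of per-run samples.
SAMPLES_INDEX = 14


def summarize(row_text):
    """Parse one output row and recompute statistics from its samples."""
    fields = ast.literal_eval(row_text)  # safe parsing of the list literal
    samples = [float(s) for s in fields[SAMPLES_INDEX].split()]
    return {
        "name": fields[0],
        "status": fields[1],
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
        # Sample (n - 1) standard deviation.
        "stdev": statistics.stdev(samples),
    }


summary = summarize(ROW)
for key, value in summary.items():
    print(key, value)
```

For this row the recomputed values agree with the reported fields to the printed precision, which suggests the reported deviation is the sample (n - 1) form; other rows may be worth spot-checking the same way.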
Compute Benchmarks level_zero run (with params: --filter Velocity): |
Compute Benchmarks level_zero run (--filter Velocity):
Summary
No diffs to calculate performance change (result is better)
Performance change in benchmark groups
Relative perf in group Velocity-Bench (7): cannot calculate
Relative perf in group api (6): cannot calculate
Relative perf in group memory (3): cannot calculate
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group multithread (8): cannot calculate
Relative perf in group Runtime (8): cannot calculate
Relative perf in group MicroBench (14): cannot calculate
Relative perf in group Pattern (10): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (7): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (3): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
Relative perf in group llama.cpp (6): cannot calculate
Details
Benchmark details - environment, command, output...
Velocity-Bench svm
Environment Variables:
Command: /home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m
Output:
Number of args 3
Buffering input text file (6989624 B).
Loading elapsed time : 0.0621 s
Compute Benchmarks level_zero_v2 run (with params: --compare baseline-v2 --filter Velocity): |
Compute Benchmarks level_zero_v2 run (--compare baseline-v2 --filter Velocity):
Summary
No diffs to calculate performance change (result is better)
Performance change in benchmark groups
Relative perf in group api (6): cannot calculate
Relative perf in group memory (3): cannot calculate
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group multithread (8): cannot calculate
Relative perf in group Velocity-Bench (6): cannot calculate
Relative perf in group Runtime (8): cannot calculate
Relative perf in group MicroBench (14): cannot calculate
Relative perf in group Pattern (10): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (7): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (3): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
Relative perf in group llama.cpp (6): cannot calculate
Details
Benchmark details - environment, command, output...
This patch implements support for:
- dl-cifar
- dl-mnist
- svm

These are all workloads from Velocity Bench that require oneMKL.
804d200 to eb9ab15
Compute Benchmarks level_zero run (with params: --filter Velocity): |
Compute Benchmarks level_zero run (--filter Velocity):
Summary
Total 6 benchmarks in mean. (result is better)
Performance change in benchmark groups
Relative perf in group Velocity-Bench (9): 99.869%
Relative perf in group api (6): cannot calculate
Relative perf in group memory (3): cannot calculate
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group multithread (8): cannot calculate
Relative perf in group Runtime (8): cannot calculate
Relative perf in group MicroBench (14): cannot calculate
Relative perf in group Pattern (10): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (7): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (3): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
Relative perf in group llama.cpp (6): cannot calculate
Details
Benchmark details - environment, command, output...
Velocity-Bench Hashtable
Environment Variables:
Command: /home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify
Output:
hashtable - total time for whole calculation: 0.353958 s
Velocity-Bench Bitcracker
Environment Variables:
Command: /home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000
Output:
---------> BitCracker: BitLocker password cracking tool <---------
==================================
Compute Benchmarks level_zero_v2 run (with params: --filter Velocity --compare baseline-v2): |
Compute Benchmarks level_zero_v2 run (--filter Velocity --compare baseline-v2):
Summary
No diffs to calculate performance change (result is better)
Performance change in benchmark groups
Relative perf in group Velocity-Bench (8): cannot calculate
Relative perf in group api (6): cannot calculate
Relative perf in group memory (3): cannot calculate
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group multithread (8): cannot calculate
Relative perf in group Runtime (8): cannot calculate
Relative perf in group MicroBench (14): cannot calculate
Relative perf in group Pattern (10): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (7): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (3): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
Relative perf in group llama.cpp (6): cannot calculate
Details
Benchmark details - environment, command, output...
Velocity-Bench Hashtable
Environment Variables:
Command: /home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify
Output:
hashtable - total time for whole calculation: 0.350935 s
Velocity-Bench Bitcracker
Environment Variables:
Command: /home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000
Output:
---------> BitCracker: BitLocker password cracking tool <---------
==================================