This sample is a simple LD_PRELOAD-based tool that collects the Level Zero kernels and buffer transfers executed by an application, along with their total execution time and call count.
As a result, a table like the following is printed:
=== Device Timing Results: ===
Total Execution Time (ns): 368598201
Total Device Time (ns): 172440915
Kernel, Calls, SIMD, Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
GEMM, 4, 32, 172440915, 100.00, 43110228, 42880167, 43415416
Supported OS:
- Linux
- Windows (under development)
Prerequisites:
- CMake (version 3.12 and above)
- Git (version 1.8 and above)
- Python (version 2.7 and above)
- oneAPI Level Zero loader
- Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver
On Linux, run the following commands to build the sample:
cd <pti>/samples/ze_hot_kernels
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
Use this command line to run the tool:
./ze_hot_kernels <target_application>
You can use ze_gemm or dpc_gemm as the target application:
./ze_hot_kernels ../../ze_gemm/build/ze_gemm
./ze_hot_kernels ../../dpc_gemm/build/dpc_gemm
On Windows, use the Microsoft* Visual Studio x64 command prompt to run the following commands and build the sample:
cd <pti>\samples\ze_hot_kernels
mkdir build
cd build
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_LIBRARY_PATH=<level_zero_loader>\lib -DCMAKE_INCLUDE_PATH=<level_zero_loader>\include ..
nmake
Use this command line to run the tool:
ze_hot_kernels.exe <target_application>
You can use ze_gemm or dpc_gemm as the target application:
ze_hot_kernels.exe ..\..\ze_gemm\build\ze_gemm.exe
ze_hot_kernels.exe ..\..\dpc_gemm\build\dpc_gemm.exe