An attempt at writing an HGEMM (half-precision matrix multiply) in OpenCL that runs on tensor cores. It relies on inline assembly.
On Windows, please put into the same folder as the generated hgemmtest.exe.
Tensor cores are available on NVIDIA GPUs based on the Volta or Turing architecture, including (list from Wikipedia):
GeForce RTX 2080 Ti
GeForce RTX 2080
GeForce RTX 2070
Quadro RTX 8000
Quadro RTX 6000
Quadro RTX 5000
Tesla T4
Tesla V100
Titan V
Titan V CEO Edition
Quadro GV100
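
Rather than matching device names against the list above, a host program could check the device's compute capability. The sketch below is not part of this project: the helper name is hypothetical, and it assumes the major/minor numbers were already obtained, for example through clGetDeviceInfo with CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV and CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV from NVIDIA's cl_nv_device_attribute_query extension. It relies on Volta reporting compute capability 7.0 (or 7.2) and Turing reporting 7.5.

```c
#include <stdbool.h>

/* Hypothetical helper, not taken from the original project.
 * Returns true when the compute capability belongs to Volta (7.0, 7.2)
 * or Turing (7.5), the two architectures named in the text above.
 * Other architectures are deliberately rejected, because the text
 * makes no claim about them. */
bool is_volta_or_turing(unsigned int cc_major, unsigned int cc_minor)
{
    (void)cc_minor; /* every 7.x device is either Volta or Turing */
    return cc_major == 7;
}
```

A caller would typically run this check once per device before building the kernel, and skip the tensor core path when it returns false.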