Thanks a lot for your feature request. Given the performance improvements that can be achieved using Tensor Cores on NVIDIA hardware, it definitely makes sense to add support for Tensor Cores in 2.0 (which is going to be the next big release after v1.5).
A way to utilize Tensor Cores is needed. It should draw on the family of `VectorXXX` intrinsics in .NET and/or the Vulkan Cooperative Matrix extension proposed by NVIDIA.

Related CUDA documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions
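As a rough illustration of what the linked warp-level matrix instructions compute, here is a plain-Python reference of a single tile multiply-accumulate, D = A·B + C, with fp16 inputs and fp32 accumulation. This is not ILGPU API and not how the hardware executes it. The helper names `to_fp16`, `to_fp32` and `tile_mma` and the 16×16×16 tile shape are illustrative assumptions. On real hardware the tile is spread across the 32 threads of a warp, and intermediate rounding may differ from this sketch.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tile_mma(a, b, c):
    """Hypothetical reference semantics of one tile MMA: D = A x B + C.

    a: M x K matrix (list of rows), rounded to fp16 on input
    b: K x N matrix (list of rows), rounded to fp16 on input
    c: M x N accumulator, kept in fp32
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "A must be M x K"
    assert all(len(row) == n for row in b), "B must be K x N"
    assert len(c) == m and all(len(row) == n for row in c), "C must be M x N"

    # Inputs are quantized to half precision, as for f16 tensor-core operands.
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]

    d = []
    for i in range(m):
        out_row = []
        for j in range(n):
            acc = to_fp32(c[i][j])
            for p in range(k):
                # Accumulate in single precision, rounding after every add.
                acc = to_fp32(acc + a16[i][p] * b16[p][j])
            out_row.append(acc)
        d.append(out_row)
    return d


if __name__ == "__main__":
    M = N = K = 16  # illustrative tile shape (assumption)
    a = [[1.0] * K for _ in range(M)]
    b = [[0.5] * N for _ in range(K)]
    c = [[2.0] * N for _ in range(M)]
    print(tile_mma(a, b, c)[0][0])  # prints 10.0
```

The point of the sketch is the data-type contract (narrow inputs, wider accumulator) that an ILGPU abstraction over Tensor Cores would need to expose, whichever API style is chosen.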
This is also mentioned in #923, but the latter is more about support for shorter floats in general.