Running Hopper TMA kernels with pycuda #443
Replies: 4 comments
-
Not yet, but contributions that wrap these should be straightforward (just by following the existing patterns) and are definitely welcome.
-
@inducer Can you point me to some of these existing patterns? Either way, any progress toward supporting this would be greatly appreciated!
-
@inducer pycuda uses nvrtc under the hood, correct?
-
There was a proposed change at one point, but that sort of petered out. Currently, we run
-
Hi, I would like to use pycuda to compile and launch kernels targeting NVIDIA's Hopper architecture. These kernels may use the TMA hardware unit (https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/), which requires creating CUtensorMap objects (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TENSOR__MEMORY.html) on the host side and passing them to the kernel as additional kernel parameters.
Would it be possible to express this somehow in pycuda?
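One detail any wrapper would have to handle: a CUtensorMap is a 128-byte opaque struct (16 x 64-bit words) declared with 64-byte alignment in cuda.h, and it is passed by value (the CUDA docs use a `const __grid_constant__ CUtensorMap` kernel parameter). So it cannot be passed like a device pointer; its bytes must land at a 64-byte-aligned offset in the kernel parameter buffer. The sketch below is only an illustration of that layout constraint, not pycuda's API: `pack_kernel_params` and its `"ptr"`/`"i32"`/`"f32"`/`"tensormap"` kinds are hypothetical, and the 128 opaque bytes are assumed to come from something like `cuTensorMapEncodeTiled` elsewhere.

```python
import struct

# Assumed from cuda.h: CUtensorMap holds 16 opaque 64-bit words, alignas(64).
CU_TENSOR_MAP_NUM_BYTES = 128
CU_TENSOR_MAP_ALIGN = 64


def pack_kernel_params(params):
    """Hypothetical helper: pack (kind, value) pairs into one parameter buffer.

    Each argument is placed at an offset that is a multiple of its alignment,
    padding with zero bytes as needed, the way a C compiler lays out kernel
    parameters. Returns (buffer_bytes, list_of_offsets).
    """
    buf = bytearray()
    offsets = []
    for kind, value in params:
        if kind == "ptr":          # 64-bit device pointer
            data, align = struct.pack("<Q", value), 8
        elif kind == "i32":
            data, align = struct.pack("<i", value), 4
        elif kind == "f32":
            data, align = struct.pack("<f", value), 4
        elif kind == "tensormap":  # opaque CUtensorMap bytes, passed by value
            data = bytes(value)
            if len(data) != CU_TENSOR_MAP_NUM_BYTES:
                raise ValueError("a CUtensorMap must be exactly 128 bytes")
            align = CU_TENSOR_MAP_ALIGN
        else:
            raise ValueError(f"unknown parameter kind: {kind}")
        # Zero-pad so this argument starts at an aligned offset.
        buf.extend(b"\0" * ((-len(buf)) % align))
        offsets.append(len(buf))
        buf.extend(data)
    return bytes(buf), offsets


if __name__ == "__main__":
    fake_map = b"\x01" * CU_TENSOR_MAP_NUM_BYTES  # placeholder, not a real encoded map
    buf, offs = pack_kernel_params(
        [("ptr", 0x1000), ("i32", 1024), ("tensormap", fake_map)]
    )
    print(offs, len(buf))
```

Here the tensor map ends up at offset 64 even though the preceding pointer and int only occupy 12 bytes, so 52 padding bytes are inserted. A buffer like this could in principle be handed to `cuLaunchKernel` via its `extra` argument; whether and how pycuda should expose that is exactly the open question in this thread.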