
Interfacing OpenACC with cuFFT, BLAS, MKL, FFTW

Michel Müller edited this page Jul 6, 2015 · 1 revision

Many of the EuroHack projects require basic numerical operations such as FFTs or level-1/2/3 BLAS. For these it is highly advantageous to use the NVIDIA CUDA libraries such as cuFFT and cuBLAS. The corresponding user manuals are part of the NVIDIA CUDA Toolkit documentation.

These libraries are heavily optimized for GPUs, and a hand-written replacement is unlikely to match their performance. The remaining problem is interfacing the user's OpenACC code with these libraries. Fortunately, a number of comprehensive examples are available:

OpenACC Interoperability Tricks by Jeff Larkin

Interfacing an OpenACC program to cuFFT

OpenACC/cuFFT interoperability by Adam Simpson

Implementing FFTs in a way that is both performant and portable (able to run fast on Tesla GPUs, MICs, and CPUs) has come up as an issue. The FFTW interface has been discussed; however, it is a host-only interface, so it does not work well if your data is already on the device. The Intel Math Kernel Library (MKL) has FFT functionality built in, which seems to be the equivalent of cuFFT on MIC [1]. It should be possible to build a common interface to both cuFFT and the MKL FFT that supports device pointers. This still leaves open how to fall back on non-Intel x86 systems: on the CPU there should be a software fallback option, potentially using FFTW. Such a common interface could be built using preprocessor macros.

[1] https://software.intel.com/en-us/articles/the-intel-math-kernel-library-and-its-fast-fourier-transform-routines