@orvedahl documented some relevant tests for running Rayleigh on GPUs using OpenACC in this repository.
Before starting the actual port, we should enable support for the relevant compilers in the Rayleigh build infrastructure. For Nvidia GPUs that would be nvfortran, the compiler from the NVIDIA HPC SDK (formerly PGI). It lacks support for quad-precision floating-point numbers, which we use in Math_Layer/Legendre_Polynomials.F90, so we need to find a way around this.
There are several options to fix this:
1. Rewrite Legendre_Polynomials.F90 to avoid quad-precision numbers. We first need to check whether the extra precision is actually needed.
2. Implement quad precision through an external library, such as GMP.
3. Build the code using gfortran with its OpenACC support for Nvidia-PTX offloading. I have tested this and it works, but the downside is that we would essentially always have to build our own compiler on each Nvidia-GPU-enabled cluster.
4. Build only Legendre_Polynomials.F90 with a different compiler (e.g., gfortran). This is hard to do, because the .mod file formats of nvfortran and gfortran are not compatible.
I am leaning towards option 1, falling back to option 2 if that does not work.
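As a rough illustration of the check that option 1 calls for, here is a minimal Python sketch that compares a double-precision evaluation against an exact rational reference. It assumes the plain Legendre polynomials P_l and the generic three-term recurrence, which is not necessarily the algorithm used in Legendre_Polynomials.F90, and the function names are hypothetical.

```python
from fractions import Fraction


def legendre_float(lmax, x):
    """Return [P_0(x), ..., P_lmax(x)] using the three-term recurrence in double precision."""
    p = [1.0, x]
    for l in range(1, lmax):
        # (l + 1) P_{l+1}(x) = (2l + 1) x P_l(x) - l P_{l-1}(x)
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return p[: lmax + 1]


def legendre_exact(lmax, x):
    """Same recurrence in exact rational arithmetic, used as a reference."""
    x = Fraction(x)  # exact rational value of the double x
    p = [Fraction(1), x]
    for l in range(1, lmax):
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return p[: lmax + 1]


def max_abs_error(lmax, x):
    """Largest absolute difference between the double and exact results for l <= lmax."""
    approx = legendre_float(lmax, x)
    exact = legendre_exact(lmax, x)
    return max(abs(a - float(e)) for a, e in zip(approx, exact))
```

Calling, for example, `max_abs_error(200, 0.3)` for a few degrees and sample points gives a first impression of how much accuracy double precision loses. A real assessment would have to repeat this with the normalization and grid points that Rayleigh actually uses.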
Once the build system works, we should implement @orvedahl's changes to the loops and explore whether we can use direct GPU-to-GPU MPI communication. That would hopefully allow us to keep the data on the GPU for the whole computation, except for I/O.