risk of integer overflow when flattening 4D-5D arrays in parallel_for on GPUs #296
francois-rincon
started this conversation in General
Hello,
I have been confronted with a sneaky potential bug which can show up in the 4D and 5D (the latter only in my fork) versions of idefix_for on GPUs, or possibly with other loop patterns on different architectures. On GPU, with the RANGE loop pattern, the multi-D loop is flattened onto a single index IDX, declared as int, which runs from 0 to NNxNXxNYxNZ(xNNPRIME) - 1 in 5D! To recover the inner multi-D indexes, one also has to take integer divisions by large numbers such as NXxNYxNZ and, in my 5D case, NNPRIMExNXxNYxNZ.
The problem is that these divisors, and IDX itself, can be very large numbers, and this can overflow the int, resulting in wrong multi-D n, nprime, i, j, k indexes inside the loop. The fix, I think, is to declare these large numbers and IDX as long (it at least no longer breaks the code for me; I was getting memory-addressing errors before, precisely because of the wrong n, nprime, i, j, k).
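For concreteness, here is a minimal sketch of the flattening pattern I am describing (this is not the actual idefix_for implementation; the names NN, NX, NY, NZ and the kernel are illustrative, and the loop is written serially rather than as a GPU kernel):

```c++
#include <cstdint>

// Flattened 4D loop, sketched. If idx and the stride products
// (e.g. NX*NY*NZ) are plain int, they overflow once the total
// iteration count exceeds 2^31 - 1; with int64_t they do not.
template <typename Function>
void flat_for_4d(int NN, int NX, int NY, int NZ, Function kernel) {
  const int64_t strideN = static_cast<int64_t>(NX) * NY * NZ;
  const int64_t strideI = static_cast<int64_t>(NY) * NZ;
  const int64_t total   = static_cast<int64_t>(NN) * strideN;
  for (int64_t idx = 0; idx < total; ++idx) {   // on GPU this is the flat thread index
    // recover the multi-D indexes; each individual index still fits in an int
    const int n = static_cast<int>(idx / strideN);
    const int i = static_cast<int>((idx % strideN) / strideI);
    const int j = static_cast<int>((idx % strideI) / NZ);
    const int k = static_cast<int>(idx % NZ);
    kernel(n, i, j, k);
  }
}
```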
It's not really a bug for most uses, since most of you are not using 5D loops and are at low risk of overflowing, but I thought I'd drop it in the discussion. I started having the problem with 128 species on a 1x576x288 grid in my code, which gives a 5D loop size of 128^2 x 1 x 576 x 288. But the problem only occurred on a single GPU, not on two (since each GPU's loop is then half the size), and it seemingly disappeared at even higher resolution (I suspect it still produced garbage; it's just that the wrapped int was no longer negative at that size, so at least it did not produce obviously broken multi-D indexes...).
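To make the numbers explicit (my own back-of-the-envelope check of the case above):

```c++
#include <climits>
#include <cstdint>
#include <cstdio>

int main() {
  // Reported case: 128 species (squared in the 5D loop) on a 1x576x288 grid.
  const int64_t total = 128LL * 128 * 1 * 576 * 288;   // 2,717,908,992
  std::printf("5D loop size = %lld, INT_MAX = %d\n",
              static_cast<long long>(total), INT_MAX);
  // On typical platforms a 32-bit int wraps around, giving a negative index.
  std::printf("stored in a 32-bit int it wraps to %d\n", static_cast<int>(total));
  return 0;
}
```

So the single-GPU loop size exceeds INT_MAX, while the two-GPU loops (about 1.36e9 iterations each) still fit in an int, which is consistent with what I observed.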
I point this out here in case it is useful to people who cycle over dust fluids with 4D loops at high resolution, or even run 3D at very high resolution on a single GPU.
The problem may actually not be reachable for most fluid problems on a single GPU, because you can't fit huge 3D fields in the memory of a single GPU anyway. In my case, however, the fields themselves are Nspecies x Nx x Ny x 1, so not huge and fitting comfortably in the memory of my A30s; it is the Nspecies^2 factor in the 5D loop that killed my idefix_for through the int overflow! The issue may be more relevant if you are looping in 4D with a significant number of points/fluids in the inner directions.
Also, I hope nothing similar happens under the carpet for reduce, which uses MDRangePolicy? I suspect the Kokkos people will have thought about needing long strides to loop over arrays of up to 8 dimensions?
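On the Kokkos side, if this turns out to matter, one possible route (just a sketch, assuming the standard Kokkos::IndexType policy trait can be plugged into the idefix_for wrappers; I have not checked how they actually build their policies) would be to request a 64-bit loop index explicitly:

```c++
#include <Kokkos_Core.hpp>
#include <cstdint>

// Sketch: force a 64-bit index on the flattened range, so that the total
// extent NN*NX*NY*NZ (computed here in 64-bit) cannot wrap around.
using Policy64 = Kokkos::RangePolicy<Kokkos::IndexType<int64_t>>;

void example(int NN, int NX, int NY, int NZ) {
  const int64_t total = static_cast<int64_t>(NN) * NX * NY * NZ;
  Kokkos::parallel_for("flat_loop", Policy64(0, total),
    KOKKOS_LAMBDA(const int64_t idx) {
      // recover n, i, j, k from idx with 64-bit divisions, as sketched above
      (void) idx;
    });
}
```

The same IndexType trait should in principle also apply to MDRangePolicy, but I have not tested that.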
Let me know if you have faced this already and whether you think this should be filed as a bug. I have fixed my fork for the time being and will make sure that declaring IDX and the large strides as long really solves the issue.
François