diagnostics kinda clean #113
| ! endif | ||
| if (present(net_err)) net_err = uh_err | ||
| ! GPU: present() not supported on GPU — net_err is never passed from diag remap callers |
What happens when you use present()? I've used it before in procedures in GPU code when trying different porting strategies for continuity.
Did you try it inside a routine that is declared as !$omp declare target? I was getting segfaults.
Huh, maybe my issue was something else.
I think it was a twofold issue, with something else going wrong as well.
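To help settle whether present() itself is the culprit, a minimal reproducer along these lines might be useful. This is only a sketch: the module, routine, and variable names are hypothetical and are not taken from the PR, and the (k, i) array layout is an assumption chosen so that each column is contiguous.

```fortran
! Reproducer sketch: present() queried inside a "declare target" routine.
! All names here are hypothetical, not from the actual remapping code.
module present_demo
  implicit none
contains
  subroutine column_err(u, err_out)
    !$omp declare target
    real, intent(inout) :: u(:)
    real, optional, intent(out) :: err_out
    real :: err
    integer :: k
    err = 0.0
    do k = 1, size(u)
      u(k) = 2.0 * u(k)
      err = err + abs(u(k))
    end do
    ! The statement under test: only write err_out when it was passed.
    if (present(err_out)) err_out = err
  end subroutine column_err
end module present_demo

program test_present
  use present_demo
  implicit none
  integer, parameter :: ni = 4, nk = 8
  real :: u(nk, ni), errs(ni)
  integer :: i
  u = 1.0
  errs = 0.0
  !$omp target teams distribute parallel do map(tofrom: u, errs)
  do i = 1, ni
    if (mod(i, 2) == 0) then
      call column_err(u(:, i), errs(i))   ! optional argument passed
    else
      call column_err(u(:, i))            ! optional argument omitted
    end if
  end do
  if (any(u /= 2.0)) error stop "unexpected u"
  if (any(errs(2::2) /= 2.0 * nk)) error stop "unexpected err"
  if (any(errs(1::2) /= 0.0)) error stop "err written without being passed"
  print *, "present() inside declare target: ok"
end program test_present
```

If this passes with the same compiler and flags, the segfault was more likely coming from something else in the routine.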
| !> Maximum number of vertical levels supported for GPU-resident local arrays. | ||
| !! Variable-length locals inside declare-target routines force nvfortran to use | ||
| !! GPU heap (NVCOMPILER_ACC_CUDA_HEAPSIZE). Fixed-size arrays use stack instead. | ||
| integer, parameter, public :: NK_GPU_MAX = 500 |
An alternative is to create the tmp arrays outside the routine (making them private if necessary) and pass them in. This gets around the heap issue.
Not yet sure if that's useful here, but mentioning it in case we need it.
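A rough sketch of that pattern, in case it helps. The routine name, the smoothing stencil, and the array sizes are all made up for illustration; the point is only that the scratch array is declared by the caller, listed as private on the offloaded loop, and passed in as an explicit-shape dummy, so the declare-target routine needs no variable-length local.

```fortran
! Sketch: scratch array owned by the caller instead of a local in the
! declare-target routine. Names and the stencil are hypothetical.
module scratch_demo
  implicit none
contains
  subroutine smooth_column(nk, h, work)
    !$omp declare target
    integer, intent(in) :: nk
    real, intent(inout) :: h(nk)
    real, intent(inout) :: work(nk)   ! scratch space supplied by the caller
    integer :: k
    work(1) = h(1)
    work(nk) = h(nk)
    do k = 2, nk - 1
      work(k) = 0.25 * h(k-1) + 0.5 * h(k) + 0.25 * h(k+1)
    end do
    do k = 1, nk
      h(k) = work(k)
    end do
  end subroutine smooth_column
end module scratch_demo

program test_scratch
  use scratch_demo
  implicit none
  integer, parameter :: ni = 4, nk = 6
  real :: h(nk, ni), work(nk)
  integer :: i, k
  do i = 1, ni
    do k = 1, nk
      h(k, i) = real(k)   ! linear profile, unchanged by the stencil
    end do
  end do
  ! Each thread gets its own copy of work through the private clause.
  !$omp target teams distribute parallel do private(work) map(tofrom: h)
  do i = 1, ni
    call smooth_column(nk, h(:, i), work)
  end do
  do k = 1, nk
    if (any(h(k, :) /= real(k))) error stop "linear profile was not preserved"
  end do
  print *, "caller-owned scratch array: ok"
end program test_scratch
```

The trade-off is a longer argument list on every column routine that needs scratch space.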
| ! Local variables | ||
| integer :: c, nk, i, j, k | ||
| type(axes_grp), pointer :: axes => NULL(), h_axes => NULL() ! Current axes, for convenience | ||
| ! Local pointer aliases to avoid derived-type components in OpenMP map clauses |
Was this done for convenience, or was there a real limitation?
I was having issues mapping the underlying structures: the code would compile, but at runtime they were not mapped on the GPU. This was a convenient workaround.
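For anyone reading along, the workaround looks roughly like the sketch below. The derived type and its component are hypothetical stand-ins, not the real axes or diagnostic types: a local pointer is associated with the component, and the pointer (not the component reference) appears in the map clause.

```fortran
! Sketch of the pointer-alias workaround. The type and names are
! hypothetical stand-ins for the real diagnostic structures.
module alias_demo
  implicit none
  type :: diag_store
    real, allocatable :: field(:,:)
  end type diag_store
end module alias_demo

program test_alias
  use alias_demo
  implicit none
  type(diag_store), target :: cs
  real, pointer :: field(:,:) => null()
  integer :: i, j
  allocate(cs%field(4, 3))
  cs%field = 1.0
  ! Alias the component so the map clause names a plain array pointer
  ! rather than a derived-type component.
  field => cs%field
  !$omp target teams distribute parallel do collapse(2) map(tofrom: field)
  do j = 1, 3
    do i = 1, 4
      field(i, j) = field(i, j) + real(i + 10 * j)
    end do
  end do
  ! The update is visible through the original component on the host.
  if (cs%field(2, 3) /= 33.0) error stop "alias update not visible"
  if (cs%field(4, 1) /= 15.0) error stop "alias update not visible"
  print *, "pointer alias in map clause: ok"
end program test_alias
```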
| ! call CS%reconstruction%reconstruct(h0, u0) | ||
| ! call CS%reconstruction%remap_to_sub_grid(h0, u0, n1, h_sub, & | ||
| ! isrc_start, isrc_end, isrc_max, isub_src, & | ||
| ! u_sub, uh_sub, u02_err) |
Are these two methods dispatched by class? Since this routine seems to operate columnwise, with the ij loop outside, could it be easier to move the ij loops inside the class-dispatched routines?
Hmm, that could be a nice idea. Let me look at how big a change this would be.
I think this would add a lot of boilerplate, right? We'd need a GPU implementation for mapping.
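To make the trade-off concrete, here is a rough sketch of what moving the ij loops inside the dispatched routine could look like. None of these types or procedures are the real reconstruction classes; the names, the trivial column kernel, and the (k, i, j) layout are assumptions for illustration. The class dispatch happens once on the host, and the offloaded loop inside calls a plain, non-polymorphic declare-target column routine.

```fortran
! Sketch: class dispatch on the host, ij loops inside the dispatched
! routine. Types, names, and the (k, i, j) layout are hypothetical.
module recon_demo
  implicit none
  type, abstract :: recon_base
  contains
    procedure(grid_iface), deferred :: reconstruct_grid
  end type recon_base

  abstract interface
    subroutine grid_iface(this, h, u)
      import :: recon_base
      class(recon_base), intent(in) :: this
      real, intent(in) :: h(:,:,:)
      real, intent(inout) :: u(:,:,:)
    end subroutine grid_iface
  end interface

  type, extends(recon_base) :: recon_simple
  contains
    procedure :: reconstruct_grid => simple_grid
  end type recon_simple
contains
  ! Non-polymorphic column kernel, callable from device code.
  subroutine simple_column(nk, h, u)
    !$omp declare target
    integer, intent(in) :: nk
    real, intent(in) :: h(nk)
    real, intent(inout) :: u(nk)
    integer :: k
    do k = 1, nk
      u(k) = u(k) * h(k)
    end do
  end subroutine simple_column

  ! Each concrete class carries its own copy of this loop nest, which
  ! is the boilerplate cost of the approach.
  subroutine simple_grid(this, h, u)
    class(recon_simple), intent(in) :: this
    real, intent(in) :: h(:,:,:)
    real, intent(inout) :: u(:,:,:)
    integer :: i, j, nk, ni, nj
    nk = size(h, 1); ni = size(h, 2); nj = size(h, 3)
    !$omp target teams distribute parallel do collapse(2) map(to: h) map(tofrom: u)
    do j = 1, nj
      do i = 1, ni
        call simple_column(nk, h(:, i, j), u(:, i, j))
      end do
    end do
  end subroutine simple_grid
end module recon_demo

program test_recon
  use recon_demo
  implicit none
  class(recon_base), allocatable :: recon
  real :: h(5, 3, 2), u(5, 3, 2)
  h = 2.0
  u = 3.0
  allocate(recon_simple :: recon)
  call recon%reconstruct_grid(h, u)   ! dispatch happens once, on the host
  if (any(u /= 6.0)) error stop "unexpected result"
  print *, "grid-level dispatch: ok"
end program test_recon
```

Every reconstruction scheme would need its own grid-level wrapper like simple_grid, so the amount of added code scales with the number of concrete classes.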
This is now being handled in #153? Can we close this one?
No description provided.