
Invalid CUDA code: host computation on device buffer after GPU transformations #1773

Open
edopao opened this issue Nov 18, 2024 · 0 comments


The SDFG below produces invalid CUDA code after applying `apply_gpu_transformations`:
(screenshot of the SDFG omitted)

The problem is that the `tlet_1_scalar_expr` nodes in the second nested level result in a symbolic expression that is evaluated in host code, while its result is written to a device buffer in GPU global memory. Since host code cannot dereference a device pointer, calling the SDFG causes a segmentation fault.

Generated host code:

```cpp
int * __tmp3;
DACE_GPU_CHECK(cudaMalloc((void**)&__tmp3, __out_size_0 * sizeof(int)));

int __tmp0;
__tmp0 = (__tmp1 * __tmp1);
```
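
For context, here is a minimal standalone CUDA sketch of why this pattern fails (not DaCe-generated code; `dev_buf` and `host_val` are made-up names mirroring `__tmp3` and `__tmp0`): a value computed on the host has to be moved into a `cudaMalloc`'d buffer with an explicit copy, because dereferencing the device pointer from host code crashes.

```cpp
#include <cuda_runtime.h>

int main() {
    int *dev_buf = nullptr;                 // device buffer, analogous to __tmp3
    cudaMalloc((void **)&dev_buf, sizeof(int));

    int host_val = 5 * 5;                   // host-side scalar expression, analogous to __tmp0

    // Invalid (what the generated code effectively does, per the description above):
    // dev_buf holds a device address, so a host-side store through it segfaults.
    // dev_buf[0] = host_val;

    // Valid: move the host result to the device buffer with an explicit copy.
    cudaMemcpy(dev_buf, &host_val, sizeof(int), cudaMemcpyHostToDevice);

    cudaFree(dev_buf);
    return 0;
}
```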

This SDFG can be reproduced from the GT4Py test case:
tests/next_tests/integration_tests/feature_tests/ffront_tests/test_execution.py::test_double_use_scalar

The current workaround is to run the simplify pass before calling `apply_gpu_transformations`, so that `InlineSDFGs` will bring the SDFG to a canonical form (see GridTools/gt4py#1741).
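
A minimal sketch of that ordering in Python, assuming `sdfg` is the `dace.SDFG` built for the GT4Py program (`prepare_for_gpu` is a hypothetical helper; the two method names exist on `dace.SDFG` in current DaCe releases):

```python
import dace

def prepare_for_gpu(sdfg: dace.SDFG) -> dace.SDFG:
    # Workaround: simplify first, so that InlineSDFGs flattens the nested
    # SDFGs into a canonical form ...
    sdfg.simplify()
    # ... and only then apply the GPU transformations; per the workaround
    # above, code generation then no longer emits the host-side write into
    # the device buffer.
    sdfg.apply_gpu_transformations()
    return sdfg
```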
