The SDFG below produces invalid CUDA code after applying apply_gpu_transformations:
The problem is that the tlet_1_scalar_expr nodes in the second nested level result in a symbolic expression that is computed in host code, while the result is written to a device buffer in GPU global memory. This causes a segmentation fault when the SDFG is called.
Generated host code:
int * __tmp3;
DACE_GPU_CHECK(cudaMalloc((void**)&__tmp3, __out_size_0 * sizeof(int)));
int __tmp0;
__tmp0 = (__tmp1 * __tmp1);
This SDFG can be reproduced from the GT4Py test case: tests/next_tests/integration_tests/feature_tests/ffront_tests/test_execution.py::test_double_use_scalar
The current workaround is to run the simplify pass before calling apply_gpu_transformations, so that InlineSDFGs will bring the SDFG to a canonical form (see GridTools/gt4py#1741).
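The workaround above can be sketched as a small helper. This is only an illustration of the call order, assuming the object passed in is a dace.SDFG (which provides the simplify() and apply_gpu_transformations() methods); the helper name prepare_for_gpu is hypothetical and is not part of DaCe or GT4Py.

```python
def prepare_for_gpu(sdfg):
    """Hypothetical helper: simplify an SDFG before offloading it to the GPU.

    Assumes `sdfg` behaves like a dace.SDFG, i.e. it exposes
    `simplify()` and `apply_gpu_transformations()`.
    """
    # Run the simplify pass first. According to the report, this lets
    # InlineSDFGs bring the nested SDFGs to a canonical form.
    sdfg.simplify()

    # Only then apply the GPU transformations, so that they operate on
    # the simplified SDFG rather than on the original nested structure.
    sdfg.apply_gpu_transformations()
    return sdfg
```

The helper only fixes the order of the two calls; it does not address the underlying code generation problem for SDFGs that are not simplified first.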