
jittor 1.3.9: 3D operators fail with errors #560

Open
yykmeng opened this issue Jun 16, 2024 · 1 comment

yykmeng commented Jun 16, 2024

Describe the bug

Running python -m jittor.test.test_cudnn_op fails with the following errors:

  • python=3.9
  • jittor=1.3.9
  • cuda=12.2
  • cudnn=8
  • g++=11.4

Full Log

/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc(37): error: function "jittor::getDataType<T_ELEM>() [with T_ELEM=half1]" has already been defined
  template <> __inline__ cudnnDataType_t getDataType<half1>() { return CUDNN_DATA_HALF; }
                                         ^

/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc(38): error: function "jittor::getDataType<T_ELEM>() [with T_ELEM=float]" has already been defined
  template <> __inline__ cudnnDataType_t getDataType<float>() { return CUDNN_DATA_FLOAT; }
                                         ^

2 errors detected in the compilation of "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc".
/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc(37): error: function "jittor::getDataType<T_ELEM>() [with T_ELEM=half1]" has already been defined
  template <> __inline__ cudnnDataType_t getDataType<half1>() { return CUDNN_DATA_HALF; }
                                         ^

/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc(38): error: function "jittor::getDataType<T_ELEM>() [with T_ELEM=float]" has already been defined
  template <> __inline__ cudnnDataType_t getDataType<float>() { return CUDNN_DATA_FLOAT; }
                                         ^

2 errors detected in the compilation of "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc".
EE
======================================================================
ERROR: test_conv3d (__main__.TestCudnnConvOp)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/test/test_cudnn_op.py", line 142, in check
    jt.sync_all()
RuntimeError: [f 0616 09:51:29.555904 84 executor.cc:686]
Execute fused operator(2/7) failed.
[JIT Source]: /home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc
[OP TYPE]: cudnn_conv3d
[Input]: float32[2,4,10,10,10,], float32[5,4,3,3,3,],
[Output]: float32[2,5,10,10,10,],
[Async Backtrace]: not found, please set env JT_SYNC=1, trace_py_var=3
[Reason]: [f 0616 09:51:29.555525 84 log.cc:605] Check failed: ret>=0 && ret<=256  Run cmd failed: "/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/bin/nvcc" "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc"      -std=c++14 -Xcompiler -fPIC  -Xcompiler -march=native  -Xcompiler -fdiagnostics-color=always  -lstdc++ -ldl -shared  -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/src" -I/home2/ykm2023/miniconda3/envs/dl/include/python3.9 -I/home2/ykm2023/miniconda3/envs/dl/include/python3.9 -DHAS_CUDA -DIS_CUDA -I"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/include" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc"  -lcudart -L"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64"  -I"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default"  -l:"jit_utils_core.cpython-39-x86_64-linux-gnu".so  -l:"jittor_core.cpython-39-x86_64-linux-gnu".so  -x cu --cudart=shared -ccbin="/usr/bin/g++" --use_fast_math  -w  -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc"  -arch=compute_86  -code=sm_86  
-I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/inc"  -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/ops"  -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/inc" -lcudnn -L"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64"  -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/cuda" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/cuda" -l:libcuda_extern.so   -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/custom_ops" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/custom_ops" -l:"gen_ops_cudnn_rnn_backward_x_cudnn_conv_cudnn_test___hashddba11.cpython-39-x86_64-linux-gnu".so   -o "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.so"

return 512. This might be an overcommit issue or out of memory. Try : sudo sysctl vm.overcommit_memory=1, or set enviroment variable `export DISABLE_MULTIPROCESSING=1`
**********
Async error was detected. To locate the async backtrace and get better error report, please rerun your code with two enviroment variables set:
>>> export JT_SYNC=1
>>> export trace_py_var=3


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/test/test_cudnn_op.py", line 150, in test_conv3d
    check((2,4,10,10,10), (5,4,3,3,3), (1,1,1), (1,1,1))
  File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/test/test_cudnn_op.py", line 142, in check
    jt.sync_all()
  File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/__init__.py", line 160, in __exit__
    setattr(flags, k, v)
RuntimeError: [f 0616 09:51:30.378681 84 executor.cc:686]
Execute fused operator(0/5) failed.
[JIT Source]: /home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc
[OP TYPE]: cudnn_conv3d
[Input]: float32[2,4,10,10,10,], float32[5,4,3,3,3,],
[Output]: float32[2,5,10,10,10,],
[Async Backtrace]: not found, please set env JT_SYNC=1, trace_py_var=3
[Reason]: [f 0616 09:51:30.378391 84 log.cc:605] Check failed: ret>=0 && ret<=256  Run cmd failed: "/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/bin/nvcc" "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc"      -std=c++14 -Xcompiler -fPIC  -Xcompiler -march=native  -Xcompiler -fdiagnostics-color=always  -lstdc++ -ldl -shared  -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/src" -I/home2/ykm2023/miniconda3/envs/dl/include/python3.9 -I/home2/ykm2023/miniconda3/envs/dl/include/python3.9 -DHAS_CUDA -DIS_CUDA -I"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/include" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc"  -lcudart -L"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64"  -I"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default"  -l:"jit_utils_core.cpython-39-x86_64-linux-gnu".so  -l:"jittor_core.cpython-39-x86_64-linux-gnu".so  -x cu --cudart=shared -ccbin="/usr/bin/g++" --use_fast_math  -w  -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc"  -arch=compute_86  -code=sm_86  
-I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/inc"  -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/ops"  -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/inc" -lcudnn -L"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64"  -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/cuda" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/cuda" -l:libcuda_extern.so   -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/custom_ops" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/custom_ops" -l:"gen_ops_cudnn_rnn_backward_x_cudnn_conv_cudnn_test___hashddba11.cpython-39-x86_64-linux-gnu".so   -o "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32__Ty_float32__Tw_float32__JIT_1__JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.so"

return 512. This might be an overcommit issue or out of memory. Try : sudo sysctl vm.overcommit_memory=1, or set enviroment variable `export DISABLE_MULTIPROCESSING=1`
**********
Async error was detected. To locate the async backtrace and get better error report, please rerun your code with two enviroment variables set:
>>> export JT_SYNC=1
>>> export trace_py_var=3


======================================================================
ERROR: test_conv_transpose3d (__main__.TestCudnnConvOp)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/test/test_cudnn_op.py", line 184, in test_conv_transpose3d
    check((2,5,10,10,10), (5,4,3,3,3), (1,1,1), (1,1,1))
  File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/test/test_cudnn_op.py", line 168, in check
    y2 = jt.nn.conv_transpose3d(x, w, None, stride, padding, 0, group, dilation)
  File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/nn.py", line 1611, in conv_transpose3d
    if stride <= 0:
TypeError: '<=' not supported between instances of 'tuple' and 'int'

----------------------------------------------------------------------
Ran 5 tests in 2.355s

FAILED (errors=2)
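The second failure is independent of the nvcc errors. The test passes stride=(1,1,1), and jittor/nn.py (line 1611, in conv_transpose3d) compares that value directly with 0, which Python 3 rejects for a tuple. As a sketch only, assuming the argument may be either an int or a 3-element sequence, the check could normalize the value first and then compare element by element. to_triple is a hypothetical helper for illustration, not Jittor's actual code.

```python
from typing import Sequence, Tuple, Union


def to_triple(value: Union[int, Sequence[int]], name: str) -> Tuple[int, ...]:
    """Hypothetical helper: turn an int or a 3-element sequence into a
    3-tuple of positive ints, raising ValueError otherwise."""
    if isinstance(value, int):
        triple = (value, value, value)
    else:
        triple = tuple(int(v) for v in value)
        if len(triple) != 3:
            raise ValueError(f"{name} must be an int or 3 ints, got {value!r}")
    # Compare each element with 0; `(1, 1, 1) <= 0` itself raises TypeError.
    if any(v <= 0 for v in triple):
        raise ValueError(f"{name} must be positive, got {value!r}")
    return triple
```

With this normalization, to_triple((1, 1, 1), "stride") returns (1, 1, 1) instead of raising, and to_triple(2, "stride") returns (2, 2, 2).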
Exusial (Contributor) commented Jul 8, 2024

You could try downgrading gcc to version 9.
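One way to try this without changing the system default compiler, assuming a g++ 9 binary is installed (the /usr/bin/g++-9 path below is an assumption), is to rerun the test with Jittor's cc_path environment variable, which the Jittor README documents for choosing the compiler. The cache directory in the log includes the compiler version (g++11.4.0), so a build with another compiler should use a separate cache. JT_SYNC, trace_py_var and DISABLE_MULTIPROCESSING are the variables the error message itself recommends. Note that the reported status 512 is most likely a raw wait status, i.e. nvcc exit code 2 (512 / 256), which fits the two compile errors better than the overcommit/out-of-memory hint. build_env and rerun_cudnn_tests are hypothetical helper names.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def build_env(base: Optional[Mapping[str, str]] = None, *,
              cc_path: Optional[str] = None,
              debug: bool = False,
              disable_multiprocessing: bool = False) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the overrides
    discussed in this issue; `base` itself is never modified."""
    env = dict(os.environ if base is None else base)
    if cc_path is not None:
        # Compiler Jittor should use, e.g. a g++ 9 binary (path is an assumption).
        env["cc_path"] = cc_path
    if debug:
        # Variables the Jittor error message asks for to get an async backtrace.
        env["JT_SYNC"] = "1"
        env["trace_py_var"] = "3"
    if disable_multiprocessing:
        # Alternative the log suggests for overcommit / out-of-memory failures.
        env["DISABLE_MULTIPROCESSING"] = "1"
    return env


def rerun_cudnn_tests(env: Mapping[str, str]) -> int:
    """Rerun the failing test module in a fresh interpreter; return its exit code."""
    cmd = [sys.executable, "-m", "jittor.test.test_cudnn_op"]
    return subprocess.run(cmd, env=dict(env)).returncode
```

For example, rerun_cudnn_tests(build_env(cc_path="/usr/bin/g++-9", debug=True)) should return 0 if every test in the module passes. Running the tests in a subprocess matters because the variables have to be set before jittor is imported.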
