[Build] Cuda Failure 716:misaligned address when building onnxruntime with Cuda #15981
Comments
I observed the same error too, on an A10 machine with CUDA 11.6 and VS 2019.
Were you building the code from the main branch?
@snnn I assume that release build 1.14.1 does not have this problem? And yes, I am building the code from the main branch.
I just noticed it last month and haven't found the root cause yet. It happens on some hardware with some GPU driver versions.
@snnn I am on driver 531.79 with a 3060 Ti, the CUDA 11.8 toolkit, and cuDNN 8.9.0. I have included the DxDiag log below, although it might be useless.
I talked to @souptc offline. He will take a look when he finishes his current work.
Full log:
@ninjatall12 I looked into this error on A10. Please check your environment.
I found that the test causing the problem is FusedMatMulOpTest.FloatTypeTransposeBatch.
It was added in PR #9734.
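To reproduce only this failure instead of the whole suite, the test binary can be run with GoogleTest's `--gtest_filter` flag. The following is a minimal sketch, assuming the test is compiled into `onnxruntime_test_all` (the binary CTest reports as failing) and that you supply the path to that executable yourself; `single_test_command` and `run_single_test` are hypothetical helper names, not part of the ONNX Runtime build scripts.

```python
import subprocess


def single_test_command(test_exe, test_name):
    """Build the argument list that runs exactly one GoogleTest case."""
    # --gtest_filter is a standard GoogleTest flag; it accepts Suite.Case patterns.
    return [str(test_exe), f"--gtest_filter={test_name}"]


def run_single_test(test_exe, test_name):
    """Run the test binary and return its exit code (non-zero means failure)."""
    # test_exe is an assumption: pass the path to your own onnxruntime_test_all build.
    completed = subprocess.run(single_test_command(test_exe, test_name))
    return completed.returncode
```

For example, `run_single_test(path_to_exe, "FusedMatMulOpTest.FloatTypeTransposeBatch")` should run only the suspect case, which makes it quicker to check whether a given GPU and driver combination is affected.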
I think I found the root cause. It's because of the CublasMathModeSetter. Our team's build service doesn't have access to A-series GPUs due to the GPU shortage. We only tested it on T4 and M60 GPUs.
Fixed in the ONNX Runtime 1.15.1 release.
Describe the issue
I am trying to build ONNX Runtime with CUDA 11.8. The cuDNN binaries are placed inside the CUDA 11.8 folder, so locating cuDNN is not an issue. I have tried changing the cuDNN version, and I have checked that the CUDA version is compatible with both my GPU and ONNX Runtime, but I still get this error. My GPU is a 3060 Ti, and I am on the latest drivers.
Urgency
No response
Target platform
Windows 11
Build script
build.bat --config Release --use_cuda --cuda_version 11.8 --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8" --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"
Error / output
1: [ FAILED ] MLOpTest.TreeRegressorMultiTargetBatchTreeE2 (0 ms)
1: [ RUN ] MLOpTest.TreeRegressorMultiTargetAverage
1: D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUDA failure 716: misaligned address ; GPU=0 ; hostname=WIN-QHBHHD67V51 ; file=D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=241 ; expr=cudaDeviceSynchronize();
1:
1:
1: Provider:CUDAExecutionProvider
1: unknown file: error: C++ exception with description "D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUDA failure 716: misaligned address ; GPU=0 ; hostname=WIN-QHBHHD67V51 ; file=D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=241 ; expr=cudaDeviceSynchronize();
1:
The following tests FAILED:
1 - onnxruntime_test_all (Failed)
Errors while running CTest
Output from these tests are in: D:/onnxruntime/build/Windows/Release/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
Traceback (most recent call last):
File "D:\onnxruntime\tools\ci_build\build.py", line 2601, in <module>
sys.exit(main())
File "D:\onnxruntime\tools\ci_build\build.py", line 2504, in main
run_onnxruntime_tests(args, source_dir, ctest_path, build_dir, configs)
File "D:\onnxruntime\tools\ci_build\build.py", line 1744, in run_onnxruntime_tests
run_subprocess(ctest_cmd, cwd=cwd, dll_path=dll_path)
File "D:\onnxruntime\tools\ci_build\build.py", line 780, in run_subprocess
return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
File "D:\onnxruntime\tools\python\util\run.py", line 49, in run
completed_process = subprocess.run(
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['C:\Program Files\CMake\bin\ctest.EXE', '--build-config', 'Release', '--verbose', '--timeout', '10800']' returned non-zero exit status 8.
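When comparing reports from different machines, it can help to pull the structured fields out of the `CUDA failure` line in the output above. This is a rough sketch that assumes the log line keeps the `CUDA failure <code>: <message> ; key=value ; ...` layout shown here; `parse_cuda_failure` is a hypothetical helper, not an ONNX Runtime API.

```python
import re

# Matches "CUDA failure <code>: <message> ; <key=value fields separated by ' ; '>"
CUDA_FAILURE_RE = re.compile(
    r"CUDA failure (?P<code>\d+): (?P<message>[^;]+?) ; (?P<fields>.*)$"
)


def parse_cuda_failure(line):
    """Return a dict of the error code, message, and key=value fields, or None."""
    match = CUDA_FAILURE_RE.search(line)
    if match is None:
        return None
    info = {
        "code": int(match.group("code")),
        "message": match.group("message").strip(),
    }
    # Remaining fields look like "GPU=0 ; hostname=... ; file=... ; line=241 ; expr=..."
    for field in match.group("fields").split(" ; "):
        key, sep, value = field.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info
```

Applied to the failing line above, this should yield the error code 716, the message `misaligned address`, and the `file`, `line`, and `expr` fields that point at the `cudaDeviceSynchronize()` call.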
Visual Studio Version
Visual Studio 2022
GCC / Compiler Version
No response