This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Support building source against CUDA 12.1 #21190
Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
Hi, same here:
[ 4%] Building CXX object CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o
/usr/bin/c++ -DDMLC_CORE_USE_CMAKE -DDMLC_LOG_FATAL_THROW=1 -DDMLC_LOG_STACK_TRACE_SIZE=0 -DDMLC_MODERN_THREAD_LOCAL=0 -DDMLC_STRICT_CXX11 -DDMLC_USE_CXX11 -DDMLC_USE_CXX11=1 -DDMLC_USE_CXX14 -DMSHADOW_FORCE_STREAM -DMSHADOW_INT64_TENSOR_SIZE=1 -DMSHADOW_IN_CXX11 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_CUDA=1 -DMSHADOW_USE_CUDNN -DMSHADOW_USE_CUTENSOR -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_SSE -DMXNET_BRANCH=\"master\" -DMXNET_COMMIT_HASH=\"b84609d3fc73d20929c114eab95faaa56e6c5ede\" -DMXNET_USE_BLAS_OPEN=1 -DMXNET_USE_CUDA=1 -DMXNET_USE_INTGEMM=1 -DMXNET_USE_LAPACK=1 -DMXNET_USE_LAPACKE_INTERFACE=1 -DMXNET_USE_LIBJPEG_TURBO=1 -DMXNET_USE_NCCL=1 -DMXNET_USE_NVTX=1 -DMXNET_USE_OPENCV=1 -DMXNET_USE_OPENMP=1 -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_SIGNAL_HANDLER=1 -DNDEBUG=1 -DUSE_CUDNN -DUSE_CUTENSOR -D__USE_XOPEN2K8 -Dmxnet_EXPORTS -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/tvm/nnvm/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/tvm/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/dmlc-core/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/dlpack/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/mshadow -I/tmp/makepkg/sl1-mxnet-git/src/build/3rdparty/intgemm -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/intgemm -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/miniz -I/tmp/makepkg/sl1-mxnet-git/src/build/3rdparty/dmlc-core/include -isystem /usr/include/opencv4 -isystem /opt/cuda/include -march=native -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -fdiagnostics-color=always -march=native -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -Wall -Wno-sign-compare -O3 -fopenmp -O3 
-DNDEBUG -std=gnu++17 -fPIC -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -fopenmp -MD -MT CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o -MF CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o.d -o CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o -c /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/cached_op_api.cc
In file included from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/../imperative/./imperative_utils.h:31,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/../imperative/cached_op.h:34,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/cached_op_api.cc:27:
/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/../imperative/././cuda_graphs.h: In member function 'void mxnet::cuda_graphs::CudaGraphsSubSegExec::Update(const std::vector<std::shared_ptr<mxnet::exec::OpExecutor> >&, const mxnet::RunContext&, bool, bool)':
/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/../imperative/././cuda_graphs.h:205:62: error: cannot convert 'CUgraphNode_st**' to 'cudaGraphExecUpdateResultInfo*' {aka 'cudaGraphExecUpdateResultInfo_st*'}
205 | cudaGraphExecUpdate(graph_exec_.get(), graph_.get(), &error_node, &update_result);
| ^~~~~~~~~~~
| |
| CUgraphNode_st**
In file included from /opt/cuda/include/channel_descriptor.h:61,
from /opt/cuda/include/cuda_runtime.h:95,
from /opt/cuda/include/curand.h:59,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mshadow/./base.h:195,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mshadow/tensor.h:34,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/./base.h:32,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/ndarray.h:39,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/runtime/ndarray_handle.h:26,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/runtime/packed_func.h:34,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/runtime/registry.h:49,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/api_registry.h:31,
from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/cached_op_api.cc:24:
/opt/cuda/include/cuda_runtime_api.h:12423:138: note: initializing argument 3 of 'cudaError_t cudaGraphExecUpdate(cudaGraphExec_t, cudaGraph_t, cudaGraphExecUpdateResultInfo*)'
12423 | extern __host__ cudaError_t CUDARTAPI cudaGraphExecUpdate(cudaGraphExec_t hGraphExec, cudaGraph_t hGraph, cudaGraphExecUpdateResultInfo *resultInfo);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
make[2]: *** [CMakeFiles/mxnet.dir/build.make:90: CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o] Error 1
This is blocking some work we are trying to do, and downgrading to CUDA 10 isn't possible. Any idea when MXNet for CUDA 12 will be available?
I'm facing the same issue as @kevnzhao, using CUDA 12.
Description
(A clear and concise description of what the bug is.)
CUDA Toolkit 12.x was released last month. This is a major version, so it includes breaking API changes.
Building MXNet against CUDA 12.1 fails. The error message is pasted in the section below.
Error Message
(Paste the complete error message. Please also include stack trace by setting environment variable
DMLC_LOG_STACK_TRACE_DEPTH=100
before running your script.)
To Reproduce
(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)
Steps to reproduce
(Paste the commands you ran that produced the error.)
Install CUDA Toolkit 12.1 on the build machine.
Build with the commands below.
What have you tried to solve it?
This looks like it is caused by the breaking API change below. Code changes are needed to support CUDA 12.x.
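For reference, the compiler note in the error log shows that under CUDA 12 `cudaGraphExecUpdate` takes a single `cudaGraphExecUpdateResultInfo*` as its third argument, while the call at `cuda_graphs.h:205` still passes separate `&error_node` and `&update_result` pointers. Below is a minimal sketch of one way to bridge the two signatures with a version guard. It is not necessarily the change MXNet would adopt. `UpdateGraphExec` is a hypothetical helper name. The `CUDART_VERSION >= 12000` threshold and the `result` / `errorNode` field names reflect my understanding of the CUDA 12 runtime headers and should be checked against the installed toolkit. The `#else` branch of the include block holds stand-in declarations only, so the sketch can be type-checked on a machine without CUDA.

```cpp
// Sketch only: a version-guarded wrapper around cudaGraphExecUpdate.
#if __has_include(<cuda_runtime_api.h>)
#include <cuda_runtime_api.h>
#else
// Stand-in declarations (NOT the real toolkit) that mirror the CUDA 12
// prototype quoted in the error log. Remove them in a real build.
#define CUDART_VERSION 12010
typedef struct CUgraphExec_st* cudaGraphExec_t;
typedef struct CUgraph_st* cudaGraph_t;
typedef struct CUgraphNode_st* cudaGraphNode_t;
enum cudaError_t { cudaSuccess = 0 };
enum cudaGraphExecUpdateResult { cudaGraphExecUpdateSuccess = 0 };
struct cudaGraphExecUpdateResultInfo {
  cudaGraphExecUpdateResult result;
  cudaGraphNode_t errorNode;
  cudaGraphNode_t errorFromNode;
};
inline cudaError_t cudaGraphExecUpdate(cudaGraphExec_t, cudaGraph_t,
                                       cudaGraphExecUpdateResultInfo* info) {
  *info = {cudaGraphExecUpdateSuccess, nullptr, nullptr};
  return cudaSuccess;
}
#endif

// Hypothetical helper (not part of MXNet): keeps the CUDA 11-style
// out-parameters so that existing call sites need only a rename.
inline cudaError_t UpdateGraphExec(cudaGraphExec_t exec, cudaGraph_t graph,
                                   cudaGraphNode_t* error_node,
                                   cudaGraphExecUpdateResult* update_result) {
#if CUDART_VERSION >= 12000
  // CUDA 12: both outputs are returned through one result-info struct.
  cudaGraphExecUpdateResultInfo info{};
  cudaError_t err = cudaGraphExecUpdate(exec, graph, &info);
  *error_node = info.errorNode;
  *update_result = info.result;
  return err;
#else
  // CUDA 11 and earlier: two separate out-parameters.
  return cudaGraphExecUpdate(exec, graph, error_node, update_result);
#endif
}
```

With a helper like this, the failing call on line 205 could become `UpdateGraphExec(graph_exec_.get(), graph_.get(), &error_node, &update_result);`, assuming `error_node` and `update_result` keep the types they have in the CUDA 11 code path.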
Environment
We recommend using our script for collecting the diagnostic information with the following command
curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3
Environment Information