Issues with new CUDA version #11

Open
TNemes-3141 opened this issue Jun 17, 2020 · 8 comments

@TNemes-3141

Hello,
First of all, many thanks for making these modifications; they fixed a lot of my problems with installing Torch. During the install, though, I came across an error that is likely due to the new version of CUDA. CUDA 11 came out recently, and when I tried to build with it, the following error appeared:

/home/tamas/torch/extra/cutorch/init.c: In function ‘cutorch_isManagedPtr’:
/home/tamas/torch/extra/cutorch/init.c:938:34: error: ‘struct cudaPointerAttributes’ has no member named ‘isManaged’
  938 |     lua_pushboolean(L, attributes.isManaged);
      |                                  ^
make[2]: *** [CMakeFiles/cutorch.dir/build.make:80: CMakeFiles/cutorch.dir/init.c.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:115: CMakeFiles/cutorch.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

Error: Build error: Failed building.

It seems that the isManaged attribute was deprecated and is no longer supported (see https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaPointerAttributes.html#structcudaPointerAttributes). Is there any chance you can fix this, or am I forced to switch to CUDA 10.1?
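Before deciding between patching and downgrading, it may help to confirm which toolkit the build actually picks up. The following is a minimal sketch, assuming the usual "release X.Y" wording in `nvcc --version` output; the helper names `cuda_major` and `needs_ismanaged_patch` are hypothetical, and the sample line stands in for a live nvcc call.

```shell
#!/bin/sh
# Print the CUDA major version found in `nvcc --version` output read from stdin.
# Assumes the output contains the usual "release X.Y" wording.
cuda_major() {
    sed -n 's/.*release \([0-9][0-9]*\)\.[0-9][0-9]*.*/\1/p' | head -n 1
}

# Hypothetical helper: succeeds (exit 0) for CUDA 11 and later, where
# struct cudaPointerAttributes no longer has the isManaged member.
needs_ismanaged_patch() {
    [ "$1" -ge 11 ]
}

# Demo on a sample line in the usual format instead of a live nvcc call.
sample='Cuda compilation tools, release 11.0, V11.0.194'
major=$(printf '%s\n' "$sample" | cuda_major)
if needs_ismanaged_patch "$major"; then
    echo "CUDA $major: init.c needs the isManaged change"
else
    echo "CUDA $major: init.c builds as-is"
fi
```

On a real machine the input would come from `nvcc --version | cuda_major`.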

@nagadomi
Owner

nagadomi commented Jun 17, 2020

I haven't tried building with CUDA 11 yet.
The error can probably be fixed with the following change. Also, the affected functions (cutorch.isManaged, cutorch.toCudaUVATensor and cutorch.toFloatUVATensor) are probably not called from any program.

diff --git a/init.c b/init.c
index 8b32a1a..a2307bb 100644
--- a/init.c
+++ b/init.c
@@ -935,7 +935,7 @@ static int cutorch_isManagedPtr(lua_State *L)
     lua_pushboolean(L, 0);
   } else {
     THCudaCheck(res);
-    lua_pushboolean(L, attributes.isManaged);
+    lua_pushboolean(L, attributes.type == cudaMemoryTypeManaged);
   }
   return 1;
 }
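For anyone who would rather script the one-line change above than edit init.c by hand, here is a minimal sketch. The function name `patch_is_managed` is hypothetical, and it assumes the only use of the removed field is written as `attributes.isManaged`, as in the diff. As far as I know the `type` field only exists from CUDA 10.0 onward, so a file patched this way would not build against older toolkits.

```shell
#!/bin/sh
# Rewrite the removed isManaged field access to the type-based check.
# $1: path to cutorch's init.c
patch_is_managed() {
    file=$1
    # Nothing to do if the file was already patched; this also keeps the
    # .bak copy from being overwritten with the patched version.
    grep -q 'attributes\.isManaged' "$file" || return 0
    # Keep a backup so the change can be reverted.
    cp "$file" "$file.bak" || return 1
    sed 's/attributes\.isManaged/attributes.type == cudaMemoryTypeManaged/' \
        "$file.bak" > "$file"
}
```

Usage would look like `patch_is_managed extra/cutorch/init.c`, run from the torch checkout; copying `init.c.bak` back over `init.c` reverts it.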

@TNemes-3141
Author

Hello,
I tried your solution and it seems to work; however, a new error showed up. I suppose it has nothing to do with the fix you provided and would have appeared anyway.

/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(95): error: identifier "cusparseScsrmm" is undefined

/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(194): error: identifier "cusparseScsrmm" is undefined

/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(97): error: identifier "cusparseDcsrmm" is undefined

/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(196): error: identifier "cusparseDcsrmm" is undefined

4 errors detected in the compilation of "/home/tamas/torch/extra/cunn/lib/THCUNN/SparseLinear.cu".
CMake Error at THCUNN_generated_SparseLinear.cu.o.cmake:267 (message):
  Error generating file
  /home/tamas/torch/extra/cunn/build/lib/THCUNN/CMakeFiles/THCUNN.dir//./THCUNN_generated_SparseLinear.cu.o


make[2]: *** [lib/THCUNN/CMakeFiles/THCUNN.dir/build.make:268: lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_SparseLinear.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:111: lib/THCUNN/CMakeFiles/THCUNN.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

Error: Build error: Failed building.

Any ideas of what I can do?

@JackGarbiec

NVIDIA removed those functions in the CUDA 11 release. The docs recommend a different function, but it takes different arguments and would require reworking the matrices in that file to make them fit. Does anyone know what this THCUNN lib even is?

@nagadomi
Owner

nagadomi commented Dec 15, 2020

It is the linear module for the sparse matrix format.
I haven't used sparse matrices in torch7, so I could fix it, but I'm not confident I could test it.
If you're not using it, I think the easiest solution is to remove it from the library.

@mw66

mw66 commented Oct 5, 2021

If you're not using it, I think the easiest solution is to remove it from the library.

How do I remove it? Do you mean removing the whole extra/cunn directory?

I tried just renaming these two files:

$ mv extra/cunn/lib/THCUNN/generic/SparseLinear.cu  extra/cunn/lib/THCUNN/generic/SparseLinear.cu.orig
$ mv extra/cunn/lib/THCUNN/SparseLinear.cu extra/cunn/lib/THCUNN/SparseLinear.cu.orig

Torch complains about some undefined symbols (e.g. THNN_CudaSparseLinear_updateOutput), but otherwise seems to work, as long as your code does not call any functions in these files.
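The undefined-symbol complaint suggests that something else in the tree still refers to the functions from the renamed files. Here is a minimal sketch for locating those references; the function name `list_sparse_linear_refs` is hypothetical, and it assumes a grep that supports `-r`.

```shell
#!/bin/sh
# List files under a source tree whose contents still mention SparseLinear,
# skipping the renamed *.orig copies.
# $1: root of the cunn checkout (extra/cunn in this thread)
list_sparse_linear_refs() {
    grep -rl 'SparseLinear' "$1" | grep -v '\.orig$' | sort
}
```

Running `list_sparse_linear_refs extra/cunn` prints one path per line. It does not tell which of those references actually triggers the complaint.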

@nagadomi
Owner

nagadomi commented Oct 6, 2021

How to remove? you mean remove the whole dir extra/cunn ?

No, I mean removing only the CUDA functions related to sparse matrices. However, I have not tried it.
On Ubuntu 21.04, qt4 has also been removed and there is no PPA package.
I think it is better to use the Docker version (Ubuntu 18.04 and CUDA 10).

@bluedevils23

RTX 30 series cards are only supported by CUDA 11, so we currently cannot run Torch on the latest cards.

@kadok

kadok commented Jan 3, 2022

How to remove? you mean remove the whole dir extra/cunn ?

No, it only removes functions related to sparse matrix where CUDA is used. However, I have not tried it. On Ubuntu 21.04, qt4 is also removed and there is no ppa package. I think it is better to use the Docker version (Ubuntu 18.04 and CUDA 10).

Hello,

I followed this approach and everything works fine.

Just comment out these lines in SparseLinear.cu:
Line 94

/*#ifdef THC_REAL_IS_FLOAT
  cusparseScsrmm(cusparse_handle,
  #elif defined(THC_REAL_IS_DOUBLE)
  cusparseDcsrmm(cusparse_handle,
  #endif
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      batchnum, outDim, inDim, nnz,
      &one,
      descr,
      THCTensor_(data)(state, values),
      THCudaIntTensor_data(state, csrPtrs),
      THCudaIntTensor_data(state, colInds),
      THCTensor_(data)(state, weight), inDim,
      &one, THCTensor_(data)(state, buffer), batchnum
  );*/

Line 193

/*#ifdef THC_REAL_IS_FLOAT
  cusparseScsrmm(cusparse_handle,
  #elif defined(THC_REAL_IS_DOUBLE)
  cusparseDcsrmm(cusparse_handle,
  #endif
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      inDim, outDim, batchnum, nnz,
      &one,
      descr,
      THCTensor_(data)(state, values),
      THCudaIntTensor_data(state, colPtrs),
      THCudaIntTensor_data(state, rowInds),
      THCTensor_(data)(state, buf), batchnum,
      &one, THCTensor_(data)(state, gradWeight), inDim
  );*/
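The line numbers quoted above (94 and 193) may differ between checkouts. Here is a small sketch for locating the two call sites by name; the function name `find_csrmm_calls` is hypothetical.

```shell
#!/bin/sh
# Print "line:text" for every cusparseScsrmm / cusparseDcsrmm mention in a file.
# $1: path to the generic SparseLinear.cu
find_csrmm_calls() {
    grep -n 'cusparse[SD]csrmm' "$1"
}
```

For the tree in this thread that would be `find_csrmm_calls extra/cunn/lib/THCUNN/generic/SparseLinear.cu`. The match is purely textual, so calls that are already commented out are listed too.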
