Issues with new CUDA version #11

TNemes-3141 · 2020-06-17T17:03:03Z

Hello,
First of all, many thanks for making these modifications; they have fixed a whole lot of my problems with installing Torch so far! During installation, though, I ran into an error that is likely caused by the new version of CUDA. CUDA 11 was released recently, and when I tried to build with it, the following error appeared:

/home/tamas/torch/extra/cutorch/init.c: In function ‘cutorch_isManagedPtr’:
/home/tamas/torch/extra/cutorch/init.c:938:34: error: ‘struct cudaPointerAttributes’ has no member named ‘isManaged’
  938 |     lua_pushboolean(L, attributes.isManaged);
      |                                  ^
make[2]: *** [CMakeFiles/cutorch.dir/build.make:80: CMakeFiles/cutorch.dir/init.c.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:115: CMakeFiles/cutorch.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

Error: Build error: Failed building.

It seems that the isManaged attribute is deprecated and no longer supported; the struct now reports the memory type through its type field instead (see https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaPointerAttributes.html#structcudaPointerAttributes). Is there a chance you can fix this, or will I have to switch back to CUDA 10.1?


nagadomi · 2020-06-17T23:55:45Z

I haven't tried building with CUDA 11 yet.
The error can probably be fixed with the following change. Also, these functions (cutorch.isManaged, cutorch.toCudaUVATensor and cutorch.toFloatUVATensor) are probably not called by any program.

diff --git a/init.c b/init.c
index 8b32a1a..a2307bb 100644
--- a/init.c
+++ b/init.c
@@ -935,7 +935,7 @@ static int cutorch_isManagedPtr(lua_State *L)
     lua_pushboolean(L, 0);
   } else {
     THCudaCheck(res);
-    lua_pushboolean(L, attributes.isManaged);
+    lua_pushboolean(L, attributes.type == cudaMemoryTypeManaged);
   }
   return 1;
 }

TNemes-3141 · 2020-06-18T13:05:00Z

Hello,
I tried your fix and it seems to work; however, a new error showed up. I suppose it is unrelated to your fix and would have appeared anyway.

/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(95): error: identifier "cusparseScsrmm" is undefined

/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(194): error: identifier "cusparseScsrmm" is undefined

/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(97): error: identifier "cusparseDcsrmm" is undefined

/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(196): error: identifier "cusparseDcsrmm" is undefined

4 errors detected in the compilation of "/home/tamas/torch/extra/cunn/lib/THCUNN/SparseLinear.cu".
CMake Error at THCUNN_generated_SparseLinear.cu.o.cmake:267 (message):
  Error generating file
  /home/tamas/torch/extra/cunn/build/lib/THCUNN/CMakeFiles/THCUNN.dir//./THCUNN_generated_SparseLinear.cu.o


make[2]: *** [lib/THCUNN/CMakeFiles/THCUNN.dir/build.make:268: lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_SparseLinear.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:111: lib/THCUNN/CMakeFiles/THCUNN.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

Error: Build error: Failed building.

Any ideas of what I can do?

JackGarbiec · 2020-12-14T19:47:39Z

NVIDIA deprecated those functions and removed them in the CUDA 11 release. The documentation recommends a different function (cusparseSpMM), but it takes different arguments, and switching to it would require reworking how the matrices in that file are set up. Does anyone know what this THCUNN library actually is?
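For anyone attempting the port, it helps to pin down what the removed routine computed. In the non-transposed case, cusparse<t>csrmm computes C = alpha * A * B + beta * C, where A is an m x k sparse matrix in CSR form and B (k x n) and C (m x n) are dense column-major matrices with leading dimensions ldb and ldc. The CPU sketch below reproduces that for float data so that a replacement (for example one built on cusparseSpMM) can be checked against it. It is a minimal sketch under stated assumptions: the name csrmm_ref is hypothetical, indices are assumed zero-based (the real call takes the index base from the matrix descriptor), and treating beta == 0 as "overwrite C" follows the usual BLAS convention.

```c
#include <stddef.h>

/* Hypothetical CPU reference for cusparse<t>csrmm with
 * CUSPARSE_OPERATION_NON_TRANSPOSE, float data only:
 *   C = alpha * A * B + beta * C
 * A: m x k sparse matrix in CSR form, zero-based indices (assumption)
 * B: k x n dense matrix, column-major, leading dimension ldb (>= k)
 * C: m x n dense matrix, column-major, leading dimension ldc (>= m) */
void csrmm_ref(int m, int n, int k, float alpha,
               const float *csr_val, const int *csr_row_ptr,
               const int *csr_col_ind,
               const float *B, int ldb, float beta,
               float *C, int ldc)
{
    (void)k; /* k only bounds the column indices stored in csr_col_ind */
    for (int j = 0; j < n; ++j) {          /* column of B and C */
        for (int i = 0; i < m; ++i) {      /* row of A and C */
            float acc = 0.0f;
            /* nonzeros of row i occupy csr_row_ptr[i] .. csr_row_ptr[i+1]-1 */
            for (int p = csr_row_ptr[i]; p < csr_row_ptr[i + 1]; ++p)
                acc += csr_val[p] *
                       B[(size_t)csr_col_ind[p] + (size_t)j * (size_t)ldb];
            size_t idx = (size_t)i + (size_t)j * (size_t)ldc;
            /* BLAS convention: beta == 0 means C is overwritten, not read */
            float prev = (beta == 0.0f) ? 0.0f : beta * C[idx];
            C[idx] = alpha * acc + prev;
        }
    }
}
```

Comparing this reference with the GPU result on a few small matrices is a cheap way to confirm that a rewritten call reproduces the original argument order and memory layouts.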

nagadomi · 2020-12-15T07:34:00Z

It is a linear module for input in a sparse matrix format.
I haven't used sparse matrices in torch7, so I could fix it, but I'm not confident I could test it.
If you're not using it, I think the easiest solution is to remove it from the library.
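The removed calls receive the sparse input in compressed sparse row (CSR) form: an array of nonzero values, their column indices, and a row-pointer array in which row r's nonzeros occupy positions row_ptr[r] through row_ptr[r+1] - 1 (values, csrPtrs and colInds in SparseLinear.cu appear to play these roles). Below is a minimal sketch of building that layout from a dense row-major matrix, assuming zero-based indices; the function name dense_to_csr is hypothetical and not part of THCUNN.

```c
#include <stddef.h>

/* Hypothetical helper: build a CSR representation (zero-based indices)
 * of a dense rows x cols matrix stored row-major.
 * The caller provides the output arrays:
 *   row_ptr:      rows + 1 entries
 *   col_ind, val: at least as many entries as there are nonzeros
 * Returns the number of nonzeros written. */
size_t dense_to_csr(const float *dense, size_t rows, size_t cols,
                    int *row_ptr, int *col_ind, float *val)
{
    size_t nnz = 0;
    for (size_t r = 0; r < rows; ++r) {
        row_ptr[r] = (int)nnz;              /* first nonzero of row r */
        for (size_t c = 0; c < cols; ++c) {
            float x = dense[r * cols + c];
            if (x != 0.0f) {                /* store only nonzero entries */
                col_ind[nnz] = (int)c;
                val[nnz] = x;
                ++nnz;
            }
        }
    }
    row_ptr[rows] = (int)nnz;               /* one past the last nonzero */
    return nnz;
}
```

For the 2 x 3 matrix [[1, 0, 2], [0, 3, 0]] this yields row_ptr = {0, 2, 3}, col_ind = {0, 2, 1} and val = {1, 2, 3}.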

mw66 · 2021-10-05T06:48:33Z

> If you're not using it, I think the easiest solution is to remove it from the library.

How do I remove it? Do you mean removing the whole extra/cunn directory?

I tried simply renaming these two files:

$ mv extra/cunn/lib/THCUNN/generic/SparseLinear.cu  extra/cunn/lib/THCUNN/generic/SparseLinear.cu.orig
$ mv extra/cunn/lib/THCUNN/SparseLinear.cu extra/cunn/lib/THCUNN/SparseLinear.cu.orig

Torch complains about some undefined symbols (e.g. THNN_CudaSparseLinear_updateOutput), but otherwise it seems to work, as long as your code does not call any functions from these files.

nagadomi · 2021-10-06T10:30:53Z

> How do I remove it? Do you mean removing the whole extra/cunn directory?

No, I meant removing only the sparse-matrix functions that use CUDA. However, I have not tried it.
Also, qt4 has been removed from Ubuntu 21.04 and there is no PPA package for it.
I think it is better to use the Docker version (Ubuntu 18.04 and CUDA 10).

bluedevils23 · 2021-11-06T07:48:29Z

RTX 30 series cards require CUDA 11 or later, so we currently cannot run Torch on the latest cards.

kadok · 2022-01-03T16:25:43Z

> How do I remove it? Do you mean removing the whole extra/cunn directory?

> No, I meant removing only the sparse-matrix functions that use CUDA. However, I have not tried it. Also, qt4 has been removed from Ubuntu 21.04 and there is no PPA package for it. I think it is better to use the Docker version (Ubuntu 18.04 and CUDA 10).

Hello,

I followed this approach and everything works fine.

Just comment out these lines in extra/cunn/lib/THCUNN/generic/SparseLinear.cu:
Line 94:

/*#ifdef THC_REAL_IS_FLOAT
  cusparseScsrmm(cusparse_handle,
  #elif defined(THC_REAL_IS_DOUBLE)
  cusparseDcsrmm(cusparse_handle,
  #endif
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      batchnum, outDim, inDim, nnz,
      &one,
      descr,
      THCTensor_(data)(state, values),
      THCudaIntTensor_data(state, csrPtrs),
      THCudaIntTensor_data(state, colInds),
      THCTensor_(data)(state, weight), inDim,
      &one, THCTensor_(data)(state, buffer), batchnum
  );*/

Line 193:

/*#ifdef THC_REAL_IS_FLOAT
  cusparseScsrmm(cusparse_handle,
  #elif defined(THC_REAL_IS_DOUBLE)
  cusparseDcsrmm(cusparse_handle,
  #endif
      CUSPARSE_OPERATION_NON_TRANSPOSE,
      inDim, outDim, batchnum, nnz,
      &one,
      descr,
      THCTensor_(data)(state, values),
      THCudaIntTensor_data(state, colPtrs),
      THCudaIntTensor_data(state, rowInds),
      THCTensor_(data)(state, buf), batchnum,
      &one, THCTensor_(data)(state, gradWeight), inDim
  );*/
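Note that commenting out these calls only makes the file compile: the GPU SparseLinear then skips the sparse products entirely, so its output and weight gradient will be wrong if anything actually uses the module. Reading the arguments according to the cuSPARSE definition of csrmm, the first call appears to accumulate input * weight^T into buffer, and the second appears to accumulate the weight gradient gradOutput^T * input into gradWeight (using the column-compressed, i.e. transposed, input, with buf holding the output gradient). The sketch below states those two products as plain CPU loops, which may help anyone who later ports them to cusparseSpMM. Assumptions: float data, zero-based CSR indices, row-major layouts (the CUDA code keeps buffer and gradWeight column-major), and hypothetical function names.

```c
#include <stddef.h>

/* Shapes (all row-major, zero-based CSR indices assumed):
 *   input:               batch x in_dim sparse matrix (row_ptr, col_ind, val)
 *   weight, grad_weight: out_dim x in_dim
 *   output, grad_output: batch x out_dim */

/* What the first removed call contributed: output += input * weight^T */
void sparse_linear_forward_ref(int batch, int in_dim, int out_dim,
                               const int *row_ptr, const int *col_ind,
                               const float *val, const float *weight,
                               float *output)
{
    for (int b = 0; b < batch; ++b)
        for (int p = row_ptr[b]; p < row_ptr[b + 1]; ++p) {
            int i = col_ind[p];               /* input feature index */
            for (int o = 0; o < out_dim; ++o)
                output[(size_t)b * out_dim + o] +=
                    val[p] * weight[(size_t)o * in_dim + i];
        }
}

/* What the second removed call contributed: grad_weight += grad_output^T * input */
void sparse_linear_grad_weight_ref(int batch, int in_dim, int out_dim,
                                   const int *row_ptr, const int *col_ind,
                                   const float *val, const float *grad_output,
                                   float *grad_weight)
{
    for (int b = 0; b < batch; ++b)
        for (int p = row_ptr[b]; p < row_ptr[b + 1]; ++p) {
            int i = col_ind[p];               /* input feature index */
            for (int o = 0; o < out_dim; ++o)
                grad_weight[(size_t)o * in_dim + i] +=
                    grad_output[(size_t)b * out_dim + o] * val[p];
        }
}
```

For example, with input [[1, 0, 2], [0, 3, 0]] and weight [[1, 2, 3], [4, 5, 6]], the forward sketch adds {7, 16, 6, 15} (row-major) to a zero-initialized output.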

nagadomi mentioned this issue Mar 26, 2021: Google Colab and CUDA 11.0 #14 (Open)

mw66 mentioned this issue Jan 14, 2022: Cutorch doesn't build with the newest cuda. (cudaPointerAttributes) torch/cutorch#848 (Open)

nagadomi mentioned this issue May 5, 2022: Install Fails when Installing Cuda Packages #16 (Closed)
