GCC/CUDA compatibility table #526

Open
emjotde opened this issue Oct 29, 2019 · 20 comments

@emjotde
Member

emjotde commented Oct 29, 2019

Putting this here into an issue for later. Should push this tomorrow to fix the Jenkins issues. Gawd, that was horrible :) Good FAQ item, though. BUG marks bugs in GCC, in CUDA, or in their combination, not in Marian, and these will not be fixed. A plain NO means CUDA does not support that particular compiler by design, which cannot be fixed either. For GCC 4.9 and lower, not supporting the compiler is our choice, as its standard library does not fully support C++11.

If you are in a "NO" cell, there is no way we can help you other than suggesting to upgrade/downgrade GCC and upgrade CUDA.

gcc\CUDA | CPU-only | 8.0    | 9.0    | 9.2    | 10.0 | 10.1 | 10.2 | comment
<5.0     | NO       | NO     | NO     | NO     | NO   | NO   | NO   | incompatible with c++11 features (codecvt)
5.4      | YES      | YES    | YES    | YES    | YES  | YES  | YES  | Any version of gcc 5 smaller than 5.4 should be fine too
5.5      | YES      | NO/BUG | NO/BUG | NO/BUG | YES  | YES  |      | bugs in GCC and CUDA<9.2
6.5      | YES      | NO     | NO/BUG | YES    | YES  | YES  |      | bug in GCC and CUDA 9.0
7.4      | YES      | NO     | NO     | YES    | YES  | YES  |      |
8.3      | YES      | NO     | NO     | NO     | NO   | YES  |      | warnings in 3rd-party code
9.2      | YES      | NO     | NO     | NO     | NO   | NO   | YES  | not yet supported by CUDA, warnings in 3rd-party code
  • TensorCore support for CUDA 9.0 and above (hardware dependent)
  • FP16 support (currently only inference) for CUDA 9.2 and above (hardware dependent)
  • CUDA 8.0 is no longer supported since Marian 1.9
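For scripted checks, for example in a CI job, the table above can be encoded as a lookup. This is a minimal sketch: the function name `marian_compat` is hypothetical, only the GCC versions listed in the table are known (anything else prints `unknown`), GCC 5.0 to 5.3 are mapped to the 5.4 row following its comment, and the CUDA 10.2 cells that were not filled in print `untested`.

```bash
#!/usr/bin/env bash
# Hypothetical helper encoding the GCC/CUDA compatibility table.
# Usage: marian_compat GCC_VERSION CUDA_VERSION   (CUDA_VERSION may be "cpu")
marian_compat() {
  local gcc row col
  gcc=$(echo "$1" | cut -d. -f1,2)   # "7.4.0" -> "7.4"

  # Row values in table order: CPU-only 8.0 9.0 9.2 10.0 10.1 10.2
  case "$gcc" in
    [0-4].*) row="NO NO NO NO NO NO NO" ;;          # gcc < 5.0
    5.[0-4]) row="YES YES YES YES YES YES YES" ;;   # 5.4 row, also 5.0-5.3
    5.5)     row="YES NO/BUG NO/BUG NO/BUG YES YES untested" ;;
    6.5)     row="YES NO NO/BUG YES YES YES untested" ;;
    7.4)     row="YES NO NO YES YES YES untested" ;;
    8.3)     row="YES NO NO NO NO YES untested" ;;
    9.2)     row="YES NO NO NO NO NO YES" ;;
    *)       echo unknown; return 0 ;;
  esac

  # Map the CUDA version to its column in the row above.
  case "$2" in
    cpu)  col=1 ;;
    8.0)  col=2 ;;
    9.0)  col=3 ;;
    9.2)  col=4 ;;
    10.0) col=5 ;;
    10.1) col=6 ;;
    10.2) col=7 ;;
    *)    echo unknown; return 0 ;;
  esac

  echo "$row" | cut -d' ' -f"$col"
}
```

For example, `marian_compat 6.5 9.0` prints `NO/BUG`. CUDA 11 and newer are not in the table, so they always print `unknown`.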
@emjotde
Member Author

emjotde commented Oct 29, 2019

On a happy note, current master no longer needs Boost, which made this more feasible, since we have now eliminated the added complexity from various Boost incompatibilities.

@XapaJIaMnu
Contributor

GCC 9.2 works with CUDA 10.2, tested on my machine.

Cheers,

Nick

@snukky
Member

snukky commented Mar 25, 2020

I edited the original post to add that CUDA 8 is no longer supported.

@emjotde emjotde pinned this issue Apr 8, 2020
@snukky
Member

snukky commented May 1, 2020

I added CUDA 10.2 to the table for the GCC versions I have had a chance to test.

@alvations
Collaborator

For the most part this works, but the cuSPARSE tensor operations fail to compile with the following setup:

  • Marian commit f496a42
  • gcc 9.3.0
  • nvcc 11.0.194
  • CUDA 11.0
[ 94%] Built target marian
Scanning dependencies of target marian_cuda
[ 95%] Building CXX object src/CMakeFiles/marian_cuda.dir/tensors/gpu/prod.cpp.o
/home/username/marian-dev/src/tensors/gpu/prod.cpp: In function ‘cusparseStatus_t marian::gpu::cusparseSgemmiEx(cusparseHandle_t, int, int, int, int, const float*, const float*, int, const float*, const int*, const int*, const float*, float*, int)’:
/home/username/marian-dev/src/tensors/gpu/prod.cpp:374:115: error: ‘cusparseStatus_t cusparseSgemmi(cusparseHandle_t, int, int, int, int, const float*, const float*, int, const float*, const int*, const int*, const float*, float*, int)’ is deprecated: please use cusparseSpMM instead [-Werror=deprecated-declaration]
  374 | handle, m, n1, k, nnz, alpha, A, lda, cscValB, cscColPtrB1, cscRowIndB, beta, C1, ldc);
      |                                                                                      ^

In file included from /home/username/marian-dev/src/tensors/gpu/prod.cpp:7:
/usr/local/cuda/include/cusparse.h:1458:1: note: declared here
 1458 | cusparseSgemmi(cusparseHandle_t handle,
      | ^~~~~~~~~~~~~~
/home/username/marian-dev/src/tensors/gpu/prod.cpp:374:115: error: ‘cusparseStatus_t cusparseSgemmi(cusparseHandle_t, int, int, int, int, const float*, const float*, int, const float*, const int*, const int*, const float*, float*, int)’ is deprecated: please use cusparseSpMM instead [-Werror=deprecated-declaration]
  374 | handle, m, n1, k, nnz, alpha, A, lda, cscValB, cscColPtrB1, cscRowIndB, beta, C1, ldc);
      |                                                                                      ^

In file included from /home/username/marian-dev/src/tensors/gpu/prod.cpp:7:
/usr/local/cuda/include/cusparse.h:1458:1: note: declared here
 1458 | cusparseSgemmi(cusparseHandle_t handle,
      | ^~~~~~~~~~~~~~
In file included from /home/username/marian-dev/src/tensors/gpu/backend.h:5,
                 from /home/username/marian-dev/src/tensors/gpu/prod.cpp:11:
/home/username/marian-dev/src/tensors/gpu/prod.cpp: In function ‘void marian::gpu::CSRProd(marian::Tensor, marian::Ptr<marian::Allocator>, const Tensor&, const Tensor&, const Tensor&, const Tensor&, bool, bool, float)’:
/home/username/marian-dev/src/tensors/gpu/prod.cpp:429:20: error: ‘cusparseScsr2csc’ was not declared in this scope; did you mean ‘cusparseScsr2csru’?
  429 |     CUSPARSE_CHECK(cusparseScsr2csc(cusparseHandle,
      |                    ^~~~~~~~~~~~~~~~
/home/username/marian-dev/src/tensors/gpu/cuda_helpers.h:39:26: note: in definition of macro ‘CUSPARSE_CHEC’
   39 |   cusparseStatus_t rc = (expr);                                                \
      |                          ^~~~
/home/username/marian-dev/src/tensors/gpu/prod.cpp:451:20: error: ‘cusparseScsrmm’ was not declared in this scope; did you mean ‘cusparseSbsrmm’?
  451 |     CUSPARSE_CHECK(cusparseScsrmm(cusparseHandle,
      |                    ^~~~~~~~~~~~~~
/home/username/marian-dev/src/tensors/gpu/cuda_helpers.h:39:26: note: in definition of macro ‘CUSPARSE_CHEC’
   39 |   cusparseStatus_t rc = (expr);                                                \
      |                          ^~~~
At global scope:
cc1plus: error: unrecognized command line option ‘-Wno-unknown-warning-option’ [-Werror]
cc1plus: all warnings being treated as errors
make[2]: *** [src/CMakeFiles/marian_cuda.dir/build.make:133: src/CMakeFiles/marian_cuda.dir/tensors/gpu/prod.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:213: src/CMakeFiles/marian_cuda.dir/all] Error 2
make: *** [Makefile:152: all] Error 2

Note: nvcc also warns about the following, but I think it's not that critical:

nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
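The nvcc version listed in the setup above comes from `nvcc --version`, whose output contains a line like `Cuda compilation tools, release 11.0, V11.0.194`. A minimal sketch for extracting just the CUDA release when reporting a setup; the helper name is hypothetical and the parsing assumes that line format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# CUDA release ("11.0") from the "Cuda compilation tools, release X.Y, ..." line.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Usage: nvcc --version | nvcc_release
```

Reading from stdin rather than calling nvcc directly keeps the parsing testable on machines without CUDA installed.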

@emjotde
Member Author

emjotde commented Jul 20, 2020

We will take a look. I wonder whether we should make compilation for GPUs below Maxwell optional, or even below Pascal. It would make compilation quite a bit faster. Currently, this can be disabled manually.
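Making the old architectures optional comes down to choosing which `-gencode` targets nvcc receives. The sketch below uses a hypothetical helper, not Marian's CMake logic (the manual switches live in Marian's CMakeLists.txt and may look different), that keeps only the SM versions at or above a minimum, for example 50 for Maxwell or 60 for Pascal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print nvcc -gencode flags for each SM version given
# after the first argument, skipping those below the minimum ($1).
# Example minimums: 50 = Maxwell, 60 = Pascal.
gencode_flags_from() {
  local min_sm="$1" sm flags=""
  shift
  for sm in "$@"; do
    if [ "$sm" -ge "$min_sm" ]; then
      flags="$flags -gencode arch=compute_${sm},code=sm_${sm}"
    fi
  done
  printf '%s\n' "${flags# }"   # drop the leading space
}
```

For example, `gencode_flags_from 60 35 50 60 70 75` prints `-gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75`. Every skipped target is one device-code compilation nvcc no longer performs, which is where the speed-up would come from.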

@emjotde emjotde pinned this issue Jul 20, 2020
@XapaJIaMnu
Contributor

XapaJIaMnu commented Jul 29, 2020

The compilation issue is that the function cusparseScsr2csc is deprecated (with CUDA 11 it is no longer declared at all, hence the "not declared in this scope" error above). According to the NVIDIA manual, we should use cusparseCsr2cscEx2 instead, which has a slightly different interface. Reference: https://docs.nvidia.com/cuda/archive/10.1/pdf/CUSPARSE_Library.pdf

@emjotde, we should just generate the CUDA targets based on the CUDA version, e.g. if you are using CUDA 11, don't target old architectures.
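Before porting, it can help to check which of the legacy calls from the build log your toolkit still declares. A minimal sketch: the helper name is hypothetical, and the default header path /usr/local/cuda/include/cusparse.h is an assumption taken from the log, so pass your own path if CUDA lives elsewhere.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the legacy cuSPARSE functions from the
# build log are still declared in a cusparse.h (default path is an assumption).
legacy_cusparse_report() {
  local header="${1:-/usr/local/cuda/include/cusparse.h}" fn
  if [ ! -r "$header" ]; then
    echo "cannot read $header" >&2
    return 1
  fi
  for fn in cusparseScsr2csc cusparseScsrmm cusparseSgemmi; do
    # -w: match whole identifiers only, so cusparseScsrmm2 does not count
    if grep -qw "$fn" "$header"; then
      echo "$fn: declared"
    else
      echo "$fn: missing"
    fi
  done
}
```

A function that is still declared may nevertheless be deprecated, as cusparseSgemmi is in the CUDA 11.0 log, so `declared` does not mean it builds cleanly under -Werror.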

@emjotde
Member Author

emjotde commented Aug 5, 2020

@XapaJIaMnu there is really no correlation between the CUDA version and architectures, though. I can run CUDA 11 on Maxwell perfectly fine; I just cannot use some newer features, I think?

@XapaJIaMnu
Contributor

@emjotde as far as I understand, and reading this and this, the nvcc compiler drops old target architectures with new releases, making it impossible to compile code that will run on a compute_20 GPU with CUDA 11. So, when we use that CUDA version, we should drop some of the architectures from the flags.
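A sketch of the idea: keep a fixed candidate list of SM versions and drop those the installed CUDA release no longer supports, or does not support yet. The first and last supporting releases in the list are our reading of NVIDIA's release notes for CUDA 8 to 12 (an assumption worth re-checking for your toolkit), and the function names are hypothetical. This is consistent with the point that newer CUDA still runs on Maxwell: sm_50 stays in the list, with only a deprecation warning from CUDA 11 on.

```bash
#!/usr/bin/env bash
# Sketch: list SM targets a given CUDA release can still compile for.
# Each entry is sm:first:end, meaning the SM version is available from CUDA
# release "first" up to, but not including, release "end" (both encoded as
# MAJOR*100+MINOR). 9999 means no removal is recorded in this sketch.
# These release numbers are assumptions based on NVIDIA's notes for CUDA 8-12.
SM_TABLE="35:0:1200 50:0:9999 60:800:9999 70:900:9999 75:1000:9999 80:1100:9999 86:1101:9999"

cuda_version_num() {
  # "11.1" -> 1101, "9.2" -> 902, "12" -> 1200
  echo "$1" | awk -F. '{ printf "%d\n", $1 * 100 + $2 }'
}

sm_targets_for_cuda() {
  local v entry sm first end out=""
  v=$(cuda_version_num "$1")
  for entry in $SM_TABLE; do
    sm=${entry%%:*}
    first=$(echo "$entry" | cut -d: -f2)
    end=${entry##*:}
    if [ "$v" -ge "$first" ] && [ "$v" -lt "$end" ]; then
      out="$out $sm"
    fi
  done
  printf '%s\n' "${out# }"
}
```

For example, `sm_targets_for_cuda 11.0` prints `35 50 60 70 75 80`, while `sm_targets_for_cuda 12.0` drops 35 and adds 86.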

@emjotde
Member Author

emjotde commented Aug 6, 2020

compute_20 is not supported anyway; I think we start with compute_35. Need to check.

@david-waterworth

david-waterworth commented Sep 22, 2020

I'm trying to build on Ubuntu 20.04, CUDA 11.0, GCC 9.3.0

In addition to the CUSparse tensor errors mentioned by @alvations, and the deprecated targets, I'm getting:

/usr/local/cuda/include/thrust/detail/config/cpp_dialect.h:104:13: error: Thrust requires C++14. Please pass -std=c++14 to your compiler. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-Werror]
104 | THRUST_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

/usr/local/cuda/include/cub/util_cpp_dialect.cuh:129:13: error: CUB requires C++14. Please pass -std=c++14 to your compiler. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-Werror]
129 | CUB_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I updated lines 169 and 304 of CMakeLists.txt (-std=c++11; -> -std=c++14;), which seems to get past this error.

Also, removing -Werror didn't help with the cuSPARSE tensor deprecation warnings; I then started getting nvcc errors.
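Line numbers such as 169 and 304 only hold for one particular commit, so a pattern-based edit is less brittle. A minimal sketch with a hypothetical helper name: it rewrites every `-std=c++11;` in the given file to `-std=c++14;` and keeps a `.bak` copy of the original.

```bash
#!/usr/bin/env bash
# Hypothetical helper: switch "-std=c++11;" to "-std=c++14;" everywhere in a
# CMakeLists.txt, keeping the original as FILE.bak.
bump_cxx_standard() {
  local file="${1:-CMakeLists.txt}"
  cp "$file" "$file.bak" || return 1
  # In a basic regular expression "+" is a literal character.
  sed 's/-std=c++11;/-std=c++14;/g' "$file.bak" > "$file"
}
```

The error messages also name THRUST_IGNORE_DEPRECATED_CPP_DIALECT and CUB_IGNORE_DEPRECATED_CPP_DIALECT; defining those only silences the message instead of switching the build to C++14.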

@graemenail graemenail unpinned this issue Feb 17, 2021
@graemenail graemenail pinned this issue Feb 17, 2021
@kirianguiller

Hi everyone. Does anyone here know whether CUDA 11.0 (or higher?) is compatible with the latest version of Marian (1.10)? I have two RTX 3090s, but they require at least CUDA 11 :/.

Thanks !

@emjotde
Member Author

emjotde commented Apr 18, 2021

Hi, yes. We haven't updated the table yet, but it should just work, at least with newer Ubuntu versions such as 18.04 and higher.

@emjotde
Member Author

emjotde commented Apr 18, 2021

Do these GPUs have Ampere chips? If so, we are not using the newest features on them yet, since I haven't had a chance to test on those architectures, but that should arrive very soon. I now have access to a couple of shiny A100s.

@kirianguiller

Wow, that was a quick reply! Yes, the RTX 3090 is an Ampere card. Also, I am on Ubuntu 18.04.

So you think it should work, just not be optimized yet?

@emjotde
Member Author

emjotde commented Apr 18, 2021

Yes. Exactly.

@kirianguiller

OK! Thanks!

(I have been getting multiple errors while compiling Marian since this morning, probably because I keep switching between CUDA/cuDNN/GCC versions and it's creating a mess. If the errors persist, I may open an issue, if it's Marian-related.)

@alvations
Collaborator

alvations commented May 22, 2022

Any updates on support for this configuration?

  • Ubuntu 20.04.4 LTS
  • CUDA Version: 11.4
  • gcc 9.4.0

It seems cuBLASLt needs to be installed, but I'm not sure where to find it. I tried installing it with:

sudo apt -y install libcublaslt11

But cmake is still complaining =(


Also, CUDA is normally installed in /usr/local/ or /usr/local/cuda, but this instance comes with:

$ sudo find /usr/ -name 'libcuda*'
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcudart.so.11.0
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcudart.so.11.6.55
/usr/lib/x86_64-linux-gnu/libcuda.so.510.60.02
/usr/share/lintian/overrides/libcudart11.0
/usr/share/doc/libcudart11.0

Any clues where CUDA_TOOLKIT_ROOT_DIR should point?

BTW, this didn't work:

 cmake .. -DCUDA_TOOLKIT_ROOT_DIR=/usr/lib/x86_64-linux-gnu/

Note: This config above is the default config that comes with https://lambdalabs.com/service/gpu-cloud/pricing on machine with 1x A100
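One way to find a candidate value is to start from nvcc itself. As we understand it, CMake's FindCUDA falls back to the directory above the one that holds nvcc when CUDA_TOOLKIT_ROOT_DIR is not set, and a toolkit root is expected to contain bin/nvcc and include/, which a library directory such as /usr/lib/x86_64-linux-gnu/ does not. A minimal sketch with a hypothetical helper name; installer layouts differ, so treat the result as a hint.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess the CUDA toolkit root from nvcc's location.
# Usage: guess_cuda_root [PATH_TO_NVCC]   (defaults to nvcc found in PATH)
guess_cuda_root() {
  local nvcc="${1:-$(command -v nvcc)}"
  if [ -z "$nvcc" ]; then
    echo "nvcc not found in PATH" >&2
    return 1
  fi
  # /usr/local/cuda/bin/nvcc -> /usr/local/cuda ; /usr/bin/nvcc -> /usr
  dirname "$(dirname "$nvcc")"
}
```

Usage: `cmake .. -DCUDA_TOOLKIT_ROOT_DIR="$(guess_cuda_root)"`.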

@alvations
Collaborator

For anyone dealing with:

[ 28%] Building CXX object src/3rd_party/faiss/CMakeFiles/faiss.dir/VectorTransform.cpp.o
Generating rules                               > /home/paperspace/marian/build/local/obj/collectives/device/Makefile.rules
In file included from /usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:83,
                 from <command-line>:
/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
  138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
      |  ^~~~~

Try:

sudo apt install gcc-8 g++-8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80 --slave /usr/bin/g++ g++ /usr/bin/g++-8 --slave /usr/bin/gcov gcov /usr/bin/gcov-8

Then select the default (g++ and gcov switch along with gcc, since they were registered as its slaves):

sudo update-alternatives --config gcc

And check:

gcc --version
g++ --version
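The final check can also be scripted before running cmake. A minimal sketch with a hypothetical helper name; the maximum major version is passed in because it depends on the CUDA release, and 8 is the limit named in the error above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if gcc's major version is <= the given limit.
# Usage: gcc_within_limit MAX_MAJOR [VERSION]   (VERSION defaults to `gcc -dumpversion`)
gcc_within_limit() {
  local max_major="$1" version major
  version="${2:-$(gcc -dumpversion)}"
  major="${version%%.*}"   # "8.4.0" -> "8", "9" -> "9"
  if [ "$major" -le "$max_major" ]; then
    echo "gcc $version OK (limit: $max_major)"
  else
    echo "gcc $version is newer than $max_major; select an older gcc with update-alternatives" >&2
    return 1
  fi
}
```

Usage: `gcc_within_limit 8 && cmake ..`. Taking only the part before the first dot works whether `gcc -dumpversion` prints a full version or just the major number.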

@Jordan-Lambda

Jordan-Lambda commented May 23, 2022

Hi,

I'm a Linux Support Engineer for Lambda Cloud. I may be missing something important here, but I was able to successfully build marian (without testing the resulting executable yet) on a fresh 1x A6000 instance from Lambda Cloud by following the instructions at https://marian-nmt.github.io/docs/ . Since CUDA is installed via Lambda Stack, it's not installed in /usr/local/ like a "non-default" install would be, and so you shouldn't need to follow the "Non-default CUDA" instructions.

I did run into the problem of cmake not finding cuBLASLt when building a dynamically linked executable, but building a statically linked executable seems to work. Here's a script confirmed to work on a fresh 1x A6000 instance on May 23rd 2022. (We're pushing out a new VM image later today; I'll try to double-check that this works with the new image as well. If it doesn't, I'll try to post an updated version that does.)

#!/bin/bash

# Build marian following the instructions at https://marian-nmt.github.io/docs/
# Confirmed to work on a fresh 1x A6000 Lambda Cloud instance.

set -x # Show all commands as they are run

sudo apt update

sudo apt install git cmake build-essential libboost-system-dev libprotobuf17 protobuf-compiler libprotobuf-dev openssl libssl-dev libgoogle-perftools-dev

git clone https://github.com/marian-nmt/marian
mkdir marian/build
cd marian/build

# We shouldn't need to build libraries statically, but this is an example given
# in the instructions, and it works. Without -DUSE_STATIC_LIBS=on , build fails.
# If a statically linked executable is a problem, I'm happy to look into this
# further.
cmake .. -DUSE_STATIC_LIBS=on
make -j20
