Will not compile on GCC 11.1.0, CUDA 11.3 #34

Open
norabelrose opened this issue Feb 2, 2022 · 6 comments

Comments

@norabelrose

I've been trying to build this package for the last two nights to no avail. Every time I run python setup.py install, I get a big wall of compiler warnings indicating that it's not allowed to call __host__ functions from __host__ __device__ functions, followed by a few errors:

/home/nora/Code/cule/third_party/agency/agency/cuda/execution/execution_policy/grid_execution_policy.hpp:35:100:   required from here
/home/nora/Code/cule/third_party/agency/agency/detail/operator_traits.hpp:92:88: error: no match for ‘operator*’ (operand types are ‘agency::point<unsigned int, 2>’ and ‘unsigned int’)
   92 | struct has_operator_multiplies
      |                                                                                        ^                      
/home/nora/Code/cule/third_party/agency/agency/detail/tuple/arithmetic_tuple_facade.hpp:278:1: note: candidate: ‘template<class ArithmeticTuple, class> Derived agency::detail::arithmetic_tuple_facade<Derived>::operator*(const ArithmeticTuple&) const [with ArithmeticTuple = ArithmeticTuple; <template-parameter-2-2> = <template-parameter-1-2>; Derived = agency::point<unsigned int, 2>]’
  278 |   Derived operator*(const ArithmeticTuple& rhs) const
      | ^ ~~~~~~
/home/nora/Code/cule/third_party/agency/agency/detail/tuple/arithmetic_tuple_facade.hpp:278:1: note:   template argument deduction/substitution failed:
/home/nora/Code/cule/third_party/agency/agency/detail/tuple/arithmetic_tuple_facade.hpp:158:63: error: incomplete type ‘std::tuple_size<unsigned int>’ used in nested name specifier
  158 |              class = typename std::enable_if<
      |                                                               ^                                           
/home/nora/Code/cule/third_party/agency/agency/coordinate/point.hpp:197:1: note: candidate: ‘template<class T1, class T2, long unsigned int Rank> typename std::enable_if<(std::is_arithmetic<_Tp>::value && agency::detail::has_operator_multiplies<T1, T2>::value), agency::point<T, Rank> >::type agency::operator*(T1, const agency::point<T, Rank>&)’
  197 |   operator*(T1 val, const point<T2,Rank>& p)
      | ^ ~~~~~~
/home/nora/Code/cule/third_party/agency/agency/coordinate/point.hpp:197:1: note:   template argument deduction/substitution failed:
/home/nora/Code/cule/third_party/agency/agency/detail/operator_traits.hpp:92:88: note:   mismatched types ‘const agency::point<T, Rank>’ and ‘unsigned int’
   92 | struct has_operator_multiplies
      |                                                                                        ^                      

GCC indicates that this invalid template instantiation is required from torchcule/backend.cu:44:21, although the chain of dependencies linking that line of code to the final error is way too long and complex for me to understand. I've attached the entire stderr & stdout output from the compiler to this post. Any help toward solving this issue would be greatly appreciated.
compile-errors.txt

@ViktorM

ViktorM commented Mar 24, 2022

@sdalton1, @ifrosio do you have any solution to this issue? And, more generally, how can we run CuLE with the latest PyTorch?

@Rohan138

Rohan138 commented Jun 27, 2022

Hi there! I got it to work on my laptop (GTX 1650 Ti, CUDA 11.3, PyTorch 1.11.0) by fixing the following lines in setup.py:

codes = [arch[-2:] for arch in gpus]
arch_gencode = ['-arch=sm_' + codes[0]] + ['-gencode=arch=compute_{0},code=sm_{0}'.format(code) for code in codes]

You might also want to run it with python setup.py install --fastbuild to reduce the build time.
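To make the fix above concrete, here is a minimal sketch of how those two setup.py lines expand a list of detected GPU architecture strings into nvcc flags. The `gpus` value below is a stand-in for whatever setup.py detects on your machine (e.g. `'sm_75'` for a GTX 1650 Ti); only the last two characters of each entry are used as the compute capability.

```python
# Hypothetical stand-in for the GPU list that setup.py detects at build time.
gpus = ['sm_75']

# The two fixed lines from setup.py:
codes = [arch[-2:] for arch in gpus]
arch_gencode = ['-arch=sm_' + codes[0]] + [
    '-gencode=arch=compute_{0},code=sm_{0}'.format(code) for code in codes
]

print(arch_gencode)
# -> ['-arch=sm_75', '-gencode=arch=compute_75,code=sm_75']
```

These flags are then passed straight to nvcc, matching the `-arch=sm_70 -gencode=arch=compute_70,code=sm_70` pattern visible in the compile command quoted later in this thread.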

@ViktorM

ViktorM commented Aug 13, 2022

@ifrosio, @sdalton1 any updates on the issue? On how to build and run CuLE on Ampere GPUs?

@Denys88

Denys88 commented Aug 13, 2022

I got more errors:

/usr/local/cuda/bin/nvcc -I/home/denys/Documents/git/ml/cule -I/home/denys/Documents/git/ml/cule/third_party/agency -I/home/denys/Documents/git/ml/cule/third_party/pybind11/include -I/usr/local/cuda/include -I/home/denys/anaconda3/envs/rlgpu/include/python3.7m -c torchcule/backend.cu -o build/temp.linux-x86_64-cpython-37/torchcule/backend.o -arch=sm_70 -gencode=arch=compute_70,code=sm_70 -O3 -Xptxas=-v -Xcompiler=-Wall,-Wextra,-fPIC -allow-unsupported-compiler -ccbin=gcc
/usr/include/stdio.h(189): error: attribute "__malloc__" does not take arguments

/usr/include/stdio.h(201): error: attribute "__malloc__" does not take arguments

/usr/include/stdio.h(223): error: attribute "__malloc__" does not take arguments

@sdalton1
Contributor

I can't reproduce this error on my machine. I am compiling on Ubuntu 20.04.4 with torch 1.12.0, gcc 9.4.0, and the cule main branch. I also tried recompiling with an older version of the CUDA toolkit (11.3) from the dockerfile, but that worked on my machine as well. If anyone has a Dockerfile that reproduces the failure with the affected software configuration, that would help a lot.

@Denys88
Copy link

Denys88 commented Aug 13, 2022

Thanks @sdalton1. I just installed the latest Ubuntu (22.04, I think). It looks like the problem is related to the wrong GCC version.
I will try to solve it using this link: https://linuxconfig.org/how-to-switch-between-multiple-gcc-and-g-compiler-versions-on-ubuntu-20-04-lts-focal-fossa
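For reference, the approach in that link boils down to registering multiple GCC versions with `update-alternatives` and then selecting the older one. A rough sketch (assuming Ubuntu and that you want, say, gcc-10 as the host compiler — adjust the version to whatever your CUDA toolkit supports):

```shell
# Install an older compiler alongside the default one
sudo apt install gcc-10 g++-10

# Register both compilers as alternatives (higher priority = default)
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 10

# Interactively pick which gcc /usr/bin/gcc points to
sudo update-alternatives --config gcc
```

Alternatively, you can avoid changing the system default and point nvcc at the older compiler directly via its `-ccbin` flag (the compile command above already uses `-ccbin=gcc`), e.g. `-ccbin=gcc-10`.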
