An illegal memory access was encountered in Propagation.cu at line 1117 #53

Open
Eming404 opened this issue Jun 9, 2024 · 9 comments
@Eming404

Eming404 commented Jun 9, 2024

Thanks for your amazing work!
I am having some trouble with my environment.
There were no errors while installing the modules, but when I ran demo.sh on youtube01, the following error occurred and training was terminated:

Training progress:   3%|?                    | 1010/30000 [00:57<25:22, 19.04it/s, Loss=0.2464046]
an illegal memory access was encountered in Propagation.cu at line 1117

I then debugged the code and found that the error is raised at line 436 of utils/graphics_utils.py:

results = propagate(images, intrinsics, poses, depth, normal, depth_intervals, patch_size)
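A note on localizing this kind of failure: CUDA kernel launches are asynchronous, so the file and line in the message may be where the error was detected rather than the kernel that caused it. A common first step is to rerun with the `CUDA_LAUNCH_BLOCKING=1` environment variable so each launch is synchronized. The sketch below only builds such an environment and runs a command under it; the helper names are hypothetical, and the command to run (for example the demo script) is whatever you normally launch.

```python
import os
import subprocess


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # With this variable set, the CUDA runtime waits for each kernel to finish,
    # so an error is reported close to the launch that triggered it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_blocking(command):
    """Run `command` (a list of strings) with CUDA_LAUNCH_BLOCKING=1."""
    return subprocess.run(command, env=blocking_env(), check=False)
```

For example, `rerun_blocking(["bash", "demo.sh"])` would repeat the failing run, assuming demo.sh is in the current directory. NVIDIA's `compute-sanitizer` tool can narrow the fault down further.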

I suspect something was misconfigured when I built the "gaussianpro" module.

Here is my configuration.

I followed the README to install the modules:

conda env create --file environment.yml
conda activate gaussianpro
pip install ./submodules/Propagation

I modified submodules/Propagation/setup.py as below:

setup(
    name='gaussianpro',
    ext_modules=[
        CUDAExtension('gaussianpro',
            include_dirs=['/usr/local/include/opencv4', '/usr/local/cuda11.6/include', '.'],
            library_dirs=['/root/miniconda3/envs/gaussianpro/lib'],  
            libraries=['opencv_core', 'opencv_imgproc', 'opencv_highgui', 'opencv_imgcodecs'],  
            sources=[
                'PatchMatch.cpp', 
                'Propagation.cu',
                'pro.cpp'
            ],
            extra_compile_args={
                'cxx': ['-O3'],
                'nvcc': ['-O3',
                    '-gencode=arch=compute_80,code=sm_80',
                ]
            }),
    ],
    cmdclass={ 'build_ext' : BuildExtension }
)
  • My GPU is an A100, so I changed the GPU compute architecture to "80".
  • Unlike the path in your setup.py, there is no '/root/miniconda3/envs/gaussianpro/include/opencv4' in my environment, even though I have installed opencv-python with pip. So the OpenCV path in "include_dirs" points to my C++ OpenCV installation, where I can also find the "libraries" (e.g. libopencv_core).
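The architecture edit described above can be derived instead of typed by hand. The sketch below is a minimal helper (the name `gencode_flag` is hypothetical) that turns a compute capability such as (8, 0) into the nvcc flag used in setup.py. With PyTorch installed, `torch.cuda.get_device_capability()` returns such a (major, minor) tuple for the current GPU; that call is left out of the code so it runs without a GPU.

```python
def gencode_flag(major, minor):
    """Build the nvcc -gencode flag for one compute capability."""
    arch = f"{major}{minor}"
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"


# An A100 reports capability (8, 0); a Turing card such as the one implied
# by sm_75 reports (7, 5).
print(gencode_flag(8, 0))  # -gencode=arch=compute_80,code=sm_80
print(gencode_flag(7, 5))  # -gencode=arch=compute_75,code=sm_75
```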

Strangely, the build process could not find my OpenCV and torch libraries, so I added their paths in my ~/.bashrc:

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/root/miniconda3/envs/gaussianpro/lib/python3.7/site-packages/torch/lib:$LD_LIBRARY_PATH
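Note that `LD_LIBRARY_PATH` affects which shared libraries are loaded at run time, so it is worth checking which copy of each library those directories actually provide. The sketch below, using a hypothetical helper `find_shared_lib`, lists every matching `lib<name>.so*` file along a search path in order; seeing the same library in more than one directory is a hint that the extension may load a different build than the one it was compiled against.

```python
import glob
import os


def find_shared_lib(name, search_path=None):
    """List files matching lib<name>.so* in each directory of a search path.

    `search_path` is a colon-separated string; it defaults to LD_LIBRARY_PATH.
    Results keep the directory order, which is the order the loader searches.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        pattern = os.path.join(directory, f"lib{name}.so*")
        hits.extend(sorted(glob.glob(pattern)))
    return hits
```

For example, `find_shared_lib("opencv_core")` shows which OpenCV core libraries are reachable through the current `LD_LIBRARY_PATH`.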

After that, gaussianpro was installed successfully.

I think something must have gone wrong when I built gaussianpro, but I cannot fix it.
Any help on this would be greatly appreciated.

@flybiubiu

Same problem

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

import os.path as osp
ROOT = osp.dirname(osp.abspath(__file__))

setup(
    name='gaussianpro',
    ext_modules=[
        CUDAExtension('gaussianpro',
            include_dirs=['/home/ubuntu/anaconda3/envs/gaussianpro/include/opencv4', '/usr/local/cuda-11.3/include', '.'],
            library_dirs=['/home/ubuntu/anaconda3/envs/gaussianpro/lib'],
            libraries=['opencv_core', 'opencv_imgproc', 'opencv_highgui', 'opencv_imgcodecs'],
            sources=[
                'PatchMatch.cpp',
                'Propagation.cu',
                'pro.cpp'
            ],
            extra_compile_args={
                'cxx': ['-O3'],
                'nvcc': ['-O3',
                    '-gencode=arch=compute_75,code=sm_75',
                ]
            }),
    ],
    cmdclass={ 'build_ext' : BuildExtension }
)

ImportError: /home/ubuntu/anaconda3/envs/gaussianpro/lib/python3.7/site-packages/gaussianpro.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2cv3MatC1Ev
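To read an "undefined symbol" message like this one, it helps to decode the mangled name. The full job is done by the `c++filt` tool from binutils; the sketch below is only a partial decoder for the `_ZN...` nested-name prefix of the Itanium C++ ABI, with a hypothetical function name, and it does not handle templates or substitutions.

```python
def nested_name(symbol):
    """Split an Itanium-mangled `_ZN...` symbol into its name components.

    Returns (components, rest), where `rest` is the undecoded remainder.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not an Itanium nested name: " + symbol)
    i = 3
    parts = []
    # Each component is encoded as <decimal length><identifier>.
    while i < len(symbol) and symbol[i].isdigit():
        j = i
        while j < len(symbol) and symbol[j].isdigit():
            j += 1
        length = int(symbol[i:j])
        parts.append(symbol[j:j + length])
        i = j + length
    return parts, symbol[i:]


parts, rest = nested_name("_ZN2cv3MatC1Ev")
print("::".join(parts), rest)  # cv::Mat C1Ev
```

Here `C1` marks a complete-object constructor and `Ev` an empty parameter list, so the missing symbol is the default constructor `cv::Mat::Mat()`. That it cannot be resolved suggests the OpenCV core library found at import time does not match the headers used at build time, or is not found at all.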

@kcheng1021
Owner

Hi, the OpenCV library used in setup.py is the C++ version, so it can be installed with "conda install -c conda-forge opencv".
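After installing the C++ package this way, the include path can be derived from the active environment instead of being hard-coded per machine. The sketch below is an assumption about how one might do that, not the project's own setup.py: the helper `opencv_include_dir` is hypothetical, and it assumes the OpenCV 4 layout in which headers live under `<prefix>/include/opencv4/opencv2/`.

```python
import os
import sys


def opencv_include_dir(prefix=None):
    """Return <prefix>/include/opencv4 if OpenCV C++ headers exist there, else None.

    `prefix` defaults to the active conda environment, falling back to the
    prefix of the running Python interpreter.
    """
    prefix = prefix or os.environ.get("CONDA_PREFIX") or sys.prefix
    candidate = os.path.join(prefix, "include", "opencv4")
    # opencv2/core.hpp is part of every OpenCV 4 C++ installation.
    header = os.path.join(candidate, "opencv2", "core.hpp")
    return candidate if os.path.isfile(header) else None
```

A result of `None` means the environment has no C++ headers, which is the case when only the pip package opencv-python is installed.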

@nyy618

nyy618 commented Jun 27, 2024

@flybiubiu Has your problem been solved? I have run into the same problem. The old version works, but after updating to the 1.0 version I cannot run train.py, even though the conda environment was installed successfully.

@Rikiruno

Rikiruno commented Jun 27, 2024

I modified these lines according to my installation path and it works. It may help you.

include_dirs=['/data/kcheng/anaconda3/envs/procuda/include/opencv4', '/usr/local/cuda-11.7/include', '.'],

@nyy618

nyy618 commented Jun 28, 2024

Thank you for your reply. I have edited setup.py, but it still does not work:

include_dirs=['/usr/local/include/opencv4', '/usr/local/cuda-12.0/include', '.'],
library_dirs=['/home/lzr/anaconda3/envs/gaussianpro/lib']

The import "from gaussianpro import propagate" fails with:

ImportError: /home/lzr/anaconda3/envs/gaussianpro/lib/python3.7/site-packages/gaussianpro.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2cv5errorEiRKSsPKcS3_i

Is there something I missed?
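One detail in the symbol `_ZN2cv5errorEiRKSsPKcS3_i` may be relevant. The `Ss` component is the Itanium abbreviation for `std::string` as it was mangled before libstdc++'s C++11 ABI change; a library built with the newer ABI exports the same function under a name containing `__cxx11` instead. A possible explanation, not confirmed in this thread, is that the extension and the OpenCV under /usr/local were built with different `_GLIBCXX_USE_CXX11_ABI` settings. The sketch below is a rough substring heuristic with a hypothetical name, not a real demangler.

```python
def guess_string_abi(mangled):
    """Guess which libstdc++ string ABI a mangled symbol refers to.

    Heuristic only: it looks for the `__cxx11` namespace (new ABI) or the
    `Ss` abbreviation for std::string (old ABI) anywhere in the name.
    """
    if "__cxx11" in mangled:
        return "cxx11"
    if "Ss" in mangled:
        return "pre-cxx11"
    return "unknown"


print(guess_string_abi("_ZN2cv5errorEiRKSsPKcS3_i"))  # pre-cxx11
print(guess_string_abi("_ZN2cv3MatC1Ev"))             # unknown
```

On the PyTorch side, `torch.compiled_with_cxx11_abi()` reports which setting the installed build uses, and extensions compiled through `torch.utils.cpp_extension` follow that setting.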

@Rikiruno

After I modified these lines, I removed the build dir in ./submodules/Propagation and reinstalled this package. That's all I did.
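The cleanup step described here matters because a stale build directory can leave objects compiled against the old paths in place. The sketch below shows one way to automate it; the helper name is hypothetical, and it assumes the usual setuptools layout of a `build` directory plus an optional `*.egg-info` directory next to setup.py.

```python
import glob
import os
import shutil


def clean_build_artifacts(package_dir):
    """Delete the build/ and *.egg-info directories inside `package_dir`.

    Returns the sorted names of the directories that were removed.
    """
    targets = [os.path.join(package_dir, "build")]
    targets += glob.glob(os.path.join(package_dir, "*.egg-info"))
    removed = []
    for path in targets:
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(os.path.basename(path))
    return sorted(removed)
```

After `clean_build_artifacts("./submodules/Propagation")`, rerun `pip install ./submodules/Propagation` so that everything is recompiled with the edited setup.py.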

@nyy618

nyy618 commented Jul 2, 2024

I tried every solution I could find over the past three days, and in the end I had to fall back to the old version. I wonder whether the 1.0 version does not work on the RTX 4090; it seems that it may not support the sm_89 architecture (Ada Lovelace). Thank you for your help, though.
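Whether sm_89 is really the cause is not established in this thread, but one part can be checked: nvcc only accepts an architecture if the toolkit is new enough, and sm_89 first appeared in CUDA 11.8. The sketch below encodes the minimum toolkit versions for the architectures mentioned in this issue; the table and function names are hypothetical, and the table is deliberately limited to those four entries.

```python
# Minimum CUDA toolkit version whose nvcc can target each architecture.
MIN_CUDA_FOR_SM = {
    75: (10, 0),  # Turing
    80: (11, 0),  # Ampere, e.g. A100
    86: (11, 1),  # Ampere, e.g. RTX 30 series
    89: (11, 8),  # Ada Lovelace, e.g. RTX 4090
}


def nvcc_can_target(sm, cuda_version):
    """Return True if a toolkit of `cuda_version` (major, minor) supports sm_<sm>."""
    if sm not in MIN_CUDA_FOR_SM:
        raise KeyError(f"sm_{sm} is not in the table")
    return tuple(cuda_version) >= MIN_CUDA_FOR_SM[sm]


print(nvcc_can_target(89, (11, 6)))  # False
print(nvcc_can_target(89, (12, 0)))  # True
```

If the toolkit that actually compiles the extension is older than 11.8, a `-gencode=arch=compute_89,code=sm_89` flag cannot be used and an older architecture such as sm_86 would have to be targeted instead.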

@ParanoidPY

I ran into the same problem, and the suggestions above did not fix it. What is the "1.0 version" you mentioned above? Have you fixed the problem? @Rikiruno @nyy618

@nyy618

nyy618 commented Aug 30, 2024

@ParanoidPY The 1.0 version I mentioned is the default branch of the project. You can try the 'main' version without pybind. https://github.com/kcheng1021/GaussianPro/tree/main


6 participants