Error: ../Common/CUDA/TIGRE_common.cpp (14): CBCT:CUDA:Atb an illegal memory access was encountered #634

Open
ecoArcGaming opened this issue Jan 30, 2025 · 22 comments

Comments

@ecoArcGaming

Hi, I am using the latest TIGRE Python (finally installed after many struggles...), and when I tried to run some FDK reconstruction scripts, I encountered the following error:
./Common/CUDA/TIGRE_common.cpp (7): Main loop fail
../Common/CUDA/TIGRE_common.cpp (14): CBCT:CUDA:Atb an illegal memory access was encountered

No further detail was provided by the interpreter. I tried some of the fixes in issue #501, to no avail. I am running this on one of the four A6000 GPUs on a cluster; I'm not sure whether that is relevant. How can I resolve this error? Thanks.

@AnderBiguri
Member

Is this the first time this happens, or are you running some ML pipeline?

There seems to be an issue when running ML-type workloads (i.e. when calling Atb/FDK thousands of times), see #617.

@ecoArcGaming
Author

ecoArcGaming commented Jan 30, 2025

It is a pipeline but the very first call would terminate and return this error. I am not getting CUDA out of memory but CUDA illegal memory access. I'm not sure if they have the same cause.

I also tried different GPUs, and using all GPUs on a node instead of just one, but I still have the same issue.

@AnderBiguri
Member

@ecoArcGaming No, it should not be an out-of-memory issue; TIGRE copes well with GPUs that have less memory than the problem at hand requires.

So you tried it on various GPUs? Can you tell me which ones?
Also, which CUDA version are you using?

Just trying to pinpoint the error
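
The GPU model and driver version asked about here can be collected in one step. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; `parse_gpu_csv` and `query_gpus` are hypothetical helper names and not part of TIGRE.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse `name, driver_version` CSV rows into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name cannot break parsing.
        name, driver = line.rsplit(",", 1)
        gpus.append({"name": name.strip(), "driver": driver.strip()})
    return gpus


def query_gpus():
    """Ask nvidia-smi for the name and driver version of every GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_csv(out)
```

Calling `query_gpus()` on the affected machine returns one entry per GPU. The toolkit version inside a conda environment can be read separately with `conda list cudatoolkit`.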

@ecoArcGaming
Author

ecoArcGaming commented Jan 30, 2025

I tried an NVIDIA RTX 2080 Ti and an A6000. Here are a few package versions: cudatoolkit 11.6.2, cudnn 8.9.2.26. They are installed in my conda environment.

@AnderBiguri
Member

Hmm, I think I have access to an RTX 2080 Ti. I'll try to run it with CUDA 11.6.2 in conda and see what happens, but it's quite strange; there should not be any issue.

I tend to use a custom CUDA installation rather than the conda one, but this should not matter.

@ecoArcGaming
Author

ecoArcGaming commented Jan 30, 2025

Thanks. I tried a few things in the meantime: calling np.ascontiguousarray() on all my inputs, running the Python script with CUDA_LAUNCH_BLOCKING=1, and setting os.environ['LD_LIBRARY_PATH'] = '/usr/local/cuda/lib64'. I also tried running https://github.com/CERN/TIGRE/blob/master/Python/example.py (with the 2080 Ti and my conda env), and I got slightly different CUDA errors:
0: NVIDIA GeForce RTX 2080 Ti
1: NVIDIA GeForce RTX 2080 Ti
2: NVIDIA GeForce RTX 2080 Ti
3: NVIDIA GeForce RTX 2080 Ti
{'name': 'NVIDIA GeForce RTX 2080 Ti', 'devices': [0, 1, 2, 3]}
../Common/CUDA/TIGRE_common.cpp (7): Texture object creation fail
../Common/CUDA/TIGRE_common.cpp (14): Ax:Siddon_projection invalid argument
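
One caveat about the attempts listed above: on Linux the dynamic loader reads `LD_LIBRARY_PATH` when a process starts, so assigning it through `os.environ` inside an already running script only affects child processes. The following is a minimal sketch that launches a script in a fresh interpreter with the variables already set. `run_with_cuda_env` is a hypothetical helper, and the default library path is simply the one quoted above.

```python
import os
import subprocess
import sys


def run_with_cuda_env(args, lib_dir="/usr/local/cuda/lib64"):
    """Run `python <args>` in a child process with CUDA debug settings applied."""
    env = dict(os.environ)
    # Make kernel launches synchronous so errors surface at the failing call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # Prepend the CUDA library directory, keeping any existing entries.
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return subprocess.run(
        [sys.executable, *args], env=env, capture_output=True, text=True
    )
```

For example, `run_with_cuda_env(["example.py"])` returns a result whose `stdout` and `stderr` hold the script output, including any TIGRE error lines.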

@AnderBiguri
Member

Yes, the issue in both cases will be due to something going wrong during texture creation; it's just caught at different times. Can you post your script in minimal form (geometry, angles) so I can test it too?

@ecoArcGaming
Author

It is unfortunately a part of a larger project which I did not write. Would example.py not suffice for a minimal example for testing purposes?

@AnderBiguri
Member

The numerical values of the geometry/angles may matter. If you could share something like example.py but with the values you are using, that would help.

@ecoArcGaming
Author

ecoArcGaming commented Jan 30, 2025

Okay, I am calling algs.fdk(prjs, geo, angles). Here is my geometry:

-----
Geometry parameters
Distance from source to detector (DSD) = 7.944359081836327 mm
Distance from source to origin (DSO)= 2.6347865868263476 mm
-----
Detector parameters
Number of pixels (nDetector) = [768 972]
Size of each pixel (dDetector) = [0.00597206 0.00597206] mm
Total size of the detector (sDetector) = [4.58653892 5.80483832] mm
-----
Image parameters
Number of voxels (nVoxel) = [501 501 501]
Total size of the image (sVoxel) = [2. 2. 2.] mm
Size of each voxel (dVoxel) = [0.00399202 0.00399202 0.00399202] mm
-----
Offset correction parameters
Offset of image from origin (offOrigin) = [0. 0. 0.] mm
Offset of detector (offDetector) = [0. 0. 0.] mm
-----
Auxillary parameters
Samples per pixel of forward projection (accuracy) = 0.5

I have stored my projections and angles as two NumPy arrays in two .npy files. You can download them here:

https://drive.google.com/file/d/1JnJQAlgo9B9pvD7t8conP7V79zNpF2Vk/view?usp=sharing
https://drive.google.com/file/d/1hlmOr2snYEi1HsTCK7I75t-JYpfJh-Xm/view?usp=sharing
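
The printed geometry can be sanity-checked with a few lines of arithmetic before looking at the CUDA side. The following is a minimal sketch that uses only the numbers from the printout. `geometry_report` is a hypothetical helper, and the "source outside the volume" test assumes rotation about one volume axis, so the in-plane half-diagonal is the relevant radius. It checks internal consistency only and does not explain the error.

```python
import math

# Values copied from the geometry printout above (all lengths in mm).
DSD = 7.944359081836327
DSO = 2.6347865868263476
n_detector = (768, 972)
d_detector = (0.00597206, 0.00597206)
s_detector = (4.58653892, 5.80483832)
n_voxel = (501, 501, 501)
s_voxel = (2.0, 2.0, 2.0)
d_voxel = (0.00399202, 0.00399202, 0.00399202)


def geometry_report():
    """Return derived quantities and simple consistency flags."""
    return {
        # Voxel size should equal total size divided by voxel count.
        "voxel_ok": all(
            math.isclose(s / n, d, rel_tol=1e-5)
            for s, n, d in zip(s_voxel, n_voxel, d_voxel)
        ),
        # Detector size should equal pixel count times pixel size.
        "detector_ok": all(
            math.isclose(n * d, s, rel_tol=1e-5)
            for n, d, s in zip(n_detector, d_detector, s_detector)
        ),
        # Cone-beam magnification at the rotation axis.
        "magnification": DSD / DSO,
        # Assumption: the source orbits in a plane, so compare DSO
        # with half the in-plane diagonal of the volume.
        "source_outside_volume": DSO > math.hypot(s_voxel[0], s_voxel[1]) / 2,
    }
```

With these values both consistency flags come out true, the magnification is roughly 3.02, and the source stays outside the volume, so the geometry itself looks self-consistent.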

@AnderBiguri
Member

AnderBiguri commented Jan 30, 2025 via email

@ecoArcGaming
Author

You mean the [501, 501, 501] in geo? I changed it to [500, 500, 500] and still get the same error.

@musetee

musetee commented Feb 3, 2025

I guess it could be a problem with the multi-GPU setup. I can create projections with TIGRE on my own PC (Windows, Python 3.9.22, CUDA 11.8), but not on our server with 4 GPUs. It reported the same error:
../Common/CUDA/TIGRE_common.cpp (7): Texture object creation fail
../Common/CUDA/TIGRE_common.cpp (14): Ax:Siddon_projection invalid argument

@AnderBiguri
Member

@musetee Interesting. What about with some size that is divisible by 4, like 512^3?

@musetee

musetee commented Feb 3, 2025

Yes, I used this geometry from the r2_gaussian project: https://github.com/Ruyi-Zha/r2_gaussian/tree/main
Btw, I have also set it up on another server with almost the same environment (Windows, Python 3.9.22, CUDA 11.8, torch 1.12.1, MSVC 2019) but with two GPUs, and there it works with only this warning:
../Common/CUDA/TIGRE_common.cpp (18): Ax:Siddon_projection:GPUselect Detected one (or more) different GPUs.
This code is not smart enough to separate the memory GPU wise if they have different computational times o
ts.
First GPU parameters used. If the code errors you might need to change the way GPU selection is performed.

# Mode
mode: cone # X-ray source mode parallel/cone
filter: null

# System configuration
DSD: 7.0 # Distance Source Detector
DSO: 5.0 # Distance Source Origin

# Detector parameters
nDetector: # Number of pixels (Note: [v, u] not [u,v])
  - 512
  - 512
sDetector: # Size of image (not pixel)
  - 4.0
  - 4.0

# Image parameters
nVoxel: # Number of voxels [x, y, z]
  - 256
  - 256
  - 256
sVoxel: # size of volume (not voxel)
  - 2.0
  - 2.0
  - 2.0

# Offsets
offOrigin: # Offset of image from origin
  - 0 # x direction
  - 0 # y direction
  - 0 # z direction
offDetector: # Offset of Detector (only in two direction)
  - 0 # u direction
  - 0 # v direction

# Auxiliary
accuracy: 0.5 # Accuracy of FWD proj

# Angles
totalAngle: 360.0 # Total angle (degree)
startAngle: 0.0 # Start angle (degree)

# Noise
noise: true
possion_noise: 10000 # lambda for possion
gaussian_noise: # mean and std for gaussian
  - 0 # mean
  - 10 # std

@AnderBiguri
Member

@musetee So it also fails on 4 GPUs with 256^3, but works well on 2 GPUs?

This almost surely looks like an error in the logic for splitting the problem across 4 GPUs, but I don't seem to be able to reproduce it.
I'll keep trying; I will have access to another machine with 4 GPUs soon.
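
To illustrate why sizes that are or are not divisible by the GPU count are worth testing: when a volume is divided into slabs across GPUs, any remainder has to be assigned somewhere, and mistakes in that bookkeeping can lead to out-of-bounds accesses. The sketch below is a generic illustration of such a split and is not TIGRE's actual splitting code; `split_slices` is a hypothetical name.

```python
def split_slices(n_slices, n_gpus):
    """Return (start, stop) slice ranges, one per GPU, covering n_slices exactly."""
    base, extra = divmod(n_slices, n_gpus)
    ranges = []
    start = 0
    for gpu in range(n_gpus):
        # The first `extra` GPUs take one additional slice each.
        size = base + (1 if gpu < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

For 501 slices on 4 GPUs this gives `[(0, 126), (126, 251), (251, 376), (376, 501)]`, while 512 slices split evenly into four slabs of 128. Comparing a run at each kind of size is one way to probe whether remainder handling is involved.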

@AnderBiguri
Member

In the meantime, @ecoArcGaming, can you try limiting your run to 2 GPUs to see if that works? You can use the GPU selection API that TIGRE comes with to select just a couple.

@ecoArcGaming
Author

Sounds good. I will try that and let you know if it works.

@ecoArcGaming
Author

ecoArcGaming commented Feb 3, 2025

This did not work for me. I ran:

gpuids = gpu.getGpuIds('A6000')
gpuids.devices = [0]
print(gpuids) # {'name': 'A6000', 'devices': [0]}
vol = algs.fdk(projs, geo, angles, gpuid = gpuids)

Which still gives:

../Common/CUDA/TIGRE_common.cpp (7): Main loop fail
../Common/CUDA/TIGRE_common.cpp (14): CBCT:CUDA:Atb an illegal memory access was encountered
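
Two things may be worth double-checking in the snippet above. First, TIGRE's example.py passes the selection as `gpuids=` (plural); if `fdk` collects extra keywords without validating them, a `gpuid=` argument could be ignored silently. That is an assumption about the installed version, not something confirmed in this thread. Second, independently of TIGRE, CUDA can be told to expose only certain devices through the `CUDA_VISIBLE_DEVICES` environment variable, provided it is set before CUDA is initialized. The following is a minimal sketch; `restrict_visible_gpus` and `fdk_on_first_gpu` are hypothetical helpers.

```python
import os


def restrict_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA; call before importing tigre."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


def fdk_on_first_gpu(projs, geo, angles):
    """Run FDK on a single device (assumes TIGRE is installed)."""
    # Imported lazily so CUDA_VISIBLE_DEVICES is already set when CUDA starts.
    import tigre.algorithms as algs
    from tigre.utilities import gpu

    gpuids = gpu.getGpuIds(gpu.getGpuNames()[0])
    gpuids.devices = gpuids.devices[:1]  # keep only the first matching device
    # Keyword spelled `gpuids`, as in TIGRE's example.py (an assumption for other versions).
    return algs.fdk(projs, geo, angles, gpuids=gpuids)
```

A possible usage is `restrict_visible_gpus([0])` at the very top of the script, followed by `vol = fdk_on_first_gpu(projs, geo, angles)`. If the error persists with a single visible device, multi-GPU splitting is less likely to be the cause.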

@AnderBiguri
Member

@ecoArcGaming I wonder if this is an A6000 specific issue... I'll keep looking.

@musetee

musetee commented Feb 4, 2025

That makes sense!! We have 2 A6000s, 1 A4000 and 1 A5000 on that server (none of them work). The driver version is 555.85.
The other server, which works, has one Quadro P5000 and one TITAN Xp, and its driver version is 461.33.

@musetee

musetee commented Feb 4, 2025

My own PC has one 3090 Ti and it works fine.
