Error: ../Common/CUDA/TIGRE_common.cpp (14): CBCT:CUDA:Atb an illegal memory access was encountered #634
Comments
Is this the first time this has happened, or are you running some ML pipeline? There seems to be an issue when running ML-type workloads (i.e. when calling Atb/FDK thousands of times), see #617 |
It is a pipeline but the very first call would terminate and return this error. I am not getting I also tried different GPUs, and using all GPUs on a node instead of just 1, but still have the same issue. |
@ecoArcGaming No, it should not be an out-of-memory issue; TIGRE copes well with GPUs whose memory is smaller than the problem at hand. So you tried on various GPUs? Can you tell me which ones? Just trying to pinpoint the error. |
I tried Nvidia RTX 2080TI and A6000. Here are a few package versions: |
Hmm, I have access to an RTX 2080 Ti I think. I'll try to run it with CUDA 11.6.2 in conda and see what happens, but it's quite strange; there should not be any issue. I tend to use a custom CUDA installation rather than the conda one, but this should not matter. |
Thanks. I tried a few things in the meantime. Calling |
Yes, the issue in both cases will be due to something going wrong in texture creation; it's just caught at different times. Can you post your script in minimal form (geometry, angles) so I can test it too? |
It is unfortunately a part of a larger project which I did not write. Would |
The numerical value of the geometry/angles may be of importance, if you could share something like |
Just in case, try with an even number of pixels.
…On Thu, 30 Jan 2025, 18:06 Erik, ***@***.***> wrote:
Okay, I am calling algs.fdk(prjs, geo, angles). Here is my geometry:
`TIGRE parameters
Geometry parameters
Distance from source to detector (DSD) = 7.944359081836327 mm
Distance from source to origin (DSO)= 2.6347865868263476 mm
Detector parameters
Number of pixels (nDetector) = [768 972]
Size of each pixel (dDetector) = [0.00597206 0.00597206] mm
Total size of the detector (sDetector) = [4.58653892 5.80483832] mm
Image parameters
Number of voxels (nVoxel) = [501 501 501]
Total size of the image (sVoxel) = [2. 2. 2.] mm
Size of each voxel (dVoxel) = [0.00399202 0.00399202 0.00399202] mm
Offset correction parameters
Offset of image from origin (offOrigin) = [0. 0. 0.] mm
Offset of detector (offDetector) = [0. 0. 0.] mm
Auxillary parameters
Samples per pixel of forward projection (accuracy) = 0.5`
I have stored my proj and angles as two numpy arrays in two .npy files.
You can download them here:
https://drive.google.com/file/d/1JnJQAlgo9B9pvD7t8conP7V79zNpF2Vk/view?usp=sharing
https://drive.google.com/file/d/1hlmOr2snYEi1HsTCK7I75t-JYpfJh-Xm/view?usp=sharing
|
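Since the numerical values of the geometry may matter here, a quick sanity check of the printed numbers can rule out an inconsistent geometry before looking at the GPU code. The sketch below is plain Python and does not call TIGRE; the dictionary layout and the `geometry_mismatches` helper are hypothetical, and the values are copied from the printout quoted in this thread.

```python
import math

# Values copied from the geometry printout in this thread (units: mm).
geo = {
    "nDetector": [768, 972],
    "dDetector": [0.00597206, 0.00597206],
    "sDetector": [4.58653892, 5.80483832],
    "nVoxel": [501, 501, 501],
    "dVoxel": [0.00399202, 0.00399202, 0.00399202],
    "sVoxel": [2.0, 2.0, 2.0],
}

def geometry_mismatches(geo, rel_tol=1e-5):
    """Return the names of the size fields that disagree with count * spacing.

    The tolerance is loose on purpose because the printout is rounded.
    """
    mismatches = []
    for size, count, spacing in (("sDetector", "nDetector", "dDetector"),
                                 ("sVoxel", "nVoxel", "dVoxel")):
        for s, n, d in zip(geo[size], geo[count], geo[spacing]):
            if not math.isclose(s, n * d, rel_tol=rel_tol):
                mismatches.append(size)
                break
    return mismatches

mismatches = geometry_mismatches(geo)
```

An empty list means the detector and image sizes agree with the pixel and voxel counts to within print rounding, so any remaining suspicion falls on how the problem is handled on the GPU rather than on the geometry itself.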
You mean the [501, 501, 501] in geo? Changed it to [500, 500, 500] and still have the same error. |
I guess it could be a problem with the multi-GPU setting. I can create projections with TIGRE on my own PC (Windows, Python 3.9.22, CUDA 11.8) but not on our server with 4 GPUs. It reported the same error:
|
@musetee Interesting. What about with some size that is divisible by 4, like 512^3? |
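To try the "even" and "divisible by 4" suggestions without changing the physical extent of the volume, one option is to round the voxel count up and recompute the voxel size from the fixed total size. This is a minimal sketch in plain Python; `pad_to_multiple` and `voxel_size` are hypothetical helpers, not part of TIGRE, and the 2 mm volume size is taken from the geometry posted earlier in this thread.

```python
def pad_to_multiple(n_voxel, multiple):
    """Round each voxel count up to the nearest multiple of `multiple`."""
    return [((n + multiple - 1) // multiple) * multiple for n in n_voxel]

def voxel_size(s_voxel, n_voxel):
    """Voxel spacing that keeps the total image size unchanged."""
    return [s / n for s, n in zip(s_voxel, n_voxel)]

s_voxel = [2.0, 2.0, 2.0]                      # total image size in mm
n_voxel = pad_to_multiple([501, 501, 501], 4)  # -> [504, 504, 504]
d_voxel = voxel_size(s_voxel, n_voxel)         # new spacing, 2.0 / 504 mm
```

The resulting counts and spacing would then be assigned to the geometry's nVoxel and dVoxel fields before calling the reconstruction again.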
yes I used this geometry from the r2_gaussian project: https://github.com/Ruyi-Zha/r2_gaussian/tree/main
Mode
mode: cone # X-ray source mode parallel/cone
System configuration
DSD: 7.0 # Distance Source Detector
Detector parameters
nDetector: # Number of pixels (Note: [v, u] not [u,v])
Image parameters
nVoxel: # Number of voxels [x, y, z]
Offsets
offOrigin: # Offset of image from origin
Auxiliary
accuracy: 0.5 # Accuracy of FWD proj
Angles
totalAngle: 360.0 # Total angle (degree)
Noise
noise: true
|
@musetee So it also fails on 4 GPUs with 156^3, but works well on 2 GPUs? Indeed, this almost surely looks like an error in the logic for splitting the problem across 4 GPUs, but I don't seem to be able to reproduce it. |
In the meantime, @ecoArcGaming, can you try limiting your use to 2 GPUs to see if that works? You can use the GPU selection API that TIGRE comes with to select just a couple. |
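Independently of TIGRE's own selection API, the CUDA runtime honors the `CUDA_VISIBLE_DEVICES` environment variable, which hides all other devices from the process. The sketch below shows that alternative, not the TIGRE API mentioned above; `restrict_visible_gpus` is a hypothetical helper, and the indices `[0, 1]` are an assumption about which two cards to keep.

```python
import os

def restrict_visible_gpus(device_indices):
    """Expose only the given physical GPU indices to CUDA in this process."""
    value = ",".join(str(i) for i in device_indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value

# This must run before anything initializes CUDA in the process,
# so call it before importing tigre or any other GPU library.
visible = restrict_visible_gpus([0, 1])

# import tigre  # import only after the variable has been set
```

Inside the process the remaining devices are renumbered from 0, so any device index passed on afterwards refers to the restricted list rather than to the physical numbering.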
Sounds good. I will try that and let you know if it works. |
This did not work for me. I did
Which still gives:
|
@ecoArcGaming I wonder if this is an A6000 specific issue... I'll keep looking. |
That makes sense!! We have 2 A6000s, 1 A4000 and 1 A5000 on that server (None of them works). Driver version is 555.85 |
my own PC has one 3090ti and it works fine |
Hi, I am using the latest TIGRE Python (finally installed after many struggles...), and when I tried to run some FDK reconstruction scripts, I encountered the following error:
./Common/CUDA/TIGRE_common.cpp (7): Main loop fail
../Common/CUDA/TIGRE_common.cpp (14): CBCT:CUDA:Atb an illegal memory access was encountered
No further detail was provided by the interpreter. I tried some of the fixes in this issue: #501 to no avail. I am running this on 1 of the 4 A6000 GPUs on a cluster, not sure if that is relevant. How can I resolve this error? Thanks.