
Artifacts in 3D-TV Denoised Images #594

Closed · zezisme opened this issue Oct 17, 2024 · 30 comments · Fixed by #609

Comments

@zezisme commented Oct 17, 2024

Hello, I found that when using im3DDenoise to denoise 3D images, as the number of iterations increases, the edge slices develop image artifacts, and these artifacts seem to come from other slices. Is there any reference document for this algorithm? Could you provide the corresponding mathematical derivation? I want to know whether the issue comes from the mathematical principle or from its implementation.

Actual Behavior

TV_lambda = 200;

FDK: [image]
FDK+TVdenoise (50 iterations): [image]
FDK+TVdenoise (100 iterations): [image]
FDK+TVdenoise (200 iterations): [image]

The artifact seems to come from the 267th slice (or near it; the total number of slices is 400), which suggests a ratio of about 1/3 ((400-267)/400 ≈ 1/3). [image]

Code to reproduce the problem (If applicable)

x=FDK(rawdata_proj,geo,angles,'filter',filter_type);
% TV denoising
TV_niter = 200; % 50,100,200,..
TV_lambda = 200; % 10,20,.....200
x=im3DDenoise(x,'TV',TV_niter,TV_lambda,'gpuids',GpuIds());

Specifications

  • MATLAB/python version:
  • OS:
  • CUDA version:
@AnderBiguri (Member) commented

Interesting. Is the ghost you get the first slice? This was coded 10 years ago, so I don't exactly remember the implementation, but it could be caused by periodic boundary conditions.

@AnderBiguri (Member) commented

In any case, you can find the math and the relevant paper in my PhD thesis (on my GitHub profile), and the code is in Common/CUDA/tvdenoising.cu. It does not seem to have periodic boundary conditions. Can you identify which slice this ghost is coming from? Can you also tell me the size of the image and the number of GPUs (including models)? Can you reproduce this with a small image, on a single GPU, using TIGRE's default data?

@zezisme (Author) commented Oct 17, 2024

> In any case, you can find the math and the relevant paper in my PhD thesis […] Can you reproduce this with a small image, on a single GPU, using TIGRE's default data?

CASE 1: input image size = (1529, 1529, 400), and I have just one GPU.
The input image (slice 400, the last slice; no ghost): [image]
Using the default settings (iter = 50; hyper = 15; single GPU), the output denoised images are:
The 400th slice (there is an obvious ghost): [image]
The 399th slice: [image]
The 398th slice: [image]
The ghost becomes weaker from slice to slice, and it seems to come from the 267th slice: [image]

CASE 2: input image size = (1529, 1529, 100), using the default settings (iter = 50; hyper = 15; single GPU).
The output image (last slice) has no ghost (great!): [image]

CASE 3: input image size = (1529, 1529, 200), using the default settings (iter = 50; hyper = 15; single GPU).
The output image (last slice) has a ghost again: [image]
And the ghost seems to come from the 100th slice: [image]

Final test: I find that if the GPU memory is not enough, there is a warning, the algorithm falls back to staging through CPU memory, and the ghost appears! If the GPU memory is enough, the ghost does not occur. [image] [image]

Conclusion:
This bug is most likely caused by a problem with the data handling during communication between the CPU and GPU.

@AnderBiguri (Member) commented

Fantastic experiment; indeed, this is why I asked about GPU size etc. It seems that something is broken in the memory in/out path, where either the memory doesn't get appropriately reset or the chunk of memory being copied is wrong. I will investigate.

@zezisme (Author) commented Oct 17, 2024

> Fantastic experiment; indeed, this is why I asked about GPU size etc. […] I will investigate.

Great! Looking forward to your solution.

@AnderBiguri (Member) commented

A possibility:

https://github.com/CERN/TIGRE/blob/master/Common/CUDA/tvdenoising.cu#L378-L389

may need to be changed to:

for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    // New line:
    cudaMemsetAsync(d_src[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_src[dev]+offset_device[dev], src+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    // All these are async
    // New line:
    cudaMemsetAsync(d_u[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_u[dev]+offset_device[dev], d_src[dev]+offset_device[dev], bytes_device[dev]*sizeof(float), cudaMemcpyDeviceToDevice, stream[dev*nStream_device+1]);
    cudaMemsetAsync(d_px[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
    cudaMemsetAsync(d_py[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
    cudaMemsetAsync(d_pz[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
}

I cannot try it now, but if you want to try that, great!
If not, give me a bit of time and I'll give it a shot myself.

@zezisme (Author) commented Oct 18, 2024

> A possibility: […] If not, give me a bit of time and I'll give it a shot myself.

I have tested your new code, but the result is the same as before: the same ghost is still there.

@AnderBiguri (Member) commented

Okay, I will investigate further, but this may be hard to fix: in all my years I have never seen this happen, which means it's a relatively niche case, and I am not sure I will be able to reproduce it. I'll try!

@AnderBiguri (Member) commented

Can you tell me exactly which GPU you have? I can try to simulate your machine if I know exactly how much memory your GPU has.

@zezisme (Author) commented Oct 18, 2024

> Can you tell me exactly which GPU you have? […]

Here is my GPU information! Thank you very much; I hope you can fix this bug as soon as possible. In addition, I am also learning CUDA programming, so that I may be able to fix the problem myself. [image]

@zezisme (Author) commented Oct 19, 2024

> Okay, I will investigate further, but this may be hard to fix […] I'll try!

Hello @AnderBiguri:
I seem to have found the bugs. After reading the source code carefully, I found two problems:
(1) For iterations > 1, the device memory is not reset to zero, which matters most for the first and last chunks (because buffer_pixels exists, and the last chunk is not large enough to fill the whole allocation);
(2) After each split iteration sp, cudaMemcpyDeviceToHost updates h_u, h_px, h_py, and h_pz. The next split iteration (sp+1) then performs cudaMemcpyHostToDevice, but since the amount of copied data is

bytes_device[dev] = curr_pixels + buffer_pixels*!is_first_chunk + buffer_pixels*!is_last_chunk;

the host data updated by the previous split iteration is partially copied into iteration sp+1, causing data aliasing.
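To make problem (1) concrete, here is a minimal MATLAB sketch with purely hypothetical chunk sizes (the real split sizes are computed inside tvdenoising.cu):

% Sketch of problem (1), with assumed sizes: the device allocation is sized
% for a full chunk plus buffers, so a smaller last chunk leaves stale slices.
slices_per_chunk = 150;                           % assumed size of a full chunk
last_chunk       = 100;                           % assumed size of the last chunk
stale_slices     = slices_per_chunk - last_chunk  % 50 slices keep old data
% Unless the device memory is zeroed before the copy, those 50 slices still
% hold data from the previously processed chunk and reappear as the ghost.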

The first one is probably the main reason for the ghost, and I tried inserting new lines into the following block:

for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_u[dev]+offset_device[dev], h_u+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+2]);
    cudaMemcpyAsync(d_px[dev]+offset_device[dev], h_px+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+2]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+3]);
    cudaMemcpyAsync(d_py[dev]+offset_device[dev], h_py+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+3]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+4]);
    cudaMemcpyAsync(d_pz[dev]+offset_device[dev], h_pz+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+4]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaStreamSynchronize(stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_src[dev]+offset_device[dev], src+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}

The new code:

for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+1]);
    // New line:
    cudaMemsetAsync(d_u[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_u[dev]+offset_device[dev], h_u+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+2]);
    // New line:
    cudaMemsetAsync(d_px[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+2]);
    cudaMemcpyAsync(d_px[dev]+offset_device[dev], h_px+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+2]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+3]);
    // New line:
    cudaMemsetAsync(d_py[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+3]);
    cudaMemcpyAsync(d_py[dev]+offset_device[dev], h_py+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+3]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+4]);
    // New line:
    cudaMemsetAsync(d_pz[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+4]);
    cudaMemcpyAsync(d_pz[dev]+offset_device[dev], h_pz+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+4]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaStreamSynchronize(stream[dev*nStream_device+1]);
    // New line:
    cudaMemsetAsync(d_src[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_src[dev]+offset_device[dev], src+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}

It just adds a line that sets d_u, d_px, d_py, d_pz, and d_src to zero before the new values are copied in. And finally, this attempt works! There is no ghost, whatever parameters I use. [image]

Of course, there may be a more efficient solution, but I haven't tried one yet; and I haven't thought of a good solution for problem (2).

@zezisme
Copy link
Author

zezisme commented Oct 19, 2024

Here comes a new problem: for the first and last slices, since 0 is used as the buffer data, there is obvious noise.
The first and last slices (with noise): [image] [image]
A middle slice (smooth, without noise): [image]

Could we use the adjacent slices, flipped along the z-axis, as the buffer data, so that the gradient of the data stays smooth? I am not good at CUDA programming yet, so I don't know how to implement it.
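A minimal MATLAB sketch of the proposed buffer fill (the names are illustrative, not TIGRE's internal ones):

% Fill the z-buffers with mirrored copies of the edge slices instead of zeros,
% so the z-gradient at the ends of the volume stays smooth.
x = rand(16,16,400);                              % stand-in volume
buffer_length = 2;                                % assumed buffer width
head = flip(x(:,:,1:buffer_length), 3);           % mirrored leading slices
tail = flip(x(:,:,end-buffer_length+1:end), 3);   % mirrored trailing slices
xbuf = cat(3, head, x, tail);                     % chunk as the GPU would see it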

@zezisme (Author) commented Oct 20, 2024

I have tried using flipped copies of the first and last slices as the buffer data; here are the changes (note: multi-GPU is not considered here). The original initialization:

for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaMemcpyAsync(d_src[dev]+offset_device[dev], src+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    // All these are async
    cudaMemcpyAsync(d_u[dev]+offset_device[dev], d_src[dev]+offset_device[dev], bytes_device[dev]*sizeof(float), cudaMemcpyDeviceToDevice, stream[dev*nStream_device+1]);
    cudaMemsetAsync(d_px[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
    cudaMemsetAsync(d_py[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
    cudaMemsetAsync(d_pz[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
}

changed to:

for (dev = 0; dev < deviceCount; dev++){
                        cudaSetDevice(gpuids[dev]);
                        // New line: multi-GPU not handled
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_src[dev]+pixels_per_slice*j, src+pixels_per_slice*(buffer_length-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_src[dev]+bytes_device[dev]+pixels_per_slice*j, src+pixels_per_slice*(image_size[2]-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                            }
                        }
                        cudaMemcpyAsync(d_src[dev]+offset_device[dev], src+offset_host[dev]  , bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                    }
                    for (dev = 0; dev < deviceCount; dev++){
                        cudaSetDevice(gpuids[dev]);
                        // All these are async
                        // New line:
                        cudaMemcpyAsync(d_u[dev], d_src[dev], mem_img_each_GPU, cudaMemcpyDeviceToDevice,stream[dev*nStream_device+1]);
                        cudaMemsetAsync(d_px[dev], 0, mem_img_each_GPU,stream[dev*nStream_device]);
                        cudaMemsetAsync(d_py[dev], 0, mem_img_each_GPU,stream[dev*nStream_device]);
                        cudaMemsetAsync(d_pz[dev], 0, mem_img_each_GPU,stream[dev*nStream_device]);
                    }

And the original per-split copy:

for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_u[dev]+offset_device[dev], h_u+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+2]);
    cudaMemcpyAsync(d_px[dev]+offset_device[dev], h_px+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+2]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+3]);
    cudaMemcpyAsync(d_py[dev]+offset_device[dev], h_py+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+3]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+4]);
    cudaMemcpyAsync(d_pz[dev]+offset_device[dev], h_pz+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+4]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaStreamSynchronize(stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_src[dev]+offset_device[dev], src+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}

changed to:

for (dev = 0; dev < deviceCount; dev++){   
                        cudaSetDevice(gpuids[dev]);
                        cudaStreamSynchronize(stream[dev*nStream_device+1]);
                        // New line: multi-GPU not handled
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_u[dev]+pixels_per_slice*j, h_u+pixels_per_slice*(buffer_length-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_u[dev]+bytes_device[dev]+pixels_per_slice*j, h_u+pixels_per_slice*(image_size[2]-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                            }
                        }
                        //cudaMemsetAsync(d_u[dev], 0, mem_img_each_GPU,stream[dev*nStream_device+1]);//zhouenze
                        cudaMemcpyAsync(d_u [dev] +offset_device[dev], h_u +offset_host[dev],  bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                       
                    }

                    for (dev = 0; dev < deviceCount; dev++){   
                        cudaSetDevice(gpuids[dev]);
                        cudaStreamSynchronize(stream[dev*nStream_device+2]);
                        // New line:
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_px[dev]+pixels_per_slice*j, h_px+pixels_per_slice*(buffer_length-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+2]);
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_px[dev]+bytes_device[dev]+pixels_per_slice*j, h_px+pixels_per_slice*(image_size[2]-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+2]);
                            }
                        }
                        // cudaMemsetAsync(d_px[dev], 0, mem_img_each_GPU,stream[dev*nStream_device+2]);//zhouenze
                        cudaMemcpyAsync(d_px[dev]+offset_device[dev], h_px+offset_host[dev],  bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+2]);
                       
                    }
                    for (dev = 0; dev < deviceCount; dev++){   
                        cudaSetDevice(gpuids[dev]);
                        cudaStreamSynchronize(stream[dev*nStream_device+3]);
                        // New line:
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_py[dev]+pixels_per_slice*j, h_py+pixels_per_slice*(buffer_length-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+3]);
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_py[dev]+bytes_device[dev]+pixels_per_slice*j, h_py+pixels_per_slice*(image_size[2]-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+3]);
                            }
                        }
                        // cudaMemsetAsync(d_py[dev], 0, mem_img_each_GPU,stream[dev*nStream_device+3]);//zhouenze
                        cudaMemcpyAsync(d_py[dev] +offset_device[dev], h_py+offset_host[dev],  bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+3]);
                        
                    }
                    for (dev = 0; dev < deviceCount; dev++){   
                        cudaSetDevice(gpuids[dev]);
                        cudaStreamSynchronize(stream[dev*nStream_device+4]);
                        // New line:
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_pz[dev]+pixels_per_slice*j, h_pz+pixels_per_slice*(buffer_length-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+4]);
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_pz[dev]+bytes_device[dev]+pixels_per_slice*j, h_pz+pixels_per_slice*(image_size[2]-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+4]);
                            }
                        }
                        // cudaMemsetAsync(d_pz[dev], 0, mem_img_each_GPU,stream[dev*nStream_device+4]);//zhouenze
                        cudaMemcpyAsync(d_pz[dev] +offset_device[dev], h_pz+offset_host[dev],  bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+4]);
                        
                    } 
                    for (dev = 0; dev < deviceCount; dev++){  
                        cudaSetDevice(gpuids[dev]);
                        cudaStreamSynchronize(stream[dev*nStream_device+1]);
                        // New line:
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_src[dev]+pixels_per_slice*j, src+pixels_per_slice*(buffer_length-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_src[dev]+bytes_device[dev]+pixels_per_slice*j, src+pixels_per_slice*(image_size[2]-j-1), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                            }
                        }
                        //cudaMemsetAsync(d_src[dev], 0, mem_img_each_GPU,stream[dev*nStream_device+1]);//zhouenze
                        cudaMemcpyAsync(d_src[dev]+offset_device[dev], src +offset_host[dev],  bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                        

                    }

Here is the result:
Left: the initial denoised end slice (has a ghost). Middle: the denoised end slice using zeros as buffer data (no ghost, but severe noise). Right: the denoised end slice using flipped data as buffer data (no ghost, but a slight noise enhancement compared with the initial image; why?). Note that these methods differ only in the slices at the two ends; all other slices are identical. [image]

@AnderBiguri (Member) commented

Hi @zezisme, I am not sure I understood the second issue or these last images very well. Can you edit the text to explain them at a bit more length?

@zezisme (Author) commented Oct 21, 2024

> Hi @zezisme, I am not sure I understood the second issue or these last images very well. […]

This is the flow chart of the second problem, though I am not sure whether this problem has much impact on the denoising. [flow chart]

@AnderBiguri (Member) commented

Right, let me rephrase it to see if I fully understood:

The issue you are highlighting is that after the first split of the image is processed, the second one will use updated data as a buffer, rather than the original data.

Yes, you are correct then; this is what happens. Looking at my notes, I seem to have tested this when I wrote the multi-GPU code and decided that even if this is wrong, it has a limited impact on the denoising, so I was not going to do much about it. The alternative requires a full extra copy of the image in memory, which I decided was too costly for the little effect it had.

However, this should have no effect on the end slices, I think. I believe the end slices are a problem of the boundary condition. This code assumes Neumann boundary conditions (i.e. the gradient is zero in the slice direction at the first/last slice). This means that the gradient at the edges will always be smaller than everywhere else, and so will the denoising strength. Perhaps using periodic boundary conditions would fix the issue (if this is the issue). Critically, it would mean changing these lines to access the next (rather than the previous) pixel value in the else case:

https://github.com/CERN/TIGRE/blob/master/Common/CUDA/tvdenoising.cu#L70-L123
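For reference, a minimal MATLAB sketch of what the Neumann assumption means for the z-gradient (u is a stand-in volume, not TIGRE's kernel code):

% Neumann boundary in z: forward difference inside the volume, zero gradient
% at the last slice, which is why the edge slices are denoised less strongly.
u  = rand(8,8,8);                    % stand-in volume
dz = 1;
gz = zeros(size(u));
gz(:,:,1:end-1) = diff(u,1,3)/dz;    % u(:,:,z+1)-u(:,:,z) for interior slices
gz(:,:,end)     = 0;                 % boundary slice: gradient forced to zero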

@zezisme (Author) commented Oct 21, 2024

> Right, let me rephrase it to see if I fully understood: […] Critically, it would mean changing these lines to access the next (rather than the previous) pixel value in the else case: […]

Yes, I agree that the second problem is not the cause of the ghost! The main reason is the first problem mentioned before.
Here is the flow chart of the main problem. I have also tested the new code and have some results; you can have a look at the GitHub issue page (#594). [flow chart]

AnderBiguri added a commit that referenced this issue Oct 21, 2024
@AnderBiguri (Member) commented

I just updated TIGRE to solve the ghosting issue. Does the change (it's from your code) solve it?

@zezisme (Author) commented Oct 21, 2024

> I just updated TIGRE to solve the ghosting issue. Does the change (it's from your code) solve it?

And for the boundary buffer expansion method, I have a new method to solve the noise enhancement problem. Here is the new code (it just copies the boundary slices into the buffer, to avoid the negative z-axis gradient that mirrored slices cause):

for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+1]);
    // Sometimes the last chunk is smaller than the others; if we don't reset the memory to zero, residual ghost data stays in the variable
    cudaMemsetAsync(d_u[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_u[dev]+offset_device[dev], h_u+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+2]);
    // Sometimes the last chunk is smaller than the others; if we don't reset the memory to zero, residual ghost data stays in the variable
    cudaMemsetAsync(d_px[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_px[dev]+offset_device[dev], h_px+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+2]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+3]);
    // Sometimes the last chunk is smaller than the others; if we don't reset the memory to zero, residual ghost data stays in the variable
    cudaMemsetAsync(d_py[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_py[dev]+offset_device[dev], h_py+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+3]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+4]);
    // Sometimes the last chunk is smaller than the others; if we don't reset the memory to zero, residual ghost data stays in the variable
    cudaMemsetAsync(d_pz[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_pz[dev]+offset_device[dev], h_pz+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+4]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    cudaStreamSynchronize(stream[dev*nStream_device+1]);
    // Sometimes the last chunk is smaller than the others; if we don't reset the memory to zero, residual ghost data stays in the variable
    cudaMemsetAsync(d_src[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_src[dev]+offset_device[dev], src+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}

The following change may only work on a single GPU; if you use multiple GPUs, it may go wrong!

for (dev = 0; dev < deviceCount; dev++){   
                        cudaSetDevice(gpuids[dev]);
                        cudaStreamSynchronize(stream[dev*nStream_device+1]);
                        // New line:
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_u[dev]+pixels_per_slice*j, h_u+pixels_per_slice*j, pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_u[dev]+bytes_device[dev]+pixels_per_slice*j, h_u+pixels_per_slice*(image_size[2]-buffer_length+j), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                            }
                        }
                        cudaMemcpyAsync(d_u [dev] +offset_device[dev], h_u +offset_host[dev],  bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                       
                    }

                    for (dev = 0; dev < deviceCount; dev++){   
                        cudaSetDevice(gpuids[dev]);
                        cudaStreamSynchronize(stream[dev*nStream_device+2]);
                        // New line:
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_px[dev]+pixels_per_slice*j, h_px+pixels_per_slice*j, pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+2]);
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_px[dev]+bytes_device[dev]+pixels_per_slice*j, h_px+pixels_per_slice*(image_size[2]-buffer_length+j), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+2]);
                            }
                        }
                        cudaMemcpyAsync(d_px[dev]+offset_device[dev], h_px+offset_host[dev],  bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+2]);
                       
                    }
                    for (dev = 0; dev < deviceCount; dev++){   
                        cudaSetDevice(gpuids[dev]);
                        cudaStreamSynchronize(stream[dev*nStream_device+3]);
                        // New line:
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_py[dev]+pixels_per_slice*j, h_py+pixels_per_slice*j, pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+3]);
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_py[dev]+bytes_device[dev]+pixels_per_slice*j, h_py+pixels_per_slice*(image_size[2]-buffer_length+j), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+3]);
                            }
                        }
                        cudaMemcpyAsync(d_py[dev] +offset_device[dev], h_py+offset_host[dev],  bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+3]);
                        
                    }
                    for (dev = 0; dev < deviceCount; dev++){   
                        cudaSetDevice(gpuids[dev]);
                        cudaStreamSynchronize(stream[dev*nStream_device+4]);
                        // New line:
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_pz[dev]+pixels_per_slice*j, h_pz+pixels_per_slice*j, pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+4]);
                            
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_pz[dev]+bytes_device[dev]+pixels_per_slice*j, h_pz+pixels_per_slice*(image_size[2]-buffer_length+j), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+4]);
                            }
                        }
                        cudaMemcpyAsync(d_pz[dev] +offset_device[dev], h_pz+offset_host[dev],  bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+4]);
                        
                    } 
                    for (dev = 0; dev < deviceCount; dev++){  
                        cudaSetDevice(gpuids[dev]);
                        cudaStreamSynchronize(stream[dev*nStream_device+1]);
                        // New line:
                        if(is_first_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_src[dev]+pixels_per_slice*j, src+pixels_per_slice*j, pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                            }
                        }
                        if(is_last_chunk){
                            for (unsigned int j=0;j<buffer_length;j++){
                                cudaMemcpyAsync(d_src[dev]+bytes_device[dev]+pixels_per_slice*j, src+pixels_per_slice*(image_size[2]-buffer_length+j), pixels_per_slice*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                            }
                        }
                        cudaMemcpyAsync(d_src[dev]+offset_device[dev], src +offset_host[dev],  bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice,stream[dev*nStream_device+1]);
                        

                    }

@zezisme (Author) commented Oct 21, 2024

> I just updated TIGRE to solve the ghosting issue. Does the change (it's from your code) solve it?

I have read your modification, and it does indeed solve the ghost problem. However, the assumption that the boundary gradient is 0 (Neumann boundary conditions) makes the boundary slices noisier. But this may not be a very important problem.

@AnderBiguri (Member) commented

Hi @zezisme. I only added the memory-zeroing part, not the part that you mentioned works only for one GPU.

For now, you shared 3 pieces of code:

So, questions for you:

  1. Given just the first change, does this fix your ghosting?
  2. The second piece of code, as we discussed, would indeed fix the fact that different splits don't access exactly the same data. However, do you see a big difference if you don't do anything about it?
  3. Is the third piece of code needed?

@zezisme (Author) commented Oct 21, 2024

> Hi @zezisme. I only added the memory-zeroing part […] So, questions for you: […]

Answering the three questions:

  1. The first change does indeed solve the ghost problem!
  2. Sorry for my unclear expression. All the other code is intended to solve the boundary-slice noise problem (due to the Neumann boundary conditions), not problem 2 in #594 (comment). (I don't think that is an important issue either.)
  3. The third piece of code is not needed under Neumann boundary conditions.

@AnderBiguri (Member) commented

@zezisme Thanks a lot for the explanation!

I think the best way to tackle the fact that the edge slices have a different denoising strength is to change the math, not the memory.

Can you check whether changing the lines I linked before to the following fixes the problem? Essentially this changes the boundary conditions from Neumann to mirror.

   __device__ __inline__
            float divergence(const float* pz, const float* py, const float* px,
            long z, long y, long x, long depth, long rows, long cols,
            float dz, float dy, float dx)
    {
        long size2d = rows*cols;
        long idx = z * size2d + y * cols + x;
        float _div = 0.0f;
        
        if ( z - 1 >= 0 ) {
            _div += (pz[idx] - pz[(z-1)*size2d + y*cols + x]) / dz;
        } else {
            _div += (pz[idx]-pz[(z+1)*size2d + y*cols + x]) / dz;
        }
        
        if ( y - 1 >= 0 ) {
            _div += (py[idx] - py[z*size2d + (y-1)*cols + x]) / dy;
        } else {
            _div += (py[idx] - py[z*size2d + (y+1)*cols + x]) / dy;
        }
        
        if ( x - 1 >= 0 ) {
            _div += (px[idx] - px[z*size2d + y*cols + (x-1)]) / dx;
        } else {
            _div += (px[idx] - px[z*size2d + y*cols + (x+1)]) / dx;
        }
        
        return _div;
    }
    
    __device__ __inline__
            void gradient(const float* u, float* grad,
            long z, long y, long x,
            long depth, long rows, long cols,
            float dz, float dy, float dx)
    {
        long size2d = rows*cols;
        long idx = z * size2d + y * cols + x;
        
        float uidx = u[idx];
        
        if ( z + 1 < depth ) {
            grad[0] = (u[(z+1)*size2d + y*cols + x] - uidx) / dz;
        }else{
            grad[0] = (u[(z-1)*size2d + y*cols + x] - uidx) / dz;
        }
        
        if ( y + 1 < rows ) {
            grad[1] = (u[z*size2d + (y+1)*cols + x] - uidx) / dy;
        }else{
            grad[1] = (u[z*size2d + (y-1)*cols + x] - uidx) / dy;
        }
        
        if ( x + 1 < cols ) {
            grad[2] = (u[z*size2d + y*cols + (x+1)] - uidx) / dx;
        }else{
            grad[2] = (u[z*size2d + y*cols + (x-1)] - uidx) / dx;
        }
    }

@zezisme (Author) commented Oct 21, 2024

> @zezisme Thanks a lot for the explanation! […] Can you check whether changing the lines I linked before to the following fixes the problem? Essentially this changes the boundary conditions from Neumann to mirror. […]

Hello @AnderBiguri
This doesn't seem to work; there is still strong noise at the boundary slices. I guess it is because of the buffer: with buffer slices attached, the kernel never detects the actual boundary slice of the volume.

@AnderBiguri (Member) commented

Ah, you may be right... Let me think about it a little more. Indeed, I think we always call the kernels with a z size of size_slice+buffer*2, which perhaps we should not do for the edge chunks (first and last).

@zezisme (Author) commented Oct 21, 2024

> Ah, you may be right... Let me think about it a little more. […]

Yes. Do you have a better solution now? I have tried passing a parameter to the kernels to identify the real boundary slice, but I ran into a new problem, and I am a little confused now.

@AnderBiguri (Member) commented

@zezisme maybe your solution is the best one, but I need some time to test it and play with multi-GPU cases to ensure it's right.

An alternative for now is to add some slices to the input image before im3DDenoise() and remove them after the GPU call (a sketch follows below). Computationally more expensive, but it should do the trick if you need results soon. Otherwise, it's possible that what you are doing (filling the buffer with edge slices) is the right way to go.
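A minimal MATLAB sketch of that padding workaround (npad is an assumed padding width; TV_niter and TV_lambda are the variables from the report above):

% Pad the volume with copies of its edge slices, denoise, then crop.
npad = 4;                                          % assumed padding width
xp = cat(3, repmat(x(:,:,1),[1 1 npad]), x, repmat(x(:,:,end),[1 1 npad]));
xp = im3DDenoise(xp,'TV',TV_niter,TV_lambda,'gpuids',GpuIds());
x  = xp(:,:,npad+1:end-npad);                      % drop the padded slices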

@zezisme (Author) commented Oct 21, 2024

> @zezisme maybe your solution is the best one […] An alternative for now is to add some slices to the input image before im3DDenoise() and remove them after the GPU call. […]

OK, thank you very much! I will go with the solution of adding some slices to the image before TV denoising, and I look forward to your subsequent, more effective solutions!

@AnderBiguri (Member) commented

Hi @zezisme, I will be quite busy over the next few weeks, so if you want to try to produce a nice piece of code that is also multi-GPU compatible to fix this last error, feel free to give it a shot and make a Pull Request.

If not, no worries, I'll try to tackle this later on. Just that I won't have time for it in the next few weeks.

@AnderBiguri (Member) commented

I made a PR; can you test it, @zezisme, to see whether it fixes the issue? I don't have a computer with multiple GPUs and enough RAM to test it right now.
#609
