Merge pull request #21 from amine0110/dev

v0.0.10
amine0110 · Dec 30, 2023 · 1b4b369 · 1b4b369
2 parents cfc7179 + 548fba4
commit 1b4b369
Showing 16 changed files with 321 additions and 22 deletions.
diff --git a/README.md b/README.md
@@ -27,7 +27,7 @@ For more installation instructions, please see [this](https://github.com/amine01
 For detailed usage and examples, refer to the [tutorials](./tutorials) directory and [documentation](./docs).
 
 ## What is New?
-The latest release contains some new exciting features, if you want to know more about the new features you can read the [release features documentation](./docs/release_features.md).
+The latest release contains some exciting new features. To learn more about them, read the [release features documentation](https://github.com/amine0110/pycad/tree/main/docs/releases/v_0_0_10.md).
 
 ## Contributions & Support
 

diff --git a/docs/releases/v_0_0_10.md b/docs/releases/v_0_0_10.md
@@ -0,0 +1,25 @@
+# What is new in the release?
+In release `0.0.10` we worked mostly on fixing bugs and making the modules more robust. Some modules did not work properly in certain scenarios. For example, the DICOM windowing module could not read some DICOM files unless pydicom's `force` flag was enabled, and because PYCAD calls pydicom internally, users had no way to enable this flag when the issue came up.
+
+# Features
+- Merge the NIFTI segmentation.
+- New dataset added.
+- Feature upgrades.
+- Bug fixes.
+
+## MultiClassNiftiMerger
+In some projects, the dataset comes with multi-label annotations stored as a separate NIFTI file per label instead of multiple classes in one NIFTI file. This layout is fine when you only want to segment one of the labels, but it does not work for multi-class segmentation, so the dataset has to be restructured first. In that case, you can use the `MultiClassNiftiMerger` module to merge the NIFTI labels into one file.
+
+You can find the module under `pycad.datasets`.
+
+## Kidney Stone Dataset
+A new dataset has been added to our list of datasets. It consists of 2D images showing the kidney from CT scans, annotated with bounding boxes, so it can be used to train an object detection model such as YOLOv5. If you want to do segmentation, you can add the SAM model to predict the masks.
+
+## Features Upgrade
+Some modules needed improvement, so we added features to make the library more user friendly. We changed the way DICOM files are read: for some datasets, the files are hard to read with `pydicom.dcmread` directly, and a flag called `force` needs to be enabled. Because `dcmread` was called inside the PYCAD library, there was no way to enable this flag from PYCAD. It has now been added and can be enabled whenever necessary.
+
+The NIFTI to PNG module has also been updated. In some cases the code cannot read or convert certain NIFTI files, and in my experience I always needed the list of rejected cases returned by the module so that I could decide what to do with them (or delete them if necessary). In this release you can keep the list of rejected cases, and you can also delete them directly once the conversion completes. This is mostly useful when you convert volumes and segmentations at the same time: if there is an issue with some volumes (which is usually where it happens), the segmentations are still all converted, so you end up with extra PNG masks that have no matching images and need to be deleted. For more details, see `from pycad.converters import NiftiToPngConverter`.
+
+This release also adds a unit test for the `MultiClassNiftiMerger` module. Since the module is new and may have issues, the test lets us check it internally and can also help you validate your merge if needed.
+
+Other bug fixes and feature updates have been added to improve the performance of the library.
diff --git a/docs/release_features.md → docs/releases/v_0_0_9.md b/docs/release_features.md → docs/releases/v_0_0_9.md
diff --git a/pycad/__init__.py b/pycad/__init__.py
@@ -2,4 +2,4 @@
 # This file is part of the PYCAD library and is released under the MIT License:
 # https://github.com/amine0110/pycad/blob/main/LICENSE
 
-__version__ = "0.0.9"
+__version__ = "0.0.10"
diff --git a/pycad/converters/nifti_to_png.py b/pycad/converters/nifti_to_png.py
@@ -30,9 +30,10 @@ class NiftiToPngConverter:
     ```
     '''
 
-    def __init__(self, max_v=200, min_v=-200):
+    def __init__(self, max_v=None, min_v=None):
         self.max_v = max_v
         self.min_v = min_v
+        self.rejected_cases = []
 
     def prepare_image(self, image_data, data_type='vol'):
         '''
@@ -59,12 +60,12 @@ def convert_nifti_to_png(self, in_dir:str, out_dir:str, data_type:str):
         This function is to take one nifti file and then convert it into png series, it keeps the same casename and then adds _indexID.\n
         - `in_dir`: the path to one nifti file: nii | nii.gz\n
         - `out_dir`: the path to save the png series\n
-        - `data_type`: the type of the input nifti file, is it a volume or segmentation?
+        - `data_type`: the type of the input nifti file, is it a volume or segmentation? Expected values are 'seg' for segmentation or 'vol' for volume.
         '''
         try:
             new_img = sitk.ReadImage(in_dir)
             img_array = sitk.GetArrayFromImage(new_img)
-            case_name = os.path.basename(in_dir)[:-7]
+            case_name = os.path.basename(in_dir).split('.')[0]
 
             if not os.path.exists(out_dir):
                 os.makedirs(out_dir)
@@ -77,6 +78,7 @@ def convert_nifti_to_png(self, in_dir:str, out_dir:str, data_type:str):
                 img.save(f"{out_dir}/{case_name}_{str(i).zfill(4)}.png")
         except:
             print('Error with the file:', in_dir)
+            self.rejected_cases.append(os.path.basename(in_dir).split('.')[0])
 
     def convert_nifti_to_png_dir(self, in_dir:str, out_dir:str, data_type:str):
         '''
@@ -91,7 +93,7 @@ def convert_nifti_to_png_dir(self, in_dir:str, out_dir:str, data_type:str):
         for case in tqdm(cases_list):
             self.convert_nifti_to_png(case, out_dir, data_type)
 
-    def run(self, in_dir_vol:str = None, in_dir_seg:str = None, out_dir:str = None):
+    def run(self, in_dir_vol:str = None, in_dir_seg:str = None, out_dir:str = None, delete_none_converted=False):
         '''
         This function is the main function to call the conversion function for the volumes and segmentations.\n
         - `in_dir_vol`: path to the input dir containing the volume files (nifti)\n
@@ -106,3 +108,36 @@ def run(self, in_dir_vol:str = None, in_dir_seg:str = None, out_dir:str = None):
         if in_dir_seg:
             print("Converting segmentation files")
             self.convert_nifti_to_png_dir(in_dir_seg, out_dir + '/labels', 'seg') # convert the segmentation files
+
+        # Delete the non-converted files
+        if delete_none_converted:
+            self.delete_images_by_name(out_dir + '/labels', self.rejected_cases)
+            self.delete_images_by_name(out_dir + '/images', self.rejected_cases)
+            print('The rejected cases have been deleted.')
+
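+        # Show info (a folder is missing when only volumes or only segmentations were converted)
+        n_labels = len(os.listdir(out_dir + '/labels')) if os.path.isdir(out_dir + '/labels') else 0
+        n_images = len(os.listdir(out_dir + '/images')) if os.path.isdir(out_dir + '/images') else 0
+        print(f"INFO: the conversion is done with {n_labels} labels and {n_images} images.")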
+
+    def delete_images_by_name(self, folder_path, names_list):
+        """
+        Deletes images from a specified folder whose names contain any of the strings in the provided list.
+        
+        ### Params
+        - folder_path: Path to the folder containing the images.
+        - names_list: List of strings. Images containing any of these strings in their names will be deleted.
+        """
+        # Check if the folder exists
+        if not os.path.exists(folder_path):
+            print(f"Folder {folder_path} does not exist.")
+            return
+
+        # List of image extensions to consider
+        image_extensions = ['png', 'jpg', 'jpeg']
+
+        # Iterate over each name in the list
+        for name in names_list:
+            # Search for images that contain the specified name and have the defined extensions
+            for ext in image_extensions:
+                for filename in glob(os.path.join(folder_path, f'*{name}*.{ext}')):
+                    print(f"Deleting {filename}")
+                    os.remove(filename)
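
One thing to watch with the `*{name}*` pattern in `delete_images_by_name`: a rejected case named `case_1` would also match slices from `case_10`. Below is a minimal sketch of a stricter match, assuming the `{case_name}_{index}.png` naming produced by `convert_nifti_to_png`. `delete_case_images` is a hypothetical helper for illustration, not part of PYCAD.

```python
import os
import re
import tempfile


def delete_case_images(folder_path, names_list):
    """Delete PNG slices that belong exactly to the given case names.

    Hypothetical helper: assumes files are named `<case_name>_<index>.png`.
    """
    deleted = []
    for name in names_list:
        # Anchor the pattern so `case_1` does not match `case_10_0000.png`
        pattern = re.compile(rf"^{re.escape(name)}_\d+\.png$")
        for filename in sorted(os.listdir(folder_path)):
            if pattern.match(filename):
                os.remove(os.path.join(folder_path, filename))
                deleted.append(filename)
    return deleted


def demo():
    # Build a throwaway folder with slices from two similarly named cases
    with tempfile.TemporaryDirectory() as tmp:
        for fname in ("case_1_0000.png", "case_1_0001.png", "case_10_0000.png"):
            open(os.path.join(tmp, fname), "w").close()
        deleted = delete_case_images(tmp, ["case_1"])
        remaining = sorted(os.listdir(tmp))
    return deleted, remaining
```

With this pattern, only the two `case_1` slices are removed and the `case_10` slice is left in place.
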
diff --git a/pycad/datasets/__init__.py b/pycad/datasets/__init__.py
@@ -7,4 +7,5 @@
 from .png_to_txt_ml import PngToTxtConverterML
 from .data_splitter import DataSplitter
 from .yolo_dataset_yaml import YOLODatasetYaml
-from .monai_dataset_json import MONAIDatasetOrganizer
+from .monai_dataset_json import MONAIDatasetOrganizer
+from .nifti_merger import MultiClassNiftiMerger
diff --git a/pycad/datasets/detection/diverse/__init__.py b/pycad/datasets/detection/diverse/__init__.py
@@ -0,0 +1,6 @@
+# Copyright (c) 2023 PYCAD
+# This file is part of the PYCAD library and is released under the MIT License:
+# https://github.com/amine0110/pycad/blob/main/LICENSE
+
+
+from .kidney_stone_dataset import KidneyStoneDataset
diff --git a/pycad/datasets/detection/diverse/kidney_stone_dataset.py b/pycad/datasets/detection/diverse/kidney_stone_dataset.py
@@ -0,0 +1,81 @@
+# Copyright (c) 2023 PYCAD
+# This file is part of the PYCAD library and is released under the MIT License:
+# https://github.com/amine0110/pycad/blob/main/LICENSE
+
+
+import os
+import gdown
+import zipfile
+import requests
+
+class KidneyStoneDataset:
+    '''
+    This class is for the kidney stone detection dataset from Roboflow.
+    You can get more information about it using the `info()` function.
+
+    ### Example usage
+
+    ```Python
+    from pycad.datasets.detection.diverse import KidneyStoneDataset
+    
+    kidney_stone_dataset = KidneyStoneDataset()
+    kidney_stone_dataset.info()  # Print dataset information
+    kidney_stone_dataset.download('all')  # Download and extract subgroup all
+    ```
+    '''
+    def __init__(self, dataset_size=1300):
+        self.dataset_size = dataset_size
+        self.dataset_subgroups = {
+            'all': 'https://drive.google.com/uc?id=1bSOeebGa92qc42CiIqUcfZjOWC2rdlrE'
+        }
+        self.base_path = 'datasets/'
+
+    def info(self):
+        print(f"Kidney Stone Dataset from Roboflow dataset. This is a collection of 2D images with bounding boxes for the detection.")
+        print(f"Total Cases: {self.dataset_size}")
+        print(f"Subgroups: {list(self.dataset_subgroups.keys())}")
+        print("Source: https://universe.roboflow.com/selam-h8tid/kidney-stone-detection-fwubk/dataset/1")
+
+    def download(self, subgroup, path=None):
+        if subgroup not in self.dataset_subgroups:
+            print(f"No subgroup {subgroup} available.")
+            return
+
+        if subgroup.isdigit() and int(subgroup) > self.dataset_size:
+            print(f"Subgroup {subgroup} exceeds dataset size.")
+            return
+
+        download_url = self.dataset_subgroups[subgroup]
+        save_path = path if path else self.base_path
+        self._download_and_extract(download_url, save_path, subgroup)
+
+    def _download_and_extract(self, url, path, subgroup):
+        if not os.path.exists(path):
+            os.makedirs(path)
+
+        try:
+            file_path = os.path.join(path, f'kidney_stone{subgroup}.zip')
+            gdown.download(url, file_path, quiet=False)
+
+            # Check file size after download
+            if os.path.getsize(file_path) < 1024:  # Example size threshold (1KB)
+                print("Downloaded file is too small, might be an error.")
+                return
+
+            with zipfile.ZipFile(file_path, 'r') as zip_ref:
+                zip_ref.extractall(path)
+            print(f"Downloaded and extracted at {path}")
+
+            # Delete the zip file after extraction
+            os.remove(file_path)
+            print(f"Deleted zip file: {file_path}")
+
+        except requests.exceptions.RequestException as e:
+            print("Error in downloading the file: ", e)
+        except zipfile.BadZipFile:
+            print("Error in extracting the file: File may be corrupted or not a zip file.")
+        except Exception as e:
+            print("An unexpected error occurred: ", e)
+            if os.path.exists(file_path):
+                os.remove(file_path)
+                print(f"Deleted incomplete zip file: {file_path}")
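
As written, `_download_and_extract` only deletes the archive in the generic `Exception` branch, so a file that is too small or raises `BadZipFile` stays on disk. If that is not intended, a `try`/`finally` can make the cleanup unconditional. This is a sketch using only the standard library. `extract_and_cleanup` is a hypothetical name, not part of PYCAD.

```python
import os
import tempfile
import zipfile


def extract_and_cleanup(zip_path, dest_dir):
    """Extract `zip_path` into `dest_dir` and always remove the archive afterwards.

    Returns True on success, False if the file is not a valid zip.
    """
    try:
        if not zipfile.is_zipfile(zip_path):
            return False
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            zip_ref.extractall(dest_dir)
        return True
    finally:
        # Runs on success, on early return, and on any exception
        if os.path.exists(zip_path):
            os.remove(zip_path)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.zip")
        with zipfile.ZipFile(good, "w") as zf:
            zf.writestr("images/sample.txt", "ok")
        # Simulate a failed download that saved an HTML error page
        bad = os.path.join(tmp, "bad.zip")
        with open(bad, "wb") as f:
            f.write(b"<html>quota exceeded</html>")

        ok_good = extract_and_cleanup(good, tmp)
        ok_bad = extract_and_cleanup(bad, tmp)
        extracted = os.path.exists(os.path.join(tmp, "images", "sample.txt"))
        leftovers = [p for p in (good, bad) if os.path.exists(p)]
    return ok_good, ok_bad, extracted, leftovers
```
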
diff --git a/pycad/datasets/nifti_merger.py b/pycad/datasets/nifti_merger.py
@@ -0,0 +1,100 @@
+# Copyright (c) 2023 PYCAD
+# This file is part of the PYCAD library and is released under the MIT License:
+# https://github.com/amine0110/pycad/blob/main/LICENSE
+
+
+import os
+import shutil
+import nibabel as nib
+import numpy as np
+from glob import glob
+
+class MultiClassNiftiMerger:
+    '''
+    If you have multiple nifti files representing different classes for the same patient, this
+    class helps you merge them into one nifti file.
+
+    ### Params
+    - volume_path: Path to the volume NIfTI file.
+    - class_paths: List of paths to the class NIfTI files.
+    - output_dir: Directory where the merged files will be saved.
+    - move_volumes: Flag to control whether to copy the corresponding volumes into the output directory.
+
+    ### Example of usage
+
+    ```Python
+    # Example usage for directories
+    from pycad.datasets import MultiClassNiftiMerger
+
+    volume_dir = 'datasets/hips/hip_right100/volumes'
+    class_dirs = ['datasets/hips/hip_right100/segmentations', 'datasets/hips/hip_left100/segmentations']
+    output_dir = 'datasets/hips/merged'
+    MultiClassNiftiMerger.process_directories(volume_dir, class_dirs, output_dir, move_volumes=True)
+    ```
+    '''
+
+    def __init__(self, volume_path, class_paths, output_dir, move_volumes=False):
+        self.volume_path = volume_path
+        self.class_paths = class_paths
+        self.output_dir = output_dir
+        self.move_volumes = move_volumes
+
+        self.segmentations_dir = os.path.join(output_dir, 'segmentations')
+        self.volumes_dir = os.path.join(output_dir, 'volumes')
+
+    def check_files(self):
+        # Check if files exist
+        paths_to_check = [self.volume_path] + self.class_paths
+        for path in paths_to_check:
+            if not os.path.exists(path):
+                raise FileNotFoundError(f"File not found: {path}")
+
+    def combine_classes(self):
+        self.check_files()
+
+        # Create directories for output
+        os.makedirs(self.segmentations_dir, exist_ok=True)
+        if self.move_volumes:
+            os.makedirs(self.volumes_dir, exist_ok=True)
+
+        # Initialize a combined array with zeros
+        first_nifti = nib.load(self.class_paths[0])
+        combined_classes = np.zeros(first_nifti.shape, dtype=np.int16)
+
+        # Assign new class labels
+        for idx, class_path in enumerate(self.class_paths):
+            class_nifti = nib.load(class_path)
+            class_data = class_nifti.get_fdata()
+            combined_classes[class_data > 0] = idx + 1
+
+        # Create a new NIfTI image for the combined classes
+        combined_nifti = nib.Nifti1Image(combined_classes, affine=class_nifti.affine)
+
+        # Save the new NIfTI file
+        combined_filename = os.path.basename(self.volume_path).replace('volume', 'combined')
+        combined_path = os.path.join(self.segmentations_dir, combined_filename)
+        nib.save(combined_nifti, combined_path)
+
+        # Optionally copy the volume file (the original stays in place)
+        if self.move_volumes:
+            shutil.copy(self.volume_path, self.volumes_dir)
+
+        print(f"Combined NIfTI file saved at: {combined_path}")
+
+    @staticmethod
+    def process_directories(volume_dir, class_dirs, output_dir, ext='.nii.gz', move_volumes=False):
+        volume_files = glob(os.path.join(volume_dir, f'*{ext}'))
+
+        for volume_file in volume_files:
+            volume_filename = os.path.basename(volume_file)
+            class_paths = [glob(os.path.join(class_dir, f"{volume_filename.split('.')[0]}*{ext}")) for class_dir in class_dirs]
+            class_paths = [item for sublist in class_paths for item in sublist] # Flatten list
+
+            if class_paths:
+                merger = MultiClassNiftiMerger(
+                    volume_file,
+                    class_paths,
+                    output_dir,
+                    move_volumes
+                )
+                merger.combine_classes()
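
To make the label assignment in `combine_classes` concrete, here is a small NumPy-only sketch of the same loop on toy arrays (no NIfTI I/O). `merge_binary_masks` is a hypothetical name used for illustration. Note the consequence of the in-order assignment: where two masks overlap, the later class overwrites the earlier one.

```python
import numpy as np


def merge_binary_masks(masks):
    """Merge binary masks into one label map: mask `i` becomes label `i + 1`.

    Mirrors the assignment loop in `combine_classes`; on overlap the later mask wins.
    """
    combined = np.zeros(masks[0].shape, dtype=np.int16)
    for idx, mask in enumerate(masks):
        combined[mask > 0] = idx + 1
    return combined


# Two toy masks that overlap in the second voxel
right_hip = np.array([[1, 1, 0, 0]])
left_hip = np.array([[0, 1, 1, 0]])

merged = merge_binary_masks([right_hip, left_hip])
print(merged.tolist())  # [[1, 2, 2, 0]]
```

If overlapping annotations matter for a dataset, the order of `class_paths` therefore decides which label survives.
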
diff --git a/pycad/preprocessing/dicom_anonymization.py b/pycad/preprocessing/dicom_anonymization.py
@@ -47,12 +47,12 @@ def list_anonymization_fields(self):
             print(f"- {field}")
         return anonymization_fields
 
-    def anonymize_dicoms(self, fields_to_anonymize):
+    def anonymize_dicoms(self, fields_to_anonymize, force=False):
         dicom_files = glob(os.path.join(self.input_dir, '*.dcm'))
         for file_path in tqdm(dicom_files, desc="Anonymizing"):
             try:
                 # Read the DICOM file
-                dicom = pydicom.read_file(file_path)
+                dicom = pydicom.read_file(file_path, force=force)
 
                 # Anonymize the fields specified
                 for field in fields_to_anonymize:
@@ -64,7 +64,7 @@ def anonymize_dicoms(self, fields_to_anonymize):
             except Exception as e:
                 print(f"Error anonymizing {file_path}: {e}")
 
-    def run(self):
+    def run(self, force=False):
         # List fields that can be anonymized
         available_fields = self.list_anonymization_fields()
 
@@ -79,7 +79,7 @@ def run(self):
         confirm = input("Do you want to proceed with anonymization? (yes/no): ")
         if confirm.lower() == 'yes':
             # Perform the anonymization
-            self.anonymize_dicoms(fields_to_anonymize)
+            self.anonymize_dicoms(fields_to_anonymize, force=force)
             print("Anonymization complete.")
         else:
             print("Anonymization canceled.")
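
For context on when `force=True` tends to be needed: pydicom refuses to read files that lack the standard 128-byte preamble followed by the `DICM` marker unless `force` is set. The stdlib-only sketch below flags such files before processing. `has_dicom_prefix` is a hypothetical helper, not part of PYCAD or pydicom.

```python
import os
import tempfile


def has_dicom_prefix(path):
    """Return True if the file starts with a 128-byte preamble followed by b'DICM'."""
    with open(path, "rb") as f:
        header = f.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        with_header = os.path.join(tmp, "with_header.dcm")
        without_header = os.path.join(tmp, "without_header.dcm")
        # A file with the standard preamble and marker
        with open(with_header, "wb") as f:
            f.write(b"\x00" * 128 + b"DICM" + b"\x00" * 16)
        # A file that starts directly with data, no preamble or marker
        with open(without_header, "wb") as f:
            f.write(b"\x08\x00\x05\x00" + b"\x00" * 200)
        return has_dicom_prefix(with_header), has_dicom_prefix(without_header)
```

Files for which this returns `False` are the ones where passing `force=True` to `anonymize_dicoms` or `process_directory` is likely to help.
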
diff --git a/pycad/preprocessing/dicom_ct_windowing.py b/pycad/preprocessing/dicom_ct_windowing.py
@@ -48,9 +48,9 @@ def __init__(self, window_center=40, window_width=400, visualize=False):
         handler.setFormatter(formatter)
         self.logger.addHandler(handler)
 
-    def preprocess_ct_image(self, dicom_path, output_path, i):
+    def preprocess_ct_image(self, dicom_path, output_path, i, force=False):
         # Load the DICOM file
-        dcm = pydicom.read_file(dicom_path)
+        dcm = pydicom.read_file(dicom_path, force=force)
         original_image = dcm.pixel_array.astype(float)
 
         # Rescale to Hounsfield units (HU)
@@ -84,7 +84,7 @@ def preprocess_ct_image(self, dicom_path, output_path, i):
 
         return original_image, image
 
-    def process_directory(self, input_dir, output_dir):
+    def process_directory(self, input_dir, output_dir, force=False):
         """
         Processes all DICOM files in a given directory, applies windowing, and saves the output.
         """
@@ -100,7 +100,7 @@ def process_directory(self, input_dir, output_dir):
         example_image = None
         for i, dicom_path in enumerate(sorted(dicom_paths)):
             try:
-                original_image, preprocessed_image = self.preprocess_ct_image(dicom_path, output_dir, i)
+                original_image, preprocessed_image = self.preprocess_ct_image(dicom_path, output_dir, i, force=force)
                 if self.visualize and example_image is None:
                     example_image = (original_image, preprocessed_image)
             except Exception as e: