Merge pull request #21 from amine0110/dev
v0.0.10
amine0110 authored Dec 30, 2023
2 parents cfc7179 + 548fba4 commit 1b4b369
Showing 16 changed files with 321 additions and 22 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -27,7 +27,7 @@ For more installation instructions, please see [this](https://github.com/amine01
 For detailed usage and examples, refer to the [tutorials](./tutorials) directory and [documentation](./docs).

 ## What is New?
-The latest release contains some new exciting features, if you want to know more about the new features you can read the [release features documentation](./docs/release_features.md).
+The latest release contains some new exciting features, if you want to know more about the new features you can read the [release features documentation](https://github.com/amine0110/pycad/tree/main/docs/releases/v_0_0_10.md).

 ## Contributions & Support

25 changes: 25 additions & 0 deletions docs/releases/v_0_0_10.md
@@ -0,0 +1,25 @@
# What is new in the release?
In release `0.0.10` we mostly fixed bugs and made the existing modules more robust. We found that some modules were not working properly in some scenarios. For example, the DICOM windowing module could not read certain DICOM files unless a flag called `force` was activated, and because PYCAD calls pydicom internally, the user had no way to activate this flag when the issue came up.

# Features
- Merge the NIFTI segmentations.
- New dataset added.
- Feature upgrades.
- Bug fixes.

## MultiClassNiftiMerger
In some projects that I have seen, the dataset comes with multi-label annotations: we want to segment multiple regions of interest from the same scan, but the dataset provides one NIFTI file per label instead of a single NIFTI file containing all the classes. This layout is fine when we want to segment only one of these labels, but it does not work for multi-class segmentation, where the dataset has to be adapted to the correct structure. In this case, you can use the module `MultiClassNiftiMerger` to merge the NIFTI labels into one file.

You can find the module under `pycad.datasets`.
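
To make the merge step concrete, here is a minimal sketch of the labelling rule on plain NumPy arrays: the k-th mask in the list becomes label k, and the background stays 0. The helper name `merge_binary_masks` and the toy arrays are assumptions for illustration only; the module itself loads and saves the files with nibabel.

```python
import numpy as np


def merge_binary_masks(masks):
    """Merge per-class binary masks into one label map.

    `merge_binary_masks` is a hypothetical helper, not part of PYCAD.
    Mask number k (1-based) is written as label k; where masks overlap,
    the later mask overwrites the earlier one.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    combined = np.zeros(masks[0].shape, dtype=np.int16)
    for idx, mask in enumerate(masks):
        if mask.shape != combined.shape:
            raise ValueError("all masks must share the same shape")
        combined[mask > 0] = idx + 1
    return combined


# Toy 2D "segmentations" that overlap in the middle column.
right_hip = np.array([[1, 1, 0], [0, 0, 0]])
left_hip = np.array([[0, 1, 1], [0, 0, 0]])
merged = merge_binary_masks([right_hip, left_hip])
print(merged)
```

Because labels are assigned in list order, the order of the input masks decides both the label values and which class wins in overlapping voxels.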

## Kidney Stone Dataset
A new dataset has been added to our list of datasets. It is a collection of 2D images showing the kidney from CT scans, annotated with bounding boxes, which means we can train an object detection model such as YOLOv5. If you want to do segmentation, you can add the SAM model to predict the masks.

## Features Upgrade
Some of the modules needed improvement, so we added features to make the library more user friendly. We changed the way we read DICOMs: for some datasets, the DICOM files are hard to read using `pydicom.dcmread` directly, and a flag called `force` needs to be activated. Since `dcmread` was called inside the PYCAD library, there was no way to activate this flag from PYCAD. It is now exposed and can be activated whenever necessary.
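
As a rough sketch of the idea, the wrapper below forwards a `force` argument to the reader instead of hard-coding the call. The names `read_dicom`, `read_many` and the `reader` parameter are assumptions made for this example and are not PYCAD functions; by default the sketch falls back to `pydicom.dcmread`, which accepts a `force` keyword.

```python
def read_dicom(path, force=False, reader=None):
    """Read one DICOM file, forwarding `force` to the underlying reader.

    `read_dicom` and its `reader` parameter are hypothetical names for this
    sketch. When `reader` is None, `pydicom.dcmread` is used.
    """
    if reader is None:
        import pydicom  # imported lazily so the sketch loads without pydicom

        reader = pydicom.dcmread
    return reader(path, force=force)


def read_many(paths, force=False, reader=None):
    """Read several files and collect the failures instead of stopping."""
    datasets, failed = [], []
    for path in paths:
        try:
            datasets.append(read_dicom(path, force=force, reader=reader))
        except Exception as exc:  # a real pipeline would catch narrower errors
            failed.append((path, str(exc)))
    return datasets, failed
```

A possible workflow is to run once with `force=False`, inspect the `failed` list, and retry only those paths with `force=True`.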

The NIFTI to PNG module has also been updated. In some cases the code is not able to read or convert some NIFTI files, and in my experience I always needed the module to return the list of rejected cases so that I know what to do with them (or delete them if necessary). In this release, the rejected cases are kept in a list that you can save, and you can also delete them directly when the conversion completes. This feature is mostly useful when you convert the volumes and the segmentations at the same time: if there is an issue with the volumes (which is always the case), all the segmentations are converted but not all the volumes, so the two folders no longer contain the same cases. The additional PNG masks then need to be deleted, and this is where the feature helps. For more details, see `from pycad.converters import NiftiToPngConverter`.
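
The clean-up step can be sketched with the standard library alone. The helper `delete_case_slices` below is hypothetical and only assumes the `<case_name>_<index>.png` naming that the converter uses for its output; unlike a substring match on the case name, it matches the case name followed by an underscore.

```python
import glob
import os
import tempfile


def delete_case_slices(folder, case_names, ext="png"):
    """Delete every `<case>_<index>.<ext>` file of the given cases.

    `delete_case_slices` is a hypothetical helper, not the PYCAD method.
    Returns the sorted base names of the files that were removed.
    """
    deleted = []
    for name in case_names:
        # glob.escape keeps characters such as "[" in a name literal.
        pattern = os.path.join(glob.escape(folder), f"{glob.escape(name)}_*.{ext}")
        for path in glob.glob(pattern):
            os.remove(path)
            deleted.append(os.path.basename(path))
    return sorted(deleted)


with tempfile.TemporaryDirectory() as out_dir:
    labels_dir = os.path.join(out_dir, "labels")
    os.makedirs(labels_dir)
    for filename in ("case_1_0000.png", "case_1_0001.png", "case_10_0000.png"):
        open(os.path.join(labels_dir, filename), "wb").close()

    rejected_cases = ["case_1"]  # volumes that failed to convert
    deleted = delete_case_slices(labels_dir, rejected_cases)
    remaining = sorted(os.listdir(labels_dir))

print("deleted:", deleted)
print("remaining:", remaining)
```

The stricter pattern matters when case names share a prefix: a substring pattern such as `*case_1*` would also match `case_10_0000.png`.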

This release also adds a unit test for the `MultiClassNiftiMerger` module. Since it is a new module and could introduce issues, the test lets us check it internally and can also help you validate your merge if needed.
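
The kind of checks such a test might contain can be sketched as follows. This is not the test shipped with the library: `merge_labels` is a stand-in for the merge step working on in-memory arrays, and the test names are assumptions.

```python
import unittest

import numpy as np


def merge_labels(masks):
    """Stand-in for the merge step: mask k (1-based) becomes label k."""
    combined = np.zeros(masks[0].shape, dtype=np.int16)
    for idx, mask in enumerate(masks):
        combined[mask > 0] = idx + 1
    return combined


class TestMergeLabels(unittest.TestCase):
    def setUp(self):
        # Two non-overlapping binary masks on a small 3D grid.
        self.first = np.zeros((2, 4, 4))
        self.first[0, :2, :2] = 1
        self.second = np.zeros((2, 4, 4))
        self.second[1, 2:, 2:] = 1
        self.merged = merge_labels([self.first, self.second])

    def test_labels_follow_input_order(self):
        self.assertTrue(np.all(self.merged[self.first > 0] == 1))
        self.assertTrue(np.all(self.merged[self.second > 0] == 2))

    def test_background_stays_zero(self):
        background = (self.first == 0) & (self.second == 0)
        self.assertTrue(np.all(self.merged[background] == 0))

    def test_only_expected_labels(self):
        self.assertEqual(sorted(np.unique(self.merged).tolist()), [0, 1, 2])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMergeLabels)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```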

Other bug fixes and feature updates have been added to improve the performance of the library.
File renamed without changes.
2 changes: 1 addition & 1 deletion pycad/__init__.py
@@ -2,4 +2,4 @@
 # This file is part of the PYCAD library and is released under the MIT License:
 # https://github.com/amine0110/pycad/blob/main/LICENSE

-__version__ = "0.0.9"
+__version__ = "0.0.10"
43 changes: 39 additions & 4 deletions pycad/converters/nifti_to_png.py
@@ -30,9 +30,10 @@ class NiftiToPngConverter:
     ```
     '''

-    def __init__(self, max_v=200, min_v=-200):
+    def __init__(self, max_v=None, min_v=None):
         self.max_v = max_v
         self.min_v = min_v
+        self.rejected_cases = []

     def prepare_image(self, image_data, data_type='vol'):
         '''
@@ -59,12 +60,12 @@ def convert_nifti_to_png(self, in_dir:str, out_dir:str, data_type:str):
         This function is to take one nifti file and then convert it into png series, it keeps the same casename and then adds _indexID.\n
         - `in_dir`: the path to one nifti file: nii | nii.gz\n
         - `out_dir`: the path to save the png series\n
-        - `data_type`: the type of the input nifti file, is it a volume or segmentation?
+        - `data_type`: the type of the input nifti file, is it a volume or segmentation? This value is expecting either 'seg' for segmentation or 'vol' for volume.
         '''
         try:
             new_img = sitk.ReadImage(in_dir)
             img_array = sitk.GetArrayFromImage(new_img)
-            case_name = os.path.basename(in_dir)[:-7]
+            case_name = os.path.basename(in_dir).split('.')[0]

             if not os.path.exists(out_dir):
                 os.makedirs(out_dir)
@@ -77,6 +78,7 @@ def convert_nifti_to_png(self, in_dir:str, out_dir:str, data_type:str):
                 img.save(f"{out_dir}/{case_name}_{str(i).zfill(4)}.png")
         except:
             print('Error with the file:', in_dir)
+            self.rejected_cases.append(os.path.basename(in_dir).split('.')[0])

     def convert_nifti_to_png_dir(self, in_dir:str, out_dir:str, data_type:str):
         '''
@@ -91,7 +93,7 @@ def convert_nifti_to_png_dir(self, in_dir:str, out_dir:str, data_type:str):
         for case in tqdm(cases_list):
             self.convert_nifti_to_png(case, out_dir, data_type)

-    def run(self, in_dir_vol:str = None, in_dir_seg:str = None, out_dir:str = None):
+    def run(self, in_dir_vol:str = None, in_dir_seg:str = None, out_dir:str = None, delete_none_converted=False):
         '''
         This function is the main function to call the conversion function for the volumes and segmentations.\n
         - `in_dir_vol`: path to the input dir containing the volume files (nifti)\n
@@ -106,3 +108,36 @@ def run(self, in_dir_vol:str = None, in_dir_seg:str = None, out_dir:str = None):
         if in_dir_seg:
             print("Converting segmentation files")
             self.convert_nifti_to_png_dir(in_dir_seg, out_dir + '/labels', 'seg') # convert the segmentation files
+
+        # Delete the non-converted files
+        if delete_none_converted:
+            self.delete_images_by_name(out_dir + '/labels', self.rejected_cases)
+            self.delete_images_by_name(out_dir + '/images', self.rejected_cases)
+            print('The rejected cases have been deleted.')
+
+        # Show info
+        print(f"INFO: the conversion is done with {len(os.listdir(out_dir + '/labels'))} labels and {len(os.listdir(out_dir + '/images'))} images.")
+
+    def delete_images_by_name(self, folder_path, names_list):
+        """
+        Deletes images from a specified folder whose names contain any of the strings in the provided list.
+        ### Params
+        - folder_path: Path to the folder containing the images.
+        - names_list: List of strings. Images containing any of these strings in their names will be deleted.
+        """
+        # Check if the folder exists
+        if not os.path.exists(folder_path):
+            print(f"Folder {folder_path} does not exist.")
+            return
+
+        # List of image extensions to consider
+        image_extensions = ['png', 'jpg', 'jpeg']
+
+        # Iterate over each name in the list
+        for name in names_list:
+            # Search for images that contain the specified name and have the defined extensions
+            for ext in image_extensions:
+                for filename in glob(os.path.join(folder_path, f'*{name}*.{ext}')):
+                    print(f"Deleting {filename}")
+                    os.remove(filename)
3 changes: 2 additions & 1 deletion pycad/datasets/__init__.py
@@ -7,4 +7,5 @@
 from .png_to_txt_ml import PngToTxtConverterML
 from .data_splitter import DataSplitter
 from .yolo_dataset_yaml import YOLODatasetYaml
-from .monai_dataset_json import MONAIDatasetOrganizer
+from .monai_dataset_json import MONAIDatasetOrganizer
+from .nifti_merger import MultiClassNiftiMerger
6 changes: 6 additions & 0 deletions pycad/datasets/detection/diverse/__init__.py
@@ -0,0 +1,6 @@
# Copyright (c) 2023 PYCAD
# This file is part of the PYCAD library and is released under the MIT License:
# https://github.com/amine0110/pycad/blob/main/LICENSE


from .kidney_stone_dataset import KidneyStoneDataset
81 changes: 81 additions & 0 deletions pycad/datasets/detection/diverse/kidney_stone_dataset.py
@@ -0,0 +1,81 @@
# Copyright (c) 2023 PYCAD
# This file is part of the PYCAD library and is released under the MIT License:
# https://github.com/amine0110/pycad/blob/main/LICENSE


import os
import gdown
import zipfile
import requests

class KidneyStoneDataset:
    '''
    This class is for the kidney stone detection dataset from Roboflow.
    You can get more information about it using the `info()` function.
    ### Example usage
    ```Python
    from pycad.datasets.detection.diverse import KidneyStoneDataset
    kidney_stone_dataset = KidneyStoneDataset()
    kidney_stone_dataset.info() # Print dataset information
    kidney_stone_dataset.download('all') # Download and extract subgroup all
    ```
    '''
    def __init__(self, dataset_size=1300):
        self.dataset_size = dataset_size
        self.dataset_subgroups = {
            'all': 'https://drive.google.com/uc?id=1bSOeebGa92qc42CiIqUcfZjOWC2rdlrE'
        }
        self.base_path = 'datasets/'

    def info(self):
        print("Kidney Stone Dataset from Roboflow dataset. This is a collection of 2D images with bounding boxes for the detection.")
        print(f"Total Cases: {self.dataset_size}")
        print(f"Subgroups: {list(self.dataset_subgroups.keys())}")
        print("Source: https://universe.roboflow.com/selam-h8tid/kidney-stone-detection-fwubk/dataset/1")

    def download(self, subgroup, path=None):
        if subgroup not in self.dataset_subgroups:
            print(f"No subgroup {subgroup} available.")
            return

        if subgroup.isdigit() and int(subgroup) > self.dataset_size:
            print(f"Subgroup {subgroup} exceeds dataset size.")
            return

        download_url = self.dataset_subgroups[subgroup]
        save_path = path if path else self.base_path
        self._download_and_extract(download_url, save_path, subgroup)

    def _download_and_extract(self, url, path, subgroup):
        if not os.path.exists(path):
            os.makedirs(path)

        try:
            file_path = os.path.join(path, f'kidney_stone{subgroup}.zip')
            gdown.download(url, file_path, quiet=False)

            # Check file size after download
            if os.path.getsize(file_path) < 1024: # Example size threshold (1KB)
                print("Downloaded file is too small, might be an error.")
                return

            with zipfile.ZipFile(file_path, 'r') as zip_ref:
                zip_ref.extractall(path)
                print(f"Downloaded and extracted at {path}")

            # Delete the zip file after extraction
            os.remove(file_path)
            print(f"Deleted zip file: {file_path}")

        except requests.exceptions.RequestException as e:
            print("Error in downloading the file: ", e)
        except zipfile.BadZipFile:
            print("Error in extracting the file: File may be corrupted or not a zip file.")
        except Exception as e:
            print("An unexpected error occurred: ", e)
            if os.path.exists(file_path):
                os.remove(file_path)
                print(f"Deleted incomplete zip file: {file_path}")
print(f"Deleted incomplete zip file: {file_path}")
100 changes: 100 additions & 0 deletions pycad/datasets/nifti_merger.py
@@ -0,0 +1,100 @@
# Copyright (c) 2023 PYCAD
# This file is part of the PYCAD library and is released under the MIT License:
# https://github.com/amine0110/pycad/blob/main/LICENSE


import os
import shutil
import nibabel as nib
import numpy as np
from glob import glob

class MultiClassNiftiMerger:
    '''
    If you have multiple nifti files representing different classes for the same patient, then this
    class is for you: it helps you merge the nifti files into one nifti file.
    ### Params
    - volume_path: Path to the volume NIfTI file.
    - class_paths: List of paths to the class NIfTI files.
    - output_dir: Directory where the merged files will be saved.
    - move_volumes: Flag to control whether to copy the corresponding volumes to the output directory.
    ### Example of usage
    ```Python
    # Example usage for directories
    from pycad.datasets import MultiClassNiftiMerger
    volume_dir = 'datasets/hips/hip_right100/volumes'
    class_dirs = ['datasets/hips/hip_right100/segmentations', 'datasets/hips/hip_left100/segmentations']
    output_dir = 'datasets/hips/merged'
    MultiClassNiftiMerger.process_directories(volume_dir, class_dirs, output_dir, move_volumes=True)
    ```
    '''

    def __init__(self, volume_path, class_paths, output_dir, move_volumes=False):
        self.volume_path = volume_path
        self.class_paths = class_paths
        self.output_dir = output_dir
        self.move_volumes = move_volumes

        self.segmentations_dir = os.path.join(output_dir, 'segmentations')
        self.volumes_dir = os.path.join(output_dir, 'volumes')

    def check_files(self):
        # Check if files exist
        paths_to_check = [self.volume_path] + self.class_paths
        for path in paths_to_check:
            if not os.path.exists(path):
                raise FileNotFoundError(f"File not found: {path}")

    def combine_classes(self):
        self.check_files()

        # Create directories for output
        os.makedirs(self.segmentations_dir, exist_ok=True)
        if self.move_volumes:
            os.makedirs(self.volumes_dir, exist_ok=True)

        # Initialize a combined array with zeros
        first_nifti = nib.load(self.class_paths[0])
        combined_classes = np.zeros(first_nifti.shape, dtype=np.int16)

        # Assign new class labels (file k in class_paths becomes label k, starting at 1)
        for idx, class_path in enumerate(self.class_paths):
            class_nifti = nib.load(class_path)
            class_data = class_nifti.get_fdata()
            combined_classes[class_data > 0] = idx + 1

        # Create a new NIfTI image for the combined classes
        combined_nifti = nib.Nifti1Image(combined_classes, affine=class_nifti.affine)

        # Save the new NIfTI file
        combined_filename = os.path.basename(self.volume_path).replace('volume', 'combined')
        combined_path = os.path.join(self.segmentations_dir, combined_filename)
        nib.save(combined_nifti, combined_path)

        # Optionally copy the volume file (the original is kept in place)
        if self.move_volumes:
            shutil.copy(self.volume_path, self.volumes_dir)

        print(f"Combined NIfTI file saved at: {combined_path}")

    @staticmethod
    def process_directories(volume_dir, class_dirs, output_dir, ext='.nii.gz', move_volumes=False):
        volume_files = glob(os.path.join(volume_dir, f'*{ext}'))

        for volume_file in volume_files:
            volume_filename = os.path.basename(volume_file)
            class_paths = [glob(os.path.join(class_dir, f"{volume_filename.split('.')[0]}*{ext}")) for class_dir in class_dirs]
            class_paths = [item for sublist in class_paths for item in sublist] # Flatten list

            if class_paths:
                merger = MultiClassNiftiMerger(
                    volume_file,
                    class_paths,
                    output_dir,
                    move_volumes
                )
                merger.combine_classes()
8 changes: 4 additions & 4 deletions pycad/preprocessing/dicom_anonymization.py
@@ -47,12 +47,12 @@ def list_anonymization_fields(self):
             print(f"- {field}")
         return anonymization_fields

-    def anonymize_dicoms(self, fields_to_anonymize):
+    def anonymize_dicoms(self, fields_to_anonymize, force=False):
         dicom_files = glob(os.path.join(self.input_dir, '*.dcm'))
         for file_path in tqdm(dicom_files, desc="Anonymizing"):
             try:
                 # Read the DICOM file
-                dicom = pydicom.read_file(file_path)
+                dicom = pydicom.read_file(file_path, force=force)

                 # Anonymize the fields specified
                 for field in fields_to_anonymize:
@@ -64,7 +64,7 @@ def anonymize_dicoms(self, fields_to_anonymize):
             except Exception as e:
                 print(f"Error anonymizing {file_path}: {e}")

-    def run(self):
+    def run(self, force=False):
         # List fields that can be anonymized
         available_fields = self.list_anonymization_fields()

@@ -79,7 +79,7 @@ def run(self):
         confirm = input("Do you want to proceed with anonymization? (yes/no): ")
         if confirm.lower() == 'yes':
             # Perform the anonymization
-            self.anonymize_dicoms(fields_to_anonymize)
+            self.anonymize_dicoms(fields_to_anonymize, force=force)
             print("Anonymization complete.")
         else:
             print("Anonymization canceled.")
8 changes: 4 additions & 4 deletions pycad/preprocessing/dicom_ct_windowing.py
@@ -48,9 +48,9 @@ def __init__(self, window_center=40, window_width=400, visualize=False):
         handler.setFormatter(formatter)
         self.logger.addHandler(handler)

-    def preprocess_ct_image(self, dicom_path, output_path, i):
+    def preprocess_ct_image(self, dicom_path, output_path, i, force=False):
         # Load the DICOM file
-        dcm = pydicom.read_file(dicom_path)
+        dcm = pydicom.read_file(dicom_path, force=force)
         original_image = dcm.pixel_array.astype(float)

         # Rescale to Hounsfield units (HU)
@@ -84,7 +84,7 @@ def preprocess_ct_image(self, dicom_path, output_path, i):

         return original_image, image

-    def process_directory(self, input_dir, output_dir):
+    def process_directory(self, input_dir, output_dir, force=False):
         """
         Processes all DICOM files in a given directory, applies windowing, and saves the output.
         """
@@ -100,7 +100,7 @@ def process_directory(self, input_dir, output_dir):
         example_image = None
         for i, dicom_path in enumerate(sorted(dicom_paths)):
             try:
-                original_image, preprocessed_image = self.preprocess_ct_image(dicom_path, output_dir, i)
+                original_image, preprocessed_image = self.preprocess_ct_image(dicom_path, output_dir, i, force=force)
                 if self.visualize and example_image is None:
                     example_image = (original_image, preprocessed_image)
             except Exception as e: