ribfrac guide
mibaumgartner committed May 16, 2021
1 parent 109ca2b commit e8f3f5b
Showing 5 changed files with 155 additions and 31 deletions.
54 changes: 27 additions & 27 deletions README.md
@@ -18,7 +18,7 @@
The resulting self-configuring method, nnDetection, adapts itself without any manual intervention to arbitrary medical detection problems while achieving results on par with or superior to the state-of-the-art.
We demonstrate the effectiveness of nnDetection on two public benchmarks, ADAM and LUNA16, and propose 10 further public data sets for a comprehensive evaluation of medical object detection methods.

# Installation
## Docker
The easiest way to get started with nnDetection is to build a Docker container with the provided Dockerfile.

Please install docker and [nvidia-docker2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) before continuing.
@@ -43,7 +43,7 @@
Warning:
When running a training inside the container it is necessary to [increase the shared memory](https://stackoverflow.com/questions/30210362/how-to-increase-the-size-of-the-dev-shm-in-docker-container) (via --shm-size).
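
In practice the build-and-run sequence could look like the following sketch; the image tag `nndetection`, the mount paths and the shared-memory size are placeholders, not part of this guide:

```bash
# build the image from the provided Dockerfile (tag is a placeholder)
docker build -t nndetection .

# start an interactive container with GPU access and enlarged shared memory
docker run --gpus all --shm-size=24g \
    -v /path/to/data:/opt/data \
    -it nndetection /bin/bash
```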


## Source
1. Install CUDA (>10.1) and cuDNN (make sure to select [compatible versions](https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html)!)
2. [Optional] Depending on your GPU you might need to set `TORCH_CUDA_ARCH_LIST`; check the [compute capabilities](https://developer.nvidia.com/cuda-gpus) of your card (see the example below).
3. Install [torch](https://pytorch.org/) (make sure to match the PyTorch and CUDA versions!) (requires PyTorch 1.7+)
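
For step 2, setting the variable could look like this; the value below is only an example for an RTX 3090 and has to be adapted to your card:

```bash
# compute capability 8.6 corresponds e.g. to an RTX 3090 (check the NVIDIA list for your GPU)
export TORCH_CUDA_ARCH_LIST="8.6"
```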
@@ -69,7 +69,7 @@
Run the following command in the terminal (!not! in the pytorch root folder) to verify the installation:

```bash
python -c "import torch; import nndet._C; import nndet"
```

To test the whole installation please run the Toy Data set example.
</details>

<details close>
@@ -123,18 +123,18 @@
It can be imported from `nndet.ptmodule`, where an example implementation can also be found.
</details>

# Experiments & Data
The data sets used for our experiments are not hosted or maintained by us; please give credit to the authors of the data sets.
Some labels in the data sets we converted were corrected; the corrected versions can be downloaded.
The `Reproducing Experiments` section provides an overview of the guides that explain the preparation of the data sets.

## Toy Data set
Running `nndet_example` will automatically generate an example data set with 3D squares and squares with holes, which can be used to test the installation or to experiment with prototype code (it is still necessary to run the other nndet commands to process/train/predict the data set).

```bash
# create data to test installation/environment (10 train 10 test)
nndet_example

# create full data set for prototyping (1000 train 1000 test)
nndet_example --full [--num_processes]
```

@@ -146,20 +146,20 @@
After running the generation script, follow the `Planning`, `Training` and `Inference` guides to construct the whole pipeline; a possible end-to-end sequence is sketched below.
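
The following sketch shows what such a run could look like for the toy task; the command names and the `D3V001_3d` plan identifier are taken from this guide, but the exact arguments of your setup may differ:

```bash
# plan and preprocess task 000 (the toy example)
nndet_prep 000

# unpack the preprocessed images before training (see the unpacking note further below)
nndet_unpack ${det_data}/Task000D3_Example/preprocessed/D3V001_3d/imagesTr 6

# train task 000 and sweep for the best inference parameters
nndet_train 000 --sweep
```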
<div align="center">

| <!-- --> | <!-- --> | <!-- --> |
|:----------------------------------------------------------------:|:-------------------------------------------------:|:---------------------------------------:|
| [Task 003 Liver](/projects/Task001_Decathlon/README.md) | [Task 011 Kits](/projects/Task011_Kits/README.md) | [Task 020 RibFrac](/projects/Task020_RibFrac/README.md) |
| [Task 007 Pancreas](/projects/Task001_Decathlon/README.md) | [Task 012 LIDC](/projects/Task012_LIDC/README.md) | [Task 021 ProstateX](/projects/Task021_ProstateX/README.md) |
| [Task 008 Hepatic Vessel](/projects/Task001_Decathlon/README.md) | [Task 017 CADA](/projects/Task017_CADA/README.md) | [Task 025 LymphNodes](/projects/Task025_LymphNodes/README.md) |
| [Task 010 Colon](/projects/Task001_Decathlon/README.md) | [Task 019 ADAM](/projects/Task019_ADAM/README.md) | [Task 016 Luna](/projects/Task016_Luna/README.md) |

</div>

## Adding New Data sets
nnDetection relies on a standardized input format which is very similar to the [nnU-Net](https://github.com/MIC-DKFZ/nnUNet) format and allows easy integration of new data sets.
More details about the format can be found below.

### Folders
All data sets should reside inside a `Task[Number]_[Name]` folder inside the specified detection data folder (set the path to this folder with the `det_data` environment variable).
To avoid conflicts with our provided pretrained models we recommend using task numbers starting from 100.
An overview is provided below (`[Name]` denotes a folder, `-` denotes a file, indentation indicates nesting).

@@ -186,8 +186,8 @@
```text
${det_data}
    [Task000_Example]
        - dataset.yaml # or dataset.json; general data set information (see below)
        [raw_splitted]
            [imagesTr] # raw images, one or more modalities per case
            [labelsTr] # instance segmentations plus one json file per case
            [imagesTs] # optional test images
            [labelsTs] # optional test labels
    [Task001_Example2]
    ...
```
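
Creating the skeleton for a new task is a single command; the task name `Task100_Example` is a placeholder that follows the numbering recommendation above:

```bash
mkdir -p ${det_data}/Task100_Example/raw_splitted/{imagesTr,labelsTr}
```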

### Data set Info
`dataset.yaml` or `dataset.json` provides general information about the data set:
Note: [Important] Classes and modalities start with index 0!
```yaml
task: Task000D3_Example
dim: 3 # number of spatial dimensions of the data
target_class: # define class of interest for patient level evaluations # TODO: check if this should be included
test_labels: True # manually split test set

labels: # classes of data set; need to start at 0
"0": "Square"
"1": "SquareHole"

modalities: # modalities of data set; need to start at 0
"0": "CT"
```
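
Before running any nndet commands, a one-liner like the following can verify that the file parses (assuming a Python environment with PyYAML installed):

```bash
python -c "import yaml; print(yaml.safe_load(open('dataset.yaml')))"
```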
### Image Format
nnDetection uses the same image format as nnU-Net.
Each case consists of at least one 3D NIfTI file with a single modality; the files are saved in the `images` folders.
If multiple modalities are available, each modality uses a separate file, and the numeric suffix at the end of the name indicates the modality (corresponding to the number specified in the data set file).

An example with two modalities could look like this:
```text
case001_0000.nii.gz  # modality 0 (e.g. CT)
case001_0001.nii.gz  # modality 1
case002_0000.nii.gz
case002_0001.nii.gz
```

@@ -273,7 +273,7 @@
After planning and preprocessing, the resulting data folder structure should look like this:

```text
[imagesTr] # stores cropped image data; contains npz files
[labelsTr] # stores labels
[preprocessed]
    [analysis] # some plots to visualize properties of the underlying data set
    [properties] # sufficient for new plans
    [labelsTr] # labels in original format (original spacing)
    [labelsTs] # optional
    - {name of plan}.pkl e.g. D3V001_3d.pkl
```

Before starting the training, copy the data (task folder, data set info and preprocessed folder are needed) to an SSD (highly recommended) and unpack the image data with

```bash
nndet_unpack [path] [num_processes]
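# e.g. for the toy data set (the path layout is an assumption based on the plan name used in this guide):
# nndet_unpack ${det_data}/Task000D3_Example/preprocessed/D3V001_3d/imagesTr 6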
```

@@ -371,7 +371,7 @@
If a self-made test set was used, evaluation can be performed by invoking `nndet_eval`.
## nnU-Net for Detection
Besides nnDetection we also include the scripts to prepare and evaluate nnU-Net in the context of object detection.
Both frameworks need to be configured correctly before running the scripts to ensure correct results.
After preparing the data set in the nnDetection format (which is a superset of the nnU-Net format), it can be exported to nnU-Net via `scripts/nnunet/nnunet_export.py`. Since nnU-Net needs task IDs without any additions, it may be necessary to overwrite the task name via the `-nt` option for some data sets (e.g. `Task019FG_ADAM` needs to be renamed to `Task019_ADAM`).
Follow the usual nnU-Net preprocessing and training pipeline to generate the needed models.
Use the `--npz` option during training to save the predicted probabilities which are needed to generate the detection results.
After determining the best ensemble configuration from nnU-Net, pass all paths to `scripts/nnunet/nnunet_export.py`, which will ensemble and postprocess the predictions for object detection; a command-line sketch follows below.
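
On the command line this could look like the sketch below; only the `-nt` flag is documented above, the rest assumes a standard argparse interface:

```bash
# list the accepted arguments first (assumes the script exposes --help)
python scripts/nnunet/nnunet_export.py --help

# hypothetical call that renames the task for nnU-Net
# python scripts/nnunet/nnunet_export.py [args] -nt Task019_ADAM
```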
@@ -417,9 +417,9 @@
In many cases this limitation can be circumvented by converting the bounding boxes into (coarse) segmentations.
</details>

<details close>
<summary>Mask R-CNN and 2D Data sets</summary>
<summary>Mask RCNN and 2D Data sets</summary>
<br>
2D data sets and Mask R-CNN are not supported in the first release.
We hope to provide these sometime in the future.
</details>

4 changes: 2 additions & 2 deletions projects/Task017_CADA/README.md
@@ -1,6 +1,6 @@
# CADA
**Disclaimer**: We are not the host of the data.
Please make sure to read the requirements and usage policies of the data and **give credit to the authors of the dataset**!

Please read the information from the homepage carefully and follow the rules and instructions provided by the original authors when using the data.
- Homepage: https://cada.grand-challenge.org/Introduction/
@@ -11,4 +11,4 @@
1. Follow the instructions and usage policies to download the data and place the data and labels at the following locations: data -> `Task017_CADA / raw / train_dataset` and labels -> `Task017_CADA / raw / train_mask_images`
2. Run `python prepare.py` in `projects / Task017_CADA / scripts` of the nnDetection repository.

The data is now converted to the correct format and the instructions from the nnDetection README can be used to train the networks.
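
As shell commands, the setup steps above could look like the following sketch; the `det_data` root is a placeholder:

```bash
export det_data=/path/to/det_data  # placeholder for your detection data root

# step 1: place data at ${det_data}/Task017_CADA/raw/train_dataset
#         and labels at ${det_data}/Task017_CADA/raw/train_mask_images

# step 2: run the conversion script
cd projects/Task017_CADA/scripts
python prepare.py
```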
5 changes: 3 additions & 2 deletions projects/Task019_ADAM/README.md
@@ -1,6 +1,6 @@
# ADAM
**Disclaimer**: We are not the host of the data.
Please make sure to read the requirements and usage policies of the data and **give credit to the authors of the dataset**!

Please read the information from the homepage carefully and follow the rules and instructions provided by the original authors when using the data.
- Homepage: http://adam.isi.uu.nl/
@@ -13,4 +13,5 @@
3. Run `python split.py` in `projects / Task019_ADAM / scripts` of the nnDetection repository.
4. [Info]: The provided instructions will automatically create a patient stratified random split. We used a random split for our challenge submission. By renaming the provided split file in the `preprocessed` folders, nnDetection will automatically create a random split.

The data is now converted to the correct format and the instructions from the nnDetection README can be used to train the networks.

17 changes: 17 additions & 0 deletions projects/Task020_RibFrac/README.md
@@ -0,0 +1,17 @@
# RibFrac
**Disclaimer**: We are not the host of the data.
Please make sure to read the requirements and usage policies of the data and **give credit to the authors of the dataset**!

Please read the information from the homepage carefully and follow the rules and instructions provided by the original authors when using the data.
- Homepage: https://ribfrac.grand-challenge.org/
- Subtask: Task 1

## Setup
0. Follow the installation instructions of nnDetection and create a data directory named `Task020FG_RibFrac`. We added FG to the ID to indicate that we do not distinguish between the different fracture classes (even if you prepare the data set with classes, the data needs to be placed inside that directory).
1. Follow the instructions and usage policies to download the data and copy the data/labels/csv files to the following locations:
data -> `Task020FG_RibFrac / raw / imagesTr`; labels -> `Task020FG_RibFrac / raw / labelsTr`; csv files -> `Task020FG_RibFrac / raw`
2. Run `python prepare.py` in `projects / Task020_RibFrac / scripts` of the nnDetection repository.

Note: If no manual split is created, nnDetection will create a random 5-fold split, which we used for our results.

The data is now converted to the correct format and the instructions from the nnDetection README can be used to train the networks.
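
A possible invocation mirroring the steps above; the data root is a placeholder, and the `@env_guard` decorator in the prepare script below checks that `det_data` is set:

```bash
export det_data=/path/to/det_data  # placeholder for your detection data root
cd projects/Task020_RibFrac/scripts
python prepare.py
```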
106 changes: 106 additions & 0 deletions projects/Task020_RibFrac/scripts/prepare.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
import os
import shutil
from pathlib import Path

import pandas as pd
from tqdm import tqdm

from nndet.io import save_json
from nndet.utils.check import env_guard


def create(
    image_source: Path,
    label_source: Path,
    image_target_dir: Path,
    label_target_dir: Path,
    df: pd.DataFrame,
    fg_only: bool = False,
):
    image_target_dir.mkdir(parents=True, exist_ok=True)
    label_target_dir.mkdir(parents=True, exist_ok=True)

    # case ids look like RibFrac100-image / RibFrac100-label -> strip the suffix
    case_id = image_source.stem.rsplit('-', 1)[0]
    case_id_check = label_source.stem.rsplit('-', 1)[0]
    assert case_id == case_id_check, f"case ids not matching, found image {case_id} and label {case_id_check}"

    df_case = df.loc[df['public_id'] == case_id]
    instances = {}
    for row in df_case.itertuples():
        _cls = int(row.label_code)
        if _cls == 0:  # background has label code 0 and label id 0
            continue

        if fg_only:  # collapse all fracture types into a single foreground class
            _cls = 1
        elif _cls == -1:  # label code -1 (undefined fracture type) becomes the fifth class
            _cls = 5

        instances[str(row.label_id)] = _cls - 1  # classes range from 0 - 4 // if fg only 0
        assert 0 < _cls < 6, f"Something strange happened {_cls}"
    save_json({"instances": instances}, label_target_dir / f"{case_id}.json")

    shutil.copy2(image_source, image_target_dir / f"{case_id}_0000.nii.gz")
    shutil.copy2(label_source, label_target_dir / f"{case_id}.nii.gz")


@env_guard
def main():
    det_data_dir = Path(os.getenv('det_data'))
    # the README instructs placing the data inside Task020FG_RibFrac
    task_data_dir = det_data_dir / "Task020FG_RibFrac"
    source_data_dir = task_data_dir / "raw"

    if not source_data_dir.is_dir():
        raise RuntimeError(f"{source_data_dir} should contain the raw data but does not exist.")
    if not (p := source_data_dir / "imagesTr").is_dir():
        raise ValueError(f"Expected data to be located at {p}")
    if not (p := source_data_dir / "labelsTr").is_dir():
        raise ValueError(f"Expected labels to be located at {p}")
    if not (p := source_data_dir / "ribfrac-train-info-1.csv").is_file():
        raise ValueError(f"Expected {p} to exist.")
    if not (p := source_data_dir / "ribfrac-train-info-2.csv").is_file():
        raise ValueError(f"Expected {p} to exist.")
    if not (p := source_data_dir / "ribfrac-val-info.csv").is_file():
        raise ValueError(f"Expected {p} to exist.")

    target_data_dir = task_data_dir / "raw_splitted" / "imagesTr"
    target_data_dir.mkdir(exist_ok=True, parents=True)
    target_label_dir = task_data_dir / "raw_splitted" / "labelsTr"
    target_label_dir.mkdir(exist_ok=True, parents=True)

    csv_files = [source_data_dir / "ribfrac-train-info-1.csv",
                 source_data_dir / "ribfrac-train-info-2.csv",
                 source_data_dir / "ribfrac-val-info.csv"]
    df = pd.concat([pd.read_csv(f) for f in csv_files])

    image_paths = list((source_data_dir / "imagesTr").glob("*.nii.gz"))
    image_paths.sort()
    label_paths = list((source_data_dir / "labelsTr").glob("*.nii.gz"))
    label_paths.sort()

    print(f"Found {len(image_paths)} data files and {len(label_paths)} label files.")
    assert len(image_paths) == len(label_paths)

    meta = {
        "name": "RibFracFG",
        "task": "Task020FG_RibFrac",
        "target_class": None,
        "test_labels": False,
        "labels": {"0": "fracture"},  # since we are running FG vs BG this is not completely correct
        "modalities": {"0": "CT"},
        "dim": 3,
    }
    save_json(meta, task_data_dir / "dataset.json")

    for ip, lp in tqdm(list(zip(image_paths, label_paths))):
        create(image_source=ip,
               label_source=lp,
               image_target_dir=target_data_dir,
               label_target_dir=target_label_dir,
               df=df,
               fg_only=True,
               )


if __name__ == '__main__':
    main()
