[Paper] [Pre-print] [Code] [BibTeX]
Presented at the 21st International Symposium on Biomedical Imaging (ISBI 2024).
Authors: George Batchkala, Bin Li, Mengran Fan, Mark McCole, Cecilia Brambilla, Fergus Gleeson, Jens Rittscher.
- DHMC_MetaData_Release_1.0.csv: downloaded from https://bmirds.github.io/LungCancer/; gives the predominant LUAD pattern.
- tcga_classes_extended_info.csv: see https://github.com/GeorgeBatch/TCGA-lung-histology-download/
- tcga_dsmil_test_ids.csv: see https://github.com/GeorgeBatch/TCGA-lung-histology-download/
- tcia_cptac_md5sum_hashes.txt: see https://github.com/GeorgeBatch/TCIA-CPTAC-lung-histology-download
- tcia_cptac_luad_lusc_cohort.csv: see https://github.com/GeorgeBatch/TCIA-CPTAC-lung-histology-download
- tcia_cptac_string_2_ouh_labels.csv: the unique values taken from tcia_cptac_luad_lusc_cohort.csv, manually mapped to labels inspired by OUH (Oxford University Hospitals) reports.
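The mapping step behind tcia_cptac_string_2_ouh_labels.csv can be sketched as follows. This is a minimal illustration, not the code used to build the file: the `diagnosis` column name and the entries of `STRING_TO_LABEL` are hypothetical placeholders, so check the real header of tcia_cptac_luad_lusc_cohort.csv and the released mapping before reusing it.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical column name; check the real header of tcia_cptac_luad_lusc_cohort.csv.
DIAGNOSIS_COLUMN = "diagnosis"

# Hypothetical manual mapping from free-text strings to OUH-style labels.
STRING_TO_LABEL = {
    "Adenocarcinoma": "LUAD",
    "Squamous cell carcinoma": "LUSC",
}


def unique_values(rows: Iterable[Dict[str, str]], column: str) -> List[str]:
    """Return the sorted unique non-empty strings found in `column`."""
    values = {(row.get(column) or "").strip() for row in rows}
    values.discard("")
    return sorted(values)


def map_strings(values: Iterable[str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Map each unique string to a label, failing loudly on anything unmapped."""
    values = list(values)
    missing = [value for value in values if value not in mapping]
    if missing:
        raise KeyError(f"no manual label for: {missing}")
    return {value: mapping[value] for value in values}


if __name__ == "__main__":
    # Toy cohort file, not real data.
    demo_csv = io.StringIO(
        "slide_id,diagnosis\n"
        "a,Adenocarcinoma\n"
        "b,Squamous cell carcinoma\n"
        "c,Adenocarcinoma\n"
    )
    values = unique_values(csv.DictReader(demo_csv), DIAGNOSIS_COLUMN)
    print(map_strings(values, STRING_TO_LABEL))
```

Raising on unmapped strings keeps new or misspelled diagnoses from silently dropping out of the label file.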
Columns include the label (LUAD vs LUSC) and the paths to the features:
- features_csv_file_path
- h5_file_path
- pt_file_path

The label is encoded with the following mapping:
mapping = {
"LUAD": 0,
"LUSC": 1,
}
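A minimal sketch of reading an experiment label file with the mapping above, using only the standard library. The `label` column name and the helper `read_label_file` are assumptions for illustration; the three path columns are the ones listed above.

```python
import csv
import io
from typing import Dict, List, TextIO

# Label encoding from the README.
MAPPING = {"LUAD": 0, "LUSC": 1}
INVERSE_MAPPING = {value: name for name, value in MAPPING.items()}

# Feature-path columns listed in the README; the "label" column name is an assumption.
PATH_COLUMNS = ("features_csv_file_path", "h5_file_path", "pt_file_path")


def read_label_file(handle: TextIO) -> List[Dict[str, object]]:
    """Parse an experiment label file into records with an integer label."""
    records = []
    for row in csv.DictReader(handle):
        label = int(row["label"])
        if label not in INVERSE_MAPPING:
            raise ValueError(f"unexpected label {label!r}")
        record: Dict[str, object] = {"label": label, "class_name": INVERSE_MAPPING[label]}
        record.update({column: row[column] for column in PATH_COLUMNS})
        records.append(record)
    return records


if __name__ == "__main__":
    # Hypothetical single-row file, for illustration only.
    demo = io.StringIO(
        "label,features_csv_file_path,h5_file_path,pt_file_path\n"
        "1,feats/a.csv,h5/a.h5,pt/a.pt\n"
    )
    for record in read_label_file(demo):
        print(record["class_name"], record["pt_file_path"])
```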
DHMC has only LUAD slides, so every entry in the label field is 0.
TCGA has both LUAD and LUSC slides, so entries in the label field include both 0 and 1.
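A quick sanity check along these lines can catch a mis-built label file. `check_labels` is a hypothetical helper, and the expected label sets simply restate the two sentences above.

```python
from collections import Counter
from typing import Iterable, Set


def check_labels(labels: Iterable[int], expected: Set[int]) -> Counter:
    """Count labels; fail if a label is unexpected or an expected label is absent."""
    counts = Counter(labels)
    unexpected = set(counts) - expected
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    absent = expected - set(counts)
    if absent:
        raise ValueError(f"expected labels not found: {sorted(absent)}")
    return counts


if __name__ == "__main__":
    # Toy label columns, not real data.
    print(check_labels([0, 0, 0], expected={0}))        # DHMC: LUAD only
    print(check_labels([0, 1, 1, 0], expected={0, 1}))  # TCGA: LUAD and LUSC
```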
Run the label-creation notebook. It creates the files in labels/experiment-label-files/.
Note that the combined training/validation dataset differs from the one in the paper, since the in-house DART dataset is not publicly available. The test set, however, is the same as in the paper and is fully available for both the 8-label and 5-label tasks.
For publication, I used the tiling and feature extraction pipeline from https://github.com/binli123/dsmil-wsi repository.
For faster computation, the CSV features should be converted into HDF5 (.h5) and PyTorch (.pt) files, as in https://github.com/mahmoodlab/CLAM.
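A sketch of that conversion for a single slide, under stated assumptions: each CSV is assumed to have a header row followed by one numeric row per patch (assumed to match what dsmil-wsi writes), and, following CLAM's convention, the features go into an HDF5 dataset named `features` and a `.pt` tensor. Patch coordinates are not stored because the CSV is assumed not to contain them. `convert_slide` and the other helpers are hypothetical names; `h5py`, `numpy` and `torch` are only imported when saving.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def parse_features_csv(lines: Iterable[str]) -> List[List[float]]:
    """Parse feature CSV lines: one header row, then one numeric row per patch (assumed layout)."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row (assumed to hold column indices)
    return [[float(value) for value in row] for row in reader if row]


def save_h5(features: List[List[float]], path: PathLike) -> None:
    import h5py
    import numpy as np

    with h5py.File(str(path), "w") as f:
        # "features" follows CLAM's dataset name; coordinates are not available here.
        f.create_dataset("features", data=np.asarray(features, dtype=np.float32))


def save_pt(features: List[List[float]], path: PathLike) -> None:
    import torch

    # A (num_patches, feature_dim) float32 tensor.
    torch.save(torch.tensor(features, dtype=torch.float32), str(path))


def convert_slide(csv_path: PathLike, h5_dir: PathLike, pt_dir: PathLike) -> None:
    """Write <h5_dir>/<slide>.h5 and <pt_dir>/<slide>.pt from <slide>.csv."""
    with open(csv_path, newline="") as f:
        features = parse_features_csv(f)
    stem = Path(csv_path).stem
    for directory in (h5_dir, pt_dir):
        Path(directory).mkdir(parents=True, exist_ok=True)
    save_h5(features, Path(h5_dir) / f"{stem}.h5")
    save_pt(features, Path(pt_dir) / f"{stem}.pt")
```

For example, `convert_slide("features/slide_001.csv", "h5_files", "pt_files")` (hypothetical paths) would write `h5_files/slide_001.h5` and `pt_files/slide_001.pt`.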
I am currently working on standardising the tiling and feature extraction pipeline for the Dependency-MIL model using tiatoolbox.
For training, I used the code from https://github.com/binli123/dsmil-wsi, modified to accommodate partial labels using the custom_binary_cross_entropy_with_logits function from source.losses.
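The idea behind a partial-label loss can be sketched without any framework: binary cross-entropy with logits is averaged only over the label entries that are known. This is not the `custom_binary_cross_entropy_with_logits` implementation from `source.losses`; `masked_bce_with_logits` is a hypothetical name, and the 0/1 `mask` marking known labels is an assumed encoding.

```python
import math
from typing import Sequence


def masked_bce_with_logits(
    logits: Sequence[float], targets: Sequence[float], mask: Sequence[int]
) -> float:
    """Mean binary cross-entropy with logits over entries where mask is 1."""
    if not (len(logits) == len(targets) == len(mask)):
        raise ValueError("logits, targets and mask must have the same length")
    total, count = 0.0, 0
    for x, y, known in zip(logits, targets, mask):
        if not known:
            continue  # unknown label: contributes nothing to the loss
        # Numerically stable form of -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Two label entries for one slide; the second is unknown (mask 0) and is ignored.
    print(masked_bce_with_logits([0.0, 5.0], [1.0, 0.0], [1, 0]))
```

In PyTorch, the same effect can be obtained by calling `torch.nn.functional.binary_cross_entropy_with_logits(..., reduction="none")` and averaging the result over the masked entries.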
I will release the code once I finish improving it. If you need the code urgently, please contact me.
Code for creating:
- the PyTorch dataset: dataset_detailed.py
- PyTorch data loaders using a PyTorch Lightning DataModule: datamodule_detailed.py
The Dependency-MIL model can be created using the get_model() function from source.models.combined_model.
George Batchkala is supported by Fergus Gleeson and the EPSRC Center for Doctoral Training in Health Data Science (EP/S02428X/1). The work was done as part of DART Lung Health Program (UKRI grant 40255).
The computational aspects of this research were supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z and the NIHR Oxford BRC. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
If you find Dependency-MIL useful for your research and applications, please cite it using this BibTeX:
@INPROCEEDINGS{batchkala2024dependency-mil,
author={Batchkala, George and Li, Bin and Fan, Mengran and McCole, Mark and Brambilla, Cecilia and Gleeson, Fergus and Rittscher, Jens},
booktitle={2024 IEEE International Symposium on Biomedical Imaging (ISBI)},
title={Accurate Subtyping of Lung Cancers by Modelling Class Dependencies},
year={2024},
volume={},
number={},
pages={1-5},
keywords={Accuracy;Convolution;Annotations;Histopathology;Lung cancer;Lung;Predictive models;lung cancer;computational pathology;multi-label classification;multiple-instance learning},
doi={10.1109/ISBI56570.2024.10635232}
}