This repository provides the code for the following paper:
Jan Trienes, Paul Youssef, Jörg Schlötterer, and Christin Seifert. 2023. Guidance in Radiology Report Summarization: An Empirical Evaluation and Error Analysis. In Proceedings of the 16th International Natural Language Generation Conference (INLG), Prague, Czech Republic. Association for Computational Linguistics.
The code for the summarization methods is included as git submodules. Clone the repository as follows:
git clone --recurse-submodules [email protected]:mcmi-group/guided-summary.git
conda env update -f environment.yml
conda activate guided-summary
pip install -r requirements-dev.txt
pip install -e .
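If the repository was cloned without --recurse-submodules, the submodules can still be fetched afterwards with a standard git command:

git submodule update --init --recursive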
For evaluation, please install ROUGE as per these instructions. Furthermore, build the CheXpert Docker image with this script:

./scripts/build_chexpert.sh
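To check that the image was built, you can list it afterwards (this assumes the build script tags the image as chexpert; see the script for the exact tag):

docker images | grep chexpert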
Artifact | Description | Link | Where to extract |
---|---|---|---|
Datasets | Use the scripts below to download and pre-process the raw MIMIC-CXR and OpenI datasets. | see below | n/a |
Error annotations | 1,200 expert annotations (100 reports × 4 candidates × 3 annotators) of MIMIC-CXR test reports. | TBA | error-analysis/data/ |
Model outputs | Outputs generated by all summarization models. | MIMIC-CXR (tba), OpenI | outputs/ |
Checkpoints | Pre-trained models. | MIMIC-CXR (tba), OpenI | outputs/ |
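The exact archive format of the released artifacts may differ; as a sketch, assuming a gzipped tarball (the file name below is hypothetical), extraction would look like:

tar -xzf model-outputs-mimic-cxr.tar.gz -C outputs/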
Source: https://physionet.org/content/mimic-cxr/2.0.0/
export PHYSIONET_USER=...
# When prompted, type your PhysioNet password
./scripts/preprocess_mimic.sh
# Build PreSumm dataset with findings section
source scripts/config_mimic.sh
./scripts/ds_unguided.sh
./scripts/ds_oracle.sh
# Build PreSumm dataset with background + findings section
source scripts/config_mimic_bg.sh
./scripts/ds_unguided.sh
./scripts/ds_oracle.sh
# Build WGSum datasets
python scripts/convert_to_wgsum.py
CUDA_VISIBLE_DEVICES=0 ./scripts/ds_wgsum.sh data/processed/mimic-wgsum/
CUDA_VISIBLE_DEVICES=1 ./scripts/ds_wgsum.sh data/processed/mimic-bg-wgsum/
CUDA_VISIBLE_DEVICES=2 ./scripts/ds_wgsum.sh data/processed/mimic-official-wgsum/
CUDA_VISIBLE_DEVICES=3 ./scripts/ds_wgsum.sh data/processed/mimic-official-bg-wgsum/
# Build WGSum+CL dataset
CUDA_VISIBLE_DEVICES=0 ./scripts/ds_wgsum_cl.sh data/processed/mimic-wgsum/ data/processed/mimic-wgsum-cl/
CUDA_VISIBLE_DEVICES=1 ./scripts/ds_wgsum_cl.sh data/processed/mimic-bg-wgsum/ data/processed/mimic-bg-wgsum-cl/
CUDA_VISIBLE_DEVICES=2 ./scripts/ds_wgsum_cl.sh data/processed/mimic-official-wgsum/ data/processed/mimic-official-wgsum-cl/
CUDA_VISIBLE_DEVICES=3 ./scripts/ds_wgsum_cl.sh data/processed/mimic-official-bg-wgsum/ data/processed/mimic-official-bg-wgsum-cl/
Source: https://openi.nlm.nih.gov/faq#collection
source scripts/config_openi.sh
./scripts/preprocess_openi.sh
# Build PreSumm dataset with findings section
./scripts/ds_unguided.sh
./scripts/ds_oracle.sh
# Build PreSumm dataset with background + findings section
source scripts/config_openi_bg.sh
./scripts/ds_unguided.sh
./scripts/ds_oracle.sh
# Build WGSum datasets
python scripts/convert_to_wgsum.py
CUDA_VISIBLE_DEVICES=2 ./scripts/ds_wgsum.sh data/processed/openi-wgsum/
CUDA_VISIBLE_DEVICES=3 ./scripts/ds_wgsum.sh data/processed/openi-bg-wgsum/
# Build WGSum+CL dataset
CUDA_VISIBLE_DEVICES=2 ./scripts/ds_wgsum_cl.sh data/processed/openi-wgsum/ data/processed/openi-wgsum-cl/
CUDA_VISIBLE_DEVICES=3 ./scripts/ds_wgsum_cl.sh data/processed/openi-bg-wgsum/ data/processed/openi-bg-wgsum-cl/
The code is based on the original PreSumm and GSum implementations. When training for the first time, use only a single GPU so that the pre-trained models can be downloaded; after that, training can be restarted on multiple GPUs (see the example below).
Configure training by sourcing one of the config scripts. Choices = {openi, mimic, mimic_bg, mimic_official, mimic_official_bg}:

source scripts/config_XXXXX.sh
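For example, to train on MIMIC-CXR with the background section, running the first training on a single GPU so that the pre-trained models are downloaded (train_bertabs.sh is just one of the training scripts listed below):

source scripts/config_mimic_bg.sh
CUDA_VISIBLE_DEVICES=0 ./scripts/train_bertabs.sh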
#### For Slurm, prepend the following (adapt --gpus accordingly)
# sbatch --partition GPUampere --gpus 5 --time 10:00:00 [script]
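As a concrete instance of this pattern, submitting the GSum oracle training (listed below) as a Slurm job might look as follows; partition name and resource values are cluster-specific:

sbatch --partition GPUampere --gpus 5 --time 10:00:00 ./scripts/train_gsum_oracle.sh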
##### Base Models
# OracleExt
./scripts/train_extoracle.sh
# BertExt (fixed, k=1)
./scripts/train_bertext.sh
# BertAbs
./scripts/train_bertabs.sh
# WGSum + WGSum+CL
./scripts/train_wgsum.sh
./scripts/train_wgsum_cl.sh
# GSum w/ OracleExt
./scripts/train_gsum_oracle.sh
##### GSum w/ Fixed-Length and Variable-Length Guidance (ours)
# BertExt (fixed, k=[1,5], LR-Approx, BERT-Approx, Thresholding, k=|OracleExt|)
./scripts/train_bertext_allranks.sh
./scripts/train_bertext_thresholds.sh
./scripts/ds_variable.sh
# GSum (oracle-trained) w/ different BertExt strategies
./scripts/test_gsum.sh
# Abstain experiments
./scripts/ds_abstain.sh
./scripts/train_gsum_oracle_abstain.sh
./scripts/test_gsum_abstain.sh
Notebook | Purpose | Paper Figures/Tables |
---|---|---|
01-statistics.ipynb | Calculate descriptive statistics of the datasets. | Tables 1, 8 |
02-evaluation.ipynb | Evaluate all model runs. | Tables 2-6; Figures 3, 5, 6 |
03-example.ipynb | Example report with model outputs. | Figure 1 |
04-error-analysis-assignment.ipynb | Prepare reports for error analysis, and assign to annotators. | n/a |
05-error-analysis-results.ipynb | Analysis of manual error annotations. | Figure 4, Table 9 |
06-error-analysis-radnli.ipynb | Evaluate the factuality of addition spans with RadNLI (see below). | Table 7 |
07-dataset-inconsistency.ipynb | Measure duplication in MIMIC-CXR and show examples. | Table 11 |
To run the RadNLI experiment for evaluating the factuality of additions, set up the environment below:
mamba env update -f radnli_env.yml
conda activate radnli
You also need to download the pre-trained model:
cd ifcc/resources && ./download.sh
After that, you can start the experiment using ./notebooks/06-error-analysis-radnli.ipynb.
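For example, assuming Jupyter is installed in the radnli environment:

conda activate radnli
jupyter notebook notebooks/06-error-analysis-radnli.ipynb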
To test, lint, and autoformat, use the following Make targets:
make test
make lint
make format
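These targets wrap the underlying tools defined in the Makefile. As a sketch, they likely correspond to standard Python tooling along these lines (the actual commands in this repository may differ):

pytest
flake8 .
black .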
If you use the resources in this repository, please cite:
@InProceedings{Trienes:2023:INLG,
  title     = "Guidance in Radiology Report Summarization: {A}n Empirical Evaluation and Error Analysis",
  author    = {Trienes, Jan and Youssef, Paul and Schl{\"o}tterer, J{\"o}rg and Seifert, Christin},
  booktitle = "Proceedings of the 16th International Natural Language Generation Conference (INLG)",
  year      = "2023",
  doi       = "10.18653/v1/2023.inlg-main.13",
  pages     = "176--195",
}
If you have any questions, please contact Jan Trienes at jan.trienes [AT] gmail.com.