MDS-ED: Multimodal Decision Support in the Emergency Department -- a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine
This is the official repository for the paper MDS-ED: Multimodal Decision Support in the Emergency Department -- a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine
In this study, conducted within the context of an emergency department, we introduce a state-of-the-art biomedical multimodal benchmark. This benchmark is evaluated in two comprehensive settings:
- Patient Discharge Diagnoses: A dataset consisting of 1,428 patient discharge diagnoses.
- Patient Deterioration Events: A dataset consisting of 15 patient deterioration events.
The datasets include various patient data collected within a 90-minute interval upon arrival, such as:
- Demographics
- Biometrics
- Vital parameter trends
- Laboratory value trends
- ECG waveforms
-
Comprehensive Size: MDS-ED ranks first in terms of the number of patients and second in the number of visits in the open-source domain, despite focusing only on the first 1.5 hours of ED arrival.
-
Features Diversity: MDS-ED leads in feature modalities, including demographics, biometrics, vital parameter trends, laboratory value trends, and ECG waveforms, making it more extensive than most datasets. Chief complaints and previous medications were excluded due to their unstructured nature and potential bias.
-
Extensive Range of Target Labels: MDS-ED offers 1,443 target labels, significantly more than other datasets, which usually have fewer and narrower scope tasks.
-
Accessibility: MDS-ED is open-source, promoting further research and collaboration.
Overall, we can draw several conclusions:
- Superior Performance of Multimodal Models: The results demonstrate that multimodal models, which integrate diverse data types, offer superior performance in both diagnostic and deterioration tasks (row 1/2 vs. row 3/4).
- Improved Performance with Raw ECG Waveforms: In the diagnoses task, the use of ECG raw waveforms instead of ECG features improves the performance in a statistically significant manner (row 3 vs. row 4), whereas for the deterioration task, we performed a direct comparison via bootstrapping the score difference for statistical significance, and we did not find a statistically significant difference. To the best of our knowledge, this is the first statistically robust demonstration of the added value of raw waveform input against features for clinically relevant prediction tasks such as diagnoses prediction.
- ECG Waveform Superiority in Diagnostic Setting: The model building on ECG waveforms as only input outperforms the tabular-only model in the diagnostic setting, but not in the deterioration setting (row 1 vs. row 2). We hypothesize that this is due to the inclusion of tabular trends over time which aligns with the task definition of deterioration. We believe that the inclusion of multiple raw ECGs over time instead of just a single snapshot would allow us to capture more meaningful deterioration trends also from raw waveform data.
@misc{alcaraz2024mdsedmultimodaldecisionsupport,
title={MDS-ED: Multimodal Decision Support in the Emergency Department -- a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine},
author={Juan Miguel Lopez Alcaraz and Nils Strodthoff},
year={2024},
eprint={2407.17856},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2407.17856},
}