CovRNN: A collection of Recurrent Neural Network models for predicting COVID-19 patient outcomes on admission from electronic health record (EHR) data
This repository provides the code for training and fine-tuning CovRNN, a collection of recurrent neural network models that predict COVID-19 patient outcomes from the electronic health record (EHR) data available on admission, without the need for specific feature selection or missing-data imputation.
CovRNN is designed to predict three outcomes: in-hospital mortality, need for mechanical ventilation, and long length of stay (LOS >7 days). Each outcome is predicted both as a time-to-event risk score (survival prediction) and as an all-time risk score (binary prediction). Our models were trained and validated using heterogeneous, de-identified data of 247,960 COVID-19 patients from 87 healthcare systems, derived from the Cerner® Real-World Dataset (CRWD), and data of 36,140 de-identified patients derived from the Optum® de-identified COVID-19 Electronic Health Record v. 1015 dataset (2007–2020). For further details, please refer to our paper: CovRNN-A recurrent neural network model for predicting outcomes of COVID-19 patients: model development and validation using EHR data.
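To make the difference between the two prediction types concrete, here is a minimal sketch of how binary and time-to-event labels could be derived for the three outcomes. The `Admission` class and its field names are hypothetical and are not taken from this repository, and censoring at discharge is an assumption made for illustration, not necessarily the cohort definition used in the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

LONG_LOS_DAYS = 7  # threshold stated in the text: LOS > 7 days


@dataclass
class Admission:
    """Hypothetical per-admission summary; field names are illustrative only."""
    los_days: float                               # length of stay in days
    died_in_hospital: bool
    days_to_ventilation: Optional[float] = None   # None if never ventilated


def binary_labels(adm: Admission) -> Dict[str, int]:
    """All-time (binary) labels: did the event happen at any point during the stay?"""
    return {
        "mortality": int(adm.died_in_hospital),
        "ventilation": int(adm.days_to_ventilation is not None),
        "long_los": int(adm.los_days > LONG_LOS_DAYS),
    }


def survival_labels(adm: Admission) -> Dict[str, Tuple[float, int]]:
    """Time-to-event labels as (time_in_days, event_observed) pairs.

    Assumption: patients without the event are censored at discharge.
    """
    vent_time = adm.days_to_ventilation
    return {
        "mortality": (adm.los_days, int(adm.died_in_hospital)),
        "ventilation": (
            vent_time if vent_time is not None else adm.los_days,
            int(vent_time is not None),
        ),
    }


example = Admission(los_days=9.0, died_in_hospital=False, days_to_ventilation=2.5)
print(binary_labels(example))
print(survival_labels(example))
```

The binary labels only record whether an event occurred, while the survival labels also keep when it occurred (or when follow-up ended), which is what a time-to-event model needs.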
We showed that deep learning-based models can achieve state-of-the-art prediction accuracy while consuming structured EHR categorical data in its standard raw format, without the need for extensive feature engineering, which implies that the trained models can be easily validated on new data sources. CovRNN was validated across datasets from different sources, indicating its transferability. Our framework can be further applied to train and evaluate predictive models for different types of clinical events.
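As an illustration of what consuming raw categorical codes without feature engineering can look like, the sketch below maps each distinct code string to an integer index and keeps each patient as an ordered list of visits. The data layout, the helper names `build_vocab` and `encode`, and the example code strings are assumptions for illustration; the input format actually expected by this repository may differ.

```python
from typing import Dict, List

Visit = List[str]        # raw categorical codes recorded in one visit
Patient = List[Visit]    # visits in chronological order


def build_vocab(patients: List[Patient]) -> Dict[str, int]:
    """Assign an integer index to every distinct raw code; 0 is reserved for padding."""
    vocab: Dict[str, int] = {"<pad>": 0}
    for visits in patients:
        for visit in visits:
            for code in visit:
                if code not in vocab:
                    vocab[code] = len(vocab)
    return vocab


def encode(patient: Patient, vocab: Dict[str, int]) -> List[List[int]]:
    """Replace each code with its index; codes missing from the vocabulary are dropped."""
    return [[vocab[c] for c in visit if c in vocab] for visit in patient]


# Illustrative code strings only, not real patient data.
patients = [
    [["ICD10:U07.1", "LAB:example_lab"], ["MED:example_drug"]],
    [["ICD10:U07.1", "ICD10:example_dx"]],
]
vocab = build_vocab(patients)
print(encode(patients[1], vocab))  # [[1, 4]]
```

No code is hand-selected or imputed here: every code that appears in the record gets an index, and an embedding layer can then learn a representation for it.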
In this repository, we share the CovRNN models pretrained on more than 170,000 COVID-19 patients extracted from the CRWD, so that you can fine-tune a pretrained CovRNN model on a sample of your local data and then use it.
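The general fine-tuning pattern can be sketched in PyTorch as follows. `TinyEHRRNN` is a hypothetical stand-in and not the CovRNN architecture, the tensors are random placeholders rather than EHR data, and freezing the embedding layer is just one possible design choice. For the actual fine-tuning procedure and model files, follow the pretrained model folder in this repository.

```python
import torch
import torch.nn as nn


class TinyEHRRNN(nn.Module):
    """Hypothetical stand-in model: embedding -> GRU -> one logit per patient."""

    def __init__(self, vocab_size: int = 50, embed_dim: int = 16, hidden_dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, codes: torch.Tensor) -> torch.Tensor:
        emb = self.embed(codes)              # (batch, seq_len, embed_dim)
        _, h = self.rnn(emb)                 # h: (1, batch, hidden_dim)
        return self.out(h[-1]).squeeze(-1)   # (batch,) logits


def fine_tune(model, codes, labels, epochs=5, lr=1e-3, freeze_embeddings=True):
    """Continue training a pretrained model on a small local sample."""
    if freeze_embeddings:
        for p in model.embed.parameters():
            p.requires_grad = False          # keep pretrained code embeddings fixed
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    model.train()
    losses = []
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(codes), labels)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


torch.manual_seed(0)
# Stand-in for pretrained weights; with a real checkpoint this would be a
# state dictionary loaded from disk via torch.load(..., map_location="cpu").
pretrained_state = TinyEHRRNN().state_dict()

model = TinyEHRRNN()
model.load_state_dict(pretrained_state)

codes = torch.randint(1, 50, (8, 5))           # placeholder local sample: 8 patients, 5 codes each
labels = torch.randint(0, 2, (8,)).float()     # placeholder binary outcomes
losses = fine_tune(model, codes, labels)
print(len(losses))
```

Loading a state dictionary requires the model to be constructed with the same architecture and sizes as the one that was saved, which is why the pretrained models and their state dictionaries are distributed together.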
A tutorial showing how to use our comprehensive model development framework to train a new predictive model on your own data is available at https://github.com/ZhiGroup/pytorch_ehr/tree/ACM_BCB-Tutorial . The tutorial uses MIMIC IV data and very basic code to define the cohort, just as an example. We highly recommend a more rigorous definition of the cohort cases and controls, as described in our paper.
PyTorch version 1.7
Pandas
Numpy
sklearn
lifelines
sksurv
Matplotlib
tqdm
Python: 3.7+
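A quick way to check whether the packages listed above are importable in the current environment is sketched below. The helper `check_environment` is hypothetical and not part of this repository; the module names are the usual import names of the listed packages (for example `torch` for PyTorch). It only tests importability and does not verify versions.

```python
import sys
from importlib.util import find_spec

# Import names corresponding to the dependency list above.
REQUIRED_MODULES = [
    "torch", "pandas", "numpy", "sklearn",
    "lifelines", "sksurv", "matplotlib", "tqdm",
]


def check_environment(modules=REQUIRED_MODULES):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    if sys.version_info < (3, 7):
        print("Python 3.7+ is required")
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```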
Pretrained Model Usage is described in this folder, including model fine-tuning
This folder also includes the CRWD pretrained models and their state dictionaries
Rasmy L, Nigo M, Kannadath BS, Xie Z, Mao B, Patel K, Zhou Y, Zhang W, Ross AM, Xu H, Zhi D. CovRNN-A recurrent neural network model for predicting outcomes of COVID-19 patients: model development and validation using EHR data. medRxiv. 2021 Sep 29