The aim is to compare and assess methods for estimating the case fatality risk (CFR) during an infectious disease outbreak that could be easily adopted or incorporated into dashboards, while evaluating common assumptions for the different types of data available to the researcher. If adequate, these methods would maximise the speed and efficiency of analyses compared to slower but more methodologically robust statistical inference.
If we have access to individual-level data, with each case’s time of disease onset and death, we don’t need to aggregate the data into incidence (e.g., to use {cfr} functions) to obtain a reliable estimate of severity. The problem is that, during an outbreak, complete data on cases’ onsets and outcomes are often unavailable, given the high proportion of cases without a known outcome in the growing phase of an epidemic.
- Linelist with known outcomes, assuming that the onset-death delay follows the same distribution as the onset-recovery delay
- Truncated linelist with unknown delays
- Recoveries are not recorded but deaths are (cholera-like outbreak)
- Onset-death delay shorter than onset-recovery delay (Ebola-like outbreak)
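The effect of cases without a known outcome can be illustrated with a minimal sketch. It is written in plain Python and does not use {cfr}; the function name `linelist_cfr` and the outcome labels `"death"`, `"recovery"` and `None` are assumptions made for this example. It compares the naïve ratio (deaths over all cases) with the ratio restricted to cases whose outcome is known. The second ratio is only a reasonable estimate when the onset-death and onset-recovery delays follow similar distributions, which is the assumption of the first scenario above.

```python
from typing import Dict, Optional, Sequence


def linelist_cfr(outcomes: Sequence[Optional[str]]) -> Dict[str, float]:
    """Summarise severity from a linelist of case outcomes.

    Each entry is "death", "recovery", or None when the outcome
    is not yet known (labels are assumptions for this sketch).
    """
    total = len(outcomes)
    deaths = sum(1 for o in outcomes if o == "death")
    recoveries = sum(1 for o in outcomes if o == "recovery")
    resolved = deaths + recoveries
    nan = float("nan")
    return {
        # Deaths over all reported cases, resolved or not.
        "naive": deaths / total if total else nan,
        # Deaths over cases with a known outcome only.
        "resolved_only": deaths / resolved if resolved else nan,
        # Share of cases whose outcome is still unknown.
        "unknown_fraction": (total - resolved) / total if total else nan,
    }


# Hypothetical linelist from the growing phase of an outbreak:
# most cases have no recorded outcome yet.
example = ["death"] * 2 + ["recovery"] * 6 + [None] * 12
summary = linelist_cfr(example)
```

With many unresolved cases the two ratios can differ substantially, which is why the scenarios above treat the availability and timing of outcomes separately.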
Currently, the methods used by the {cfr} package to estimate the time-varying and the static CFR rely on daily incidence and daily death data. When only aggregated (e.g., weekly) data are available, we are either forced to fall back on the naïve calculation of deaths/cases, or we must allocate the weekly cases to the days of the week; sometimes this is done by assigning all cases to the first day of the week. Imputation methods, such as the one from EpiEstim, are useful for reconstructing daily incidence of disease. In this section we assess in which scenarios it is possible to use this method to estimate disease severity, depending on the quality of the data available to the researcher.
- Weekly aggregation of cases and deaths
- Weekly aggregation of cases and only total deaths
- Incidence data with known delays: comparison of discrete vs continuous distributions
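The two simple allocation strategies mentioned above, and their effect on a delay-adjusted estimate, can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the {cfr} or EpiEstim implementation: the function names, the `"first_day"` and `"uniform"` labels, and the example counts and delay distribution are all hypothetical. The adjustment divides total deaths by the expected number of cases whose outcome would be known by the end of the series, given a discrete onset-to-death delay distribution.

```python
from typing import List, Sequence


def weekly_to_daily(weekly: Sequence[int], method: str = "first_day") -> List[float]:
    """Spread weekly counts over 7 days (hypothetical helper)."""
    daily: List[float] = []
    for count in weekly:
        if method == "first_day":
            # All of the week's cases are placed on day 1.
            daily.extend([float(count)] + [0.0] * 6)
        elif method == "uniform":
            # Cases are split evenly across the 7 days.
            daily.extend([count / 7.0] * 7)
        else:
            raise ValueError(f"unknown method: {method}")
    return daily


def naive_cfr(total_cases: float, total_deaths: float) -> float:
    """Naive CFR: deaths divided by cases."""
    return total_deaths / total_cases if total_cases else float("nan")


def delay_adjusted_cfr(
    daily_cases: Sequence[float],
    total_deaths: float,
    delay_pmf: Sequence[float],
) -> float:
    """Total deaths over the expected number of cases with a known outcome.

    delay_pmf[j] is the assumed probability that the outcome
    occurs j days after onset.
    """
    known = 0.0
    for t in range(len(daily_cases)):
        for j, p in enumerate(delay_pmf):
            if t - j >= 0:
                known += daily_cases[t - j] * p
    return total_deaths / known if known > 0 else float("nan")


# Hypothetical inputs: two weeks of cases, only total deaths known.
weekly_cases = [7, 14]
total_deaths = 3
delay_pmf = [0.0, 0.5, 0.5]  # outcome 1 or 2 days after onset (assumed)

first_day = weekly_to_daily(weekly_cases, "first_day")
uniform = weekly_to_daily(weekly_cases, "uniform")

cfr_naive = naive_cfr(sum(weekly_cases), total_deaths)
cfr_first_day = delay_adjusted_cfr(first_day, total_deaths, delay_pmf)
cfr_uniform = delay_adjusted_cfr(uniform, total_deaths, delay_pmf)
```

Both allocations preserve the weekly totals, but they place cases at different distances from the end of the series, so the delay-adjusted estimates need not agree. This is the kind of sensitivity the scenarios above are meant to assess.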