
Artificial Intelligence for Industry's project on Italian COVID-19 dataset.


LIA-UniBo/AI4I-COVID-Python


AI4I-COVID-Python


In this project we explored the potential and limitations of Bayesian melding, a statistical technique that fits the input parameters of a deterministic model to stochastic observations.

The general idea behind the method is to merge different "opinions" about an observed phenomenon via statistical pooling:

  • A prior distribution on the outputs of the model ("what may reasonably happen")
  • An induced distribution, computed by applying the deterministic model to a prior distribution on the inputs ("what we expect to observe according to the model")
  • A likelihood on the inputs ("what we know has happened")
  • A likelihood on the outputs ("what we actually observe").
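The pooling idea can be illustrated in isolation. The sketch below assumes logarithmic pooling, as in the usual formulation of Bayesian melding, where two densities over the outputs are combined as $q(\phi) \propto q_1^*(\phi)^{\alpha}\, q_2(\phi)^{1-\alpha}$. The two Gaussian densities are hypothetical stand-ins for the induced distribution and the output prior, evaluated on a grid; the helper names are illustrative only.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std**2) evaluated elementwise at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def log_pool(q_a, q_b, alpha, dx):
    """Logarithmic pooling of two densities sampled on a uniform grid.

    The pooled density is proportional to q_a**alpha * q_b**(1 - alpha);
    it is renormalized with a Riemann sum over the grid spacing dx.
    """
    pooled = q_a ** alpha * q_b ** (1.0 - alpha)
    return pooled / (pooled.sum() * dx)


grid = np.linspace(-10.0, 10.0, 4001)
dx = grid[1] - grid[0]
induced = gaussian_pdf(grid, 0.0, 1.0)  # stand-in for the induced distribution q1*
prior = gaussian_pdf(grid, 2.0, 1.0)    # stand-in for the output prior q2
pooled = log_pool(induced, prior, alpha=0.5, dx=dx)

# With alpha = 0.5 and equal variances, the peak lies halfway between the two means.
print("pooled peak at:", grid[np.argmax(pooled)])
```

With $\alpha = 0.5$ both "opinions" count equally; moving $\alpha$ towards 1 trusts the model-induced distribution more, towards 0 the output prior.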

Applying the pooling operation exactly requires inverting the model. Since this is seldom possible, pooling is approximated with the SIR algorithm (sampling-importance-resampling, not to be confused with the susceptible-infected-removed model also used in this repository):

  1. Draw a large number of random samples $\theta_i$ from the input prior distribution
  2. Weight each sample $\theta_i$ according to $w_i = \left(\frac{q_2(\phi_i)}{q_1^*(\phi_i)}\right)^{1-\alpha} L_1(\theta_i)\, L_2(\phi_i)$, where:
    • $\phi_i = M(\theta_i)$ is the output of the model $M$ applied to $\theta_i$
    • $\alpha$ is the pooling factor (usually 0.5)
    • $q_2$ is the output prior
    • $q_1^*$ is the induced output prior, i.e. the output distribution obtained by propagating the input prior through the model; it can be estimated by applying the model to each sample and then performing a kernel density estimation with a Gaussian kernel
    • $L_1$ is the input likelihood
    • $L_2$ is the output likelihood
  3. Resample a smaller subset from the samples drawn in step 1, picking each $\theta_i$ with probability proportional to its weight $w_i$ instead of according to the prior distribution
  4. The distribution of the resampled values approximates the melded (posterior) input distribution, and the usual operations can be performed on it (e.g. take the mean to fit the model to the data and the variance to determine confidence).
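The four steps above can be sketched with NumPy alone. This is a minimal, hypothetical illustration, not the repository's implementation: `melding_sir`, the toy model (the product of two inputs) and all distributions below are assumptions chosen only to make the example self-contained, and the Gaussian KDE is hand-written with Scott's bandwidth rule so that no extra dependency is needed.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Density of N(mean, std**2) evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def gaussian_kde(samples, points, chunk=1000):
    """1-D Gaussian kernel density estimate of `samples`, evaluated at `points`.

    The bandwidth follows Scott's rule; points are processed in chunks to bound memory.
    """
    bw = samples.std(ddof=1) * samples.size ** (-1.0 / 5.0)
    dens = np.empty(points.size)
    for start in range(0, points.size, chunk):
        z = (points[start:start + chunk, None] - samples[None, :]) / bw
        dens[start:start + chunk] = np.exp(-0.5 * z ** 2).sum(axis=1)
    return dens / (samples.size * bw * np.sqrt(2.0 * np.pi))


def melding_sir(sample_prior, model, output_prior, input_lik, output_lik,
                alpha=0.5, n_draws=5000, n_resample=500, rng=None):
    """Sampling-importance-resampling approximation of Bayesian melding (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = sample_prior(n_draws, rng)            # step 1: draws from the input prior
    phi = model(theta)                            # phi_i = M(theta_i)
    induced = gaussian_kde(phi, phi)              # q1*(phi_i), estimated by KDE
    w = ((output_prior(phi) / induced) ** (1.0 - alpha)
         * input_lik(theta) * output_lik(phi))    # step 2: importance weights
    w = w / w.sum()
    idx = rng.choice(n_draws, size=n_resample, replace=True, p=w)  # step 3: resample
    return theta[idx]                             # step 4: approximate posterior draws


def toy_model(theta):
    """Hypothetical deterministic model: the output is the product of the two inputs."""
    return theta[:, 0] * theta[:, 1]


rng = np.random.default_rng(0)
posterior = melding_sir(
    sample_prior=lambda n, g: g.uniform(0.5, 2.0, size=(n, 2)),
    model=toy_model,
    output_prior=lambda phi: normal_pdf(phi, 2.0, 1.0),
    input_lik=lambda theta: np.ones(len(theta)),       # no direct data on the inputs
    output_lik=lambda phi: normal_pdf(1.5, phi, 0.1),  # a single observation y = 1.5
    rng=rng,
)
print("posterior mean:", posterior.mean(axis=0))
print("posterior std: ", posterior.std(axis=0))
```

Resampling with replacement is the usual SIR choice; since the KDE and the model evaluations both scale with `n_draws`, step 1 is where the cost grows with the number of input parameters.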

Bayesian melding was applied to three different epidemiological models:

  • SIR: Susceptible-infected-removed
  • SIRD: Susceptible-infected-recovered-deceased
  • SEIRD: Susceptible-exposed-infected-recovered-deceased, extended with a hidden (unobserved) E compartment and a reinfection rate.
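For reference, the three models can be written as systems of ordinary differential equations. The sketch below uses a common textbook parametrization and is an assumption, not necessarily the exact equations used in the repository: $\beta$ is the transmission rate, $\sigma$ the rate of leaving E, $\gamma$ the recovery rate, $\mu$ the death rate and $\xi$ the reinfection rate (recovered individuals returning to S). A fixed-step fourth-order Runge-Kutta integrator keeps the example dependency-free; all parameter values are illustrative only.

```python
import numpy as np


def sir(y, beta, gamma):
    """SIR: susceptible-infected-removed."""
    s, i, r = y
    new_inf = beta * s * i / y.sum()
    return np.array([-new_inf, new_inf - gamma * i, gamma * i])


def sird(y, beta, gamma, mu):
    """SIRD: removed individuals are split into recovered and deceased."""
    s, i, r, d = y
    new_inf = beta * s * i / y.sum()
    return np.array([-new_inf, new_inf - (gamma + mu) * i, gamma * i, mu * i])


def seird(y, beta, sigma, gamma, mu, xi):
    """SEIRD with an exposed compartment and reinfection (R -> S at rate xi)."""
    s, e, i, r, d = y
    new_inf = beta * s * i / y.sum()
    return np.array([
        -new_inf + xi * r,
        new_inf - sigma * e,
        sigma * e - (gamma + mu) * i,
        gamma * i - xi * r,
        mu * i,
    ])


def simulate(deriv, y0, params, days, steps_per_day=10):
    """Integrate with classic RK4; returns one row of compartments per day."""
    h = 1.0 / steps_per_day
    y = np.asarray(y0, dtype=float)
    out = [y.copy()]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = deriv(y, *params)
            k2 = deriv(y + 0.5 * h * k1, *params)
            k3 = deriv(y + 0.5 * h * k2, *params)
            k4 = deriv(y + h * k3, *params)
            y = y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        out.append(y.copy())
    return np.array(out)


# Illustrative parameters only (not fitted to the Italian data).
population = 60_000_000.0
y0 = [population - 10_000.0, 0.0, 10_000.0, 0.0, 0.0]  # S, E, I, R, D
traj = simulate(seird, y0, (0.3, 1 / 5, 1 / 10, 0.01, 1 / 180), days=120)
print("deaths after 120 days:", traj[-1, 4])
```

In every model the derivatives sum to zero, so the total population is conserved; in a melding setup, the parameter tuple passed to `simulate` plays the role of the inputs $\theta$ and the simulated curves the role of the outputs $\phi$.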

Because step 1 is very slow and suffers from the curse of dimensionality (especially for SEIRD), we also tried deterministic seeding to reduce the search space, with limited success.

The Beamer template for the slides was forked from the UniBo Beamer template and modified for the AI course at DISI.

Authors: G. Tsiotas, L.S. Lorello.

We also maintain a public dataset of the colors assigned to Italian regions at: https://github.com/tsiotas/covid-19-zone.

This dataset is updated every day and contains the color of each region starting from November 6th, 2020 (the first day on which the Government applied a color-based scheme).
