Author: Steve Walker
Last Updated: 2025-03-20
This is a (currently draft) set of companion materials for a workshop on
using macpan2
for applied public
health modelling.
(Note: I highlight areas that cannot be finalized yet, with notes like this)
- Preliminaries
- Exploration
- Starter Model Library
- Simulating Dynamics
- Relating Model Specifications to Box Diagrams
- Relating Model Specifications to Explicit Dynamics
- Basics of Modifying Model Specifications
- Compare Simulated and Observed Incidence
- Reporting Delays
- Hospital Admissions and Occupancy
- Symptom Status
- Seroprevalence
- Vaccination
- Wastewater Pathogen Concentrations
- Vital Dynamics
- Importation and Stochasticity
- Calibration
- Inference
- Stratification
Please read the syllabus before participating in this workshop. The instructor will review this syllabus with participants before the sessions start.
The syllabus provides a roadmap for the workshop, guiding participants through the exploration-calibration-inference-stratification strategy that can be used for compartmental modeling. By the end of the workshop, participants will have the skills to develop compartmental models using macpan2, navigate its documentation to address challenges, and provide constructive feedback for improving the tool.
The goal of macpan2
is to implement well-understood techniques, for
use by epidemiologists who are tasked with using data-calibrated
compartmental models to answer public health questions.
We collect some of the more technical ideas in tip boxes.
Tip boxes like this contain technical information that could interrupt the flow of ideas, but is too important to bury in footnotes. |
If you’re attending this workshop, you’re already an experienced modeller. Many of the concepts will be familiar to you, so you may gain more by doing illustrative exercises.
For your first exercise, please familiarize yourself with the package website: https://canmod.github.io/macpan2. Pay particular attention to the quickstart guide and other user guides. |
Throughout these materials I will repeatedly use several coding idioms. Please familiarize yourself with these in advance of the workshop.
We will often use the base R pipe
operator,
|>
, when it improves readability.
Instructions for setting your computer for using macpan2
are
here.
In this section you will learn how to explore and specify compartmental models, and informally compare them with observed data.
At the core of macpan2
are model specifications, which define
compartment flows and set default parameter values. These specifications
provide a systematic framework for describing system behavior, starting
with the foundational elements of compartmental models. While additional
complexities (e.g., vital dynamics) are often required, we are going to
be focusing on simple models.
The macpan2
model
library
provides some example models as a place to start.
To list available models in the macpan2 library:
show_models()
Directory | Title | Description |
---|---|---|
awareness | Awareness Models | Behaviour modifications in response to death |
hiv | HIV | A simple HIV model |
lotka_volterra_competition | Lotka-Volterra | Simple two-species competition model |
lotka_volterra_predator_prey | Lotka-Volterra | Simple predator-prey model |
macpan_base | Macpan Base | Re-implementation of the McMaster group’s COVID-19 model |
nfds | NFDS and Vaccine Design | An ecological model using population genomics to design optimal vaccines as implemented in Colijn et al. (2020) |
seir | Basic SEIR | Simple epidemic model with an exposed class |
shiver | SHIVER = SEIR + H + V | A modified SEIR model with Hospitalization and Vaccination |
si | Basic SI | A very simple epidemic model |
sir | Basic SIR | A very simple epidemic model |
sir_demog | SIR with Demography | An SIR model with birth and death |
sir_mosquito | Mosquito-Vector SIR | SIR model for mosquito vectors |
sir_waning | SIR with Waning Immunity (SIRS) | A basic SIR model with a flow from R back to S |
ww | Wastewater Model | Macpan base with an additional wastewater component |
This function displays the available models, along with their
directories, titles, and descriptions. A simpler function just lists the
models, mp_list_models
. The
mp_model_docs
function will take you to the documentation for any particular model in
this list.
To examine a specific model, such as the SIR model, load it into R:
sir = mp_tmb_library("starter_models", "sir", package = "macpan2")
print(sir)
## ---------------------
## Default values:
## quantity value
## beta 0.2
## gamma 0.1
## N 100.0
## I 1.0
## R 0.0
## ---------------------
##
## ---------------------
## Before the simulation loop (t = 0):
## ---------------------
## 1: S ~ N - I - R
##
## ---------------------
## At every iteration of the simulation loop (t = 1 to T):
## ---------------------
## 1: mp_per_capita_flow(from = "S", to = "I", rate = "beta * I / N",
## abs_rate = "infection")
## 2: mp_per_capita_flow(from = "I", to = "R", rate = "gamma", abs_rate = "recovery")
There are three parts to this specification.
- Default values: The default values that model quantities take
before each simulation begins1. In the
sir
example there are two parameters (the transmission ratebeta
and removal rategamma
), the total population size,N
, and the initial values of two state variables,I
andR
. - Before the simulation loop: Expressions that
macpan2
evaluates before the dynamical simulation loop. In thesir
example the initial value ofS
is computed. The specification could have placedS
in the defaults: in this case we compute it before the simulation loop begins instead in order to allow it to depend on other model quantities. - At every iteration of the simulation loop: Expressions that get
evaluated iteratively during the simulation loop. In the
sir
example there are two flows, which I discuss more throughout the next few sections.
You can use several functions to produce model specification objects, like sir above, can be produced using several different functions. Above I produced such an object using mp_tmb_library() , which reads pre-existing specifications in from the library. If you go to the [directory]s(https://github.com/canmod/macpan2/tree/main/inst/starter_models/sir) that defines this library model you will see the R script that actually produces the specification, using the mp_tmb_model_spec() function; you can use this function yourself to produce your own model specifications. Later we will learn about other functions that take a model specification and return a modified version of this specification. This is useful when a model needs to be modified so that it can be compared with data (e.g., account for under-reporting, time-varying transmission rates in response to school closures). |
The sir model illustrates the two kinds of expressions that can be used to define model behaviour: (1) Generic math expressions (e.g., S ~ N - I - R ) and (2) flows among compartments (e.g., mp_per_capita_flows(from ...) ). In generic math expressions, the tilde, ~ , is used to denote assignment. For example, the expression, S ~ N - I - R , says that the result of N - I - R is assigned to the quantity, S . Any function listed here can be used on the right-hand-side of these expressions. Each flow expression corresponds to a single flow between compartments. The tilde-based math expressions and flow expressions can be mixed to provide a great deal of flexibility in model specifications. |
The main thing that you will do with model specifications like sir
is
to simulate dynamics under these models. Simulation is useful to explore
scenarios and to prepare for calibrating models to data. Because
specifications come with default values for parameters and initial
states, they can be simulated from without any further information.
However, you will almost almost need to modify a built-in model
specification to apply it to your particular situation; we will discuss
some of the ways to modify these models.
Typically the first thing to do to understand and validate a new model is to simulate from it, so we will work on this immediately.
Follow this quick-start guide to simulate from the sir model using its default values. |
Simple, Consistent, and General Format for Simulation Output Simulations of macpan2 models are always return simulation outputs in the same format, a long format data frame where each output variable (e.g., S , I , R ) has a separate row for every time. This output format is simple, works well with existing R frameworks (e.g., tidyverse /ggplot2 ) and can easily be restructured to other formats (e.g., with tidyr::pivot_wider() ). This article gives a brief but full description of the data format. |
Model specification objects, such as the sir
object produced in the
previous section, are essentially textual descriptions of compartmental
model flow diagrams. Below we compare the during the simulation loop
portion of the sir
object with the corresponding flow diagram:
mp_print_during(sir)
## ---------------------
## At every iteration of the simulation loop (t = 1 to T):
## ---------------------
## 1: mp_per_capita_flow(from = "S", to = "I", rate = "beta * I / N",
## abs_rate = "infection")
## 2: mp_per_capita_flow(from = "I", to = "R", rate = "gamma", abs_rate = "recovery")
One possibly unclear aspect of the previous model printout is the
specific mathematical meaning of the mp_per_capita_flow
functions, and
their arguments.
mp_print_during(sir)
## ---------------------
## At every iteration of the simulation loop (t = 1 to T):
## ---------------------
## 1: mp_per_capita_flow(from = "S", to = "I", rate = "beta * I / N",
## abs_rate = "infection")
## 2: mp_per_capita_flow(from = "I", to = "R", rate = "gamma", abs_rate = "recovery")
You can find the technical documentation for this function (and related functions) here, but the following set of definitions summarizes the terminology.
Absolute flow rate: Total number of individuals flowing from one compartment to another in a single time-step. We call absolute flows things like infection and recovery . These names specified using the abs_rate argument in the mp_per_capita_flow() function. The numerical value of the absolute flow rate depends on the per-capita flow rate, but in a way that depends on the state update method.Per-capita flow rate: The rate at which individuals flow from one compartment to another in a single time-step, per individual in the source compartment. For example, the per-capita infection rate is often given by beta * I / N and for recovery is often gamma . These expressions are specified using the rate argument of the mp_per_capita_rate function.State update method: Mathematical framework for computing absolute flow rates from per-capita flow rates. For example, the difference equation (a.k.a. Euler ODE solver, mp_euler()) state update method results in infection = S * (beta * I / N) , with the per-capita flow rate in parentheses. For the state update method using the Runge-Kutta 4 ODE solver (mp_rk4()), the relationship between absolute flow rates and per-capita flow rates is more complex because it accounts (to some extent) for the fact that the state variables change continuously through a single time step. |
Now that these definitions are in place, let’s look at an example. One
can use the
mp_expand()
function to expand the mp_per_capita_flow()
notation to an explicit
form that is passed to the simulation engine.
sir |> mp_print_during()
## ---------------------
## At every iteration of the simulation loop (t = 1 to T):
## ---------------------
## 1: mp_per_capita_flow(from = "S", to = "I", rate = "beta * I / N",
## abs_rate = "infection")
## 2: mp_per_capita_flow(from = "I", to = "R", rate = "gamma", abs_rate = "recovery")
sir |> mp_expand() |> mp_print_during()
## ---------------------
## At every iteration of the simulation loop (t = 1 to T):
## ---------------------
## 1: infection ~ S * (beta * I/N)
## 2: recovery ~ I * (gamma)
## 3: S ~ S - infection
## 4: I ~ I + infection - recovery
## 5: R ~ R + recovery
This output illustrates that the (default) simulation framework is a discrete time system of difference equations. In the first two expressions in the expanded version the absolute flow rates are computed, which are used in the last three expressions to update the state variables.
For another example, suppose you are more comfortable with treating compartmental models as ODEs. Indeed many modellers read the SIR flow diagram as representing this specific set of ODEs.
We can easily use the sir
compartmental model object to simulate it as
an ODE using the mp_rk4()
function, which uses the Runge Kutta 4 ODE
solver.
sir |> mp_rk4() |> mp_expand() |> mp_print_during()
## ---------------------
## At every iteration of the simulation loop (t = 1 to T):
## ---------------------
## 1: k1_infection ~ S * (beta * I/N)
## 2: k1_recovery ~ I * (gamma)
## 3: k1_S ~ -k1_infection
## 4: k1_I ~ +k1_infection - k1_recovery
## 5: k1_R ~ +k1_recovery
## 6: k2_infection ~ (S + (k1_S/2)) * (beta * (I + (k1_I/2))/N)
## 7: k2_recovery ~ (I + (k1_I/2)) * (gamma)
## 8: k2_S ~ -k2_infection
## 9: k2_I ~ +k2_infection - k2_recovery
## 10: k2_R ~ +k2_recovery
## 11: k3_infection ~ (S + (k2_S/2)) * (beta * (I + (k2_I/2))/N)
## 12: k3_recovery ~ (I + (k2_I/2)) * (gamma)
## 13: k3_S ~ -k3_infection
## 14: k3_I ~ +k3_infection - k3_recovery
## 15: k3_R ~ +k3_recovery
## 16: k4_infection ~ (S + k3_S) * (beta * (I + k3_I)/N)
## 17: k4_recovery ~ (I + k3_I) * (gamma)
## 18: k4_S ~ -k4_infection
## 19: k4_I ~ +k4_infection - k4_recovery
## 20: k4_R ~ +k4_recovery
## 21: infection ~ (k1_infection + 2 * k2_infection + 2 * k3_infection + k4_infection)/6
## 22: recovery ~ (k1_recovery + 2 * k2_recovery + 2 * k3_recovery + k4_recovery)/6
## 23: S ~ S - infection
## 24: I ~ I + infection - recovery
## 25: R ~ R + recovery
Mix and Match Compartmental Models with State Update Methods Difference equations and ODEs are the two most common state update methods for simulating a compartmental model. The design of macpan2 allows different state update methods to be mixed and matched with different compartmental model specifications. This flexible modularity is a key part of the power of macpan2 – the model specification language is agnostic about the state update method. Therefore each compartmental model can be simulated in any number of ways. The full list of currently available state update methods is here. |
Here is one more example of an SIR model expansion that uses the Euler-multinomial distribution to simulate a compartmental model with process error.
sir |> mp_euler_multinomial() |> mp_expand() |> mp_print_during()
## ---------------------
## At every iteration of the simulation loop (t = 1 to T):
## ---------------------
## 1: infection ~ reulermultinom(S, (beta * I/N))
## 2: recovery ~ reulermultinom(I, gamma)
## 3: S ~ S - infection
## 4: I ~ I + infection - recovery
## 5: R ~ R + recovery
Note that all state update
methods
generate the same last three expressions for this sir
model object.
S ~ S - infection
I ~ I + infection - recovery
R ~ R + recovery
This means that all state update methods can be thought of as discrete time difference equations, but with different approaches to calculating the absolute flow rates per time step.
Embrace Difference Equations The original McMasterPandemic project was based entirely on discrete time models. Simulations from discrete time difference equations requires little math to implement and understand. This simplicity is nice in applied work where the barriers to real-world impact do not usually include mathematical novelty and elegance. Further, when modelling a real system it is natural to compare simulations with observed data, which are measured at discrete times anyways. Difference equations are not without their drawbacks however. Many results in mathematical epidemiology come from ODEs and various stochastic processes, and so when using difference equations epidemiologists might be concerned that their intuition from ODEs will not hold. Difference equations can also be less stable (e.g., state variables going below zero) than ODEs, which can become a real problem when calibrating models using trajectory matching. Therefore we make it easy to check these concerns by using a state update method like mp_rk4() (for a simple ODE solver) or mp_euler_multinomial() (for a simple process error model) to overcome these concerns whenever necessary.When using these alternative state update methods keep in mind that they too can be thought of as difference equation model. Carefully inspect the last three lines of any of the expanded sir models, which constitute the same set of difference equations no matter what state update method is used. This similarity indicates that all state update methods boil down to discrete-time difference equations, but with different approaches to calculating the absolute flow rates (e.g., infection and recovery ). These absolute rate variables describe the changes in the state variables (e.g., S , I , R ) due to different processes over a single time-step, allowing for continuous change within a time-step. This means, for example, that the time-series of the infection variable will contain the incidence over each time-step. So, for example, if each time-step represents one week, the infection variable will contain weekly incidence values.In summary, the choice of state update method will often not matter in applied work, but when it does the state update methods make it simple to explore options. |
Write out by hand the last four lines of an expanded SEIR model. Read the SEIR model from the model library and test if you were correct. |
Follow the advice here to create two copies of the SIR model: one called SI and one called SEIR. Place both of these models in a directory on your computer called macpan-workshop-library . Or just do one model as time permits.Simplify the model called SI so that it is an SI model (check your work here). Add flows to the model called SEIR so that it becomes an SEIR model (check your work here). Simulate a time-series of incidence from one of these models and plot it. |
The previous exercise described one approach to modifying existing
models, which involves copying a directory and editing the files in that
directory. Another approach uses the mp_tmb_update()
,
mp_tmb_insert()
,
mp_tmb_delete()
functions, which take a loaded model specification object (like the
sir
object above) and returns a modified version of that object.
Read this help page and be ready to discuss the differences between these functions: https://canmod.github.io/macpan2/reference/mp_tmb_insert. Use these functions to repeat the previous exercise by modifying the sir object within an R session instead of editing a model definition file. |
Now we will use data and simulations together to understand a system.
Let’s get started by loading in some familiar Ontario COVID-19 data.
release = "https://github.com/canmod/macpan2/releases/download/macpan1.5_data"
covid_on = (release
|> file.path("covid_on.RDS")
|> url()
|> readRDS()
)
covid_on |> head() |> print()
## # A tibble: 6 × 4
## province date var value
## <chr> <date> <chr> <dbl>
## 1 ON 2020-02-04 Hospitalization NA
## 2 ON 2020-02-04 ICU NA
## 3 ON 2020-02-04 Ventilator NA
## 4 ON 2020-02-04 report NA
## 5 ON 2020-02-04 newTests NA
## 6 ON 2020-02-04 death NA
covid_on |> summary()
## province date var value
## Length:11481 Min. :2020-01-16 Length:11481 Min. :-26834981
## Class :character 1st Qu.:2021-01-27 Class :character 1st Qu.: 1
## Mode :character Median :2022-03-10 Mode :character Median : 95
## Mean :2022-03-14 Mean : 638
## 3rd Qu.:2023-05-26 3rd Qu.: 694
## Max. :2024-04-16 Max. : 26834981
## NA's :1793
Hmmm … lots of suspicious things to investigate … but let’s stay focused on case reports, which I will compare with simulated incidence.
reports = (covid_on
|> filter(var == "report")
|> filter(abs(value) < 10^4) ## do not believe numbers higher than 10^4
)
(reports
|> ggplot()
+ geom_line(aes(date, value))
+ theme_bw()
)
Just to start, let’s focus on a single wave so I do not need to worry about time-varying parameters and/or the causes of waves.
early_reports = filter(reports, date < as.Date("2020-07-01"))
(early_reports
|> ggplot()
+ geom_line(aes(date, value))
+ theme_bw()
)
Before comparing these data with simulations, I update the sir
model
to make it more realistic. We do not need to be perfect at this stage,
just reasonable. The goal is to make an initial plot containing both
simulations and observed data. This will form the foundation for
modifying the model to make it capture important features of the
underlying processes.
sir_covid = mp_tmb_insert(sir
## Modify the the part of the model that
## is executed once "during" each iteration
## of the simulation loop.
, phase = "during"
## Insert the new expressions at the
## end of each iteration.
## (Inf ~ infinity ~ end)
, at = Inf
## Specify the model for under-reporting as
## a simple fraction of the number of new
## cases every time-step.
, expressions = list(reports ~ report_prob * infection)
## Update defaults to something a little more
## reasonable for early covid in Ontario.
, default = list(
gamma = 1/14 ## 2-week recovery
, beta = 2.5/14 ## R0 = 2.5
, report_prob = 0.1 ## 10% of cases get reported
, N = 1.4e7 ## ontario population
, I = 50 ## start with undetected infections
)
)
print(sir_covid)
## ---------------------
## Default values:
## quantity value
## beta 1.785714e-01
## gamma 7.142857e-02
## N 1.400000e+07
## I 5.000000e+01
## R 0.000000e+00
## report_prob 1.000000e-01
## ---------------------
##
## ---------------------
## Before the simulation loop (t = 0):
## ---------------------
## 1: S ~ N - I - R
##
## ---------------------
## At every iteration of the simulation loop (t = 1 to T):
## ---------------------
## 1: mp_per_capita_flow(from = "S", to = "I", rate = "beta * I / N",
## abs_rate = "infection")
## 2: mp_per_capita_flow(from = "I", to = "R", rate = "gamma", abs_rate = "recovery")
## 3: reports ~ report_prob * infection
Now we simulate the trajectory of the reports
variable in the model, and plot it against the observed data.
sim_early_reports = (sir_covid
|> mp_simulator(
time_steps = nrow(early_reports)
, outputs = "reports"
)
|> mp_trajectory()
|> mutate(
date = min(early_reports$date) + days(time)
)
)
comparison_early_reports = bind_rows(
list(
simulated = sim_early_reports
, observed = early_reports
)
, .id = "source"
)
(comparison_early_reports
|> ggplot()
+ geom_line(aes(date, value, colour = source))
## constrain y-axis to the range of the
## observed data, otherwise the simulations
## will make it impossible to see the data
+ coord_cartesian(ylim = c(0, max(early_reports$value, na.rm = TRUE)))
+ theme_bw() + theme(legend.position="bottom")
)
It took me a few iterations of model adjustments to make this plot, which illustrates a reasonably good fit before the peak of the wave and quite poor fit after it. Still, the next two concepts argue that this is quite a bit of progress and sets us up nicely for next steps.
The challenge introduced by reporting delays is that we cannot compare the simulated incidence values from our model with observed case reports, because there is a reporting delay. The following toy example shows how the apparent peak in the data follows the simulations.
So before comparing the simulations with observed data, we use convolution to transform the simulated incidence values into a time series of reported cases that is expected by the model. Keep reading to find out specifically how this is done.
Now we can now compare the observed and expected reported cases in a manner that accounts for reporting delays.
Now to explain how the expected reported case curve is computed. In this method we assume that each new case waits a Gamma-distributed random amount of time before it is observed. The mean and coefficient of variation of this distribution can be fitted parameters. During the COVID-19 pandemic the McMaster group found that a Gamma distribution with mean delay of 11 days and coefficient of variation of 0.25 was a reasonable model.
The following figure illustrates how a single expected case report is computed, in this case at time 26. The Gamma distribution of delay times is represented as bars giving the probability that a new case at a particular time is reported at time 26, which is given as the dashed horizontal line. Note how the reported number of cases is roughly near (but not identical) to the simulated number of cases at the peak of the delay distribution. This makes sense because we would expect most of our reports now to come from the highest probability time in the past.
In macpan2
we typically have a model that doesn’t account for delayed
reporting and we want to update it to do so before fitting to data. One
can use the mp_tmb_insert_reports()
function for this purpose.
si_with_delays = (si
|> mp_tmb_insert_reports(
incidence_name = "infection"
, mean_delay = 11
, cv_delay = 0.25
, report_prob = 1
, reports_name = "reports"
)
)
(si_with_delays
|> mp_simulator(
time_steps = 50L
, outputs = c("infection", "reports")
)
|> mp_trajectory()
|> ggplot()
+ geom_line(aes(time, value, colour = matrix))
+ theme_bw()
)
Notice how the 11
-day reporting delay specific using the mean_delay
argument of the mp_tmb_insert_reports
function has produced a
reports
variable that peaks about 9
or 10
days after the new
infection
peak. Non-linear
averaging has
shifted the individual-level delay of 11
days to something a little
lower at the population level. Still, the mp_tmb_insert_reports
function has had the desired qualitative effect of shifting the
reports
peak to the right of the infection
peak.
Read the documentation on mp_tmb_insert_reports, and have a look at the following model library examples that utilize this delay modelling functionality: shiver, macpan_base, wastewater. |
When these data are available and of high quality it is typically straightforward to compare them with simulations. One must obviously have a model containing hospital compartments (or modify an existing model to contain such compartments). One must also make sure that admissions are compared with flows into a hospital compartment and that occupancy is compared with a state variable of a hospital compartment.
The shiver model gives an example that utilizes hospital data. |
The simple compartmental models (i.e., si, sir, sir_waning, seir) do not account for variation in symptoms amongst infected individuals. However, the severity of symptoms can have a big impact on what cases get reported.
For some applications it is necessary to distinguish between different
symptom statuses, which we do by splitting up the I
box. This
splitting up can be done by replacing all calls to
mp_per_capita_flows()
that involve I
with flows that involve several
I
boxes (e.g., I_mild
, I_severe
) that group together individuals
with similar symptom status.
The macpan_base model utilizes this technique. A discussion of this technique is also given in the examples on the mp_per_capita_flow() help page. |
Modify the si model so that it has three boxes: S , I_mild , and I_severe . |
In cases where report data are broken down by symptom status, then this
technique of splitting up I
into several boxes might be sufficient.
However, if cases with different symptom statuses are combined in
reports, an additional model component must be added that weights the
relative likelihood that different symptom statuses will be reported.
This additional component is related to the reporting
delay functionality.
It is often the case that serological data can be used to estimate the degree of immunity in the population, which can in turn be used to estimate the number of individuals in recovered, vaccinated, and otherwise immune (or partially immune) compartments. The shiver model description provides an example of this technique, which involves transforming either the raw seroprevalence data so that it is comparable with different compartments or adding a variable to the model that produces a transformation of the variables tracking immune states that is comparable with the raw seroprevalence data.
If you happen to now the vaccination schedule (e.g., number of doses administered per time-step for each time-step in the simulation), one simple way to model vaccination is to create a time-varying parameter that moves a known time-dependent number of individuals from a susceptible compartment to a protected compartment. An even simpler variation is just to include a constant number of doses per day, but this approach can be too simplistic especially when investigating questions of vaccine efficiacy.
One complexity with this approach is that it can reduce the number of
susceptible individuals below zero, if the user is not careful. This can
happen when the simulated number of S
individuals drops below the
number of doses per day. This issue could be addressed by using the
minimum of S
and the known number of doses as the realized number of
doses. However this approach has severe drawbacks when calibrating,
because as we will learn in the Calibration section
macpan2
uses the
TMB package
for optimizing fit to data and TMB becomes much less efficient when we
use discontinuous functions like this. To address this complexity we can
use continuously saturating functions of S
to model the number of
doses, a technique that is covered here with the shiver
model.
This technique is also discussed in the examples on the
mp_per_capita_flow() help
page.
Wastewater models must have two kinds of compartments: those that
represent numbers of people and those that represent the concentration
of a pathogen in wastewater. Flows from a person-box into a
wastewater-box (e.g., shedding from I
to W
) requires one to use the
mp_per_capita_inflow()
function instead of the mp_per_capita_flow()
function (see
here
for details). This is because flowing from I
to W
does not cause a
reduction in the size of I
, because people are not becoming
pathogens. The mp_per_capita_inflow()
function ensures that W
is
increased by this shedding process without decreasing I
. This
technique is illustrated
here.
In many cases birth-death dynamics happen slowly enough relative to transmission dynamics that we can assume that the population size is constant, with no birth or death.
The sir_demog model includes birth and death. You can use this example to add vital dynamics to another model. Neither birth nor death can use mp_per_capita_flow() , but instead mp_per_capita_inflow() and mp_per_capita_outflow() (see here for examples and details). |
In many cases large jurisdictions can be considered closed and deterministic systems without altering conclusions. Small populations however are more strongly impacted by stochasticity.
The awareness model in the library illustrates how macpan2 can be used in these cases. |
Participants will learn how to parameterize models for making inferences about a particular population and public health problem.
We have done this already, because it is so common that the default
values provided in the model
library will
require adjustment for any particular purpose – especially for very
generic models like
sir.
See the sir_covid
model object that we produced above in this
section.
The default
argument of the
mp_tmb_model_spec()
and mp_tmb_update()
functions allows us to
supply prior information on model parameter values, but gives us no
opportunity to express uncertainty about these defaults. Later in the
Optimized Calibration section we will see how
to combine a model specification with distributional
specifications
for the parameters that we would like to estimate through calibration.
The distributional specifications are explained in the context of various use-cases in this article. |
We have covered a lot of gound above, but it has all paved the way towards the topic of formally calibrating a model using data. Before getting to this point we needed to do the following.
- Find a model specification to start with.
- Be able to simulate from this model
- Be able to modify this model so that we could compare simulations with different kinds of observed data – for example:
- Express prior uncertainty about model parameters
Now that we have this out of the way we can create a model calibrator object that combines (1) a model specification, (2) a dataset that can be compared with simulations from this model, and (3) a list of distributional assumptions about model parameters that we would like to estimate. Such a calibrator can be used to fit these parameters to data, and then simulate and forecast observable variables using this calibrated model. These calibrators can also be used to generate confidence intervals for these simulations and forecasts using (regularized) maximum likelihood theory.
This article steps through the process of calibrating a compartmental model specification, using the sanity check of fitting a model to simulations generated from the same model. The sequel to this article illustrates these techniques using real scarlet fever data from the International Infectious Disease Data Archive. |
The recommended approach to model fitting with macpan2
is maximum a
posteriori (MAP)
estimation,
which finds the values of parameters at the peak of a posterior
distribution. This approach allows us to get approximate Bayesian
uncertainty estimates by approximating the posterior distribution as a
multivariate Gaussian centered at the peak obtained by MAP. One is able
to declare that parameters be considered random effects, which means
that they are integrated out of the posterior distribution using the
Laplace
approximation.
Using this approach allows us to reduce the dimensionality of the
posterior distribution used in MAP by only optimizing over the fixed
effect parameters.
This approximate posterior distribution can be used to obtain confidence / credible intervals for any continuous function of model variables. The two main examples of this are as follows.
- Confidence intervals on fitted parameters using
mp_tmb_coef(., conf.int = TRUE)
. - Confidence intervals on fitted trajectories using
mp_trajectory_sd()
This approach to computing confidence intervals using Gaussian approximations to the posterior is a special case of the delta method that is implemented in TMB, which is described here.
Example calibrations of varying levels of complexity Intro to calibration using toy data Intro to calibration using real scarlet fever data For hackers only McMaster COVID model Wastewater model of COVID SHIVER model to COVID SEIR model using toy data |
You are free to declare any value in the model to be a trajectory or
parameter. Trajectories are compared with observed data to produce a
likelihood component (e.g., case reports are distributed with a negative
binomial distribution with dispersion parameter 1
) and parameters are
used to produce a component of the prior distribution (e.g., the
transmission rate is log normally distributed with mean 0.2
and
standard deviation 1
). Available distributional assumptions and
settings are described
here,
but inspecting the examples linked to in the previous FAQ will be more
useful for figuring out how to use this functionality.
Yes.
A common approach for making simple models (i.e., si, sir, sir_waning, seir, shiver) applicable to complex reality, is to use time-varying parameters. A common example from the COVID-19 pandemic involved the variation of the transmission rate through time in response to public health measures.
This article describes several approaches to calibrating time-varying parameters. |
Participants will learn how to make inferences using realistically parameterized models.
If you have a model_calibrator
object fitted to a dataset called
fitted_data
, you can visualize the goodness-of-fit of this model to
the data using ggplot2
in the following manner.
fitted_data = mp_trajectory(model_calibrator)
(observed_data
|> ggplot()
+ geom_point(aes(time, value))
+ geom_line(aes(time, value), data = fitted_data)
+ theme_bw()
)
This section of the article on model fitting illustrates other techniques, such as checking convergence of the optimizer, getting parameter estimates with confidence intervals, and visualizing the goodness-of-fit with confidence intervals. |
Forecasts can be made by extending the simulation bounds, and using the general tools for assessing model fits.
Counter-factuals can be introduced by using the model modification tools, and using the general tools for assessing model fits. Counter-factuals commonly involve making parameters time-varying in the forecast period (e.g., in 30 days the transmission rate will change due to non-pharmaceutical interventions being considered).
This example model includes a discussion of how to compute reproduction numbers and the moments of the generation interval distribution for a general compartmental model.
When one extends one of the simple models (i.e.,
si,
sir,
sir_waning,
seir,
shiver)
to be stratified (e.g., by age), transmission/infection processes
become more
complex.
In these cases one may specify S
, E
, and I
as vectors, and
infection rates as matrix operations involving these vectors and
matrices of transmission parameters. This approach is desecribed
here.
- Preliminaries
- Exploration
- Starter Model Library
- Simulating Dynamics
- Relating Model Specifications to Box Diagrams
- Relating Model Specifications to Explicit Dynamics
- Basics of Modifying Model Specifications
- Compare Simulated and Observed Incidence
- Reporting Delays
- Hospital Admissions and Occupancy
- Symptom Status
- Seroprevalence
- Vaccination
- Wastewater Pathogen Concentrations
- Vital Dynamics
- Importation and Stochasticity
- Calibration
- Inference
- Stratification
Footnotes
-
These defaults can be overridden but let’s ignore that for now. We will cover the various ways to override defaults later. ↩