R/apm_est.R
+29-29
@@ -1,20 +1,20 @@
1
1
#' Estimate ATTs from model fits
2
2
#'
3
-
#' @description `eepd_est()` computes the ATTs from the models previously fit by [eepd_pre()], choosing the optimal one by minimizing the largest absolute average prediction error across validation times. Optionally, this process can be simulated to arrive at a distribution of ATTs that accounts for the uncertainty in selecting the optimal model. `plot()` plots the resulting ATT(s).
3
+
#' @description `apm_est()` computes the ATTs from the models previously fit by [apm_pre()], choosing the optimal one by minimizing the largest absolute average prediction error across validation times. Optionally, this process can be simulated to arrive at a distribution of ATTs that accounts for the uncertainty in selecting the optimal model. `plot()` plots the resulting ATT(s).
4
4
#'
5
-
#' @inheritParams eepd_pre
6
-
#' @param fits an `eepd_pre_fits` object; the output of a call to [eepd_pre()].
5
+
#' @inheritParams apm_pre
6
+
#' @param fits an `apm_pre_fits` object; the output of a call to [apm_pre()].
7
7
#' @param post_time the value of the time variable considered post-treatment, for which the ATT is to be estimated.
8
-
#' @param M the sensitivity parameter for set identification. For `eepd_est()`, the default is 0, i.e., under point identification. For `summary()`, this can be set to one or more positive values to produce uncertainty bounds for each value. Only allowed when not set to 0 in the call to `eepd_est()`. See Details.
8
+
#' @param M the sensitivity parameter for set identification. For `apm_est()`, the default is 0, i.e., under point identification. For `summary()`, this can be set to one or more positive values to produce uncertainty bounds for each value. Only allowed when not set to 0 in the call to `apm_est()`. See Details.
9
9
#' @param R the number of bootstrap iterations used to compute the sampling variance of the ATT. Default is 1000. More is better but takes longer.
10
10
#' @param all_models `logical`; whether to compute ATTs for all models (`TRUE`) or just those with BMA weights greater than 0 (`FALSE`, default). This will not affect the final estimates, but leaving it as `FALSE` can speed up computation when some models have BMA weights of 0.
11
-
#' @param x,object an `eepd_est` object; the output of a call to `eepd_est()`.
11
+
#' @param x,object an `apm_est` object; the output of a call to `apm_est()`.
12
12
#' @param level the desired confidence level. Set to 0 to ignore sampling variation in computing the interval bounds. Default is .95.
13
13
#' @param label `logical`; whether to label the ATT estimates. Requires \pkg{ggrepel} to be installed. Default is `TRUE`.
14
14
#' @param size.weights `logical`; whether to size the points based on their BMA weights. Default is `TRUE`.
15
15
#'
16
16
#' @returns
17
-
#' `eepd_est()` returns an `eepd_est` object, which contains the ATT estimates and their variance estimates. The following components are included:
17
+
#' `apm_est()` returns an `apm_est` object, which contains the ATT estimates and their variance estimates. The following components are included:
18
18
#' \describe{
19
19
#' \item{BMA_att}{the BMA-weighted ATT}
20
20
#' \item{atts}{a matrix containing the ATT estimates from each model (when `all_models = FALSE`, only models with positive BMA weights are included)}
@@ -24,7 +24,7 @@
24
24
#' \item{M}{the value of the sensitivity parameter `M`}
25
25
#' \item{post_time}{the value supplied to `post_time`}
26
26
#' \item{pred_errors}{a matrix containing the difference in average prediction errors for each model and each pre-treatment validation period}
27
-
#' \item{BMA_weights}{the BMA weights computed by `eepd_pre()` (when `all_models = FALSE`, only positive BMA weights are included)}
27
+
#' \item{BMA_weights}{the BMA weights computed by `apm_pre()` (when `all_models = FALSE`, only positive BMA weights are included)}
28
28
#' \item{boot_out}{an `fwb` object containing the bootstrap results}
29
29
#' }
30
30
#'
@@ -33,35 +33,35 @@
33
33
#' `summary()` produces a table with the BMA-weighted ATT, its estimated standard error, and confidence interval limits. When `M` is greater than 0, additional rows for each value of `M` are included with the lower and upper bound. When `level` is greater than 0, these bounds include the uncertainty due to sampling and model selection; otherwise, they correspond to the set identification bounds for the ATT.
34
34
#'
35
35
#' @details
36
-
#' `eepd_est()` estimates the ATT from each model and combines them to form the BMA-weighted estimate of the ATT. Uncertainty for the BMA-weighted ATT is computed by combining two variance components, one that accounts for sampling and another that accounts for model selection. The component due to sampling is computed by bootstrapping the process of fitting the outcome model for the post-treatment outcome identified by `post_time` and computing the difference between the observed outcome mean difference and the model-predicted outcome mean difference. The fractional weighted bootstrap as implemented in [fwb::fwb()] is used to ensure no units are dropped from the analysis. In each bootstrap sample, the BMA-weighted ATT estimate is computed as the weighted average of the ATTs computed from the models using the fixed BMA weights computed by [eepd_pre()], and the variance is computed as the empirical variance over the bootstrapped estimates. The variance component due to model selection is computed as the BMA-weighted variance of the original ATTs.
36
+
#' `apm_est()` estimates the ATT from each model and combines them to form the BMA-weighted estimate of the ATT. Uncertainty for the BMA-weighted ATT is computed by combining two variance components, one that accounts for sampling and another that accounts for model selection. The component due to sampling is computed by bootstrapping the process of fitting the outcome model for the post-treatment outcome identified by `post_time` and computing the difference between the observed outcome mean difference and the model-predicted outcome mean difference. The fractional weighted bootstrap as implemented in [fwb::fwb()] is used to ensure no units are dropped from the analysis. In each bootstrap sample, the BMA-weighted ATT estimate is computed as the weighted average of the ATTs computed from the models using the fixed BMA weights computed by [apm_pre()], and the variance is computed as the empirical variance over the bootstrapped estimates. The variance component due to model selection is computed as the BMA-weighted variance of the original ATTs.
37
37
#'
38
38
#' When `M` is greater than 0, bounds for set identification and their uncertainty are additionally computed. This involves bootstrapping the fitting of the pre-period models along with post-treatment models in order to compute the maximum absolute difference in average prediction errors for each model across validation periods. Each bootstrap sample produces a margin of error for each model computed as \eqn{M \times \delta_m} where \eqn{\delta_m} is the maximum absolute difference in average prediction errors for model \eqn{m}. Upper and lower bounds for the set-identified BMA-weighted ATT are computed as \eqn{\text{ATT}_m \pm M \times \delta_m}. The same procedure as above is then used to compute the variance of these bounds.
39
39
#'
40
40
#' `summary()` displays the BMA-weighted ATT estimate, its standard error, and Wald confidence intervals. When `M` is greater than 0, bounds for the set-identified ATT are displayed in the confidence interval bound columns. The lower bound is computed as \eqn{\text{LB} - \sigma_{LB}Z_{l}} and the upper bound as \eqn{\text{UB} + \sigma_{UB}Z_{l}}, where \eqn{\text{LB}} and \eqn{\text{UB}} are the lower and upper bounds, \eqn{\sigma_{LB}} and \eqn{\sigma_{UB}} are their standard errors accounting for sampling and model selection, and \eqn{Z_{l}} is the critical Z-statistic for confidence level \eqn{l}. To display the set-identification bounds themselves, one should set `level = 0`.
41
41
#'
42
-
#' @seealso [eepd_pre()] for computing the BMA weights; [fwb::fwb()] for the fractional weighted bootstrap.
42
+
#' @seealso [apm_pre()] for computing the BMA weights; [fwb::fwb()] for the fractional weighted bootstrap.
43
43
#'
44
44
#'
45
45
#' @examples
46
46
#' data("ptpdata")
47
47
#'
48
48
#' # Combination of 4 models: 2 time trends, 2 lags
49
-
#' models <- eepd_mod(list(crude_rate ~ 1),
49
+
#' models <- apm_mod(list(crude_rate ~ 1),
50
50
#' lag = 0:1,
51
51
#' time_trend = 0:1)
52
52
#' models
53
53
#'
54
54
#' # Fit the models to data; unit_var must be supplied for
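Review note: the `@details` text above describes how the BMA-weighted ATT, its two variance components, and the per-model margins `M * delta_m` are formed. A minimal base-R sketch of that arithmetic follows. Everything in it is an assumption made for illustration: the ATT values, weights, and `delta` values are made up, plain normal draws stand in for the fractional weighted bootstrap, and the two variance components are combined by simple addition, which the documentation does not state explicitly. It is not the package's implementation.

```r
# Illustrative sketch only: all numbers are made up, and plain normal draws
# stand in for the fractional weighted bootstrap that fwb::fwb() performs.
set.seed(123)

# Hypothetical per-model ATT estimates and fixed BMA weights (weights sum to 1)
atts <- c(m1 = 1, m2 = 2, m3 = 3)
bma_weights <- c(m1 = 0.5, m2 = 0.25, m3 = 0.25)

# BMA-weighted ATT: weighted average of the per-model ATTs
bma_att <- sum(bma_weights * atts)

# Model-selection component: BMA-weighted variance of the original ATTs
var_model <- sum(bma_weights * (atts - bma_att)^2)

# Sampling component: empirical variance of the BMA-weighted ATT across
# bootstrap replicates (one row per replicate, one column per model)
R <- 200
boot_atts <- sapply(atts, function(a) rnorm(R, mean = a, sd = 0.3))
boot_bma <- as.vector(boot_atts %*% bma_weights)
var_sampling <- var(boot_bma)

# ASSUMPTION: the two components are combined by addition
se <- sqrt(var_sampling + var_model)

# Wald interval at confidence level 0.95
level <- 0.95
z <- qnorm(1 - (1 - level) / 2)
ci <- bma_att + c(-1, 1) * z * se

# Per-model set-identification margins, M * delta_m
M <- 1
delta <- c(m1 = 0.2, m2 = 0.1, m3 = 0.4)  # hypothetical max abs. prediction-error differences
lower <- atts - M * delta
upper <- atts + M * delta
```

With these made-up inputs, the weights are held fixed across replicates, matching the description that the bootstrap reuses the BMA weights computed in the pre-period step rather than re-estimating them.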
R/apm_mod.R
+18-18
@@ -1,18 +1,18 @@
1
1
#' Generate models used to fit outcomes
2
2
#'
3
-
#' @description `eepd_mod()` generates a list of models characterized by basic model formulas and other options (e.g., lags, families, etc.) that are supplied to [eepd_pre()]. These values are completely crossed to create a grid of model specifications, and multiple sets of model specifications can be combined using `c()` (see Examples).
3
+
#' @description `apm_mod()` generates a list of models characterized by basic model formulas and other options (e.g., lags, families, etc.) that are supplied to [apm_pre()]. These values are completely crossed to create a grid of model specifications, and multiple sets of model specifications can be combined using `c()` (see Examples).
4
4
#'
5
5
#' @param formula_list a list of model formulas with the outcome on the left side and predictors (or just an intercept) on the right side.
6
-
#' @param family a list of family specifications; see [family()] for allowable options. These will eventually be passed to [glm()] when fitting the models in [eepd_pre()]. `"negbin"` can also be supplied to request a negative binomial model with a log link fit using [MASS::glm.nb()]. Default is `"gaussian"` to specify a linear model.
6
+
#' @param family a list of family specifications; see [family()] for allowable options. These will eventually be passed to [glm()] when fitting the models in [apm_pre()]. `"negbin"` can also be supplied to request a negative binomial model with a log link fit using [MASS::glm.nb()]. Default is `"gaussian"` to specify a linear model.
7
7
#' @param lag a vector of integers indicating the desired outcome lags to be used as predictors. For example, a `lag` value of 3 means the outcome lagged once, twice, and three times will be included as predictors. Default is 0 for no lags.
8
-
#' @param diff_k a vector of integers indicating the desired outcome lag to be used as an offset. For example, a `diff_k` value of 1 means the prior time point's outcome will be included as an offset, equivalent to using the outcome minus its corresponding lag as the outcome of the corresponding model. Default is 0 for no lags. Any models with a `diff_k` value less than a `lag` value will be removed automatically. When used with a family with a log link, the lags are automatically log-transformed; an error will be thrown by `eepd_pre()` if nonpositive values are present in the outcome.
8
+
#' @param diff_k a vector of integers indicating the desired outcome lag to be used as an offset. For example, a `diff_k` value of 1 means the prior time point's outcome will be included as an offset, equivalent to using the outcome minus its corresponding lag as the outcome of the corresponding model. Default is 0 for no lags. Any models with a `diff_k` value less than a `lag` value will be removed automatically. When used with a family with a log link, the lags are automatically log-transformed; an error will be thrown by `apm_pre()` if nonpositive values are present in the outcome.
9
9
#' @param log a logical vector indicating whether the outcome should be log-transformed. Default is `FALSE` to use the original outcome. When `lag` or `diff_k` are greater than 0, the outcome lags will also be log-transformed if `TRUE`. When the family has a log link and `diff_k` is greater than zero, the lag in the offset will be log transformed.
10
10
#' @param time_trend a vector of integers indicating the desired powers to be included in a time trend. For example, a `time_trend` value of 2 means the time variable and its square will be included as predictors in the model. A value of 0 (the default) means time is not included as a predictor.
11
11
#' @param fixef a logical vector indicating whether unit fixed effects should be included as predictors. Default is `FALSE` to omit unit fixed effects.
12
12
#' @param identiy_only_log `logical`; whether to omit any models in which `log` is `TRUE` but the link in the `family` specification corresponds to something other than `"identity"`. Default is `TRUE`, and this should probably not be changed.
13
13
#'
14
14
#' @returns
15
-
#' An `eepd_models` object, which is a list containing the full cross (less any omitted combinations) of the model features specified in the arguments, with each combination a list. These have a `print()` method and can be combined using `c()`. Each model is named automatically, but these can be set manually using [names()] as well. Models can be removed by setting their value to `NULL`; see Examples.
15
+
#' An `apm_models` object, which is a list containing the full cross (less any omitted combinations) of the model features specified in the arguments, with each combination a list. These have a `print()` method and can be combined using `c()`. Each model is named automatically, but these can be set manually using [names()] as well. Models can be removed by setting their value to `NULL`; see Examples.
16
16
#'
17
17
#' @seealso [formula], [family]
18
18
#'
@@ -21,14 +21,14 @@
21
21
#'
22
22
#' # Combination of 8 models: 1 baseline formula,
23
23
#' # 2 families, 2 lags, 2 time trends
24
-
#' models1 <- eepd_mod(crude_rate ~ 1,
24
+
#' models1 <- apm_mod(crude_rate ~ 1,
25
25
#' family = list("gaussian", "quasipoisson"),
26
26
#' time_trend = 0:1,
27
27
#' lag = 0:1, fixef = TRUE)
28
28
#' models1
29
29
#'
30
30
#' # Add a single other model with a square time trend
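Review note: the documentation above says the supplied options are "completely crossed" into a grid, that a `lag` value of 3 adds the outcome lagged once, twice, and three times, and that a `time_trend` value of 2 adds the time variable and its square. A rough base-R sketch of those three rules follows. It does not call `apm_mod()`; the helper names `lag_terms()` and `trend_terms()`, the term labels such as `lag1`, and the use of `expand.grid()` are hypothetical and chosen only to make the counting visible.

```r
# Hypothetical re-creation of the crossing step with base R; this is not
# apm_mod() itself, only an illustration of a full cross of the options.
grid <- expand.grid(
  formula = "crude_rate ~ 1",
  family = c("gaussian", "quasipoisson"),
  time_trend = 0:1,
  lag = 0:1,
  fixef = TRUE,
  stringsAsFactors = FALSE
)

# 1 formula x 2 families x 2 time trends x 2 lags x 1 fixef value
n_models <- nrow(grid)

# A lag value of k means lags 1..k are all included (labels are hypothetical)
lag_terms <- function(k) {
  if (k > 0) paste0("lag", seq_len(k)) else character(0)
}

# A time_trend value of k means powers 1..k of time are all included
trend_terms <- function(k) {
  if (k > 0) paste0("I(time^", seq_len(k), ")") else character(0)
}

lag_terms(3)
trend_terms(2)
```

The row count of `grid` matches the "Combination of 8 models" comment in the example, since `fixef = TRUE` contributes a single value to the cross.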