Convert roxygen2 comments to markdown #115

Merged
merged 14 commits into from
Nov 22, 2021
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -72,3 +72,4 @@ BugReports: https://github.com/epiforecasts/scoringutils/issues
VignetteBuilder: knitr
Depends:
R (>= 2.10)
Roxygen: list(markdown = TRUE)
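With `Roxygen: list(markdown = TRUE)` in DESCRIPTION, roxygen2 parses markdown in the comments, and most of the diff below is a mechanical swap of Rd macros for their markdown equivalents. A rough base R sketch of the two most common substitutions follows. `rd_to_md_sketch()` is a hypothetical helper for illustration, not necessarily how the commits in this PR were produced.

```r
# Hypothetical helper: rewrites two common Rd macros into roxygen2 markdown.
# It does not handle nested braces and assumes every \link target is a function.
rd_to_md_sketch <- function(x) {
  # \code{\link{fun}}  ->  [fun()]
  x <- gsub("\\\\code\\{\\\\link\\{([^}]+)\\}\\}", "[\\1()]", x)
  # \code{expr}  ->  `expr`
  x <- gsub("\\\\code\\{([^}]+)\\}", "`\\1`", x)
  x
}

old <- "#' See \\code{\\link{pit}} for details on \\code{summarise_by}."
rd_to_md_sketch(old)
#> [1] "#' See [pit()] for details on `summarise_by`."
```

Because the link pattern assumes a function target, a link to a dataset or class would wrongly gain `()`, so every line it touches still needs a manual look.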
2 changes: 1 addition & 1 deletion R/absolute_error.R
@@ -53,7 +53,7 @@ ae_median_sample <- function(true_values, predictions) {
#' @param quantiles numeric vector that denotes the quantile for the values
#' in `predictions`. Only those predictions where `quantiles == 0.5` will
#' be kept. If `quantiles` is `NULL`, then all `predictions` and
#' `true_values` will be used (this is then the same as `absolute_error()`)
#' `true_values` will be used (this is then the same as [absolute_error()])
#' @param verbose logical, return a warning if something unexpected happens
#' @return vector with the scoring values
#' @importFrom stats median
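The `quantiles` argument documented above works as a filter on the inputs. A minimal base R sketch of that behaviour, assuming `true_values`, `predictions` and `quantiles` are aligned vectors of equal length. `ae_median_quantile_sketch()` is a hypothetical name, not the package function.

```r
# Hypothetical sketch: absolute error of the median from quantile forecasts.
ae_median_quantile_sketch <- function(true_values, predictions, quantiles = NULL) {
  if (!is.null(quantiles)) {
    # keep only the median (0.5 quantile) forecasts
    keep <- quantiles == 0.5
    true_values <- true_values[keep]
    predictions <- predictions[keep]
  }
  # with quantiles = NULL every pair is used, i.e. a plain absolute error
  abs(true_values - predictions)
}

ae_median_quantile_sketch(
  true_values = c(10, 10, 10),
  predictions = c(7, 12, 15),
  quantiles   = c(0.25, 0.5, 0.75)
)
#> [1] 2
```

The filter relies on the exact comparison `quantiles == 0.5`, so quantile levels that come out of longer floating-point calculations may need rounding first.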
12 changes: 8 additions & 4 deletions R/bias.R
@@ -34,7 +34,7 @@
#' number of Monte Carlo samples
#' @return vector of length n with the biases of the predictive samples with
#' respect to the true values.
#' @author Nikos Bosse \email{nikosbosse@gmail.com}
#' @author Nikos Bosse \email{nikosbosse@@gmail.com}
seabbs marked this conversation as resolved.
#' @examples
#'
#' ## integer valued forecasts
@@ -51,8 +51,12 @@
#' @export
#' @references
#' The integer valued Bias function is discussed in
#' Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone, 2014-15
#' Funk S, Camacho A, Kucharski AJ, Lowe R, Eggo RM, et al. (2019) Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone, 2014-15. PLOS Computational Biology 15(2): e1006785. https://doi.org/10.1371/journal.pcbi.1006785
#' Assessing the performance of real-time epidemic forecasts: A case study of
#' Ebola in the Western Area region of Sierra Leone, 2014-15 Funk S, Camacho A,
#' Kucharski AJ, Lowe R, Eggo RM, et al. (2019) Assessing the performance of
#' real-time epidemic forecasts: A case study of Ebola in the Western Area
#' region of Sierra Leone, 2014-15. PLOS Computational Biology 15(2): e1006785.
#' <doi:10.1371/journal.pcbi.1006785>


bias <- function(true_values, predictions) {
@@ -160,7 +164,7 @@ bias <- function(true_values, predictions) {
#' of the central prediction interval
#' @param true_value a single true value
#' @return scalar with the quantile bias for a single quantile prediction
#' @author Nikos Bosse \email{nikosbosse@gmail.com}
#' @author Nikos Bosse \email{nikosbosse@@gmail.com}
seabbs marked this conversation as resolved.
#' @examples
#'
#' lower <- c(6341.000, 6329.500, 6087.014, 5703.500,
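For reference, a sketch of sample-based bias in the spirit of the Funk et al. (2019) reference cited above. The definition is assumed here to be B = 1 - 2 P(X <= y) for continuous forecasts and B = 1 - (P(X <= y) + P(X <= y - 1)) for integer forecasts, with the probabilities estimated from the predictive samples. `bias_sketch()` is hypothetical; check it against the package source before relying on it.

```r
# Hypothetical sketch of sample-based forecast bias.
# true_values: vector of length n; predictions: n x N matrix of samples.
bias_sketch <- function(true_values, predictions, integer_valued = TRUE) {
  # row-wise share of samples <= the observed value (recycling aligns rows)
  p_y <- rowMeans(predictions <= true_values)
  if (integer_valued) {
    p_y_minus_1 <- rowMeans(predictions <= true_values - 1)
    1 - (p_y + p_y_minus_1)
  } else {
    1 - 2 * p_y
  }
}

samples <- matrix(1:10, nrow = 3, ncol = 10, byrow = TRUE)
bias_sketch(true_values = c(0, 5, 100), predictions = samples)
#> [1]  1.0  0.1 -1.0
```

A forecast whose samples all lie above the observation scores 1 and one whose samples all lie below scores -1, matching the "0 is good, 1 and -1 are bad" reading used in `eval_forecasts()`.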
97 changes: 47 additions & 50 deletions R/eval_forecasts.R
@@ -1,114 +1,111 @@
#' @title Evaluate forecasts
#'
#' @description The function \code{eval_forecasts} is an easy to use wrapper
#' of the lower level functions in the \code{scoringutils} package.
#' @description The function `eval_forecasts` is an easy to use wrapper
#' of the lower level functions in the \pkg{scoringutils} package.
#' It can be used to score probabilistic or quantile forecasts of
#' continuous, integer-valued or binary variables.
#'
#' @details The following metrics are used where appropriate:
#' \itemize{
#' \item {Interval Score} for quantile forecasts. Smaller is better. See
#' \code{\link{interval_score}} for more information. By default, the
#' [interval_score()] for more information. By default, the
#' weighted interval score is used.
#' \item {Brier Score} for a probability forecast of a binary outcome.
#' Smaller is better. See \code{\link{brier_score}} for more information.
#' Smaller is better. See [brier_score()] for more information.
#' \item {aem} Absolute error of the median prediction
#' \item {Bias} 0 is good, 1 and -1 are bad.
#' See \code{\link{bias}} for more information.
#' \item {Sharpness} Smaller is better. See \code{\link{sharpness}} for more
#' See [bias()] for more information.
#' \item {Sharpness} Smaller is better. See [sharpness()] for more
#' information.
#' \item {Calibration} represented through the p-value of the
#' Anderson-Darling test for the uniformity of the Probability Integral
#' Transformation (PIT). For integer valued forecasts, this p-value also
#' has a standard deviation. Larger is better.
#' See \code{\link{pit}} for more information.
#' See [pit()] for more information.
#' \item {DSS} Dawid-Sebastiani-Score. Smaller is better.
#' See \code{\link{dss}} for more information.
#' See [dss()] for more information.
#' \item {CRPS} Continuous Ranked Probability Score. Smaller is better.
#' See \code{\link{crps}} for more information.
#' See [crps()] for more information.
#' \item {Log Score} Smaller is better. Only for continuous forecasts.
#' See \code{\link{logs}} for more information.
#' See [logs()] for more information.
#' }
#'
#' @param data A data.frame or data.table with the predictions and observations.
#' Note: it is easiest to have a look at the example files provided in the
#' package and in the examples below.
#' The following columns need to be present:
#' \itemize{
#' \item \code{true_value} - the true observed values
#' \item \code{prediction} - predictions or predictive samples for one
#' \item `true_value` - the true observed values
#' \item `prediction` - predictions or predictive samples for one
#' true value. (The prediction column can only be omitted if you want to
#' score quantile forecasts in a wide range format.)}
#' For integer and continuous forecasts a \code{sample} column is needed:
#' For integer and continuous forecasts a `sample` column is needed:
#' \itemize{
#' \item \code{sample} - an index to identify the predictive samples in the
#' \item `sample` - an index to identify the predictive samples in the
#' prediction column generated by one model for one true value. Only
#' necessary for continuous and integer forecasts, not for
#' binary predictions.}
#' For quantile forecasts the data can be provided in a variety of formats. You
#' can either use a range-based format or a quantile-based format. (You can
#' convert between formats using \code{\link{quantile_to_range_long}},
#' \code{\link{range_long_to_quantile}},
#' \code{\link{sample_to_range_long}},
#' \code{\link{sample_to_quantile}})
#' convert between formats using [quantile_to_range_long()],
#' [range_long_to_quantile()],
#' [sample_to_range_long()],
#' [sample_to_quantile()])
#' For a quantile-format forecast you should provide:
#' \itemize{
#' \item {prediction} - prediction to the corresponding quantile
#' \item {quantile} - quantile to which the prediction corresponds}
#' - `prediction`: prediction to the corresponding quantile
#' - `quantile`: quantile to which the prediction corresponds
#' For a range format (long) forecast you need
#' \itemize{
#' \item \code{prediction} the quantile forecasts
#' \item \code{boundary} values should be either "lower" or "upper", depending
#' - `prediction`: the quantile forecasts
#' - `boundary`: values should be either "lower" or "upper", depending
#' on whether the prediction is for the lower or upper bound of a given range
#' \item {range} the range for which a forecast was made. For a 50\% interval
#' the range should be 50. The forecast for the 25\% quantile should have
#' the value in the \code{prediction} column, the value of \code{range}
#' should be 50 and the value of \code{boundary} should be "lower".
#' If you want to score the median (i.e. \code{range = 0}), you still
#' - `range`: the range for which a forecast was made. For a 50%% interval
Contributor

is %% correct markdown here?

Member Author

Hmm, I seem to remember that %% is the correct way to escape % in LaTeX rather than \% but I cannot find any source for this so I might have invented it 🤷 😅.

Anyways, it seems that % doesn't need to be escaped in markdown and that both \% and %% produce extra characters:

testfile <- tempfile(fileext = ".md")

write(
r"(
10%
20\%
30%%
)",
testfile
)

knitr::knit(testfile, output = stdout())
#> processing file: /tmp/RtmpPqcVnl/file301a18fe1273.md
#> 
#> 10%
#> 20\%
#> 30%%
#> A connection with                            
#> description "output"        
#> class       "textConnection"
#> mode        "wr"            
#> text        "text"          
#> opened      "opened"        
#> can read    "no"            
#> can write   "yes"

Created on 2021-08-04 by the reprex package (v2.0.0.9000)

Thanks for making me check!

#' the range should be 50. The forecast for the 25%% quantile should have
#' the value in the `prediction` column, the value of `range`
#' should be 50 and the value of `boundary` should be "lower".
#' If you want to score the median (i.e. `range = 0`), you still
#' need to include a lower and an upper estimate, so the median has to
#' appear twice.}
#' appear twice.
#' Alternatively, you can provide the forecasts in a wide range format.
#' This format needs
#' \itemize{
#' \item pairs of columns called something like 'upper_90' and 'lower_90',
#' This format needs:
#' - pairs of columns called something like 'upper_90' and 'lower_90',
#' or 'upper_50' and 'lower_50', where the number denotes the interval range.
#' For the median, you need to provide columns called 'upper_0' and 'lower_0'}
#' For the median, you need to provide columns called 'upper_0' and 'lower_0'
#' @param by character vector of columns to group scoring by. This should be the
#' lowest level of grouping possible, i.e. the unit of the individual
#' observation. This is important as many functions work on individual
#' observations. If you want a different level of aggregation, you should use
#' \code{summarise_by} to aggregate the individual scores.
#' Also not that the pit will be computed using \code{summarise_by}
#' instead of \code{by}
#' `summarise_by` to aggregate the individual scores.
#' Also note that the PIT will be computed using `summarise_by`
#' instead of `by`
#' @param summarise_by character vector of columns to group the summary by. By
#' default, this is equal to `by` and no summary takes place.
#' But sometimes you may want to summarise
#' over categories different from the scoring.
#' \code{summarise_by} is also the grouping level used to compute
#' `summarise_by` is also the grouping level used to compute
#' (and possibly plot) the probability integral transform (PIT).
#' @param metrics the metrics you want to have in the output. If `NULL` (the
#' default), all available metrics will be computed.
#' @param quantiles numeric vector of quantiles to be returned when summarising.
#' Instead of just returning a mean, quantiles will be returned for the
#' groups specified through `summarise_by`. By default, no quantiles are
#' returned.
#' @param sd if TRUE (the default is FALSE) the standard deviation of all
#' @param sd if `TRUE` (the default is `FALSE`) the standard deviation of all
#' metrics will be returned when summarising.
#' @param pit_plots if TRUE (not the default), pit plots will be returned. For
#' details see \code{\link{pit}}.
#' @param pit_plots if `TRUE` (not the default), pit plots will be returned. For
#' details see [pit()].
#' @param interval_score_arguments list with arguments for the calculation of
#' the interval score. These arguments get passed down to
#' \code{interval_score}, except for the argument `count_median_twice` that
#' `interval_score`, except for the argument `count_median_twice` that
#' controls how the interval scores for different intervals are summed up. This
#' should be a logical (default is FALSE) that indicates whether or not
#' should be a logical (default is `FALSE`) that indicates whether or not
#' to count the median twice when summarising. This would conceptually treat the
#' median as a 0\% prediction interval, where the median is the lower as well as
#' the upper bound. The alternative is to treat the median as a single quantile
#' forecast instead of an interval. The interval score would then
#' be better understood as an average of quantile scores.
#' @param summarised Summarise arguments (i.e. take the mean per group
#' specified in group_by. Default is TRUE.
#' @param verbose print out additional helpful messages (default is TRUE)
#' specified in group_by). Default is `TRUE`.
#' @param verbose print out additional helpful messages (default is `TRUE`)
#' @param forecasts data.frame with forecasts, that should follow the same
#' general guidelines as the `data` input. Argument can be used to supply
#' forecasts and truth data independently. Default is `NULL`.
@@ -118,9 +115,9 @@
#' `truth_data` should be merged on. Default is `NULL` and merge will be
#' attempted automatically.
#' @param compute_relative_skill logical, whether or not to compute relative
#' performance between models. If `TRUE` (default is FALSE), then a column called
#' performance between models. If `TRUE` (default is `FALSE`), then a column called
#' 'model' must be present in the input data. For more information on
#' the computation of relative skill, see \code{\link{pairwise_comparison}}.
#' the computation of relative skill, see [pairwise_comparison()].
#' Relative skill will be calculated for the aggregation level specified in
#' `summarise_by`.
#' @param rel_skill_metric character string with the name of the metric for which
@@ -139,7 +136,7 @@
#' forecasts, pit_sd is returned (to account for the randomised PIT),
#' but no Log Score is returned (the internal estimation relies on a
#' kernel density estimate which is difficult for integer-valued forecasts).
#' If \code{summarise_by} is specified differently from \code{by},
#' If `summarise_by` is specified differently from `by`,
#' the average score per summary unit is returned.
#' If specified, quantiles and standard deviation of scores can also be returned
#' when summarising.
@@ -190,11 +187,11 @@
#' sd = TRUE,
#' summarise_by = c("model"))
#'
#' @author Nikos Bosse \email{nikosbosse@gmail.com}
#' @author Nikos Bosse \email{nikosbosse@@gmail.com}
#' @references Funk S, Camacho A, Kucharski AJ, Lowe R, Eggo RM, Edmunds WJ
#' (2019) Assessing the performance of real-time epidemic forecasts: A
#' case study of Ebola in the Western Area region of Sierra Leone, 2014-15.
#' PLoS Comput Biol 15(2): e1006785. <doi.org/10.1371/journal.pcbi.1006785>
#' PLoS Comput Biol 15(2): e1006785. <doi:10.1371/journal.pcbi.1006785>
#' @export

eval_forecasts <- function(data = NULL,
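The input formats described under `@param data` in `eval_forecasts()` are easier to picture with a toy example. The sketch below builds one forecast in quantile format and converts it by hand to the long and wide range formats. The numbers are made up, and `to_range_long_sketch()` is a hypothetical helper that only illustrates the mapping; the package's own converters are `quantile_to_range_long()` and friends.

```r
# Toy quantile-format forecast for a single observation (made-up numbers)
quantile_format <- data.frame(
  true_value = 10,
  quantile   = c(0.25, 0.5, 0.75),
  prediction = c(8, 11, 13)
)

# Hypothetical helper, not scoringutils::quantile_to_range_long():
# quantile q bounds the central interval with range |1 - 2q| * 100.
to_range_long_sketch <- function(df) {
  q <- df$quantile
  out <- data.frame(
    true_value = df$true_value,
    prediction = df$prediction,
    range      = round(abs(1 - 2 * q) * 100, 8),
    boundary   = ifelse(q < 0.5, "lower", "upper"),
    stringsAsFactors = FALSE
  )
  # the median (range 0) has to appear twice, as both "lower" and "upper"
  median_rows <- out[q == 0.5, , drop = FALSE]
  median_rows$boundary <- "lower"
  out <- rbind(out, median_rows)
  out <- out[order(out$range, out$boundary), ]
  rownames(out) <- NULL
  out
}

range_long <- to_range_long_sketch(quantile_format)

# Wide range format: one column per boundary/range pair, e.g. lower_50, upper_50
range_wide <- data.frame(true_value = 10)
for (i in seq_len(nrow(range_long))) {
  col <- paste0(range_long$boundary[i], "_", range_long$range[i])
  range_wide[[col]] <- range_long$prediction[i]
}
```

In the long result the 25% quantile becomes the `"lower"` bound of the range-50 interval, and the median shows up once as `lower_0` and once as `upper_0`, which is why it "has to appear twice".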
4 changes: 2 additions & 2 deletions R/eval_forecasts_binary.R
@@ -2,7 +2,7 @@
#'
#' @inheritParams eval_forecasts
#' @return A data.table with appropriate scores. For more information see
#' \code{\link{eval_forecasts}}
#' [eval_forecasts()]
#'
#' @importFrom data.table ':='
#'
@@ -14,7 +14,7 @@
#' quantiles = c(0.5), sd = TRUE,
#' verbose = FALSE)
#'
#' @author Nikos Bosse \email{nikosbosse@gmail.com}
#' @author Nikos Bosse \email{nikosbosse@@gmail.com}

eval_forecasts_binary <- function(data,
by,
5 changes: 3 additions & 2 deletions R/eval_forecasts_continuous_integer.R
@@ -5,7 +5,7 @@
#' @param prediction_type character, should be either "continuous" or "integer"
#'
#' @return A data.table with appropriate scores. For more information see
#' \code{\link{eval_forecasts}}
#' [eval_forecasts()]
#'
#' @importFrom data.table ':=' as.data.table rbindlist %like%
#'
@@ -30,11 +30,12 @@
#' sd = TRUE,
#' summarise_by = c("model"))
#'
#' @author Nikos Bosse \email{nikosbosse@gmail.com}
#' @references Funk S, Camacho A, Kucharski AJ, Lowe R, Eggo RM, Edmunds WJ
nikosbosse marked this conversation as resolved.
#' (2019) Assessing the performance of real-time epidemic forecasts: A
#' case study of Ebola in the Western Area region of Sierra Leone, 2014-15.
#' PLoS Comput Biol 15(2): e1006785. <doi:10.1371/journal.pcbi.1006785>
#' @author Nikos Bosse \email{nikosbosse@@gmail.com}
#' @inherit eval_forecasts references
nikosbosse marked this conversation as resolved.


eval_forecasts_sample <- function(data,
4 changes: 2 additions & 2 deletions R/eval_forecasts_helper.R
@@ -6,7 +6,7 @@
#' @param dt the data.table operated on
#' @param varnames names of the variables for which to calculate quantiles
#' @param quantiles the desired quantiles
#' @param by grouping variable in `eval_forecasts()
#' @param by grouping variable in [eval_forecasts()]
#'
#' @return `data.table` with quantiles added
#'
@@ -30,7 +30,7 @@ add_quantiles <- function(dt, varnames, quantiles, by) {
#' Helper function used within eval_forecasts
#' @param dt the data.table operated on
#' @param varnames names of the variables for which to calculate the sd
#' @param by grouping variable in `eval_forecasts()
#' @param by grouping variable in [eval_forecasts()]
#' @importFrom data.table `%like%`
#' @return `data.table` with sd added
#'
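The helpers documented in this file attach summary statistics to the scores, grouped by `by`. As a rough illustration of the idea only (scoringutils itself works on data.table objects), here is a base R sketch for the standard deviation case. `add_sd_sketch()` and its `<varname>_sd` column naming are hypothetical.

```r
# Hypothetical base-R sketch of a per-group standard deviation helper.
# The "<varname>_sd" column naming is an assumption made for illustration.
add_sd_sketch <- function(df, varnames, by) {
  groups <- interaction(df[by], drop = TRUE)
  for (v in varnames) {
    # ave() applies sd() within each group and recycles it to every row
    df[[paste0(v, "_sd")]] <- ave(df[[v]], groups, FUN = stats::sd)
  }
  df
}

scores <- data.frame(model = c("a", "a", "b", "b"), crps = c(1, 3, 2, 2))
add_sd_sketch(scores, varnames = "crps", by = "model")
```

Every row keeps its individual score and gains the standard deviation of its group, so the result can still be summarised further afterwards.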
19 changes: 8 additions & 11 deletions R/interval_score.R
@@ -16,7 +16,7 @@
#' To improve usability, the user is asked to provide an interval range in
#' percentage terms, i.e. interval_range = 90 (percent) for a 90 percent
#' prediction interval. Correspondingly, the user would have to provide the
#' 5\% and 95\% quantiles (the corresponding alpha would then be 0.1).
#' 5%% and 95%% quantiles (the corresponding alpha would then be 0.1).
#' No specific distribution is assumed,
#' but the range has to be symmetric (i.e you can't use the 0.1 quantile
#' as the lower bound and the 0.7 quantile as the upper).
@@ -34,13 +34,13 @@
#' to alpha.
#' @param weigh if TRUE, weigh the score by alpha / 4, so it can be averaged
#' into an interval score that, in the limit, corresponds to CRPS. Default:
#' FALSE.
#' @param separate_results if TRUE (default is FALSE), then the separate parts
#' of the interval score (sharpness, penalties for over- and under-prediction
#' get returned as separate elements of a list). If you want a `data.frame`
#' instead, simply call `as.data.frmae()` on the output.
#' `FALSE`.
nikosbosse marked this conversation as resolved.
#' @param separate_results if `TRUE` (default is `FALSE`), then the separate
#' parts of the interval score (sharpness, penalties for over- and
#' under-prediction) get returned as separate elements of a list. If you want a
#' `data.frame` instead, simply call [as.data.frame()] on the output.
#' @return vector with the scoring values, or a list with separate entries if
#' \code{separate_results} is TRUE.
#' `separate_results` is `TRUE`.
#' @examples
#' true_values <- rnorm(30, mean = 1:30)
#' interval_range = rep(90, 30)
@@ -65,10 +65,7 @@
#'
#' Evaluating epidemic forecasts in an interval format,
#' Johannes Bracher, Evan L. Ray, Tilmann Gneiting and Nicholas G. Reich,
#' <arXiv:2005.12881v1>
#'
#' Bracher J, Ray E, Gneiting T, Reich, N (2020) Evaluating epidemic forecasts
#' in an interval format. \url{https://arxiv.org/abs/2005.12881}
#' <https://arxiv.org/abs/2005.12881>
#'


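As a cross-check of the `interval_range`/alpha convention and the `separate_results` components documented in `interval_score.R`, here is a minimal sketch of the unweighted interval score from the Bracher et al. reference: IS = (u - l) + 2/alpha * (l - y) * 1(y < l) + 2/alpha * (y - u) * 1(y > u), with alpha = (100 - interval_range) / 100. `interval_score_sketch()` is hypothetical, and the `weigh` option is deliberately left out.

```r
# Hypothetical sketch of the unweighted interval score (Bracher et al. 2020)
interval_score_sketch <- function(true_values, lower, upper, interval_range) {
  # a 90 (percent) interval corresponds to alpha = 0.1
  alpha <- (100 - interval_range) / 100
  sharpness <- upper - lower
  # penalty when the observation falls below the interval (forecast too high)
  overprediction <- 2 / alpha * (lower - true_values) * (true_values < lower)
  # penalty when the observation falls above the interval (forecast too low)
  underprediction <- 2 / alpha * (true_values - upper) * (true_values > upper)
  list(
    score = sharpness + overprediction + underprediction,
    sharpness = sharpness,
    overprediction = overprediction,
    underprediction = underprediction
  )
}

res <- interval_score_sketch(true_values = c(2, 5), lower = c(1, 1),
                             upper = c(3, 3), interval_range = 90)
res$score
#> [1]  2 42
```

An observation inside the interval is charged only the interval width, while one outside pays the width plus the distance to the nearest bound, scaled by 2 / alpha (here 20), so narrow intervals that miss are penalised heavily.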