tidying up refs and organizing things
bcjaeger committed Oct 8, 2023
1 parent de2b9f0 commit e883301
Showing 19 changed files with 256 additions and 249 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -25,7 +25,7 @@ Authors@R: c(
family = "Burk",
role = "rev")
)
-Description: Fit, interpret, and make predictions with oblique random survival forests. Oblique decision trees are notoriously slow compared to their axis based counterparts, but 'aorsf' runs as fast or faster than axis-based decision tree algorithms for right-censored time-to-event outcomes. Methods to accelerate and interpret the oblique random survival forest are described in Jaeger et al., (2022) <arXiv:2208.01129>.
+Description: Fit, interpret, and make predictions with oblique random survival forests. Oblique decision trees are notoriously slow compared to their axis based counterparts, but 'aorsf' runs as fast or faster than axis-based decision tree algorithms for right-censored time-to-event outcomes. Methods to accelerate and interpret the oblique random survival forest are described in Jaeger et al., (2023) <DOI: 10.1080/10618600.2023.2231048>.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
2 changes: 1 addition & 1 deletion R/orsf.R
@@ -325,7 +325,7 @@
#'
#' `r roxy_cite_jaeger_2019()`
#'
-#' `r roxy_cite_jaeger_2022()`
+#' `r roxy_cite_jaeger_2023()`
#'
#' @export
#'
2 changes: 1 addition & 1 deletion R/orsf_vi.R
@@ -76,7 +76,7 @@
#'
#' `r roxy_cite_menze_2011()`
#'
-#' `r roxy_cite_jaeger_2022()`
+#' `r roxy_cite_jaeger_2023()`
#'
#'
orsf_vi <- function(object,
13 changes: 7 additions & 6 deletions R/roxy.R
@@ -191,15 +191,16 @@ roxy_cite_jaeger_2019 <- function(){

}

-roxy_cite_jaeger_2022 <- function(){
+roxy_cite_jaeger_2023 <- function(){

roxy_cite(
authors = "Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey A, Pajewski NM",
title = "Accelerated and interpretable oblique random survival forests",
-journal = "arXiv e-prints",
-date = "2022 Aug",
-number = 'arXiv-2208',
-url = "https://arxiv.org/abs/2208.01129"
+journal = "Journal of Computational and Graphical Statistics",
+date = "Published online 08 Aug 2023",
+number = NULL,
+# doi = "10.1080/10618600.2023.2231048",
+url = "https://doi.org/10.1080/10618600.2023.2231048"
)

}
@@ -270,7 +271,7 @@ roxy_dots <- function(){
roxy_vi_describe <- function(type){

switch(type,
-'negate' = "Each variable is assessed separately by multiplying the variable's coefficients by -1 and then determining how much the model's performance changes. The worse the model's performance after negating coefficients for a given variable, the more important the variable. This technique is promising b/c it does not require permutation and it emphasizes variables with larger coefficients in linear combinations, but it is also relatively new and hasn't been studied as much as permutation importance. See [Jaeger, 2022](https://arxiv.org/abs/2208.01129) for more details on this technique.",
+'negate' = "Each variable is assessed separately by multiplying the variable's coefficients by -1 and then determining how much the model's performance changes. The worse the model's performance after negating coefficients for a given variable, the more important the variable. This technique is promising b/c it does not require permutation and it emphasizes variables with larger coefficients in linear combinations, but it is also relatively new and hasn't been studied as much as permutation importance. See [Jaeger, 2023](https://doi.org/10.1080/10618600.2023.2231048) for more details on this technique.",
'permute' = "Each variable is assessed separately by randomly permuting the variable's values and then determining how much the model's performance changes. The worse the model's performance after permuting the values of a given variable, the more important the variable. This technique is flexible, intuitive, and frequently used. It also has several [known limitations](https://christophm.github.io/interpretable-ml-book/feature-importance.html#disadvantages-9)",
'anova' = "A p-value is computed for each coefficient in each linear combination of variables in each decision tree. Importance for an individual predictor variable is the proportion of times a p-value for its coefficient is < 0.01. This technique is very efficient computationally, but may not be as effective as permutation or negation in terms of selecting signal over noise variables. See [Menze, 2011](https://link.springer.com/chapter/10.1007/978-3-642-23783-6_29) for more details on this technique.")

6 changes: 3 additions & 3 deletions README.Rmd
@@ -78,7 +78,7 @@ knitr::include_graphics('man/figures/tree_axis_v_oblique.png')

## Examples

-The `orsf()` function can fit several types of ORSF ensembles. My personal favorite is the accelerated ORSF because it has a great combination of prediction accuracy and computational efficiency (see [arXiv paper](https://arxiv.org/abs/2208.01129)).^2^
+The `orsf()` function can fit several types of ORSF ensembles. My personal favorite is the accelerated ORSF because it has a great combination of prediction accuracy and computational efficiency (see [JCGS paper](https://doi.org/10.1080/10618600.2023.2231048)).^2^

```{r, child='Rmd/orsf-fit-accelerated.Rmd'}
@@ -152,7 +152,7 @@ For more on ICE, see the [vignette](https://docs.ropensci.org/aorsf/articles/pd.

## Comparison to existing software

-Comparisons between `aorsf` and existing software are presented in our [arXiv paper](https://arxiv.org/abs/2208.01129). The paper
+Comparisons between `aorsf` and existing software are presented in our [JCGS paper](https://doi.org/10.1080/10618600.2023.2231048). The paper:

- describes `aorsf` in detail with a summary of the procedures used in the tree fitting algorithm

@@ -173,7 +173,7 @@ A more hands-on comparison of `aorsf` and other R packages is provided in [orsf
cat("1. ", aorsf:::roxy_cite_jaeger_2019(), '\n\n')
-cat("2. ", aorsf:::roxy_cite_jaeger_2022(), '\n\n')
+cat("2. ", aorsf:::roxy_cite_jaeger_2023(), '\n\n')
cat("3. ", aorsf:::roxy_cite_menze_2011())
44 changes: 23 additions & 21 deletions README.md
@@ -78,7 +78,8 @@ separating the two classes.
The `orsf()` function can fit several types of ORSF ensembles. My
personal favorite is the accelerated ORSF because it has a great
combination of prediction accuracy and computational efficiency (see
-[arXiv paper](https://arxiv.org/abs/2208.01129)).<sup>2</sup>
+[JCGS
+paper](https://doi.org/10.1080/10618600.2023.2231048)).<sup>2</sup>

``` r

@@ -144,20 +145,20 @@ using `aorsf`:
require permutation and it emphasizes variables with larger
coefficients in linear combinations, but it is also relatively new and
hasn’t been studied as much as permutation importance. See [Jaeger,
-2022](https://arxiv.org/abs/2208.01129) for more details on this
-technique.
+2023](https://doi.org/10.1080/10618600.2023.2231048) for more details
+on this technique.

``` r

orsf_vi_negate(fit)
 #> bili sex copper ast age
-#> 0.1190578208 0.0619364315 0.0290605798 0.0260108174 0.0251162396
+#> 0.1190290560 0.0619448918 0.0290622719 0.0260108174 0.0251263919
 #> stage protime edema ascites hepato
-#> 0.0237810058 0.0158443269 0.0117270641 0.0105685230 0.0092028195
+#> 0.0237725455 0.0158527871 0.0117258458 0.0105685230 0.0092045115
 #> albumin chol trt alk.phos spiders
-#> 0.0082647861 0.0041510636 0.0036548364 0.0010239241 -0.0003298163
+#> 0.0082732463 0.0041510636 0.0036632967 0.0010256161 -0.0003298163
 #> trig platelet
-#> -0.0011111508 -0.0045314656
+#> -0.0011060747 -0.0045517701
```

- **permutation**: Each variable is assessed separately by randomly
@@ -172,13 +173,13 @@ using `aorsf`:

orsf_vi_permute(fit)
 #> bili copper ast age sex
-#> 0.0514084384 0.0170611427 0.0142227933 0.0140274813 0.0131527430
+#> 0.0514033622 0.0170611427 0.0142515581 0.0140224052 0.0131459748
 #> stage protime ascites edema albumin
-#> 0.0119752045 0.0102865556 0.0098067817 0.0081730899 0.0080568255
+#> 0.0119768965 0.0102950158 0.0098067817 0.0081730899 0.0080652857
 #> hepato chol alk.phos trig spiders
-#> 0.0069734562 0.0032811220 0.0015862128 0.0014909643 0.0007811902
+#> 0.0069734562 0.0032811220 0.0015862128 0.0014943484 0.0007825752
 #> trt platelet
-#> -0.0007067631 -0.0022135241
+#> -0.0007067631 -0.0022338286
```

- **analysis of variance (ANOVA)**<sup>3</sup>: A p-value is computed
@@ -223,18 +224,18 @@ orsf_summarize_uni(fit, n_variables = 2)
#>
#> -- bili (VI Rank: 1) ----------------------------
#>
-#> |----------------- risk -----------------|
+#> |----------------- Risk -----------------|
 #> Value Mean Median 25th % 75th %
-#> 0.70 0.2074286 0.09039332 0.03827337 0.3146957
-#> 1.3 0.2261739 0.10784929 0.04915971 0.3425934
-#> 3.2 0.3071951 0.21242141 0.11889617 0.4358309
+#> 0.70 0.2094827 0.09046313 0.03827429 0.3184979
+#> 1.3 0.2283358 0.11078307 0.05347112 0.3492104
+#> 3.2 0.3090977 0.21368937 0.11889617 0.4412656
#>
#> -- sex (VI Rank: 2) -----------------------------
#>
-#> |----------------- risk -----------------|
+#> |----------------- Risk -----------------|
 #> Value Mean Median 25th % 75th %
-#> m 0.3648659 0.2572239 0.15554270 0.5735661
-#> f 0.2479179 0.1021787 0.04161796 0.3591612
+#> m 0.3667488 0.2614335 0.15611841 0.5836574
+#> f 0.2507675 0.1051310 0.04355687 0.3596206
#>
#> Predicted risk at time t = 1826.25 for top 2 predictors
```
@@ -255,7 +256,7 @@ For more on ICE, see the
## Comparison to existing software

Comparisons between `aorsf` and existing software are presented in our
-[arXiv paper](https://arxiv.org/abs/2208.01129). The paper
+[JCGS paper](https://doi.org/10.1080/10618600.2023.2231048). The paper:

- describes `aorsf` in detail with a summary of the procedures used in
the tree fitting algorithm
@@ -286,8 +287,9 @@ examples](https://docs.ropensci.org/aorsf/reference/orsf.html#tidymodels)

2. Jaeger BC, Welden S, Lenoir K, Speiser JL, Segar MW, Pandey A,
Pajewski NM. Accelerated and interpretable oblique random survival
-forests. *arXiv e-prints* 2022 Aug; arXiv-2208. URL:
-<https://arxiv.org/abs/2208.01129>
+forests. *Journal of Computational and Graphical Statistics*
+Published online 08 Aug 2023. URL:
+<https://doi.org/10.1080/10618600.2023.2231048>

3. Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA. On oblique
random forests. *Joint European Conference on Machine Learning and
10 changes: 7 additions & 3 deletions cran-comments.md
@@ -1,11 +1,15 @@
## Version 0.1.0

## R CMD check results

-Duration: 4m 3.8s
+Duration: 3m 53.1s

-0 errors v | 0 warnings v | 0 notes v
+❯ checking C++ specification ... NOTE
+Specified C++14: please drop specification unless essential
+
-R CMD check succeeded
+0 errors ✔ | 0 warnings ✔ | 1 note ✖
+
+I have specified C++14 for this release. C++14 is essential, as this release uses `std::make_unique`.

## Downstream dependencies

2 changes: 1 addition & 1 deletion man/aorsf-package.Rd
