Update Description and add image alt text (#322)
* use ekables

* add email to description as well as new website

* try-dark

* copy from pkgdown site

* copy from pkgdown site

* alt text and bump date

* spell and rebuild

* where does lib come from anyways

* try rich google results

* rebuild

* new

* rename

* greedy

* ok

* rebuild
zachmayer authored Aug 13, 2024
1 parent 280eab5 commit 4b1dc87
Showing 15 changed files with 68 additions and 28 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -31,3 +31,4 @@ check_package.R
^docs$
^pkgdown$
^README\.Rmd$
lib/
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -2,12 +2,12 @@ Package: caretEnsemble
Type: Package
Title: Ensembles of Caret Models
Version: 4.0.0
Date: 2024-08-12
Date: 2024-08-13
Authors@R: c(person(c("Zachary", "A."), "Deane-Mayer", role = c("aut", "cre", "cph"), email = "[email protected]"),
person(c("Jared", "E.", "Knowles"), role="ctb", email="[email protected]"),
person("Antón", "López", role="ctb", email="[email protected]")
person("Antón", "López", role="ctb", email="[email protected]")
)
URL: https://github.com/zachmayer/caretEnsemble
URL: http://zachmayer.github.io/caretEnsemble/, https://github.com/zachmayer/caretEnsemble
BugReports: https://github.com/zachmayer/caretEnsemble/issues
Description: Functions for creating ensembles of caret models: caretList()
and caretStack(). caretList() is a convenience function for fitting multiple
3 changes: 2 additions & 1 deletion Makefile
@@ -99,6 +99,7 @@ check:

.PHONY: check-win
check-win:
rm -rf lib/
Rscript -e "devtools:::check_win()"

.PHONY: fix-style
@@ -150,7 +151,7 @@ dev-guide:
clean:
rm -rf *.Rcheck
rm -f *.tar.gz
rm -f man/*.Rd
rm -rf man/
rm -f README.md
rm -f coverage.rds
rm -f cobertura.xml
40 changes: 24 additions & 16 deletions README.md
@@ -46,10 +46,10 @@ print(summary(models))
#> The following models were ensembled: rf, glmnet
#>
#> Model accuracy:
#> model_name metric value sd
#> <char> <char> <num> <num>
#> 1: rf RMSE 1127.974 83.50596
#> 2: glmnet RMSE 1138.137 202.80472
#> model_name metric value sd
#> <char> <char> <num> <num>
#> 1: rf RMSE 1168.108 422.9558
#> 2: glmnet RMSE 1152.298 138.8310
```

Then, use caretEnsemble to make a greedy ensemble of these models
@@ -67,23 +67,20 @@ print(greedy_stack)
#> Summary of sample sizes: 400, 400, 400, 400, 400
#> Resampling results:
#>
#> RMSE Rsquared MAE
#> 1003.391 0.9357974 577.2895
#> RMSE Rsquared MAE
#> 1096.53 0.933482 631.215
#>
#> Tuning parameter 'max_iter' was held constant at a value of 100
#>
#> Final model:
#> Greedy MSE
#> RMSE: 1010.776
#> RMSE: 1067.635
#> Weights:
#> [,1]
#> rf 0.52
#> glmnet 0.48
ggplot2::autoplot(greedy_stack, training_data = dat, xvars = c("carat", "table"))
#> rf 0.43
#> glmnet 0.57
```

<img src="man/figures/README-unnamed-chunk-3-1.png" width="100%" />

You can also use caretStack to make a non-linear ensemble

``` r
@@ -101,7 +98,7 @@ print(rf_stack)
#> Resampling results:
#>
#> RMSE Rsquared MAE
#> 894.1208 0.9518165 455.1419
#> 1005.629 0.9423501 490.6683
#>
#> Tuning parameter 'mtry' was held constant at a value of 2
#>
@@ -113,12 +110,23 @@ print(rf_stack)
#> Number of trees: 500
#> No. of variables tried at each split: 2
#>
#> Mean of squared residuals: 800137.3
#> % Var explained: 94.96
#> Mean of squared residuals: 1007817
#> % Var explained: 94.29
```

Use autoplot from ggplot2 to plot ensemble diagnostics:

``` r
ggplot2::autoplot(greedy_stack, training_data = dat, xvars = c("carat", "table"))
```

<img src="man/figures/README-greedy-stack-6-plot-1.png" alt="6 panel plot of an ensemble of models fit to the diamonds dataset. The RF model is the best and has the highest weight. The residual plots look good. RMSE is about `r round(min(greedy_stack$ens_model$results$RMSE))`." width="100%" />

``` r
ggplot2::autoplot(rf_stack, training_data = dat, xvars = c("carat", "table"))
```

<img src="man/figures/README-unnamed-chunk-4-1.png" width="100%" />
<img src="man/figures/README-unnamed-chunk-5-1.png" alt="6 panel plot of an ensemble of models fit to the diamonds dataset. The RF model is the best and has the highest weight. The residual plots look good. RMSE is about `r round(min(rf_stack$ens_model$results$RMSE))`." width="100%" />

# Installation

9 changes: 8 additions & 1 deletion README.rmd
@@ -54,13 +54,20 @@ Then, use caretEnsemble to make a greedy ensemble of these models
```{r}
greedy_stack <- caretEnsemble::caretEnsemble(models)
print(greedy_stack)
ggplot2::autoplot(greedy_stack, training_data = dat, xvars = c("carat", "table"))
```

You can also use caretStack to make a non-linear ensemble
```{r}
rf_stack <- caretEnsemble::caretStack(models, method = "rf")
print(rf_stack)
```

Use autoplot from ggplot2 to plot ensemble diagnostics:
```{r greedy-stack-6-plot, fig.alt="6 panel plot of an ensemble of models fit to the diamonds dataset. The RF model is the best and has the highest weight. The residual plots look good. RMSE is about `r round(min(greedy_stack$ens_model$results$RMSE))`."}
ggplot2::autoplot(greedy_stack, training_data = dat, xvars = c("carat", "table"))
```
```{r, fig.alt="6 panel plot of an ensemble of models fit to the diamonds dataset. The RF model is the best and has the highest weight. The residual plots look good. RMSE is about `r round(min(rf_stack$ens_model$results$RMSE))`."}
ggplot2::autoplot(rf_stack, training_data = dat, xvars = c("carat", "table"))
```
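
In the rendered README.md hunk above, the inline `` `r ...` `` expression inside `fig.alt` shows up verbatim in the `alt` attribute, which suggests that inline code is not evaluated inside a quoted chunk option. Since knitr parses chunk options as R expressions, one possible workaround is to build the alt text as an ordinary string. The following is a sketch only and is not part of this commit; `rmse_value` and `alt_text` are hypothetical names, and `rmse_value` stands in for `min(greedy_stack$ens_model$results$RMSE)`.

```r
# Stand-in for min(greedy_stack$ens_model$results$RMSE); 1096.53 is the RMSE
# printed in the README output above and is used here only for illustration.
rmse_value <- 1096.53

# Build the alt text as a plain R string instead of embedding inline `r ...` code.
alt_text <- paste0(
  "6 panel plot of an ensemble of models fit to the diamonds dataset. ",
  "RMSE is about ", round(rmse_value), "."
)
print(alt_text)
```

A later chunk header could then pass the string as an expression, for example `{r greedy-stack-6-plot, fig.alt = alt_text}`, assuming `alt_text` is defined in an earlier chunk.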

10 changes: 10 additions & 0 deletions _pkgdown.yml
@@ -1,4 +1,14 @@
url: http://zachmayer.github.io/caretEnsemble/
template:
bootstrap: 5
light-switch: true
bslib:
primary: "#0054AD"
border-radius: 0.5rem
btn-border-radius: 0.25rem
danger: "#A6081A"
opengraph:
image:
src: man/figures/README-greedy-stack-6-plot.png
alt: "6 panel plot of an ensemble of models fit to the diamonds dataset. The RF model is the best and has the highest weight. The residual plots look good."

3 changes: 3 additions & 0 deletions inst/WORDLIST
@@ -10,6 +10,7 @@ CMD
CodeFactor
codebase
coercible
dat
Deane
defaultControl
dev
@@ -51,12 +52,14 @@ prob
probs
readme
resid
rf
roxygen
rpart
's
savePredictions
scalability
scikit
setosa
SDs
trainControl
travis
3 changes: 2 additions & 1 deletion man/caretEnsemble.Rd

Some generated files are not rendered by default.

Binary file added man/figures/README-greedy-stack-6-plot-1.png
Binary file removed man/figures/README-unnamed-chunk-2-1.png
Binary file not shown.
Binary file removed man/figures/README-unnamed-chunk-3-1.png
Binary file not shown.
Binary file removed man/figures/README-unnamed-chunk-4-1.png
Binary file not shown.
Binary file added man/figures/README-unnamed-chunk-5-1.png
14 changes: 11 additions & 3 deletions vignettes/Version-4.0-New-Features.Rmd
@@ -1,5 +1,5 @@
---
title: "Version-4.0-New-Features"
title: "Version 4.0 New Features"
author: "Zach Deane-Mayer"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
@@ -45,7 +45,13 @@ caretStack (and by extension, caretEnsemble) now supports various S3 methods:
```{r}
print(ens)
print(summary(ens))
```

```{r, fig.alt="A dot and whisker plot of ROC for glmnet, rpart, and an ensemble. The ensemble has the highest ROC and is slightly better than the glmnet. The rpart model is bad."}
plot(ens)
```

```{r, fig.alt="A 4-panel plot for glmnet, rpart, and an ensemble. The ensemble has the highest ROC and is slightly better than the glmnet. The rpart model is bad. The glmnet has the highest weight, and the residuals look biased."}
ggplot2::autoplot(ens)
```

@@ -127,14 +133,16 @@ print(transfer_ens)
We can also predict on new data:
```{r}
preds <- predict(transfer_ens, newdata = head(new_data))
print(preds)
knitr::kable(preds, format = "markdown")
```

# Permutation Importance
Permutation importance is now the default method for variable importance in caretLists and caretStacks:
```{r}
importance <- caret::varImp(transfer_ens)
print(importance)
print(round(importance, 2L))
```

Note that the ensemble uses rpart to classify the easy class (setosa) and then uses the rf to distinguish between the 2 more difficult classes.
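
To make the idea of permutation importance concrete, here is a minimal base-R sketch: shuffle one predictor at a time and measure how much the prediction error grows. This is an illustration under simplifying assumptions (regression, RMSE, in-sample data) and is not caretEnsemble's implementation; `perm_importance` is a hypothetical helper, and the `lm` fit on `mtcars` is just a stand-in model.

```r
# Hypothetical helper: permutation importance for any model with a predict() method.
perm_importance <- function(model, x, y, n_rep = 10L) {
  rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
  base_err <- rmse(y, predict(model, newdata = x))
  vapply(names(x), function(v) {
    errs <- vapply(seq_len(n_rep), function(i) {
      x_perm <- x
      # Shuffle one column, breaking its relationship with the outcome
      x_perm[[v]] <- sample(x_perm[[v]])
      rmse(y, predict(model, newdata = x_perm))
    }, numeric(1L))
    # Importance = average increase in error after shuffling this column
    mean(errs) - base_err
  }, numeric(1L))
}

set.seed(42L)
x <- mtcars[, c("wt", "hp", "qsec")]
y <- mtcars$mpg
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
imp <- perm_importance(fit, x, y)
print(round(imp, 2L))
```

Larger values indicate predictors whose shuffling hurts the model more. Averaging over several shuffles (`n_rep`) reduces the noise from any single permutation.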

This completes our demonstration of the key new features in caretEnsemble 4.0. These enhancements provide greater flexibility, improved performance, and easier usage for ensemble modeling in R.
7 changes: 4 additions & 3 deletions vignettes/caretEnsemble-intro.Rmd
@@ -49,7 +49,7 @@ print(summary(model_list))
We can use the `predict` function to extract predictions from this object for new data:
```{r}
p <- predict(model_list, newdata = head(testing))
print(p)
knitr::kable(p, format = "markdown")
```
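
As a rough illustration of what the switch from `print()` to `knitr::kable()` does, the sketch below renders a small data frame as a markdown table. The data frame `preds` and its values are made up for this example and are not output from the vignette; the only assumption about the API is `knitr::kable()` with its `format` and `digits` arguments.

```r
# Made-up class probabilities, standing in for the output of predict()
preds <- data.frame(M = c(0.912, 0.134), R = c(0.088, 0.866))

# Render as a markdown table, rounding to two decimal places
tbl <- knitr::kable(preds, format = "markdown", digits = 2L)
print(tbl)
```

Compared with `print()`, the markdown table is rendered as a real HTML table in the built vignette, which tends to read better on a pkgdown site.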

If you desire more control over the model fit, use the `caretModelSpec` to construct a list of model specifications for the `tuneList` argument. This argument can be used to fit several different variants of the same model, and can also be used to pass arguments through `train` down to the component functions (e.g. `trace=FALSE` for `nnet`):
@@ -71,7 +71,7 @@ Finally, you should note that `caretList` does not support custom caret models.

## caretEnsemble
`caretList` is the preferred way to construct a list of caret models in this package, as it will ensure the resampling indexes are identical across all models. Let's take a closer look at our list of models:
```{r}
```{r, fig.alt="X/Y scatter plot of rpart vs glmnet AUCs on the Sonar dataset. The glmnet model is better for 4 out of 5 resamples."}
lattice::xyplot(caret::resamples(model_list))
```

@@ -100,7 +100,8 @@ The ensemble has an AUC on the training set resamples of `r round(auc[1, 'ensemb

Note that the levels for the Sonar Data are "M" and "R", where M is level 1 and R is level 2. "M" stands for "metal cylinder" and "R" stands for rock. M is the positive class, so we exclude class 2L from our predictions. You can set excluded_class_id = 0L
```{r}
predict(greedy_ensemble, newdata = head(testing), excluded_class_id = 0L)
p <- predict(greedy_ensemble, newdata = head(testing), excluded_class_id = 0L)
knitr::kable(p, format = "markdown")
```

We can also use varImp to extract the variable importances from each member of the ensemble, as well as the final ensemble model:
