-
Notifications
You must be signed in to change notification settings - Fork 0
/
examples.Rmd
executable file
·213 lines (132 loc) · 8.43 KB
/
examples.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
---
title: "Getting Started"
output:
html_document:
toc: true
toc_float:
collapsed: false
smooth_scroll: true
---
```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
# Use vertical split by default in this Rmd document
knit_print.htmlwidget <- function(x, ...) {
# Get the chunk height
height <- knitr::opts_current$get("height")
if (length(height) > 0 && height != FALSE)
x$height <- height
else
x$height <- "450px"
htmlwidgets:::knit_print.htmlwidget(x, ...)
}
```
Currently, the `autoCaret` package is not hosted on CRAN, but can be obtained from GitHub. To do so, first make sure you have [devtools](https://cran.r-project.org/web/packages/devtools/index.html) installed.
``` {r, eval=FALSE}
install.packages("devtools")
```
To install the first _preview release_ of `autoCaret` from GitHub:
```{r eval=FALSE}
devtools::install_github("gregce/autoCaret")
```
Once the `autoCaret` package is installed, you may access its functionality as you would any other package by calling:
``` {r, eval=FALSE}
library("autoCaret")
```
Additionally, the RStudio IDE includes integrated support for using autoCaret as an add-in. These features are available in the current [Release of RStudio](https://www.rstudio.com/products/rstudio/download/).
## Using `autoModel`
We begin by loading the `mlbench` package and some example data `Sonar` which is commonly used to binary classification problems. In this example, we will attempt to distinguish Mines (M) from Rocks (R) using binary classification with an initial dataset of where N=208 and P=60.
As a general rule, when using `autoCaret::autoModel` defaults, datasets less than 100 mb should yield optimal performance and likely avoid extremely long run times & high memory requirements.
```{r, eval=TRUE}
library(mlbench)
library(autoCaret)
# Load the data into Memory from the mlbench package
data("Sonar")
# Summarize Sonar's Size
dim(Sonar)
# Check out Variable Names
names(Sonar)
```
Having both the data loaded and having inspected it, we can now make use of the `autoCaret::autoModel()` function.
In this example, we'd like to try and distinguish Rocks (R) from Mines (R), so we will attempt to predict the `Class` variable in the Sonar dataframe.
Using it's defaults, `autoModel` has 2 arguments we need to specify: `df` and `y`.
`df` is the Dataframe that we'd like to use build a binary classification model, while `y` is our classification target or response variable. We can use a non-exported package function, `autoCaret:::checkBinaryTrait` to determine if our `y` variable is indeed binary. The `autoModel` functionality will perform this for us as well.
```{r, eval=TRUE, cache=TRUE, warning=FALSE, message=FALSE}
# Manually check that our intended y paramter is indeed binary
autoCaret:::checkBinaryTrait(Sonar$Class)
# Generate an autoCaret object using the autoModel function
mod <- autoCaret::autoModel(df = Sonar, y = Class, progressBar = FALSE)
```
<br>
## Exploring an `autoCaret` object
In the example above, the returned object, `mod`, is an `autoCaret` object containing 16 objects. To confirm, we can run the below two commmands:
```{r eval=TRUE, warning=FALSE, message=FALSE}
# Check class of autoCaret object
class(mod)
# High level
nrow(summary.default(mod))
```
Running the summary function on our model output displays a wealth of information about the contents of the object as well as the procedural steps taken during modeling. In our example, we observe:
- that our initial dataset of 208 observation was split into a training and test set containing 167 and 41 observations
- Modeling took .64 minutes and entailed resampling our dataset 10 times
- We used the four default models to create an ensemble.
- Using the ensemble model that was generated to predict on the test set yield predictions with 92% accuracy.
```{r eval=TRUE, warning=FALSE, message=FALSE}
# Use the summary generic to store a summary of autoCaret object
overview <- summary(mod)
```
We can also access each of the object variables included in the above displayed summary output via the object itself.
```{r eval=TRUE, warning=FALSE, message=FALSE}
# Print the overview to the console
overview
```
<br>
## Predicting new data
So now that we have a sense of how successful our auto modeling approach was, we'd likely want to use the model, `mod`, we built previously to make predictions on new data we receive.
Because this is an illustrative example, we'lll take a shortcut by just resampling the same data that we used to train on. The main point here is that you can simply pass your `autoCaret` model object, `mod`, into the `predict()` function along with new observations to generate predictions.
```{r eval=TRUE, warning=FALSE, message=FALSE}
#For the sake of example, simulate new data by resampling our original data frame
new <- Sonar[sample(1:nrow(Sonar), 50, replace=TRUE),]
#Make predicitons
preds <- predict(mod, new)
#Print Predictions
preds
```
How well did we do? Well a confusion matrix from the `caret` package can tell us!
- We only mispredicted one example, for overall accuracy of .98
- **Note:** we wouldn't expect this level of accuracy using real data given we resampled from our original training set.
```{r eval=TRUE, warning=FALSE, message=FALSE}
## How well did we do?
caret::confusionMatrix(data = preds, reference = new$Class)
```
<br>
## Using the `autoCaret` add-in
As an alternative option to the CLI, we provide an intuitive GUI in the form an [RStudio Addin](https://rstudio.github.io/rstudioaddins/).
The add-in may be accessed in two ways:
- Via the Addin menu following loading `autoCaret` via `library(autoCaret)` in [a recent release of Rstudio]( https://www.rstudio.com/products/rstudio/download/)
- By running the following non-exported function autoCaret:::autoCaretUI().
### Setup Screen
When launched, you'll be presented with the following screen.
<img src="images/examples/initial_screen.png" id="overview-image" width=550 height=400 align="middle"/>
From here you may either:
- Upload a file containing your dataset
- Select the dataset you wish to model from your local `R` environment
**Note:** There will be nothing available to list if you have yet to load a `data.frame` in your local Global environment.
To illustrate the GUI's use, let's use the same `Sonar` dataset from the above examples:
<img src="images/examples/sonar_selected.png" id="overview-image" width=550 height=400 align="middle"/>
With the Sonar data.frame selected, we'll be presented with a preview of our data to visually inspect. Once we select the appropriate variable to predict, clicking `Run autoCaret` will launch the underlying process.
When modeling completes, the other tabs in the Addin are enabled and you may select them to view their contents.
### Model Summary
The model summary screen provides an interactive way for you to review the resultant model object created when running `autoCaret`.
<img src="images/examples/model_summary.png" id="overview-image" width=550 height=400 align="middle"/>
Clicking on the left hand side results visualization provides a layer of interactivity. This is a useful for comparing the results of various models tried during the training phase and for gaining a better intuitive sense of the strengths and weaknesses of your models.
### Predicting new data
Like in the section above, we use the `new` data.frame to simulate predictions.
```{r eval=TRUE, warning=FALSE, message=FALSE}
#For the sake of example, simulate new data by resampling our original data frame
new <- Sonar[sample(1:nrow(Sonar), 50, replace=TRUE),]
```
Having it pre-exist in our Global Environment, we can select it from dropbown on the `Use your model` tab. Once we compute predictions, they will be saved in our Environment and named _autoModelPredictionResult_.
<img src="images/examples/predict_new.png" id="overview-image" width=550 height=400 align="middle"/>
### Learning more
Finally, during the modelling process, `autoCaret` runs a variety of diagnostics that help validate the underlying distribution and features of your data prior to pre-processing. Depending on what we find, we surface these as qualitative suggestions on the `learning more tab`. Feel free to review and incorporate these suggestions in the data cleaning step as you build <a href="https://en.wikipedia.org/wiki/Binary_classification" target="_top">binary classifiers</a> with `autoCaret`.
<img src="images/examples/learn_more.png" id="overview-image" width=550 height=400 align="middle"/>