diff --git a/06-fitting-models.Rmd b/06-fitting-models.Rmd
index afd21d1..54557d3 100644
--- a/06-fitting-models.Rmd
+++ b/06-fitting-models.Rmd
@@ -21,7 +21,7 @@ Specifically, we will focus on how to `fit()` and `predict()` directly with a `r
 
 Once the data have been encoded in a format ready for a modeling algorithm, such as a numeric matrix, they can be used in the model building process.
 
-Suppose that a linear regression model was our initial choice. This is equivalent to specifying that the outcome data is numeric and that the predictors are related to the outcome in terms of simple slopes and intercepts:
+Suppose that a linear regression model was our initial choice. This is equivalent to specifying that the outcome data are numeric and that the predictors are related to the outcome in terms of simple slopes and intercepts:
 
 $$y_i = \beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi}$$
 
@@ -121,7 +121,7 @@ lm_form_fit
 lm_xy_fit
 ```
 
-[^fitxy]: What are the differences between `fit()` and `fit_xy()`? The `fit_xy()` function always passes the data as is to the underlying model function. It will not create dummy/indicator variables before doing so. When `fit()` is used with a model specification, this almost always means that dummy variables will be created from qualitative predictors. If the underlying function requires a matrix (like glmnet), it will make the matrix. However, if the underlying function uses a formula, `fit()` just passes the formula to that function. We estimate that 99% of modeling functions using formulas make dummy variables. The other 1% include tree-based methods that do not require purely numeric predictors. See Section \@ref(workflow-encoding) for more about using formulas in tidymodels.
+[^fitxy]: What are the differences between `fit()` and `fit_xy()`? The `fit_xy()` function always passes the data as they are to the underlying model function. It will not create dummy/indicator variables before doing so. When `fit()` is used with a model specification, this almost always means that dummy variables will be created from qualitative predictors. If the underlying function requires a matrix (like glmnet), it will make the matrix. However, if the underlying function uses a formula, `fit()` just passes the formula to that function. We estimate that 99% of modeling functions using formulas make dummy variables. The other 1% include tree-based methods that do not require purely numeric predictors. See Section \@ref(workflow-encoding) for more about using formulas in tidymodels.
 
 Not only does `r pkg(parsnip)` enable a consistent model interface for different packages, it also provides consistency in the model arguments. It is common for different functions that fit the same model to have different argument names. Random forest model functions are a good example. Three commonly used arguments are the number of trees in the ensemble, the number of predictors to randomly sample with each split within a tree, and the number of data points required to make a split. For three different R packages implementing this algorithm, those arguments are shown in Table \@ref(tab:rand-forest-args).
 