20 changes: 10 additions & 10 deletions 03_generalization_and_cv/01-cross_validation_and_metrics.ipynb
@@ -10,7 +10,7 @@
"\n",
"## Table of contents\n",
"\n",
"* [1 The benefits of cross-calidation](#benefitscv)\n",
"* [1 The benefits of cross-validation](#benefitscv)\n",
" * [1.1 Load our dataset](#benefitscv_load)\n",
" * [1.2 Empirical error vs generalization error](#benefitscv_empirical)\n",
" * [1.3 A single error is not enough... what about the variance?](#benefitscv_single)\n",
@@ -23,7 +23,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"A core question in machine learning is how to evaluate the performance of a model once it's parameters are estimated (i.e. the model has been trained). In this notebook, we aim at presenting how you should answer this question in a statistically sound way. First, we will present the benefits of using cross-validation for this task and then have a quick look at different strategies and metrics that one should use in supervised learning."
"A core question in machine learning is how to evaluate the performance of a model once its parameters are estimated (i.e. the model has been trained). In this notebook, we aim at presenting how you should answer this question in a statistically sound way. First, we will present the benefits of using cross-validation for this task and then have a quick look at different strategies and metrics that one should use in supervised learning."
]
},
{
@@ -465,11 +465,11 @@
" <h3>Generalization error:</h3>\n",
" The aim of model training is to select the model $f$ out of a class of models $\\mathcal F$ that minimizes a measure of the risk. The risk is measured with a loss $l$ between the true value $y$ associated to $x$ and the prediction $f(x)$ and thus we want to find: \n",
" $$\n",
" f^\\star = \\arg\\min_{f \\in \\mathcal F}\\mathbb E_{(x, y) \\sim \\pi}[l(f(x), y]\n",
" f^\\star = \\arg\\min_{f \\in \\mathcal F}\\mathbb E_{(x, y) \\sim \\pi}[l(f(x), y)]\n",
" $$ \n",
" The issue is that we cannot compute the expectation $\\mathbb E_{(x, y) \\sim \\pi}$ because we don't know the input distribution $\\pi$. Therefore, we approximate it with a set of examples $\\{(x_1, y_1), \\dots (x_N, y_N)\\}$ drawn <i>i.i.d.</i> from $\\pi$ and use the <b>Empirical Risk Minimization</b> (ERM):\n",
" $$\n",
" \\widehat{f} = \\arg\\min_{f \\in \\mathcal F}\\frac1N\\sum_{i=1}^Nl(f(x_i), y_i]\n",
" \\widehat{f} = \\arg\\min_{f \\in \\mathcal F}\\frac1N\\sum_{i=1}^Nl(f(x_i), y_i)\n",
" $$\n",
" If the samples are drawn independently, we know that the error has a variance of $\\mathcal O\\left(\\frac{1}{\\sqrt{N}}\\right)$. Thus there is a gap between the minimizer of the risk and the minimizer of the empirical risk. If we optimize too much for the ERM, the gap might be big and the selected model will have bad performance on unseen data. This is what is called <b>overfitting</b>. To control, this, one need to have a measure of the risk independent from the measure of the risk which is used to select the model: the Empirical Risk on the test set!\n",
"</div>\n"
@@ -486,14 +486,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"While we were able to estimate the generalization error, we are indeed unable to know anything about the variance of our model and thus if it is robust or not. This is where the framework of cross-validation is used. Indeed, we can repeat our experiment and compute several time our generalization error and get intuition about the stability of our model."
"While we were able to estimate the generalization error, we are indeed unable to know anything about the variance of our model and thus if it is robust or not. This is where the framework of cross-validation is used. Indeed, we can repeat our experiment and compute several times our generalization error and get intuition about the stability of our model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The simplest way that we can think of is to shuffle our data and split it into two sets as we previously did and repeat several time our experiment. In scikit-learn, using the function `cross_validate` with the cross-validation `ShuffleSplit` allows us to make such evaluation."
"The simplest way that we can think of is to shuffle our data and split it into two sets as we previously did and repeat several times our experiment. In scikit-learn, using the function `cross_validate` with the cross-validation `ShuffleSplit` allows us to make such evaluation."
]
},
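As a hedged illustration of the cell above: `cross_validate` and `ShuffleSplit` are the scikit-learn tools the text names, while the California housing dataset and the decision tree regressor are assumptions on my part. The repeated shuffle-and-split evaluation could look like:

```python
# Sketch of repeated shuffled train/test splits with cross_validate +
# ShuffleSplit; the dataset and regressor are illustrative assumptions.
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import ShuffleSplit, cross_validate
from sklearn.tree import DecisionTreeRegressor

X, y = fetch_california_housing(return_X_y=True)

cv = ShuffleSplit(n_splits=30, test_size=0.2, random_state=0)
cv_results = cross_validate(
    DecisionTreeRegressor(random_state=0), X, y,
    cv=cv, scoring="neg_mean_absolute_error",
)
errors = -cv_results["test_score"]  # back to positive MAE; target is in 100 k$ units
print(f"MAE: {errors.mean():.3f} +/- {errors.std():.3f} (x 100 k$)")
```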
{
@@ -902,7 +902,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that the median value range from 50 k\\\\$ up to 500 k\\\\$. Thus an error range of 3 k\\\\$ means that our cross-validation results can be trusted and do not suffer from an execessive variance. Regarding the performance of our model itself, we can see that making an error of 45 k\\\\$ would be problematic even more if this happen for housing with low value. However, we also see some limitation regarding the metric that we are using. Making an error of 45 k\\\\$ for a target at 50 k\\\\$ and at 500 k\\\\$ should not have the same impact. We should instead use the mean absolute percentage error which will give a relative error."
"We see that the median value range from 50 k\\\\$ up to 500 k\\\\$. Thus, an error range of 3 k\\\\$ means that our cross-validation results can be trusted and do not suffer from an excessive variance. Regarding the performance of our model itself, we can see that making an error of 45 k\\\\$ would be problematic even more if this happens for housing with low value. However, we also see some limitation regarding the metric that we are using. Making an error of 45 k\\\\$ for a target at 50 k\\\\$ and at 500 k\\\\$ should not have the same impact. We should instead use the mean absolute percentage error which will give a relative error."
]
},
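A tiny worked example of the point made above, with made-up numbers: the same 45 k$ absolute error is a 90% relative error on a 50 k$ target but only a 9% relative error on a 500 k$ target. With a recent scikit-learn (0.24+), `mean_absolute_percentage_error` computes this directly:

```python
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

y_true = [50_000, 500_000]
y_pred = [95_000, 545_000]  # both predictions are off by 45 k$

print(mean_absolute_error(y_true, y_pred))                    # 45000.0 in both cases
print(mean_absolute_percentage_error([50_000], [95_000]))     # 0.90 -> 90% error
print(mean_absolute_percentage_error([500_000], [545_000]))   # 0.09 ->  9% error
```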
{
@@ -1077,7 +1077,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that with a low number of samples, the variance is much larger. Indeed, for low number of sample, we cannot even trust our cross-validation and therefore cannot conclude anything about our regressor. Therefore, it is really important to make experiment with a large enough sample size to be sure about the conclusions which would be drawn."
"We see that with a low number of samples, the variance is much larger. Indeed, for low number of samples, we cannot even trust our cross-validation and therefore cannot conclude anything about our regressor. Therefore, it is really important to make experiment with a large enough sample size to be sure about the conclusions which would be drawn."
]
},
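One way to reproduce the effect described above is to subsample the same dataset at several sizes and compare the spread of the cross-validated errors; this is a sketch under that assumption, with the dataset and model again placeholders:

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import ShuffleSplit, cross_validate
from sklearn.tree import DecisionTreeRegressor

X, y = fetch_california_housing(return_X_y=True)
rng = np.random.RandomState(0)

for n_samples in [100, 1_000, 10_000]:
    subset = rng.choice(len(X), size=n_samples, replace=False)
    cv_results = cross_validate(
        DecisionTreeRegressor(random_state=0), X[subset], y[subset],
        cv=ShuffleSplit(n_splits=30, random_state=0),
        scoring="neg_mean_absolute_error",
    )
    errors = -cv_results["test_score"]
    # The standard deviation across splits shrinks as n_samples grows.
    print(f"n={n_samples:>6}: MAE = {errors.mean():.3f} +/- {errors.std():.3f}")
```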
{
@@ -1127,7 +1127,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We plot the generalization errors for each of the experiment. We see that even our regressor does not perform well, it is far above chances our a regressor that would predict the mean target."
"We plot the generalization errors for each of the experiment. We see that even if our regressor does not perform well, it is far above chances our a regressor that would predict the mean target."
]
},
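The "regressor that would predict the mean target" mentioned above corresponds, in scikit-learn, to `DummyRegressor(strategy="mean")`. A hedged sketch of that baseline comparison, with the dataset and main regressor assumed, could be:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import ShuffleSplit, cross_validate
from sklearn.tree import DecisionTreeRegressor

X, y = fetch_california_housing(return_X_y=True)
cv = ShuffleSplit(n_splits=30, random_state=0)

for name, model in [("tree", DecisionTreeRegressor(random_state=0)),
                    ("mean baseline", DummyRegressor(strategy="mean"))]:
    res = cross_validate(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {-res['test_score'].mean():.3f}")
```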
{
@@ -1193,7 +1193,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take an example of some financial quotes. These are the value of compagny stocks with the time."
"Let's take an example of some financial quotes. These are the value of company's stocks with the time."
]
},
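The notebook's follow-up is not shown here, but for time-ordered data such as stock quotes, shuffling the samples would leak future information into the training set. Scikit-learn's `TimeSeriesSplit` keeps the temporal order; this hedged sketch uses synthetic quotes, not the notebook's data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_validate

rng = np.random.RandomState(0)
# Synthetic "quote" series: a random walk, with 5 lagged values as features.
quotes = rng.normal(scale=1.0, size=1_000).cumsum()
X = np.column_stack([quotes[i:-5 + i] for i in range(5)])
y = quotes[5:]

cv = TimeSeriesSplit(n_splits=5)  # training data always precedes test data in time
cv_results = cross_validate(Ridge(), X, y, cv=cv, scoring="neg_mean_absolute_error")
print(-cv_results["test_score"])
```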
{
174 changes: 101 additions & 73 deletions 04_metrics/01-evaluation_metrics_regression.ipynb

Large diffs are not rendered by default.
