Commit

replace Boston housing dataset with diabetes dataset since sklearn has deprecated the Boston housing dataset
paulbkoch committed Jan 12, 2023
1 parent 0cb3ab8 commit df8cf74
Showing 10 changed files with 33 additions and 39 deletions.
Binary file not shown.
Binary file not shown.
14 changes: 7 additions & 7 deletions examples/python/notebooks/EBM Feature Importances.ipynb
@@ -37,12 +37,12 @@
 "outputs": [],
 "source": [
 "import pandas as pd\n",
-"from sklearn.datasets import load_boston\n",
+"from sklearn.datasets import load_diabetes\n",
 "from interpret.glassbox import ExplainableBoostingRegressor\n",
 "\n",
-"boston = load_boston()\n",
-"df = pd.DataFrame(boston.data, columns=boston.feature_names)\n",
-"df[\"target\"] = boston.target\n",
+"dataset = load_diabetes()\n",
+"df = pd.DataFrame(dataset.data, columns=dataset.feature_names)\n",
+"df[\"target\"] = dataset.target\n",
 "\n",
 "train_cols = df.columns[0:-1]\n",
 "label = df.columns[-1]\n",
@@ -104,11 +104,11 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Going beyond overall term importances, because EBMs are additive models we can measure exactly how each term contributes to a prediction. Let's take a look at the graph of the term, `LSTAT`, by selecting it in the drop-down menu.\n",
+"Going beyond overall term importances, because EBMs are additive models we can measure exactly how each term contributes to a prediction. Let's take a look at the graph of the term, `bp`, by selecting it in the drop-down menu.\n",
 "\n",
 "![Global Explanation - LSTAT](../assets/importance_notebook_global_lstat.png)\n",
 "\n",
-"The way to interpret this is that if a new datapoint came in with `LSTAT` = 5, the model adds about +2.7 to the final prediction. However, for a different datapoint with `LSTAT` = 10, the model would now add approx. -0.47 to the prediction.\n",
+"The way to interpret this is that if a new datapoint came in with `bp` = 0.1, the model adds about +33.1 to the final prediction. However, for a different datapoint with `bp` = 0.13, the model would now add approx. +36.7 to the prediction.\n",
 "\n",
 "To make individual predictions, the model uses each term graph as a look up table, notes the contribution per term, and sums them together with the learned intercept to make a prediction. In regression, the intercept is the mean target (label) of the training set, and each term adds or subtracts to this mean. In classification, the intercept reflects the base rate of the positive class on a log scale. The gray above and below the graph shows the confidence of the model in that region of the graph."
 ]
@@ -146,7 +146,7 @@
 "\n",
 "![Local Explanation](../assets/importance_notebook_local_exp.png)\n",
 "\n",
-"The model prediction is 26.8. We can see that the intercept adds about +22.5, `LSTAT` adds ~+2.7, and `RAD` adds about -1.2. So far, for the top 3 contributing terms, we're at a cumulative prediction of ~+24. If we repeat this process for all the terms, we'll arrive exactly at the model prediction of 26.8."
+"The model prediction is 188.50. We can see that the intercept adds about +151.9, `bp` subtracts about 0.02, and `age` adds about 0.04. If we repeat this process for all the terms, we'll arrive exactly at the model prediction of 188.50."
 ]
 },
 {
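The lookup-and-sum procedure described in the notebook text above can be sketched in plain Python. This is an illustration under stated assumptions, not interpret's implementation: the bin edges, the `age` term, and most scores are hypothetical placeholders, and only the intercept (+151.9) and the two `bp` contributions (+33.1 at 0.1, +36.7 at 0.13) echo figures quoted in the notebook.

```python
from bisect import bisect_right

# Hypothetical term graphs: each term is a step function stored as bin edges
# plus one contribution per bin. Values are illustrative only; they do not
# come from a fitted EBM.
INTERCEPT = 151.9
TERMS = {
    "bp": {"edges": [0.0, 0.05, 0.12], "scores": [-20.0, 10.0, 33.1, 36.7]},
    "age": {"edges": [0.0], "scores": [-2.0, 1.5]},
}


def term_contribution(term, value):
    """Use the term graph as a lookup table: find the bin, return its score."""
    bin_index = bisect_right(term["edges"], value)
    return term["scores"][bin_index]


def predict(sample):
    """Sum the per-term contributions with the intercept (additive model)."""
    contributions = {
        name: term_contribution(term, sample[name]) for name, term in TERMS.items()
    }
    return INTERCEPT + sum(contributions.values()), contributions


prediction, parts = predict({"bp": 0.1, "age": 0.04})
print(parts)
print(round(prediction, 1))
```

Because the model is additive, the per-term numbers in `parts` plus the intercept reproduce the prediction exactly, which is the property the local explanation in the notebook relies on.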
@@ -14,7 +14,6 @@
 "outputs": [],
 "source": [
 "import pandas as pd\n",
-"from sklearn.datasets import load_boston\n",
 "from sklearn.model_selection import train_test_split\n",
 "\n",
 "df = pd.read_csv(\n",
@@ -205,4 +204,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
+}
10 changes: 5 additions & 5 deletions examples/python/notebooks/Explaining Blackbox Regressors.ipynb
@@ -14,13 +14,13 @@
 "outputs": [],
 "source": [
 "import pandas as pd\n",
-"from sklearn.datasets import load_boston\n",
+"from sklearn.datasets import load_diabetes\n",
 "from sklearn.model_selection import train_test_split\n",
 "\n",
-"boston = load_boston()\n",
-"feature_names = list(boston.feature_names)\n",
-"df = pd.DataFrame(boston.data, columns=feature_names)\n",
-"df[\"target\"] = boston.target\n",
+"dataset = load_diabetes()\n",
+"feature_names = list(dataset.feature_names)\n",
+"df = pd.DataFrame(dataset.data, columns=feature_names)\n",
+"df[\"target\"] = dataset.target\n",
 "# df = df.sample(frac=0.1, random_state=1)\n",
 "train_cols = df.columns[0:-1]\n",
 "label = df.columns[-1]\n",
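For orientation, the replacement loading code in these notebooks can be run on its own to see what the new frame looks like. This is a minimal sketch that mirrors the added lines, assuming scikit-learn and pandas are installed; the two `print` calls are added here for inspection only.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Same loading pattern as the added notebook lines.
dataset = load_diabetes()
feature_names = list(dataset.feature_names)
df = pd.DataFrame(dataset.data, columns=feature_names)
df["target"] = dataset.target

# Every column except the last is a feature; the last column is the label.
train_cols = df.columns[0:-1]
label = df.columns[-1]

print(df.shape)
print(list(train_cols))
```

The feature columns include `age` and `bp`, the terms the updated notebook text refers to. scikit-learn ships these features mean-centered and scaled by default, which is why values such as `bp` = 0.1 are small.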
12 changes: 6 additions & 6 deletions examples/python/notebooks/Interpretable Regression Methods.ipynb
@@ -14,13 +14,13 @@
 "outputs": [],
 "source": [
 "import pandas as pd\n",
-"from sklearn.datasets import load_boston\n",
+"from sklearn.datasets import load_diabetes\n",
 "from sklearn.model_selection import train_test_split\n",
 "\n",
-"boston = load_boston()\n",
-"feature_names = list(boston.feature_names)\n",
-"df = pd.DataFrame(boston.data, columns=feature_names)\n",
-"df[\"target\"] = boston.target\n",
+"dataset = load_diabetes()\n",
+"feature_names = list(dataset.feature_names)\n",
+"df = pd.DataFrame(dataset.data, columns=feature_names)\n",
+"df[\"target\"] = dataset.target\n",
 "# df = df.sample(frac=0.1, random_state=1)\n",
 "train_cols = df.columns[0:-1]\n",
 "label = df.columns[-1]\n",
@@ -234,4 +234,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
+}
@@ -2,16 +2,16 @@
 # Distributed under the MIT software license

 from ..decisiontree import ClassificationTree, RegressionTree
-from sklearn.datasets import load_breast_cancer, load_boston
+from sklearn.datasets import load_breast_cancer, load_diabetes
 from sklearn.tree import DecisionTreeClassifier as SKDT
 from sklearn.tree import DecisionTreeRegressor as SKRT
 import numpy as np


 def test_rt():
-    boston = load_boston()
-    X, y = boston.data, boston.target
-    feature_names = boston.feature_names
+    dataset = load_diabetes()
+    X, y = dataset.data, dataset.target
+    feature_names = dataset.feature_names

     sk_dt = SKRT(random_state=1, max_depth=3)
     our_dt = RegressionTree(feature_names=feature_names, random_state=1)
8 changes: 4 additions & 4 deletions python/interpret-core/interpret/glassbox/test/test_linear.py
@@ -2,16 +2,16 @@
 # Distributed under the MIT software license

 from ..linear import LogisticRegression, LinearRegression
-from sklearn.datasets import load_breast_cancer, load_boston
+from sklearn.datasets import load_breast_cancer, load_diabetes
 from sklearn.linear_model import LogisticRegression as SKLogistic
 from sklearn.linear_model import Lasso as SKLinear
 import numpy as np


 def test_linear_regression():
-    boston = load_boston()
-    X, y = boston.data, boston.target
-    feature_names = boston.feature_names
+    dataset = load_diabetes()
+    X, y = dataset.data, dataset.target
+    feature_names = dataset.feature_names

     sk_lr = SKLinear(random_state=1)
     our_lr = LinearRegression(feature_names=feature_names, random_state=1)
@@ -3,7 +3,7 @@


 from sklearn.ensemble import RandomForestRegressor
-from sklearn.datasets import load_boston
+from sklearn.datasets import load_diabetes
 from ..treeinterpreter import TreeInterpreter

 import pytest
@@ -17,13 +17,13 @@ def test_that_tree_works():
     # http://blog.datadive.net/random-forest-interpretation-with-scikit-learn/

     # Fit tree
-    boston = load_boston()
+    dataset = load_diabetes()
     rf = RandomForestRegressor()
-    X, y = boston.data[:300], boston.target[:300]
-    feature_names = boston.feature_names
+    X, y = dataset.data[:300], dataset.target[:300]
+    feature_names = dataset.feature_names

-    X_new = boston.data[[300, 309]]
-    y_new = boston.target[[300, 309]]
+    X_new = dataset.data[[300, 309]]
+    y_new = dataset.target[[300, 309]]
     rf.fit(X, y)

     # Build expected local explanation
5 changes: 0 additions & 5 deletions python/interpret-core/interpret/test/utils.py
@@ -95,11 +95,6 @@ def _synthetic(mode="regression"):

     return dataset

-
-def boston_regression():
-    return None
-
-
 def iris_classification():
     from sklearn.datasets import load_iris
