
Commit

Merge remote-tracking branch 'upstream/development' into unit_tests_for_input_check
Weixuan Fu committed Mar 22, 2017
2 parents f48b31c + 0920fc9 commit b49f1ac
Showing 4 changed files with 165 additions and 54 deletions.
2 changes: 1 addition & 1 deletion docs/index.html
@@ -245,5 +245,5 @@

<!--
MkDocs version : 0.16.0
Build Date UTC : 2017-03-22 15:56:01
Build Date UTC : 2017-03-22 18:39:33
-->
13 changes: 9 additions & 4 deletions docs/mkdocs/search_index.json

Large diffs are not rendered by default.

123 changes: 85 additions & 38 deletions docs/using/index.html
@@ -73,6 +73,8 @@

<li><a class="toctree-l4" href="#scoring-functions">Scoring functions</a></li>

<li><a class="toctree-l4" href="#customizing-tpots-operators-and-parameters">Customizing TPOT's operators and parameters</a></li>


</ul>

@@ -285,22 +287,7 @@ <h1 id="tpot-on-the-command-line">TPOT on the command line</h1>
<td>-config</td>
<td>CONFIG_FILE</td>
<td>String path to a file</td>
<td>Configuration file for customizing the operators and parameters that TPOT uses in the optimization process. For example, the configuration file's format could be like:
<pre lang="nemerle">
classifier_config_dict = {
    'sklearn.naive_bayes.GaussianNB': {
    },
    'sklearn.naive_bayes.BernoulliNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    },
    'sklearn.naive_bayes.MultinomialNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    }
}
</pre>
</td>
<td>Configuration file for customizing the operators and parameters that TPOT uses in the optimization process. See the <a href="#customconfig">custom configuration</a> section for more information and examples.</td>
</tr>
<tr>
<td>-v</td>
@@ -407,20 +394,7 @@ <h1 id="tpot-with-code">TPOT with code</h1>
<tr>
<td>config_dict</td>
<td>Python dictionary</td>
<td>Configuration dictionary for customizing the operators and parameters that TPOT uses in the optimization process. For example:
<pre lang="nemerle">
classifier_config_dict = {
    'sklearn.naive_bayes.GaussianNB': {
    },
    'sklearn.naive_bayes.BernoulliNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    },
    'sklearn.naive_bayes.MultinomialNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    }
}
</pre>
</td>
<td>Configuration dictionary for customizing the operators and parameters that TPOT uses in the optimization process. See the <a href="#customconfig">custom configuration</a> section for more information and examples.</td>
</tr>
@@ -444,29 +418,33 @@ <h1 id="tpot-with-code">TPOT with code</h1>
<p>Some example code with custom TPOT parameters might look like:</p>
<pre><code class="Python">from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5, random_state=42, verbosity=2)
pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2)
</code></pre>

<p>Now TPOT is ready to optimize a pipeline for you. You can tell TPOT to optimize a pipeline based on a data set with the <code>fit</code> function:</p>
<pre><code class="Python">from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5, random_state=42, verbosity=2)
pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
</code></pre>

<p>The <code>fit()</code> function takes in a training data set and uses k-fold cross-validation when evaluating pipelines. It then initializes the genetic programming algorithm to find the best pipeline based on the average k-fold score.</p>
<p>You can then proceed to evaluate the final pipeline on the testing set with the <code>score()</code> function:</p>
<pre><code class="Python">from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5, random_state=42, verbosity=2)
pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
print(pipeline_optimizer.score(testing_features, testing_classes))
</code></pre>

<p>Finally, you can tell TPOT to export the corresponding Python code for the optimized pipeline to a text file with the <code>export()</code> function:</p>
<pre><code class="Python">from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5, random_state=42, verbosity=2)
pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
print(pipeline_optimizer.score(testing_features, testing_classes))
pipeline_optimizer.export('tpot_exported_pipeline.py')
@@ -476,18 +454,87 @@ <h1 id="tpot-with-code">TPOT with code</h1>
<p>Check our <a href="../examples/MNIST_Example/">examples</a> to see TPOT applied to some specific data sets.</p>
<p><a name="scoringfunctions"></a></p>
<h2 id="scoring-functions">Scoring functions</h2>
<p>TPOT makes use of <code>sklearn.model_selection.cross_val_score</code>, and as such offers the same support for scoring functions. There are two ways to make use of scoring functions with TPOT:</p>
<p>TPOT makes use of <code>sklearn.model_selection.cross_val_score</code> for evaluating pipelines, and as such offers the same support for scoring functions. There are two ways to make use of scoring functions with TPOT:</p>
<ol>
<li>
<p>You can pass in a string from the list described in the table above. Any other strings will cause internal issues that may break your code down the line.</p>
<p>You can pass in a string to the <code>scoring</code> parameter from the list above. Any other strings will cause TPOT to throw an exception.</p>
</li>
<li>
<p>You can pass in a function with the signature <code>scorer(y_true, y_pred)</code>, where <code>y_true</code> are the true target values and <code>y_pred</code> are the predicted target values from an estimator. To do this, you should implement your own function. See the example below for further explanation.</p>
<p>You can pass a function with the signature <code>scorer(y_true, y_pred)</code>, where <code>y_true</code> are the true target values and <code>y_pred</code> are the predicted target values from an estimator. To do this, you should implement your own function. See the example below for further explanation.</p>
</li>
</ol>
<pre><code class="Python">def accuracy(y_true, y_pred):
<pre><code class="Python">from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

def accuracy(y_true, y_pred):
    return float(sum(y_pred == y_true)) / len(y_true)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      scoring=accuracy)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')
</code></pre>

<p><a name="customconfig"></a></p>
<h2 id="customizing-tpots-operators-and-parameters">Customizing TPOT's operators and parameters</h2>
<p>TPOT comes with a handful of default operators and parameter configurations that we believe work well for optimizing machine learning pipelines. However, in some cases it is useful to limit the algorithms and parameters that TPOT explores. For that reason, we allow users to provide TPOT with a custom configuration for its operators and parameters.</p>
<p>The custom TPOT configuration must be in nested dictionary format, where the first level key is the path and name of the operator (e.g., <code>sklearn.naive_bayes.MultinomialNB</code>) and the second level key is the corresponding parameter name for that operator (e.g., <code>fit_prior</code>). The second level key should point to a list of parameter values for that parameter, e.g., <code>'fit_prior': [True, False]</code>.</p>
<p>For a simple example, the configuration could be:</p>
<pre><code class="Python">classifier_config_dict = {
    'sklearn.naive_bayes.GaussianNB': {
    },
    'sklearn.naive_bayes.BernoulliNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    },
    'sklearn.naive_bayes.MultinomialNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    }
}
</code></pre>

<p>in which case TPOT would only explore pipelines containing <code>GaussianNB</code>, <code>BernoulliNB</code>, and <code>MultinomialNB</code>, and would tune those algorithms' parameters over the ranges provided. This dictionary can be passed directly within the code to the <code>TPOTClassifier</code>/<code>TPOTRegressor</code> <code>config_dict</code> parameter, described above. For example:</p>
<pre><code class="Python">from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

classifier_config_dict = {
    'sklearn.naive_bayes.GaussianNB': {
    },
    'sklearn.naive_bayes.BernoulliNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    },
    'sklearn.naive_bayes.MultinomialNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    }
}

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      config_dict=classifier_config_dict)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')
</code></pre>

<p>Command-line users must create a separate <code>.py</code> file with the custom configuration and provide the path to the file to the <code>tpot</code> call. For example, if the simple example configuration above is saved in <code>tpot_classifier_config.py</code>, that configuration could be used on the command line with the command:</p>
<pre><code>tpot data/mnist.csv -is , -target class -config tpot_classifier_config.py -g 5 -p 20 -v 2 -o tpot_exported_pipeline.py
</code></pre>

<p>For more detailed examples of how to customize TPOT's operator configuration, see the default configurations for <a href="https://github.com/rhiever/tpot/blob/master/tpot/config_classifier.py">classification</a> and <a href="https://github.com/rhiever/tpot/blob/master/tpot/config_regressor.py">regression</a> in TPOT's source code.</p>
<p>Note that you must have all of the corresponding packages for the operators installed on your computer, otherwise TPOT will not be able to use them. For example, if XGBoost is not installed on your computer, TPOT will simply not import or use XGBoost in the pipelines it explores.</p>

</div>
</div>
81 changes: 70 additions & 11 deletions docs_sources/using.md
@@ -252,15 +252,17 @@ Some example code with custom TPOT parameters might look like:
```Python
from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5, random_state=42, verbosity=2)
pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2)
```

Now TPOT is ready to optimize a pipeline for you. You can tell TPOT to optimize a pipeline based on a data set with the `fit` function:

```Python
from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5, random_state=42, verbosity=2)
pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
```

@@ -271,7 +273,8 @@ You can then proceed to evaluate the final pipeline on the testing set with the
```Python
from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5, random_state=42, verbosity=2)
pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
print(pipeline_optimizer.score(testing_features, testing_classes))
```
@@ -281,7 +284,8 @@ Finally, you can tell TPOT to export the corresponding Python code for the optim
```Python
from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5, random_state=42, verbosity=2)
pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
print(pipeline_optimizer.score(testing_features, testing_classes))
pipeline_optimizer.export('tpot_exported_pipeline.py')
@@ -294,25 +298,41 @@ Check our [examples](examples/MNIST_Example/) to see TPOT applied to some specif
<a name="scoringfunctions"></a>
## Scoring functions

TPOT makes use of `sklearn.model_selection.cross_val_score`, and as such offers the same support for scoring functions. There are two ways to make use of scoring functions with TPOT:
TPOT makes use of `sklearn.model_selection.cross_val_score` for evaluating pipelines, and as such offers the same support for scoring functions. There are two ways to make use of scoring functions with TPOT:

1. You can pass in a string from the list described in the table above. Any other strings will cause internal issues that may break your code down the line.
1. You can pass in a string to the `scoring` parameter from the list above. Any other strings will cause TPOT to throw an exception.

2. You can pass in a function with the signature `scorer(y_true, y_pred)`, where `y_true` are the true target values and `y_pred` are the predicted target values from an estimator. To do this, you should implement your own function. See the example below for further explanation.
2. You can pass a function with the signature `scorer(y_true, y_pred)`, where `y_true` are the true target values and `y_pred` are the predicted target values from an estimator. To do this, you should implement your own function. See the example below for further explanation.

```Python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

def accuracy(y_true, y_pred):
    return float(sum(y_pred == y_true)) / len(y_true)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      scoring=accuracy)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')
```
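For the first option, the accepted strings are sklearn's scorer names. As a rough sketch (assuming sklearn's `get_scorer` registry, which is what `cross_val_score` ultimately consults — the helper below is illustrative, not part of TPOT's API), you can verify a scoring string yourself before handing it to TPOT:

```Python
from sklearn.metrics import get_scorer

def is_valid_scoring_string(name):
    """Return True if `name` is a scorer string sklearn would accept.

    Illustrative helper only -- TPOT performs its own check internally.
    """
    try:
        get_scorer(name)
        return True
    except ValueError:
        return False

print(is_valid_scoring_string('accuracy'))  # a built-in scorer name
print(is_valid_scoring_string('acuracy'))   # a typo: TPOT would raise for this
```

Checking the string up front turns a failure deep inside the optimization loop into an immediate, obvious error.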

<a name="customconfig"></a>
## Customizing TPOT's operators and parameters

TPOT comes with a handful of default operators and parameter configurations that we believe work well for optimizing machine learning pipelines. However, sometimes it's useful to limit the algorithms and parameters that TPOT explores
TPOT comes with a handful of default operators and parameter configurations that we believe work well for optimizing machine learning pipelines. However, in some cases it is useful to limit the algorithms and parameters that TPOT explores. For that reason, we allow users to provide TPOT with a custom configuration for its operators and parameters.

For example, the configuration file's format could be like:
The custom TPOT configuration must be in nested dictionary format, where the first level key is the path and name of the operator (e.g., `sklearn.naive_bayes.MultinomialNB`) and the second level key is the corresponding parameter name for that operator (e.g., `fit_prior`). The second level key should point to a list of parameter values for that parameter, e.g., `'fit_prior': [True, False]`.

<pre lang="nemerle">
For a simple example, the configuration could be:

```Python
classifier_config_dict = {
    'sklearn.naive_bayes.GaussianNB': {
    },
@@ -325,6 +345,45 @@ classifier_config_dict = {
        'fit_prior': [True, False]
    }
}
</pre>
```

in which case TPOT would only explore pipelines containing `GaussianNB`, `BernoulliNB`, and `MultinomialNB`, and would tune those algorithms' parameters over the ranges provided. This dictionary can be passed directly within the code to the `TPOTClassifier`/`TPOTRegressor` `config_dict` parameter, described above. For example:

```Python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

classifier_config_dict = {
    'sklearn.naive_bayes.GaussianNB': {
    },
    'sklearn.naive_bayes.BernoulliNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    },
    'sklearn.naive_bayes.MultinomialNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    }
}

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      config_dict=classifier_config_dict)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')
```

Command-line users must create a separate `.py` file with the custom configuration and provide the path to the file to the `tpot` call. For example, if the simple example configuration above is saved in `tpot_classifier_config.py`, that configuration could be used on the command line with the command:

```
tpot data/mnist.csv -is , -target class -config tpot_classifier_config.py -g 5 -p 20 -v 2 -o tpot_exported_pipeline.py
```

For more detailed examples of how to customize TPOT's operator configuration, see the default configurations for [classification](https://github.com/rhiever/tpot/blob/master/tpot/config_classifier.py) and [regression](https://github.com/rhiever/tpot/blob/master/tpot/config_regressor.py) in TPOT's source code.

Note that you must have all of the corresponding packages for the operators installed on your computer, otherwise TPOT will not be able to use them. For example, if XGBoost is not installed on your computer, TPOT will simply not import or use XGBoost in the pipelines it explores.
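Because missing packages are skipped silently, one defensive pattern is to check for the optional dependency up front and trim the configuration yourself. A minimal sketch (the `xgboost.XGBClassifier` entry and its parameter ranges are illustrative assumptions, not TPOT defaults):

```Python
import importlib.util

# Hypothetical custom configuration mixing a core sklearn operator with an
# optional dependency (xgboost); the parameter ranges are illustrative.
custom_config = {
    'sklearn.naive_bayes.GaussianNB': {
    },
    'xgboost.XGBClassifier': {
        'n_estimators': [100],
        'max_depth': range(1, 11),
        'learning_rate': [1e-2, 1e-1, 0.5, 1.]
    }
}

# Keep only operators whose top-level package is importable, mirroring the
# silent skip TPOT performs internally, so the config documents exactly what
# can actually run on this machine.
custom_config = {
    op: params for op, params in custom_config.items()
    if importlib.util.find_spec(op.split('.')[0]) is not None
}
```

Trimming up front makes the behavior explicit in your own code instead of relying on TPOT's silent skip.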
