Commit c2c182f

Merge pull request #158 from perib/configspace_update
Configspace update
2 parents d8ea9ec + 2e4bc0f commit c2c182f

82 files changed, +2787 -384 lines changed

README.md

+101 -7
@@ -1,22 +1,54 @@
-# TPOT2 ALPHA
+# TPOT2
 
 ![Tests](https://github.com/EpistasisLab/tpot2/actions/workflows/tests.yml/badge.svg)
 [![PyPI Downloads](https://img.shields.io/pypi/dm/tpot2?label=pypi%20downloads)](https://pypi.org/project/TPOT2)
 [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/tpot2?label=conda%20downloads)](https://anaconda.org/conda-forge/tpot2)
 
-TPOT stands for Tree-based Pipeline Optimization Tool. TPOT2 is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. Consider TPOT2 your Data Science Assistant.
+TPOT stands for Tree-based Pipeline Optimization Tool. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. Consider TPOT your Data Science Assistant.
 
-TPOT2 is a rewrite of TPOT with some additional functionality. Notably, we added support for graph-based pipelines and additional parameters to better specify the desired search space.
-TPOT2 is currently in Alpha. This means that there will likely be some backwards incompatible changes to the API as we develop. Some implemented features may be buggy. There is a list of known issues written at the bottom of this README. Some features have placeholder names or are listed as "Experimental" in the doc string. These are features that may not be fully implemented and may or may not work with all other features.
+## Contributors
+
+TPOT recently went through a major refactoring. The package was rewritten from scratch to improve efficiency and performance, support new features, and fix numerous bugs. New features include genetic feature selection, a significantly expanded and more flexible method of defining search spaces, multi-objective optimization, a more modular framework allowing for easier customization of the evolutionary algorithm, and more. While in development, this new version was referred to as "TPOT2" but we have now merged what was once TPOT2 into the main TPOT package. You can learn more about this new version of TPOT in our GPTP paper titled "TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning."
+
+Ribeiro, P. et al. (2024). TPOT2: A New Graph-Based Implementation of the Tree-Based Pipeline Optimization Tool for Automated Machine Learning. In: Winkler, S., Trujillo, L., Ofria, C., Hu, T. (eds) Genetic Programming Theory and Practice XX. Genetic and Evolutionary Computation. Springer, Singapore. https://doi.org/10.1007/978-981-99-8413-8_1
+
+The current version of TPOT was developed at Cedars-Sinai by:
+- Pedro Henrique Ribeiro (Lead developer - https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
+- Anil Saini ([email protected])
+- Jose Hernandez ([email protected])
+- Jay Moran ([email protected])
+- Nicholas Matsumoto ([email protected])
+- Hyunjun Choi ([email protected])
+- Miguel E. Hernandez ([email protected])
+- Jason Moore ([email protected])
+
+The original version of TPOT was primarily developed at the University of Pennsylvania by:
+- Randal S. Olson ([email protected])
+- Weixuan Fu ([email protected])
+- Daniel Angell ([email protected])
+- Jason Moore ([email protected])
+- and many more generous open-source contributors
 
-If you are interested in using the current stable release of TPOT, you can do that here: [https://github.com/EpistasisLab/tpot/](https://github.com/EpistasisLab/tpot/).
 
 
 ## License
 
 Please see the [repository license](https://github.com/EpistasisLab/tpot2/blob/main/LICENSE) for the licensing and usage information for TPOT2.
 Generally, we have licensed TPOT2 to make it as widely usable as possible.
 
+TPOT is free software: you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as
+published by the Free Software Foundation, either version 3 of
+the License, or (at your option) any later version.
+
+TPOT is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with TPOT. If not, see <http://www.gnu.org/licenses/>.
+
 ## Documentation
 
 [The documentation webpage can be found here.](https://epistasislab.github.io/tpot2/)
@@ -54,9 +86,8 @@ matplotlib
 traitlets
 lightgbm
 optuna
-baikal
 jupyter
-networkx>
+networkx
 dask
 distributed
 dask-ml
@@ -186,6 +217,69 @@ Setting `verbose` to 5 can be helpful during debugging as it will print out the
 
 We welcome you to check the existing issues for bugs or enhancements to work on. If you have an idea for an extension to TPOT2, please file a new issue so we can discuss it.
 
+## Citing TPOT
+
+If you use TPOT in a scientific publication, please consider citing at least one of the following papers:
+
+Trang T. Le, Weixuan Fu and Jason H. Moore (2020). [Scaling tree-based automated machine learning to biomedical big data with a feature set selector](https://academic.oup.com/bioinformatics/article/36/1/250/5511404). *Bioinformatics*.36(1): 250-256.
+
+BibTeX entry:
+
+```bibtex
+@article{le2020scaling,
+title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
+author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
+journal={Bioinformatics},
+volume={36},
+number={1},
+pages={250--256},
+year={2020},
+publisher={Oxford University Press}
+}
+```
+
+
+Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). [Automating biomedical data science through tree-based pipeline optimization](http://link.springer.com/chapter/10.1007/978-3-319-31204-0_9). *Applications of Evolutionary Computation*, pages 123-137.
+
+BibTeX entry:
+
+```bibtex
+@inbook{Olson2016EvoBio,
+author={Olson, Randal S. and Urbanowicz, Ryan J. and Andrews, Peter C. and Lavender, Nicole A. and Kidd, La Creis and Moore, Jason H.},
+editor={Squillero, Giovanni and Burelli, Paolo},
+chapter={Automating Biomedical Data Science Through Tree-Based Pipeline Optimization},
+title={Applications of Evolutionary Computation: 19th European Conference, EvoApplications 2016, Porto, Portugal, March 30 -- April 1, 2016, Proceedings, Part I},
+year={2016},
+publisher={Springer International Publishing},
+pages={123--137},
+isbn={978-3-319-31204-0},
+doi={10.1007/978-3-319-31204-0_9},
+url={http://dx.doi.org/10.1007/978-3-319-31204-0_9}
+}
+```
+
+Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore (2016). [Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science](http://dl.acm.org/citation.cfm?id=2908918). *Proceedings of GECCO 2016*, pages 485-492.
+
+BibTeX entry:
+
+```bibtex
+@inproceedings{OlsonGECCO2016,
+author = {Olson, Randal S. and Bartley, Nathan and Urbanowicz, Ryan J. and Moore, Jason H.},
+title = {Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science},
+booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference 2016},
+series = {GECCO '16},
+year = {2016},
+isbn = {978-1-4503-4206-3},
+location = {Denver, Colorado, USA},
+pages = {485--492},
+numpages = {8},
+url = {http://doi.acm.org/10.1145/2908812.2908918},
+doi = {10.1145/2908812.2908918},
+acmid = {2908918},
+publisher = {ACM},
+address = {New York, NY, USA},
+}
+```
 
 ### Support for TPOT2
 
Tutorial/2_Search_Spaces.ipynb

+2 -6
@@ -1008,11 +1008,7 @@
 "\n",
 "### EstimatorNode\n",
 "\n",
-"The EstimatorNode represents the hyperparameter search space for a scikit-learn estimator. \n",
-"\n",
-"Note that `ConfigSpace` doesn't support `None` in its search space, and does not support the booleans True or False as fixed parameters (though booleans seem to be allowed in Categorical search spaces). To get around this, use the macros defined in:\n",
-"\n",
-"`from tpot2.search_spaces.nodes.estimator_node import NONE_SPECIAL_STRING, TRUE_SPECIAL_STRING, FALSE_SPECIAL_STRING`"
+"The EstimatorNode represents the hyperparameter search space for a scikit-learn estimator. "
 ]
 },
 {
@@ -19632,7 +19628,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "tpot2env",
+"display_name": "myenv",
 "language": "python",
 "name": "python3"
 },
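
The note removed from the EstimatorNode cell above described a sentinel-string workaround for values that `ConfigSpace` could not hold directly (`None`, and `True`/`False` as fixed parameters). The sketch below illustrates that general pattern in plain Python and is not TPOT's implementation. The three constant names come from the removed import line, but the string values assigned to them here and the helper `decode_special_strings` are hypothetical, chosen only for illustration.

```python
# Hypothetical sentinel values; the real constants are defined in
# tpot2.search_spaces.nodes.estimator_node and may differ.
NONE_SPECIAL_STRING = "<NONE>"
TRUE_SPECIAL_STRING = "<TRUE>"
FALSE_SPECIAL_STRING = "<FALSE>"

# Map each sentinel string back to the Python value it stands for.
_SENTINELS = {
    NONE_SPECIAL_STRING: None,
    TRUE_SPECIAL_STRING: True,
    FALSE_SPECIAL_STRING: False,
}


def decode_special_strings(params):
    """Return a copy of params with sentinel strings replaced by Python values.

    Hypothetical helper, not part of TPOT's API.
    """
    decoded = {}
    for name, value in params.items():
        # Only strings can be sentinels; checking the type first also
        # avoids hashing unhashable values such as lists.
        if isinstance(value, str) and value in _SENTINELS:
            decoded[name] = _SENTINELS[value]
        else:
            decoded[name] = value
    return decoded


# A sampled configuration as it might look with sentinels in place.
sampled = {
    "max_depth": NONE_SPECIAL_STRING,
    "bootstrap": TRUE_SPECIAL_STRING,
    "n_estimators": 100,
}
estimator_kwargs = decode_special_strings(sampled)
```

Under these assumptions, the decoded dictionary could be passed as keyword arguments to a scikit-learn estimator constructor, since the sentinel strings have been turned back into `None` and `True`.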

tpot2/__init__.py

+34
@@ -1,3 +1,37 @@
+"""
+This file is part of the TPOT library.
+
+The current version of TPOT was developed at Cedars-Sinai by:
+- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
+- Anil Saini ([email protected])
+- Jose Hernandez ([email protected])
+- Jay Moran ([email protected])
+- Nicholas Matsumoto ([email protected])
+- Hyunjun Choi ([email protected])
+- Miguel E. Hernandez ([email protected])
+- Jason Moore ([email protected])
+
+The original version of TPOT was primarily developed at the University of Pennsylvania by:
+- Randal S. Olson ([email protected])
+- Weixuan Fu ([email protected])
+- Daniel Angell ([email protected])
+- Jason Moore ([email protected])
+- and many more generous open-source contributors
+
+TPOT is free software: you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as
+published by the Free Software Foundation, either version 3 of
+the License, or (at your option) any later version.
+
+TPOT is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with TPOT. If not, see <http://www.gnu.org/licenses/>.
+
+"""
 
 #TODO: are all the imports in the init files done correctly?
 #TODO clean up import organization

tpot2/_version.py

+34
@@ -1 +1,35 @@
+"""
+This file is part of the TPOT library.
+
+The current version of TPOT was developed at Cedars-Sinai by:
+- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
+- Anil Saini ([email protected])
+- Jose Hernandez ([email protected])
+- Jay Moran ([email protected])
+- Nicholas Matsumoto ([email protected])
+- Hyunjun Choi ([email protected])
+- Miguel E. Hernandez ([email protected])
+- Jason Moore ([email protected])
+
+The original version of TPOT was primarily developed at the University of Pennsylvania by:
+- Randal S. Olson ([email protected])
+- Weixuan Fu ([email protected])
+- Daniel Angell ([email protected])
+- Jason Moore ([email protected])
+- and many more generous open-source contributors
+
+TPOT is free software: you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as
+published by the Free Software Foundation, either version 3 of
+the License, or (at your option) any later version.
+
+TPOT is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with TPOT. If not, see <http://www.gnu.org/licenses/>.
+
+"""
 __version__ = '0.1.8a0'

tpot2/builtin_modules/arithmetictransformer.py

+34
@@ -1,3 +1,37 @@
+"""
+This file is part of the TPOT library.
+
+The current version of TPOT was developed at Cedars-Sinai by:
+- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
+- Anil Saini ([email protected])
+- Jose Hernandez ([email protected])
+- Jay Moran ([email protected])
+- Nicholas Matsumoto ([email protected])
+- Hyunjun Choi ([email protected])
+- Miguel E. Hernandez ([email protected])
+- Jason Moore ([email protected])
+
+The original version of TPOT was primarily developed at the University of Pennsylvania by:
+- Randal S. Olson ([email protected])
+- Weixuan Fu ([email protected])
+- Daniel Angell ([email protected])
+- Jason Moore ([email protected])
+- and many more generous open-source contributors
+
+TPOT is free software: you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as
+published by the Free Software Foundation, either version 3 of
+the License, or (at your option) any later version.
+
+TPOT is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with TPOT. If not, see <http://www.gnu.org/licenses/>.
+
+"""
 import random
 import numpy as np
 from sklearn.base import BaseEstimator, TransformerMixin

tpot2/builtin_modules/column_one_hot_encoder.py

+34
@@ -1,3 +1,37 @@
+"""
+This file is part of the TPOT library.
+
+The current version of TPOT was developed at Cedars-Sinai by:
+- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
+- Anil Saini ([email protected])
+- Jose Hernandez ([email protected])
+- Jay Moran ([email protected])
+- Nicholas Matsumoto ([email protected])
+- Hyunjun Choi ([email protected])
+- Miguel E. Hernandez ([email protected])
+- Jason Moore ([email protected])
+
+The original version of TPOT was primarily developed at the University of Pennsylvania by:
+- Randal S. Olson ([email protected])
+- Weixuan Fu ([email protected])
+- Daniel Angell ([email protected])
+- Jason Moore ([email protected])
+- and many more generous open-source contributors
+
+TPOT is free software: you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as
+published by the Free Software Foundation, either version 3 of
+the License, or (at your option) any later version.
+
+TPOT is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with TPOT. If not, see <http://www.gnu.org/licenses/>.
+
+"""
 import numpy as np
 from scipy import sparse
 
tpot2/builtin_modules/estimatortransformer.py

+34
@@ -1,3 +1,37 @@
+"""
+This file is part of the TPOT library.
+
+The current version of TPOT was developed at Cedars-Sinai by:
+- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
+- Anil Saini ([email protected])
+- Jose Hernandez ([email protected])
+- Jay Moran ([email protected])
+- Nicholas Matsumoto ([email protected])
+- Hyunjun Choi ([email protected])
+- Miguel E. Hernandez ([email protected])
+- Jason Moore ([email protected])
+
+The original version of TPOT was primarily developed at the University of Pennsylvania by:
+- Randal S. Olson ([email protected])
+- Weixuan Fu ([email protected])
+- Daniel Angell ([email protected])
+- Jason Moore ([email protected])
+- and many more generous open-source contributors
+
+TPOT is free software: you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as
+published by the Free Software Foundation, either version 3 of
+the License, or (at your option) any later version.
+
+TPOT is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with TPOT. If not, see <http://www.gnu.org/licenses/>.
+
+"""
 from numpy import ndarray
 from sklearn.base import BaseEstimator, TransformerMixin
 from sklearn.model_selection import cross_val_predict

tpot2/builtin_modules/feature_set_selector.py

+18 -6
@@ -1,12 +1,22 @@
-#!/usr/bin/env python3
-# -*- coding: utf-8 -*-
-"""This file is part of the TPOT library.
-
-TPOT was primarily developed at the University of Pennsylvania by:
+"""
+This file is part of the TPOT library.
+
+The current version of TPOT was developed at Cedars-Sinai by:
+- Pedro Henrique Ribeiro (https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
+- Anil Saini ([email protected])
+- Jose Hernandez ([email protected])
+- Jay Moran ([email protected])
+- Nicholas Matsumoto ([email protected])
+- Hyunjun Choi ([email protected])
+- Miguel E. Hernandez ([email protected])
+- Jason Moore ([email protected])
+
+The original version of TPOT was primarily developed at the University of Pennsylvania by:
 - Randal S. Olson ([email protected])
 - Weixuan Fu ([email protected])
 - Daniel Angell ([email protected])
-- and many more generous open source contributors
+- Jason Moore ([email protected])
+- and many more generous open-source contributors
 
 TPOT is free software: you can redistribute it and/or modify
 it under the terms of the GNU Lesser General Public License as
@@ -20,7 +30,9 @@
 
 You should have received a copy of the GNU Lesser General Public
 License along with TPOT. If not, see <http://www.gnu.org/licenses/>.
+
 """
+
 #TODO handle sparse input?
 
 import numpy as np
