[WIP] Refactor tpot to many sklearn models #164
Hey! I'm stoked that you're so into TPOT lately. I'm currently focused on getting v0.4 out, but I promise I'll join the conversation about the major refactor soon. :-) BTW, one thing to keep in mind: We're trying to keep TPOT lean in terms of dependencies, so adding a new dependency (especially one not in Anaconda) will be a hard sell. It's very important to me that TPOT remains easy to install.
I totally respect the concern about adding dependencies. I am going to keep working on this. I'd be stoked to bounce ideas off of @teaearlgraycold while you are tied up with the 0.4 release.
Well, I'm the main guy who will be pushing the 0.4 release forward. I'd say Also, you seem to be changing up the code style a lot, which I'd warn against.
On Fri, Jun 3, 2016, 11:19 PM Tony Fast wrote:
I intend to bring the coding style back closer to what y'all have been working with. All of the code is pep8 compliant at the moment except for some comments. I am trying to get a hold of the model itself; it is a bit confusing. This pull request is part research and part serious. I am offering up this code to see if I am understanding the model clearly from a total outsider perspective. I think there are some awesome UI features that can be built onto
Below are the UML diagrams for the current refactor. The refactor is mostly working, but I need to track down some heisenbugs. It is weird when you get different errors every time you run the same function. I have been using this notebook for development. I made some changes to the Primitive diagram. The highest score is 0.982261640798, with 4 fit errors vs. 2 score errors out of 275 executions.
@tonyfast, check out the development branch if you'd like to see where we're heading with TPOT in the immediate future. I think, using the same kind of compile-DEAP-pipelines-to-sklearn-pipelines code, we could also have TPOT directly evolving sklearn pipelines as well.
Going to close this PR since we have a version of it in the dev branch now.
What does this PR do?
This PR is a major refactor (#91) of tpot using sklearn models. It introduces 2 new packages, toolz and traitlets, and eases the creation of new models.
I really wanted to understand the inner workings of tpot, so this is half research/mostly serious. I used the existing refactor that @teaearlgraycold is working on for inspiration. I think there may be a meshing of both of these pull requests to lead to the big refactor.
I still need to add quite a few models; currently everything seems to work except for the scoring. I am going to need to write tests to confirm.
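To illustrate what traitlets brings to model definitions, here is a minimal sketch of a parameter container whose attributes are type-checked on assignment. The class name ForestParams and its two traits are hypothetical and are not taken from this PR; only HasTraits, Int, Unicode and TraitError come from traitlets itself.

```python
from traitlets import HasTraits, Int, TraitError, Unicode


class ForestParams(HasTraits):
    """Hypothetical parameter container; not a class from this PR."""

    n_estimators = Int(100)      # must be an integer, defaults to 100
    criterion = Unicode("gini")  # must be a string, defaults to "gini"


params = ForestParams(n_estimators=50)

# Assigning a value of the wrong type is rejected by traitlets.
try:
    params.n_estimators = "many"
    rejected = False
except TraitError:
    rejected = True
```

Because the rejected assignment raises before the value is stored, the attribute keeps its last valid value, which is the kind of guarantee a search over model parameters can lean on.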
High level changes
tpot is a ClassifierMixin, allowing the different scoring functions to be applied in Rework custom scoring functionality #156. Using sklearn mixins should make it easier to control error functions. Add option to limit the Classifiers/Regressors that TPOT uses #146
deap methods and tpot methods. Bespoke primitive functions were moved to primitives.py.
PipelineEstimator class that can score an individual during evaluation.
EvaluateEstimator class. This class allows sklearn to be introduced with strongly typed parameters. This base class is in the subdirectory models and the underlying scripts are mapped to their sklearn toolbox.
Creating a model
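A minimal sketch of the data layout this section describes, assuming pandas only. The column names, toy values, the level names train and class, and the choice of True for training rows are assumptions for illustration, not the PR's actual code.

```python
import pandas as pd

# Hypothetical toy data: four samples, two features.
features = pd.DataFrame(
    {"f0": [0.1, 0.4, 0.35, 0.8], "f1": [1.0, 0.0, 1.0, 0.0]}
)
is_train = [True, True, False, True]   # assumed: True marks a training row
labels = ["cat", "dog", "cat", "dog"]  # original string class names

# Encode class names as integers; `names` maps each code back to its string.
codes, names = pd.Series(labels).factorize()

# First index level: train/test flag. Second level: integer class.
features.index = pd.MultiIndex.from_arrays(
    [is_train, codes], names=["train", "class"]
)

# Split using the boolean level, without adding helper columns.
mask = features.index.get_level_values("train").to_numpy(dtype=bool)
train = features[mask]
test = features[~mask]

# Recover the string class names from the integer level.
recovered = list(names[features.index.get_level_values("class").to_numpy()])
```

Keeping the split flag and the class in the index leaves the columns as pure features, so the frame's values can be handed to an estimator directly.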
Creates a MultiIndex Pandas DataFrame for the source data. This should cut down on pandas operations. The first index level uses booleans, True or False, to indicate test or train. The next level holds the classes as integers; string names can be recovered later.
#113 suggests using numpy arrays, but a well-structured dataframe could extend to using xarray and dask. It should be easier to discover any copying problems (#78).
Where should the reviewer start?
How should this PR be tested?
I still need to add tests and replace the documentation.
Any background context you want to provide?
I love tpot. It is the first tool I have used that truly discovers things I wouldn't have found myself.
What are the relevant issues?
I added the references above.
Screenshots (if appropriate)
Questions: