Pandas update #121

Merged
merged 6 commits into from
Mar 27, 2024
Conversation

nickotto
Contributor

What does this PR do?

Adds a new "Eval Error" column to the evaluated individuals. This keeps the score columns as floats, with no strings mixed in. All evaluation errors now go in the "Eval Error" column, which has object dtype. This should resolve the incompatibilities with pandas 2.0+. Updated for both the steady-state and base estimator versions of tpot2.
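A minimal sketch of the idea, not the actual tpot2 code: the score column names and the "INVALID" label below are assumptions for illustration, and only the "Eval Error" column name comes from this PR.

```python
import pandas as pd

# Hypothetical stand-in for the evaluated individuals table.
# The score column names are assumed, not taken from tpot2.
evaluated = pd.DataFrame(
    {
        "roc_auc_score": [0.91, float("nan"), 0.87],
        "complexity_scorer": [12.0, float("nan"), 8.0],
    }
)

# Record failures in a dedicated object-dtype column instead of writing
# strings (the "INVALID" label is an assumption) into the float score columns.
evaluated["Eval Error"] = pd.Series([None, "INVALID", None], dtype=object)

# The score columns stay float64 because no string was ever assigned to them.
score_dtypes = evaluated[["roc_auc_score", "complexity_scorer"]].dtypes
```

Keeping strings out of the score columns means pandas never has to store text in a float64 column, which is the kind of assignment that newer pandas versions warn about.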

Where should the reviewer start?

Simply run tpot2 with single and multiple objectives. The incompatibility warning we saw before with pandas 2.0+ should no longer appear.

How should this PR be tested?

Run across different Python versions.

@nickotto nickotto changed the base branch from main to dev March 22, 2024 23:23
@perib
Collaborator

perib commented Mar 23, 2024

While trying to print out some of the evaluated individual columns, I noticed a simple bug in how str was defined for the graph individuals. It normally works by exporting a pipeline and then printing the string for that, but that can fail if the hyperparameters are invalid. I just added a try-except block to catch those cases. (This will be changed again in the next update, so I didn't want to make a whole new PR for that.)
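A rough sketch of the try-except pattern described above. The class, its `export_pipeline` method, the `n_estimators` check, and the fallback text are all hypothetical and do not reflect tpot2's real implementation.

```python
class GraphIndividualSketch:
    """Hypothetical stand-in for a graph individual; not tpot2's class."""

    def __init__(self, hyperparameters):
        self.hyperparameters = hyperparameters

    def export_pipeline(self):
        # Assumed behavior: exporting fails when hyperparameters are invalid.
        if self.hyperparameters.get("n_estimators", 1) <= 0:
            raise ValueError("n_estimators must be positive")
        return f"Pipeline({self.hyperparameters})"

    def __str__(self):
        try:
            # Normal path: describe the individual via its exported pipeline.
            return str(self.export_pipeline())
        except Exception:
            # Fallback that does not require a valid pipeline.
            return f"<unexportable individual {self.hyperparameters}>"


valid_text = str(GraphIndividualSketch({"n_estimators": 10}))
invalid_text = str(GraphIndividualSketch({"n_estimators": 0}))
```

With a fallback like this, printing a column of individuals does not raise even when some of them hold invalid hyperparameters.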

There is an edge case: if an individual is created but its evaluation is incomplete, the value in the "Eval Error" column is np.nan instead of None. This can happen if the global timeout (max_time_seconds) is triggered. We don't want to label those as "timeout", since that label should be reserved for going over max_eval_time_seconds.

But I'm not sure whether we can change the default missing value in pandas to None, and pandas doesn't let us add NaNs at the same time as we add the error strings. I think we just leave it as is for now?
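To illustrate the np.nan versus None distinction mentioned above, here is a small sketch with made-up rows. The "TIMEOUT" string is an assumed label, and this is only one way a caller might cope with the mixed missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical "Eval Error" values: evaluated cleanly (None), exceeded the
# per-evaluation limit ("TIMEOUT", assumed label), and cut off by the global
# timeout before any value was written (np.nan).
errors = pd.Series([None, "TIMEOUT", np.nan], dtype=object)

# pd.isna treats None and np.nan alike, so this mask covers both cases.
no_error_recorded = errors.isna()

# An identity check tells them apart, which is where the edge case shows up.
is_none = errors.map(lambda value: value is None)
```

Code that filters on the column with `isna()` or `notna()` behaves the same for both kinds of missing value, whereas code that compares against None will skip the incomplete rows.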

@perib perib merged commit 14922f6 into dev Mar 27, 2024
2 checks passed
4 participants