XGBoost supports NaN but tpot enforces imputation #836

Open
trhallam opened this issue Feb 19, 2019 · 2 comments

Comments

@trhallam

As a general rule, TPOT enforces imputation so that the input and output data meet sklearn's requirement that all values are real numbers. XGBoost is a special case: it accepts NaN values in its input.

Context of the issue

I am trying to optimise XGBoost specifically, using a data set with quite a lot of holes in it. I do not want to perform imputation because it affects the results. I looked in base.py and quickly modified the _check_data function to ignore NaN values and skip imputation, but I was wondering whether TPOT could be modified to accommodate this scenario with XGBoost?

A 'no_imputation' keyword might be added to TPOTBase.__init__, for example, to prevent imputation.
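To make the proposal concrete, here is a minimal standalone sketch of how such a flag could gate imputation. It is not TPOT code: `prepare_features` and `impute_column_medians` are hypothetical names, and column-median imputation is only an assumed stand-in for whatever TPOT does internally.

```python
import math
from statistics import median


def impute_column_medians(rows):
    """Replace NaN entries with the median of the observed values in the same column."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # A column with no observed values has no median, so it stays NaN.
        medians.append(median(observed) if observed else math.nan)
    return [
        [medians[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


def prepare_features(rows, no_imputation=False):
    """Hypothetical data check: impute NaN values unless the caller opts out.

    Returns the (possibly imputed) rows and a flag saying whether imputation ran.
    """
    has_nan = any(math.isnan(v) for row in rows for v in row)
    if has_nan and not no_imputation:
        return impute_column_medians(rows), True
    return rows, False


data = [[1.0, math.nan], [3.0, 4.0], [math.nan, 8.0]]
imputed, was_imputed = prepare_features(data)
untouched, was_imputed_opt_out = prepare_features(data, no_imputation=True)
```

With the flag left at its default, the holes are filled in; with `no_imputation=True`, the NaN values pass through unchanged for an estimator that can handle them.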

Example Edits:

else:
    if not self._imputed and np.any(np.isnan(features)):
        self._imputed = True
        features = self._impute_values(features)

try:
    if target is not None:
        X, y = check_X_y(features, target, accept_sparse=True, dtype=np.float64,
                         force_all_finite='allow-nan')
        return X, y
@weixuanfu
Contributor

weixuanfu commented Feb 19, 2019

TPOT enforces imputation on datasets with NaN because most operators in the TPOT configuration do not support NaN. We may need another configuration if this no_imputation option is added.
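As a rough illustration of what a separate configuration could look like, the sketch below builds a custom TPOT configuration dictionary limited to an XGBoost classifier. The name `xgboost_only_config` and the parameter grids are illustrative assumptions, not an official TPOT configuration, and supplying it would not by itself switch off imputation.

```python
# Hypothetical custom configuration: the only operator TPOT may use is XGBoost,
# which tolerates NaN in its input. The parameter grids are illustrative only.
xgboost_only_config = {
    "xgboost.XGBClassifier": {
        "n_estimators": [100],
        "max_depth": list(range(1, 11)),
        "learning_rate": [1e-3, 1e-2, 1e-1, 0.5, 1.0],
        "subsample": [0.5, 0.75, 1.0],
        "min_child_weight": list(range(1, 21)),
    }
}

# Assumed usage (requires TPOT and xgboost to be installed):
# from tpot import TPOTClassifier
# tpot = TPOTClassifier(config_dict=xgboost_only_config)
```

Restricting the search space this way is what would make skipping imputation safe, since no operator that rejects NaN could end up in a pipeline.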

@trhallam
Author

I understand; it is a very specific case that I'm working on. It's just that there is currently no way to avoid imputation with TPOT unless you modify the source. It is not a necessity, perhaps more of a nice-to-have.
