As far as I can see, the autoxgboost function internally uses a hard-coded holdout split for the objective function within the MBO tuning.
Wouldn't it be cool if users could also specify their own resampling strategy for the objective here? For example, I would like to use 3-fold CV (or stratified CV) instead of the hard-coded holdout.
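Something like the sketch below is what I have in mind; the `resampling` argument in the commented-out call is purely hypothetical (it does not exist in autoxgboost), the rest is plain mlr:

```r
library(mlr)

# 3-fold CV, optionally stratified on the target class
rdesc = makeResampleDesc("CV", iters = 3, stratify = TRUE)

# Hypothetical call -- a `resampling` argument like this does not exist (yet):
# model = autoxgboost(task = sonar.task, measure = auc, resampling = rdesc)
```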
Currently, MBO seems to use the same test set in each iteration, as the resample instance (i.e. the test splits) is computed outside of the objective function. This way I am not able to use different test splits in each iteration, right?
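To illustrate what I mean with plain mlr (just a sketch of the general mechanism, not the actual autoxgboost internals):

```r
library(mlr)

rdesc = makeResampleDesc("Holdout")

# Fixed ResampleInstance: the split is drawn once and reused in every evaluation
rin = makeResampleInstance(rdesc, task = sonar.task)
r1 = resample("classif.rpart", sonar.task, rin)    # always the same test set
r2 = resample("classif.rpart", sonar.task, rin)    # identical split again

# Passing the ResampleDesc directly: a fresh split is drawn on each call
r3 = resample("classif.rpart", sonar.task, rdesc)  # new test set
r4 = resample("classif.rpart", sonar.task, rdesc)  # different test set
```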
Isn't MBO going to start overfitting to those fixed holdout test splits at some point?
And yes, I would ignore the task.test data here completely (on which the early stopping is based). But maybe it is better to let users decide whether they really want to do this or not. Or do you see any other problem here?
The main idea is that no resampling should be necessary, so that xgboost can utilize the full parallelism of the system. But I see your point that there are cases in which this would be really useful.
This is usually how it is done; otherwise a lot of noise is added. I experimented on some datasets to see how bad the overfitting is, but I couldn't directly find (or artificially create) any "overtuning" on the holdout data. In general this is something I'm quite interested in improving, but I first need to find cases where this is actually a problem.