can define my own objective function for mbo? #55

Open
giuseppec opened this issue Jul 21, 2018 · 2 comments
@giuseppec

As far as I can see, the autoxgboost function internally uses holdout for the objective function within the mbo tuning (it is hard-coded).

  1. Wouldn't it be cool if users could also specify their own objective here? For example, I want to use 3-fold CV (or stratified CV) instead of the hard-coded holdout.
  2. Currently, mbo seems to use the same test set in each iteration, as the resample instance (i.e. the test splits) is computed outside of the objective function. This way I am not able to use different test splits in each iteration, right?
    Isn't mbo somehow starting to overfit to those holdout test splits at some point?
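Point 2 could be sketched by drawing a fresh split inside the objective on every call. This is only a rough illustration using mlr's resampling API; `lrn`, `task.train`, and `measure` are assumed names taken from this discussion, not autoxgboost's actual internals:

```r
library(mlr)

# Sketch: instead of reusing one precomputed ResampleInstance for every
# MBO iteration, draw a new holdout split each time the objective is
# evaluated. All object names here are illustrative assumptions.
rdesc = makeResampleDesc("Holdout", split = 2/3)

evalConfig = function(lrn, task.train, measure) {
  rin = makeResampleInstance(rdesc, task = task.train)  # fresh split per call
  resample(lrn, task.train, rin, measures = measure, show.info = FALSE)$aggr
}
```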
@giuseppec (Author) commented Jul 21, 2018

I would be happy with replacing this line here

```r
res = performance(pred, measure)
```

with `crossval(lrn, task.train, measure)$aggr`.

And yes, I would completely ignore the task.test data here (on which the early stopping is based). But maybe it is better to let the user decide whether they really want to do this. Or do you see any other problem here?
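For concreteness, the swap could look roughly like this. A sketch only, assuming mlr's `makeResampleDesc`/`resample` API; `lrn`, `task.train`, and `measure` are placeholder names from the discussion above, not autoxgboost's real variables:

```r
library(mlr)

# Hard-coded behaviour today (sketch): score a single fixed holdout prediction.
# res = performance(pred, measure)

# Proposed: aggregate over a (stratified) 3-fold CV on the training task
# instead, so each MBO evaluation averages over several test splits.
rdesc = makeResampleDesc("CV", iters = 3, stratify = TRUE)
res = resample(lrn, task.train, rdesc, measures = measure,
               show.info = FALSE)$aggr
```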

@ja-thomas (Owner)

  1. The main idea is that no resampling should be necessary, so that xgboost can utilize the full parallelism of the system. But I see the point that there are cases in which this would be totally useful.

  2. This is usually how it is done; otherwise a lot of noise is added. I experimented on some datasets to see how bad the overfitting is, but I couldn't directly find (or artificially create) any "overtuning" on the holdout data. In general this is something I'm quite interested in improving, but I need to find cases where this is actually a problem first.
