BoundsError on split_set_threads! #201
Comments
I've managed to reduce the example to the following, again let me know how to best share the dataset if you want to reproduce:
using CSV, DataFrames, MLJBase, EvoTrees
using StableRNGs
data = CSV.read("/Users/olivierlabayle/Downloads/pb_data.csv", DataFrame)
y = categorical(data.target)
X = data[!, Not(:target)]
train, test = MLJBase.train_test_pairs(Holdout(), 1:size(X, 1), X, y)[1]
rng = StableRNG(1)
model = EvoTreeClassifier(nround=100, lambda=1e-5, max_depth=7, rng=rng)
Xtrain, ytrain = MLJBase.reformat(model, selectrows(X, train), selectrows(y, train))
MLJBase.fit(model, 1, Xtrain, ytrain)
The issue arises because |
Thanks for raising this! Could you confirm which EvoTrees version you're using? |
Yes, this is with v0.14.2. Out of curiosity, I tried v0.13 with 100 different random seeds and couldn't reproduce the bug, so you are probably right! |
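For anyone wanting to repeat the seed sweep mentioned above, here is a minimal sketch of how it could be set up. `failing_seeds` and `toy_fit` are hypothetical names, not part of EvoTrees or MLJBase; `toy_fit` is only a stand-in that throws a `BoundsError` for some seeds so the loop can be run without the dataset.

```julia
# Hypothetical helper: `fit_once(seed)` is assumed to run one full training
# with the given seed and to throw if the fit fails.
function failing_seeds(fit_once, seeds)
    failures = Int[]
    for seed in seeds
        try
            fit_once(seed)
        catch err
            # Only record the error under investigation; rethrow anything else.
            err isa BoundsError || rethrow()
            push!(failures, seed)
        end
    end
    return failures
end

# Stand-in for a real training call: indexes out of bounds when the seed
# is a multiple of 7, and does nothing otherwise.
toy_fit(seed) = seed % 7 == 0 ? [1, 2, 3][4] : nothing

failing = failing_seeds(toy_fit, 1:20)
println(failing)  # → [7, 14]
```

With the real data, `fit_once` would presumably wrap the `MLJBase.fit` call from the example, building the model with `rng=StableRNG(seed)` on each iteration.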
I tried various runs, including with |
@olivierlabayle Could you test current |
Thanks, I can confirm that I can't reproduce the bug on this dataset with |
Did you manage to find the origin of the problem? |
Not yet! It's actually quite puzzling, as I failed to reduce the issue to a simpler reproducible problem. I'm afraid it's unlikely someone will be willing to investigate based on a full EvoTrees training that just hits an issue at the 5th iteration. That being said, I think there are some relevant cues to help continue the investigation. Notably, bugs appear to creep in in the
My next step would be to try to reproduce the failure on the GPU side in an MWE, so I can submit a relevant issue. Let me know if there's something you'd like to investigate on your end. |
Thank you for the feedback! I will try to investigate the original issue further, since I think I've just managed to trigger the same error on a similar dataset with v0.13.1. Since I don't know the internals, it might take me some time though. |
Have you encountered any new issues? With v0.15.0, I've paid closer attention to numerical instabilities and found the new release to be reliable under all tested scenarios. I would therefore close this given the significant revamp, unless there are still scenarios leading to crashes. |
Sorry, it was just faster to move to XGBoost.jl. I think you can close this, and I'll try again later when I have time. |
Hi,
I think I am facing an edge case where the tree split seems to result in a BoundsError. It has been quite tedious to come up with a reproducible example, and it is not ideal since it originates from a large dataset (that I can probably share if needed). Due to the asynchronous fitting strategy of MLJ, this is also hard to debug (I can't step into...). The line that throws an error is this one. Do you see any reason why this could result in a BoundsError? I must also say that this error is stochastic, since changing to rng = StableRNG(1234), for instance, does not raise.
The code and stacktrace are below (but you won't be able to reproduce without the dataset):
code:
stacktrace: