bad_alloc even with swap being available #4184
Version is 0.81, from https://s3-us-west-2.amazonaws.com/xgboost-wheels/list.html?prefix=. Just ran some experiments; the code crashed even with only 50% of RAM in use. Stack trace:
Current dataset: 2_000_000 rows x 1500 features on 0.72.

Just tried the old 0.72 multi-GPU version with the same code and the same dataset. GPU memory:
[08:55:31] Allocated 4937MB on [0] GeForce GTX 1080 Ti, 6031MB remaining.
File "/home/yhgtemp/anaconda3/lib/python3.6/site-packages/xgboost/training.py", line 204, in train
Seems like issue #2874. The code crashes in the predict step, on the 7th-8th iteration through the CV folds.

Edit: moved the predict step to "cpu_predictor", and I still get the crash on later iterations. During each CV iteration we delete the model to release memory.
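The workaround described above can be sketched as follows. This is a hedged, minimal sketch: the `run_fold` helper and its placeholder booster are illustrative inventions (the real call would be `xgboost.train`), but `tree_method` and `predictor` are the actual XGBoost training parameters involved, and the del/gc step is the model-release pattern the comment mentions.

```python
import gc

# Train on the GPU, but predict on the CPU to avoid a GPU memory
# spike during the predict step (the workaround tried in this thread).
params = {
    "tree_method": "gpu_hist",     # GPU training
    "predictor": "cpu_predictor",  # CPU prediction
}

def run_fold(train_fold, test_fold):
    """One CV fold: train, predict, then free the model explicitly."""
    # bst = xgboost.train(params, train_fold)   # real call, needs a GPU
    # preds = bst.predict(test_fold)
    bst = object()  # placeholder booster for this sketch
    del bst         # drop the Python reference to the model...
    gc.collect()    # ...and force collection so its memory is released
```

Note that `del` alone only removes the reference; the explicit `gc.collect()` makes the release happen promptly rather than at the collector's discretion.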
@trivialfis is the GPU failing because of a temporary memory burst?
@hlbkin Possibly. But you can use tools like nvidia-smi to monitor the status of the GPU. And please use the latest version; I have been fixing various bugs in XGBoost. It's not going to be bug free, but I believe the latest commit is much more robust than 0.7x.
Yes, using 0.81 is much better; generally I do not see this kind of behaviour, and it does not crash with the same dataset. nvidia-smi does not really show memory bursts, i.e. generally the GPU is using less than 50% of its memory and then suddenly crashes (with 0.72). I will run some tests with a bigger (but similar) dataset and report back.
@hlbkin Are things looking better now?
Yes, everything is fine for now. What I also did was convert the numpy dataset to float32 manually (which is natural for XGBoost, as I understand it).
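For context, a minimal sketch of the float32 conversion mentioned above (the array shape is illustrative; NumPy arrays default to float64, so the cast halves the memory footprint, which matters for a 2_000_000 x 1500 dataset like the one in this thread):

```python
import numpy as np

X = np.random.rand(1000, 15)       # float64 by default
X32 = X.astype(np.float32)         # explicit cast before building the DMatrix

print(X.nbytes // X32.nbytes)      # -> 2: float32 uses half the memory
```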
Hi Team!
I'm getting bad_alloc during XGBoost train/test iterations in cross validation.
I have a very specific cross validation setup in Python, so we iterate over chunks of a numpy array for the train and test sets.
Some of the train arrays only fit in memory by using swap on Linux (say an additional 30-50 GB of swap).
It seems that XGBoost crashes right after some swap memory starts being used.
Is this the expected result?
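A minimal sketch of the chunked cross-validation pattern described above, assuming nothing about the original code beyond "iterate over chunks of a numpy array" (the fold count and row count are illustrative, not from the original setup):

```python
import numpy as np

n_rows, n_folds = 100, 5
indices = np.arange(n_rows)
folds = np.array_split(indices, n_folds)  # contiguous index chunks

for k, test_idx in enumerate(folds):
    # The training chunk is materialised per fold, so peak memory is
    # roughly one train chunk plus one test chunk at a time.
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    assert len(train_idx) + len(test_idx) == n_rows
```

Indexing the full array per fold (e.g. `X[train_idx]`) still copies the selected rows, so with a dataset that only fits via swap, each fold's copy is itself a large allocation; that is consistent with the crash appearing once swap starts being used.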