Jupyter kernel dies / Segmentation fault : 11, when upgrading xgboost to > v0.90 on MacOS M1 Chip #7504
Comments
Could you please try latest xgboost? I suspect that's a memory usage issue since the parameters are tested. |
Latest I can go is 1.3-1 as 1.5 is not yet supported by AWS SageMaker. I have tried to upgrade to 1.3-1 and I run into the same issue, dead kernel. The XGBoost 0.90 version will be deprecated on December 31, 2021 by AWS SageMaker. I need to be on v1.0-1 or >. https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html |
@trivialfis Is there a stable version > 0.90 and <=1.3-1 that I can use? AWS SageMaker doesn't support the latest version, unfortunately. |
I took another look: your data is small, so an out-of-memory error shouldn't happen. Could you please share a reproducible script I can try? |
Yeah, it's not a memory issue. The data is very small. It works fine with v0.90, and whenever I upgrade to a version above 1.0 I run into issues. Here's some reproducible code. I am using mock data here, so I'm not sure whether the issue will be reproducible or whether it depends on the contents of the data. For reference, I am running this on a Mac notebook with an Apple M1 chip.
|
I can't run your script:
|
Try this. This works for me.
|
Did you reproduce the crash with this script? |
That's weird. I just ran 1.1:
|
Yeah, strange. I haven't been able to fit on > v0.90 |
@hcho3 Is there any issue for macOS in old versions on top of your mind? |
I am on MacOS Monterey v 12.0.1, Apple M1 2020 Chip |
Ah, I think xgboost doesn't work on m1. #7501 |
v0.90 works though. Could you suggest any work-arounds? v0.90 is being deprecated by AWS SageMaker on 31st December. At that point, our model will no longer be able to serve. All of our training is done locally, so we need to make it work on local Jupyter Notebooks. |
Any insight into what the issue is with version > 0.90? I guess what changed between the versions that makes it crash. |
I'm not sure. None of the maintainers has access to an M1. |
At a guess I'd say that 0.9 was compiled without x86 SIMD instructions (which aren't supported in Rosetta), and newer versions have been. So you're probably pulling in the x86 version, running that through Rosetta and then it crashes when it hits an unsupported SIMD instruction (either AVX, AVX2 or AVX-512). You may be able to recompile one of the other versions without SIMD instructions producing a wheel that will run in Rosetta on an M1. I'm not particularly familiar with how the Python builds are compiled though. |
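A quick way to test the Rosetta hypothesis is to ask the running interpreter which architecture it sees. The sketch below is an illustration, not something posted in this thread: `describe_arch` is a made-up helper name, and it assumes the macOS `sysctl.proc_translated` key, which reports `1` for a translated process. The `sysctl` child process normally inherits the parent's architecture, so treat the result as a hint rather than proof. A `machine` value of `x86_64` on an M1 already suggests the x86 build of Python is in use.

```python
import platform
import subprocess


def describe_arch():
    """Report the CPU architecture this interpreter sees and, on macOS,
    whether the process appears to be translated by Rosetta."""
    info = {"machine": platform.machine(), "rosetta": None}
    if platform.system() != "Darwin":
        return info  # Rosetta only exists on macOS
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
        info["rosetta"] = out == "1"
    except (OSError, subprocess.CalledProcessError):
        # The key may be absent on some systems; leave the value as None.
        pass
    return info


print(describe_arch())
```

Running this both in a terminal and in a notebook cell shows whether the two are using the same kind of interpreter.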
Thank you for pointing that out. To install the Python package from source one can simply run
Under the Python package directory. Hopefully we will see support from github action and build the wheel from there. |
@Craigacp @trivialfis Thanks. I tried the following:
from the discussion here: https://discuss.xgboost.ai/t/xgboost-on-apple-m1/2004/9 It didn't work for me. I am not familiar with the architecture - could you provide some pointers on how I could recompile one of the other versions without SIMD instructions and produce a wheel that will run in Rosetta on an M1? |
Did it fail during compilation, or at runtime? What kind of error did you get? Are you running a native ARM version of Homebrew, or the x86 version via Rosetta? |
It fails when I run the I tried to search for Homebrew in mac system report, however, I am not seeing it in the list under
How do I find what Homebrew version I am running? |
Assuming you installed it to the default locations then on ARM it's in To get a more helpful error message try running the fit command from a regular python interpreter rather than a notebook. Though the notebook server process might have more information in its log. |
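For the "more helpful error message" part, Python's standard `faulthandler` module can dump the Python-level traceback when the process receives a segmentation fault. This is a minimal sketch under that assumption; `enable_crash_traceback` is a hypothetical helper name, and the placeholder comment stands in for whatever call actually crashes.

```python
import faulthandler


def enable_crash_traceback():
    """Turn on faulthandler so a crash in native code (e.g. SIGSEGV)
    prints the Python traceback to stderr before the process dies.
    Returns True if the handler is active."""
    try:
        faulthandler.enable(all_threads=True)
    except (AttributeError, ValueError, OSError, RuntimeError):
        # stderr may not be a real file (for example when it is captured).
        return False
    return faulthandler.is_enabled()


active = enable_crash_traceback()
print("faulthandler active:", active)
# ... then run the call that crashes, such as the model's fit step.
```

The same effect is available without code changes by starting the interpreter with `python -X faulthandler script.py`.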
It's in /usr/local/Homebrew, so x86 I suppose. I ran it in a
|
I have successfully compiled and installed XGBoost v1.3.3 on an M1 in Python, using a M1 native version of homebrew & python 3.9 installed from that homebrew. It compiled with no trouble, though I did compile the native library first (by doing I built a test regression using some randomly generated data from numpy on a squared loss and the |
Thanks. @Craigacp I replicated the steps, but it still didn't work for me. Maybe, I am missing something. Here's what I did exactly - Installed the Apple M1 native
Install python 3.9 using brew
In xgboost directory,
Now, in my notebook:
Kernel dead. |
If you pip install xgboost then it will pull in the public one which doesn't have the right binary. This suggests that you aren't running jupyter from the ARM64 native python virtual environment that you installed the native ARM64 xgboost into. Try using that venv to run jupyter. Unfortunately the Python ML ecosystem is only starting to catch up to the idea that there are different CPU architectures. TensorFlow sort of works, but pytorch doesn't (or at least it didn't the last time I checked), and scikit-learn doesn't have binaries available. Jupyter seems to work ok, but I've not tried matplotlib, seaborn or pandas. This is an ecosystem wide problem, much of which is predicated on the fact that most useful Python ML libraries are wrappers around native code, that there wasn't a fortran compiler available for macOS M1 to compile scipy with (that's now been fixed), and then that Github Actions doesn't provide M1 build resources. At the moment the M1 Macs are not suitable for data science or ML work unless you understand the differences in CPU architecture and how to build your environment from source. That will get fixed over time, but it's not as easy as it is on x86 Macs yet. |
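To confirm which interpreter and which copy of a package a notebook kernel is really using, something like the following can be run in a cell. It is a sketch using only the standard library; `where_is` is a made-up name, and the `in_venv` flag only detects `venv`-style environments (a conda environment would show `False`).

```python
import importlib.metadata
import importlib.util
import platform
import sys


def where_is(package):
    """Return interpreter and install details for a package as seen
    by the currently running Python process."""
    spec = importlib.util.find_spec(package)
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "python": sys.executable,                  # interpreter running the kernel
        "in_venv": sys.prefix != sys.base_prefix,  # True inside a venv
        "machine": platform.machine(),             # e.g. arm64 or x86_64
        "package_path": spec.origin if spec else None,
        "version": version,
    }


print(where_is("xgboost"))
```

If `python` points outside the virtual environment, or `package_path` is not inside it, the kernel is picking up a different install than the one built from source.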
Thanks, that makes sense. I created a virtual environment and launched Jupyter from there. Now, when I run the notebook, the
throws an error.
|
|
got it. I am using a bunch of other interfaces from Looks like
|
I don't have any experience with using conda, and it looks like scikit-learn still has some difficulties installing on M1 scikit-learn/scikit-learn#19137. |
Thank you for the discussion. I will close this one as it duplicates #6408 . Maybe we can build the wheel using https://aws.amazon.com/about-aws/whats-new/2021/12/amazon-ec2-m1-mac-instances-macos/ @hcho3 Let's move the conversation to #6408 . ;-) |
I am training an XGBoost model locally. My data is not large: a few thousand rows and 100 columns. I have successfully trained a model using xgboost v0.90 on Python v3.9. I need to upgrade xgboost to a version above 1.0 because the older ones are being deprecated. I run
%pip install xgboost==1.1.0
within the Jupyter notebook and in a terminal as well. After upgrading, when I attempt to fit the model, my Jupyter kernel dies.
This is the step where my kernel dies.
Upon checking the versions installed in pip and conda:
I have also tried to use conda-forge.