
Jupyter kernel dies / Segmentation fault : 11, when upgrading xgboost to > v0.90 on MacOS M1 Chip #7504

Closed
kevalshah90 opened this issue Dec 10, 2021 · 32 comments


@kevalshah90

kevalshah90 commented Dec 10, 2021

I am training an XGBoost model locally. My data is not large: a few thousand rows and 100 columns. I successfully trained the model using xgboost v0.90 on Python v3.9. I need to upgrade xgboost to v1.0 or later, as the older versions are being deprecated. I ran %pip install xgboost==1.1.0 both within the Jupyter notebook and from the terminal. After upgrading, when I attempt to fit the model, my Jupyter kernel dies.

import pandas as pd
import numpy as np
import xgboost
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from scipy.stats import uniform, randint

print(xgboost.__version__)
# 1.1.0


# read data
df = pd.read_csv('') 

# split df into train and test
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,0:21], df.iloc[:,-1], test_size=0.1)

X_train.shape
# (2000, 100)

# xgboost regression model
model = XGBRegressor(objective = 'reg:squarederror')

# Parameter distributions

params = { 
          "colsample_bytree": uniform(0.5, 0.3), 
          "gamma": uniform(0, 0.5), 
          "learning_rate": uniform(0.01, 0.5),  
          "max_depth": randint(2, 8), 
          "n_estimators": randint(100, 150), 
          "subsample": uniform(0.3, 0.6) 
}

This is the step where my kernel dies.

# Hyperparameter tuning
r = RandomizedSearchCV(model, param_distributions=params, n_iter=10, scoring="neg_mean_absolute_error", cv=3, verbose=1, n_jobs=1, return_train_score=True, error_score='raise')

# Fit model
r.fit(X_train.values, y_train.values)

Checking the versions installed via pip and conda:

pip list
xgboost                   1.1.0

conda list
xgboost                   1.1.0                    pypi_0    pypi

I have also tried to use conda-forge.

    conda install -c conda-forge py-xgboost=1.0.2
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: - 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                                                                           

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - py-xgboost=1.0.2 -> python[version='>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0']

Your python: python=3.9

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.
@trivialfis
Member

Could you please try the latest xgboost? I suspect it's a memory usage issue, since the parameters are tested.

@kevalshah90
Author

kevalshah90 commented Dec 11, 2021

The latest I can go to is 1.3-1, as 1.5 is not yet supported by AWS SageMaker. I have tried upgrading to 1.3-1 and run into the same issue: a dead kernel. XGBoost 0.90 will be deprecated by AWS SageMaker on December 31, 2021, so I need to be on v1.0-1 or later.

https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html

@kevalshah90
Author

@trivialfis Is there a stable version > 0.90 and <=1.3-1 that I can use? AWS SageMaker doesn't support the latest version, unfortunately.

@trivialfis
Member

I took another look; your data is small, so OOM shouldn't happen. Could you please share a reproducible script I can try?

@kevalshah90
Author

kevalshah90 commented Dec 13, 2021

@trivialfis

Yeah, it's not a memory issue; the data is very small. It works fine with v0.90, and whenever I upgrade to v1.0 or later I run into issues. Here's some reproducible code. I am using mock data here, so I am not sure whether the issue will be reproducible or whether it lies in the contents of the data. For reference, I am running this on a Mac notebook with an Apple M1 chip.


import numpy as np
import pandas as pd
import seaborn as sns
import scipy
import sklearn
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
import random
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
import xgboost
from xgboost import XGBRegressor, plot_importance
from sklearn.model_selection import train_test_split, GridSearchCV, KFold, RandomizedSearchCV
from sklearn.metrics import mean_squared_error
from scipy.stats import uniform, randint
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, OrdinalEncoder
import warnings

pd.set_option('use_inf_as_na', True)

print(xgboost.__version__)
# 0.90

# read data
# df = pd.read_csv('xxx.csv')

# make up some data
data = np.random.randint(5,50,size=(1000,9))

cats = ['cat1','cat2','cat3','cat4','cat5']

df = pd.DataFrame(data, columns=['col1', 'col2', 'col3', 'col4', 'col5','col6','col7','col8','col9'])

df['cat_var1'] = np.random.choice(cats, 1000)
df['cat_var2'] = np.random.choice(cats, 1000)

df['response'] = np.random.randint(5,50,1000)

# split df into train and test
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,0:11], df.iloc[:,-1], test_size=0.1, random_state=42)


# In[197]:
X_train.shape
# (900, 11)

# In[198]:
X_test.shape
# (100, 11)

# Encode categorical variables
cat_vars = ['cat1','cat2','cat3','cat4','cat5']
cat_transform = ColumnTransformer([('cat', OneHotEncoder(handle_unknown='ignore'), cat_vars)], remainder='passthrough')

encoder = cat_transform.fit(X_train)
X_train = encoder.transform(X_train)
X_test = encoder.transform(X_test)


# Define an XGBoost regression model
model = XGBRegressor(objective = 'reg:squarederror')


# In[216]:

params = {
          "colsample_bytree": uniform(0.6, 0.4), # fraction of cols to sample
          "gamma": uniform(0, 0.4), # min loss reduction required for next split
          "learning_rate": uniform(0.01, 0.3), # default 0.1
          "max_depth": randint(2, 6), # default 6, controls model complexity and overfitting
          "n_estimators": randint(100, 150), # default 100
          "subsample": uniform(0.3, 0.6) # % of rows to use in training sample
}



# Hyperparameter tuning
rsearch = RandomizedSearchCV(model, param_distributions=params, random_state=42, n_iter=100, scoring="neg_mean_absolute_error", cv=3, verbose=1, n_jobs=1, return_train_score=True, error_score='raise')

# Fit model
rsearch.fit(X_train.toarray(), y_train.values)

kevalshah90 changed the title from "Jupyter kernel dies when upgrading xgboost v90 to > v1.0" to "Jupyter kernel dies when upgrading xgboost v90 to > v1.0 <= v1.3" on Dec 13, 2021
@trivialfis
Member

I can't run your script:

Traceback (most recent call last):
  File "/home/fis/Workspace/XGBoost/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'cat1'

@kevalshah90
Author

kevalshah90 commented Dec 14, 2021

@trivialfis

Try this. This works for me.

import numpy as np
import pandas as pd
import seaborn as sns
import scipy
import sklearn
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
import random
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
import xgboost
from xgboost import XGBRegressor, plot_importance
from sklearn.model_selection import train_test_split, GridSearchCV, KFold, RandomizedSearchCV
from sklearn.metrics import mean_squared_error
from scipy.stats import uniform, randint
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, OrdinalEncoder
import warnings

pd.set_option('use_inf_as_na', True)

print(xgboost.__version__)
# 0.90

# read data
# df = pd.read_csv('xxx.csv')

# make up some data
data = np.random.randint(5,50,size=(1000,9))

cats = ['cat1','cat2','cat3','cat4','cat5']

df = pd.DataFrame(data, columns=['col1', 'col2', 'col3', 'col4', 'col5','col6','col7','col8','col9'])

df['cat_var1'] = np.random.choice(cats, 1000)
df['cat_var2'] = np.random.choice(cats, 1000)

df['response'] = np.random.randint(5,50,1000)


# In[24]:


# split df into train and test
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,0:11], df.iloc[:,-1], test_size=0.1, random_state=42)


# In[26]:


# Encode categorical variables
cat_vars = ['cat_var1','cat_var2']
cat_transform = ColumnTransformer([('cat', OneHotEncoder(handle_unknown='ignore'), cat_vars)], remainder='passthrough')

encoder = cat_transform.fit(X_train)
X_train = encoder.transform(X_train)
X_test = encoder.transform(X_test)



# In[27]:


# Define an XGBoost regression model
model = XGBRegressor(objective = 'reg:squarederror')


# In[216]:

params = {
          "colsample_bytree": uniform(0.6, 0.4), # fraction of cols to sample
          "gamma": uniform(0, 0.4), # min loss reduction required for next split
          "learning_rate": uniform(0.01, 0.3), # default 0.1
          "max_depth": randint(2, 6), # default 6, controls model complexity and overfitting
          "n_estimators": randint(100, 150), # default 100
          "subsample": uniform(0.3, 0.6) # % of rows to use in training sample
}



# Hyperparameter tuning
rsearch = RandomizedSearchCV(model, param_distributions=params, random_state=42, n_iter=100, scoring="neg_mean_absolute_error", cv=3, verbose=1, n_jobs=1, return_train_score=True, error_score='raise')

# Fit model
rsearch.fit(X_train, y_train)


# In[218]:


xgbest = rsearch.best_estimator_
xgbest


# In[219]:
y_pred = xgbest.predict(X_test.toarray())


# Compute the mean squared error between y_test and y_pred
mse = mean_squared_error(y_test, y_pred)
print(mse)


# In[221]:


# Mean absolute percentage error
def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# In[222]:


mape = mean_absolute_percentage_error(y_test, y_pred)
"Mean Absolute Percentage Error {:,.1f}%".format(mape)

@trivialfis
Member

Did you reproduce the crash with this script?

@kevalshah90
Author

Yes, I did. I upgraded to v1.1.0.

[screenshot]

@trivialfis
Member

That's weird. I just ran 1.1:

1.1.1
Fitting 3 folds for each of 100 candidates, totalling 300 fits
183.75835586636694

@kevalshah90
Author

Yeah, strange. I haven't been able to fit on any version > v0.90.

@trivialfis
Member

@hcho3 Is there any known macOS issue in older versions, off the top of your head?

@kevalshah90
Author

I am on macOS Monterey v12.0.1, Apple M1 (2020) chip.

@trivialfis
Member

Ah, I think xgboost doesn't work on the M1. #7501

@kevalshah90
Author

kevalshah90 commented Dec 14, 2021

v0.90 works, though. Could you suggest any workarounds? v0.90 is being deprecated by AWS SageMaker on December 31; at that point, our model will no longer be able to serve. All of our training is done locally, so we need to make it work in local Jupyter notebooks.

@kevalshah90
Author

Any insight into what the issue is with versions > 0.90? I wonder what changed between the versions that makes it crash.

@trivialfis
Member

I'm not sure. None of the maintainers has access to an M1.

@Craigacp
Contributor

At a guess I'd say that 0.9 was compiled without x86 SIMD instructions (which aren't supported in Rosetta), and newer versions have been. So you're probably pulling in the x86 version, running it through Rosetta, and then it crashes when it hits an unsupported SIMD instruction (AVX, AVX2 or AVX-512). You may be able to recompile one of the other versions without SIMD instructions, producing a wheel that will run under Rosetta on an M1. I'm not particularly familiar with how the Python builds are compiled, though.
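
As an aside (this check is my addition, not part of the original exchange): a quick way to tell whether the Python interpreter itself is an x86_64 build being translated by Rosetta is to inspect the reported machine architecture and the macOS sysctl.proc_translated flag. A minimal sketch:

import platform
import subprocess

# 'arm64' means a native Apple Silicon interpreter; 'x86_64' means an
# Intel build, which on an M1 runs through Rosetta 2 translation.
print(platform.machine())

# sysctl.proc_translated is 1 when the current process is translated by
# Rosetta and 0 when native; -i makes sysctl ignore a missing key.
out = subprocess.run(["sysctl", "-in", "sysctl.proc_translated"],
                     capture_output=True, text=True)
print(out.stdout.strip() or "key unavailable (not an Apple Silicon Mac)")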

@trivialfis
Member

Thank you for pointing that out. To install the Python package from source, one can simply run

python setup.py install

under the python-package directory. Hopefully we will get support from GitHub Actions and can build the wheel there.

@kevalshah90
Author

kevalshah90 commented Dec 19, 2021

@Craigacp @trivialfis Thanks.

I tried the following:

brew install gcc
brew install cmake

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost/
make -j4
cd python-package
python3 setup.py install

from the discussion here: https://discuss.xgboost.ai/t/xgboost-on-apple-m1/2004/9

It didn't work for me. I am not familiar with the architecture - could you provide some pointers on how I could recompile one of the other versions without SIMD instructions and produce a wheel that will run in Rosetta on an M1?

@Craigacp
Contributor

Did it fail during compilation, or at runtime? What kind of error did you get?

Are you running a native ARM version of Homebrew, or the x86 version via Rosetta?

@kevalshah90
Author

It fails when I run the .fit method. I don't get an exception; the Jupyter kernel dies within a second, every time I attempt to run the statement. Here's a link to the screenshot:

#7504 (comment)

I tried searching for Homebrew in the Mac system report; however, I don't see it in the list under Software > Applications.
Not sure if this helps, but when I run brew -v, here's what's returned:

Homebrew 3.3.8
Homebrew/homebrew-core (git revision e7a27d318aa; last commit 2021-12-14)

How do I find out which Homebrew build I am running?

@Craigacp
Contributor

Assuming you installed it to the default location: on ARM it's in /opt/homebrew, and on x86 it's in /usr/local/ - see https://docs.brew.sh/Installation.

To get a more helpful error message, try running the fit command from a regular Python interpreter rather than a notebook. The notebook server process might also have more information in its log.
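
One way to get more than a bare "Segmentation fault: 11" (a suggestion of mine, not something from this thread) is Python's built-in faulthandler module, which prints the Python-level stack when the process receives a fatal signal:

import faulthandler
faulthandler.enable()  # dump a traceback on SIGSEGV, SIGABRT, SIGBUS, etc.

# ...then run the same imports and rsearch.fit(...) as in the script above.
# If the native library crashes, the handler shows which Python call
# triggered it.

Equivalently, run the reproduction with python -X faulthandler script.py, where script.py stands for whatever file holds the code above.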

@kevalshah90
Author

kevalshah90 commented Dec 20, 2021

@Craigacp

It's in /usr/local/Homebrew, so x86, I suppose.

I ran it in a .py file, and here's what I get:

Fitting 5 folds for each of 50 candidates, totalling 250 fits
Segmentation fault: 11 

kevalshah90 changed the title from "Jupyter kernel dies when upgrading xgboost v90 to > v1.0 <= v1.3" to "Jupyter kernel dies / Segmentation fault : 11, when upgrading xgboost to > v0.90 on MacOS M1 Chip" on Dec 20, 2021
@Craigacp
Contributor

Craigacp commented Dec 22, 2021

I have successfully compiled and installed XGBoost v1.3.3 on an M1 in Python, using an M1-native version of Homebrew and Python 3.9 installed from that Homebrew. It compiled with no trouble, though I did compile the native library first (by doing mkdir build; cd build; cmake ..; make -j4 in the root of the XGBoost repo) before doing python setup.py install from the python-package directory. I did need to install numpy and scipy in the virtual environment first. I had some difficulty making scikit-learn work; it looks like they don't have binaries available for M1, but I didn't try to figure that out.

I built a test regression using some randomly generated data from numpy with a squared loss, and the xgb.train and predict(DMatrix) interfaces worked fine.

@kevalshah90
Author

Thanks, @Craigacp.

I replicated the steps, but it still didn't work for me. Maybe I am missing something. Here's exactly what I did:

Installed the Apple M1 native homebrew version.

==> This script will install:
/opt/homebrew/bin/brew
/opt/homebrew/share/doc/homebrew
/opt/homebrew/share/man/man1/brew.1
/opt/homebrew/share/zsh/site-functions/_brew
/opt/homebrew/etc/bash_completion.d/brew
/opt/homebrew

Installed Python 3.9 using brew:

brew install python@3.9
Python has been installed as
  /opt/homebrew/bin/python3

git clone --recursive https://github.com/dmlc/xgboost

In the xgboost directory:


mkdir build; 
cd build; 
cmake ..; 
make -j4

cd python-package 

python setup.py install

Now, in my notebook:


!pip install xgboost==1.3.3
import xgboost

# Train test split

# Encoding

# Hyper-parameter optimization

# Fit model
model.fit(X_train.toarray(), y_train.values)

Kernel dead.

@Craigacp
Contributor

Craigacp commented Dec 22, 2021

If you pip install xgboost, it will pull in the public wheel, which doesn't have the right binary. This suggests that you aren't running Jupyter from the ARM64-native Python virtual environment you installed the native ARM64 xgboost into. Try using that venv to run Jupyter.

Unfortunately, the Python ML ecosystem is only starting to catch up to the idea that there are different CPU architectures. TensorFlow sort of works, but PyTorch doesn't (or at least it didn't the last time I checked), and scikit-learn doesn't have binaries available. Jupyter seems to work OK, but I've not tried matplotlib, seaborn or pandas. This is an ecosystem-wide problem, much of which is predicated on the facts that most useful Python ML libraries are wrappers around native code, that there wasn't a Fortran compiler available for macOS M1 to compile scipy with (that's now been fixed), and that GitHub Actions doesn't provide M1 build resources. At the moment the M1 Macs are not suitable for data science or ML work unless you understand the differences in CPU architecture and how to build your environment from source. That will get fixed over time, but it's not as easy as it is on x86 Macs yet.
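
To double-check which interpreter and which xgboost build the notebook is actually picking up (a minimal sketch, my addition), a cell like the following prints the relevant paths; the executable should live inside the ARM64 venv, not the Anaconda installation:

import sys
import platform

import xgboost

print(sys.executable)      # should point into the ARM64 venv
print(platform.machine())  # 'arm64' for a native Apple Silicon interpreter
print(xgboost.__file__)    # should point at the source-built package
print(xgboost.__version__)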

@kevalshah90
Author

kevalshah90 commented Dec 23, 2021

Thanks, that makes sense. I created a virtual environment and launched Jupyter from there. Now, when I run the notebook, import xgboost as xgb works fine. However, running:

model = xgb.XGBRegressor(objective = 'reg:squarederror')

throws an error:

----> 2 model = xgb.XGBRegressor(objective = 'reg:squarederror')

AttributeError: module 'xgboost' has no attribute 'XGBRegressor'

@Craigacp
Contributor

XGBRegressor and XGBClassifier are the scikit-learn interfaces, which require scikit-learn to be installed in the environment. You can use the train interface and supply a DMatrix & parameters without scikit-learn.
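
For illustration, a minimal sketch of that train/DMatrix path on synthetic data (my own example mirroring the thread's mock setup, not code from the discussion):

import numpy as np
import xgboost as xgb

# Synthetic regression data; no scikit-learn involved.
rng = np.random.default_rng(42)
X = rng.random((1000, 11))
y = rng.random(1000)

dtrain = xgb.DMatrix(X[:900], label=y[:900])
dtest = xgb.DMatrix(X[900:], label=y[900:])

params = {"objective": "reg:squarederror", "max_depth": 4, "eta": 0.1}
booster = xgb.train(params, dtrain, num_boost_round=100)
print(booster.predict(dtest)[:5])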

@kevalshah90
Author

kevalshah90 commented Dec 23, 2021

Got it. I am using a bunch of other interfaces from scikit-learn, so it would be worthwhile to install it.

It looks like scikit-learn is already installed in my virtual environment venv:

source venv/bin/activate
(venv) (base) user:folder user$ pip install scikit-learn
Requirement already satisfied: scikit-learn in /Applications/Anaconda/anaconda3/lib/python3.9/site-packages (1.0)
Requirement already satisfied: numpy>=1.14.6 in /Applications/Anaconda/anaconda3/lib/python3.9/site-packages (from scikit-learn) (1.20.3)
Requirement already satisfied: joblib>=0.11 in /Applications/Anaconda/anaconda3/lib/python3.9/site-packages (from scikit-learn) (1.1.0)
Requirement already satisfied: scipy>=1.1.0 in /Applications/Anaconda/anaconda3/lib/python3.9/site-packages (from scikit-learn) (1.7.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /Applications/Anaconda/anaconda3/lib/python3.9/site-packages (from scikit-learn) (2.2.0)

@Craigacp
Contributor

I don't have any experience with conda, and it looks like scikit-learn still has some difficulties installing on M1: scikit-learn/scikit-learn#19137.

@trivialfis
Member

Thank you for the discussion. I will close this one as it duplicates #6408. Maybe we can build the wheel using https://aws.amazon.com/about-aws/whats-new/2021/12/amazon-ec2-m1-mac-instances-macos/ @hcho3. Let's move the conversation to #6408. ;-)
