wrong architecture tokenizers.cpython-39-darwin.so (x86_64) when installing on apple silicon (arm64) #712

Closed
hkennyv opened this issue May 24, 2021 · 20 comments

@hkennyv

hkennyv commented May 24, 2021

Hey there,

I just wanted to share an issue I came across when trying to get the transformers quick tour example working on my machine.

It seems that installing tokenizers from PyPI currently builds or bundles tokenizers.cpython-39-darwin.so for x86_64 instead of arm64 on Apple Silicon (M1) machines.

System Info: MacBook Air M1 2020 with macOS 11.0.1

To reproduce:

  1. create a virtualenv (virtualenv venv-bad) and activate it (source venv-bad/bin/activate)

  2. install pytorch (the easiest way I've found so far on arm64 is to install the nightly build via pip): pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

  3. install transformers (tokenizers will be installed as a dependency): pip install transformers

  4. create a file with quick tour example:

main.py

from transformers import pipeline
classifier = pipeline('sentiment-analysis')

classifier('We are very happy to show you the 🤗 Transformers library.')
  5. try running the quick tour example

Results in error:

ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found.  Did find:
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
Full stacktrace
(venv-bad) khuynh@kmba:test ‹main*›$ python main.py
Traceback (most recent call last):
  File "/Users/khuynh/me/test/temp.py", line 5, in <module>
    from transformers import pipeline
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2709, in __getattr__
    return super().__getattr__(name)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/file_utils.py", line 1821, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2703, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/homebrew/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 25, in <module>
    from ..models.auto.configuration_auto import AutoConfig
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/__init__.py", line 19, in <module>
    from . import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/__init__.py", line 23, in <module>
    from .tokenization_layoutlm import LayoutLMTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/tokenization_layoutlm.py", line 19, in <module>
    from ..bert.tokenization_bert import BertTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert.py", line 23, in <module>
    from ...tokenization_utils import PreTrainedTokenizer, _is_control, _is_punctuation, _is_whitespace
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 26, in <module>
    from .tokenization_utils_base import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 69, in <module>
    from tokenizers import AddedToken
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/__init__.py", line 79, in <module>
    from .tokenizers import (
ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found.  Did find:
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture

Looking at the architecture of the shared lib using file, we can see it's a dynamically linked x86_64 library:

(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
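
For anyone who wants to run the same check from Python instead of the file command, here is a minimal sketch that reads the first bytes of a Mach-O file and names its architecture. The helper names (macho_arch, file_arch) are made up for illustration; the magic and CPU type values are the standard ones from the Mach-O headers, and the sketch only distinguishes thin 64-bit x86_64/arm64 binaries from universal ones.

```python
import struct

# Mach-O constants from <mach-o/loader.h> and <mach/machine.h>.
MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O (little-endian on disk for these CPUs)
FAT_MAGIC = 0xCAFEBABE    # universal ("fat") binary, header stored big-endian
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def macho_arch(header: bytes) -> str:
    """Name the architecture given the first 8 bytes of a Mach-O file."""
    if len(header) < 8:
        return "unknown"
    magic, cputype = struct.unpack("<II", header[:8])
    if magic == MH_MAGIC_64:
        return CPU_TYPES.get(cputype, "unknown")
    if struct.unpack(">I", header[:4])[0] == FAT_MAGIC:
        return "universal"
    return "unknown"


def file_arch(path: str) -> str:
    """Hypothetical convenience wrapper: read the header from a file on disk."""
    with open(path, "rb") as fh:
        return macho_arch(fh.read(8))
```

Comparing file_arch(...) on the .so with platform.machine() from the same interpreter should show the mismatch that dlopen complains about.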

Solution:

The solution I found requires installing the rust toolchain on your machine and installing the tokenizers module from source so I think this is best as a temporary solution. I already have the rust nightly toolchain installed on my machine, so that's what I used. Otherwise, instructions for installing are here.

  1. clone tokenizers: git clone git@github.com:huggingface/tokenizers.git
  2. cd tokenizers/bindings/python
  3. install tokenizers: python setup.py install
  4. now go back and re-run the transformers quick tour, which should now succeed

We can also now see that the shared library is the proper architecture using file:

(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library arm64

I'm not super well versed in setuptools, so I'm not sure of the best way to fix this. Maybe release a separate pre-built tokenizers.cpython-39-darwin.so for arm64 users? I'd be happy to help if needed.

@hkennyv changed the title from "wrong architecture tokenizers.cpython-39-darwin.so when installing on apple silicon" to "wrong architecture tokenizers.cpython-39-darwin.so (x86_64) when installing on apple silicon (arm64)" on May 24, 2021
@n1t0
Member

n1t0 commented May 24, 2021

Hi @hkennyv and thank you for reporting this.

We don't build wheels for Apple Silicon at the moment because there is no environment for this on our GitHub CI (cf. actions/runner-images#2187). The only way to get it working is, as you mentioned, to build it yourself. We'll add support for this as soon as it is available!

@hkennyv
Author

hkennyv commented May 25, 2021

@n1t0 thanks for the response & explanation! i've +1'd the issue you linked to (hopefully) help :)

@McPatate
Member

McPatate commented Mar 1, 2022

Hi there !

I've manually built binaries for tokenizers on ARM M1 and released them for tokenizers 0.11.6.

We'll try our best to keep building those by hand while waiting for actions/runner#805.

Expect some delay between normal releases and m1 releases for now :)

Have a great day !

@etan18

etan18 commented Jun 30, 2022

I followed the manual build instructions from the solution of the original comment, but am getting the error

RuntimeError: Failed to import transformers.models.camembert.configuration_camembert because of the following error (look up to see its traceback): partially initialized module 'tokenizers.pre_tokenizers' has no attribute 'PreTokenizer' (most likely due to a circular import)

I am trying to run AutoModelForTokenClassification

@vi3itor

vi3itor commented Jul 1, 2022

Hi @McPatate, thanks for building the bindings manually! Two months after your post, there was an announcement about a pre-release version of the macOS-ARM64 runner. Will it make things easier?

@n1t0 you can also track the recent roadmap issue github/roadmap#528.

@WALEX2000

WALEX2000 commented Sep 14, 2022

I'm having the same issue. After running:
pip install tokenizers
My machine builds the wheel, but for some reason it's always x86_64 architecture
I'm installing the latest version, should I try an earlier one?

Full output:
Collecting tokenizers
  Downloading tokenizers-0.12.1.tar.gz (220 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 220.7/220.7 kB 1.4 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (pyproject.toml) ... done
  Created wheel for tokenizers: filename=tokenizers-0.12.1-cp310-cp310-macosx_12_0_arm64.whl size=3760213 sha256=885cf11eb9f1fbd1a6be3366f2d5d8a7591890b96ed84a3121cc6bcd66be938a
  Stored in directory: /private/var/folders/k_/szxh8w4n0hl32b_j8dkxl76h0000gn/T/pip-ephem-wheel-cache-8p1jggwq/wheels/bd/22/bc/fa8337ce1ccf384c8fc4c1dbfa9cb1687934c0f24719082d49
Successfully built tokenizers
Installing collected packages: tokenizers
Successfully installed tokenizers-0.12.1
(ldm) alexandrecarqueja@MacBook-Pro stable-diffusion % file /opt/miniconda3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so
/opt/miniconda3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
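
Note that the wheel in this log is tagged macosx_12_0_arm64 even though file reports an x86_64 library, so the tag by itself does not prove what the compiled extension contains. Presumably the tag reflects the Python side of the build while the Rust toolchain picked its own target. A small sketch for pulling the architecture out of a wheel filename so it can be compared against the file output (wheel_arch is a hypothetical helper, and it assumes the standard name-version-python-abi-platform.whl layout):

```python
def wheel_platform(filename: str) -> str:
    """Return the platform tag, i.e. the last dash-separated field of a wheel name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    return parts[-1]


def wheel_arch(filename: str) -> str:
    """Map a macOS platform tag to the architecture it claims."""
    platform_tag = wheel_platform(filename)
    for arch in ("arm64", "x86_64", "universal2"):
        if platform_tag.endswith("_" + arch):
            return arch
    return "unknown"


# The filename from the pip output in this comment:
print(wheel_arch("tokenizers-0.12.1-cp310-cp310-macosx_12_0_arm64.whl"))
```

If the claimed architecture and the one reported by file disagree, the Rust toolchain is the first thing to look at.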

@McPatate
Member

@WALEX2000 I'm not sure we have arm binaries for 0.12.1, we've been working on the CI with self-hosted runners but I'm unsure where we're at atm.

Maybe @Narsil can chime in :)

@thetonus

I have followed the instructions to build from source, and the compiled library is still x86_64.

I cloned the repo, made sure the Python environment is configured for shared library, and ran python setup.py install.

tokenizers was installed in the virtual environment.

I ran the following command to check the compiled library:

file .venv/lib/python3.10/site-packages/tokenizers-0.13.0.dev0-py3.10-macosx-12.2-arm64.egg/tokenizers/tokenizers.cpython-310-darwin.so

Output:

.venv/lib/python3.10/site-packages/tokenizers-0.13.0.dev0-py3.10-macosx-12.2-arm64.egg/tokenizers/tokenizers.cpython-310-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64

I do not understand why it is not compiling for the correct target.

I am on a M1 Macbook pro. Python version is 3.10.7. Cargo version is 1.63.0.

@spullara

spullara commented Sep 21, 2022

> I am on a M1 Macbook pro. Python version is 3.10.7. Cargo version is 1.63.0.

I ran into this as well. It turned out that I was using the Homebrew-installed Rust rather than the rustup one. Try running which rustc to make sure it is coming from the ~/.cargo directory.
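
The same check can be scripted. This is only a sketch: is_rustup_managed is a made-up helper, and it assumes rustup's default install location of ~/.cargo/bin (a custom CARGO_HOME would move it).

```python
import shutil
from pathlib import Path
from typing import Optional


def is_rustup_managed(rustc_path: Optional[str], home: Path) -> bool:
    """True if rustc lives in <home>/.cargo/bin, rustup's default location."""
    if rustc_path is None:
        return False  # rustc is not on PATH at all
    return Path(rustc_path).parent == home / ".cargo" / "bin"


if __name__ == "__main__":
    found = shutil.which("rustc")  # same lookup that `which rustc` performs
    print(found, is_rustup_managed(found, Path.home()))
```

A path under /opt/homebrew/bin would point at the Homebrew install instead.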

@thetonus

@spullara I did. It was the rustup one and not Brew.

@spullara

spullara commented Sep 23, 2022

It may also be defaulting to the wrong toolchain. You might also try setting the default toolchain with

rustup default stable-aarch64-apple-darwin

I think I also had to delete rust-toolchain as when it was present it would change to the x86_64 toolchain. You can check to make sure the right one is selected with

rustup toolchain list

Edit: I was able to fix the rust-toolchain issue by doing

rustup set default-host aarch64-apple-darwin

@thetonus

@spullara I ran rustup toolchain list and the output is as follows:

stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin

@Narsil
Collaborator

Narsil commented Sep 27, 2022

tokenizers==0.13.0 should now be built automatically for M1.

The errors you are seeing are very odd indeed. Are you running in some sort of compatibility mode?
I asked other M1 users and no one had the issue you are seeing :(

Could you check that the Rust install is OK, for instance by running cargo test within the tokenizers/tokenizers/ directory?
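
One way to test the compatibility-mode theory, assuming it means Rosetta 2: macOS exposes a sysctl key, sysctl.proc_translated, that reads 1 for a translated process and 0 for a native one, and is absent on Intel Macs. A rough sketch follows (the function names are illustrative, and it reads the value through a child sysctl process, which should normally run in the same mode as the Python that launched it):

```python
import subprocess
from typing import Optional


def interpret_proc_translated(output: Optional[str]) -> str:
    """Turn the raw sysctl.proc_translated value into a readable label."""
    if output is None:
        return "unknown"
    value = output.strip()
    if value == "1":
        return "translated (Rosetta 2)"
    if value == "0":
        return "native"
    return "unknown"


def read_proc_translated() -> Optional[str]:
    """Run `sysctl -n sysctl.proc_translated`; None if unavailable."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None  # sysctl binary not found (e.g. not on macOS)
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    print(interpret_proc_translated(read_proc_translated()))
```

A quicker hint is platform.machine(): an x86_64 interpreter on an M1 reports x86_64 rather than arm64.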

@spullara

> @spullara I ran rustup toolchain list and the output is as follows:
>
> stable-aarch64-apple-darwin (default)
> stable-x86_64-apple-darwin

Did you run this in the tokenizers/bindings/python directory?

@thetonus

thetonus commented Oct 4, 2022

I get this when I run it in tokenizers/bindings/python:

stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin (override)

@spullara

spullara commented Oct 5, 2022

> I get this when I run it in tokenizers/bindings/python:
>
> stable-aarch64-apple-darwin (default)
> stable-x86_64-apple-darwin (override)

That means you need to run the command I used to change the default host:

rustup set default-host aarch64-apple-darwin
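
To make the (default) versus (override) distinction concrete: a directory override takes precedence over the default toolchain, so inside tokenizers/bindings/python the x86_64 toolchain was the one doing the build. Below is a small sketch that picks the effective toolchain out of rustup toolchain list output in the format shown in this thread. The active_toolchain helper is hypothetical, and other rustup versions may label entries differently.

```python
from typing import Optional


def active_toolchain(listing: str) -> Optional[str]:
    """Return the toolchain in effect: an override wins, else the default."""
    default = None
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "<toolchain-name> (<flag>)" or just the name.
        name, _, flags = line.partition(" ")
        if "override" in flags:
            return name
        if "default" in flags:
            default = name
    return default


listing = """stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin (override)"""
print(active_toolchain(listing))
```

With the listing above this selects the x86_64 toolchain, which matches the x86_64 library that file reported.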

@thetonus

thetonus commented Oct 5, 2022

Thanks.

@bolducp

bolducp commented Jul 21, 2023

@hkennyv thank you so much for this! It's July 2023, and following your instructions for the tokenizers (and the same thing for safetensors) was the only way I could get the huggingface dependencies I needed all running.

Does anyone know if there's a better way yet that I couldn't find?

@Narsil
Collaborator

Narsil commented Jul 25, 2023

You're running a Python version that is too old (or too new).
That's the only reason for needing to build from source; everything else should be prebuilt.

@Narsil closed this as completed on Jul 25, 2023
@ppiyush28

Hi @hkennyv

Is there a permanent solution for installing tokenizers on a Mac M1 Pro?
