
No local packages or working download links found for tokenizers==0.12.1.dev0 #1036

Closed
KMFODA opened this issue Jul 29, 2022 · 8 comments

KMFODA commented Jul 29, 2022

I get the following error when installing tokenizers from source (I'm on a MacBook M1, so I believe I can't install via pip):

Copying rust artifact from /Users/karimfoda/Documents/STUDIES/PYTHON/SHORTFORM/tokenizers/bindings/python/target/release/libtokenizers.dylib to build/lib.macosx-11.1-arm64-cpython-310/tokenizers/tokenizers.cpython-310-darwin.so
creating build/bdist.macosx-11.1-arm64/egg
creating build/bdist.macosx-11.1-arm64/egg/tokenizers
creating build/bdist.macosx-11.1-arm64/egg/tokenizers/normalizers
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/normalizers/__init__.pyi -> build/bdist.macosx-11.1-arm64/egg/tokenizers/normalizers
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/normalizers/__init__.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/normalizers
creating build/bdist.macosx-11.1-arm64/egg/tokenizers/tools
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/tools/visualizer-styles.css -> build/bdist.macosx-11.1-arm64/egg/tokenizers/tools
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/tools/__init__.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/tools
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/tools/visualizer.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/tools
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/__init__.pyi -> build/bdist.macosx-11.1-arm64/egg/tokenizers
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/__init__.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers
creating build/bdist.macosx-11.1-arm64/egg/tokenizers/models
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/models/__init__.pyi -> build/bdist.macosx-11.1-arm64/egg/tokenizers/models
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/models/__init__.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/models
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/tokenizers.cpython-310-darwin.so -> build/bdist.macosx-11.1-arm64/egg/tokenizers
creating build/bdist.macosx-11.1-arm64/egg/tokenizers/trainers
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/trainers/__init__.pyi -> build/bdist.macosx-11.1-arm64/egg/tokenizers/trainers
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/trainers/__init__.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/trainers
creating build/bdist.macosx-11.1-arm64/egg/tokenizers/processors
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/processors/__init__.pyi -> build/bdist.macosx-11.1-arm64/egg/tokenizers/processors
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/processors/__init__.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/processors
creating build/bdist.macosx-11.1-arm64/egg/tokenizers/decoders
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/decoders/__init__.pyi -> build/bdist.macosx-11.1-arm64/egg/tokenizers/decoders
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/decoders/__init__.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/decoders
creating build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/implementations/byte_level_bpe.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/implementations/sentencepiece_unigram.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/implementations/sentencepiece_bpe.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/implementations/base_tokenizer.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/implementations/__init__.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/implementations/char_level_bpe.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/implementations/bert_wordpiece.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations
creating build/bdist.macosx-11.1-arm64/egg/tokenizers/pre_tokenizers
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/pre_tokenizers/__init__.pyi -> build/bdist.macosx-11.1-arm64/egg/tokenizers/pre_tokenizers
copying build/lib.macosx-11.1-arm64-cpython-310/tokenizers/pre_tokenizers/__init__.py -> build/bdist.macosx-11.1-arm64/egg/tokenizers/pre_tokenizers
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/normalizers/__init__.py to __init__.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/tools/__init__.py to __init__.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/tools/visualizer.py to visualizer.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/__init__.py to __init__.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/models/__init__.py to __init__.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/trainers/__init__.py to __init__.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/processors/__init__.py to __init__.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/decoders/__init__.py to __init__.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations/byte_level_bpe.py to byte_level_bpe.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations/sentencepiece_unigram.py to sentencepiece_unigram.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations/sentencepiece_bpe.py to sentencepiece_bpe.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations/base_tokenizer.py to base_tokenizer.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations/__init__.py to __init__.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations/char_level_bpe.py to char_level_bpe.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/implementations/bert_wordpiece.py to bert_wordpiece.cpython-310.pyc
byte-compiling build/bdist.macosx-11.1-arm64/egg/tokenizers/pre_tokenizers/__init__.py to __init__.cpython-310.pyc
creating build/bdist.macosx-11.1-arm64/egg/EGG-INFO
copying py_src/tokenizers.egg-info/PKG-INFO -> build/bdist.macosx-11.1-arm64/egg/EGG-INFO
copying py_src/tokenizers.egg-info/SOURCES.txt -> build/bdist.macosx-11.1-arm64/egg/EGG-INFO
copying py_src/tokenizers.egg-info/dependency_links.txt -> build/bdist.macosx-11.1-arm64/egg/EGG-INFO
copying py_src/tokenizers.egg-info/not-zip-safe -> build/bdist.macosx-11.1-arm64/egg/EGG-INFO
copying py_src/tokenizers.egg-info/requires.txt -> build/bdist.macosx-11.1-arm64/egg/EGG-INFO
copying py_src/tokenizers.egg-info/top_level.txt -> build/bdist.macosx-11.1-arm64/egg/EGG-INFO
writing build/bdist.macosx-11.1-arm64/egg/EGG-INFO/native_libs.txt
creating 'dist/tokenizers-0.12.1.dev0-py3.10-macosx-11.1-arm64.egg' and adding 'build/bdist.macosx-11.1-arm64/egg' to it
removing 'build/bdist.macosx-11.1-arm64/egg' (and everything under it)
Processing tokenizers-0.12.1.dev0-py3.10-macosx-11.1-arm64.egg
removing '/opt/homebrew/Caskroom/miniforge/base/envs/shortform/lib/python3.10/site-packages/tokenizers-0.12.1.dev0-py3.10-macosx-11.1-arm64.egg' (and everything under it)
creating /opt/homebrew/Caskroom/miniforge/base/envs/shortform/lib/python3.10/site-packages/tokenizers-0.12.1.dev0-py3.10-macosx-11.1-arm64.egg
Extracting tokenizers-0.12.1.dev0-py3.10-macosx-11.1-arm64.egg to /opt/homebrew/Caskroom/miniforge/base/envs/shortform/lib/python3.10/site-packages
tokenizers 0.12.1.dev0 is already the active version in easy-install.pth

Installed /opt/homebrew/Caskroom/miniforge/base/envs/shortform/lib/python3.10/site-packages/tokenizers-0.12.1.dev0-py3.10-macosx-11.1-arm64.egg
Processing dependencies for tokenizers==0.12.1.dev0
Searching for tokenizers==0.12.1.dev0
Reading https://pypi.org/simple/tokenizers/
/opt/homebrew/Caskroom/miniforge/base/envs/shortform/lib/python3.10/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning:  is an invalid version and will not be supported in a future release
  warnings.warn(
No local packages or working download links found for tokenizers==0.12.1.dev0
error: Could not find suitable distribution for Requirement.parse('tokenizers==0.12.1.dev0')

Python version: 3.10

Narsil (Collaborator) commented Jul 29, 2022

@McPatate is working on enabling prebuilt packages.

But currently they don't exist afaik.

You can build the python package using

cd bindings/python
pip install setuptools_rust
pip install -e .

(You'll also need the Rust compiler installed: https://www.rust-lang.org/learn/get-started.) With that, it should work.
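Before running the build, a quick pre-flight check can confirm the Rust toolchain is actually visible to your shell. This is just a sketch; it assumes rustup put `cargo` on your PATH, which is its default behavior:

```python
import shutil
import subprocess

def toolchain_version(tool="cargo"):
    """Return the `--version` output of `tool` if it is on PATH, else None."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None

# Prints something like "cargo 1.62.0 (...)" if Rust is installed, else None
print(toolchain_version("cargo"))
```

If this prints None, the `pip install -e .` build step will fail when setuptools_rust tries to invoke cargo.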

McPatate (Member)

Should get something running in the coming weeks :)

KMFODA (Author) commented Jul 29, 2022

Ah, I see, thanks for the heads up! When I try building the Python package using the commands you supplied, I once again hit the Mac M1 error:

RuntimeError: Failed to import transformers.models.auto because of the following error (look up to see its traceback):
dlopen(/Users/karimfoda/Documents/STUDIES/PYTHON/SHORTFORM/tokenizers/bindings/python/py_src/tokenizers/tokenizers.cpython-310-darwin.so, 0x0002): tried: '/Users/karimfoda/Documents/STUDIES/PYTHON/SHORTFORM/tokenizers/bindings/python/py_src/tokenizers/tokenizers.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
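For reference, the mismatch dlopen is complaining about ("have 'x86_64', need 'arm64e'") can be expressed in a few lines. This is a rough sketch, not dyld's real logic; the architecture names and groupings are my assumption:

```python
import platform

# Architecture families that can load each other's native binaries (a
# simplification: arm64/arm64e are treated as one family, x86_64 as another).
ARM = {"arm64", "arm64e", "aarch64"}
X86 = {"x86_64", "amd64"}

def loads_natively(binary_arch, machine=None):
    """Rough check: can a binary built for `binary_arch` load in-process
    on `machine` (defaults to this interpreter's architecture)?"""
    machine = (machine or platform.machine()).lower()
    binary_arch = binary_arch.lower()
    if binary_arch in ARM:
        return machine in ARM
    if binary_arch in X86:
        return machine in X86
    return binary_arch == machine

# The dlopen error above: an x86_64 .so on an arm64 interpreter
print(loads_natively("x86_64", machine="arm64"))  # False
```

The fix is always the same: the `.so` must be rebuilt (or reinstalled) for the architecture the interpreter is running under.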

From #712 I gathered that building the package locally should prevent this issue. Do you by any chance know why I might be seeing this error?

Narsil (Collaborator) commented Aug 1, 2022

Sorry I am not an expert in Mac ARM since I don't own one of those beasts.

I do know some people were able to build it though.

Letting better-informed people chime in.

McPatate (Member) commented Aug 8, 2022

@KMFODA you are using an .so that was built for the x86_64 arch.

I had to do some ninja tricks to get it to work:

python3 -m pip install setuptools_rust
git clone git@github.com:huggingface/tokenizers.git
cd tokenizers/bindings/python
python3 setup.py install
python3 -m pip install transformers
rm -rf /path/to/venv/lib/python3.x/site-packages/tokenizers
cp -R /path/to/venv/lib/python3.x/site-packages/tokenizers-x.x.x-py3.x-macosx-11-arm64.egg/tokenizers /path/to/venv/lib/python3.x/site-packages/tokenizers
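To confirm the final copy step took effect, one stdlib-only check is to ask Python where it would import `tokenizers` from (the expected path is my assumption about how the steps above play out, not something stated in the thread):

```python
import importlib.util

def module_origin(name):
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# After the cp -R step, this should point inside
# /path/to/venv/lib/python3.x/site-packages/tokenizers/
print(module_origin("tokenizers"))
```

If it prints None, the copy went to the wrong place; if it prints a path inside the old `.egg` directory, the `rm -rf` step was skipped.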

KMFODA (Author) commented Oct 3, 2022

Thanks @McPatate, really appreciate the wizardry. Sorry for only trying this now. I just tried following all the steps, but I get blocked on the last one: there is no folder matching the pattern tokenizers-x.x.x-py3.x-macosx-11-arm64.egg. I just have tokenizers and tokenizers-0.12.1.dist-info in my /path/to/venv/lib/python3.x/site-packages/ folder.

McPatate (Member) commented Oct 5, 2022

Would using 0.13.0 work for you? We now have pre-built macOS ARM binaries, cf. #712 (comment)

KMFODA (Author) commented Oct 6, 2022

Amazing, 0.13.0 worked like a charm! Thanks for all the hard work on this. Now I just have to get other libraries to update their tokenizers < 0.13 dependency and I'm all set on my M1 MacBook. Thanks again! I'll close this now as I believe the main issue is resolved.
