
Can you provide an up to date version of tokenizers please? #504

Closed
ghost opened this issue Jun 9, 2021 · 10 comments

@ghost

ghost commented Jun 9, 2021

I need an up-to-date version of this module for my project, but only 0.7.0 is available in the package repository. I also want to thank you for building Chaquopy; it's really amazing and useful!

@mhsmith
Member

mhsmith commented Jun 9, 2021

Why specifically do you need a newer version?

@ghost
Author

ghost commented Jun 9, 2021

@mhsmith Because the latest versions of transformers require a newer version of tokenizers.

@mhsmith
Member

mhsmith commented Jun 9, 2021

OK, we won't be able to provide an update right now, but please subscribe to this issue for updates. And if anyone else needs the same thing, please click the thumbs up button above.

Meanwhile, perhaps you can use an older version, "transformers==2.11.0".

@ghost
Author

ghost commented Jun 10, 2021

I did try an older version, but I get errors, and those errors are fixed in the newest versions of transformers. Anyway, I'll look for an alternative in the meantime and wait until you update the package. Thanks for the help, Malcolm!

@mhsmith
Member

mhsmith commented Jun 10, 2021

Thanks for letting me know. Could you post the errors here, with some example code which causes them?

@ghost
Author

ghost commented Jun 10, 2021

@mhsmith
When I use transformers==2.11.0 and tokenizers==0.7.0 this error appears:

W/python.stderr(4135): /data/user/0/com.example.worklow/files/chaquopy/AssetFinder/requirements/joblib/_multiprocessing_helpers.py:53: UserWarning: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.. joblib will operate in serial mode

And when I use the latest version of transformers along with tokenizers==0.7.0, this error appears:
tokenizers>=0.10.1,<0.11 is required for a normal functioning of this module, but found tokenizers==0.7.0.
Try: pip install transformers -U or pip install -e '.[dev]' if you're working with git master
(I already tried installing the develop branch; the same error still appears.)
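For context, the version gate that produces this message can be sketched as a simple range check on the installed version string. This is a hypothetical simplification, not transformers' actual code (which uses a proper version parser), but it shows why the check is half-open on [0.10.1, 0.11):

```python
def tokenizers_compatible(installed: str,
                          minimum: str = "0.10.1",
                          maximum: str = "0.11") -> bool:
    """Return True if `installed` lies in the half-open range [minimum, maximum).

    Hypothetical sketch of the dependency check transformers performs at
    import time; real version specifiers also handle pre-releases, etc.
    """
    def parse(v: str):
        # "0.10.3" -> (0, 10, 3); tuples compare element by element
        return tuple(int(part) for part in v.split("."))
    return parse(minimum) <= parse(installed) < parse(maximum)

print(tokenizers_compatible("0.7.0"))   # False: too old, triggers the error above
print(tokenizers_compatible("0.10.3"))  # True: inside the required range
```

This also suggests why installing the develop branch of transformers doesn't help: the check is against the installed tokenizers package, not against transformers itself.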

@ghost
Author

ghost commented Jun 10, 2021

@mhsmith
I already tried to force joblib to use threading instead of multiprocessing by modifying its code, but it still doesn't work (although, unlike the unmodified joblib, it doesn't throw an error).

I also tried this modified version, which is supposed to support joblib on platforms without sem support: https://github.com/jrgriffiniii/joblib/tree/issues-825-jrgriffiniii-no-sem-support

But that doesn't work either.

So my conclusion is that the easiest and fastest way is to use an up-to-date version of tokenizers.
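For anyone else attempting the threading workaround mentioned above: joblib ships a `parallel_backend` context manager that selects the thread-based backend without modifying joblib's code. A minimal sketch (assuming joblib is installed; whether libraries that call joblib internally honour this override is a separate question):

```python
from joblib import Parallel, delayed, parallel_backend

def square(x):
    return x * x

# The "threading" backend uses threads instead of processes, so it does
# not need the sem_open-based primitives that are missing on Android.
with parallel_backend("threading", n_jobs=2):
    results = Parallel()(delayed(square)(i) for i in range(5))

print(results)  # [0, 1, 4, 9, 16]
```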

@mhsmith
Member

mhsmith commented Jun 11, 2021

UserWarning: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.. joblib will operate in serial mode

That's only a warning, not an error. It will be shown by anything that uses joblib, so it probably wouldn't be fixed by the current version of transformers or tokenizers. But it shouldn't stop your code from working. If you had any other problems then please explain exactly what happened.
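To illustrate the distinction: a `UserWarning` is reported on stderr but does not interrupt execution, which you can verify with the standard `warnings` module. This is a self-contained sketch that mimics joblib's fallback message, not joblib's actual code:

```python
import warnings

def compute_with_warning():
    # Mimic joblib's behaviour on platforms without sem_open: emit a
    # warning, then fall back to serial computation and carry on.
    warnings.warn("falling back to serial mode", UserWarning)
    return sum(i * i for i in range(5))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = compute_with_warning()

print(result)       # 30 -- the computation still completed
print(len(caught))  # 1  -- exactly one warning was recorded
```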

@ghost
Author

ghost commented Jun 11, 2021

Okay, I think I know why it isn't working. Also, thanks for explaining the joblib thing; I'm not very familiar with it, and I thought it was an actual error. I'll leave the issue open in case anyone else needs an up-to-date tokenizers module for their project. I really appreciate that you took the time to help me!

@mhsmith
Member

mhsmith commented Jan 17, 2022

Tokenizers version 0.10.3 is now available. This is compatible with the current version of Transformers (4.15.0).
