preprocess_dataset dataset.map crashed with TypeError: cannot pickle 'builtins.CoreBPE' object #328
Labels
solved
This problem has been already solved
Comments
use |
This problem is related to the tiktoken tokenizer; it looks like you are using the Qwen-7B model. Related issues: huggingface/datasets#5536, huggingface/datasets#5769
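To illustrate why a tokenizer that wraps a native object can break a multi-process dataset.map, here is a minimal sketch. It does not use tiktoken or datasets: FakeTokenizer, can_pickle, and choose_num_proc are hypothetical names invented for this example, and a threading.Lock stands in for the unpicklable CoreBPE object. The assumption is that the crash comes from pickling the tokenizer to send it to worker processes.

```python
import pickle
import threading


class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer backed by a native object."""

    def __init__(self):
        # A lock cannot be pickled, much like the CoreBPE object in the error.
        self._core = threading.Lock()

    def encode(self, text):
        # Toy encoding: one integer per character.
        return [ord(ch) for ch in text]


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def choose_num_proc(tokenizer, wanted):
    """Fall back to single-process mapping when the tokenizer cannot be pickled.

    None is the default value of num_proc in Dataset.map, meaning no
    multiprocessing (assumed here to be the safe fallback).
    """
    return wanted if can_pickle(tokenizer) else None


tokenizer = FakeTokenizer()
num_proc = choose_num_proc(tokenizer, wanted=8)
print("picklable:", can_pickle(tokenizer), "num_proc:", num_proc)
```

With a real tokenizer, the same probe could be run before calling dataset.map to decide whether multiple workers are safe; whether that avoids the crash in this project is an assumption, not something confirmed in this thread.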
@hiyouga Thanks.
hiyouga added the solved label (This problem has been already solved) and removed the pending label (This problem is yet to be addressed) on Aug 3, 2023
Closed
Use the GPT2Tokenizer-based alternative, which supports multithreading: https://huggingface.co/vonjack/Qwen-LLaMAfied-HFTok-7B-Chat/tree/main
While preprocess_dataset was mapping the dataset, it crashed. Could you please give some advice?