ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported. #22222

Closed
4 tasks
candowu opened this issue Mar 17, 2023 · 50 comments

Comments

@candowu

candowu commented Mar 17, 2023

System Info

4.27.1

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I tested LLaMA in Colab. Here is my code and output:

!pip install git+https://github.com/huggingface/transformers
!pip install sentencepiece

import torch
from transformers import pipeline,LlamaTokenizer,LlamaForCausalLM

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

generator = pipeline(model="decapoda-research/llama-7b-hf", device=device)
generator("I can't believe you did such a ")

ValueError Traceback (most recent call last)
in
7 # tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
8 # model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
----> 9 generator = pipeline(model="decapoda-research/llama-7b-hf", device=device)
10 generator("I can't believe you did such a ")

1 frames
/usr/local/lib/python3.9/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
675
676 if tokenizer_class is None:
--> 677 raise ValueError(
678 f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
679 )

ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

Expected behavior

I expected the pipeline to return generated text.

@yhifny

yhifny commented Mar 17, 2023

I face the same issue

@amyeroberts
Collaborator

Hi @candowu, thanks for raising this issue. This is arising, because the tokenizer in the config on the hub points to LLaMATokenizer. However, the tokenizer in the library is LlamaTokenizer.

This is likely due to the configuration files being created before the final PR was merged in.

@yhifny

yhifny commented Mar 17, 2023

I cloned the repo and changed the tokenizer in the config file to LlamaTokenizer
but I got
ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.

@mbehm

mbehm commented Mar 17, 2023

For anybody interested I was able to load an earlier saved model with the same issue using my fork with the capitalization restored. That being said for future it's probably better to try find or save a new model with the new naming.

@amyeroberts
Collaborator

@yhifny Are you able to import the tokenizer directly using from transformers import LlamaTokenizer ?

If not, can you make sure that you are working from the development branch in your environment using:
pip install git+https://github.com/huggingface/transformers

more details here.

@nadahlberg
Contributor

I can import the LlamaTokenizer class, but I get an error saying that its from_pretrained method is None. Is anyone else having this issue?

@sgugger
Collaborator

sgugger commented Mar 17, 2023

As the error message probably mentions, you need to install sentencepiece: pip install sentencepiece.
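
A quick way to confirm whether sentencepiece is visible to the running interpreter is sketched below. The helper name `is_installed` is made up for illustration and is not part of transformers. Because transformers appears to check for optional dependencies when it is first imported, a notebook runtime usually needs a restart after installing the package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical usage: report whether the optional dependency is present.
if is_installed("sentencepiece"):
    print("sentencepiece is available")
else:
    print("sentencepiece is missing: pip install sentencepiece, then restart the runtime")
```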

@nadahlberg
Contributor

Working now. I swear I had sentencepiece, but probably forgot to reset the runtime 🤦 My bad!

@xhinker

xhinker commented Mar 17, 2023

For anybody interested I was able to load an earlier saved model with the same issue using my fork with the capitalization restored. That being said for future it's probably better to try find or save a new model with the new naming.

Thanks, man, your link solved all the problem

@nameless0704

Hi @candowu, thanks for raising this issue. This is arising, because the tokenizer in the config on the hub points to LLaMATokenizer. However, the tokenizer in the library is LlamaTokenizer.

This is likely due to the configuration files being created before the final PR was merged in.

Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer and it works like a charm.
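
For readers asking how to make that change, here is a minimal sketch that edits a local copy of tokenizer_config.json. It assumes the class name is stored under the `tokenizer_class` key and that you have already downloaded or cloned the model files. The function name `fix_tokenizer_class` and the example path are hypothetical.

```python
import json
from pathlib import Path


def fix_tokenizer_class(config_path, old="LLaMATokenizer", new="LlamaTokenizer"):
    """Rewrite tokenizer_class in a local tokenizer_config.json.

    Returns True if the file was changed, False if there was nothing to fix.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if config.get("tokenizer_class") != old:
        return False  # already correct, or the key is absent
    config["tokenizer_class"] = new
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


# Hypothetical usage, with a placeholder path:
# fix_tokenizer_class("path/to/llama-7b-hf/tokenizer_config.json")
```

After the edit, pass the local directory (not the hub name) to from_pretrained so that the modified file is the one that gets read.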

@vdattwani2005

For anybody interested I was able to load an earlier saved model with the same issue using my fork with the capitalization restored. That being said for future it's probably better to try find or save a new model with the new naming.

Thank you so much for this! Works!

@sarrahbbh

Hi @candowu, thanks for raising this issue. This is arising, because the tokenizer in the config on the hub points to LLaMATokenizer. However, the tokenizer in the library is LlamaTokenizer.
This is likely due to the configuration files being created before the final PR was merged in.

Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer and it works like a charm.

I assume this is applied to the llama-7b cloned repo from HuggingFace right? How can I instantiate the model and the tokenizer after doing that please?

@thekevshow

You are a life saver. The docs on the site should be updated to reference this.

@RiseInRose

Thank you so much for this! Works! That's amazing!

@alvations

alvations commented Apr 2, 2023

You can try this rather crazy way to find out the right casing for the class name:

import transformers

from itertools import product

def find_variable_case(s, max_tries=1000):
  # Build every upper/lower-case variant of s (2**len(s) candidates).
  var_permutations = list(map("".join, product(*zip(s.upper(), s.lower()))))
  # Intuitively, any camel casing should minimize the no. of upper chars.
  # From https://stackoverflow.com/a/58789587/610569
  var_permutations.sort(key=lambda ss: (sum(map(str.isupper, ss)), len(ss)))
  # Look names up in a set instead of catching exceptions from list.index().
  names = set(dir(transformers))
  for i, v in enumerate(var_permutations):
    if i > max_tries:
      return None
    if v in names:
      return v
  return None


v = find_variable_case('LLaMatokenizer')
# Fetch the class by name without resorting to exec().
getattr(transformers, v)

[out]:

transformers.utils.dummy_sentencepiece_objects.LlamaTokenizer

@FatCache

FatCache commented Apr 2, 2023

I encountered the same issue identified at the thread today 4/2/2023. The post #22222 (comment) fixed the problem for me.

Thank you.

@qufy6

qufy6 commented Apr 12, 2023

Hi! I am facing the same problem. I tried to import LlamaTokenizer, but got:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[27], line 1
----> 1 from transformers import LlamaTokenizer

ImportError: cannot import name 'LlamaTokenizer' from 'transformers' (/usr/local/anaconda3/envs/abc/lib/python3.10/site-packages/transformers/__init__.py)

and the version of transformers is "transformers 4.28.0.dev0 pypi_0 pypi"

Please tell me how to fix it.

@sgugger
Collaborator

sgugger commented Apr 12, 2023

You need to install the library from source to be able to use the LLaMA model.

@sarrahbbh

Hi @candowu, thanks for raising this issue. This is arising, because the tokenizer in the config on the hub points to LLaMATokenizer. However, the tokenizer in the library is LlamaTokenizer.
This is likely due to the configuration files being created before the final PR was merged in.

Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer and it works like a charm.

Can you please enlighten me on how this could be achieved please? I'm new to this

@zhanghanghitomi

zhanghanghitomi commented Jun 13, 2023

Hi, @nameless0704. First, I would like to thank you for the insightful comment of changing the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer . I am fairly new in this area. May I ask how to Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer?

I could not figure it out and would like to seek your help. Any information is appreciated. Thank you very much in advance!

@SupritYoung

Sharing what worked for me: replacing your LLaMA model with https://huggingface.co/elinas/llama-7b-hf-transformers-4.29 should resolve errors like ImportError: cannot import name 'LLaMATokenizer' from 'transformers'

@JessicaLopezEspejel

JessicaLopezEspejel commented Jun 26, 2023

Example of how to use LLaMA AutoTokenizer

!pip install tokenizers==0.13.3
!pip install sentencepiece

from transformers import AutoTokenizer, AutoModelForCausalLM

# model_name = "openlm-research/open_llama_3b"
# model_name = "openlm-research/open_llama_7b"
model_name = "openlm-research/open_llama_13b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

@MasterLivens

@MasterLivens hi, i am currently using colab, which file should i add this code?

the specified error file in the error message.

@AkshayVerma26

Hi @candowu, thanks for raising this issue. This is arising, because the tokenizer in the config on the hub points to LLaMATokenizer. However, the tokenizer in the library is LlamaTokenizer.
This is likely due to the configuration files being created before the final PR was merged in.

Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer and it works like a charm.

Where is the tokenizer_config.json?

@sifei

sifei commented Jul 7, 2023

Hi @candowu, thanks for raising this issue. This is arising, because the tokenizer in the config on the hub points to LLaMATokenizer. However, the tokenizer in the library is LlamaTokenizer.
This is likely due to the configuration files being created before the final PR was merged in.

Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer and it works like a charm.

Where is the tokenizer_config.json?

I think this is the location:
.cache/huggingface/hub/models--decapoda-research--llama-65b-hf/snapshots/47d2b93e8c0a3d5d6582bdec13f233ca0527499a/tokenizer_config.json
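
To list the cached copies on your own machine, a sketch along these lines may help. It assumes the default cache location under the home directory (a custom cache directory would need to be passed in) and the `models--*/snapshots/*/` layout shown in the path above. The function name `find_tokenizer_configs` is hypothetical.

```python
from pathlib import Path


def find_tokenizer_configs(cache_dir=None):
    """List every tokenizer_config.json found under a Hugging Face hub cache."""
    # Assumed default location; pass cache_dir explicitly if yours differs.
    root = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "huggingface" / "hub"
    if not root.is_dir():
        return []
    return sorted(root.glob("models--*/snapshots/*/tokenizer_config.json"))


# Print whatever is found in the default cache (prints nothing if it is empty).
for path in find_tokenizer_configs():
    print(path)
```

Files under snapshots are often symlinks into a shared blobs folder, so editing a copy of the model directory outside the cache is likely the safer route.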

@SanjayKotabagi

Please, I'm facing the same issue. Can anyone help? I tried all the above methods.

@PawelFaron

Please, I'm facing the same issue. Can anyone help? I tried all the above methods.

I had the same issue and it was solved by:
pip uninstall transformers
pip install transformers

@Nayahei

Nayahei commented Jul 26, 2023

In my case, transformers==4.30.0 fixed it.

@calam1

calam1 commented Jul 26, 2023

I looked at tokenization_auto.py in the transformers package that was installed via pip install git+https://github.com/huggingface/transformers:

(
    "llama",
    (
        "LlamaTokenizer" if is_sentencepiece_available() else None,
        "LlamaTokenizerFast" if is_tokenizers_available() else None,
    ),
),

I had to install sentencepiece to get past the "does not exist" error. I'm running into other errors though :)

@alexovai

I had similar issues. The root cause was that I was using Python 3.7, where pip install transformers installed an older version, 4.18. Once I upgraded to Python 3.9, pip3 install transformers installed 4.30.0.

Also make sure you have recent CUDA drivers; running only on CPU was very slow for me.

@youshikyou

youshikyou commented Aug 23, 2023

Hi, I still have the error. I tried all the solutions above....
Screenshot 2023-08-23 at 22 34 24

@ndvbd

ndvbd commented Sep 2, 2023

Do we need to load the tokenizer using LlamaTokenizer or can we use it with AutoTokenizer? If the latter is possible, what is the fully qualified tokenizer model on the hub? I get:
OSError: LLamaTokenizer is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
Where is the tokenizer_config.json stored?

@ArthurZucker
Collaborator

@ndvbd you should be able to use AutoTokenizer with any tokenizers on the hub.
If you have an issue and want us to help you, we really need a small reproducer, and the full traceback.

For anyone still getting the same error:

  • make sure you are using the correct version of transformers. print(transformers.__version__)
  • if you are working on a notebook cc @youshikyou make sure to restart the kernel after you have installed the packages, to make sure the changes are taken into account.
  • make sure the repository you are loading from (for example meta-llama/Llama-2-7b-hf) has the correct LlamaTokenizer class if you are using AutoModel.

If you have a different issue, make sure to open a new issue and ping me 🤗
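
For the version check in the list above, a small sketch using only the standard library can report what is actually installed in the active environment. The helper name `installed_versions` and the default package list are illustrative assumptions, not anything defined by transformers.

```python
from importlib import metadata


def installed_versions(packages=("transformers", "sentencepiece", "tokenizers")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Run this in the same kernel or interpreter that raises the error.
print(installed_versions())
```

A None entry for sentencepiece, or an older transformers version than expected, would point to the environment rather than to the model repository.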

@PGTBoos

PGTBoos commented Sep 18, 2023

Same error on model codellama/CodeLlama-13b-hf.
Can anyone post a valid JSON config here?

@ArthurZucker
Collaborator

No, the json is valid, you are just not working on main or at least the release that included CodeLlamaTokenizer... (https://github.com/huggingface/transformers/releases/tag/v4.33.1)

@dhruvsinha

I got the same error. I read that we have to edit 'tokenizer_config.json'. I did that on huggingface by changing the json file available in 'Files and versions' of the 'decapoda-research/llama-7b-hf', but I am not sure how to use the edited changes. This is my code:

import torch
import transformers
from transformers import AutoTokenizer
from langchain import LLMChain, HuggingFacePipeline, PromptTemplate

print("Import Complete")

model = "meta-llama/Llama-2-7b-hf"
access_token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
tokenizer = AutoTokenizer.from_pretrained(model, use_auth_token=access_token)

I get the same error- ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

@ArthurZucker
Collaborator

You should use model = "path/to/your/decapoda/modified/model"

@k3ybladewielder

k3ybladewielder commented Feb 24, 2024

Is there any solution to this error? ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.

@mahsan-py

Install this library
pip install -U transformers
