While weight conversion of llama-13b getting this error: RuntimeError: Internal: unk is not defined. #22873

@Ahtesham00

Description

System Info

OS: Ubuntu

Virtual Env:

accelerate==0.18.0
certifi==2022.12.7
charset-normalizer==3.1.0
cmake==3.26.3
filelock==3.12.0
huggingface-hub==0.13.4
idna==3.4
Jinja2==3.1.2
lit==16.0.1
MarkupSafe==2.1.2
mpmath==1.3.0
networkx==3.1
numpy==1.24.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==23.1
psutil==5.9.5
PyYAML==6.0
regex==2023.3.23
requests==2.28.2
sentencepiece==0.1.98
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.0
tqdm==4.65.0
transformers==4.28.1
triton==2.0.0
typing_extensions==4.5.0
urllib3==1.26.15

Who can help?

@ArthurZucker
@younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Used the following command to convert the llama-13B weights to the Hugging Face format.

python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /home/unconveretd-weights --model_size 13B --output_dir /home/test-converted

Expected behavior

It should generate the converted weights. Instead, it produces this error:

Loading the checkpoint in a Llama model.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 41/41 [00:17<00:00, 2.35it/s]
Saving in the Transformers format.
Saving a LlamaTokenizerFast to /home/test-converted.
Traceback (most recent call last):
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 278, in <module>
    main()
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 274, in main
    write_tokenizer(args.output_dir, spm_path)
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 248, in write_tokenizer
    tokenizer = tokenizer_class(input_tokenizer_path)
  File "/home/myenv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 89, in __init__
    super().__init__(
  File "/home/myenv/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 117, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
  File "/home/myenv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/myenv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/myenv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: unk is not defined.
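The traceback shows the failure happens when sentencepiece loads the `tokenizer.model` file from the input directory, not in the weight conversion itself. This error often indicates a missing, truncated, or wrong `tokenizer.model` (the genuine LLaMA tokenizer file is roughly 500 KB). A minimal sanity-check sketch — `check_tokenizer_model` and the size threshold are my own illustration, not part of the conversion script:

```python
import os

def check_tokenizer_model(path):
    """Rough sanity check for a sentencepiece tokenizer.model file.

    Returns "missing" if the file does not exist, "suspiciously small"
    if it is far below the expected ~500 KB, and "ok" otherwise.
    """
    if not os.path.isfile(path):
        return "missing"
    size = os.path.getsize(path)
    # 100 KB is an arbitrary lower bound; a truncated download is a
    # common cause of "Internal: unk is not defined".
    if size < 100_000:
        return "suspiciously small ({} bytes)".format(size)
    return "ok"

print(check_tokenizer_model("/home/unconveretd-weights/tokenizer.model"))
```

If the file looks intact, trying `sentencepiece.SentencePieceProcessor().Load(path)` directly would confirm whether the problem is in the file or in transformers.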
