While weight conversion of llama-13b getting this error: RuntimeError: Internal: unk is not defined. #22873

@Ahtesham00

Description

System Info

OS: Ubuntu

Virtual Env:

accelerate==0.18.0
certifi==2022.12.7
charset-normalizer==3.1.0
cmake==3.26.3
filelock==3.12.0
huggingface-hub==0.13.4
idna==3.4
Jinja2==3.1.2
lit==16.0.1
MarkupSafe==2.1.2
mpmath==1.3.0
networkx==3.1
numpy==1.24.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==23.1
psutil==5.9.5
PyYAML==6.0
regex==2023.3.23
requests==2.28.2
sentencepiece==0.1.98
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.0
tqdm==4.65.0
transformers==4.28.1
triton==2.0.0
typing_extensions==4.5.0
urllib3==1.26.15

Who can help?

@ArthurZucker
@younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Used the following command to convert the llama-13B weights to the Hugging Face format.

python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /home/unconveretd-weights --model_size 13B --output_dir /home/test-converted

Expected behavior

It should generate the converted weights. Instead, it produces this error:

Loading the checkpoint in a Llama model.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 41/41 [00:17<00:00, 2.35it/s]
Saving in the Transformers format.
Saving a LlamaTokenizerFast to /home/test-converted.
Traceback (most recent call last):
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 278, in <module>
    main()
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 274, in main
    write_tokenizer(args.output_dir, spm_path)
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 248, in write_tokenizer
    tokenizer = tokenizer_class(input_tokenizer_path)
  File "/home/myenv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 89, in __init__
    super().__init__(
  File "/home/myenv/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 117, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
  File "/home/myenv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/myenv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/myenv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: unk is not defined.
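The traceback shows the failure happens when sentencepiece loads the `tokenizer.model` file from the input directory, not in the weight conversion itself. This error often indicates a missing, truncated, or wrong `tokenizer.model` (the genuine LLaMA tokenizer file is roughly 500 KB). A minimal sanity-check sketch — `check_tokenizer_model` and the size threshold are my own illustration, not part of the conversion script:

```python
import os

def check_tokenizer_model(path):
    """Rough sanity check for a sentencepiece tokenizer.model file.

    Returns "missing" if the file does not exist, "suspiciously small"
    if it is far below the expected ~500 KB, and "ok" otherwise.
    """
    if not os.path.isfile(path):
        return "missing"
    size = os.path.getsize(path)
    # 100 KB is an arbitrary lower bound; a truncated download is a
    # common cause of "Internal: unk is not defined".
    if size < 100_000:
        return "suspiciously small ({} bytes)".format(size)
    return "ok"

print(check_tokenizer_model("/home/unconveretd-weights/tokenizer.model"))
```

If the file looks intact, trying `sentencepiece.SentencePieceProcessor().Load(path)` directly would confirm whether the problem is in the file or in transformers.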
