convert.py: --pad-vocab not working with SPM, 'SentencePieceVocab' object has no attribute 'added_tokens_dict'. Did you mean: 'added_tokens_list'? #4958
I've just noticed that since the recent convert.py refactor, the new --pad-vocab feature does not work with SPM vocabs. It does work as expected with HFFT. EDIT: actually, there might be a different bug with HFFT; see the next post for details.
Writing /workspace/process/tigerresearch_tigerbot-13b-chat-v5/gguf/tigerbot-13b-chat-v5.fp16.gguf, format 1
Padding vocab with 2 token(s) - <dummy00001> through <dummy00002>
Traceback (most recent call last):
  File "/workspace/git/llama.cpp/./convert.py", line 1658, in <module>
    main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
    ^^^^^^^^^^^^^^^^^^
  File "/workspace/git/llama.cpp/./convert.py", line 1643, in main
    OutputFile.write_all(
  File "/workspace/git/llama.cpp/./convert.py", line 1188, in write_all
    check_vocab_size(params, vocab, pad_vocab=pad_vocab)
  File "/workspace/git/llama.cpp/./convert.py", line 1008, in check_vocab_size
    vocab.added_tokens_dict[f"<dummy{i:05}>"] = -1
    ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SentencePieceVocab' object has no attribute 'added_tokens_dict'. Did you mean: 'added_tokens_list'?
For this model, I reran the conversion with --vocab-type hfft instead, which worked OK.
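To illustrate the failure, here is a minimal sketch of the mismatch. The two classes are hypothetical stand-ins, not the real convert.py classes; only the attribute names `added_tokens_dict` and `added_tokens_list` and the `<dummy{i:05}>` naming come from the traceback and log. The assumption that `added_tokens_list` holds plain token strings is mine, and I'm not claiming this is how the fix should look in convert.py.

```python
# Hypothetical stand-ins for the two vocab types; NOT the real convert.py classes.
class FakeSpmVocab:
    def __init__(self):
        # The traceback suggests the SPM vocab exposes a list, not a dict.
        self.added_tokens_list = []


class FakeHfftVocab:
    def __init__(self):
        self.added_tokens_dict = {}


def pad_vocab(vocab, missing):
    """Add `missing` dummy tokens to whichever container the vocab exposes."""
    names = [f"<dummy{i:05}>" for i in range(1, missing + 1)]
    if hasattr(vocab, "added_tokens_dict"):
        for name in names:
            # Same assignment as the failing line in check_vocab_size.
            vocab.added_tokens_dict[name] = -1
    elif hasattr(vocab, "added_tokens_list"):
        # Assumption: the list stores token strings.
        vocab.added_tokens_list.extend(names)
    else:
        raise AttributeError("vocab has no known added-tokens container")
    return names


spm = FakeSpmVocab()
padded = pad_vocab(spm, 2)
print(padded)
```

Writing to `added_tokens_dict` unconditionally, as the line in the traceback does, raises the AttributeError for any vocab object that only has `added_tokens_list`.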