py : fix missing added_tokens_dict for SPM and BPE vocabs #4971

ggerganov · 2024-01-16T11:49:53Z

ggml-ci

TheBloke · 2024-01-16T21:43:10Z

Confirming this now works, as per my comment: #4958 (comment)

Many thanks

* py : fix missing added_tokens_dict for SPM vocab
* py : pad with unknown tokens when data is missing (ggml-ci)
* py : fix BPE vocab conversion (ggml-ci)
* py : fix padded dummy tokens (I hope)

py : fix missing added_tokens_dict for SPM vocab

9b464b4

ggerganov mentioned this pull request Jan 16, 2024

convert.py: --pad-vocab not working with SPM, 'SentencePieceVocab' object has no attribute 'added_tokens_dict'. Did you mean: 'added_tokens_list'? #4958

Closed
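
For readers unfamiliar with the error quoted in #4958, the sketch below shows the general shape of the problem: a vocab object that exposes `added_tokens_list` but is also asked for `added_tokens_dict`, and a padding step that fills a vocab up to a target size with placeholder tokens. This is not the code from `convert.py`. `ToyVocab`, `pad_vocab`, and the `<dummyNNNNN>` naming are hypothetical names chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyVocab:
    """Hypothetical stand-in for a vocab class; NOT the real convert.py code."""

    base_tokens: list[str]
    added_tokens_list: list[str] = field(default_factory=list)

    @property
    def added_tokens_dict(self) -> dict[str, int]:
        # Added tokens get the ids that directly follow the base vocab.
        start = len(self.base_tokens)
        return {tok: start + i for i, tok in enumerate(self.added_tokens_list)}


def pad_vocab(vocab: ToyVocab, target_size: int) -> ToyVocab:
    """Return a new vocab padded with placeholder tokens up to target_size."""
    current = len(vocab.base_tokens) + len(vocab.added_tokens_list)
    if current > target_size:
        raise ValueError(
            f"vocab has {current} tokens, more than target size {target_size}"
        )
    # Placeholder naming is an assumption made for this sketch.
    padding = [f"<dummy{i:05}>" for i in range(1, target_size - current + 1)]
    return ToyVocab(list(vocab.base_tokens), vocab.added_tokens_list + padding)


if __name__ == "__main__":
    vocab = ToyVocab(["a", "b", "c"], ["<x>"])
    padded = pad_vocab(vocab, 6)
    print(padded.added_tokens_dict)
```

Deriving the dict from the list keeps the two views consistent, so code that looks tokens up by name and code that iterates them in id order cannot disagree.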

py : pad with unknown tokens when data is missing

a137273

ggml-ci

ggerganov force-pushed the gg/fix-spm-added-tokens-dict-4958 branch from 9aefd14 to a137273 Compare January 16, 2024 12:08

py : fix BPE vocab conversion

d92351e

ggml-ci

ggerganov changed the title ~~py : fix missing added_tokens_dict for SPM vocab~~ py : fix missing added_tokens_dict for SPM and BPE vocabs Jan 16, 2024

ggerganov mentioned this pull request Jan 16, 2024

can' quantize deekseek model #4925

Closed

ggerganov added the "need feedback" label (Testing and feedback with results are needed) Jan 16, 2024

py : fix padded dummy tokens (I hope)

23742de

ggerganov merged commit 4f4bf35 into master Jan 17, 2024
