Conversation
@HaoboGu HaoboGu commented Jul 11, 2022

This PR fixes #1013.
A tokenizer can now be created from in-memory bytes using the added `Tokenizer::from_bytes`.
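The pattern this PR enables is constructing a tokenizer directly from serialized bytes (e.g. an embedded asset or a network response) instead of a file path. A minimal self-contained sketch of that constructor shape, using a hypothetical stand-in struct rather than the real crate type (which deserializes `tokenizer.json` via serde):

```rust
use std::str;

// Hypothetical stand-in for a type constructible from in-memory bytes.
#[derive(Debug)]
struct Tokenizer {
    config: String,
}

impl Tokenizer {
    // Mirrors the shape of the added `Tokenizer::from_bytes`: parse
    // serialized bytes instead of reading from a file path.
    fn from_bytes(bytes: &[u8]) -> Result<Self, String> {
        let config = str::from_utf8(bytes)
            .map_err(|e| e.to_string())?
            .to_string();
        Ok(Tokenizer { config })
    }
}

fn main() {
    // The bytes could come from an embedded asset, a download, etc.
    let data = br#"{"version":"1.0"}"#;
    let tok = Tokenizer::from_bytes(data).expect("valid bytes");
    println!("{}", tok.config);
}
```

With the real crate the call site would look similar (`Tokenizer::from_bytes(bytes)?`), complementing the existing file-based loading path.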

Signed-off-by: HaoboGu <haobogu@outlook.com>
@Narsil Narsil left a comment

LGTM, left a nit (feel free to ignore it if you disagree).

HuggingFaceDocBuilderDev commented Jul 15, 2022

The documentation is not available anymore as the PR was closed or merged.

Narsil commented Jul 15, 2022

We can consider the tests working; the failure is linked to the clippy update.

@Narsil Narsil merged commit 3564f24 into huggingface:main Jul 18, 2022
Narsil pushed a commit that referenced this pull request Aug 23, 2022
Signed-off-by: HaoboGu <haobogu@outlook.com>

Development

Successfully merging this pull request may close these issues.

Load pre-trained tokenizer from memory

3 participants