llama : Added support for SmolLm pre-tokenizer (#8608) #8609
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| ied 4 ½ months | ||
| __ggml_vocab_test__ | ||
| Führer | ||
| __ggml_vocab_test__ | ||
|
|
||
| __ggml_vocab_test__ | ||
|
|
||
| __ggml_vocab_test__ | ||
|
|
||
| __ggml_vocab_test__ | ||
|
|
||
| __ggml_vocab_test__ | ||
|
|
||
| __ggml_vocab_test__ | ||
|
|
||
|
|
||
| __ggml_vocab_test__ | ||
|
|
||
|
|
||
|
|
||
| __ggml_vocab_test__ | ||
|
|
||
|
|
||
|
|
||
|
|
||
| __ggml_vocab_test__ | ||
|
|
||
|
|
||
| __ggml_vocab_test__ | ||
| Hello world | ||
| __ggml_vocab_test__ | ||
| Hello world | ||
| __ggml_vocab_test__ | ||
| Hello World | ||
| __ggml_vocab_test__ | ||
| Hello World | ||
| __ggml_vocab_test__ | ||
| Hello World! | ||
| __ggml_vocab_test__ | ||
| Hello, world! | ||
| __ggml_vocab_test__ | ||
| Hello, world! | ||
| __ggml_vocab_test__ | ||
| this is 🦙.cpp | ||
| __ggml_vocab_test__ | ||
| w048 7tuijk dsdfhu | ||
| __ggml_vocab_test__ | ||
| нещо на Български | ||
| __ggml_vocab_test__ | ||
| កាន់តែពិសេសអាចខលចេញ | ||
| __ggml_vocab_test__ | ||
| 🚀 (normal) 😶🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token) | ||
| __ggml_vocab_test__ | ||
| Hello | ||
| __ggml_vocab_test__ | ||
| Hello | ||
| __ggml_vocab_test__ | ||
| Hello | ||
| __ggml_vocab_test__ | ||
| Hello | ||
| __ggml_vocab_test__ | ||
| Hello | ||
| __ggml_vocab_test__ | ||
| Hello | ||
| Hello | ||
| __ggml_vocab_test__ | ||
| ( | ||
| __ggml_vocab_test__ | ||
|
|
||
| = | ||
| __ggml_vocab_test__ | ||
| ' era | ||
| __ggml_vocab_test__ | ||
| Hello, y'all! How are you 😁 ?我想在apple工作1314151天~ | ||
| __ggml_vocab_test__ | ||
| !!!!!! | ||
| __ggml_vocab_test__ | ||
| 3 | ||
| __ggml_vocab_test__ | ||
| 33 | ||
| __ggml_vocab_test__ | ||
| 333 | ||
| __ggml_vocab_test__ | ||
| 3333 | ||
| __ggml_vocab_test__ | ||
| 33333 | ||
| __ggml_vocab_test__ | ||
| 333333 | ||
| __ggml_vocab_test__ | ||
| 3333333 | ||
| __ggml_vocab_test__ | ||
| 33333333 | ||
| __ggml_vocab_test__ | ||
| 333333333 | ||
| __ggml_vocab_test__ | ||
| Cửa Việt | ||
| __ggml_vocab_test__ | ||
| discards | ||
| __ggml_vocab_test__ | ||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
| 🚀 (normal) 😶🌫️ (multiple emojis concatenated) ✅ 🦙🦙 3 33 333 3333 33333 333333 3333333 33333333 3.3 3..3 3...3 កាន់តែពិសេសអាច😁 ?我想在apple工作1314151天~ ------======= нещо на Български ''''''```````""""......!!!!!!?????? I've been 'told he's there, 'RE you sure? 'M not sure I'll make it, 'D you like some tea? We'Ve a'lL | ||
| __ggml_vocab_test__ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| 885 216 36 216 16738 2704 | ||
| 54 46991 16863 | ||
|
|
||
| 216 | ||
| 256 | ||
| 333 | ||
| 197 | ||
| 198 | ||
| 1116 | ||
| 16506 | ||
| 197 198 | ||
| 19556 905 | ||
| 38699 905 | ||
| 19556 2260 | ||
| 38699 2260 | ||
| 38699 2260 17 | ||
| 19556 28 905 17 | ||
| 38699 28 905 17 | ||
| 451 314 15107 116 243 30 35392 | ||
| 103 32 36 40 216 39 24961 47112 21554 3492 15995 | ||
| 8831 6643 46438 6485 40610 5470 235 156 228 12681 29441 6511 9175 39511 7872 | ||
| 40478 218 40478 131 40478 237 172 249 229 40478 233 172 249 220 40478 240 40478 132 40478 249 172 249 219 40478 249 40478 112 40478 131 40478 223 40478 219 40478 245 40478 223 172 249 219 40478 227 | ||
| 10813 244 218 365 5472 25 40303 131 321 231 10813 230 121 31752 365 30404 649 21658 271 46336 483 25 4636 246 223 365 8979 649 33777 338 553 624 1038 9624 25 | ||
| 19556 | ||
| 38699 | ||
| 216 38699 | ||
| 256 38699 | ||
| 333 38699 | ||
| 333 38699 472 38699 | ||
| 365 | ||
| 198 446 | ||
| 23 5741 | ||
| 19556 28 329 23 449 17 1073 359 346 40303 219 9148 19805 235 177 221 128 32632 21949 36149 115 40994 33 35 33 36 33 37 33 18614 119 186 138 248 | ||
| 36689 10095 | ||
| 35 | ||
| 35 35 | ||
| 35 35 35 | ||
| 35 35 35 35 | ||
| 35 35 35 35 35 | ||
| 35 35 35 35 35 35 | ||
| 35 35 35 35 35 35 35 | ||
| 35 35 35 35 35 35 35 35 | ||
| 35 35 35 35 35 35 35 35 35 | ||
| 51 25275 251 81 10506 25275 225 100 | ||
| 937 1563 | ||
| 3805 8866 1116 3805 197 216 1656 216 197 11181 472 2367 3914 198 10813 244 218 365 5472 25 40303 131 321 231 10813 230 121 31752 365 30404 649 21658 271 46336 483 25 4636 246 223 15107 116 243 10813 116 243 216 35 216 35 35 216 35 35 35 216 35 35 35 35 216 35 35 35 35 35 216 35 35 35 35 35 35 216 35 35 35 35 35 35 35 216 35 35 35 35 35 35 35 35 216 35 30 35 216 35 950 35 216 35 2026 35 15822 248 218 40478 131 40478 237 172 249 229 40478 233 172 249 220 40478 240 40478 132 40478 249 172 249 219 40478 249 40478 112 40478 131 40478 223 10813 242 219 9148 19805 235 177 221 128 32632 21949 36149 115 40994 33 35 33 36 33 37 33 18614 119 186 138 248 216 21771 2031 28733 28050 6643 46438 6485 40610 5470 235 156 228 12681 29441 6511 9175 39511 7872 7855 11193 1969 1969 3725 1093 1093 5592 950 36689 10095 16693 16693 16693 339 3543 719 637 100 793 384 506 665 28 637 3256 346 2090 47 637 61 441 2090 339 3060 919 357 28 637 52 346 702 634 7188 47 1046 23 25917 253 23 92 60 |
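
The two added files appear to pair up: the first holds test strings separated by `__ggml_vocab_test__` lines, and the second holds one line of space-separated token IDs per test string. Below is a minimal sketch of how such a pair could be read back and matched up. It assumes the separator is written as `\n__ggml_vocab_test__\n` after every test string and that each output line ends with a newline; the function names are hypothetical and are not taken from this PR.

```python
# Assumed separator layout: each test string is followed by this exact marker.
SEPARATOR = "\n__ggml_vocab_test__\n"


def parse_inputs(inp_text: str) -> list[str]:
    """Split the raw input text into individual test strings."""
    parts = inp_text.split(SEPARATOR)
    # A trailing separator leaves one empty chunk at the end; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


def parse_outputs(out_text: str) -> list[list[int]]:
    """Parse one list of token IDs per line; a blank line means no tokens."""
    lines = out_text.split("\n")
    # A trailing newline leaves one empty chunk at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return [[int(tok) for tok in line.split()] for line in lines]


def pair_cases(inp_text: str, out_text: str) -> list[tuple[str, list[int]]]:
    """Match each test string with its expected token IDs, in order."""
    texts = parse_inputs(inp_text)
    ids = parse_outputs(out_text)
    if len(texts) != len(ids):
        raise ValueError(f"{len(texts)} inputs but {len(ids)} output lines")
    return list(zip(texts, ids))


if __name__ == "__main__":
    # Two cases copied from the tables above (pairing by position is assumed).
    inp = "Hello world\n__ggml_vocab_test__\nHello World\n__ggml_vocab_test__\n"
    out = " 19556 905\n 19556 2260\n"
    for text, expected in pair_cases(inp, out):
        print(repr(text), expected)
```

Splitting on the full `\n__ggml_vocab_test__\n` marker rather than on individual lines keeps multi-line and whitespace-only test strings intact, which matters here because several of the cases consist only of spaces, tabs, or newlines.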