Adding unstable_wasm feature + example to run tokenizers on wasm.
#1009
Co-Authored-By: josephrocca <[email protected]> Co-Authored-By: Matthias Brunel <[email protected]>
McPatate
left a comment
IMO we should remove unnecessary files like licenses and CI stuff in the example.
Was the deletion of the Cargo.lock files in the bindings intended?
Also, I'm curious: what does Sys in SysRegex stand for?
Cool to bring tokenizers to wasm! 🔥
| @@ -0,0 +1,11 @@ | |||
| install: | |||
| - appveyor-retry appveyor DownloadFile https://win.rustup.rs/ -FileName rustup-init.exe | |||
Why use appveyor for the CI ?
It's in the default template for a wasm app. I don't really see much downside to keeping them.
I'd leave it up to the user whether to add these kinds of files; they don't showcase anything particular to tokenizers and are readily available in wasm tutorials. But that's just my 2 cents!
| @@ -0,0 +1,69 @@ | |||
| language: rust | |||
Ah maybe these CI files were autogenerated ?
Yes, followed this tutorial:
https://rustwasm.github.io/book/game-of-life/hello-world.html
| @@ -0,0 +1,69 @@ | |||
| <div align="center"> | |||
We also may want to change the README's content at some point ^^
We may :)
Added a simple line so it's not the pure wasm tutorial (I find the commands explained in it are still useful).
| #[wasm_bindgen] |
| extern "C" { |
|     fn alert(s: &str); |
| } |
| |
| #[wasm_bindgen] |
| pub fn greet() { |
|     alert("Hello, {{project-name}}!"); |
| } |
may want to remove this
Removed !
| @@ -1,4 +1,4 @@ | |||
| -use onig::Regex; | |||
| +use crate::utils::SysRegex as Regex; | |||
SysRegex is used in other places in the code; won't this alias make the code a bit confusing? Either works for me, SysRegex or Regex.
There's a lot of Regex everywhere in the code.
SysRegex is meant to imply "System", i.e. the real underlying regex engine.
Happy to use a better name.
And I should rename everything instead of using `as Regex`; that was just laziness :)
I'm fine with SysRegex, as long as it's consistent everywhere :)
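The naming discussion above is about a single `SysRegex` type that hides whichever regex engine is compiled in. Below is a minimal sketch of that idea, assuming the `unstable_wasm` feature is what selects the backend. The two `backend` modules are hypothetical stand-ins that do plain substring matching so the snippet builds without `onig` or `fancy_regex`; the actual wrapper in `tokenizers` may expose a different API.

```rust
// Hypothetical stand-in for the native (onig-backed) engine.
#[cfg(not(feature = "unstable_wasm"))]
mod backend {
    pub struct Engine(pub String);

    impl Engine {
        pub fn new(pattern: &str) -> Result<Self, String> {
            Ok(Engine(pattern.to_owned()))
        }

        // Substring search only; a real engine would run the regex here.
        pub fn find(&self, text: &str) -> Option<(usize, usize)> {
            text.find(&self.0).map(|start| (start, start + self.0.len()))
        }
    }
}

// Hypothetical stand-in for the pure-Rust (fancy_regex-backed) engine.
#[cfg(feature = "unstable_wasm")]
mod backend {
    pub struct Engine(pub String);

    impl Engine {
        pub fn new(pattern: &str) -> Result<Self, String> {
            Ok(Engine(pattern.to_owned()))
        }

        // Same contract as the native stand-in above.
        pub fn find(&self, text: &str) -> Option<(usize, usize)> {
            text.find(&self.0).map(|start| (start, start + self.0.len()))
        }
    }
}

/// The one name the rest of the crate imports, whatever the engine is.
pub struct SysRegex {
    engine: backend::Engine,
}

impl SysRegex {
    pub fn new(pattern: &str) -> Result<Self, String> {
        Ok(Self { engine: backend::Engine::new(pattern)? })
    }

    /// Byte offsets (start, end) of the first match, if any.
    pub fn find(&self, text: &str) -> Option<(usize, usize)> {
        self.engine.find(text)
    }
}
```

With this shape, call sites import `SysRegex` directly and never mention the engine, which avoids the `as Regex` alias questioned in the review.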
McPatate
left a comment
lgtm !
…#1009)
* Adding `unstable_wasm` feature + example to run `tokenizers` on wasm.
  Co-Authored-By: josephrocca <[email protected]>
  Co-Authored-By: Matthias Brunel <[email protected]>
* Adding some serialization tests.
* Updating with comments.
Co-authored-by: josephrocca <[email protected]>
Co-authored-by: Matthias Brunel <[email protected]>
Should get #935 started.
Ideally we should test compliance of some well known tokenizers
Maybe a small tokenizer training.
The feature is named `unstable_wasm` (as is the example). We should at least make sure people understand that 1:1 parity is NOT to be expected.

The reason is that `onig`, the current regexp engine, has to be swapped out for `fancy_regex`, which should match, but we have no guarantee (and `onig` is a C dependency without any `wasm` support that we know of).

`esaxx-rs` has been modified to allow dropping the C dependency requirements; it then uses the Rust code (which is slower than its cpp counterpart). All optional features are also disabled for wasm: `cli`, `http` and `progressbar`.

This PR also adds an `esaxx_fast` feature to give easier control over the C version of the `esaxx-rs` code. Since features are additive, there was no way to remove the `cpp` feature only for `wasm`. So instead there's a new feature within the default features, which the example disables with `default-features = false`.
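The "features are additive" reasoning can be illustrated with a small model of how a dependent's request turns into an enabled feature set. This is a simplified sketch, not Cargo's actual resolver: `enabled_features` is a hypothetical helper, and `DEFAULT_FEATURES` is an assumed list built only from the feature names mentioned in this description (the real list lives in the crate's `Cargo.toml`).

```rust
use std::collections::BTreeSet;

// Assumed default set, for illustration only.
const DEFAULT_FEATURES: &[&'static str] = &["progressbar", "http", "cli", "esaxx_fast"];

/// Simplified model: a dependent can only add features to the set.
/// The single way to drop a default feature is to opt out of all
/// defaults, then re-add the ones that are wanted.
pub fn enabled_features(
    default_features: bool,
    requested: &[&'static str],
) -> BTreeSet<&'static str> {
    let mut enabled: BTreeSet<&'static str> = requested.iter().copied().collect();
    if default_features {
        enabled.extend(DEFAULT_FEATURES.iter().copied());
    }
    enabled
}
```

Under this model, requesting `unstable_wasm` with defaults left on still pulls in `esaxx_fast`, while requesting it with defaults off does not. That is why the C-backed code path is exposed as its own default feature rather than something the wasm feature tries to switch off.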