Add asyncify setup #1
Merged
With the current setup we should have stable support on many laptops and mobile devices.
Opening a separate PR targeting the other one for now, since I think there are enough changes here to make it worth a separate review/discussion. This adds support for using ASYNCIFY, which theoretically allows the WebGPU backend to work on more systems while JSPI development continues.
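To make the two build variants concrete, here is a minimal sketch of how a build script might assemble the relevant Emscripten link flags. It assumes the `-sASYNCIFY=1`, `-sJSPI=1`, and `-sEXPORTED_FUNCTIONS` settings; the symbol names are hypothetical placeholders, and the real flags and export list are whatever the build files in this PR use.

```typescript
type AsyncMode = "asyncify" | "jspi";

// Hypothetical C entry points, for illustration only.
const exampleSymbols = ["main", "example_load_model", "example_decode"];

// Build the async-related link flags for one variant.
// An explicit export list is used here rather than EXPORT_ALL.
function emccAsyncFlags(mode: AsyncMode, symbols: string[]): string[] {
  const asyncFlag = mode === "jspi" ? "-sJSPI=1" : "-sASYNCIFY=1";
  // Emscripten expects C symbols in this list to carry a leading underscore.
  const exportList = symbols.map((name) => `_${name}`).join(",");
  return [asyncFlag, `-sEXPORTED_FUNCTIONS=${exportList}`];
}

console.log(emccAsyncFlags("asyncify", exampleSymbols).join(" "));
console.log(emccAsyncFlags("jspi", exampleSymbols).join(" "));
```

Only the first flag differs between the two variants, so the same export list can be shared by both.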
Llama-3.2-1B-Instruct-Q4_0.gguf. The JSPI WASM blobs are also ~60% smaller than the ASYNCIFY ones.
EXPORT_ALL in the build, so I moved all the necessary methods to EXPORTED_FUNCTIONS. See the issue I opened with emscripten here.
I found that there are different issues/discussion points on different browsers, so here's some of that information:
Chrome
As in the other PR, JSPI works well here, and ASYNCIFY is 10% slower with the WebGPU backend. Otherwise, there are no issues.
Safari
Unfortunately, using the WebGPU backend leads to the model returning gibberish (although it runs pretty quickly in doing so). I'm not sure if this is an issue with the llama.cpp WebGPU backend, or a bug in Safari's WebGPU implementation (WebKit). I'll note that WebLLM, which has a WebGPU backend, does work, but that runs through JavaScript directly, not WASM, so there could be differences in the handling there.
Firefox
The llama.cpp WebGPU backend currently doesn't work here due to Firefox's WebGPU compiler not handling some of the program constructs we use. I've opened an issue for that here, so hopefully that can be resolved and allow testing here.
Separately, the multi-threaded CPU backend seems to work fine on all of these browsers, even with the WebGPU backend enabled in the builds.
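The per-browser findings above suggest picking the WASM variant and backend at load time. The following is a rough sketch of that idea, not code from this PR. It assumes JSPI can be detected through `WebAssembly.Suspending`, WebGPU through `navigator.gpu`, and threads through `SharedArrayBuffer` plus `crossOriginIsolated`. The `allowWebGpu` switch is a hypothetical override for browsers where WebGPU is present but misbehaves.

```typescript
interface Capabilities {
  webgpu: boolean;
  jspi: boolean;
  threads: boolean;
}

interface BuildChoice {
  wasmVariant: "jspi" | "asyncify";
  backend: "webgpu" | "cpu";
  multiThread: boolean;
}

// Feature detection only; it cannot tell whether a backend produces correct output.
function detectCapabilities(): Capabilities {
  const g = globalThis as any;
  return {
    webgpu: typeof g.navigator?.gpu !== "undefined",
    jspi: typeof g.WebAssembly?.Suspending === "function",
    threads: typeof g.SharedArrayBuffer === "function" && g.crossOriginIsolated === true,
  };
}

// Prefer JSPI when available, fall back to ASYNCIFY otherwise.
// allowWebGpu is a hypothetical manual override (e.g. for browsers with known issues).
function pickBuild(caps: Capabilities, allowWebGpu: boolean = true): BuildChoice {
  return {
    wasmVariant: caps.jspi ? "jspi" : "asyncify",
    backend: caps.webgpu && allowWebGpu ? "webgpu" : "cpu",
    multiThread: caps.threads,
  };
}

console.log(pickBuild(detectCapabilities()));
```

Because the Safari and Firefox problems described above are correctness or compiler issues rather than missing APIs, detection alone would still select WebGPU there, which is why the sketch keeps an explicit override.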