[Feature request] Add Support for vicuna-13b-delta-v1.1 #96
Comments
Agreed! WebGPU for onnxruntime-web is almost here (see microsoft/onnxruntime#14579), and Transformers.js will support it when ready! There will be a massive announcement when it does drop! As for now, it is just a matter of waiting 😅 ... |
It's here!! They merged the [js/web] WebGPU backend via JSEP #14579 into the main branch a few hours ago. No official release yet. Looks like @fs-eire opened another pull request for code cleanup and some small fixes. But we can build from the main branch and start coding 😄 |
It takes some time and effort to go from enabling building from source, to including it in the NPM package, to releasing it as an experimental feature, and then to the final release. I will keep working on the stability, performance and coverage of the WebGPU backend operator implementation in ort-web. This is going to be long-term work. Please feel free to "@" me or submit GitHub issues to onnxruntime for feedback. |
@fs-eire I have been following the build instructions from here, and I think I've sorted out most issues regarding dependencies, versions, etc. However, when running in the browser, I get the error |
I've been looking at the updated files and under [UPDATE]: Here's what worked for me...
Then copy the files as instructed in the documentation and build the npm package. I'm attaching a zip of my successful npm package build here in case you wanna be lazy. 😘 |
That was it! Commit history says that change was made yesterday, so I was one day out of date haha. The package is being built now, and I will hopefully have more updates later today 🎉
This will come in handy! Thanks! |
So, I did all that, but it doesn't seem to run :/ Here's the error I get: microsoft/onnxruntime#15719
@DK013 If you're able to, could you try running the model linked here with the following input:
let input = {
  attention_mask: new Tensor(
    'int64',
    new BigInt64Array([1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n]),
    [1, 12]
  ),
  input_ids: new Tensor(
    'int64',
    new BigInt64Array([13959n, 1566n, 12n, 2379n, 10n, 8774n, 6n, 149n, 33n, 25n, 58n, 1n]),
    [1, 12]
  )
}
or see here for a full demo to test with. |
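For anyone who wants to try the same test with other token ids, the input above can also be built programmatically. This is only a minimal sketch, assuming a batch size of 1 and an all-ones attention mask as in the snippet above. `buildInt64Feeds` is a hypothetical helper (it is not part of onnxruntime-web or Transformers.js), and it returns plain objects so that it runs without either library installed.

```javascript
// Hypothetical helper, not part of onnxruntime-web or Transformers.js.
// Converts a list of token ids into the three pieces a Tensor needs:
// a type string, a BigInt64Array of data, and a [batch, sequence] shape.
function buildInt64Feeds(tokenIds) {
  const seqLen = tokenIds.length;
  const dims = [1, seqLen]; // assumption: a single sequence (batch size 1)

  // int64 tensors are backed by BigInt64Array, so each id must be a BigInt.
  const inputIds = BigInt64Array.from(tokenIds, (id) => BigInt(id));

  // Assumption: no padding, so every position is attended to.
  const attentionMask = new BigInt64Array(seqLen).fill(1n);

  return {
    input_ids: { type: 'int64', data: inputIds, dims },
    attention_mask: { type: 'int64', data: attentionMask, dims },
  };
}

// Same token ids as in the comment above.
const feeds = buildInt64Feeds([13959, 1566, 12, 2379, 10, 8774, 6, 149, 33, 25, 58, 1]);
console.log(feeds.input_ids.dims, feeds.attention_mask.data.length);
```

Each entry can then be wrapped the same way as in the comment above, e.g. `new Tensor(f.type, f.data, f.dims)` for each `f` in `feeds`.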
@DK013 please use the workaround as described in microsoft/onnxruntime#15719 (comment) . I am working on a solution to fix the issue. |
@xenova As far as I can see, fs-eire is working on the issues we've encountered before, and I'm running behind schedule on my own project, so I'm implementing transformers.js on CPU in my code for now. Mainly I need whisper (hopefully something a little bigger than the base model) to work right now, and a suitable LLM later on. So I'm gonna go ahead and complete the basic code for testing right now, and wait until you guys are done polishing WebGPU. |
@DK013 The PR mentioned above is merged, and another bugfix is in PR: microsoft/onnxruntime#15819. |
Catching up on this issue, does this mean there is conversational model support with the onnxruntime PRs? The README shows it as not supported yet. Thanks for any clarification! |
The vicuna-13b-delta-v1.1 is categorized as a text-generation model (not a conversational model), which is supported by Transformers.js. The distinction (which mainly lies in how they are used) is subtle, as both can be used for "conversations". For more information, see: |
@xenova heyo, I've been a bit busy with my own projects and running the business and all. What's the status of WebGPU? How are your tests going? |
Support for vicuna-13b-delta-v1.1
NOTE: It's not listed in the transformers supported models list, but it does work with transformers
Reason for request
With the upcoming WebGPU support in ONNXRuntime, I believe it'll be really helpful to have LLM support for browser-based applications, and this repo is the best solution we've got so far.
Additional context
I've been working on an AI assistant made in Electron and Cordova for desktop and mobile platforms respectively. I'm already using Transformers.js with whisper for speech-to-text. I intend to switch to WebGPU with JSEP as soon as it's available, so I can leverage GPU compute to run larger models. I'm trying to build the project with as many open-source resources as possible, and having LLM support would be really nice instead of using the OpenAI APIs. This keeps the project free for users, and user data privacy is another benefit. I'm really looking forward to seeing whether this is gonna be possible. I'm willing to contribute as much as I can, being a complete novice to the ML community.
Thanks in advance