Add new models (Janus, Qwen2-VL, JinaCLIP, LLaVA-OneVision, ViTPose, MGP-STR) & refactor processors. #1001
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This is an amazing PR and I thought I'd give it a test in Chrome on Linux. I randomly see these two errors: Or simply: Is there anything special I need to do to test this in the browser? So far I couldn't get it to work. By default it seems to pick WebGPU, but WebGPU is known to work very poorly on Linux, so I tried every other possibility:

```js
const model = await MultiModalityCausalLM.from_pretrained(model_id, {
  dtype: {
    prepare_inputs_embeds: 'fp32',
    language_model: 'q4',
    lm_head: 'fp32',
    gen_head: 'fp32',
    gen_img_embeds: 'fp32',
    image_decode: 'fp32',
  },
  // Pick one: webnn-npu, webnn-gpu, webnn-cpu, webnn, webgpu, wasm
  device: 'wasm',
});
```

None worked 🙈 Thank you anyway, looking forward to getting this to work somehow!
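Since the comment above tries backends one at a time by hand, a small helper can automate that. This is a hedged sketch, not part of the Transformers.js API: `loadWithFallback` and its `loadModel` callback are hypothetical names standing in for a call like `MultiModalityCausalLM.from_pretrained(model_id, { device })`, tried for each device in preference order.

```javascript
// Hedged sketch: try each backend in order until one loads.
// `loadModel(device)` is a hypothetical callback wrapping the real
// from_pretrained call with that device; it should throw on failure.
async function loadWithFallback(loadModel, devices) {
  for (const device of devices) {
    try {
      return { device, model: await loadModel(device) };
    } catch (err) {
      console.warn(`Backend "${device}" failed: ${err.message}`);
    }
  }
  throw new Error('No backend could load the model.');
}
```

In practice you would pass something like `['webgpu', 'wasm']` on Linux so the WebGPU path is attempted first but the WASM fallback still gets a chance.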
This is great! Will this PR support Qwen2-VL? 🙏
Hey @pdufour, I was originally planning on doing this in a separate PR, but I've been following your work on getting it running (great work BTW!), so it might be possible to squeeze it into this PR! 👀
Merging to put out Transformers.js v3.1. Follow-up patches may be needed, but it's good to go for now imo!
New models
Janus (any-to-any)
This PR adds support for deepseek-ai/Janus-1.3B, a novel autoregressive framework that unifies multimodal understanding and generation. In particular, it can do the following:
text+image to text:
Example output:
image-to-text:
Example outputs:
This PR also refactors the way that processor classes load image/text pre-processors, aligning more closely with the Python transformers library.
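The refactor described above pairs image and text pre-processors under a single processor object, as the Python transformers library does. The following is a minimal sketch of that composition pattern; the class name, method names, and the mock sub-processors are all illustrative assumptions, not the actual Transformers.js implementation.

```javascript
// Hedged sketch of a composed processor: one object owns both an
// image processor and a tokenizer, so callers hand it raw inputs
// and get back model-ready tensors under conventional key names.
class ComposedProcessor {
  constructor(imageProcessor, tokenizer) {
    this.imageProcessor = imageProcessor; // function: image -> pixel values
    this.tokenizer = tokenizer;           // function: text -> token ids
  }

  process(images, text) {
    return {
      pixel_values: images.map((img) => this.imageProcessor(img)),
      input_ids: this.tokenizer(text),
    };
  }
}
```

The benefit of this layout is that model code needs only one `process` call per multimodal input, and swapping in a different image processor or tokenizer does not change the call site.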