Add qwen2.5 vl #2995
base: main
Conversation
Nice work! Has the result been verified? Also, would you consider adding quantization support?
It's still a work in progress though.
Oh, no.
Nice work. This is definitely needed. But I see that the model uses Conv3D, which is not yet available in Candle. How are you planning to handle that?
@akshayballal95 I’ve been struggling with this for a while. While flattening the temporal dimension and using Conv2D (reshaping [B, T, C, H, W] to [B*T, C, H, W]) may be possible, it changes the computation and is incompatible with the pretrained weights. Implementing native Conv3D support in Candle seems to be the proper solution.
Well, we have been asking for it.
@maximizemaxwell you can look at: https://github.com/EricLBuehler/mistral.rs/blob/4608202c128da44b84157573dbc8ff1a1146f64c/mistralrs-core/src/layers.rs#L1965-L2036 This is written under the assumption that the temporal patch size ==
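For context, a minimal sketch of that trick in Candle-style Rust: when the Conv3D kernel depth equals the temporal extent of the input and the stride equals the kernel size with no padding (non-overlapping patches, as in the Qwen2.5-VL patch embedding), the temporal dimension can be folded into the channels and the pretrained weights reshaped to match, so a plain Conv2D reproduces the Conv3D result. The helper name and shape layout below are illustrative assumptions; only `reshape` and `Tensor::conv2d` are existing Candle calls.

```rust
use candle_core::{Result, Tensor};

// Hypothetical helper: emulate a patch-embedding Conv3D with a Conv2D.
// Assumes kernel depth == temporal extent of the input, square spatial kernel
// of size `patch_size`, stride == kernel size, and no padding.
fn conv3d_as_conv2d(xs: &Tensor, weight: &Tensor, patch_size: usize) -> Result<Tensor> {
    // xs: [B, C, T, H, W], weight: [O, C, kT, kH, kW]
    let (b, c, t, h, w) = {
        let d = xs.dims();
        (d[0], d[1], d[2], d[3], d[4])
    };
    let (out_ch, in_ch, k_t) = {
        let d = weight.dims();
        (d[0], d[1], d[2])
    };
    assert_eq!(t, k_t, "kernel depth must cover the whole temporal extent");
    assert_eq!(c, in_ch, "channel mismatch between input and weight");
    // Fold T into the channel dimension; the (C-major, then T) ordering of this
    // reshape matches the flattened weight layout [O, C * kT, kH, kW].
    let xs_2d = xs.reshape((b, c * t, h, w))?;
    let w_2d = weight.reshape((out_ch, in_ch * k_t, patch_size, patch_size))?;
    // padding = 0, stride = patch_size, dilation = 1, groups = 1.
    xs_2d.conv2d(&w_2d, 0, patch_size, 1, 1)
}
```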
Gosh, I didn’t realize there was already an issue; I just created a new one with the same content, thinking it didn’t exist.
We need someone to review these features, which are already implemented in the community. Please merge this into Candle so that other people can bring broader model support to Candle!
What does this PR do?
Add support for Qwen-2.5-VL
Part of issue #2814