[Feature] Optimizations for class Qwen3VLMoeVisionModel (Conv3d to Linear) in Qwen3VL#19788
[Feature] Optimizations for class Qwen3VLMoeVisionModel (Conv3d to Linear) in Qwen3VL#19788wili-65535 wants to merge 1 commit intosgl-project:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on optimizing the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a performance optimization by replacing the Conv3d layer with a Linear layer in Qwen3VLVisionPatchEmbed. The weights are loaded into the Conv3d layer and then copied to the Linear layer, after which the Conv3d layer is deleted to save memory. The changes are logical and well-implemented. My feedback includes a suggestion to make the weight copying method idempotent for increased robustness.
ee2b18a to
528ed81
Compare
|
Could you please provide accuracy result? |
Conv3d with kernel_size == stride does not slide over the input, making it equivalent to a reshape + linear projection. Replace it with nn.Linear after loading HF checkpoint weights for significantly faster patch embedding, especially on older cuDNN versions where Conv3d with large non-sliding kernels is extremely slow. Safety checks ensure the optimization only applies when kernel_size == stride, padding == 0, dilation == 1, and groups == 1. Reference: sgl-project/sglang#19788 Made-with: Cursor
…up (#132) Conv3d with kernel_size == stride does not slide over the input, making it equivalent to a reshape + linear projection. Replace it with nn.Linear after loading HF checkpoint weights for significantly faster patch embedding, especially on older cuDNN versions where Conv3d with large non-sliding kernels is extremely slow. Safety checks ensure the optimization only applies when kernel_size == stride, padding == 0, dilation == 1, and groups == 1. Reference: sgl-project/sglang#19788 Made-with: Cursor Co-authored-by: Hongli Mi <hmi@nvidia.com>
4b3a6f9 to
241bb50
Compare
Sure, we add a unit test |
|
/tag-and-rerun-ci |
|
/rerun-failed-ci |
241bb50 to
c0e9724
Compare
|
/rerun-failed-ci |
|
Hi @yuan-luo , I see your PR20033 for GLM4v has been merged. Cheers!
Oh, I see the CI is rerunning, thank you... It seems the CI failures are not related to our code change?
|
@wili-65535 We can probably keep this unit test as it is related with Qwen3VLMoE. BTW, can we move test to the vlm folder? Let's wait for the CI passed. |
|
This CI failure may need to take special care of. |
|
/rerun-failed-ci |
6bc6b65 to
27612f4
Compare
|
Here are also some error information reproduced stably in the CI pipeline. registered/lora/test_multi_lora_backend.pyKeyError: '/loky-7350-k06btve4' xpu/test_intel_xpu_backend.pyAttributeError: module 'torch.xpu' has no attribute 'graph_pool_handle' In addition, the output of Qwen3VL model in test |
|
/rerun-failed-ci |
27612f4 to
fc1f1c2
Compare
|
OK the error in test |
|
The latest CI results seem to indicate that this change may have introduced an accuracy issue. Could you please take a look when you have a chance? Many thanks! |
|
I opened #20282 which builds on your idea and generalizes it into a unified What do you think about this approach? Would appreciate it if you could help review when you have a chance. Thanks! |
Great idea! I had actually considered a common implementation for models with similar sub-modules, but got blocked at Qwen3VL due to my limited familiarity with sglang layers and other models. Really appreciate you picking this up and making it more general. I'll review the PR and let you know if I have any feedback. |
fc1f1c2 to
5f0da01
Compare
Fixed, root cause is Qwen3Omni needs adjust, too. |
bd9ef2b to
bf19d8e
Compare
|
/rerun-failed-ci |
2 similar comments
|
/rerun-failed-ci |
|
/rerun-failed-ci |
|
Hi @yhyang201 @samuellees
|
|
Hello, this PR (#20282) has already been merged. Could you please help check its performance and quality? Thank you! |
v1.0: align to PR20033 v1.1: add unit test v1.2: fix CI error
bf19d8e to
262a4ac
Compare
Motivation
Modifications
Linearrather thanConv3dto compute theQwen3VLVisionPatchEmbed.test/srt/test_conv3d_to_linear_unittest.pyto ensure the replacement is equivalent.Accuracy Tests
By unit test
test/srt/test_conv3d_to_linear_unittest.py.lm-evalshows no drop between main branch and this PR, which get the same score:lmms_evalshows no drop between main branch and this PR, which get the similar score:Command:
Benchmarking and Profiling
forward()ofclass Qwen3VLMoeVisionModel) to count the time.Checklist
Review Process
/tag-run-ci-label,/rerun-failed-ci,/tag-and-rerun-ci