Optimizations for Qwen3VL models #18559
Summary of Changes

Hello @wili-65535, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on optimizing the Qwen3VL models by integrating VisionFly, a library designed to accelerate vision-language models. The changes include modifications to attention mechanisms, linear layers, and CPU offloading strategies. Additionally, the PR incorporates several debugging enhancements and performance tweaks to improve the overall efficiency and stability of the models.
Warning: Gemini encountered an error creating the review. You can try again by commenting
e0b57dd to 5af998e
It seems the core technology lies in vfly.utils.parallel, in functions such as dit_sp_gather and dit_sp_split. I once attempted to support ViT SP, but making progress was painful. I'm curious about the vfly lib. Could you please share more details?
Thank you for your attention!
5af998e to 0d56d56
a36c311 to 919d4be
return out

def fast_pos_embed_interpolate_v3(
This function has already been optimized in a similar way on main.
): # wili, for jpeg base64 on NVIDIA GPU
    image_bytes = pybase64.b64decode(image_file, validate=True)
    image = torch.frombuffer(image_bytes, dtype=torch.uint8)
    image = decode_jpeg(image, device="cuda")
We may need to make sure this does not break other devices.
Thank you! We have filed a separate PR for this optimization (#19749).
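For readers following this thread, one way to keep the GPU JPEG path from breaking other devices is to pick the decode device at runtime. The sketch below is only an illustration and is not taken from this PR or from #19749. The helper names pick_decode_device and decode_base64_jpeg are hypothetical, and the standard-library base64 module stands in for pybase64 so the snippet has fewer dependencies.

```python
import base64


def pick_decode_device(cuda_available: bool) -> str:
    """Return the device string to hand to the JPEG decoder (hypothetical helper)."""
    return "cuda" if cuda_available else "cpu"


def decode_base64_jpeg(image_file: str):
    """Decode a base64-encoded JPEG into a uint8 image tensor (hypothetical helper)."""
    # Imported lazily so pick_decode_device stays usable without torch installed.
    import torch
    from torchvision.io import decode_jpeg

    # validate=True rejects input containing non-base64 characters.
    image_bytes = base64.b64decode(image_file, validate=True)
    # A bytearray gives torch.frombuffer a writable buffer.
    data = torch.frombuffer(bytearray(image_bytes), dtype=torch.uint8)
    # Fall back to CPU decoding when no CUDA device is present.
    device = pick_decode_device(torch.cuda.is_available())
    return decode_jpeg(data, device=device)
```

This only distinguishes CUDA from everything else; other accelerators would simply take the CPU decode path and move the tensor afterwards.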
6563ade to aab79e1
aab79e1 to 469b4f4
v0.2: remove vfly related code temporarily
v0.5: remove nvtx
v0.6: fix back weight names in qwen3_vl.py
469b4f4 to 59b1d22
Discuss in issue #18784