[Bugfix] Fix GLM-4.6V vision regression in glm4v_moe and glm_ocr #20463
Merged
Fridge003 merged 2 commits into sgl-project:main on Mar 14, 2026
PR sgl-project#20033 replaced Conv3d with Linear in Glm4vVisionPatchEmbed and added copy_conv3d_weight_to_linear() to glm4v.py's load_weights, but missed adding it to glm4v_moe.py and glm_ocr.py. This left the linear layer with random weights, causing the vision encoder to produce garbage embeddings — the model outputs text unrelated to the image. Fixes sgl-project#20462
yuan-luo approved these changes on Mar 13, 2026
Contributor
This call needs to be modified; otherwise, loading MTP will fail to find `visual` and raise an error. The change applies to these two models as well as GLM-4V.
MTP loading calls `load_weights` with `is_nextn=True`, where `self.visual` does not exist. Wrap the call in `if not is_nextn` to avoid an AttributeError.
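The guard described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `ToyPatchEmbed`, `ToyVisual`, `ToyModel`, and the `has_visual` flag are hypothetical stand-ins, and only the names `load_weights`, `is_nextn`, `self.visual`, `patch_embed`, and `copy_conv3d_weight_to_linear` come from the discussion.

```python
class ToyPatchEmbed:
    """Hypothetical stand-in for the vision patch embedding module."""

    def __init__(self):
        self.linear_synced = False

    def copy_conv3d_weight_to_linear(self):
        # Assumption: the real helper moves the checkpoint's Conv3d weights
        # into the Linear layer. Here we only record that it was called.
        self.linear_synced = True


class ToyVisual:
    """Hypothetical stand-in for the vision encoder."""

    def __init__(self):
        self.patch_embed = ToyPatchEmbed()


class ToyModel:
    """Hypothetical model whose MTP (nextn) variant has no vision tower."""

    def __init__(self, has_visual=True):
        if has_visual:
            self.visual = ToyVisual()

    def load_weights(self, weights, is_nextn=False):
        # Placeholder for the real per-parameter loading loop.
        loaded = [name for name, _ in weights]

        # Without this guard, the MTP path would raise AttributeError,
        # because self.visual does not exist there.
        if not is_nextn:
            self.visual.patch_embed.copy_conv3d_weight_to_linear()
        return loaded


# Regular model: the Linear layer is synced after loading.
model = ToyModel(has_visual=True)
model.load_weights([("visual.patch_embed.proj.weight", None)])

# MTP draft model: loading succeeds even though there is no vision tower.
draft = ToyModel(has_visual=False)
draft_loaded = draft.load_weights([("model.layers.0.weight", None)], is_nextn=True)
```

Placing the guard directly around the call keeps the rest of `load_weights` shared between the regular and MTP paths.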
Collaborator
Author
|
Thanks @zRzRzRzRzRzRzR, good catch! Fixed in 13aed3e — added |
Collaborator
We may need to add test cases to cover this case. |
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request on Mar 15, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request on Mar 21, 2026
0-693 pushed a commit to 0-693/sglang that referenced this pull request on Mar 25, 2026
Motivation
PR #20033 replaced `Conv3d` with `Linear` in `Glm4vVisionPatchEmbed` and added `copy_conv3d_weight_to_linear()` at the end of `glm4v.py`'s `load_weights()`. However, `glm4v_moe.py` and `glm_ocr.py` have their own `load_weights()` overrides, and the call was not added there. This left the `Linear` layer with random weights, causing the vision encoder to produce garbage embeddings: the model outputs text completely unrelated to the image content.
Fix
Add the missing `self.visual.patch_embed.copy_conv3d_weight_to_linear()` call at the end of `load_weights()` in both `glm4v_moe.py` and `glm_ocr.py`.
Test Plan
`git bisect` over 625 commits (v0.5.9 → 7a1ca53)
Fixes #20462
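To illustrate why the missing weight copy breaks the vision encoder, here is a small pure-Python sketch. It assumes, without confirmation from this PR, that the helper flattens each Conv3d output filter into one row of the Linear weight, and that the patch embedding uses a kernel equal to its stride so each patch is processed independently. Bias is ignored, and all function names and shapes below are hypothetical.

```python
import random


def flatten(nested):
    """Flatten arbitrarily nested lists in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def rand_tensor(shape, rng):
    """Build a nested list of the given shape filled with random floats."""
    if len(shape) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [rand_tensor(shape[1:], rng) for _ in range(shape[0])]


def conv3d_one_patch(kernel, patch):
    """Conv3d whose kernel covers the whole patch (kernel == stride).

    kernel: [out_ch][in_ch][t][h][w], patch: [in_ch][t][h][w].
    """
    outputs = []
    for filt in kernel:
        acc = 0.0
        for c, k_c in enumerate(filt):
            for t, k_t in enumerate(k_c):
                for h, k_h in enumerate(k_t):
                    for w, k_w in enumerate(k_h):
                        acc += k_w * patch[c][t][h][w]
        outputs.append(acc)
    return outputs


def linear(weight, x):
    """Bias-free linear layer: one dot product per output row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


rng = random.Random(0)
OUT, C, T, H, W = 4, 3, 2, 2, 2  # toy sizes, not the model's real ones
kernel = rand_tensor([OUT, C, T, H, W], rng)  # plays the checkpoint's Conv3d weight
patch = rand_tensor([C, T, H, W], rng)

# Assumed copy step: each output filter becomes one flattened Linear row.
copied_weight = [flatten(filt) for filt in kernel]
# What the Linear layer holds if the copy step is skipped.
random_weight = rand_tensor([OUT, C * T * H * W], rng)

conv_out = conv3d_one_patch(kernel, patch)
copied_out = linear(copied_weight, flatten(patch))
random_out = linear(random_weight, flatten(patch))

max_err_copied = max(abs(a - b) for a, b in zip(conv_out, copied_out))
max_err_random = max(abs(a - b) for a, b in zip(conv_out, random_out))
print("copied weights, max error:", max_err_copied)
print("random weights, max error:", max_err_random)
```

With the copied weights the Linear layer reproduces the convolution up to floating-point rounding, while the randomly initialized layer produces unrelated values, which matches the garbage embeddings described in the Motivation.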