
[Bugfix] Fix GLM-4.6V vision regression in glm4v_moe and glm_ocr#20463

Merged
Fridge003 merged 2 commits into sgl-project:main from JustinTong0323:fix/glm4v-moe-ocr-vision-regression
Mar 14, 2026


@JustinTong0323
Collaborator

Motivation

PR #20033 replaced Conv3d with Linear in Glm4vVisionPatchEmbed and added copy_conv3d_weight_to_linear() at the end of glm4v.py's load_weights(). However, glm4v_moe.py and glm_ocr.py have their own load_weights() overrides, and the call was not added to them. The Linear layer in those models therefore kept its random initialization, so the vision encoder produced garbage embeddings and the model output text completely unrelated to the image content.
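
Why copying the weights is enough: when a Conv3d's stride equals its kernel size and it sees exactly one patch, each output channel is a single dot product between the flattened patch and the flattened kernel. A Linear whose weight is the Conv3d kernel reshaped to (out_channels, in_channels * kT * kH * kW) computes the same value. The pure-Python sketch below checks this on a tiny example. It assumes non-overlapping patches and that the patch and kernel are flattened in the same row-major order; the helper names are hypothetical and this is not sglang's actual copy_conv3d_weight_to_linear().

```python
import itertools
import random


def flatten(t):
    """Flatten a nested list in row-major (C) order."""
    if isinstance(t, list):
        return [x for sub in t for x in flatten(sub)]
    return [t]


def rand_tensor(*shape):
    """Nested list of the given shape filled with uniform random values."""
    if len(shape) == 1:
        return [random.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]


def conv3d_single_patch(weight, bias, patch):
    """Conv3d (kernel_size == stride, no padding) applied to exactly one patch.

    weight: [out][in][kT][kH][kW], patch: [in][kT][kH][kW].
    The output is 1x1x1 per channel, i.e. one weighted sum per output channel.
    """
    c_in, k_t, k_h, k_w = len(patch), len(patch[0]), len(patch[0][0]), len(patch[0][0][0])
    out = []
    for w_o, b_o in zip(weight, bias):
        acc = b_o
        for c, t, h, w in itertools.product(range(c_in), range(k_t), range(k_h), range(k_w)):
            acc += w_o[c][t][h][w] * patch[c][t][h][w]
        out.append(acc)
    return out


def conv3d_weight_to_linear(weight):
    """Hypothetical helper: reshape [out][in][kT][kH][kW] -> [out][in*kT*kH*kW]."""
    return [flatten(w_o) for w_o in weight]


def linear(weight_2d, bias, x):
    """y = W @ x + b with W of shape [out][in_features]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weight_2d, bias)]


random.seed(0)
C_IN, K_T, K_H, K_W, C_OUT = 3, 2, 2, 2, 4  # tiny illustrative sizes, not GLM's
conv_w = rand_tensor(C_OUT, C_IN, K_T, K_H, K_W)
bias = rand_tensor(C_OUT)
patch = rand_tensor(C_IN, K_T, K_H, K_W)

y_conv = conv3d_single_patch(conv_w, bias, patch)
y_lin = linear(conv3d_weight_to_linear(conv_w), bias, flatten(patch))
assert all(abs(a - b) < 1e-9 for a, b in zip(y_conv, y_lin))
```

If the copy is skipped, the Linear keeps its random initialization and its outputs have no relation to the Conv3d's, which is consistent with the garbage embeddings described in the motivation.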

Fix

Add the missing self.visual.patch_embed.copy_conv3d_weight_to_linear() call at the end of load_weights() in both glm4v_moe.py and glm_ocr.py.
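
A minimal sketch of how the regression slipped in, using hypothetical stand-in classes (Glm4vLike, Glm4vMoeBuggy, Glm4vMoeFixed and PatchEmbedLike are not sglang's real classes): the post-load conversion sits at the end of the base class's load_weights(), so a subclass that overrides load_weights() without reaching that code silently skips it. The fix in this PR amounts to repeating the call at the end of each override.

```python
class PatchEmbedLike:
    """Hypothetical stand-in for Glm4vVisionPatchEmbed."""

    def __init__(self):
        self.conv_weight = None             # filled from the checkpoint
        self.linear_weight = "random-init"  # what the Linear holds before the copy

    def copy_conv3d_weight_to_linear(self):
        # In the real model this reshapes the Conv3d kernel into the Linear weight.
        self.linear_weight = self.conv_weight


class VisualLike:
    def __init__(self):
        self.patch_embed = PatchEmbedLike()


class Glm4vLike:
    def __init__(self):
        self.visual = VisualLike()

    def _load_checkpoint(self, weights):
        # Assumption: the checkpoint still ships the Conv3d kernel.
        self.visual.patch_embed.conv_weight = weights["patch_embed.proj.weight"]

    def load_weights(self, weights):
        self._load_checkpoint(weights)
        self.visual.patch_embed.copy_conv3d_weight_to_linear()


class Glm4vMoeBuggy(Glm4vLike):
    def load_weights(self, weights):
        # Own override (e.g. for expert-weight mapping) that never reaches the copy.
        self._load_checkpoint(weights)


class Glm4vMoeFixed(Glm4vLike):
    def load_weights(self, weights):
        self._load_checkpoint(weights)
        # The fix: repeat the post-load conversion in the override.
        self.visual.patch_embed.copy_conv3d_weight_to_linear()


ckpt = {"patch_embed.proj.weight": "conv-kernel"}
results = {}
for cls in (Glm4vLike, Glm4vMoeBuggy, Glm4vMoeFixed):
    model = cls()
    model.load_weights(ckpt)
    results[cls.__name__] = model.visual.patch_embed.linear_weight

print(results)
# {'Glm4vLike': 'conv-kernel', 'Glm4vMoeBuggy': 'random-init', 'Glm4vMoeFixed': 'conv-kernel'}
```

Repeating the call in each override is the minimal fix. Having the base class run post-load hooks after any override would make this kind of omission harder to reintroduce, but that is a design alternative, not what this PR does.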

Test Plan

  • Verified with GLM-4.6V-FP8 (glm4v_moe) on B200 TP=4: vision responses now correctly describe image content
  • Root cause confirmed via git bisect over 625 commits (v0.5.9 → 7a1ca53)
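
For scale: git bisect halves the candidate range on every test, so isolating one regressing commit among 625 takes at most ceil(log2 625) = 10 builds. The sketch below is a hypothetical pure-Python model of that search, where is_bad stands in for "build this commit and check whether the vision answer ignores the image"; the commit names and culprit index are made up.

```python
import math


def first_bad(commits, is_bad):
    """Return (first commit for which is_bad() is True, number of tests run).

    Assumes commits are ordered oldest -> newest, every commit before the
    culprit is good, every commit from the culprit on is bad, and the last
    commit is known bad (as after `git bisect bad <rev>`).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid        # culprit is mid or earlier
        else:
            lo = mid + 1    # culprit is after mid
    return commits[lo], tests


commits = [f"c{i:03d}" for i in range(625)]  # hypothetical, oldest -> newest
CULPRIT = 417                                # hypothetical index of the regression
culprit, tests = first_bad(commits, lambda c: int(c[1:]) >= CULPRIT)
print(culprit, tests, math.ceil(math.log2(len(commits))))
```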

Fixes #20462

PR sgl-project#20033 replaced Conv3d with Linear in Glm4vVisionPatchEmbed and
added copy_conv3d_weight_to_linear() to glm4v.py's load_weights, but
missed adding it to glm4v_moe.py and glm_ocr.py. This left the linear
layer with random weights, causing the vision encoder to produce
garbage embeddings — the model outputs text unrelated to the image.

Fixes sgl-project#20462

@JustinTong0323
Collaborator Author

/tag-and-rerun-ci

@JustinTong0323
Collaborator Author

/rerun-failed-ci

@JustinTong0323
Collaborator Author

/rerun-failed-ci

@JustinTong0323
Collaborator Author

/rerun-failed-ci

@yuan-luo
Collaborator

/rerun-failed-ci

@zRzRzRzRzRzRzR
Contributor

This needs to be changed to:

if not is_nextn:
    self.visual.patch_embed.copy_conv3d_weight_to_linear()

Otherwise, MTP loading will fail because visual cannot be found, and an error is raised.

This change applies to both of these models, as well as to GLM-4V.

MTP loading calls load_weights with is_nextn=True, where self.visual
does not exist. Wrap the call with `if not is_nextn` to avoid
AttributeError.
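
A sketch of why the guard is needed, again with hypothetical stand-in classes: when the MTP (NextN) module is built, load_weights() is called with is_nextn=True on a model that has no vision tower, so an unconditional self.visual access raises AttributeError. The is_nextn parameter and the guarded call come from this PR; the class layout and method signatures are assumptions for illustration.

```python
class PatchEmbedLike:
    """Hypothetical stand-in for the vision patch embedding."""

    def __init__(self):
        self.copied = False

    def copy_conv3d_weight_to_linear(self):
        self.copied = True


class VisualLike:
    def __init__(self):
        self.patch_embed = PatchEmbedLike()


class GlmVLike:
    """Hypothetical model: only the main (non-NextN) model builds a vision tower."""

    def __init__(self, is_nextn=False):
        if not is_nextn:
            self.visual = VisualLike()

    def load_weights(self, weights, is_nextn=False):
        ...  # load language-model / MTP weights here (omitted)
        if not is_nextn:
            # Guard added in 13aed3e: the MTP module has no `visual`.
            self.visual.patch_embed.copy_conv3d_weight_to_linear()

    def load_weights_unguarded(self, weights, is_nextn=False):
        # The first version of the fix: always touches self.visual.
        self.visual.patch_embed.copy_conv3d_weight_to_linear()


main = GlmVLike()
main.load_weights({})
assert main.visual.patch_embed.copied

draft = GlmVLike(is_nextn=True)
draft.load_weights({}, is_nextn=True)  # guarded: no error

try:
    draft.load_weights_unguarded({}, is_nextn=True)
except AttributeError as exc:
    print("unguarded call fails:", exc)
```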
@JustinTong0323
Collaborator Author

Thanks @zRzRzRzRzRzRzR, good catch! Fixed in 13aed3e: added an if not is_nextn guard in both glm4v_moe.py and glm_ocr.py.

Fridge003 merged commit c330b68 into sgl-project:main on Mar 14, 2026
113 of 133 checks passed
@yuan-luo
Collaborator

This needs to be changed to:

if not is_nextn:
    self.visual.patch_embed.copy_conv3d_weight_to_linear()

Otherwise, MTP loading will fail because visual cannot be found, and an error is raised.

This change applies to both of these models, as well as to GLM-4V.

We may need to add test cases to cover this case.
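
One possible shape for such a test, sketched with hypothetical stubs rather than real sglang models: iterate over every model class that overrides load_weights() and check (1) that after a normal load the Linear weight equals the flattened Conv3d kernel, and (2) that a load with is_nextn=True does not raise. In a real suite, MODEL_CLASSES would list the actual model classes loaded from a small checkpoint, which this sketch does not attempt.

```python
import unittest


def flatten(t):
    """Flatten a nested list in row-major (C) order."""
    return [x for sub in t for x in flatten(sub)] if isinstance(t, list) else [t]


class PatchEmbedStub:
    """Hypothetical stand-in holding a Conv3d kernel and the Linear weight."""

    def __init__(self):
        self.conv_weight = None
        self.linear_weight = None  # None models "never copied" (random init)

    def copy_conv3d_weight_to_linear(self):
        # Reshape [out][in][kT][kH][kW] -> [out][in*kT*kH*kW].
        self.linear_weight = [flatten(w_o) for w_o in self.conv_weight]


class VisualStub:
    def __init__(self):
        self.patch_embed = PatchEmbedStub()


class ModelStub:
    """Hypothetical model; each real model class would take its place."""

    def __init__(self, is_nextn=False):
        if not is_nextn:
            self.visual = VisualStub()

    def load_weights(self, weights, is_nextn=False):
        if not is_nextn:
            self.visual.patch_embed.conv_weight = weights["conv"]
            self.visual.patch_embed.copy_conv3d_weight_to_linear()


# Hypothetical registry: every class that overrides load_weights() belongs here.
MODEL_CLASSES = [ModelStub]

# Kernel of shape [out=2][in=1][kT=1][kH=2][kW=2].
CONV = [[[[[1.0, 2.0], [3.0, 4.0]]]], [[[[5.0, 6.0], [7.0, 8.0]]]]]


class PatchEmbedLoadTest(unittest.TestCase):
    def test_linear_weight_matches_conv_after_load(self):
        for cls in MODEL_CLASSES:
            with self.subTest(model=cls.__name__):
                model = cls()
                model.load_weights({"conv": CONV})
                self.assertEqual(
                    model.visual.patch_embed.linear_weight,
                    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
                )

    def test_nextn_load_does_not_touch_visual(self):
        for cls in MODEL_CLASSES:
            with self.subTest(model=cls.__name__):
                draft = cls(is_nextn=True)
                draft.load_weights({"conv": CONV}, is_nextn=True)  # must not raise
```

Saved to a file, this runs with python -m unittest. Keeping the class list in one place means a new model with its own load_weights() override is covered as soon as it is registered.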



Development

Successfully merging this pull request may close these issues.

[Bug] GLM-4.6V vision regression: model ignores image content after PR #20033

4 participants