GLM-4-0414 and GLM-4.1V Code Refactor#12117
Conversation
Summary of Changes: This pull request introduces a significant refactoring of the GLM-4-0414 and GLM-4.1V model implementations. The primary goal is to improve the modularity and flexibility of the model components, particularly by enabling robust support for pipeline parallelism. This involves redesigning how model layers are initialized and how data flows through them, using new interfaces such as PPMissingLayer for distributed operation. The changes also streamline multimodal input processing for the GLM-4.1V model and improve overall weight management, leading to a cleaner and more scalable architecture.
Code Review
This pull request introduces a significant refactoring for the GLM-4 and GLM-4.1V models, primarily to add support for Pipeline Parallelism (PP) and improve modularity. The changes decouple model components from a monolithic configuration object, which is a positive step towards cleaner code. However, I've identified some critical issues in the new PP implementation that need to be addressed. Specifically, the weight tying logic for the language model head is incorrect in glm4.py and completely missing in glm4v.py, which will cause runtime errors in a multi-stage pipeline. Additionally, I've noted a change in error handling during weight loading that could mask important configuration problems. Please see my detailed comments for suggestions on how to fix these issues.
```python
else:
    emb_token_weight = self.pp_group.recv(
        size=(config.vocab_size, config.hidden_size),
        dtype=next(self.model.parameters()).dtype,
        src=self.pp_group.first_rank,
    )
    self.lm_head.weight.copy_(emb_token_weight)
```
The weight tying logic for pipeline parallelism appears to be incorrect. The else block at line 451 will be executed by all non-first ranks, including intermediate pipeline stages. However, only the last rank initializes self.lm_head as a ParallelLMHead with a weight attribute. Intermediate ranks use PPMissingLayer, which does not have a weight attribute. This will cause an AttributeError on intermediate ranks when self.lm_head.weight.copy_ is called. This block should likely be elif self.pp_group.is_last_rank: to ensure only the last rank attempts to receive and copy the weights.
Suggested change:

```diff
-        else:
+        elif self.pp_group.is_last_rank:
             emb_token_weight = self.pp_group.recv(
                 size=(config.vocab_size, config.hidden_size),
                 dtype=next(self.model.parameters()).dtype,
                 src=self.pp_group.first_rank,
             )
             self.lm_head.weight.copy_(emb_token_weight)
```
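To make the rank-gating concrete, here is a minimal sketch with a toy communicator (not vLLM's actual `pp_group` API) of the behavior the suggestion implements: the first rank sends the embedding weight, only the last rank (which owns `lm_head`) receives it, and intermediate ranks take no part.

```python
# Toy sketch of rank-gated embedding/lm_head weight tying under pipeline
# parallelism. ToyGroup and tie_embeddings are illustrative stand-ins, not
# vLLM classes; the "channel" dict fakes point-to-point communication.

class ToyGroup:
    def __init__(self, rank, world_size, channel):
        self.rank, self.world_size, self.channel = rank, world_size, channel
        self.is_first_rank = rank == 0
        self.is_last_rank = rank == world_size - 1
        self.first_rank = 0

    def send(self, tensor, dst):
        self.channel[dst] = list(tensor)

    def recv(self, src):
        return self.channel[self.rank]


def tie_embeddings(group, embed_weight=None):
    if group.is_first_rank:
        # first rank owns the input embedding and broadcasts it to the tail
        group.send(embed_weight, dst=group.world_size - 1)
        return embed_weight
    elif group.is_last_rank:
        # only the last rank owns lm_head, so only it receives and copies
        return group.recv(src=group.first_rank)
    # intermediate ranks hold PPMissingLayer placeholders and do nothing;
    # a bare `else:` here would try to copy into a layer with no weight
    return None


channel = {}
w = [1.0, 2.0]
groups = [ToyGroup(r, 3, channel) for r in range(3)]
out = [tie_embeddings(groups[0], w), tie_embeddings(groups[1]), tie_embeddings(groups[2])]
print(out)  # rank 0 keeps w, rank 1 gets None, rank 2 receives w
```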
```diff
         else:
-            raise KeyError(f"Parameter '{name}' not found in model.")
+            logger.warning(f"Parameter {name} not found in params_dict")
```
Changing the error handling for missing parameters from raising a KeyError to logging a warning could mask significant issues during model loading. If a weight from the checkpoint is not found in the model's parameters, it often indicates a mismatch between the model architecture and the checkpoint, which can lead to a partially uninitialized model. This can cause subtle and hard-to-debug errors. It is generally safer to fail fast in such situations. Please consider reverting this to raise an exception, or at least make this lenient behavior configurable and disabled by default.
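A minimal sketch of the fail-fast behavior the review recommends; the `strict` flag and `load_weights` signature here are hypothetical illustrations, not an existing vLLM option.

```python
# Sketch: fail fast on checkpoint/model mismatches by default, with lenient
# behavior available but opt-in. `strict` is a hypothetical flag.
import logging

logger = logging.getLogger(__name__)


def load_weights(params_dict, checkpoint_weights, strict=True):
    """Copy checkpoint tensors into model params, failing fast on mismatch."""
    for name, weight in checkpoint_weights.items():
        if name not in params_dict:
            if strict:
                # Surfacing the mismatch immediately avoids shipping a
                # partially uninitialized model that fails much later.
                raise KeyError(f"Parameter '{name}' not found in model.")
            logger.warning("Parameter %s not found in params_dict", name)
            continue
        params_dict[name] = weight
    return params_dict
```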
In #12117, the `is_neox_style` setting was not taken into account, leading to an incorrect implementation of GLM-4V.
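For context on why this matters: `is_neox_style` selects how rotary position embeddings pair up dimensions. NeoX-style rotates dimension `i` against `i + d/2`, while GPT-J-style rotates adjacent pairs `(2i, 2i+1)`; picking the wrong style silently scrambles positional information, which matches the symptom described. Below is a pure-Python illustration of the difference, not vLLM's actual kernel.

```python
# Illustrative rotary embedding showing the NeoX vs GPT-J pairing choice.
import math


def apply_rope(x, pos, neox_style=True):
    d = len(x)
    out = list(x)
    for i in range(d // 2):
        theta = pos * (10000.0 ** (-2.0 * i / d))
        c, s = math.cos(theta), math.sin(theta)
        # NeoX pairs (i, i + d/2); GPT-J pairs adjacent dims (2i, 2i + 1)
        a, b = (i, i + d // 2) if neox_style else (2 * i, 2 * i + 1)
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out


x = [1.0, 2.0, 3.0, 4.0]
print(apply_rope(x, pos=1, neox_style=True))   # rotates (x0,x2) and (x1,x3)
print(apply_rope(x, pos=1, neox_style=False))  # rotates (x0,x1) and (x2,x3)
```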
Benchmark results look reasonable to me for GLM-4.6 and GLM-4.5V.
Used new interfaces, including the newly added PPMissingLayer, and removed some unused legacy code.
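For readers unfamiliar with the pattern: PPMissingLayer is a placeholder standing in for layers owned by other pipeline stages, so each rank only allocates the weights of its own slice. A minimal sketch of the idea (toy classes, not vLLM's actual implementation):

```python
# Toy sketch of the PPMissingLayer pattern: layers owned by other pipeline
# stages are replaced with inert placeholders holding no parameters.

class PPMissingLayer:
    """Placeholder for a layer on another pipeline stage.

    Deliberately has no `weight` attribute, which is why code that
    unconditionally touches `.weight` fails on intermediate ranks.
    """
    def __call__(self, x):
        return x  # identity; never expected to run on this stage


class DummyLayer:
    def __init__(self, idx):
        self.idx = idx
        self.weight = [0.0]  # stands in for a real parameter tensor


def build_stage_layers(num_layers, rank, world_size):
    """Assign a contiguous slice of layers to this rank; the rest are missing."""
    per_stage = num_layers // world_size
    start, end = rank * per_stage, (rank + 1) * per_stage
    return [
        DummyLayer(i) if start <= i < end else PPMissingLayer()
        for i in range(num_layers)
    ]


# e.g. 8 layers across 4 pipeline stages, viewed from rank 1
layers = build_stage_layers(8, rank=1, world_size=4)
owned = [i for i, l in enumerate(layers) if isinstance(l, DummyLayer)]
print(owned)  # rank 1 owns layers 2 and 3
```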