
[model] fix: correct GLM-4.5V inference parallelism for 46-layer model#2322

Merged
yaoyu-33 merged 2 commits into main from yuya/fix-glm45v-inference-parallelism
Feb 11, 2026
Conversation

@yaoyu-33 (Contributor) commented Feb 11, 2026

What

  • Update GLM-4.5V example inference parallelism from PP=4/EP=2 to PP=2/EP=4 across all commands.
  • Keep 8-GPU total while making the pipeline split valid for 46 layers.
  • Add temporary workaround in VLM and text generation conversion scripts to avoid inference failure.

Why

GLM-4.5V has 46 layers, which is not divisible by the previous pipeline-parallel size of 4 and causes an assertion failure during inference.
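The constraint above can be sketched as a small check. This is a minimal illustration, not the framework's actual code: `valid_splits` is a hypothetical helper that models the assertion Megatron-style pipelines apply when layers must divide evenly across pipeline stages, using the layer count and GPU total stated in this PR.

```python
NUM_LAYERS = 46   # GLM-4.5V transformer layers (from this PR)
WORLD_SIZE = 8    # total GPUs in the example commands

def valid_splits(num_layers: int, world_size: int) -> list[tuple[int, int]]:
    """Return (pp, ep) pairs where pp * ep == world_size and the
    layer count divides evenly across the pipeline stages."""
    return [
        (pp, world_size // pp)
        for pp in range(1, world_size + 1)
        if world_size % pp == 0 and num_layers % pp == 0
    ]

print(valid_splits(NUM_LAYERS, WORLD_SIZE))  # → [(1, 8), (2, 4)]
```

PP=4 is absent from the result because 46 % 4 == 2, which is exactly the assertion failure this PR avoids; PP=2 with EP=4 keeps the 8-GPU total while splitting 46 layers into two stages of 23.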

Validation

  • Pre-commit hooks (all skipped, no files to check):
    fix end of files.....................................(no files to check)Skipped
    trim trailing whitespace.............................(no files to check)Skipped
    ruff.................................................(no files to check)Skipped
    ruff-format..........................................(no files to check)Skipped
    Disallow '_' in Markdown filenames...................(no files to check)Skipped
  • Branch push successful

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed model parallel layer configuration issue in Megatron model inference pipelines to prevent errors during CUDA operations when using HuggingFace-converted models.
  • Chores

    • Updated parallelism distribution settings in example inference scripts for improved resource allocation: pipeline parallelism reduced from 4 to 2, expert parallelism increased from 2 to 4.

…ound

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@copy-pr-bot (bot) commented Feb 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33 (Contributor, Author) commented

/ok to test ccb2456

@yaoyu-33 yaoyu-33 changed the title from "fix: correct GLM-4.5V inference parallelism for 46-layer model" to "[model] fix: correct GLM-4.5V inference parallelism for 46-layer model" Feb 11, 2026
@yaoyu-33 yaoyu-33 added the r0.3.0 Cherry-pick label for r0.3.0 release branch label Feb 11, 2026
@yaoyu-33 yaoyu-33 enabled auto-merge (squash) February 11, 2026 02:38

Labels

r0.3.0 Cherry-pick label for r0.3.0 release branch

2 participants