
cp: [model] fix: correct GLM-4.5V inference parallelism for 46-layer model (2322) into r0.3.0 #2336

Merged
ko3n1g merged 1 commit into r0.3.0 from cherry-pick-2322-r0.3.0 on Feb 12, 2026

Conversation

@ko3n1g (Contributor) commented on Feb 11, 2026

beep boop [🤖]: Hi @yaoyu-33 👋,

we've cherry-picked #2322 into r0.3.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • Bug Fixes
    • Applied runtime hardening to model initialization in both text and visual language model conversion examples.
  • Configuration Updates
    • Optimized distributed inference parameters for visual language model processing on multi-GPU setups.


Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
ko3n1g (Contributor, Author) commented on Feb 11, 2026

/ok to test 0cd4f86

copy-pr-bot (bot) commented on Feb 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai (bot) commented on Feb 11, 2026

📝 Walkthrough

Three example and configuration files are modified to disable the mtp_num_layers parameter in sub-model configurations and adjust pipeline/expert parallelism settings for GLM-4.5V inference. The changes add post-initialization steps clearing mtp_num_layers and update tensor parallelism distribution parameters.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Conversion script hardening: `examples/conversion/hf_to_megatron_generate_text.py`, `examples/conversion/hf_to_megatron_generate_vlm.py` | Adds loops that clear `mtp_num_layers` by setting `m.config.mtp_num_layers = None` for each sub-model immediately after model loading and before device allocation. |
| Inference configuration: `examples/models/vlm/glm_45v/inference.sh` | Updates pipeline parallelism (PP) from 4 to 2 and expert parallelism (EP) from 2 to 4 in the Hugging Face inference command for 8-GPU execution. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes
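A plausible rationale for the PP change, inferred from the PR title rather than stated in the diff: with 46 transformer layers, the layer count must divide evenly across pipeline stages (assuming no uneven pipeline split is configured), which rules out PP=4 but allows PP=2 with 23 layers per stage; raising EP from 2 to 4 keeps PP × EP = 8 so all 8 GPUs stay occupied. A quick check of which PP values partition 46 layers evenly:

```python
# Find (PP, EP) pairs for a 46-layer model on 8 GPUs where the layer
# count divides evenly across pipeline stages and PP * EP == 8.
NUM_LAYERS = 46
NUM_GPUS = 8

def valid_pp_sizes(num_layers, num_gpus):
    """Return (pp, ep) pairs with an even layer split per stage."""
    return [(pp, num_gpus // pp)
            for pp in (1, 2, 4, 8)
            if num_layers % pp == 0]

print(valid_pp_sizes(NUM_LAYERS, NUM_GPUS))
```

PP=4 leaves a remainder (46 % 4 == 2), as does PP=8, so PP=2 / EP=4 is the largest valid pipeline split on this GPU count under the even-division assumption.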

Possibly related PRs

Suggested labels

r0.3.0

Suggested reviewers

  • cuichenx
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Results For Major Changes | ⚠️ Warning | PR contains non-trivial changes affecting model inference parallelism and parameter configuration, but lacks documented test results, performance benchmarks, or regression testing evidence. | Include test results, performance benchmarks comparing old vs. new configuration, regression testing evidence, and justification for chosen parallelism values (PP=2, EP=4). |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly describes the main change: fixing GLM-4.5V inference parallelism configuration for a 46-layer model, with a reference to the original PR (#2322) and target branch (r0.3.0). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
`examples/conversion/hf_to_megatron_generate_text.py` (1)

Lines 170-172: Consider adding a tracking reference for this temp fix.

The TEMP FIX comment is helpful, but it would be good to include a link to an issue or TODO ticket so this workaround doesn't become permanent. This makes it easier to track when the proper fix lands upstream.

```diff
-    # TEMP FIX for inference failure when mtp_num_layers is not None
+    # TEMP FIX for inference failure when mtp_num_layers is not None
+    # TODO: Remove once MTP inference is properly supported (see PR `#2322`)
```



2 participants