
[build] chore: Upgrade transformers to 5.0 #2068

Merged
chtruong814 merged 34 commits into main from chore/transformers_5p0
Feb 25, 2026
Conversation

@yaoyu-33
Contributor

@yaoyu-33 yaoyu-33 commented Jan 26, 2026

Upgrade transformers dependency to version 5.0

Summary by CodeRabbit

  • Chores
    • Updated transformer library dependencies to newer versions.
    • Updated transformer-engine to the latest release version.


Signed-off-by: root <root@pool0-00120.cm.cluster>
@yaoyu-33 yaoyu-33 requested a review from a team as a code owner January 26, 2026 18:35
@copy-pr-bot

copy-pr-bot bot commented Jan 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33
Contributor Author

/ok to test 8ccb17a

@coderabbitai
Contributor

coderabbitai bot commented Jan 26, 2026

📝 Walkthrough

Walkthrough

Updated project dependencies in pyproject.toml by bumping the transformers package from version constraint <5.0.0 to >=5.0.0, and updated the transformer-engine source from a specific commit hash to the release_v2.11 tag.

Changes

Cohort / File(s): Dependency Version Updates (pyproject.toml)
Summary: Updated the transformers constraint from <5.0.0 to >=5.0.0; changed the transformer-engine source rev from commit 6a34b6574fa6c29d9d07fdcddf9812cbb1488878 to the release tag release_v2.11.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)

  • Test Results For Major Changes (⚠️ Warning): The PR upgrades transformers from <5.0.0 to >=5.0.0, a major version change, but the description lacks test results, validation, or evidence of testing. Resolution: update the PR description with concrete test results, evidence of correct model outputs/numerics with transformers 5.0.0, and confirmation that existing tests pass.

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title '[build] chore: Upgrade transformers to 5.0' clearly and specifically summarizes the main change, upgrading the transformers dependency to version 5.0, which aligns with the changeset.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.




Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around lines 117-121: update the version constraint for
transformer-engine[pytorch] to allow the 2.11 release to match the source
override: locate the dependency constraint for transformer-engine[pytorch] (the
existing "<2.10.0" constraint) and change it to an appropriate 2.11-compatible
range (for example ">=2.11,<2.12") so it aligns with the [tool.uv.sources]
transformer-engine entry that points to rev "release_v2.11"; alternatively, if
you intended to stay on <2.10.0, change the [tool.uv.sources] transformer-engine
rev to a matching release (e.g., a 2.9/2.10 tag) so the constraint and the
source override are consistent.
🧹 Nitpick comments (1)
pyproject.toml (1)

68-81: Add an upper bound for the transformers major version.

Line 69 allows future majors (e.g., 6.0+) which could introduce breaking changes outside this PR's scope. Transformers v5.0.0 (released January 26, 2026) contains significant breaking changes including PyTorch-only focus, tokenizer refactoring, and new weight-loading APIs. Pin to the v5.x line to allow patch and minor updates while preventing major version surprises.

🔧 Proposed adjustment
-    "transformers>=5.0.0",
+    "transformers>=5.0.0,<6.0.0",

pyproject.toml Outdated
Comment on lines 117 to 121
# uv.sources allows us to override dependencies with VCS commits.
# Lets use this only for debugging purposes, but not for production (main).
[tool.uv.sources]
-transformer-engine = { git = "https://github.com/NVIDIA/TransformerEngine.git", rev = "6a34b6574fa6c29d9d07fdcddf9812cbb1488878" }
+transformer-engine = { git = "https://github.com/NVIDIA/TransformerEngine.git", rev = "release_v2.11" }
megatron-core = { path = "3rdparty/Megatron-LM/" }
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and examine pyproject.toml to verify the version constraints
find . -name "pyproject.toml" -type f | head -5

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 87


🏁 Script executed:

# Read the relevant lines from pyproject.toml (lines 110-125 to get full context)
sed -n '110,125p' pyproject.toml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 959


🌐 Web query:

TransformerEngine release_v2.11 git tag version number package version

💡 Result:

The Git tag release_v2.11 corresponds to Transformer Engine package version 2.11.0 (i.e., “2.11” release line with patch .0). [1], [2]


Resolve transformer-engine version mismatch between source rev and override constraint.

Line 114 constrains transformer-engine[pytorch] to <2.10.0, but line 120 specifies a source override pointing to release_v2.11 (which corresponds to version 2.11.0). This creates a constraint violation that will cause resolution failures or force silent downgrades. Align the override with the intended 2.11 release.

Example alignment for 2.11.x
-    "transformer-engine[pytorch]>=2.9.0a0,<2.10.0",
+    "transformer-engine[pytorch]>=2.11.0,<2.12.0",

Signed-off-by: root <root@pool0-00120.cm.cluster>
@yaoyu-33
Contributor Author

/ok to test 022fcef

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Contributor Author

/ok to test 81ad475

yaoyu-33 and others added 3 commits February 20, 2026 09:08
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Use rope_theta_from_hf compat function in hf_config_to_provider_kwargs
as fallback when CONFIG_MAPPING cannot find rope_theta as a direct
attribute (transformers 5.0+ stores it in rope_parameters dict).
Fix mock configs and test assertions accordingly.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
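The fallback described in this commit can be sketched as follows. This is a minimal illustration of the pattern, not the actual `transformers_compat` implementation; the function name, signature, and the default value of 10000.0 are assumptions for illustration:

```python
from types import SimpleNamespace


def rope_theta_from_hf_config(config, default=10000.0):
    """Read rope_theta from an HF config object: prefer the direct
    attribute (pre-5.0 layout), then fall back to the rope_parameters
    dict that transformers 5.0+ uses."""
    theta = getattr(config, "rope_theta", None)
    if theta is not None:
        return theta
    rope_params = getattr(config, "rope_parameters", None) or {}
    return rope_params.get("rope_theta", default)


# Pre-5.0 style config: rope_theta is a top-level attribute.
old_cfg = SimpleNamespace(rope_theta=500000.0)
# 5.0+ style config: rope_theta lives inside the rope_parameters dict.
new_cfg = SimpleNamespace(rope_parameters={"rope_theta": 500000.0})
```

Both config shapes resolve to the same value, so call sites don't need to branch on the installed transformers version.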
@yaoyu-33
Contributor Author

/ok to test 437a2ad

yaoyu-33 and others added 3 commits February 20, 2026 10:50
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
…icmethod

Remove the staticmethod indirection on MegatronModelBridge for
rope_theta_from_hf, rope_local_base_freq_from_hf, and
rope_scaling_factor_from_hf. All call sites now import and call the
functions directly from transformers_compat.

Also remove unused get_common_configs from deepseek/common.py.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@yaoyu-33
Contributor Author

/ok to test b5f6253

Fix config.json generation in GPT-OSS conversion test to use
model.config.to_dict() instead of raw overrides, and update
various functional tests for transformers 5.0 API changes.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
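The config.json fix above can be sketched like this. `FakeConfig` is a hypothetical stand-in for a transformers model config; the real fix calls `model.config.to_dict()`, which carries fields such as model_type that a raw overrides dict would silently drop:

```python
import json
import os
import tempfile


class FakeConfig:
    """Hypothetical stand-in for a transformers config object."""

    def __init__(self, **kwargs):
        self._data = kwargs

    def to_dict(self):
        # The real transformers config includes model_type and all
        # defaults here, not just the user-supplied overrides.
        return dict(self._data)


def write_config_json(config, directory):
    """Serialize the full config dict to config.json, mirroring the
    test fix of using to_dict() instead of raw overrides."""
    path = os.path.join(directory, "config.json")
    with open(path, "w") as f:
        json.dump(config.to_dict(), f, indent=2)
    return path
```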
@yaoyu-33
Contributor Author

/ok to test 2e6bbe8

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	src/megatron/bridge/models/deepseek/common.py
@yaoyu-33
Contributor Author

/ok to test 5b9cea4

…sformers 5.0+

In transformers 5.0+, Qwen2_5_VLConfig serializes with a nested
structure where text model params (hidden_size, num_attention_heads)
are under text_config rather than at the top level of config.json.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
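A lookup that tolerates both serialization layouts can be sketched as below; the helper name is an assumption for illustration, not code from this PR:

```python
def text_model_param(config_dict, key):
    """Look up a text-model parameter (e.g. hidden_size) in a
    serialized HF config dict: transformers 5.0+ nests it under
    text_config for Qwen2_5_VLConfig, while older versions keep it at
    the top level of config.json."""
    text_cfg = config_dict.get("text_config") or {}
    if key in text_cfg:
        return text_cfg[key]
    return config_dict.get(key)
```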
@yaoyu-33
Contributor Author

/ok to test 05852e1

Transformers 5.0 renames rope_scaling to rope_parameters and uses
rope_type instead of type. Update Qwen3 VL bridge and all related
tests to prefer rope_parameters when available, falling back to
rope_scaling for backward compatibility.

Also fixes: add model_type to LlamaNemotron test config, use glob
pattern for NemotronVL weight files, and reduce deepstack_visual_indexes
to fit within PP-split layer counts.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
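The prefer-new-key-with-fallback logic from this commit can be sketched as a small normalizer; the function name is illustrative and not part of the bridge code:

```python
def normalized_rope_scaling(config_dict):
    """Return RoPE scaling parameters from a serialized HF config,
    preferring the transformers 5.0+ rope_parameters key over the
    older rope_scaling key, and normalizing the pre-5.0 'type' key to
    the 5.0 'rope_type' name."""
    params = config_dict.get("rope_parameters") or config_dict.get("rope_scaling")
    if params is None:
        return None
    params = dict(params)  # avoid mutating the caller's config
    if "rope_type" not in params and "type" in params:
        params["rope_type"] = params.pop("type")
    return params
```

Downstream code can then read `params["rope_type"]` unconditionally, regardless of which transformers version produced the checkpoint.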
@yaoyu-33
Contributor Author

/ok to test ae12a45

…L MoE bridge

Transformers <5.0 stored fused expert weights transposed as
[num_experts, hidden_size, 2*intermediate_size], while transformers 5.0+
uses the standard nn.Linear convention [num_experts, 2*intermediate_size,
hidden_size]. Use _align_weight_to_shape (same pattern as GLM MoE bridge)
to auto-detect the layout and transpose only when necessary.

Signed-off-by: Yuya Morimoto <ymorimoto@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
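The shape-based auto-detection this commit describes can be sketched over plain shape tuples; this is a pure-Python illustration of the decision rule, not the real torch-based `_align_weight_to_shape`:

```python
def fused_expert_needs_transpose(weight_shape, target_shape):
    """Decide whether a fused expert weight must have its last two
    dims swapped: transformers <5.0 stores
    [num_experts, hidden_size, 2*intermediate_size], while 5.0+ uses
    the nn.Linear convention
    [num_experts, 2*intermediate_size, hidden_size].
    Returns True when a transpose is needed, False when the layout
    already matches."""
    weight_shape, target_shape = tuple(weight_shape), tuple(target_shape)
    if weight_shape == target_shape:
        return False
    num_experts, dim_a, dim_b = weight_shape
    if (num_experts, dim_b, dim_a) == target_shape:
        return True
    raise ValueError(f"cannot align {weight_shape} to {target_shape}")
```

Transposing only on a detected mismatch keeps the bridge loading checkpoints produced by either transformers version.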
@yaoyu-33
Contributor Author

/ok to test 432173f

yaoyu-33 and others added 3 commits February 23, 2026 16:47
Remove brittle rope_scaling assertion in llama_nemotron test and use glob
pattern for safetensors filename in nemotron_vl test to handle changes in
serialization shard naming.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	uv.lock
Signed-off-by: root <root@pool0-01847.cm.cluster>
@yaoyu-33
Contributor Author

/ok to test c3c8be7

return {}


def _align_weight_to_shape(weight: torch.Tensor, target_shape: torch.Size, name: str) -> torch.Tensor:
Contributor


Q: The same func is also defined in src/megatron/bridge/models/glm/glm_moe_mappings.py.
Do we expect each model to have the same func in the bridge?

Contributor


The same goes for a few other funcs like _uses_fused_experts.

Contributor Author


It's only added for specific models. I don't think it's going to be used after we fully migrate to 5.0; we might just keep one path.

liding-nv
liding-nv previously approved these changes Feb 25, 2026
ananthsub
ananthsub previously approved these changes Feb 25, 2026
@yaoyu-33
Contributor Author

/ok to test 5388a3c


Development

Successfully merging this pull request may close these issues.

Upgrade to Transformers V5

4 participants