
[build] chore: Upgrade transformers to 5.0 #2068

Merged
chtruong814 merged 34 commits into main from chore/transformers_5p0
Feb 25, 2026
Conversation

@yaoyu-33
Contributor

@yaoyu-33 yaoyu-33 commented Jan 26, 2026

Upgrade transformers dependency to version 5.0

Summary by CodeRabbit

  • Chores
    • Updated transformer library dependencies to newer versions.
    • Updated transformer-engine to the latest release version.


Signed-off-by: root <root@pool0-00120.cm.cluster>
@yaoyu-33 yaoyu-33 requested a review from a team as a code owner January 26, 2026 18:35
@copy-pr-bot

copy-pr-bot bot commented Jan 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33
Contributor Author

/ok to test 8ccb17a

@coderabbitai
Contributor

coderabbitai bot commented Jan 26, 2026

📝 Walkthrough

Walkthrough

Updated project dependencies in pyproject.toml by bumping the transformers package from version constraint <5.0.0 to >=5.0.0, and updated the transformer-engine source from a specific commit hash to the release_v2.11 tag.

Changes

Cohort / File(s): Dependency Version Updates (pyproject.toml)
Summary: Updated the transformers constraint from <5.0.0 to >=5.0.0; changed the transformer-engine source rev from commit 6a34b6574fa6c29d9d07fdcddf9812cbb1488878 to the release tag release_v2.11.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)

  • Test Results For Major Changes (⚠️ Warning): The PR upgrades transformers from <5.0.0 to >=5.0.0, a major version change, but the description lacks test results, validation, or evidence of testing. Resolution: update the PR description with concrete test results, evidence of correct model outputs/numerics with transformers 5.0.0, and confirmation that existing tests pass.

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title '[build] chore: Upgrade transformers to 5.0' clearly and specifically summarizes the main change, upgrading the transformers dependency to version 5.0, which aligns with the changeset.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.




Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around lines 117-121: update the version constraint for
transformer-engine[pytorch] to allow the 2.11 release to match the source
override: locate the dependency constraint for transformer-engine[pytorch] (the
existing "<2.10.0" constraint) and change it to an appropriate 2.11-compatible
range (for example ">=2.11,<2.12") so it aligns with the [tool.uv.sources]
transformer-engine entry that points to rev "release_v2.11"; alternatively, if
you intended to stay on <2.10.0, change the [tool.uv.sources] transformer-engine
rev to a matching release (e.g., a 2.9/2.10 tag) so the constraint and the
source override are consistent.
🧹 Nitpick comments (1)
pyproject.toml (1)

68-81: Add an upper bound for the transformers major version.

Line 69 allows future majors (e.g., 6.0+) which could introduce breaking changes outside this PR's scope. Transformers v5.0.0 (released January 26, 2026) contains significant breaking changes including PyTorch-only focus, tokenizer refactoring, and new weight-loading APIs. Pin to the v5.x line to allow patch and minor updates while preventing major version surprises.

🔧 Proposed adjustment
-    "transformers>=5.0.0",
+    "transformers>=5.0.0,<6.0.0",

pyproject.toml Outdated
Comment on lines 117 to 121
# uv.sources allows us to override dependencies with VCS commits.
# Lets use this only for debugging purposes, but not for production (main).
[tool.uv.sources]
-transformer-engine = { git = "https://github.com/NVIDIA/TransformerEngine.git", rev = "6a34b6574fa6c29d9d07fdcddf9812cbb1488878" }
+transformer-engine = { git = "https://github.com/NVIDIA/TransformerEngine.git", rev = "release_v2.11" }
megatron-core = { path = "3rdparty/Megatron-LM/" }
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and examine pyproject.toml to verify the version constraints
find . -name "pyproject.toml" -type f | head -5

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 87


🏁 Script executed:

# Read the relevant lines from pyproject.toml (lines 110-125 to get full context)
sed -n '110,125p' pyproject.toml

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 959


🌐 Web query:

TransformerEngine release_v2.11 git tag version number package version

💡 Result:

The Git tag release_v2.11 corresponds to Transformer Engine package version 2.11.0 (i.e., “2.11” release line with patch .0). [1], [2]


Resolve transformer-engine version mismatch between source rev and override constraint.

Line 114 constrains transformer-engine[pytorch] to <2.10.0, but line 120 specifies a source override pointing to release_v2.11 (which corresponds to version 2.11.0). This creates a constraint violation that will cause resolution failures or force silent downgrades. Align the override with the intended 2.11 release.

Example alignment for 2.11.x
-    "transformer-engine[pytorch]>=2.9.0a0,<2.10.0",
+    "transformer-engine[pytorch]>=2.11.0,<2.12.0",

Signed-off-by: root <root@pool0-00120.cm.cluster>
@yaoyu-33
Contributor Author

/ok to test 022fcef

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Contributor Author

/ok to test 81ad475

yaoyu-33 and others added 3 commits February 20, 2026 09:08
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Use rope_theta_from_hf compat function in hf_config_to_provider_kwargs
as fallback when CONFIG_MAPPING cannot find rope_theta as a direct
attribute (transformers 5.0+ stores it in rope_parameters dict).
Fix mock configs and test assertions accordingly.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
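The fallback described in this commit can be sketched as follows. This is a minimal illustration of the pattern, not the actual `transformers_compat` implementation; the function name, signature, and the default value of 10000.0 are assumptions for illustration:

```python
from types import SimpleNamespace


def rope_theta_from_hf_config(config, default=10000.0):
    """Read rope_theta from an HF config object: prefer the direct
    attribute (pre-5.0 layout), then fall back to the rope_parameters
    dict that transformers 5.0+ uses."""
    theta = getattr(config, "rope_theta", None)
    if theta is not None:
        return theta
    rope_params = getattr(config, "rope_parameters", None) or {}
    return rope_params.get("rope_theta", default)


# Pre-5.0 style config: rope_theta is a top-level attribute.
old_cfg = SimpleNamespace(rope_theta=500000.0)
# 5.0+ style config: rope_theta lives inside the rope_parameters dict.
new_cfg = SimpleNamespace(rope_parameters={"rope_theta": 500000.0})
```

Both config shapes resolve to the same value, so call sites don't need to branch on the installed transformers version.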
@yaoyu-33
Contributor Author

/ok to test 437a2ad

yaoyu-33 and others added 3 commits February 20, 2026 10:50
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
…icmethod

Remove the staticmethod indirection on MegatronModelBridge for
rope_theta_from_hf, rope_local_base_freq_from_hf, and
rope_scaling_factor_from_hf. All call sites now import and call the
functions directly from transformers_compat.

Also remove unused get_common_configs from deepseek/common.py.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@yaoyu-33
Contributor Author

/ok to test b5f6253

Fix config.json generation in GPT-OSS conversion test to use
model.config.to_dict() instead of raw overrides, and update
various functional tests for transformers 5.0 API changes.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
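The config.json fix above can be sketched like this. `FakeConfig` is a hypothetical stand-in for a transformers model config; the real fix calls `model.config.to_dict()`, which carries fields such as model_type that a raw overrides dict would silently drop:

```python
import json
import os
import tempfile


class FakeConfig:
    """Hypothetical stand-in for a transformers config object."""

    def __init__(self, **kwargs):
        self._data = kwargs

    def to_dict(self):
        # The real transformers config includes model_type and all
        # defaults here, not just the user-supplied overrides.
        return dict(self._data)


def write_config_json(config, directory):
    """Serialize the full config dict to config.json, mirroring the
    test fix of using to_dict() instead of raw overrides."""
    path = os.path.join(directory, "config.json")
    with open(path, "w") as f:
        json.dump(config.to_dict(), f, indent=2)
    return path
```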
@yaoyu-33
Contributor Author

/ok to test 2e6bbe8

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	src/megatron/bridge/models/deepseek/common.py
@yaoyu-33
Contributor Author

/ok to test 5b9cea4

…sformers 5.0+

In transformers 5.0+, Qwen2_5_VLConfig serializes with a nested
structure where text model params (hidden_size, num_attention_heads)
are under text_config rather than at the top level of config.json.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
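A lookup that tolerates both serialization layouts can be sketched as below; the helper name is an assumption for illustration, not code from this PR:

```python
def text_model_param(config_dict, key):
    """Look up a text-model parameter (e.g. hidden_size) in a
    serialized HF config dict: transformers 5.0+ nests it under
    text_config for Qwen2_5_VLConfig, while older versions keep it at
    the top level of config.json."""
    text_cfg = config_dict.get("text_config") or {}
    if key in text_cfg:
        return text_cfg[key]
    return config_dict.get(key)
```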
@yaoyu-33
Contributor Author

/ok to test 05852e1

Transformers 5.0 renames rope_scaling to rope_parameters and uses
rope_type instead of type. Update Qwen3 VL bridge and all related
tests to prefer rope_parameters when available, falling back to
rope_scaling for backward compatibility.

Also fixes: add model_type to LlamaNemotron test config, use glob
pattern for NemotronVL weight files, and reduce deepstack_visual_indexes
to fit within PP-split layer counts.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
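The prefer-new-key-with-fallback logic from this commit can be sketched as a small normalizer; the function name is illustrative and not part of the bridge code:

```python
def normalized_rope_scaling(config_dict):
    """Return RoPE scaling parameters from a serialized HF config,
    preferring the transformers 5.0+ rope_parameters key over the
    older rope_scaling key, and normalizing the pre-5.0 'type' key to
    the 5.0 'rope_type' name."""
    params = config_dict.get("rope_parameters") or config_dict.get("rope_scaling")
    if params is None:
        return None
    params = dict(params)  # avoid mutating the caller's config
    if "rope_type" not in params and "type" in params:
        params["rope_type"] = params.pop("type")
    return params
```

Downstream code can then read `params["rope_type"]` unconditionally, regardless of which transformers version produced the checkpoint.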
@yaoyu-33
Contributor Author

/ok to test ae12a45

…L MoE bridge

Transformers <5.0 stored fused expert weights transposed as
[num_experts, hidden_size, 2*intermediate_size], while transformers 5.0+
uses the standard nn.Linear convention [num_experts, 2*intermediate_size,
hidden_size]. Use _align_weight_to_shape (same pattern as GLM MoE bridge)
to auto-detect the layout and transpose only when necessary.

Signed-off-by: Yuya Morimoto <ymorimoto@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
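The shape-based auto-detection this commit describes can be sketched over plain shape tuples; this is a pure-Python illustration of the decision rule, not the real torch-based `_align_weight_to_shape`:

```python
def fused_expert_needs_transpose(weight_shape, target_shape):
    """Decide whether a fused expert weight must have its last two
    dims swapped: transformers <5.0 stores
    [num_experts, hidden_size, 2*intermediate_size], while 5.0+ uses
    the nn.Linear convention
    [num_experts, 2*intermediate_size, hidden_size].
    Returns True when a transpose is needed, False when the layout
    already matches."""
    weight_shape, target_shape = tuple(weight_shape), tuple(target_shape)
    if weight_shape == target_shape:
        return False
    num_experts, dim_a, dim_b = weight_shape
    if (num_experts, dim_b, dim_a) == target_shape:
        return True
    raise ValueError(f"cannot align {weight_shape} to {target_shape}")
```

Transposing only on a detected mismatch keeps the bridge loading checkpoints produced by either transformers version.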
@yaoyu-33
Contributor Author

/ok to test 432173f

yaoyu-33 and others added 3 commits February 23, 2026 16:47
Remove brittle rope_scaling assertion in llama_nemotron test and use glob
pattern for safetensors filename in nemotron_vl test to handle changes in
serialization shard naming.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	uv.lock
Signed-off-by: root <root@pool0-01847.cm.cluster>
@yaoyu-33
Contributor Author

/ok to test c3c8be7

return {}


def _align_weight_to_shape(weight: torch.Tensor, target_shape: torch.Size, name: str) -> torch.Tensor:
Contributor


Q: The same func is also defined in src/megatron/bridge/models/glm/glm_moe_mappings.py.
Do we expect each model to have the same func in the bridge?

Contributor


The same goes for a few other funcs like _uses_fused_experts.

Contributor Author


It's only added for specific models. I don't think it's going to be used after we fully migrate to 5.0; we might just keep one path.

liding-nv
liding-nv previously approved these changes Feb 25, 2026
ananthsub
ananthsub previously approved these changes Feb 25, 2026
@yaoyu-33
Contributor Author

/ok to test 5388a3c


Development

Successfully merging this pull request may close these issues.

Upgrade to Transformers V5

4 participants