
[for #1766] fix fc1 gate/up split with TP & fix expert layers export when EP > 1#1817

Merged
yaoyu-33 merged 4 commits into NVIDIA-NeMo:bridge/peft_bridge_1 from HollowMan6:peft_bridge_gate_up
Dec 29, 2025

Conversation

Contributor

@HollowMan6 HollowMan6 commented Dec 29, 2025

What does this PR do ?

For the expert layers export fix when EP > 1: after the fix, with TP=4, PP=1, EP=8, ETP=1, CP=1, the training/inference mismatch also stabilizes:

image

For the fc1 gate/up split fix with TP:
The previous method was incorrect. Although training can initially converge, training and inference gradually diverge, and training eventually collapses.

image

After the fix, now it works fine!

image

Changelog

  • Correctly split fused FC1 LoRA linear_out into gate/up with TP-aware ordering

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
Copilot AI review requested due to automatic review settings December 29, 2025 00:46

copy-pr-bot bot commented Dec 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


Copilot AI left a comment


Pull request overview

This PR fixes a critical bug in the splitting of fused FC1 (gate/up) LoRA weights when Tensor Parallelism (TP) is enabled. The previous implementation incorrectly split the fused weights with a simple bisection, which didn't account for TP-aware weight ordering. This caused training to diverge and eventually collapse. The fix properly handles the interleaved shard structure where each TP shard contains both gate and up components.

Key changes:

  • Added _split_fused_fc1_linear_out_weight method that implements TP-aware splitting logic
  • Updated _get_fused_adapter_linear_out_slices and _merge_lora_adapter_weights to use the new splitting method
  • Properly threaded is_expert parameter through the call chain to handle both expert and non-expert TP configurations
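The interleaved shard layout described above can be sketched as follows. This is a minimal illustration of the idea, not the PR's actual implementation: the function name and the list-of-rows representation are hypothetical, and the point is only that each TP shard contributes its gate rows followed by its up rows, so a naive bisection of the gathered tensor mixes gate and up components.

```python
def split_fused_fc1_gate_up(rows, tp_size):
    """Split the output rows of a fused FC1 (gate/up) weight, gathered
    across TP ranks, into separate gate and up parts.

    Each TP shard holds [gate_i; up_i], so the full gathered tensor is
    [g0, u0, g1, u1, ...] per shard — NOT [all gate rows, all up rows].
    A plain halfway split would therefore interleave gate and up rows.
    """
    assert len(rows) % (2 * tp_size) == 0
    per_shard = len(rows) // (2 * tp_size)  # gate (or up) rows per TP shard
    gate, up = [], []
    for rank in range(tp_size):
        shard = rows[rank * 2 * per_shard:(rank + 1) * 2 * per_shard]
        gate.extend(shard[:per_shard])  # first half of each shard: gate
        up.extend(shard[per_shard:])    # second half of each shard: up
    return gate, up
```

With `tp_size=2` and rows `[g, g, u, u, g, g, u, u]`, the function recovers all gate rows and all up rows in order, whereas slicing the list in half would return two gate/up mixtures.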


Signed-off-by: Hollow Man <hollowman@opensuse.org>

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.



@HollowMan6 HollowMan6 changed the title from "[for #1766] fix fc1 gate/up split with TP" to "[for #1766] fix fc1 gate/up split with TP & fix expert layers export when EP > 1" Dec 29, 2025
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: Hollow Man <hollowman@opensuse.org>

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 13 comments.



@yaoyu-33 yaoyu-33 merged commit 48507fd into NVIDIA-NeMo:bridge/peft_bridge_1 Dec 29, 2025
7 checks passed
@HollowMan6 HollowMan6 deleted the peft_bridge_gate_up branch December 30, 2025 00:17
@HollowMan6 HollowMan6 mentioned this pull request Mar 13, 2026
3 participants