
nemotron-3-super model support #2912

Merged
yaoyu-33 merged 16 commits into main from super-v3-pr on Mar 26, 2026

@liding-nv (Contributor) commented Mar 20, 2026

Add Nemotron-3 Super model support to Megatron Bridge

  • Add bridge, provider, and recipe for the Nemotron-3 Super model
  • Include the packed Parquet data format PR
  • Reorganize examples/models/nemotron_3/ into nano/ and super/ subdirectories; Super examples will be added later
  • Add initial GB300 performance benchmark scripts

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Nemotron 3 Super model support with pretraining, finetuning, and quantization-aware training capabilities
    • Introduced Parquet-backed packed sequence datasets for improved data handling
    • Added multi-token prediction (MTP) support for Mamba models
    • New mixed-precision recipe with NVFP4 quantization for the Super model
  • Improvements

    • Enhanced HuggingFace model loading with improved device placement strategy
    • Expanded Mixture of Experts (MoE) configuration options
    • New performance baseline configurations for GB300 hardware
  • Dependencies

    • Added optional pyarrow dependency for Parquet dataset support
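Because pyarrow is optional, the base install should not import it at module load time. A common pattern is to check for the package without importing it and to fail with an actionable message only when a Parquet-backed dataset is actually requested. This is a generic sketch, not code from the PR; `HAS_PYARROW` and `require_pyarrow` are hypothetical names.

```python
import importlib.util

# True when the optional pyarrow package is installed; checked without importing it.
HAS_PYARROW = importlib.util.find_spec("pyarrow") is not None


def require_pyarrow():
    """Hypothetical helper: import pyarrow.parquet on demand or raise an actionable error."""
    if not HAS_PYARROW:
        raise ImportError(
            "Parquet-backed packed sequence datasets need the optional 'pyarrow' "
            "dependency; install it with `pip install pyarrow`."
        )
    # Deferred import so code paths that never touch Parquet never load pyarrow.
    import pyarrow.parquet as pq

    return pq
```

Keeping the check at call time means users who never build a Parquet dataset are unaffected by the missing package.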

@copy-pr-bot (bot) commented Mar 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@liding-nv
Contributor Author

/ok to test cbd87cc

@liding-nv
Contributor Author

/ok to test 34da894

@liding-nv
Contributor Author

/claude review

Comment on lines +718 to +719
print(f"{input_ids.shape=}")
print(f"{input_ids=}")
Contributor

Leftover debug prints. Consider removing before merge.
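If the shape diagnostics are still wanted after the leftover print calls are removed, one common alternative is to route them through Python's standard logging module at DEBUG level, so they stay silent unless explicitly enabled. This is a generic sketch, not code from the PR; `debug_batch` is a hypothetical helper.

```python
import logging

# Module-level logger; its effective level comes from the application's logging setup.
logger = logging.getLogger(__name__)


def debug_batch(input_ids) -> None:
    """Hypothetical helper: log batch diagnostics at DEBUG level instead of printing them."""
    # %-style arguments are formatted only if a DEBUG record is actually emitted.
    logger.debug("input_ids.shape=%s", getattr(input_ids, "shape", None))
```

Calling `logging.basicConfig(level=logging.DEBUG)` at startup would surface these messages; at the default WARNING level they are dropped.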

@liding-nv
Contributor Author

/ok to test 2130901

@liding-nv
Contributor Author

/ok to test c908ebe

@liding-nv
Contributor Author

/claude review

Signed-off-by: Li Ding <liding@nvidia.com>
@liding-nv
Contributor Author

/ok to test 13d58bb

@liding-nv
Contributor Author

/ok to test ce3d13e

@liding-nv
Contributor Author

/ok to test f313561

cuichenx previously approved these changes Mar 24, 2026
Contributor

@cuichenx cuichenx left a comment

LGTM!
Please follow up on:

  • Can we merge the Nemotron training scripts with run_recipe.py?
  • Example launch scripts and W&B plots
  • Adding the new MTP flags to the MTP documentation

@liding-nv
Contributor Author

/ok to test af7187a

@liding-nv
Contributor Author

/ok to test 62776b9

yaoyu-33 previously approved these changes Mar 25, 2026
@liding-nv
Contributor Author

/ok to test 20955dd

@liding-nv
Contributor Author

/ok to test a7a52a5

Contributor

@cuichenx cuichenx left a comment

LGTM thanks


Labels

  • area:model: Model implementations and HF bridge logic
  • feature: New capabilities, enhancements, or enablement work
  • ready-to-merge: PR is approved, current, and only waiting for CI to pass before merge

4 participants