[docs] add MTP guide by cuichenx · Pull Request #2138 · NVIDIA-NeMo/Megatron-Bridge

cuichenx · 2026-01-30T01:34:28Z

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

Related to # (issue)

Summary by CodeRabbit

Documentation
- Added comprehensive guide for Multi-Token Prediction (MTP) training covering configuration parameters, usage examples, parameter tuning guidelines, loss monitoring techniques, pipeline parallelism considerations, and troubleshooting solutions.
- Updated documentation structure to include the new MTP training guide in both the Training and Customization section and main Guides list.

Signed-off-by: Chen Cui <chcui@nvidia.com>

copy-pr-bot · 2026-01-30T01:34:31Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-02-02T17:53:42Z

📝 Walkthrough

A new documentation guide about Multi-Token Prediction (MTP) training was added to the documentation structure. The docs/index.md was updated to reference the new guide in two toctree sections, and the comprehensive guide file documenting MTP configuration, usage, monitoring, and troubleshooting was created.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Index `docs/index.md` | Added reference to new Multi-Token Prediction training guide in toctree sections. |
| New MTP Training Guide `docs/training/multi-token-prediction.md` | New comprehensive guide covering MTP overview, configuration parameters (mtp_num_layers, mtp_loss_scaling_factor), usage examples, monitoring, pipeline parallelism considerations, and troubleshooting. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | ✅ Passed | PR contains only documentation changes (docs-only), which are minor and do not introduce code features, modify behavior, or affect numerics/convergence/performance. |
| Title check | ✅ Passed | The title '[docs] add MTP guide' is clear, specific, and directly describes the main change: adding documentation for Multi-Token Prediction training. |

coderabbitai

Actionable comments posted: 5

🤖 Fix all issues with AI agents

In `@docs/training/multi-token-prediction.md`:
- Line 80: Typo in the example config: change the string assigned to
config.dataset.path_to_cache from "/path/to/cacahe" to the correct
"/path/to/cache" so the example uses the correct "cache" spelling.
- Around line 82-83: Remove the trailing commas after the assignment statements
for config.mtp_num_layers and config.mtp_loss_scaling_factor (they are currently
written as "config.mtp_num_layers = 1," and "config.mtp_loss_scaling_factor =
0.1,"). These lines are valid Python, but the trailing comma makes each value a
one-element tuple, (1,) and (0.1,), instead of a scalar; update those lines to
simple assignments without the trailing commas so that config.mtp_num_layers
holds an int and config.mtp_loss_scaling_factor holds a float.
- Around line 235-239: The doc contains absolute local filesystem links for the
"Megatron Core MTP API Guide" and "Pipeline Parallelism Guide"; update those
link targets in the multi-token-prediction section to use repository-relative
paths (remove the /Users/... prefix) so the links resolve for other devs and CI,
e.g., replace the absolute target for "Megatron Core MTP API Guide" with the
equivalent repo-relative path and do the same for "Pipeline Parallelism Guide"
in the same paragraph.
- Around line 221-229: The doc uses absolute local paths; update the three links
to repository-root relative paths instead (e.g., change
`/Users/chcui/PycharmProjects/Megatron-Bridge/src/megatron/...` to
`src/megatron/...` for DeepSeek-V3 (`deepseek_v3.py`) and Qwen3-Next
(`qwen3_next.py`), and change the third absolute path to
`3rdparty/Megatron-LM/megatron/core/transformer/multi_token_prediction.py`);
ensure the markdown link targets are corrected and verify links resolve in the
repo browser.
- Line 262: The GitHub Issues URL string
"https://github.com/NVIDIA/Megatron-Bridge/issues" is incorrect; update every
occurrence (including the instance shown) to
"https://github.com/NVIDIA-NeMo/Megatron-Bridge/issues" so links point to the
NVIDIA-NeMo/Megatron-Bridge repo—search for the exact URL literal in the docs
(e.g., in multi-token-prediction.md) and replace it with the corrected URL.
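
The trailing-comma item above is easy to check in isolation. The following is a minimal sketch, assuming a plain `SimpleNamespace` as a hypothetical stand-in for the recipe config object (the real object comes from Megatron-Bridge and is not constructed here). In plain Python, a trailing comma after an assignment does not raise an error; it builds a one-element tuple.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the recipe config; not the real Megatron-Bridge object.
config = SimpleNamespace()

# As written in the flagged lines: the trailing comma turns each value into a tuple.
config.mtp_num_layers = 1,
config.mtp_loss_scaling_factor = 0.1,
with_commas = (config.mtp_num_layers, config.mtp_loss_scaling_factor)

# After the fix: plain scalar assignments.
config.mtp_num_layers = 1
config.mtp_loss_scaling_factor = 0.1
without_commas = (config.mtp_num_layers, config.mtp_loss_scaling_factor)

print(with_commas)
print(without_commas)
```

Because the tuple form is accepted silently, the mistake would most likely surface later, when the value is used where a number is expected.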

🧹 Nitpick comments (2)

docs/training/multi-token-prediction.md (2)
42-47: Add language specifier to fenced code block.

The fenced code block should specify a language for proper syntax highlighting. For pseudo-code formulas like this, use text or python as the language identifier.
📝 Proposed fix
-```
+```text
 total_loss = main_loss + (avg_mtp_loss * mtp_loss_scaling_factor)
 
 where:
121-123: Add language specifier to fenced code block.

The fenced code block containing log output should specify a language for proper rendering. Use text or log as the language identifier.
📝 Proposed fix
-```
+```text
 iteration      100/  300000 | consumed samples:         3200 | elapsed time per iteration (ms): 3738.6 | learning rate: 6.000000E-05 | global batch size:    32 | lm loss: 7.968678E+00 | load_balancing_loss: 1.329517E+00 | mtp_1 loss: 7.925096E+00 | loss scale: 1.0 | grad norm: 1.040 | number of skipped iterations:   0 | number of nan iterations:   0 |
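
The formula and the log line quoted in the two nitpicks above can be tied together with a small sketch. It pulls `lm loss` and every `mtp_N loss` field out of a log line and combines them as `main_loss + avg_mtp_loss * mtp_loss_scaling_factor`. The helper names `parse_losses` and `total_loss` are hypothetical, the log line is shortened to a few of the quoted fields, and the scaling factor of 0.1 is an assumption taken from the example config mentioned in the review. Whether `load_balancing_loss` also enters the total is not stated here, so it is left out.

```python
import re

# A few fields copied from the quoted log line (other fields omitted).
LOG_LINE = (
    "iteration      100/  300000 | lm loss: 7.968678E+00 | "
    "load_balancing_loss: 1.329517E+00 | mtp_1 loss: 7.925096E+00 | loss scale: 1.0 |"
)

NUMBER = r"([0-9.E+-]+)"


def parse_losses(line):
    """Return (lm_loss, [mtp_1 loss, mtp_2 loss, ...]) parsed from one log line."""
    lm_loss = float(re.search(r"\blm loss: " + NUMBER, line).group(1))
    mtp_losses = [float(v) for v in re.findall(r"\bmtp_\d+ loss: " + NUMBER, line)]
    return lm_loss, mtp_losses


def total_loss(main_loss, mtp_losses, mtp_loss_scaling_factor):
    """main_loss + (average of the MTP losses * scaling factor)."""
    avg_mtp_loss = sum(mtp_losses) / len(mtp_losses)
    return main_loss + avg_mtp_loss * mtp_loss_scaling_factor


lm_loss, mtp_losses = parse_losses(LOG_LINE)
combined = total_loss(lm_loss, mtp_losses, mtp_loss_scaling_factor=0.1)  # 0.1 is assumed
print(f"lm={lm_loss} mtp={mtp_losses} combined={combined:.4f}")
```

With a single MTP layer the average is just the `mtp_1 loss` value, so under these assumptions the combined loss for this iteration comes out at about 8.76.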

docs/training/multi-token-prediction.md

Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: sowmen <sowmendipta@gmail.com>

add MTP guide

9744622

Signed-off-by: Chen Cui <chcui@nvidia.com>

update with training plots

acb3d68

Signed-off-by: Chen Cui <chcui@nvidia.com>

cuichenx added the docs-only label ("With great power comes great responsibility.") Feb 2, 2026

cuichenx marked this pull request as ready for review February 2, 2026 17:49

copy-pr-bot bot temporarily deployed to nemo-ci February 2, 2026 17:50 Inactive

coderabbitai bot reviewed Feb 2, 2026

cuichenx and others added 2 commits February 2, 2026 11:09

fix typo

db376e1

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>

fix typo

e3d70f4

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>

copy-pr-bot bot temporarily deployed to nemo-ci February 2, 2026 19:11 Inactive

cuichenx added 2 commits February 2, 2026 11:21

fix paths

b2d57d4

Signed-off-by: Chen Cui <chcui@nvidia.com>

Merge branch 'chcui/mtp_docs' of github.com:NVIDIA-NeMo/Megatron-Bridge into chcui/mtp_docs

3992fb1

copy-pr-bot bot temporarily deployed to nemo-ci February 2, 2026 19:22 Inactive

cuichenx changed the title ~~add MTP guide~~ [docs] add MTP guide Feb 2, 2026

cuichenx added the r0.3.0 label ("Cherry-pick label for r0.3.0 release branch") Feb 2, 2026

Merge branch 'main' into chcui/mtp_docs

feadff4

copy-pr-bot bot temporarily deployed to nemo-ci February 2, 2026 23:40 Inactive

yaoyu-33 reviewed Feb 3, 2026

docs/training/multi-token-prediction.md (outdated, resolved)

yaoyu-33 reviewed Feb 3, 2026

docs/training/multi-token-prediction.md (outdated, resolved)

address comments

2ba1e11

Signed-off-by: Chen Cui <chcui@nvidia.com>

copy-pr-bot bot temporarily deployed to nemo-ci February 3, 2026 06:32 Inactive

fix link

569c42c

Signed-off-by: Chen Cui <chcui@nvidia.com>

copy-pr-bot bot temporarily deployed to nemo-ci February 3, 2026 06:43 Inactive

fix link

da60c12

Signed-off-by: Chen Cui <chcui@nvidia.com>

copy-pr-bot bot temporarily deployed to nemo-ci February 3, 2026 06:57 Inactive

yaoyu-33 approved these changes Feb 3, 2026

yaoyu-33 merged commit 1e03374 into main Feb 3, 2026
21 checks passed

yaoyu-33 deleted the chcui/mtp_docs branch February 3, 2026 22:20

coderabbitai bot mentioned this pull request Feb 4, 2026

Fix typo in MTP doc #2222

Merged

5 tasks

coderabbitai bot mentioned this pull request Feb 12, 2026

[fix] Example scripts miscellaneous enhancement #2362

Merged

5 tasks

yaoyu-33 merged 10 commits into main from chcui/mtp_docs
