[docs] add MTP guide #2138

Merged
yaoyu-33 merged 10 commits into main from chcui/mtp_docs (Feb 3, 2026)

cuichenx commented Jan 30, 2026

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line-by-line info of high-level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guide for Multi-Token Prediction (MTP) training covering configuration parameters, usage examples, parameter tuning guidelines, loss monitoring techniques, pipeline parallelism considerations, and troubleshooting solutions.
    • Updated documentation structure to include the new MTP training guide in both the Training and Customization section and main Guides list.

Signed-off-by: Chen Cui <chcui@nvidia.com>

copy-pr-bot bot commented Jan 30, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Chen Cui <chcui@nvidia.com>
cuichenx added the docs-only label (With great power comes great responsibility.) Feb 2, 2026
cuichenx marked this pull request as ready for review February 2, 2026 17:49

coderabbitai bot commented Feb 2, 2026

Walkthrough

A new documentation guide about Multi-Token Prediction (MTP) training was added to the documentation structure. The docs/index.md was updated to reference the new guide in two toctree sections, and the comprehensive guide file documenting MTP configuration, usage, monitoring, and troubleshooting was created.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Index (docs/index.md) | Added reference to new Multi-Token Prediction training guide in toctree sections. |
| New MTP Training Guide (docs/training/multi-token-prediction.md) | New comprehensive guide covering MTP overview, configuration parameters (mtp_num_layers, mtp_loss_scaling_factor), usage examples, monitoring, pipeline parallelism considerations, and troubleshooting. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | ✅ Passed | PR contains only documentation changes (docs-only), which are minor and do not introduce code features, modify behavior, or affect numerics/convergence/performance. |
| Title check | ✅ Passed | The title '[docs] add MTP guide' is clear, specific, and directly describes the main change: adding documentation for Multi-Token Prediction training. |

coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@docs/training/multi-token-prediction.md`:
- Line 80: Typo in the example config: change the string assigned to
config.dataset.path_to_cache from "/path/to/cacahe" to the correct
"/path/to/cache" so the example uses the correct "cache" spelling.
- Around line 82-83: Remove the trailing commas after the assignment statements
for config.mtp_num_layers and config.mtp_loss_scaling_factor (they are currently
written as "config.mtp_num_layers = 1," and "config.mtp_loss_scaling_factor =
0.1,"). These are valid Python but assign one-element tuples, (1,) and (0.1,),
instead of the intended scalars; update those lines to simple assignments
without the trailing commas so both config variables hold plain numbers.
- Around line 235-239: The doc contains absolute local filesystem links for the
"Megatron Core MTP API Guide" and "Pipeline Parallelism Guide"; update those
link targets in the multi-token-prediction section to use repository-relative
paths (remove the /Users/... prefix) so the links resolve for other devs and CI,
e.g., replace the absolute target for "Megatron Core MTP API Guide" with the
equivalent repo-relative path and do the same for "Pipeline Parallelism Guide"
in the same paragraph.
- Around line 221-229: The doc uses absolute local paths; update the three links
to repository-root relative paths instead (e.g., change
`/Users/chcui/PycharmProjects/Megatron-Bridge/src/megatron/...` to
`src/megatron/...` for DeepSeek-V3 (`deepseek_v3.py`) and Qwen3-Next
(`qwen3_next.py`), and change the third absolute path to
`3rdparty/Megatron-LM/megatron/core/transformer/multi_token_prediction.py`);
ensure the markdown link targets are corrected and verify links resolve in the
repo browser.
- Line 262: The GitHub Issues URL string
"https://github.com/NVIDIA/Megatron-Bridge/issues" is incorrect; update every
occurrence (including the instance shown) to
"https://github.com/NVIDIA-NeMo/Megatron-Bridge/issues" so links point to the
NVIDIA-NeMo/Megatron-Bridge repo—search for the exact URL literal in the docs
(e.g., in multi-token-prediction.md) and replace it with the corrected URL.
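
On the trailing-comma item above: Python accepts `x = 1,` and binds a one-element tuple, so the example would run but store `(1,)` rather than `1`. The sketch below illustrates this with `types.SimpleNamespace` as a hypothetical stand-in for the guide's config object, since the real config class is not shown in this thread.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the guide's config object (assumption, not the real class).
config = SimpleNamespace()

# With the stray trailing comma, Python builds a one-element tuple.
config.mtp_num_layers = 1,
config.mtp_loss_scaling_factor = 0.1,
buggy = (config.mtp_num_layers, config.mtp_loss_scaling_factor)

# Without the comma, the attributes hold plain scalars.
config.mtp_num_layers = 1
config.mtp_loss_scaling_factor = 0.1
fixed = (config.mtp_num_layers, config.mtp_loss_scaling_factor)

print(buggy)  # ((1,), (0.1,))
print(fixed)  # (1, 0.1)
```

Because the tuple form parses without complaint, a problem of this kind would most likely surface later, wherever the values are used as numbers, which is why it is worth fixing in the docs example.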
🧹 Nitpick comments (2)
docs/training/multi-token-prediction.md (2)

42-47: Add language specifier to fenced code block.

The fenced code block should specify a language for proper syntax highlighting. For pseudo-code formulas like this, use text or python as the language identifier.

📝 Proposed fix
-```
+```text
 total_loss = main_loss + (avg_mtp_loss * mtp_loss_scaling_factor)
 
 where:

121-123: Add language specifier to fenced code block.

The fenced code block containing log output should specify a language for proper rendering. Use text or log as the language identifier.

📝 Proposed fix
-```
+```text
 iteration      100/  300000 | consumed samples:         3200 | elapsed time per iteration (ms): 3738.6 | learning rate: 6.000000E-05 | global batch size:    32 | lm loss: 7.968678E+00 | load_balancing_loss: 1.329517E+00 | mtp_1 loss: 7.925096E+00 | loss scale: 1.0 | grad norm: 1.040 | number of skipped iterations:   0 | number of nan iterations:   0 |
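
As a rough check of how the loss formula quoted in the first nitpick relates to the sample log line, the sketch below extracts `lm loss` and `mtp_1 loss` from that line and combines them. It assumes that `lm loss` is the main loss, that the logged `mtp_1 loss` is unscaled, and that `mtp_loss_scaling_factor` is 0.1 as in the guide's example config; none of this is confirmed in this thread, and the helper name `parse_fields` is illustrative only.

```python
# Sample log line from the review comment (trailing counters omitted for brevity).
LOG_LINE = (
    "iteration      100/  300000 | consumed samples:         3200 | "
    "elapsed time per iteration (ms): 3738.6 | learning rate: 6.000000E-05 | "
    "global batch size:    32 | lm loss: 7.968678E+00 | "
    "load_balancing_loss: 1.329517E+00 | mtp_1 loss: 7.925096E+00 | "
    "loss scale: 1.0 | grad norm: 1.040 |"
)

def parse_fields(line: str) -> dict[str, str]:
    """Split a '|'-separated log line into {name: value} pairs (hypothetical helper)."""
    fields = {}
    for chunk in line.split("|"):
        key, sep, value = chunk.rpartition(":")
        if sep:  # skip chunks without a 'name: value' shape
            fields[key.strip()] = value.strip()
    return fields

fields = parse_fields(LOG_LINE)

# Assumed: 'lm loss' is main_loss and every 'mtp_<k> loss' field is an unscaled MTP loss.
main_loss = float(fields["lm loss"])
mtp_losses = [
    float(value)
    for name, value in fields.items()
    if name.startswith("mtp_") and name.endswith("loss")
]
avg_mtp_loss = sum(mtp_losses) / len(mtp_losses)

mtp_loss_scaling_factor = 0.1  # assumed, matching the example config value
total_loss = main_loss + avg_mtp_loss * mtp_loss_scaling_factor

print(f"{total_loss:.4f}")  # 8.7612
```

The load-balancing loss is left out on purpose, since the quoted formula only covers the main and MTP terms.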

cuichenx and others added 2 commits February 2, 2026 11:09
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>
cuichenx changed the title from "add MTP guide" to "[docs] add MTP guide" Feb 2, 2026
cuichenx added the r0.3.0 label (Cherry-pick label for r0.3.0 release branch) Feb 2, 2026
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
yaoyu-33 merged commit 1e03374 into main Feb 3, 2026
21 checks passed
yaoyu-33 deleted the chcui/mtp_docs branch February 3, 2026 22:20
ko3n1g pushed a commit that referenced this pull request Feb 3, 2026
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
coderabbitai bot mentioned this pull request Feb 4, 2026
5 tasks
sowmen pushed a commit to sowmen/Megatron-Bridge that referenced this pull request Feb 11, 2026
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: sowmen <sowmendipta@gmail.com>