Conversation
Signed-off-by: Chen Cui <chcui@nvidia.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
Signed-off-by: Chen Cui <chcui@nvidia.com>
📝 WalkthroughWalkthroughA new documentation guide about Multi-Token Prediction (MTP) training was added to the documentation structure. The docs/index.md was updated to reference the new guide in two toctree sections, and the comprehensive guide file documenting MTP configuration, usage, monitoring, and troubleshooting was created. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@docs/training/multi-token-prediction.md`:
- Line 80: Typo in the example config: change the string assigned to
config.dataset.path_to_cache from "/path/to/cacahe" to the correct
"/path/to/cache" so the example uses the correct "cache" spelling.
- Around line 82-83: Remove the trailing commas after the assignment statements
for config.mtp_num_layers and config.mtp_loss_scaling_factor (they are currently
written as "config.mtp_num_layers = 1," and "config.mtp_loss_scaling_factor =
0.1,") which produce Python syntax errors; update those lines to simple
assignments without the trailing commas so the config variables
(config.mtp_num_layers and config.mtp_loss_scaling_factor) are valid Python
statements.
- Around line 235-239: The doc contains absolute local filesystem links for the
"Megatron Core MTP API Guide" and "Pipeline Parallelism Guide"; update those
link targets in the multi-token-prediction section to use repository-relative
paths (remove the /Users/... prefix) so the links resolve for other devs and CI,
e.g., replace the absolute target for "Megatron Core MTP API Guide" with the
equivalent repo-relative path and do the same for "Pipeline Parallelism Guide"
in the same paragraph.
- Around line 221-229: The doc uses absolute local paths; update the three links
to repository-root relative paths instead (e.g., change
`/Users/chcui/PycharmProjects/Megatron-Bridge/src/megatron/...` to
`src/megatron/...` for DeepSeek-V3 (`deepseek_v3.py`) and Qwen3-Next
(`qwen3_next.py`), and change the third absolute path to
`3rdparty/Megatron-LM/megatron/core/transformer/multi_token_prediction.py`);
ensure the markdown link targets are corrected and verify links resolve in the
repo browser.
- Line 262: The GitHub Issues URL string
"https://github.com/NVIDIA/Megatron-Bridge/issues" is incorrect; update every
occurrence (including the instance shown) to
"https://github.com/NVIDIA-NeMo/Megatron-Bridge/issues" so links point to the
NVIDIA-NeMo/Megatron-Bridge repo—search for the exact URL literal in the docs
(e.g., in multi-token-prediction.md) and replace it with the corrected URL.
🧹 Nitpick comments (2)
docs/training/multi-token-prediction.md (2)
42-47: Add language specifier to fenced code block.The fenced code block should specify a language for proper syntax highlighting. For pseudo-code formulas like this, use
textorpythonas the language identifier.📝 Proposed fix
-``` +```text total_loss = main_loss + (avg_mtp_loss * mtp_loss_scaling_factor) where:
121-123: Add language specifier to fenced code block.The fenced code block containing log output should specify a language for proper rendering. Use
textorlogas the language identifier.📝 Proposed fix
-``` +```text iteration 100/ 300000 | consumed samples: 3200 | elapsed time per iteration (ms): 3738.6 | learning rate: 6.000000E-05 | global batch size: 32 | lm loss: 7.968678E+00 | load_balancing_loss: 1.329517E+00 | mtp_1 loss: 7.925096E+00 | loss scale: 1.0 | grad norm: 1.040 | number of skipped iterations: 0 | number of nan iterations: 0 |
…ge into chcui/mtp_docs
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: sowmen <sowmendipta@gmail.com>
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Changelog
GitHub Actions CI
See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit