Conversation
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
📝 Walkthrough

Updates DeepSeek V3 workload base configuration variants by replacing simple aliases with explicit …
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@scripts/performance/configs/deepseek/deepseek_workload_base_configs.py`:
- Around line 55-66: DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 currently
inherits a pp_layout from DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1 that assumes
(pipeline_model_parallel_size=2, virtual_pipeline_model_parallel_size=8); update
the replace call for DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 to either set
pp_layout explicitly for the new (pipeline_model_parallel_size=4,
virtual_pipeline_model_parallel_size=4) configuration or remove pp_layout from
the replace override so the framework can auto-compute the correct layout based
on the updated pipeline_model_parallel_size and
virtual_pipeline_model_parallel_size values.
```python
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 = replace(
    DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1,
    micro_batch_size=1,
    pipeline_model_parallel_size=4,
    virtual_pipeline_model_parallel_size=4,
    expert_model_parallel_size=64,
    moe_flex_dispatcher_backend="hybridep",
    moe_a2a_overlap=False,
    cuda_graph_impl="transformer_engine",
    cuda_graph_scope=["attn", "moe_router", "moe_preprocess"],
    recompute_modules=["moe_act"],
)
```
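The inheritance pitfall the review describes can be reproduced with `dataclasses.replace`, the same mechanism the `replace(...)` call above uses: any field not named in the override is carried over from the base config. The `PretrainConfig` class below is a hypothetical stand-in for the real config dataclass, not the actual class from the repository:

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical stand-in for the real workload config dataclass.
@dataclass(frozen=True)
class PretrainConfig:
    pipeline_model_parallel_size: int
    virtual_pipeline_model_parallel_size: int
    pp_layout: Optional[str] = None  # None -> framework auto-computes

# Base variant: layout string hand-tuned for PP=2, VPP=8.
base = PretrainConfig(
    pipeline_model_parallel_size=2,
    virtual_pipeline_model_parallel_size=8,
    pp_layout="Et*4|(t*4|)*14tmL",
)

# Derived variant changes the parallelism but does not touch pp_layout,
# so replace() silently carries the stale (2, 8) layout string over.
bf16 = replace(
    base,
    pipeline_model_parallel_size=4,
    virtual_pipeline_model_parallel_size=4,
)
print(bf16.pp_layout)  # still the layout tuned for (2, 8)
```

This is why the reviewer asks for `pp_layout` to be handled explicitly in the `BF16_V1` override rather than inherited.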
🧩 Analysis chain

Repository: NVIDIA-NeMo/Megatron-Bridge

🏁 Script executed:

```shell
# First, read the target file to understand the configuration structure
cd scripts/performance/configs/deepseek || exit 1
cat -n deepseek_workload_base_configs.py | head -80
```

🏁 Script executed:

```shell
# Search for pp_layout usage across the codebase to understand how it's validated/used
rg -n -C 3 'pp_layout' --type=py | head -100
```

🏁 Script executed:

```shell
# Look for validation logic that checks compatibility between pp_layout and parallelism params
rg -n -C 5 'pipeline_model_parallel_size|virtual_pipeline_model_parallel_size' --type=py | grep -C 3 'pp_layout' | head -50
```

🏁 Script executed:

```shell
# Read more of the file to see other related configurations and understand the pattern
sed -n '55,160p' scripts/performance/configs/deepseek/deepseek_workload_base_configs.py | cat -n
```

🏁 Script executed:

```shell
# Check the set_deepseek_v3_pipeline_model_parallel_layout function to understand what it does
rg -n -A 20 'def set_deepseek_v3_pipeline_model_parallel_layout' --type=py
```

🏁 Script executed:

```shell
# Check if there are any tests or validation that would catch this mismatch
rg -n 'pp_layout.*Et\*4' --type=py
```

🏁 Script executed:

```shell
# Verify that (4, 4) is indeed the correct size for BF16_V1
grep -n 'pipeline_model_parallel_size=4' scripts/performance/configs/deepseek/deepseek_workload_base_configs.py | head -20
```
Override pp_layout in DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 for the new parallelism configuration.
BF16_V1 inherits pp_layout="Et*4|(t*4|)*14tmL" from GB300_V1, which is designed for pipeline_model_parallel_size=2 and virtual_pipeline_model_parallel_size=8. However, BF16_V1 changes both to 4 without overriding pp_layout. The configuration framework will use the inherited layout string instead of auto-computing the correct one for (4, 4), leading to incorrect layer distribution across pipeline stages at runtime.
Either explicitly set pp_layout to match the (4, 4) configuration or remove it entirely to trigger auto-computation based on the updated parallelism parameters.
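A minimal sketch of the second option (clearing the inherited override so the framework recomputes the layout), again using a hypothetical stand-in dataclass and assuming the framework treats `pp_layout=None` as "auto-compute", per the comment above:

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class PretrainConfig:  # hypothetical stand-in for the real config class
    pipeline_model_parallel_size: int
    virtual_pipeline_model_parallel_size: int
    pp_layout: Optional[str] = None  # None -> auto-compute downstream

# Base variant with a layout string hand-tuned for PP=2, VPP=8.
base = PretrainConfig(2, 8, pp_layout="Et*4|(t*4|)*14tmL")

# Explicitly clear pp_layout in the override so the stale (2, 8)
# layout string is not inherited by the (4, 4) variant.
bf16_fixed = replace(
    base,
    pipeline_model_parallel_size=4,
    virtual_pipeline_model_parallel_size=4,
    pp_layout=None,
)
assert bf16_fixed.pp_layout is None  # framework would now auto-compute
```

The other option, passing a layout string hand-tuned for (4, 4), works the same way but requires knowing the correct layout grammar for the new stage count.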
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: sowmen <sowmendipta@gmail.com>
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information