Merged
@@ -52,7 +52,18 @@
     cuda_graph_scope=[],
     recompute_modules=["mla_up_proj"],
 )
-DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1
+DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 = replace(
+    DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1,
+    micro_batch_size=1,
+    pipeline_model_parallel_size=4,
+    virtual_pipeline_model_parallel_size=4,
+    expert_model_parallel_size=64,
+    moe_flex_dispatcher_backend="hybridep",
+    moe_a2a_overlap=False,
+    cuda_graph_impl="transformer_engine",
+    cuda_graph_scope=["attn", "moe_router", "moe_preprocess"],
+    recompute_modules=["moe_act"],
+)
Comment on lines +55 to +66 (Contributor)

⚠️ Potential issue | 🟠 Major
🧩 Analysis chain

🏁 Script executed:

# First, read the target file to understand the configuration structure
cd scripts/performance/configs/deepseek || exit 1
cat -n deepseek_workload_base_configs.py | head -80

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 3435


🏁 Script executed:

# Search for pp_layout usage across the codebase to understand how it's validated/used
rg -n -C 3 'pp_layout' --type=py | head -100


Length of output: 4348


🏁 Script executed:

# Look for validation logic that checks compatibility between pp_layout and parallelism params
rg -n -C 5 'pipeline_model_parallel_size|virtual_pipeline_model_parallel_size' --type=py | grep -C 3 'pp_layout' | head -50


Length of output: 3548


🏁 Script executed:

# Read more of the file to see other related configurations and understand the pattern
sed -n '55,160p' scripts/performance/configs/deepseek/deepseek_workload_base_configs.py | cat -n


Length of output: 5120


🏁 Script executed:

# Check the set_deepseek_v3_pipeline_model_parallel_layout function to understand what it does
rg -n -A 20 'def set_deepseek_v3_pipeline_model_parallel_layout' --type=py


Length of output: 2579


🏁 Script executed:

# Check if there are any tests or validation that would catch this mismatch
rg -n 'pp_layout.*Et\*4' --type=py


Length of output: 179


🏁 Script executed:

# Verify that (4, 4) is indeed the correct size for BF16_V1
grep -n 'pipeline_model_parallel_size=4' scripts/performance/configs/deepseek/deepseek_workload_base_configs.py | head -20


Length of output: 290


Override pp_layout in DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 for the new parallelism configuration.

BF16_V1 inherits pp_layout="Et*4|(t*4|)*14tmL" from GB300_V1, which is designed for pipeline_model_parallel_size=2 and virtual_pipeline_model_parallel_size=8. However, BF16_V1 changes both to 4 without overriding pp_layout. The configuration framework will use the inherited layout string instead of auto-computing the correct one for (4, 4), leading to incorrect layer distribution across pipeline stages at runtime.

Either explicitly set pp_layout to a layout string matching the (4, 4) configuration, or remove pp_layout from the inherited config entirely so the framework auto-computes it from the updated parallelism parameters.
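As a sketch of the second option, using a hypothetical stand-in dataclass (the real config class lives in deepseek_workload_base_configs.py and has many more fields; only the ones relevant to this review are modeled here):

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical stand-in for the workload config dataclass.
@dataclass(frozen=True)
class WorkloadConfig:
    pipeline_model_parallel_size: int = 2
    virtual_pipeline_model_parallel_size: int = 8
    pp_layout: Optional[str] = "Et*4|(t*4|)*14tmL"  # valid only for (2, 8)

GB300_V1 = WorkloadConfig()

# Clear the inherited layout so the framework auto-computes one that
# matches the new (4, 4) parallelism, instead of reusing the (2, 8) string.
GB300_BF16_V1 = replace(
    GB300_V1,
    pipeline_model_parallel_size=4,
    virtual_pipeline_model_parallel_size=4,
    pp_layout=None,
)

print(GB300_BF16_V1.pp_layout)  # None → layout will be auto-computed
print(GB300_V1.pp_layout)       # the base config keeps its (2, 8) layout
```

Whether `None` actually triggers auto-computation depends on the framework's handling of an unset pp_layout, which is an assumption here; the safe alternative is to set an explicit layout string validated for (4, 4).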

🤖 Prompt for AI Agents
In `@scripts/performance/configs/deepseek/deepseek_workload_base_configs.py`
around lines 55 - 66, DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 currently
inherits a pp_layout from DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1 that assumes
(pipeline_model_parallel_size=2, virtual_pipeline_model_parallel_size=8); update
the replace call for DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1 to either set
pp_layout explicitly for the new (pipeline_model_parallel_size=4,
virtual_pipeline_model_parallel_size=4) configuration or remove pp_layout from
the replace override so the framework can auto-compute the correct layout based
on the updated pipeline_model_parallel_size and
virtual_pipeline_model_parallel_size values.

DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_CS_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_MX_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_NVFP4_V1 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1
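For context on the pattern the review comment hinges on: the `replace(...)` calls in this diff are `dataclasses.replace`, so each variant is a copy of the base config with only the named fields overridden, and every unmentioned field (including pp_layout) is inherited verbatim. A minimal sketch with a hypothetical config class:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Cfg:  # hypothetical stand-in for the workload config dataclass
    micro_batch_size: int = 2
    global_batch_size: int = 2048
    cuda_graph_impl: str = "local"

base = Cfg()
variant = replace(base, global_batch_size=4096)

print(variant.global_batch_size)  # 4096 (overridden)
print(variant.micro_batch_size)   # 2 (inherited from base)
print(base.global_batch_size)     # 2048 (base is unchanged)
```

This inherit-everything-unnamed behavior is exactly why a stale pp_layout can silently survive an override of the parallelism sizes.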
@@ -131,7 +142,10 @@
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V1,
global_batch_size=4096,
)
-DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V2 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V2
+DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V2 = replace(
+    DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1,
+    global_batch_size=4096,
+)
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_CS_V2 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V2
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_MX_V2 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V2
DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_NVFP4_V2 = DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_V2
@@ -183,7 +197,7 @@
# =============================================================================

DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_MX_LARGE_SCALE = replace(
-    DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_FP8_MX_V1,
+    DEEPSEEK_V3_PRETRAIN_CONFIG_GB300_BF16_V1,
global_batch_size=256,
)
