cp: Fix DeepSeek-V3 H100 large scale config (2401) into r0.3.0 #2483

svcnvidia-nemo-ci wants to merge 1 commit into r0.3.0
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
/ok to test 3b1fed8
📝 Walkthrough

This PR adds two configuration fields to the `DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE` configuration: `virtual_pipeline_model_parallel_size` set to 2 and `pp_layout` set to `None`.
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (warning)
🧹 Nitpick comments (1)
scripts/performance/configs/deepseek/deepseek_workload_base_configs.py (1)
228-233: Fix is correct and mirrors the SC-V2 parallelism settings.
`DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE` was inheriting `virtual_pipeline_model_parallel_size=4` and `pp_layout="Et|(tt|)*30mL"` from `DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V1`. The explicit overrides to `vpp=2` and `pp_layout=None` correctly align the LARGE_SCALE config with the SC variant's intended pipeline settings, consistent with what `DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2` already does.

Optional: since `DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2` already carries both SC-specific overrides, the LARGE_SCALE config could derive from it directly to avoid re-stating them:

♻️ Optional refactor: derive from SC V2

```diff
 DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE = replace(
-    DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V1,
+    DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2,
     global_batch_size=1024,
-    virtual_pipeline_model_parallel_size=2,
-    pp_layout=None,
 )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `scripts/performance/configs/deepseek/deepseek_workload_base_configs.py` around lines 228-233: the LARGE_SCALE config currently replaces `DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V1` and then overrides `virtual_pipeline_model_parallel_size` and `pp_layout` to match SC V2. To simplify and avoid re-stating SC-specific overrides, change the base to `DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2` (i.e., create `DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE` by calling `replace(DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2, global_batch_size=1024)` and drop the overrides SC_V2 already sets) so LARGE_SCALE directly derives the SC settings from the SC V2 symbol.
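The suggested refactor relies on `dataclasses.replace`, which copies a config while overriding only the named fields. The sketch below illustrates the pattern with a minimal stand-in dataclass; the field names mirror the review, but the class itself, the base values (e.g. `global_batch_size=512` for V1), and the variable names are hypothetical — the real NeMo config class is not shown in this review.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical stand-in for the real NeMo workload config dataclass.
@dataclass(frozen=True)
class PretrainConfig:
    global_batch_size: int
    virtual_pipeline_model_parallel_size: int
    pp_layout: Optional[str]

# Assumed base values for illustration only.
SC_V1 = PretrainConfig(
    global_batch_size=512,
    virtual_pipeline_model_parallel_size=4,
    pp_layout="Et|(tt|)*30mL",
)

# SC_V2 already carries both SC-specific overrides.
SC_V2 = replace(
    SC_V1,
    virtual_pipeline_model_parallel_size=2,
    pp_layout=None,
)

# Deriving LARGE_SCALE from SC_V2 means only the genuinely new
# field has to be restated; the pipeline settings are inherited.
LARGE_SCALE = replace(SC_V2, global_batch_size=1024)

print(LARGE_SCALE.global_batch_size)                      # 1024
print(LARGE_SCALE.virtual_pipeline_model_parallel_size)   # 2, inherited from SC_V2
print(LARGE_SCALE.pp_layout)                              # None, inherited from SC_V2
```

This is why the refactor is safe: whichever base carries the correct SC overrides, `replace` propagates them, so basing LARGE_SCALE on SC_V2 removes the duplication without changing the resulting config.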
Merge via #2509