perf: Add DeepEP support to Megatron Policy #1645
Changes from all commits
`examples/configs/sft_openmathinstruct2_megatron.yaml`

```diff
@@ -92,6 +92,9 @@ policy:
   # gives ~25% training perf speedup with sequence packing and apply_rope_fusion
   bias_activation_fusion: True
   moe_per_layer_logging: False
+  moe_enable_deepep: false
+  moe_token_dispatcher_type: "allgather"
+  moe_shared_expert_overlap: false
```
**Contributor** commented on lines +95 to +97:

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Search for TypedDict definitions and MoE-related config keys
echo "=== Searching for MegatronConfig TypedDict definition ==="
find . -name "*.py" -type f ! -path "./tests/*" ! -path "./.git/*" | xargs grep -l "class MegatronConfig" | head -5

echo -e "\n=== Searching for MoE config keys in Python files ==="
rg -n "(moe_enable_deepep|moe_token_dispatcher_type|moe_shared_expert_overlap)" --type=py | head -20

echo -e "\n=== Searching for MoE-related TypedDict or config definitions ==="
rg -n -B2 -A10 "moe_enable_deepep|moe_token_dispatcher_type" --type=py | head -40
```

Repository: NVIDIA-NeMo/RL — Length of output: 2695

🏁 Script executed:

```bash
#!/bin/bash
# Examine the PolicyConfig TypedDict definition in the policy module
echo "=== Reading PolicyConfig TypedDict definition ==="
cat nemo_rl/models/policy/__init__.py | head -200

echo -e "\n=== Searching for MegatronConfig definition ==="
rg -n -A30 "class MegatronConfig.*TypedDict" --type=py
```

Repository: NVIDIA-NeMo/RL — Length of output: 9554

🏁 Script executed:

```bash
#!/bin/bash
# Check the full MegatronConfig TypedDict to see if we missed the MoE keys
echo "=== Reading full MegatronConfig definition ==="
rg -n "class MegatronConfig" -A 40 --type=py nemo_rl/models/policy/__init__.py
```

Repository: NVIDIA-NeMo/RL — Length of output: 2137

🏁 Script executed:

```bash
#!/bin/bash
# Final verification - check if these keys are defined anywhere in TypedDict context
echo "=== Checking if MoE keys appear in any TypedDict definitions ==="
rg -n "moe_enable_deepep|moe_token_dispatcher_type|moe_shared_expert_overlap" --type=py -B3 -A3

echo -e "\n=== Check the actual YAML file to see the context ==="
head -100 examples/configs/sft_openmathinstruct2_megatron.yaml | tail -20
```

Repository: NVIDIA-NeMo/RL — Length of output: 2417

**Add the MoE configuration keys to the `MegatronConfig` TypedDict definition.** The keys `moe_enable_deepep`, `moe_token_dispatcher_type`, and `moe_shared_expert_overlap` are introduced in the YAML config but are not declared in the `MegatronConfig` TypedDict, so typed access to them is unchecked. Add these keys to the TypedDict in `nemo_rl/models/policy/__init__.py`.
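The requested fix can be sketched as follows. This is an illustrative stand-in, not the repository's actual class: the real `MegatronConfig` in `nemo_rl/models/policy/__init__.py` declares many more fields, and `total=False` is an assumption — the maintainers may prefer required keys or `NotRequired` annotations.

```python
from typing import TypedDict


class MegatronConfig(TypedDict, total=False):
    """Illustrative stand-in for the MegatronConfig TypedDict.

    Only the three keys under review are shown here; the real class
    declares many more fields.
    """

    moe_enable_deepep: bool
    moe_token_dispatcher_type: str  # e.g. "allgather", the YAML default
    moe_shared_expert_overlap: bool


# Mirrors the defaults added to the YAML config in this diff.
cfg: MegatronConfig = {
    "moe_enable_deepep": False,
    "moe_token_dispatcher_type": "allgather",
    "moe_shared_expert_overlap": False,
}
```

With the keys declared, a type checker (e.g. mypy) can flag typos such as `cfg["moe_enable_deepp"]` instead of letting them pass silently.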
The diff also includes the following `env_vars` setting:

```yaml
env_vars:
  PYTORCH_CUDA_ALLOC_CONF: "expandable_segments:False"
```
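`PYTORCH_CUDA_ALLOC_CONF` configures PyTorch's CUDA caching allocator and is read when the allocator initializes, which is why it is injected through the job's `env_vars` rather than set inside the training code. A minimal sketch (plain Python, no CUDA required) of setting and parsing the value; the parsing helper is illustrative, not part of the repository:

```python
import os

# Must happen before the first CUDA allocation to take effect; here we
# hardcode the same value the diff puts in env_vars.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:False"

# The variable is a comma-separated list of key:value options, e.g.
# "expandable_segments:False,max_split_size_mb:128".
opts = dict(
    kv.split(":", 1) for kv in os.environ["PYTORCH_CUDA_ALLOC_CONF"].split(",")
)
```

Pinning `expandable_segments:False` keeps the allocator in its conventional fixed-segment mode instead of the expandable-segments mode.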