perf: DeepEP interface in megatron backend #1794
Signed-off-by: Parth Mannan <pmannan@nvidia.com>
…1640) Signed-off-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
…1605) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: alexandery <alexandery@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Sahil Modi <samodi@nvidia.com> Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Jonas Yang <joyang@nvidia.com> Signed-off-by: ZeYi Lin <944270057@qq.com> Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: alexandery-nvidia <alexandery@nvidia.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: Peter Jin <pjin@nvidia.com> Co-authored-by: samodi-nv <141948907+samodi-nv@users.noreply.github.com> Co-authored-by: ruit <ruit@nvidia.com> Co-authored-by: Jonas Yang <joyang@nvidia.com> Co-authored-by: Ze-Yi LIN <58305964+Zeyi-Lin@users.noreply.github.com> Co-authored-by: Alexander Zhipa <alex.zhipa@proton.me> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Manasa Manohara <mmanohara@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Signed-off-by: Peter Jin <pjin@nvidia.com> Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Co-authored-by: Guyue Huang <guyueh@nvidia.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: Peter Jin <pjin@nvidia.com> Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Sahger Lad <lad.sahger@gmail.com> Signed-off-by: sahgerlad <36946563+sahgerlad@users.noreply.github.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Sahger Lad <lad.sahger@gmail.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: root <root@pool0-00514.cm.cluster> Co-authored-by: root <root@pool0-00514.cm.cluster> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Seonjin <sna@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com> Co-authored-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
…A) (#1648) Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
#1715) Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
📝 Walkthrough
Adds three new MoE-related configuration fields (`moe_enable_deepep`, `moe_token_dispatcher_type`, `moe_shared_expert_overlap`).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@nemo_rl/models/policy/__init__.py`:
- Around line 186-195: Update the TypedDict in nemo_rl/models/policy/__init__.py
to mark moe_enable_deepep, moe_token_dispatcher_type, and
moe_shared_expert_overlap as NotRequired and document recommended defaults
(e.g., False, 'allgather', False); then change the access in
megatron_policy_worker.py (around the logic at lines ~661–667) to use
config.get('moe_enable_deepep', False), config.get('moe_token_dispatcher_type',
'allgather'), and config.get('moe_shared_expert_overlap', False) so missing keys
won’t KeyError; finally add those three keys with the recommended default values
to the exemplar YAMLs (grpo_math_70B_megatron.yaml, grpo_math_8B_megatron.yaml,
grpo_math_qwen30ba3b_megatron.yaml).
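The suggestion above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from the PR: the `MoEDispatchConfig` class, the `MOE_DEFAULTS` table, and the `resolve_moe_settings` helper are hypothetical names, and only the three keys and their suggested defaults come from the review comment.

```python
from typing import TypedDict

try:
    from typing import NotRequired  # available in Python 3.11+
except ImportError:  # older interpreters need the backport package
    from typing_extensions import NotRequired


class MoEDispatchConfig(TypedDict):
    """Hypothetical stand-in for the relevant slice of the policy TypedDict."""

    moe_enable_deepep: NotRequired[bool]
    moe_token_dispatcher_type: NotRequired[str]
    moe_shared_expert_overlap: NotRequired[bool]


# Defaults recommended in the review comment.
MOE_DEFAULTS = {
    "moe_enable_deepep": False,
    "moe_token_dispatcher_type": "allgather",
    "moe_shared_expert_overlap": False,
}


def resolve_moe_settings(config: MoEDispatchConfig) -> dict:
    """Read each key with .get() so configs that omit it do not raise KeyError."""
    return {key: config.get(key, default) for key, default in MOE_DEFAULTS.items()}


# A config written before these keys existed still resolves cleanly.
legacy = resolve_moe_settings({})
# A config that sets only one key keeps the defaults for the others.
partial = resolve_moe_settings({"moe_enable_deepep": True})
```

Reading with `.get()` keeps older YAML configs working, while adding the keys to the exemplar YAMLs makes the defaults visible to users.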
@terrykong could you review? This is urgently needed for a recent DeepSeek performance study.
Signed-off-by: Parth Mannan <pmannan@nvidia.com> Signed-off-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com> Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: alexandery <alexandery@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Sahil Modi <samodi@nvidia.com> Signed-off-by: Jonas Yang <joyang@nvidia.com> Signed-off-by: ZeYi Lin <944270057@qq.com> Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Signed-off-by: Peter Jin <pjin@nvidia.com> Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: Sahger Lad <lad.sahger@gmail.com> Signed-off-by: sahgerlad <36946563+sahgerlad@users.noreply.github.com> Signed-off-by: root <root@pool0-00514.cm.cluster> Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Hemil Desai <hemild@nvidia.com> Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com> Signed-off-by: Rayen <ruit@nvidia.com> Co-authored-by: Parth Mannan <pmannan@nvidia.com> Co-authored-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com> Co-authored-by: Rayen <ruit@nvidia.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: bxyu-nvidia <bxyu@nvidia.com> Co-authored-by: alexandery-nvidia <alexandery@nvidia.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: Peter Jin <pjin@nvidia.com> Co-authored-by: samodi-nv <141948907+samodi-nv@users.noreply.github.com> Co-authored-by: Jonas Yang 
<joyang@nvidia.com> Co-authored-by: Ze-Yi LIN <58305964+Zeyi-Lin@users.noreply.github.com> Co-authored-by: Alexander Zhipa <alex.zhipa@proton.me> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Manasa Manohara <mmanohara@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com> Co-authored-by: sahgerlad <36946563+sahgerlad@users.noreply.github.com> Co-authored-by: root <root@pool0-00514.cm.cluster> Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Seonjin <sna@nvidia.com> Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com> Co-authored-by: Adil <47084919+adil-a@users.noreply.github.com> Co-authored-by: Hemil Desai <hemild@nvidia.com> Co-authored-by: alexchiu <alexq@nvidia.com> Co-authored-by: Adi Renduchintala <adithya.r@gmail.com> Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Parth Mannan <pmannan@nvidia.com> Signed-off-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com> Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: alexandery <alexandery@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Sahil Modi <samodi@nvidia.com> Signed-off-by: Jonas Yang <joyang@nvidia.com> Signed-off-by: ZeYi Lin <944270057@qq.com> Signed-off-by: Alexander Zhipa <azzhipa@amazon.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Signed-off-by: Peter Jin <pjin@nvidia.com> Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Signed-off-by: Sahger Lad <lad.sahger@gmail.com> Signed-off-by: sahgerlad <36946563+sahgerlad@users.noreply.github.com> Signed-off-by: root <root@pool0-00514.cm.cluster> Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Hemil Desai <hemild@nvidia.com> Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com> Signed-off-by: Rayen <ruit@nvidia.com> Co-authored-by: Parth Mannan <pmannan@nvidia.com> Co-authored-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com> Co-authored-by: Rayen <ruit@nvidia.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: bxyu-nvidia <bxyu@nvidia.com> Co-authored-by: alexandery-nvidia <alexandery@nvidia.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: Peter Jin <pjin@nvidia.com> Co-authored-by: samodi-nv <141948907+samodi-nv@users.noreply.github.com> Co-authored-by: Jonas Yang 
<joyang@nvidia.com> Co-authored-by: Ze-Yi LIN <58305964+Zeyi-Lin@users.noreply.github.com> Co-authored-by: Alexander Zhipa <alex.zhipa@proton.me> Co-authored-by: Alexander Zhipa <azzhipa@amazon.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Manasa Manohara <mmanohara@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com> Co-authored-by: sahgerlad <36946563+sahgerlad@users.noreply.github.com> Co-authored-by: root <root@pool0-00514.cm.cluster> Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Seonjin <sna@nvidia.com> Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com> Co-authored-by: Adil <47084919+adil-a@users.noreply.github.com> Co-authored-by: Hemil Desai <hemild@nvidia.com> Co-authored-by: alexchiu <alexq@nvidia.com> Co-authored-by: Adi Renduchintala <adithya.r@gmail.com> Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do?
Adds an interface to configure DeepEP (deep_ep) usage in the Megatron backend.
closes #1396
Dup of #1645
Issues
List issues that this PR closes (syntax):
Usage
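A minimal usage sketch, assuming the three keys named in the review comment sit in a dict-like Megatron config section. The `base_megatron_cfg` dict and the `apply_overrides` helper are hypothetical, and the `"flex"` dispatcher value is an assumption about what Megatron-Core pairs with DeepEP; verify the accepted values against the Megatron-Core version in use.

```python
from copy import deepcopy

# Hypothetical baseline, using the defaults suggested in the review comment.
base_megatron_cfg = {
    "moe_enable_deepep": False,
    "moe_token_dispatcher_type": "allgather",
    "moe_shared_expert_overlap": False,
}


def apply_overrides(cfg: dict, overrides: dict) -> dict:
    """Return a copy of cfg with overrides applied, rejecting unknown keys."""
    unknown = set(overrides) - set(cfg)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    merged = deepcopy(cfg)
    merged.update(overrides)
    return merged


# Turn DeepEP on; "flex" is an assumed dispatcher value (see the note above).
deepep_cfg = apply_overrides(
    base_megatron_cfg,
    {"moe_enable_deepep": True, "moe_token_dispatcher_type": "flex"},
)
```

Rejecting unknown keys is a design choice in this sketch: a misspelled key fails loudly rather than silently leaving DeepEP disabled.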
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
Release Notes
New Features
Chores