Feat/refactor process group #358

Open
wants to merge 7 commits into
base: develop


mwiacx
Contributor

@mwiacx mwiacx commented Oct 28, 2024

Refactor how ProcessGroup instances are constructed. The previous code was a fairly typical case of object orientation for its own sake.

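Improvements:

  • Reuse the shared rank-assignment logic
  • Easier-to-understand definitions for combining and splitting parallel dimensions
  • Easy to change the priority order in which the parallel dimensions of a combination are assigned
  • Support nested, multi-level definitions of parallel-dimension combinations
  • Support leaving some dimensions of a combination anonymous (for example a middle dimension), so that they only hold a place and no ProcessGroup is created for them
  • More uniform support for creating incomplete groups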
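
To make the ideas in the list above concrete, here is a minimal sketch of shared rank assignment over an ordered combination of parallel dimensions. It is not the code from this PR: the function name `build_rank_groups`, the `(name, size)` tuple format, and the convention that the first dimension varies fastest are all assumptions chosen for illustration. It only enumerates rank lists and does not touch `torch.distributed`.

```python
from itertools import product
from math import prod


def build_rank_groups(world_size, dims):
    """Enumerate rank groups for each named dimension (hypothetical helper).

    dims: ordered list of (name, size). The first entry varies fastest
    across consecutive ranks, so the list order is the assignment priority.
    A name of None marks an anonymous placeholder: it shapes the layout
    but no groups are produced for it.
    """
    sizes = [size for _, size in dims]
    if prod(sizes) != world_size:
        raise ValueError("dimension sizes must multiply to world_size")

    # Stride of each dimension in the flattened rank index.
    strides = [prod(sizes[:i]) for i in range(len(sizes))]

    groups = {}
    for axis, (name, size) in enumerate(dims):
        if name is None:
            continue  # placeholder only, no group is created
        other_ranges = [range(s) for i, s in enumerate(sizes) if i != axis]
        other_strides = [st for i, st in enumerate(strides) if i != axis]
        axis_groups = []
        # Fix every other coordinate, then walk along this axis.
        for coords in product(*other_ranges):
            base = sum(c * st for c, st in zip(coords, other_strides))
            axis_groups.append([base + k * strides[axis] for k in range(size)])
        groups[name] = sorted(axis_groups)
    return groups


# The middle dimension is anonymous: it occupies ranks but gets no group.
layout = build_rank_groups(8, [("tp", 2), (None, 2), ("dp", 2)])
print(layout["tp"])  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(layout["dp"])  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

In this sketch, changing the assignment priority is just a matter of reordering the list: with `[("dp", 2), (None, 2), ("tp", 2)]` the `tp` groups become `[[0, 4], [1, 5], [2, 6], [3, 7]]`. In a real setup, each rank list would presumably be handed to something like `torch.distributed.new_group(ranks=...)`.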

Unit tests:

  • mtp: world_size = 16, zero1 = -1
  • mtp: world_size = 16, tp = 4, zero1.5 = 2
  • mtp: world_size = 16, tp = 2, pp = 2, zero1.5 = -1
  • mtp moe: world_size = 16, tp = 2, pp = 2, ep = 4, ep_no_tp = false, zero1 = -1
  • mtp moe: world_size = 16, tp = 2, pp = 1, ep = 2, ep_no_tp = true, zero1 = 2
  • msp/fsp: world_size = 16, zero1 = -1
  • msp/fsp: world_size = 16, tp = 4, zero1.5 = 2
  • msp/fsp: world_size = 16, tp = 2, pp = 2, zero1.5 = -1
  • msp/fsp moe: world_size = 16, tp = 2, pp = 2, ep = 4, ep_no_tp = false, zero1 = -1
  • msp/fsp moe: world_size = 16, tp = 2, pp = 1, ep = 2, ep_no_tp = true, zero1 = 2
  • isp: world_size = 16, zero = -1
  • isp: world_size = 16, sp = 4, pp = 2, zero = -1
  • isp: world_size = 16, wp = 4, pp = 1, zero1.5 = 2
  • isp: world_size = 16, sp = 2, wp = 2, pp = 2, zero = -1
  • isp moe: world_size = 16, sp = 2, wp = 2, ewp = 4, ep = 2, pp = 2, zero = -1
  • isp moe: world_size = 16, sp = 2, wp = 2, ewp = 2, ep = 4, pp = 2, zero = -1
  • isp 2d attn: world_size = 16, sp = 4, wp = 4, pp = 2, zero = -1, hp = 2, cp = 2, window_size = 1, head_first = True, interleaved = False
  • isp 2d attn: world_size = 16, sp = 8, wp = 2, pp = 1, zero = -1, hp = 4, cp = 2, window_size = 2, head_first = False, interleaved = False
  • isp 2d attn: world_size = 16, sp = 8, wp = 2, pp = 1, zero = -1, hp = 1, cp = 8, window_size = 4, head_first = False, interleaved = True
  • isp 2d attn: world_size = 16, sp = 8, wp = 2, pp = 1, zero = -1, hp = 2, cp = 4, window_size = 2, head_first = False, interleaved = True
  • isp 2d attn moe: world_size = 16, sp = 4, wp = 4, pp = 2, ewp = 4, ep = 2, zero = -1, hp = 2, cp = 2, window_size = 2, head_first = True, interleaved = False
  • isp 2d attn: world_size = 16, sp = 8, wp = 2, pp = 1, ewp = 2, ep = 4, zero = -1, hp = 2, cp = 4, window_size = 2, head_first = False, interleaved = True

@mwiacx force-pushed the feat/refactor-process-group branch from 14d425b to 99c5e60 on October 28, 2024 07:52
@mwiacx force-pushed the feat/refactor-process-group branch from c87dc8f to 28ecb58 on October 29, 2024 08:17
@mwiacx marked this pull request as ready for review on October 29, 2024 08:17
@mwiacx force-pushed the feat/refactor-process-group branch 2 times, most recently from 243ef56 to 91fca0d on October 29, 2024 10:22
@mwiacx force-pushed the feat/refactor-process-group branch from 9d9a37b to e032568 on October 29, 2024 12:44
3 participants