Fix sync_bn, get the correct nccl comm. #45100

Merged (1 commit) on Aug 12, 2022

@LiYuRio LiYuRio commented Aug 12, 2022

PR types

Bug fixes

PR changes

Others

Describe

With the new communication library, sync_bn silently degrades to plain bn semantics: the nccl_comm obtained from the context is null, so the all-reduce that synchronizes the statistics is never performed. This PR fixes that by getting the nccl comm from the global ProcessGroup and using it for communication. If there is no global nccl comm, it falls back to the original behavior.
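
To illustrate why a missing communicator turns sync_bn into plain bn, here is a minimal pure-Python sketch. It is not Paddle's kernel code: `bn_stats`, `sum_all_reduce`, and the list-of-shards layout are hypothetical names chosen for illustration, and the all-reduce is simulated in a single process rather than performed with NCCL.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical all-reduce signature: receives every rank's partial vector
# and returns the element-wise sum that each rank would end up with.
AllReduce = Callable[[List[List[float]]], List[float]]


def sum_all_reduce(partials: List[List[float]]) -> List[float]:
    """Simulated sum all-reduce across ranks (stand-in for a real NCCL call)."""
    return [sum(column) for column in zip(*partials)]


def bn_stats(
    shards: Sequence[Sequence[float]],
    all_reduce: Optional[AllReduce] = None,
) -> List[Tuple[float, float]]:
    """Return the (mean, variance) each rank would normalize with.

    shards[i] is the data held by rank i. When all_reduce is None (the
    "comm is null" case), every rank only sees its own shard.
    """
    # Each rank computes its local count, sum, and sum of squares.
    partials = [
        [float(len(shard)), sum(shard), sum(x * x for x in shard)]
        for shard in shards
    ]
    if all_reduce is not None:
        # Synchronized path: every rank receives the global totals.
        total = all_reduce(partials)
        partials = [total for _ in shards]
    stats = []
    for count, total_sum, total_sq in partials:
        mean = total_sum / count
        stats.append((mean, total_sq / count - mean * mean))
    return stats


shards = [[1.0, 3.0], [5.0, 7.0]]       # two ranks, two values each
local_only = bn_stats(shards)            # no communicator: plain bn per rank
synced = bn_stats(shards, sum_all_reduce)  # communicator available: sync_bn
```

Without a communicator the two ranks normalize with different means (2.0 and 6.0). With the simulated all-reduce, both use the global mean of 4.0, which is the behavior sync_bn is supposed to provide.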

paddle-bot bot commented Aug 12, 2022

Your PR has been submitted. Thanks for your contribution to this open source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@chenwhql chenwhql left a comment

LGTM

@ForFishes ForFishes left a comment

LGTM

@ForFishes ForFishes merged commit 1e96575 into PaddlePaddle:develop Aug 12, 2022