Fix the problem in fleet executor stop #38114
Thanks for your contribution!
Branch updated from 9411ab4 to 8088ffe.
Please check accuracy with GPT at pp=dp=mp=2 before merging. Should every future PR also run an accuracy check before it is merged?
Branch updated from 8088ffe to 06b5458.
Branch updated from 06b5458 to 1c03784.
@@ -29,9 +29,8 @@ void InterceptorMessageServiceImpl::InterceptorMessageService(
VLOG(3) << "Interceptor Message Service receives a message from interceptor "
<< request->src_id() << " to interceptor " << request->dst_id()
<< ", with the message: " << request->message_type();
FleetExecutor::GetCarrier().EnqueueInterceptorMessage(*request);
response->set_rst(true);
It just occurred to me that set_rst could actually be set to the return value of EnqueueInterceptorMessage. This can be changed in the next PR.
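A minimal sketch of the suggested change, assuming `EnqueueInterceptorMessage` returns `bool` (the diff above does not show its signature). `InterceptorMessage`, `Response`, `Carrier::StopAccepting` and `HandleMessage` are simplified hypothetical stand-ins for the real protobuf and Paddle types, kept only to show how the enqueue result would flow into `set_rst` instead of an unconditional `true`.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical stand-in for the protobuf request message.
struct InterceptorMessage {
  long src_id;
  long dst_id;
  std::string message_type;
};

// Hypothetical stand-in for the RPC response; only set_rst is modeled.
struct Response {
  bool rst = false;
  void set_rst(bool v) { rst = v; }
};

// Hypothetical Carrier whose enqueue reports success or failure.
class Carrier {
 public:
  // Assumption: returns false when the message cannot be delivered,
  // e.g. because the carrier has stopped accepting messages.
  bool EnqueueInterceptorMessage(const InterceptorMessage& msg) {
    if (!accepting_) return false;
    queue_.push_back(msg);
    return true;
  }
  void StopAccepting() { accepting_ = false; }  // hypothetical helper

 private:
  bool accepting_ = true;
  std::deque<InterceptorMessage> queue_;
};

// The suggested change: propagate the enqueue result to the caller.
void HandleMessage(Carrier& carrier, const InterceptorMessage& request,
                   Response* response) {
  response->set_rst(carrier.EnqueueInterceptorMessage(request));
}
```

With this shape, the sending side can tell whether the remote carrier actually accepted the message rather than always receiving success.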
LGTM
Please rebase onto the latest develop branch and run an accuracy check 🧐
LGTM
LGTM for PADDLE_ENFORCE
PR types
Bug fixes
PR changes
Others
Describe
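Fixes the earlier issue where FleetExecutor could not stop in multi-GPU runs. The fix changes object ownership: MessageBus is changed from a singleton into a member of FleetExecutor, which now manages its lifetime; FleetExecutor passes a MessageBus pointer to the Carrier; and each Interceptor holds a pointer to the Carrier that owns it (in the future, one FleetExecutor will own multiple Carriers). Finally, FleetExecutor's destructor sends a stop control message to the source Interceptors on each Carrier.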
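The ownership change described above can be sketched roughly as follows. This is a heavily simplified model, not the Paddle implementation: the class members, `AddCarrier`, `AddInterceptor`, `StopSources`, the `"STOP"` message string and the `log` hook on `MessageBus` are all hypothetical, added only so the shutdown path can be observed. It shows FleetExecutor owning the MessageBus, Carriers holding a non-owning bus pointer, Interceptors holding a back-pointer to their Carrier, and the destructor stopping source Interceptors while the bus is still alive.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical bus; `log` is an illustrative hook to observe sent messages.
class MessageBus {
 public:
  explicit MessageBus(std::vector<std::string>* log) : log_(log) {}
  void Send(long dst_id, const std::string& type) {
    if (log_) log_->push_back(type + "->" + std::to_string(dst_id));
  }

 private:
  std::vector<std::string>* log_;
};

class Carrier;

class Interceptor {
 public:
  Interceptor(long id, Carrier* carrier) : id_(id), carrier_(carrier) {}
  long id() const { return id_; }
  Carrier* carrier() const { return carrier_; }

 private:
  long id_;
  Carrier* carrier_;  // non-owning back-pointer to the owning Carrier
};

class Carrier {
 public:
  explicit Carrier(MessageBus* bus) : bus_(bus) {}
  Interceptor* AddInterceptor(long id, bool is_source) {
    interceptors_.push_back(std::make_unique<Interceptor>(id, this));
    if (is_source) source_ids_.push_back(id);
    return interceptors_.back().get();
  }
  // Send an (assumed) stop control message to every source interceptor.
  void StopSources() {
    for (long id : source_ids_) bus_->Send(id, "STOP");
  }

 private:
  MessageBus* bus_;  // non-owning: FleetExecutor owns the bus
  std::vector<std::unique_ptr<Interceptor>> interceptors_;
  std::vector<long> source_ids_;
};

class FleetExecutor {
 public:
  explicit FleetExecutor(std::vector<std::string>* log)
      : bus_(std::make_unique<MessageBus>(log)) {}
  Carrier* AddCarrier() {
    carriers_.push_back(std::make_unique<Carrier>(bus_.get()));
    return carriers_.back().get();
  }
  ~FleetExecutor() {
    // The destructor body runs before members are destroyed,
    // so the bus is still valid while the stop messages are sent.
    for (auto& carrier : carriers_) carrier->StopSources();
  }

 private:
  // Declared before carriers_ so it is destroyed after them.
  std::unique_ptr<MessageBus> bus_;
  std::vector<std::unique_ptr<Carrier>> carriers_;
};
```

Declaring `bus_` before `carriers_` matters in this sketch: members are destroyed in reverse declaration order, so no Carrier ever outlives the bus it points to.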