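egg: communication between the agent and workers breaks, some worker processes stop receiving messages from the agent and recover only after being killed; how should this be monitored and recovered? #5364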

Open
TaotaoGold opened this issue Oct 16, 2024 · 0 comments

Comments

@TaotaoGold

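  1. The agent process sends a message to a worker process via sendRandom.
  2. The logs show that, of the 5 worker processes, only one fixed process handles these messages, and it receives only some of them. Presumably the messages sent to the other worker processes are lost.
  3. The other worker processes still handle the rest of the business traffic normally; they just no longer receive messages from the agent.
  4. After the other worker processes are killed manually, things return to normal and the messages are received again.
  5. I added my own heartbeat mechanism to the app and the agent (every 10s the agent sends a ping message via broadcast; the app listens for ping and exits on its own if it has not received one for more than 30s). This mechanism does not seem to have taken effect either.
  6. The egg logs show nothing abnormal.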
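For reference, the worker-side half of the heartbeat in step 5 could be sketched roughly as below. This is not the code from the report: `createPingWatchdog` and its options are hypothetical names, and an `EventEmitter` stands in for `app.messenger` so the sketch runs outside egg. One thing it does deliberately is run the timeout check on its own interval rather than inside the ping handler, because a check that only runs when a ping arrives never fires once pings stop.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical helper, not part of egg: records when the last 'ping' arrived
// and calls onTimeout once if no ping has been seen for more than timeoutMs.
function createPingWatchdog(messenger, options = {}) {
  const { timeoutMs = 30000, checkMs = 5000, onTimeout, now = Date.now } = options;
  let lastPing = now();
  let fired = false;

  messenger.on('ping', () => { lastPing = now(); });

  // Independent of ping delivery, so it still runs when pings stop arriving.
  const check = () => {
    const silentMs = now() - lastPing;
    if (!fired && silentMs > timeoutMs) {
      fired = true;
      onTimeout(silentMs);
    }
    return fired;
  };

  const timer = setInterval(check, checkMs);
  timer.unref(); // do not keep the process alive just for the watchdog
  return { check, stop: () => clearInterval(timer) };
}

// Demo with a fake clock and a stand-in messenger.
const fakeMessenger = new EventEmitter();
let clock = 0;
const watchdog = createPingWatchdog(fakeMessenger, {
  now: () => clock,
  onTimeout: silentMs =>
    console.log(`pid ${process.pid}: no ping for ${silentMs} ms, exiting would go here`),
});
clock = 20000; fakeMessenger.emit('ping'); // ping arrives at t = 20s
clock = 45000; watchdog.check();           // 25s of silence: still healthy
clock = 51000; watchdog.check();           // 31s of silence: onTimeout fires
watchdog.stop();
```

In an egg worker this would presumably be wired up in `app.js` as `createPingWatchdog(app.messenger, { onTimeout: () => process.exit(1) })`, on the assumption that the cluster master reforks a worker that exits. Logging `process.pid` both when the agent sends a ping and when a worker receives one would help tell "ping never sent" apart from "ping sent but not delivered".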

The egg version is 2.22.2 and the Node version is v12.22.11.

@TaotaoGold changed the title on Oct 16, 2024, correcting the typo "woker" to "worker".