Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fail fast and kill the process when error occurs during startup #275

Closed
seeflood opened this issue Oct 22, 2021 · 4 comments · Fixed by #388
Closed

fail fast and kill the process when error occurs during startup #275

seeflood opened this issue Oct 22, 2021 · 4 comments · Fixed by #388
Labels
good first issue Good for newcomers help wanted Extra attention is needed kind/easy

Comments

@seeflood
Copy link
Member

seeflood commented Oct 22, 2021

According to feedback from @stulzq and @whalesongAndLittleFish ,when Layotto can not connect to the redis server(for example,you wrote a wrong redis port in the Layotto configuration file), the logs are not detailed and we can't find out why error happens.

Steps to reproduce:

  1. Start a Layotto with a redis lock component following the quickstart document ,but configurate a wrong redis port
  2. Use redis lock demo client to invoke Layotto. The client will panic,and the client logs are as follows:
    image
    But the server logs are not detailed,you can't find out why:
    image

What would you like to be added:
Add detailed log when grpc API error

Chinese:
目前Layotto配置配错了、连接redis异常时,client调分布式锁API会报错,但是从日志中看不出来为啥报错
需要优化日志,连接redis异常时打印更详细的日志,方便排查。

@seeflood seeflood added good first issue Good for newcomers help wanted Extra attention is needed labels Oct 22, 2021
@seeflood seeflood changed the title Add detailed log when grpc API error Add detailed log when grpc API error happens Oct 22, 2021
@seeflood seeflood changed the title Add detailed log when grpc API error happens Add detailed log when error happens Oct 22, 2021
@zach030
Copy link
Contributor

zach030 commented Oct 27, 2021

在本地修改redis的配置文件端口测试后,这里是否存在问题还是我的测试方法不对

  1. 初始化redis客户端时,Ping这里报错connect error
    image
    if _, err = p.client.Ping(p.ctx).Result(); err != nil {
  2. 导致这里的redis lock组件的Init报错,InitLocks会直接return err
    if err := comp.Init(lock.Metadata{Properties: config.Metadata}); err != nil {
  3. 然后会导致InitRuntime报错,return err
    if err := m.initLocks(o.services.locks...); err != nil {
  4. 导致在runtime的Run方法里,这里直接return,grpcServer并没有初始化
    return nil, err

Dapr里对于Runtime组件的初始化
https://github.com/dapr/dapr/blob/875a74b203e5e4d405e4b47cbe382dd7052be726/pkg/runtime/runtime.go#L335
https://github.com/dapr/dapr/blob/875a74b203e5e4d405e4b47cbe382dd7052be726/pkg/runtime/runtime.go#L341

@seeflood
Copy link
Member Author

恩恩设计目标是在init的时候如果error 干脆就别启动;
@stulzq @whalesongAndLittleFish 看下楼上,似乎复现不了,如何复现问题~

@seeflood
Copy link
Member Author

seeflood commented Nov 2, 2021

@whalesongAndLittleFish 反馈,其实启动时候报错了但是没看到。流程是:
Layotto启动报错---正常起来绑定端口---所有客户端访问都会直接报错

这有点问题:理论上Layotto启动报错就应该自杀掉,fail fast,为啥会正常启动?
原因:查了下怀疑是启动时候Layotto报错、没正常初始化,但是不影响mosn启动,这就导致mosn还活着,layotto是死的
修复方案:Layotto改下启动代码,如果有error 直接起个协程panic、自杀掉进程
@nejisama 看下合适不~

@seeflood
Copy link
Member Author

seeflood commented Nov 2, 2021

刚找@nejisama 讨论了,总结下

问题:启动时候Layotto报错、没正常初始化,但是不影响mosn启动,这就导致mosn还活着,layotto是死的,进程虽然启动了,但是部分功能是挂的。但是本着fail fast的思想,最好能直接kill掉进程

生产用户的需求:有的生产用户可能希望别自动kill,kill掉就没日志查了

修复方案:
在main里,如果启动有error,就panic自杀;

if err!=nil panic(err)

这样如果用户希望不自杀的话,自己写个main编译运行即可
image

@zach030 @stulzq @whalesongAndLittleFish FYI,总算搞明白这个问题了……我先把修复这个问题记个社区任务吧,或者各位大佬感兴趣也可以修一下

@seeflood seeflood changed the title Add detailed log when error happens fail fast and kill the process when error occurs during startup Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed kind/easy
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants