[3.4.20] Panic probably due to nil log object #14402

ahrtr · 2022-08-30T07:35:27Z

Based on feedback from @JohnJAS, this issue can't be reproduced on 3.4.19; it only reproduces on 3.4.20.

It looks like lg is nil: in the stack trace below, zap.(*Logger).Warn is called with a nil receiver (0x0).

Thanks both @JohnJAS and @rtheis.

2022-08-29 06:29:02.702707 E | rafthttp: failed to read 88d5d8ebd854df5a on stream MsgApp v2 (unexpected EOF)
2022-08-29 06:29:02.702723 I | rafthttp: peer 88d5d8ebd854df5a became inactive (message send to peer failed)
2022-08-29 06:29:02.703504 W | rafthttp: lost the TCP streaming connection with peer 88d5d8ebd854df5a (stream Message reader)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x75745b]

goroutine 287 [running]:
go.uber.org/zap.(*Logger).check(0x0, 0x1, 0x10a3ea5, 0x36, 0xc003fb7480)
        /root/go/pkg/mod/go.uber.org/[email protected]/logger.go:264 +0x9b
go.uber.org/zap.(*Logger).Warn(0x0, 0x10a3ea5, 0x36, 0xc003fb7480, 0x2, 0x2)
        /root/go/pkg/mod/go.uber.org/[email protected]/logger.go:194 +0x45
go.etcd.io/etcd/etcdserver.(*EtcdServer).requestCurrentIndex(0xc000398600, 0xc005210f00, 0xd15682e845891c12, 0x0, 0x0, 0x0)
        /tmp/etcd-release-3.4.20/etcd/release/etcd/etcdserver/v3_server.go:805 +0x873
go.etcd.io/etcd/etcdserver.(*EtcdServer).linearizableReadLoop(0xc000398600)
        /tmp/etcd-release-3.4.20/etcd/release/etcd/etcdserver/v3_server.go:721 +0x2d6
go.etcd.io/etcd/etcdserver.(*EtcdServer).goAttach.func1(0xc000398600, 0xc00011ec20)
        /tmp/etcd-release-3.4.20/etcd/release/etcd/etcdserver/server.go:2698 +0x57
created by go.etcd.io/etcd/etcdserver.(*EtcdServer).goAttach
        /tmp/etcd-release-3.4.20/etcd/release/etcd/etcdserver/server.go:2696 +0x1b1
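
For readers unfamiliar with this failure mode: in Go, calling a method on a nil pointer is legal, and the panic only happens once the method dereferences the receiver. The sketch below illustrates that with a hypothetical stand-in type; fakeLogger and callWarn are assumptions made up for illustration and are not zap's or etcd's actual code.

```go
package main

import "fmt"

// fakeLogger is a hypothetical stand-in for a pointer-based logger.
// It is NOT zap's implementation.
type fakeLogger struct {
	name string
}

// Warn reads a field of the receiver, so it panics when l is nil.
func (l *fakeLogger) Warn(msg string) string {
	return l.name + ": " + msg
}

// callWarn invokes Warn and reports whether the call panicked.
func callWarn(l *fakeLogger, msg string) (out string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return l.Warn(msg), false
}

func main() {
	var lg *fakeLogger // nil, like the 0x0 receiver in the trace
	_, panicked := callWarn(lg, "example message")
	fmt.Println("nil logger panicked:", panicked)

	_, panicked = callWarn(&fakeLogger{name: "etcdserver"}, "example message")
	fmt.Println("non-nil logger panicked:", panicked)
}
```

Without the recover in callWarn, the first call would terminate the process with the same "invalid memory address or nil pointer dereference" runtime error shown above.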

kkkkun · 2022-08-31T13:54:56Z

This may be related to PR https://github.com/etcd-io/etcd/pull/11616/files

ahrtr · 2022-08-31T20:59:59Z

@kkkkun Please feel free to submit a PR for this. Thanks.

vsvastey · 2022-09-01T16:57:02Z

@ahrtr I would like to take a look at the issue, but I'm new to the project and haven't managed to reproduce the bug.
Could you (or anyone else) please help me find out how to reproduce the bug? I believe it might be helpful for anyone else who wants to tackle this issue.

ahrtr · 2022-09-02T21:20:12Z

Thanks @vsvastey .

The likely cause is that the Logger isn't correctly initialized in some situations.

@JohnJAS and @rtheis, could you please provide the steps to reproduce this issue? thx

vsvastey · 2022-09-03T01:31:38Z

I've dug a bit into the git history.
From what I understand, capnslog was removed in 3.5 and in main (corresponding issue: #11426), so from then on the Logger should never be nil.

However, in 3.4 it is still legal for the Logger to be nil. The PR (#12795) that targeted main was also backported to 3.4, which is how code without nil checks got into the branch.

It should be enough to add a couple of checks for whether the Logger is nil. The fix should go into 3.4 only.
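
A minimal sketch of the kind of nil check described above, assuming a structured logger that may be nil and a legacy logger to fall back on. All names here (structuredLogger, legacyWarn, warn) are hypothetical stand-ins for illustration; this is not etcd's actual code or the content of the PR.

```go
package main

import "fmt"

// structuredLogger stands in for the structured logger, which may be nil on 3.4.
type structuredLogger struct {
	prefix string
}

// Warn dereferences the receiver, so it must not be called on a nil logger.
func (l *structuredLogger) Warn(msg string) string {
	return l.prefix + " WARN " + msg
}

// legacyWarn stands in for the legacy logger used when no structured logger is set.
func legacyWarn(msg string) string {
	return "legacy W | " + msg
}

// warn uses the structured logger only when it is non-nil and
// falls back to the legacy logger otherwise.
func warn(lg *structuredLogger, msg string) string {
	if lg != nil {
		return lg.Warn(msg)
	}
	return legacyWarn(msg)
}

func main() {
	fmt.Println(warn(nil, "example message"))
	fmt.Println(warn(&structuredLogger{prefix: "zap"}, "example message"))
}
```

The point of the guard is that every call site added by the backport has to tolerate a nil logger on 3.4, whereas on 3.5 and main the logger is assumed to be always set.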

The PR is there. However, I haven't had a chance to test it since I don't know how to reproduce the bug.

In v3.5 it is assumed that the logger should not be nil, however it is still a case in v3.4. The PR targeted to v3.5 was backported to 3.4 and that's why it's possible to get panic on nil logger in 3.4. This commit fixed this issue.

Fixes etcd-io#14402

Signed-off-by: Vladimir Sokolov <[email protected]>

rtheis · 2022-09-03T10:48:34Z

We can easily reproduce the problem by deploying an IBM Cloud Kubernetes Service or Red Hat OpenShift on IBM Cloud cluster. However, you wouldn't be able to hack in your own etcd version to such clusters. My hope is that you can recreate it by deploying etcd with your own Kubernetes cluster.

JohnJAS · 2022-09-05T05:41:42Z

We installed an on-premises k8s cluster with multiple control-plane nodes. I think it should be easy to reproduce by installing a k8s cluster via kubeadm. In my case, the etcd pods always crash when I run a helm upgrade command.

ahrtr · 2022-09-05T06:11:11Z

Thanks both @rtheis and @JohnJAS

ahrtr · 2022-09-06T04:59:37Z

Resolved by #14420

ahrtr added the type/bug, good first issue, and help wanted labels Aug 30, 2022

ahrtr mentioned this issue Aug 30, 2022: [3.4] panic: runtime error: invalid memory address or nil pointer dereference #14256 (Open)

vsvastey mentioned this issue Sep 3, 2022: etcdserver: nil-logger issue fix for version 3.4 #14420 (Merged)

ahrtr added a commit to ahrtr/etcd that referenced this issue Sep 6, 2022: Update changelog to cover some PRs (92ddc94)

Issues: 1. etcd-io#14402 fixed in 3.4 only; 2. etcd-io#14382 fixed in both 3.5 and main. Signed-off-by: Benjamin Wang <[email protected]>

ahrtr mentioned this issue Sep 6, 2022: Update changelog to cover some PRs #14430 (Merged)

ahrtr closed this as completed Sep 6, 2022

ahrtr mentioned this issue Sep 8, 2022: Plans for v3.4.21 release #14438 (Closed, 8 tasks)

ahrtr mentioned this issue Jan 19, 2023: Test etcd container in K8s cluster #15139 (Closed)
