etcd3 cluster filled disk and didn't gc old revisions #7944

Closed
Spindel opened this issue May 17, 2017 · 34 comments

Spindel commented May 17, 2017

Bug report

Our Kubernetes cluster stopped accepting commands because etcd3 ran out of space and raised a NOSPACE alarm.

endpoint status, as json:
[{"Endpoint":"coreos01.kub.modio.se:2379","Status":{"header":{"cluster_id":1416696078359407448,"member_id":5715034692686022789,"revision":2190654,"raft_term":5829},"version":"3.0.10","dbSize":952512512,"leader":12583913577278567278,"raftIndex":5817555,"raftTerm":5829}},{"Endpoint":"coreos01.kub.do.modio.se:2379","Status":{"header":{"cluster_id":1416696078359407448,"member_id":9911998787147328850,"revision":2190654,"raft_term":5829},"version":"3.0.10","dbSize":356352,"leader":12583913577278567278,"raftIndex":5817555,"raftTerm":5829}},{"Endpoint":"coreos03.kub.do.modio.se:2379","Status":{"header":{"cluster_id":1416696078359407448,"member_id":12583913577278567278,"revision":2190654,"raft_term":5829},"version":"3.0.10","dbSize":3777458176,"leader":12583913577278567278,"raftIndex":5817555,"raftTerm":5829}}]

After that we did a compaction:

compaction 2190654
compacted revision 2190654

after compact, before defrag:

coreos01.kub.modio.se:2379, 4f4fe1c600958885, 3.0.10, 952 MB, false, 5829, 5817819
coreos01.kub.do.modio.se:2379, 898e7e618204ed52, 3.0.10, 369 kB, false, 5829, 5817819
coreos03.kub.do.modio.se:2379, aea30ab9e1e7736e, 3.0.10, 3.8 GB, true, 5829, 5817821

after compact, after defrag:

coreos01.kub.modio.se:2379, 4f4fe1c600958885, 3.0.10, 299 kB, false, 5829, 5817955
coreos01.kub.do.modio.se:2379, 898e7e618204ed52, 3.0.10, 299 kB, false, 5829, 5817956
coreos03.kub.do.modio.se:2379, aea30ab9e1e7736e, 3.0.10, 299 kB, true, 5829, 5817957

After clearing the alarms, we could use the Kubernetes cluster again.
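For reference, this is roughly the sequence we ran to recover (a sketch from memory: --endpoints and TLS flags are omitted, and jq is only assumed here to pull the revision out of the JSON):

export ETCDCTL_API=3

# read the current revision from one endpoint (jq assumed available)
rev=$(etcdctl endpoint status --write-out=json | jq -r '.[0].Status.header.revision')

# drop all key history older than that revision
etcdctl compaction "$rev"

# defragment the members listed in --endpoints so the freed space is actually released
etcdctl defrag

# check and clear the NOSPACE alarm so writes are accepted again
etcdctl alarm list
etcdctl alarm disarm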

Spindel commented May 17, 2017

Note that in the output above, we had already manually compacted one of the members before this, but not all of them.
The difference in size between 369 kB and 952 MB is also quite stark.

xiang90 commented May 17, 2017

@Spindel Do you take periodic external backups? How often does this happen?

xiang90 commented May 17, 2017

Is this a cluster migrated from etcd2, or is it a fresh cluster?

Spindel commented May 17, 2017

No external backups taken on this system, and this is the first time.

It's powering a small Kubernetes cluster (a handful of pods only), and this is the first time it's filled up.

I'm honestly no longer sure whether it was migrated or not. It's been reset quite a few times while I've been working with k8s.

xiang90 commented May 17, 2017

No external backups taken on this system, and this is the first time.

good to know.

It's powering a small Kubernetes cluster (a handful of pods only), and this is the first time it's filled up.

Can you keep monitoring the database size growth? In theory it should not grow to even 100 MB for your workload.

Do you keep all of your etcd logs? I am interested in the etcd server log from coreos03.kub.do.modio.se:2379 for the 48 hours before it hit its db size quota.
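If you don't have metrics collection handy, even a crude loop is enough to watch the growth (just a sketch: adjust --endpoints and TLS flags for your setup, and it assumes jq is installed):

export ETCDCTL_API=3

# log each endpoint's reported dbSize once a minute
while true; do
    echo -n "$(date -Is) "
    etcdctl endpoint status --write-out=json | jq -c '[.[] | {ep: .Endpoint, dbSize: .Status.dbSize}]'
    sleep 60
done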

Spindel commented May 17, 2017

I can keep monitoring it, yes.
And I kept that machine (and only that machine) running without a reboot. Attaching a log.

etcd-member.zip

xiang90 commented May 17, 2017

@Spindel Thanks a lot! We will check the log and keep you posted.

xiang90 commented May 17, 2017

### DB size is 4.2MB on 5-01

"MESSAGE" : "2017-05-01 16:33:37.593953 I | etcdserver: wrote database snapshot out [total bytes: 4210688]"
"MESSAGE" : "2017-05-01 16:33:37.715981 I | rafthttp: database snapshot [index: 897664, to: 4f4fe1c600958885] sent out successfully"

### DB size is 16.8 MB on 5-14

"MESSAGE" : "2017-05-14 18:33:09.985720 I | etcdserver: wrote database snapshot out [total bytes: 16801792]"
"MESSAGE" : "2017-05-14 18:33:10.044068 I | rafthttp: database snapshot [index: 4891320, to: 864e9ff97f31db34] sent out successfully"

### A bunch of failed db snapshot sends happened... We do not know the DB size, but on the same day the db grew to 3.8 GB

"MESSAGE" : "2017-05-17 14:52:16.507272 I | rafthttp: start to send database snapshot [index: 5786911, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 14:54:31.768820 W | rafthttp: database snapshot [index: 5786911, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 14:54:32.214706 I | rafthttp: start to send database snapshot [index: 5787406, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:08:01.517050 W | rafthttp: database snapshot [index: 5787406, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:08:40.996441 I | rafthttp: start to send database snapshot [index: 5791076, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:09:30.765033 W | rafthttp: database snapshot [index: 5791076, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:09:31.044130 I | rafthttp: start to send database snapshot [index: 5791358, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:14:07.135746 W | rafthttp: database snapshot [index: 5791358, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:17:37.918353 I | rafthttp: start to send database snapshot [index: 5793924, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:19:02.336803 W | rafthttp: database snapshot [index: 5793924, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:19:02.457658 I | rafthttp: start to send database snapshot [index: 5794375, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:19:39.992919 W | rafthttp: database snapshot [index: 5794375, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"

@heyitsanthony I am not sure whether the snapshot send failures triggered the huge db size or the huge db size triggered the snapshot send failures. Probably we should check how etcd handles a failed snapshot send first.

xiang90 commented May 17, 2017

@Spindel do you know what happened to peer "https://coreos02.kub.do.modio.se"?

Spindel commented May 17, 2017

@xiang90: Yeah, it hit OOM and the OOM killer started reaping random processes.
Sadly, I don't have any logs left from that node (it's been destroyed and re-provisioned since).

xiang90 commented May 17, 2017

Yeah, it hit OOM

Is it possible that etcd takes a lot of RAM because of the large snapshot? What is your RAM limit?

Spindel commented May 17, 2017

On that node? I think it was 1 GiB, but there were some misbehaving, auto-restarting pods on that node that ate RAM pretty aggressively.

xiang90 commented May 17, 2017

@Spindel

coreos01.kub.modio.se:2379, 4f4fe1c600958885, 3.0.10, 952 MB, false, 5829, 5817819

Do you still have the log from this member too?

Spindel commented May 17, 2017

Here you go:
coreos01.kub.modio.se.zip

Spindel commented May 17, 2017

Here's the monitoring graph of available memory on the coreos02 node.

coreos02-memory

As you can see, it was quite stable at around 200 MiB free until it wasn't, and then took a big dive.
I'm not sure what caused it, and I wasn't intending to dig further right now, but the dip should coincide with other events in the logs, I'd guess.

xiang90 commented May 17, 2017

@Spindel Ah... Glad that you are monitoring RSS. Can you also provide the RSS usage on machines
coreos01.kub.modio.se:2379 and coreos03.kub.do.modio.se:2379? Thanks!
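If it isn't graphed anywhere, either of these quick checks gives a rough number (a sketch: the process name, CA path and URL are assumptions about your setup):

# RSS of the etcd process as seen by the kernel, in MiB
ps -o rss= -C etcd | awk '{printf "%.1f MiB\n", $1/1024}'

# or the standard Go process metric that etcd exposes on /metrics
curl -s --cacert /etc/ssl/etcd/ca.pem https://coreos01.kub.modio.se:2379/metrics \
    | grep '^process_resident_memory_bytes'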

xiang90 commented May 17, 2017

From another member

### DB size is 5.2MB on 5-10

"MESSAGE" : "2017-05-10 02:50:27.585453 I | rafthttp: receiving database snapshot [index:3443235, from 864e9ff97f31db34] ..."
"MESSAGE" : "2017-05-10 02:50:27.848332 I | snap: saved database snapshot to disk [total bytes: 5287936]"
"MESSAGE" : "2017-05-10 02:50:27.848383 I | rafthttp: received and saved database snapshot [index: 3443235, from: 864e9ff97f31db34] successfully"


### A bunch of failed db snapshot sends happened... We do not know the DB size, but on the same day the db grew to 952 MB

"MESSAGE" : "2017-05-17 15:07:44.038917 I | rafthttp: start to send database snapshot [index: 5790785, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:08:45.918368 W | rafthttp: database snapshot [index: 5790785, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:19:36.991778 I | rafthttp: start to send database snapshot [index: 5794558, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:20:04.962841 W | rafthttp: database snapshot [index: 5794558, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:22:04.264811 I | rafthttp: start to send database snapshot [index: 5795340, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:22:29.757255 I | rafthttp: start to send database snapshot [index: 5795455, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:22:55.727348 W | rafthttp: database snapshot [index: 5795455, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:22:56.020457 I | rafthttp: start to send database snapshot [index: 5795598, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:23:19.094247 I | rafthttp: start to send database snapshot [index: 5795715, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:24:33.002411 W | rafthttp: database snapshot [index: 5795598, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:24:33.016531 W | rafthttp: database snapshot [index: 5795340, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:24:41.389751 I | rafthttp: start to send database snapshot [index: 5796157, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:24:46.281904 W | rafthttp: database snapshot [index: 5795715, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:24:46.371952 I | rafthttp: start to send database snapshot [index: 5796178, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:44:34.083107 W | rafthttp: database snapshot [index: 5796157, to: 864e9ff97f31db34] failed to be sent out (stopped)"
"MESSAGE" : "2017-05-17 15:44:34.083139 W | rafthttp: database snapshot [index: 5796178, to: 864e9ff97f31db34] failed to be sent out (stopped)"

@heyitsanthony Interesting... Both nodes with large db sizes tried to send snapshots to 864e9ff97f31db34 but failed. It seems to me the root cause is the snapshot sending.

@Spindel Being greedy... Logs from coreos01.kub.do.modio.se:2379?

Spindel commented May 17, 2017

Sadly, it seems I didn't have RSS monitoring set up on coreos01.kub.modio.se (only ping and uptime).

Attaching graph for coreos03
coreos03

Spindel commented May 17, 2017

And here are the logs for coreos01.kub.do.modio.se
coreos01.kub.do.modio.se.zip

Spindel commented May 17, 2017

Sadly, our monitoring doesn't capture proper per-process CPU and memory data inside the cluster (no proper process counts, etc.) since it runs as a DaemonSet pod.

A better way of running that would be nice.

Anyhow, I'm hitting the sack, it's way too late in my timezone. Hope the info helps!

xiang90 commented May 17, 2017

@heyitsanthony

Strangely enough... the good member (coreos01.kub.do.modio.se:2379) also failed to send a few snapshots on the same day.

"MESSAGE" : "2017-05-17 15:08:20.434193 I | rafthttp: start to send database snapshot [index: 5790966, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:08:25.710143 I | rafthttp: start to send database snapshot [index: 5790995, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:08:35.739152 W | rafthttp: database snapshot [index: 5790966, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:08:35.741566 W | rafthttp: database snapshot [index: 5790995, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:11:01.644063 I | rafthttp: start to send database snapshot [index: 5791831, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:12:55.774031 W | rafthttp: database snapshot [index: 5791831, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:12:56.042411 I | rafthttp: start to send database snapshot [index: 5792449, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:13:56.535221 I | rafthttp: start to send database snapshot [index: 5792767, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:14:07.144550 W | rafthttp: database snapshot [index: 5792767, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:14:08.049084 I | rafthttp: start to send database snapshot [index: 5792829, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:14:18.592428 W | rafthttp: database snapshot [index: 5792829, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:14:18.853030 I | rafthttp: start to send database snapshot [index: 5792885, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:14:40.492684 W | rafthttp: database snapshot [index: 5792449, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:14:41.587673 I | rafthttp: start to send database snapshot [index: 5793010, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:15:02.155622 W | rafthttp: database snapshot [index: 5792885, to: 864e9ff97f31db34] failed to be sent out (write tcp [2a03:b0c0:3:d0::43bb:3001]:53990->[2a03:b0c0:3:d0::43bb:4001]:2380: write: connection reset by peer)"
"MESSAGE" : "2017-05-17 15:15:02.984109 I | rafthttp: start to send database snapshot [index: 5793123, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:15:38.859376 I | rafthttp: start to send database snapshot [index: 5793301, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:15:48.974015 W | rafthttp: database snapshot [index: 5793010, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:15:49.190946 I | rafthttp: start to send database snapshot [index: 5793357, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:16:10.765930 W | rafthttp: database snapshot [index: 5793123, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:16:11.682791 I | rafthttp: start to send database snapshot [index: 5793474, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:16:12.616413 W | rafthttp: database snapshot [index: 5793357, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:16:12.825122 I | rafthttp: start to send database snapshot [index: 5793480, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:16:54.197420 W | rafthttp: database snapshot [index: 5793474, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:16:54.335781 I | rafthttp: start to send database snapshot [index: 5793696, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:16:55.585107 W | rafthttp: database snapshot [index: 5793301, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:16:55.757656 I | rafthttp: start to send database snapshot [index: 5793702, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:17:52.729687 W | rafthttp: database snapshot [index: 5793696, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:17:58.294753 W | rafthttp: database snapshot [index: 5793480, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:18:20.395389 W | rafthttp: database snapshot [index: 5793702, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:19:58.508986 I | rafthttp: start to send database snapshot [index: 5794668, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:20:22.196328 I | rafthttp: start to send database snapshot [index: 5794790, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:20:31.653539 W | rafthttp: database snapshot [index: 5794668, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:20:32.068761 I | rafthttp: start to send database snapshot [index: 5794838, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:20:48.079427 W | rafthttp: database snapshot [index: 5794838, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:20:48.206504 I | rafthttp: start to send database snapshot [index: 5794941, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:20:53.611956 W | rafthttp: database snapshot [index: 5794790, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:20:53.720563 I | rafthttp: start to send database snapshot [index: 5794966, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:20:58.019918 W | rafthttp: database snapshot [index: 5794941, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:20:58.217332 I | rafthttp: start to send database snapshot [index: 5794995, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:21:31.398398 W | rafthttp: database snapshot [index: 5794966, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:21:32.911986 I | rafthttp: start to send database snapshot [index: 5795176, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:21:36.629722 W | rafthttp: database snapshot [index: 5794995, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:21:37.447187 I | rafthttp: start to send database snapshot [index: 5795203, to 864e9ff97f31db34]..."
"MESSAGE" : "2017-05-17 15:21:58.303013 W | rafthttp: database snapshot [index: 5795176, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"
"MESSAGE" : "2017-05-17 15:22:29.421087 W | rafthttp: database snapshot [index: 5795203, to: 864e9ff97f31db34] failed to be sent out (unexpected http status Internal Server Error while posting to \"https://coreos02.kub.do.modio.se:2380/raft/snapshot\")"

Spindel commented May 17, 2017

Note that before I ran the status commands and so on, I had manually compacted coreos01.kub.do.modio.se, as it was also quite large.

I'm not sure it should be considered a "good" node in this case; it's just that I didn't keep all the logs from that part.

xiang90 commented May 17, 2017

@Spindel Ah, right. I looked into the log and found that coreos01.kub.do.modio.se also had a HUGE snapshot at some point. So the snapshot failure theory is still possible.

Spindel commented May 18, 2017

I'd think it's very plausible.

To reproduce, you can probably start a three-node cluster where one of the nodes has < 500 MiB RAM, and then put various memory hogs on the low-RAM node, as sketched below.
Not a single big RAM hog, because then the OOM killer would reap it; you want to create memory pressure, not kill things.
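Something like this on the low-RAM node should generate the right kind of pressure (a sketch, assuming the stress tool is available; worker counts and sizes are guesses and will need tuning):

# several modest allocators instead of one big one, so the OOM killer only
# reaps individual workers and the pressure keeps coming back
while true; do
    stress --vm 4 --vm-bytes 64M --vm-hang 10 --timeout 60s
    sleep 5
done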

@wojtek-t

/subscribe

xiang90 commented May 22, 2017

@Spindel Is the db size increasing again?

Spindel commented May 23, 2017

Since I permanently removed the suspected problematic host from the cluster, it doesn't appear to grow uncontrollably anymore:

endpoint status gives the following:

coreos03.kub.do.modio.se:2379, aea30ab9e1e7736e, 3.0.10, 5.6 MB, false, 8278, 7713102
coreos01.kub.modio.se:2379, 4f4fe1c600958885, 3.0.10, 5.6 MB, false, 8278, 7713102
coreos01.kub.do.modio.se:2379, 898e7e618204ed52, 3.0.10, 5.7 MB, false, 8278, 7713102
coreos05.kub.modio.se:2379, c1bd63d6fb4a10ea, 3.0.10, 5.7 MB, true, 8278, 7713104

Which, while still appearing to be more than a 100x bloat factor, is acceptable.

xiang90 commented May 23, 2017

@Spindel

I discussed this issue with @heyitsanthony yesterday. We can see how snapshot issues contribute to the huge db size, but they should not bring it up to 1 GB+ in that short amount of time.

So I dug into the logs again and found the root cause for this case.

"MESSAGE" : "2017-05-14 16:07:28.319813 I | mvcc: finished scheduled compaction at 1815026 (took 1.573214ms)"
"MESSAGE" : "2017-05-14 16:12:28.454798 I | mvcc: finished scheduled compaction at 1815435 (took 912.595µs)"
"MESSAGE" : "2017-05-14 16:17:28.522506 I | mvcc: finished scheduled compaction at 1815827 (took 1.691996ms)"
"MESSAGE" : "2017-05-14 16:22:28.571924 I | mvcc: finished scheduled compaction at 1816216 (took 1.743819ms)"

// No compaction for ~3 days; 300K+ new revisions * ~4 kB each ~= 1 GB+ of data.

"MESSAGE" : "2017-05-17 11:36:45.675060 I | mvcc: finished scheduled compaction at 2169745 (took 20.935680091s)"
"MESSAGE" : "2017-05-17 11:41:24.829663 I | mvcc: finished scheduled compaction at 2170257 (took 33.872468ms)"
"MESSAGE" : "2017-05-17 11:46:24.824949 I | mvcc: finished scheduled compaction at 2170744 (took 2.118377ms)"
"MESSAGE" : "2017-05-17 11:51:24.838707 I | mvcc: finished scheduled compaction at 2171255 (took 1.376061ms)"
"MESSAGE" : "2017-05-17 11:56:24.856919 I | mvcc: finished scheduled compaction at 2171767 (took 2.451828ms)"
"MESSAGE" : "2017-05-17 12:01:24.896190 I | mvcc: finished scheduled compaction at 2172276 (took 1.796448ms)"
"MESSAGE" : "2017-05-17 12:06:24.967299 I | mvcc: finished scheduled compaction at 2172788 (took 1.330718ms)"
"MESSAGE" : "2017-05-17 12:11:25.009440 I | mvcc: finished scheduled compaction at 2173296 (took 2.343877ms)"

The compaction somehow stopped for 3 days. The same thing happened on all the other members in the cluster. The Kubernetes master should compact etcd every 5 minutes or so.

Can you check your API server log to see if anything happened to it?
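As a stopgap until the apiserver side is sorted out, you could compact periodically yourself (a sketch: assumes etcdctl v3 and jq, with --endpoints/TLS flags omitted; etcd's --auto-compaction-retention flag in newer etcd versions, or the apiserver's --etcd-compaction-interval flag if your version has it, would be the cleaner fix):

export ETCDCTL_API=3

# compact up to the current revision every 5 minutes, roughly what the apiserver is expected to do
while true; do
    rev=$(etcdctl endpoint status --write-out=json | jq -r '.[0].Status.header.revision')
    etcdctl compaction "$rev"
    sleep 300
done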

Spindel commented May 23, 2017 via email

xiang90 commented May 31, 2017

@Spindel

Let's track the issue here: #7944.

I am going to create an issue on the k8s repo about the missing compactions (which I believe is the main cause of your db growth).

xiang90 closed this as completed May 31, 2017
Spindel commented May 31, 2017

@xiang90 Uh. You just linked issue #7944 to issue #7944 and closed #7944...

Did you mean #8009?

xiang90 commented May 31, 2017

@Spindel sorry. yea. #8009

Spindel commented May 31, 2017

ack. subscribing!

xiang90 commented May 31, 2017

@Spindel if you ever meet the same problem again, please do comment on #8009. Thanks!
