Use machine type n1-standard-2 to avoid OOM killing #17743

Closed
wants to merge 1 commit

Conversation

@bart0sh (Contributor) commented May 28, 2020

Jobs that create 105 pods on COS are regularly triggering the
kernel OOM killer. That causes job failures.

Used the n1-standard-2 instance type with 7.5 GB of RAM to give
test processes more memory.
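
(For context, the memory difference between the two machine types can be checked with gcloud; this is just a sketch, with an arbitrarily chosen zone, and the expected sizes noted in comments.)

$ gcloud compute machine-types describe n1-standard-1 --zone=us-central1-b --format='value(memoryMb)'  # 3840 MB (3.75 GB)
$ gcloud compute machine-types describe n1-standard-2 --zone=us-central1-b --format='value(memoryMb)'  # 7680 MB (7.5 GB)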

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 28, 2020
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bart0sh
To complete the pull request process, please assign derekwaynecarr
You can assign the PR to them by writing /assign @derekwaynecarr in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MHBauer (Contributor) commented May 28, 2020

What does that look like in the logs? I'm looking at https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-benchmark which uses this config.

@bart0sh (Contributor Author) commented May 28, 2020

Here is what I could find in the latest logs:

./_artifacts/n1-standard-1-cos-dev-83-13020-12-0-e2996693-system.log:May 28 11:51:24 n1-standard-1-cos-dev-83-13020-12-0-e2996693 kernel:  oom_kill_process+0xb1/0x280
./_artifacts/n1-standard-1-cos-dev-83-13020-12-0-e2996693-system.log:May 28 11:51:24 n1-standard-1-cos-dev-83-13020-12-0-e2996693 kernel:  ? oom_evaluate_task+0x137/0x160
...
./_artifacts/n1-standard-1-cos-dev-83-13020-12-0-e2996693-system.log:May 28 11:51:25 n1-standard-1-cos-dev-83-13020-12-0-e2996693 kernel: oom_reaper: reaped process 1860 (e2e_node.test), now anon-rss:0kB, file-rss:0kB, shmem-rss:768kB
...
./_artifacts/n1-standard-1-cos-dev-83-13020-12-0-e2996693-system.log:May 28 11:51:25 n1-standard-1-cos-dev-83-13020-12-0-e2996693 kubelet[1645]: I0528 11:51:18.295644    1645 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"n1-standard-1-cos-dev-83-13020-12-0-e2996693", UID:"n1-standard-1-cos-dev-83-13020-12-0-e2996693", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'SystemOOM' System OOM encountered, victim process: e2e_node.test, pid: 1860

This is an extreme case, as the test process itself was killed. Sometimes it's less obvious: the OOM killer kills runc and even seemingly unrelated processes.
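
(The OOM evidence above can be pulled out of the job artifacts with a plain grep over the system logs, for example:)

$ grep -i oom ./_artifacts/*-system.log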

@spiffxp (Member) commented May 28, 2020

/hold
I'm wary of "just increase resources" fixes; it could be that we're hiding a legitimate performance/resource-usage regression.

/cc @karan
I would be curious to get some input from folks with COS expertise

@k8s-ci-robot k8s-ci-robot requested a review from karan May 28, 2020 19:45
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 28, 2020
@spiffxp (Member) commented May 28, 2020

For example, I might be ok with this as a temporary / unblocking fix if there is a commitment to get back under the threshold. But I don't think we should just bump resources and never look back.

@spiffxp (Member) commented May 28, 2020

/cc @bsdnet

@k8s-ci-robot (Contributor):

@spiffxp: GitHub didn't allow me to request PR reviews from the following users: bsdnet.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @bsdnet

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@karan (Contributor) commented May 28, 2020

+1 to what @spiffxp said.

What jobs are scheduled on the node? What is their resource consumption? Can we instead tune them rather than double the machine size itself?

@bsdnet (Contributor) commented May 29, 2020

For this issue, we need to explore more: why 105 was picked, and whether system memory usage has increased (systemd, containerd, runc) or there is a memory leak somewhere. If there are steps to run this test specifically, I can help debug in the background.

@bart0sh (Contributor Author) commented May 29, 2020

Sure, I'll try to investigate this further.

Just want to point out that the n1-standard-2 machine type is not something new here. It started being used in this config 2.5 years ago.
Currently, two configurations use it:

Does anybody know what was the reason for this?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 29, 2020
@bsdnet (Contributor) commented May 29, 2020

Does anybody know what was the reason for this?

I do not know. But when I read the code, I came across the following comment:
https://github.com/kubernetes/kubernetes/blob/b0c1fd19fcb6cc508bb6aa461594eac9e456960a/test/e2e_node/runner/remote/run_remote.go#L132
Only the benchmark is supposed to use the machine type.

@MHBauer (Contributor) commented May 29, 2020

I'm wondering if these need to run at all anymore.

Tracing through history: the original proto-KEP / pre-KEP is kubernetes/enhancements#83.
It was added to help out with the max-pods-per-node defaults: kubernetes/kubernetes#23349 (comment)

The tests are in this file: https://github.com/kubernetes/kubernetes/blame/master/test/e2e_node/density_test.go#L118-L156

It looks like these results fill in http://node-perf-dash.k8s.io/#/builds
That dashboard seems to be retired: https://github.com/kubernetes-retired/contrib/blob/master/node-perf-dash/README.md

It looks like max-pods was last updated to 110 a long time ago: https://github.com/kubernetes/kubernetes/pull/21361/files

@bsdnet (Contributor) commented May 30, 2020

@MHBauer This is good info. Unfortunately, when I asked around, it was hard to find out why those numbers are there today. The OOM killer kicking in is normal when the system is under memory pressure; my concern is whether runc should be the one being picked.

@bart0sh (Contributor Author) commented Jun 2, 2020

@bsdnet I've investigated it a bit further. One test (--focus="create 105 pods with 0s? interval [Benchmark]") runs more or less OK on cos-69-10895-385-0 and fails on cos-81-12871-119-0.

I was running this test on n1-standard-1 instances with cos-69 and cos-81 and watching the free -h output during the run.

On cos-69, the minimum free memory was 938Mi:

ed@n1-standard-1-cos-69-10895-385-0-856f4264 /tmp/node-e2e-20200602T115711 $ free -h
              total        used        free      shared  buff/cache   available
Mem:          3.6Gi       1.1Gi       938Mi       972Mi       1.6Gi       1.4Gi
Swap:            0B          0B          0B

On cos-81 it was 112Mi, and after that the instance hung, so I couldn't type anything.

ed@n1-standard-1-cos-81-12871-119-0-d0fc1801 ~ $ free -h
              total        used        free      shared  buff/cache   available
Mem:          3.6Gi       2.9Gi       112Mi       532Mi       619Mi        41Mi
Swap:            0B          0B          0B

After some time the instance was available again, and it turned out that the kernel OOM killer had killed the cadvisor and e2e_node.test processes:

ed@n1-standard-1-cos-81-12871-119-0-d0fc1801 ~ $ dmesg |grep -i oom_reaper
[  508.340021] oom_reaper: reaped process 1978 (cadvisor), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  563.531990] oom_reaper: reaped process 1802 (e2e_node.test), now anon-rss:0kB, file-rss:0kB, shmem-rss:2296kB

I used the master branch for this test. The issue is reproducible almost 100% of the time.

Any suggestions on how to continue? I can find out the minimum number of pods that triggers this issue on cos-81 if that helps.
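
(For reference, a rough sketch of the reproduction using the standard node-e2e remote workflow; the make variables below are assumptions and their exact spelling may differ between branches.)

# run the benchmark against a specific COS image
$ make test-e2e-node REMOTE=true IMAGE_PROJECT=cos-cloud IMAGES=cos-81-12871-119-0 \
    FOCUS='create 105 pods with 0s? interval \[Benchmark\]'

# in a second SSH session on the instance, watch memory while the test runs
$ watch -n 5 free -h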

@bart0sh (Contributor Author) commented Jun 2, 2020

Lists of the most memory-consuming processes on both instances:

cos-81:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                   
  50280 root      20   0   11.1g 415796  84460 S   1.7  11.0   0:05.83 e2e_node.test                                                                                             
    310 root      20   0 1944200 139496   2160 S  10.0   3.7   1:21.60 dockerd                                                                                                   
  50150 root      20   0 1623228 104092  63032 S   5.3   2.7   0:04.91 kubelet                                                                                                   
  49992 root      20   0  844920  91156  69480 S   0.0   2.4   0:00.40 e2e_node.test                                                                                             
    304 root      20   0 2522160  40716      0 S   1.7   1.1   0:09.16 containerd                                                                                                
     94 root      20   0  177084  30312  29548 S   0.7   0.8   0:06.56 systemd-journal                                                                                           
  54279 root      20   0  744364  28104  17132 S   1.7   0.7   0:00.34 cadvisor           

cos-69:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                   
 181663 root      20   0   10.8g 406892  86520 S   2.0  10.7   0:05.81 e2e_node.test                                                                                             
     91 root      20   0  319452 210280 209788 S   0.0   5.6   0:55.59 systemd-journal                                                                                           
      1 root      20   0  218364 121224   5096 S   0.0   3.2   3:26.80 systemd                                                                                                   
 181469 root      20   0  517168  93324  69268 S   0.0   2.5   0:00.36 e2e_node.test                                                                                             
 181590 root      20   0  737092  85860  64240 S   0.7   2.3   0:00.68 kubelet                                                                                                   
    306 root      20   0 1444100  61816  22000 S   0.3   1.6   2:48.58 dockerd                                                                                                   
 181730 root      20   0  728228  30364  17484 S   1.7   0.8   0:00.41 cadvisor         
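
(Snapshots like the above can be captured non-interactively; a sketch, assuming procps top:)

$ top -b -o %MEM -n 1 | head -n 15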

@bart0sh (Contributor Author) commented Jun 3, 2020

I've tested this with different COS images. It looks like the test starts failing on cos-dev-73-11636-0-0.

Here is a list of images I've tested:

  • cos-69-10895-385-0 works
  • cos-73-11647-534-0 doesn't work
  • cos-stable-71-11151-71-0 works
  • cos-stable-72-11316-171-0 works
  • cos-dev-73-11391-0-0 works
  • cos-dev-73-11517-0-0 works
  • cos-dev-73-11553-0-0 works
  • cos-dev-73-11636-0-0 doesn't work
  • cos-dev-73-11647-18-0 doesn't work
  • cos-beta-73-11647-35-0 doesn't work
  • cos-73-11647-112-0 doesn't work
  • cos-73-11647-559-0 doesn't work
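
(A sketch of how such a bisection can be scripted, reusing the assumed make invocation from the earlier comment; the image names come from the list above:)

$ for img in cos-dev-73-11553-0-0 cos-dev-73-11636-0-0 cos-beta-73-11647-35-0; do
    make test-e2e-node REMOTE=true IMAGE_PROJECT=cos-cloud IMAGES="$img" \
      FOCUS='create 105 pods with 0s? interval \[Benchmark\]' || echo "$img: FAILED"
  done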

Release notes for cos-dev-73-11636-0-0 (taken from the Container-Optimized OS release notes):

Date:           Jan 24, 2019
Kernel:         ChromiumOS-4.14
Kubernetes:     v1.13.2
Docker:         v18.09.0
Changelog (vs 73-11553-0-0):
    * Made containerd run as a standalone systemd service.
    * Updated the built-in kubelet to 1.13.2.
    * Reenabled kernel.softlockup_all_cpu_backtrace sysctl.
    * Disabled the CONFIG_DEVMEM configuration option in the kernel.
    * Enabled kernel module signing.
    * Installed a new package keyutils.
    * Updated mdadm to 4.1.

@MHBauer (Contributor) commented Jun 4, 2020

I don't know if it's the root cause, but the containerd shim has gotten a little bit fatter over time. Maybe just enough to throw it over the edge.

I think we need to take a step back and look at the contents and users of the file a bit more deeply. I see duplication now that the image references have all been updated to the most recent versions. I also think we could probably modify the caller to reduce the duplication.

I'm not sure if whoever relies on these outputs is paying attention. @lorqor

@bsdnet (Contributor) commented Jun 5, 2020

Thanks @bart0sh.
The following change looks suspicious:

  • Made containerd run as a standalone systemd service.

@bart0sh (Contributor Author) commented Jun 5, 2020

@bsdnet

The following change looks suspicious:

  • Made containerd run as a standalone systemd service.

What can we do about it?

In my opinion, we still have two short-term choices:

  • decrease the number of pods created in the test
  • increase the amount of memory on the instance (this PR)

Any other ideas?

@bsdnet (Contributor) commented Jun 5, 2020

I think for now we need to "decrease the number of pods created in the test".
Containerd will be independent in the future. However, I am surprised to see
a ~15% impact: (105-90)/105.

@bart0sh (Contributor Author) commented Jun 5, 2020

@bsdnet

However, I am surprised to see a ~15% impact: (105-90)/105.

It didn't work with 100 pods. I thought that a 10% lower number would give us enough memory and a safety buffer. I can check whether it works with 95 if that matters.

@bart0sh (Contributor Author) commented Jun 5, 2020

@bsdnet I've tried increasing the number of pods to 95. It triggered the OOM killer, which killed cadvisor and e2e_node.test.

Here is a snapshot of memory consumption around the peak, just before the OOM killer starts its job. Note that 98.7% of memory has been consumed.


top - 09:26:17 up 7 min,  1 user,  load average: 71.86, 23.30, 8.49
Tasks: 474 total,   1 running, 469 sleeping,   0 stopped,   4 zombie
%Cpu(s): 30.3 us, 32.4 sy,  0.3 ni,  0.0 id, 37.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 98.7/3697.2   [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
MiB Swap:  0.0/0.0      [                                                                                                    ]

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                   
   1814 root      20   0   11.2g 432152  88004 S   2.4  11.4   0:12.46 e2e_node.test                                                                                             
    298 root      20   0 1950748 149912      0 S   0.2   4.0   0:42.91 dockerd                                                                                                   
   1600 root      20   0 1625568 133092  65312 S   4.4   3.5   0:16.31 kubelet                                                                                                   
    756 root      20   0  853820 125136  70884 S   0.6   3.3   0:02.28 e2e_node.test                                                                                             
   1976 root      20   0  778108  43952     64 S   1.7   1.2   0:08.62 cadvisor                                                                                                  
    292 root      20   0 2333396  33328      0 S   0.2   0.9   0:02.01 containerd                                                                                                
  15211 root      20   0  599944  20528   2180 D   0.8   0.5   0:00.08 containerd                                                                                                
  15159 root      20   0  599944  20460   2108 D   0.6   0.5   0:00.08 containerd                                                                                                
  15174 root      20   0  599944  20316   1976 D   0.6   0.5   0:00.08 containerd                                                                                                
  15167 root      20   0  599944  20312   1968 D   0.8   0.5   0:00.08 containerd                                                                                                
  15136 root      20   0  599944  20276   1944 D   0.9   0.5   0:00.09 containerd                                                                                                
  15219 root      20   0  599944  20240   1888 D   0.6   0.5   0:00.08 containerd                                                                                                
  15201 root      20   0  599944  20228   1880 D   0.8   0.5   0:00.08 containerd                                                                                                
  15150 root      20   0  525892  19892   1600 D   0.3   0.5   0:00.05 containerd                                                                                                
  15181 root      20   0  525892  19848   1560 D   0.3   0.5   0:00.05 containerd                                                                                                
  15214 root      20   0  525892  19792   1500 D   0.2   0.5   0:00.04 containerd                                                                                                
  15192 root      20   0  599944  19748   1408 D   0.6   0.5   0:00.07 containerd                                                                                                
  15198 root      20   0  598536  19568   1308 D   0.8   0.5   0:00.08 containerd                                                                                                
  15157 root      20   0  599944  19560   1216 D   0.9   0.5   0:00.09 containerd                                                                                                
  15163 root      20   0  599944  19496   1852 D   0.6   0.5   0:00.07 containerd                                                                                                
  15199 root      20   0  599944  19348   1092 D   1.5   0.5   0:00.14 containerd                                                                                                
  15180 root      20   0  598536  19332   1080 D   0.8   0.5   0:00.08 containerd                                                                                                
  15160 root      20   0  599944  19284   1024 D   0.6   0.5   0:00.07 containerd                                                                                                
  15173 root      20   0  599880  19264   1004 D   1.7   0.5   0:00.14 containerd                                                                                                
  15212 root      20   0  599944  19196   1568 D   0.6   0.5   0:00.07 containerd                                                                                                
  15166 root      20   0  599944  19180    920 D   0.3   0.5   0:00.05 containerd                                                                                                
  15178 root      20   0  599944  19124    824 D   0.6   0.5   0:00.07 containerd                                                                                                
  15179 root      20   0  599944  19116   1544 D   0.5   0.5   0:00.07 containerd                                                                                                
  15188 root      20   0  599944  19012    720 D   0.6   0.5   0:00.07 containerd                                                                                                
  15128 root      20   0  599944  18996    720 D   0.6   0.5   0:00.08 containerd                                                                                                
  15162 root      20   0  599944  18940   1328 D   0.5   0.5   0:00.07 containerd                                                                                                
  15218 root      20   0  599944  18880   1264 D   0.6   0.5   0:00.07 containerd                                                                                                
  15158 root      20   0  599944  18868   1248 D   0.6   0.5   0:00.07 containerd                                                                                                
  15204 root      20   0  599944  18848    952 D   0.6   0.5   0:00.07 containerd                                                                                                
  15016 root      20   0  599944  18840   1012 D   0.3   0.5   0:00.08 containerd                                                                                                
...

I agree with you regarding containerd being a culprit here.

With 90 pods, peak memory consumption is around 95%. That makes it possible to avoid triggering the OOM killer, but it's still quite high in my opinion.

@vpickard (Contributor) commented Jun 5, 2020

@bart0sh I think the preferred approach to fixing the benchmark test is your other PR, 91813, which reduces the number of test pods. That means this PR can be closed now.

kubernetes/kubernetes#91813

I also opened this issue for tracking the root cause of the increase in memory consumption.
#17853

@bsdnet (Contributor) commented Jun 6, 2020

Thanks @bart0sh for the detailed info.
From the snapshot you posted, it is about 15% of memory.
Sorry, I have been busy this week and did not have time to respond promptly.

@bart0sh (Contributor Author) commented Jun 6, 2020

@vpickard Closing as suggested. I'll submit another PR to change the job YAML.

@bart0sh (Contributor Author) commented Jun 16, 2020

Reopening, as decreasing the number of pods to 90 is not an option because 100 is the official maximum.

Note that the machine type can be changed again after #17853 is fixed. However, we shouldn't wait for that; we need to fix the broken tests.

@bart0sh bart0sh reopened this Jun 16, 2020
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jun 16, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 16, 2020
@bart0sh (Contributor Author) commented Jun 23, 2020

Closing, as kubernetes/kubernetes#91813 has been merged. Since we decreased the number of pods, there is no need to use n1-standard-2 instances.
