MySQL 5.7 container OOM kill / changing the NOFILE limit #1136

Closed
zlangbert opened this issue Sep 30, 2020 · 7 comments
Labels
type/bug Something isn't working

@zlangbert

zlangbert commented Sep 30, 2020

I am running a 1.17 EKS cluster that we recently tried to switch to bottlerocket nodes. We ran into an issue where MySQL 5.7 containers were being OOM killed. This does not happen on the latest AL2 EKS AMI.

After some searching I came across these issues / PRs:
docker-library/mysql#579
kubernetes-sigs/kind#760
containerd/containerd#3201
containerd/containerd#3202 (this one has since been reverted without addressing the reason it was changed in the first place)

I am having trouble actually testing this change on bottlerocket, but it appears that setting LimitNOFILE to something like 1048576 should fix this issue.
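
For illustration, on a stock systemd host the change I have in mind would be a drop-in along these lines (a sketch only; the path is hypothetical, and whether anything like this is possible on bottlerocket is exactly what I'm asking below):

# /etc/systemd/system/containerd.service.d/limits.conf (hypothetical drop-in path)
[Service]
LimitNOFILE=1048576

followed by a systemctl daemon-reload and a restart of containerd.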

I also found that the mysql:8 image does not exhibit this issue.

So, a couple of questions:

  1. This doesn't necessarily seem like a bottlerocket bug, but I wanted to at least report the issue and see if it is something that could be addressed, especially given the behavior change compared to the AL2 EKS AMI.
  2. Is there any friendly way of making that kind of modification to the containerd unit file without rebuilding the image myself? Even on an already-running box I had trouble testing this because I couldn't find a way to edit the file after sudo sheltie.

What I expected to happen:
I would expect the mysql container to start and use a reasonable amount of memory.

What actually happened:
The mysql container attempts to consume many gigabytes of RAM and is OOM killed when resource limits are placed on it.

How to reproduce the problem:

Run an EKS cluster with bottlerocket nodes and apply the manifest below. The mysql pod will go into a restart loop, being OOM killed over and over. If you remove the resource limits, you can see the pod consume up to 16 GB of RAM while doing nothing.

apiVersion: v1
kind: Pod
metadata:
  name: mysql-test
  namespace: default
spec:
  containers:
    - name: mysql
      imagePullPolicy: Always
      image: 'mysql:5.7'
      env:
        - name: MYSQL_DATABASE
          value: test
        - name: MYSQL_ROOT_PASSWORD
          value: test
      resources:
        limits:
          cpu: 500m
          memory: 500Mi
        requests:
          cpu: 500m
          memory: 300Mi
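
To confirm which limit the container actually inherits, something like the following should work once the pod is running (a sketch, assuming the pod name from the manifest above; the exact numbers depend on the node's containerd unit):

kubectl exec mysql-test -- sh -c 'ulimit -Sn; ulimit -Hn'   # soft and hard NOFILE limits inside the container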

Image I'm using:
bottlerocket 1.0.1 / x86_64 in us-west-2 - ami-0d222c28fd8edb9ff

@zmrow
Contributor

zmrow commented Oct 15, 2020

Thanks for filing this issue! The AL2 EKS AMI sets LimitNOFILE=1048576, which is why you're not seeing the problem there. The issue stems from cri-containerd not providing a mechanism to override the default containerd limit for a container; the default limit is inherited from the containerd process itself (which, in this case, is configured via our unit file).
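
For anyone who wants to confirm this on a node, the inherited value can be checked from a host shell (sudo sheltie) with something like the following; this is a sketch and the exact output varies by release:

systemctl show containerd --property=LimitNOFILE

The "Max open files" row of /proc/<containerd pid>/limits shows the same limit from the process side, which is what container processes end up inheriting.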

Based on the comments in the containerd unit file, setting LimitNOFILE to non-zero can cause performance issues. Given this knowledge, we’d prefer not to change our default setting for containerd since the issue (as of now) looks to be scoped to MySQL and as you mentioned, appears to be fixed in a later version of MySQL. That being said, we’d like to work upstream to help provide better mechanisms to control these limits.

There is also an open Kubernetes issue to make limits configurable.

@zlangbert
Author

Hey @zmrow, thanks for the response! That makes sense and is what I suspected was going on. I agree with your position that changing the default in bottlerocket is not a good solution, though I wonder if providing it as a config option would at least make sense so that people coming from the AL2 EKS AMI have a way to mitigate this issue?

It would be great to see this supported upstream, though it seems like an issue that has been open for a long time without much movement.

As for the other part of my question, is there a recommended way to change this in bottlerocket? It seems like this type of thing might be possible by defining a custom container, or would I definitely need a custom build? Any guidance on the best way to do that would be appreciated.

@zmrow
Contributor

zmrow commented Oct 21, 2020

I did a little digging in the upstream code and found what I think are the right spots to make the change. A good option could be to provide a reasonable, configurable default of 1048576 in containerd-cri. The criService type has a criconfig.Config, which contains the PluginConfig. It seems reasonable to add the field to PluginConfig and then set it in the DefaultConfig() method.
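
Roughly, the shape of the change looks like the sketch below. The field and key names are made up for illustration and are not the actual cri source:

// Sketch only -- illustrative names, not the real containerd/cri fields.
package criconfig

// PluginConfig would gain a knob for the RLIMIT_NOFILE applied to containers,
// rather than containers inheriting whatever limit the containerd process has.
type PluginConfig struct {
    // ... existing fields elided ...

    // ProcessRLimitNoFile is the NOFILE limit to set on container processes
    // (hypothetical field name).
    ProcessRLimitNoFile uint64 `toml:"process_rlimit_no_file" json:"processRLimitNoFile"`
}

// DefaultConfig would then carry the 1048576 default mentioned above.
func DefaultConfig() PluginConfig {
    return PluginConfig{
        // ... existing defaults elided ...
        ProcessRLimitNoFile: 1048576,
    }
}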

We’ll take a look at the feasibility of carrying a short-term patch to our containerd package, while we investigate a proper upstream fix.

@zmrow
Contributor

zmrow commented Oct 23, 2020

@zlangbert Just to follow up here - I created a local patch for cri to be evaluated by the team. If accepted it should be in our next Bottlerocket release. The intent is to upstream this patch, but as mentioned in the PR, there is some additional work to be done.

@samuelkarp samuelkarp added the type/bug Something isn't working label Nov 13, 2020
@bcressey bcressey added the status/needs-triage Pending triage or re-evaluation label Dec 11, 2020
@jhaynes jhaynes removed the status/needs-triage Pending triage or re-evaluation label Dec 11, 2020
@zlangbert
Author

@zmrow We have switched back to bottlerocket and have had no trouble with 1.0.3. Thank you so much for your help!

@zmrow
Contributor

zmrow commented Dec 14, 2020

@zlangbert Awesome, glad to hear it. I'll close this issue for now, but feel free to re-open or start a new issue should you see further problems.

I've opened #1240 to track the upstream-ing of our cri patch.

@zmrow zmrow closed this as completed Dec 14, 2020
@rverma-dev

@zmrow we are also facing issues with the rlimit on our Elasticsearch clusters; the current open files limit is shown as (-n) 65536. Is there a way to use the increased limit? The clusters are bootstrapped using eksctl.
