
Set resource limits for containers #68

Closed
bigg01 opened this issue May 29, 2020 · 15 comments · Fixed by #176
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@bigg01

bigg01 commented May 29, 2020

As a Platform Engineer, I need to control CPU and memory usage per container.
Please add resource limits:

resources:
  requests:
    cpu: 100m
    memory: 100Mi
  limits:
    memory: 200Mi
    cpu: 400m

The AIDE pods had been running for a day and used 1.6 GB of memory for no apparent reason.

cheers

@jhrozek
Contributor

jhrozek commented Jun 2, 2020

Thank you for filing the issue. We'll look into it next sprint.
While the resource limits are something we wanted to set either way, we also want to see if we can find the root cause of the leak.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) Oct 16, 2020
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label Nov 15, 2020
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@felixkrohn

Would it be possible to re-open this issue? After a week of running, the pods consume about 3 GiB of RAM each.
A workaround would be to set namespaced defaults, but I find this less elegant.
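For context, a minimal sketch of such a namespaced default via a LimitRange, assuming it is created in the namespace where the AIDE pods run (the namespace name and values here are illustrative, not taken from the thread):

apiVersion: v1
kind: LimitRange
metadata:
  name: aide-default-limits
  namespace: openshift-file-integrity
spec:
  limits:
    - type: Container
      default:            # applied as limits to containers that set none
        cpu: 400m
        memory: 200Mi
      defaultRequest:     # applied as requests to containers that set none
        cpu: 100m
        memory: 100Mi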

JAORMX reopened this Apr 8, 2021
@JAORMX
Contributor

JAORMX commented Apr 8, 2021

@felixkrohn what version are you using?

@felixkrohn

0.1.13 as distributed by Red Hat on OperatorHub (image: http://quay.io/file-integrity-operator/file-integrity-operator:0.1.13)

JAORMX changed the title from "Set ressoure limits for containers" to "Set resource limits for containers" Apr 8, 2021
@felixkrohn

felixkrohn commented Apr 12, 2021

Is there anything I can do to help you debug this? (We're not yet running it in production.)
[Screenshot: Prometheus time series showing the pods' memory usage]

@JAORMX
Contributor

JAORMX commented Apr 12, 2021

@felixkrohn we'll look into it.

JAORMX added the kind/bug label and removed the lifecycle/rotten label Apr 12, 2021
@mrogers950
Contributor

@felixkrohn would you be able to follow the steps outlined in https://mrogers950.gitlab.io/openshift/2021/04/12/fio-profile/ ?
It will enable pprof for the ds pods, but it requires a container build from source. If you can capture the heap data at a few points (for example, once a few days in and again the next week), that would be useful for us to take a look at. I've traced the same slow leak myself, and it would be good to have a comparison.
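As a rough illustration of that setup (assuming the rebuilt daemon exposes the standard net/http/pprof endpoint on port 6060, as is typical; the port, names, and paths below are assumptions rather than details from the linked post):

# Hypothetical excerpt of the AIDE daemonset container spec with pprof enabled
ports:
  - name: pprof
    containerPort: 6060
    protocol: TCP
# After port-forwarding to 6060, heap snapshots can be fetched from
# /debug/pprof/heap and saved (e.g. as .gz files) at different points in time.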

@felixkrohn

@mrogers950 Thanks for the great how-to 👍 I got it running and will send you the .gz files next week (don't hesitate to remind me should I forget...)

@felixkrohn

Did the traces help in any way?
Would it be OK to add memory limits (somewhere between 500M and 1000M) to the f-i-o deployment, or do you expect this could cause unwanted side effects or even reduce the reliability of the results?

@mrogers950
Contributor

@felixkrohn yes, thanks for your help! The pprof data shows what I expected: the daemon's actual heap usage is only a small percentage of the total reported by the cluster.
Here it is only about 7 MB in total:
[Screenshot: pprof heap profile]

This coincides with what I found about the reserved space used by the Go runtime, which I tried to outline briefly here: https://mrogers950.gitlab.io/golang/2021/03/12/wild-crazy-golang-mem/
So I believe the high usage will be addressed by golang/go#44167 (referenced by golang/go#43699).

But I think we can now support pod limits properly, because the daemon pods are more robust and should be able to handle an occasional restart by the OOM killer. I'll work on a PR for that.
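A hedged sketch of what such limits could look like on the daemon container (the container name is hypothetical, and the values simply echo the ones discussed earlier in this thread, not the contents of the merged PR):

# Illustrative excerpt of a daemon container spec, not the merged PR
containers:
  - name: daemon            # hypothetical container name
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 400m
        memory: 500Mi       # within the 500M to 1000M range suggested above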

@felixkrohn

Great news! Thanks for the update.
