
Conversation

@kevchu3 (Member) commented Oct 21, 2020

- What I did
The current kubelet log level is set to debug (KUBELET_LOG_LEVEL=4), per this Red Hat Knowledgebase article [1]. For a cluster's default settings, I'm proposing to set the log level to a more appropriate value (KUBELET_LOG_LEVEL=2). The description of the proposed log level is bolded in the table below, and a rough sketch of the change follows it:

| Verbosity | Description |
| --- | --- |
| --v=0 | Generally useful so it is ALWAYS visible to an operator. |
| --v=1 | A reasonable default log level if you don't want verbosity. |
| **--v=2** | **Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level.** |
| --v=3 | Extended information about changes. |
| --v=4 | Debug level verbosity. |
| --v=6 | Display requested resources. |
| --v=7 | Display HTTP request headers. |
| --v=8 | Display HTTP request contents. |
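
A minimal sketch of what this amounts to in the kubelet systemd unit, assuming (as the KUBELET_LOG_LEVEL name suggests) the verbosity is passed to the kubelet through an environment variable; the ExecStart line and remaining flags are illustrative, not the literal diff:

```diff
 # Illustrative fragment of the kubelet.service template (exact contents assumed)
 [Service]
-Environment="KUBELET_LOG_LEVEL=4"
+Environment="KUBELET_LOG_LEVEL=2"
 ExecStart=/usr/bin/hyperkube kubelet \
     --v=${KUBELET_LOG_LEVEL} \
     ...
```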

- How to verify it
Steps to verify are provided in this Knowledgebase article: https://access.redhat.com/solutions/4619431

- Description for the changelog

Setting kubelet log level to recommended defaults

Footnotes

[1] https://access.redhat.com/solutions/4619431

@Fedosin (Contributor) left a comment


/lgtm

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Oct 21, 2020
@cgwalters (Member) commented:
Sounds great to me; every time I have to look at a node's journal, the spam from hyperkube makes it much harder to read. It also makes it more likely that useful information rotates out.

So
/approve
from that PoV.

But I think you need to get this by the node team.
/hold
for that.

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Oct 21, 2020
@openshift-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, Fedosin, kevchu3


Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Oct 21, 2020
@kevchu3 (Member, Author) commented Oct 21, 2020

/retest

@haircommander (Member) commented:

There have definitely been times when we benefited from having the default log level that high. I am leaning towards NACK on changing it. @rphillips @mrunalp @sjenning thoughts?

@cgwalters (Member) commented:

Is the kubelet log level dynamically changeable via e.g. some gRPC call or something? How many cases would be covered if we made it easy for people to frob that on one or more nodes?

What about logging debug-level things to a separate log file, as kube-apiserver does?
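
One way the per-node case could look today, as a rough sketch: a systemd drop-in that overrides the level on a single node, assuming the unit keeps reading KUBELET_LOG_LEVEL; the drop-in name and path are hypothetical, and the kubelet still has to be restarted to pick it up:

```ini
# /etc/systemd/system/kubelet.service.d/20-loglevel.conf  (hypothetical drop-in)
# For Environment=, the last assignment of a variable wins, so this overrides
# the default KUBELET_LOG_LEVEL from the base unit on this node only.
[Service]
Environment="KUBELET_LOG_LEVEL=4"
```

That covers ad-hoc debugging on one node, but not the "change it without a restart" part of the question.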

@mrunalp (Member) commented Oct 22, 2020

I think having it at 4 in CI is very helpful in chasing and fixing some of the races and bugs that we still have. In production, we could go down to a less verbose level.

@cgwalters (Member) commented:

> I think having it at 4 in CI is very helpful in chasing and fixing some of the races and bugs that we still have. In production, we could go down to a less verbose level.

Hmmm. I suppose we could pretty easily teach our CI jobs to always inject a MachineConfig or kubeletconfig to bump this, but I think that's a somewhat dangerous path to go down because it makes our CI not look like production. That's a big trap.
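
For comparison, a rough sketch of what injecting that in CI could look like as a pool-wide MachineConfig, again assuming the unit keeps reading KUBELET_LOG_LEVEL; the object name, drop-in name, and Ignition spec version are assumptions:

```yaml
# Illustrative only: bump kubelet verbosity for every node in the worker pool.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-kubelet-loglevel          # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 2.2.0                        # assumed Ignition spec version for this release
    systemd:
      units:
        - name: kubelet.service
          dropins:
            - name: 20-loglevel.conf        # hypothetical drop-in name
              contents: |
                [Service]
                Environment="KUBELET_LOG_LEVEL=4"
```

Applying or removing it rolls through the pool like any other MachineConfig change, which is exactly the CI-versus-production divergence being flagged here.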

@mrobson commented Nov 5, 2020

The logging volume that level 4 generates into the infra indexes of the logging stack is significant in OCP 4.5. A more dynamic way to control the log level, globally or on a per-node basis for specific debugging, would better serve both CI and users.

@sreber84 commented Nov 5, 2020

I'm also in favor of making this more dynamic so that one can choose whichever log level is best. I can understand that in certain situations it is good to have 4, but in normal production environments we may want 2. This is also based on experience: we have seen environments with about 70 nodes generating about 500 GB of infrastructure logs on a daily basis, overloading the logging stack!

@kevchu3 (Member, Author) commented Nov 5, 2020

/retest

@openshift-merge-robot (Contributor) commented:

@kevchu3: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Rerun command |
| --- | --- | --- |
| ci/prow/okd-e2e-aws | e4fa3e3 | /test okd-e2e-aws |
| ci/prow/e2e-aws | e4fa3e3 | /test e2e-aws |
| ci/prow/e2e-aws-serial | e4fa3e3 | /test e2e-aws-serial |


@openshift-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Nov 6, 2020
@openshift-ci-robot (Contributor) commented:

@kevchu3: PR needs rebase.


@kikisdeliveryservice (Contributor) commented:

superseded by: #2211

@kikisdeliveryservice removed the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Nov 17, 2020
@its-saurabhjain commented:
Is it recommended to have verbosity --v=2 or --v=4 set for an OpenShift production environment? We have an incident where the ES disk was filling very fast, and we are thinking of setting the verbosity to 2 for production. However, a different point of view came up for production: the goal is to reach a root-cause conclusion as quickly as possible and to avoid repeat issues. Lowering the verbosity in production and only increasing it after a failure or degradation event requires the event to re-occur, and changing the verbosity causes the affected nodes to be rebooted as part of the kubelet process restart. That reset could delay seeing the issue, especially if it's related to performance. Any thoughts or recommendations?
