
Conversation

@kevchu3 (Member) commented Oct 21, 2020

- What I did
The current kubelet log level is set to debug (KUBELET_LOG_LEVEL=4), per this Red Hat Knowledgebase article [1]. For a cluster's default settings, I'm proposing to set the log level to a more appropriate value (KUBELET_LOG_LEVEL=2). The description of the proposed log level is bolded in the table below, and a rough sketch of the change follows it:

| Verbosity | Description |
| --- | --- |
| --v=0 | Generally useful so it is ALWAYS visible to an operator. |
| --v=1 | A reasonable default log level if you don't want verbosity. |
| **--v=2** | **Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level.** |
| --v=3 | Extended information about changes. |
| --v=4 | Debug level verbosity. |
| --v=6 | Display requested resources. |
| --v=7 | Display HTTP request headers. |
| --v=8 | Display HTTP request contents. |
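
A minimal sketch of what this amounts to in the kubelet systemd unit, assuming (as the KUBELET_LOG_LEVEL name suggests) the verbosity is passed to the kubelet through an environment variable; the ExecStart line and remaining flags are illustrative, not the literal diff:

```diff
 # Illustrative fragment of the kubelet.service template (exact contents assumed)
 [Service]
-Environment="KUBELET_LOG_LEVEL=4"
+Environment="KUBELET_LOG_LEVEL=2"
 ExecStart=/usr/bin/hyperkube kubelet \
     --v=${KUBELET_LOG_LEVEL} \
     ...
```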

- How to verify it
Steps to verify are provided in this Knowledgebase article: https://access.redhat.com/solutions/4619431

- Description for the changelog

Setting kubelet log level to recommended defaults

Footnotes

[1] https://access.redhat.com/solutions/4619431

@Fedosin (Contributor) left a comment


/lgtm

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Oct 21, 2020
@cgwalters (Member) commented:
Sounds great to me; every time I have to look at a node's journal, the spam from hyperkube makes it much harder to read. It also makes it more likely that useful information rotates out.

So
/approve
from that PoV.

But I think you need to get this by the node team.
/hold
for that.

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Oct 21, 2020
@openshift-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, Fedosin, kevchu3


Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Oct 21, 2020
@kevchu3 (Member, Author) commented Oct 21, 2020

/retest

@haircommander (Member) commented:

There have definitely been times when we benefited from having the default log level that high. I am leaning towards NACK on changing it. @rphillips @mrunalp @sjenning thoughts?

@cgwalters (Member) commented:

Is the kubelet log level dynamically changeable via e.g. some gRPC call or something? How many cases would be covered if we made it easy for people to frob that on one or more nodes?

What about logging debug-level things to a separate log file, as kube-apiserver does?
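
One way the per-node case could look today, as a rough sketch: a systemd drop-in that overrides the level on a single node, assuming the unit keeps reading KUBELET_LOG_LEVEL; the drop-in name and path are hypothetical, and the kubelet still has to be restarted to pick it up:

```ini
# /etc/systemd/system/kubelet.service.d/20-loglevel.conf  (hypothetical drop-in)
# For Environment=, the last assignment of a variable wins, so this overrides
# the default KUBELET_LOG_LEVEL from the base unit on this node only.
[Service]
Environment="KUBELET_LOG_LEVEL=4"
```

That covers ad-hoc debugging on one node, but not the "change it without a restart" part of the question.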

@mrunalp (Member) commented Oct 22, 2020

I think having it at 4 in CI is very helpful in chasing and fixing some of the races and bugs that we still have. In production, we could go down to a less verbose level.

@cgwalters (Member) commented:

> I think having it at 4 in CI is very helpful in chasing and fixing some of the races and bugs that we still have. In production, we could go down to a less verbose level.

Hmmm. I suppose we could pretty easily teach our CI jobs to always inject a MachineConfig or kubeletconfig to bump this, but I think that's a somewhat dangerous path to go down because it makes our CI not look like production. That's a big trap.
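
For comparison, a rough sketch of what injecting that in CI could look like as a pool-wide MachineConfig, again assuming the unit keeps reading KUBELET_LOG_LEVEL; the object name, drop-in name, and Ignition spec version are assumptions:

```yaml
# Illustrative only: bump kubelet verbosity for every node in the worker pool.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-kubelet-loglevel          # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 2.2.0                        # assumed Ignition spec version for this release
    systemd:
      units:
        - name: kubelet.service
          dropins:
            - name: 20-loglevel.conf        # hypothetical drop-in name
              contents: |
                [Service]
                Environment="KUBELET_LOG_LEVEL=4"
```

Applying or removing it rolls through the pool like any other MachineConfig change, which is exactly the CI-versus-production divergence being flagged here.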

@mrobson commented Nov 5, 2020

The logging volume that level 4 generates into the infra indexes of the logging stack is significant in OCP 4.5. A more dynamic way to control the log level, globally or on a per-node basis for specific debugging, would better serve both CI and users.

@sreber84 commented Nov 5, 2020

I'm also in favor of making this more dynamic so that one can choose whichever log level is best. I can understand that in certain situations it is good to have 4, but in normal production environments we may want 2. This is also based on experience: we have seen environments with about 70 nodes generating about 500 GB of infrastructure logs on a daily basis, overloading the logging stack!

@kevchu3 (Member, Author) commented Nov 5, 2020

/retest

@openshift-merge-robot (Contributor) commented:

@kevchu3: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Rerun command |
| --- | --- | --- |
| ci/prow/okd-e2e-aws | e4fa3e3 | /test okd-e2e-aws |
| ci/prow/e2e-aws | e4fa3e3 | /test e2e-aws |
| ci/prow/e2e-aws-serial | e4fa3e3 | /test e2e-aws-serial |


@openshift-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Nov 6, 2020
@openshift-ci-robot (Contributor) commented:

@kevchu3: PR needs rebase.


@kikisdeliveryservice (Contributor) commented:

superseded by: #2211

@kikisdeliveryservice removed the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Nov 17, 2020
@its-saurabhjain commented:
Is it recommended to have verbosity --v=2 or --v=4 set for an OpenShift production environment? We have an incident where the ES disk was filling very fast, and we are thinking of setting the verbosity to 2 for production. However, a different point of view came up for production: the goal is to reach a root-cause conclusion as quickly as possible and to avoid repeat issues. Lowering the verbosity in production and only increasing it after a failure or degradation event requires the event to re-occur, and changing the verbosity causes the affected nodes to be rebooted as part of the kubelet process restart. That reset could delay seeing the issue, especially if it's related to performance. Any thoughts or recommendations?
