-
Notifications
You must be signed in to change notification settings - Fork 213
Bug 1879184: Log object updates and show existing/required diff #561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1879184: Log object updates and show existing/required diff #561
Conversation
52f3764 to
41a7a6a
Compare
9919448 to
6ab7550
Compare
|
/cc @jottofar |
2eb9aca to
8c322cb
Compare
8c322cb to
ae8d5fd
Compare
ae8d5fd to
8809ddb
Compare
|
/retest |
849de5b to
bef335b
Compare
|
/test unit |
bef335b to
dc0f19b
Compare
dc0f19b to
c59a7f8
Compare
|
@vrutkovs: This pull request references Bugzilla bug 1879184, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh |
|
@vrutkovs: This pull request references Bugzilla bug 1879184, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/test unit |
| expected := required.DeepCopy() | ||
| for k, v := range existing.Object { | ||
| if skipKeys.Has(k) { | ||
| expected.Object[k] = v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sort of thing is why I think generically logging diffs is brittle. I'd rather have the Ensure* helpers return a slice of patches instead of the current modified boolean, so we could log the patches "Updating $PROPERTY to $VALUE to clobber $EXISTING_VALUE" instead of having logic about the things we care about clobbering both here and in the Ensure* helpers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging everytime modified is changed may be too verbose. I think we should do that later, on level 4 so that we wouldn't hit master's disks with too much log lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if, instead of twiddling *modified, these helpers return []patch, error, we can log any patch we get at one time, before calling the UPDATE. Should be same number of log lines, and we'd only complain about the in-cluster properties we were trying to stomp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in the opinion of moving these to some verbose mode. We do not need the diff for all CVO runs as I think it will impact the performance of CVO a little. But if we change it some mode which can give more details , that would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only required if we want to see not only that we're hot looping but also why. The latter could be dropped and once we know we're hot looping figuring out the why becomes part of the fix process done off line. Choosing one over the other is a question of how often we believe this is going to occur. If rarely then we probably don't need the "why" in the baseline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CVO can start hotlooping on any non-standard configuration - and its complicated to find out details to fix it.
I'd prefer to include "why" in every run, so that we wouldn't have to request customers to reconfigure CVO to get details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vrutkovs What sort of non-standard configs would they have which are managed by the CVO that we'd want to look into?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can sideload additional manifests during bootstrap
|
/test e2e-agnostic |
|
Failed to install on /retest |
| expected := required.DeepCopy() | ||
| for k, v := range existing.Object { | ||
| if skipKeys.Has(k) { | ||
| expected.Object[k] = v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in the opinion of moving these to some verbose mode. We do not need the diff for all CVO runs as I think it will impact the performance of CVO a little. But if we change it some mode which can give more details , that would be fine.
| } | ||
|
|
||
| klog.V(2).Infof("Updating CRD %s", required.Name) | ||
| klog.V(2).Infof("Updating CRD %s due to diff: %v", required.Name, diff.ObjectDiff(existing, required)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should move these to some higher level of verbosity , what about V4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather use the default one, so that we'd see hotloops in CI upgrade jobs. If we move it to v4 we'd have to reconfigure CVO to get them recorded
|
I am wondering if the current PR will have any performance impact or not. If it is not going to have any performance impact then I am fine to merge it as it is. |
It improves performance by fixing hotloops in unstructured resources. Additional log lines might be printed during sync periods, but its effect is negligible |
LalatenduMohanty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: LalatenduMohanty, vrutkovs The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@vrutkovs: All pull requests linked via external trackers have merged: Bugzilla bug 1879184 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cherrypick release-4.7 |
|
@vrutkovs: new pull request created: #589 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Logging of resource diffs was introduced to detect hotlooping in the CVO[1]. But this increases the verbosity of logging during updates when the resources have changed which is not useful. Hence logging of the diffs is more useful in the reconcile mode. Also the diff is generated with ObjectDiff[2] which is deprecated. The ObjectDiff function now just uses cmp.Diff[3] internally and all instances of ObjectDiff have been replaced with cmp.Diff. [1] - openshift#561 [2] - https://github.com/kubernetes/apimachinery/blob/235edae7dd90601011bbe3bcd6f84f7dc857b034/pkg/util/diff/diff.go#L57 [3] - https://pkg.go.dev/github.com/google/go-cmp/cmp#Diff
Logging of resource diffs was introduced to detect hotlooping in the CVO[1]. But this increases the verbosity of logging during updates when the resources have changed which is not useful. Hence logging of the diffs is more useful in the reconcile mode. Also the diff is generated with ObjectDiff[2] which is deprecated. The ObjectDiff function now just uses cmp.Diff[3] internally and all instances of ObjectDiff have been replaced with cmp.Diff. [1] - openshift#561 [2] - https://github.com/kubernetes/apimachinery/blob/235edae7dd90601011bbe3bcd6f84f7dc857b034/pkg/util/diff/diff.go#L57 [3] - https://pkg.go.dev/github.com/google/go-cmp/cmp#Diff
Logging of resource diffs was introduced to detect hotlooping in the CVO[1]. But this increases the verbosity of logging during updates when the resources have changed which is not useful. Hence logging of the diffs is more useful in the reconcile mode. Also the diff is generated with ObjectDiff[2] which is deprecated. The ObjectDiff function now just uses cmp.Diff[3] internally and all instances of ObjectDiff have been replaced with cmp.Diff. [1] - openshift#561 [2] - https://github.com/kubernetes/apimachinery/blob/235edae7dd90601011bbe3bcd6f84f7dc857b034/pkg/util/diff/diff.go#L57 [3] - https://pkg.go.dev/github.com/google/go-cmp/cmp#Diff
We've been using --v=5 since 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). But it leads to large quantities of noise from client-side throttling [1], and that throttling is V(3): $ grep -n . vendor/k8s.io/client-go/rest/request.go | grep -B7 -A6 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) 598: } 599: if latency > extraLongThrottleLatency { 600: // If the rate limiter latency is very high, the log message should be printed at a higher log level, 601: // but we use a throttled logger to prevent spamming. 602: globalThrottledLogger.Infof("%s", message) 603: } Auditing: $ git --no-pager grep 'klog.V([3-5])' vendor I don't see much that I'd miss. I liked having vendor/github.com/openshift/library-go/pkg/verify's logging back when we made that pivot, but now that that code is old and stable, I'm ok losing it. I've shifted the lib/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V lib) It's mostly hotloop-detection since 40d0a4e (Log object updates and show existing/required diff, 2021-06-03, openshift#561, [2]), and isn't all that noisy since 05e1af7 (Bug 1984414: Log resource diffs on update only in reconcile mode, 2021-07-22, openshift#628, [3]). I've shifted the pkg/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V pkg) It's mostly fairly core stuff, and low-noise except for the per-manifest "Running sync for ..." and "Done syncing for ..." messages [1]. The per-manifest messages can be useful to identify where the CVO is in the sync cycle, so I'm keeping them for now. We may be able to punt them up to higher levels if we get an alternative mechanism for identifying sync cycle positions (e.g. events for task-node completion or task failure). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2034493#c1 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1984414
We've been using --v=5 since 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). But it leads to large quantities of noise from client-side throttling [1], and that throttling is V(3): $ grep -n . vendor/k8s.io/client-go/rest/request.go | grep -B7 -A6 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) 598: } 599: if latency > extraLongThrottleLatency { 600: // If the rate limiter latency is very high, the log message should be printed at a higher log level, 601: // but we use a throttled logger to prevent spamming. 602: globalThrottledLogger.Infof("%s", message) 603: } Auditing: $ git --no-pager grep 'klog.V([3-5])' vendor I don't see much that I'd miss. I liked having vendor/github.com/openshift/library-go/pkg/verify's logging back when we made that pivot, but now that that code is old and stable, I'm ok losing it. I've shifted the lib/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V lib) It's mostly hotloop-detection since 40d0a4e (Log object updates and show existing/required diff, 2021-06-03, openshift#561, [2]), and isn't all that noisy since 05e1af7 (Bug 1984414: Log resource diffs on update only in reconcile mode, 2021-07-22, openshift#628, [3]). I've shifted the pkg/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V pkg) It's mostly fairly core stuff, and low-noise except for the per-manifest "Running sync for ..." and "Done syncing for ..." messages [1]. The per-manifest messages can be useful to identify where the CVO is in the sync cycle, so I'm keeping them for now. We may be able to punt them up to higher levels if we get an alternative mechanism for identifying sync cycle positions (e.g. events for task-node completion or task failure). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2034493#c1 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1984414
We've been using --v=5 since 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). But it leads to large quantities of noise from client-side throttling [1], and that throttling is V(3): $ grep -n . vendor/k8s.io/client-go/rest/request.go | grep -B7 -A6 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) 598: } 599: if latency > extraLongThrottleLatency { 600: // If the rate limiter latency is very high, the log message should be printed at a higher log level, 601: // but we use a throttled logger to prevent spamming. 602: globalThrottledLogger.Infof("%s", message) 603: } Auditing: $ git --no-pager grep 'klog.V([3-5])' vendor I don't see much that I'd miss. I liked having vendor/github.com/openshift/library-go/pkg/verify's logging back when we made that pivot, but now that that code is old and stable, I'm ok losing it. I've shifted the lib/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V lib) It's mostly hotloop-detection since 40d0a4e (Log object updates and show existing/required diff, 2021-06-03, openshift#561, [2]), and isn't all that noisy since 05e1af7 (Bug 1984414: Log resource diffs on update only in reconcile mode, 2021-07-22, openshift#628, [3]). I've shifted the pkg/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V pkg) It's mostly fairly core stuff, and low-noise except for the per-manifest "Running sync for ..." and "Done syncing for ..." messages [1]. The per-manifest messages can be useful to identify where the CVO is in the sync cycle, so I'm keeping them for now. We may be able to punt them up to higher levels if we get an alternative mechanism for identifying sync cycle positions (e.g. events for task-node completion or task failure). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2034493#c1 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1984414
We've been using --v=5 since 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). But it leads to large quantities of noise from client-side throttling [1], and that throttling is V(3): $ grep -n . vendor/k8s.io/client-go/rest/request.go | grep -B7 -A6 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) 598: } 599: if latency > extraLongThrottleLatency { 600: // If the rate limiter latency is very high, the log message should be printed at a higher log level, 601: // but we use a throttled logger to prevent spamming. 602: globalThrottledLogger.Infof("%s", message) 603: } Auditing: $ git --no-pager grep 'klog.V([3-5])' vendor I don't see much that I'd miss. I liked having vendor/github.com/openshift/library-go/pkg/verify's logging back when we made that pivot, but now that that code is old and stable, I'm ok losing it. I've shifted the lib/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V lib) It's mostly hotloop-detection since 40d0a4e (Log object updates and show existing/required diff, 2021-06-03, openshift#561, [2]), and isn't all that noisy since 05e1af7 (Bug 1984414: Log resource diffs on update only in reconcile mode, 2021-07-22, openshift#628, [3]). I've shifted the pkg/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V pkg) It's mostly fairly core stuff, and low-noise except for the per-manifest "Running sync for ..." and "Done syncing for ..." messages [1]. The per-manifest messages can be useful to identify where the CVO is in the sync cycle, so I'm keeping them for now. We may be able to punt them up to higher levels if we get an alternative mechanism for identifying sync cycle positions (e.g. events for task-node completion or task failure). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2034493#c1 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1984414
We've been using --v=5 since 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). But it leads to large quantities of noise from client-side throttling [1], and that throttling is V(3): $ grep -n . vendor/k8s.io/client-go/rest/request.go | grep -B7 -A6 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) 598: } 599: if latency > extraLongThrottleLatency { 600: // If the rate limiter latency is very high, the log message should be printed at a higher log level, 601: // but we use a throttled logger to prevent spamming. 602: globalThrottledLogger.Infof("%s", message) 603: } Auditing: $ git --no-pager grep 'klog.V([3-5])' vendor I don't see much that I'd miss. I liked having vendor/github.com/openshift/library-go/pkg/verify's logging back when we made that pivot, but now that that code is old and stable, I'm ok losing it. I've shifted the lib/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V lib) It's mostly hotloop-detection since 40d0a4e (Log object updates and show existing/required diff, 2021-06-03, openshift#561, [2]), and isn't all that noisy since 05e1af7 (Bug 1984414: Log resource diffs on update only in reconcile mode, 2021-07-22, openshift#628, [3]). I've shifted the pkg/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V pkg) It's mostly fairly core stuff, and low-noise except for the per-manifest "Running sync for ..." and "Done syncing for ..." messages [1]. The per-manifest messages can be useful to identify where the CVO is in the sync cycle, so I'm keeping them for now. We may be able to punt them up to higher levels if we get an alternative mechanism for identifying sync cycle positions (e.g. events for task-node completion or task failure). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2034493#c1 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1984414
Show when object gets created or updated with diff