-
Notifications
You must be signed in to change notification settings - Fork 461
Bug 1852047: daemon: Add events before/after all OS changes #1977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1852047: daemon: Add events before/after all OS changes #1977
Conversation
0eed4a7 to
112b782
Compare
112b782 to
4104a18
Compare
kikisdeliveryservice
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of questions :)
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question, do we have an events somewhere that has what we are upgrading to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
following up on.. myself: in https://github.com/openshift/machine-config-operator/pull/1962/files we have an event that tells us the targeted config but not the OS version - might be nice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can determine the content by looking at the rendered machineconfigs. With the events I'm more interested in timing, not content.
|
adding hold since we are post-FF /hold |
This is part of https://bugzilla.redhat.com/show_bug.cgi?id=1852047 - not a new feature. Let's not go overboard with the holds - I think cleanly split up PRs is better than the alternative that holding would force of having a single gigantic PR with the one BZ. |
I had no idea since the BZ isn't in the title. Please add the related BZ to the PRs so we are allowed to merge post-FF. |
|
@cgwalters: This pull request references Bugzilla bug 1852047, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh |
|
@kikisdeliveryservice: This pull request references Bugzilla bug 1852047, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/hold cancel |
|
Is Prow actually enforcing BZs right now? It doesn't look like it - if we're truly requiring a separate BZ per PR that heavily penalizes the workflow of having nice clean split up PRs, so hopefully we can just roll with attaching a bunch of PRs to the same BZ. |
I have no problem with multiple PRs on a BZ, the issue was that this PR had no BZ linked whatsoever. |
|
/test e2e-aws |
kikisdeliveryservice
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this verifiable in the mustgather artifacts? (if so can you link?)
Definitely, I put up my Example invocation on the e2e-gcp-upgrade run from this PR: |
|
(Though that operates on the "extracted" dump that is handled in Prow, not the must-gather which seems to write the events in YAML, but they actually added a custom HTML renderer for events recently which helps searching exactly these types of things) |
|
Maybe we should store scripts to extract data from must-gathers and PRs here in the MCO git? |
4104a18 to
d307774
Compare
|
/retest |
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only thing is we lost the OSImageURL in the event from line 1315 below. can we re-add? otherwise lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per above though with these events it's mostly about timing, so removing the URL was intentional.
I think this really leads into a larger ergnonomic issue in that we should really be including something like a version number in MachineConfigs, and perhaps even include separate versions for our "base" MCs versus anything user provided.
For example for status we'd report All nodes are updated to config v7 (rendered-<md5here>) or so where we increment that counter each time we generate a new rendered MC.
Similarly we should be showing the ostree version number and not just the sha256.
I can re-add it if you really want, I just personally didn't find it useful - the more interesting information is the MachineConfig names which already contains the osimageurl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK the next problem with doing what you're asking is machineConfigDiff doesn't have the target...and it leads into the question of why are we special casing osImageURL in the event? Why not also show the kargs that changed? Why not try to show which systemd units? It wouldn't scale - I think these events should just be about timing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or to rephrase, the data in this PR is more useful because we're seeing if there are karg/unit changes too which we didn't have before - if we care to look up exactly what those are, we can look at the rendered config.
Adding osImageURL would make the event much more noisy when we don't need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair argument, we have other sources to look at what osImageURL is getting used like looking at MCD log or rendered MC. My main motivation of adding osImageURL is that this is the one which we pull in to perform upgrade, extensions and kernel switch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK updated with a compromise - we keep the old event with the osimageurl and only emit it if we're pulling the oscontainer, while adding two new before/after events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im cool with the compromise :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im ok with the compromise :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apparently im doubly ok with it 😆
|
Doesn't have strong opinion but slightly agree with Kirsten on retaining OSImageURL information in event. Otherwise, lgtm |
We had an event when we were starting an OS update, but nothing when it was completed - one could implicitly get that by looking at the next event, but that's a bit fragile. And since then we started doing a lot more stuff with the OS, so let's add an event emitted before and after all OS changes so we can consistently get e.g. timing information about it. Relates to openshift#1962 around getting better data about timing during upgrades.
d307774 to
df727ce
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, kikisdeliveryservice, sinnykumari The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/skip |
|
/test e2e-aws |
|
@cgwalters: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@cgwalters: Some pull requests linked via external trackers have merged: openshift/machine-config-operator#1977, openshift/machine-config-operator#1971. The following pull requests linked via external trackers have not merged:
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
We had an event when we were starting an OS update, but nothing
when it was completed - one could implicitly get that by looking
at the next event, but that's a bit fragile.
And since then we started doing a lot more stuff with the OS,
so let's add an event emitted before and after all OS changes
so we can consistently get e.g. timing information about it.
Relates to #1962
around getting better data about timing during upgrades.