@mkenigs mkenigs commented Mar 7, 2022

Various refactors that will be helpful for sharing functionality between the legacy and layered update flows. Opening this against master per the discussion in #2982 (comment). I would like to get a CI run because I'm having trouble launching a cluster with the code we're trying to get into layering.

@mkenigs mkenigs changed the title from "Integrate update master" to "Factor out functionality from update() that will be needed for layering" Mar 7, 2022
@openshift-ci openshift-ci bot requested review from jkyros and sinnykumari March 7, 2022 13:00
Comment on lines -117 to -121
Contributor Author

Are these comments out of date? Or am I misunderstanding how error handling works here? It looks like it just returns errors

Member

I think you're right that those bits got moved out? Though honestly I am having trouble following all of the flow.

Contributor

Ah yeah, I think I did that in d200340, without properly updating the comment.

@mkenigs mkenigs marked this pull request as draft March 7, 2022 15:58
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label Mar 7, 2022
@mkenigs mkenigs force-pushed the integrate-update-master branch from 7667b1e to 47d9151 Compare March 7, 2022 16:13

mkenigs commented Mar 7, 2022

/test e2e-gcp-op


mkenigs commented Mar 7, 2022

/test e2e-gcp-op


mkenigs commented Mar 7, 2022

/test e2e-gcp-op

@mkenigs mkenigs force-pushed the integrate-update-master branch from f7985d4 to 1b8e874 Compare March 14, 2022 13:30

mkenigs commented Mar 14, 2022

/test e2e-gcp-op

@mkenigs mkenigs marked this pull request as ready for review March 14, 2022 17:06
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Mar 14, 2022
@openshift-ci openshift-ci bot requested a review from yuqi-zhang March 14, 2022 17:11
@mkenigs mkenigs force-pushed the integrate-update-master branch from 1b8e874 to 4da81ce Compare March 14, 2022 20:03

mkenigs commented Mar 15, 2022

e2e-gcp-op hit the total test timeout, but I had a passing run before a minor change, so I don't expect there's an actual problem.

@cgwalters cgwalters left a comment

Overall looks sane to me; I am a bit less confident in the later changes just because of the complexity of the code. But, I think a lot of this is covered by our e2es.

Member

(Unrelated to your PR, but this appears to be another instance where actually what we want to test is "rhel8 vs rhel9 vs fedora N", not "fcos vs rhcos"; xref coreos/fedora-coreos-config#1405 (comment) etc.)

Member

I was thinking about this as well.

Member

This looks totally fine to me as is, but I'm thinking we may want to make this an interface in the future?

Member

Probably worth converting at least one user in this commit.

Contributor Author

Converted almost everything

@openshift-ci openshift-ci bot added the approved label Mar 15, 2022
@mkenigs mkenigs force-pushed the integrate-update-master branch from 4da81ce to b649419 Compare March 15, 2022 14:50
@cheesesashimi cheesesashimi left a comment

I just have a couple minor questions and suggestions.

@yuqi-zhang yuqi-zhang left a comment

Some minor comments. I think it would be helpful for me at least if I can understand why some of these refactors are needed, for example:

  1. Why does calculatePostConfigChangeAction need to be consolidated?
  2. What are the extra FCOS checks added for?
  3. Why do we need to move isDrainRequired -> drainIfRequired?

As a counterexample, the explanation around readFileFunc is very nice.

Contributor

I'm a bit confused by the use of skipReboot here. This is only for onceFrom, right? Why is this specifically handled here?

Contributor Author

I missed adding this condition originally, and it was causing bootstrap to fail. The old version of performPostConfigChangeAction calls dn.reboot, which I initially assumed would never return. But if dn.skipReboot == true, dn.reboot does return, and performPostConfigChangeAction returns early before updating state, so I had to add this condition to preserve that behavior. I can't say I fully understand why that's necessary, because how state is managed is pretty unclear to me.
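
For readers following along, here is a minimal sketch of the control flow described in this comment. It is not the MCO code: the `daemon` type, its fields, and `performPostConfigChange` are hypothetical stand-ins, and a real reboot is only simulated by setting a flag.

```go
package main

import "fmt"

// daemon is a hypothetical stand-in for the MCO daemon; only the fields
// needed to illustrate the control flow are modeled.
type daemon struct {
	skipReboot      bool
	rebootRequested bool
	stateUpdated    bool
}

// reboot returns without doing anything when skipReboot is set.
// Otherwise it records that a reboot was requested (a real reboot
// would normally not return to the caller at all).
func (dn *daemon) reboot() error {
	if dn.skipReboot {
		return nil
	}
	dn.rebootRequested = true
	return nil
}

// performPostConfigChange returns whatever reboot() returns when a reboot
// is required. With skipReboot set, that is an early return that skips
// the state update below, which is the behavior the comment preserves.
func (dn *daemon) performPostConfigChange(rebootRequired bool) error {
	if rebootRequired {
		return dn.reboot()
	}
	dn.stateUpdated = true
	return nil
}

func main() {
	dn := &daemon{skipReboot: true}
	if err := dn.performPostConfigChange(true); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("state updated:", dn.stateUpdated)
}
```

The point of the sketch is only that a caller cannot assume the reboot path never returns; any refactor that moves the state update has to account for the skipReboot case explicitly.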

@kikisdeliveryservice kikisdeliveryservice left a comment

Overall, this PR is a bit overreaching. While it's a "prep" PR, it has some trivial changes that are easy to merge (for example, the namespace constant), but other parts are much less trivial and need more review, discussion, and explanation. Combining them all into one PR makes review and discussion difficult, and I think we should consider breaking this PR up a bit (for example, along the lines of Jerry's questions above).

@kikisdeliveryservice kikisdeliveryservice added do-not-merge/hold and removed layering labels Mar 15, 2022
@mkenigs mkenigs force-pushed the integrate-update-master branch from b649419 to 1458d39 Compare March 15, 2022 21:09

mkenigs commented Mar 15, 2022

  1. Why does calculatePostConfigChangeAction need to be consolidated?

Added a much more detailed commit message (c4b5dcf). Does that clear it up?

  2. What is the extra FCOS checks added for?

The path for SSH keys is different on FCOS. The explanation I added in the commit message should give some more context.

  3. Why do we need to move isDrainRequired -> drainIfRequired

Per @cheesesashimi's feedback, I just wrote a wrapper function. I'm anticipating de-duplicating between the legacy and layered updates.


mkenigs commented Mar 16, 2022

Per @kikisdeliveryservice's comments, I have opened:
#3020
#3021
#3022

@mkenigs mkenigs changed the base branch from master to layering March 17, 2022 16:07
@openshift-ci openshift-ci bot added the needs-rebase label Mar 17, 2022
@mkenigs mkenigs marked this pull request as draft March 21, 2022 22:27
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label Mar 21, 2022

mkenigs commented Mar 21, 2022

Will wait for other PRs to merge before revisiting commits left here

@mkenigs mkenigs force-pushed the integrate-update-master branch from 1458d39 to c1901f1 Compare March 21, 2022 22:28
@openshift-ci openshift-ci bot removed the needs-rebase label Mar 21, 2022
@mkenigs mkenigs force-pushed the integrate-update-master branch from c1901f1 to a124451 Compare March 22, 2022 14:16
mkenigs added 4 commits March 30, 2022 08:09
This code will be needed for layered updates
isDrainRequired needs to support multiple types of file content, since
layered updates will require ostree cat'ing a file while legacy updates
read the contents directly from MachineConfigs. This is accomplished by
passing isDrainRequired functions to read content from old and new files
This code will be needed for layered updates
Layered updates will only need to perform post config change actions.

Comments about uncordoning the node and rebooting immediately were
dropped as they have been out of date since
d200340
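
The commit message about isDrainRequired describes passing in functions that read old and new file content. A minimal sketch of that idea follows, with assumed names: `readFileFunc` is mentioned in the review, but its signature here, along with `fromMap` and `contentChanged`, is hypothetical.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// readFileFunc returns the content of the file at path. The signature is
// an assumption made for this sketch.
type readFileFunc func(path string) ([]byte, error)

// fromMap builds a reader backed by in-memory content, standing in for
// content read directly from a config object. A layered flow could supply
// a different reader (for example one that cats the file out of ostree).
func fromMap(files map[string]string) readFileFunc {
	return func(path string) ([]byte, error) {
		content, ok := files[path]
		if !ok {
			return nil, errors.New("file not found: " + path)
		}
		return []byte(content), nil
	}
}

// contentChanged compares a file's old and new content without knowing
// where either version comes from.
func contentChanged(path string, readOld, readNew readFileFunc) (bool, error) {
	oldData, err := readOld(path)
	if err != nil {
		return false, fmt.Errorf("reading old %s: %w", path, err)
	}
	newData, err := readNew(path)
	if err != nil {
		return false, fmt.Errorf("reading new %s: %w", path, err)
	}
	return !bytes.Equal(oldData, newData), nil
}

func main() {
	oldFiles := fromMap(map[string]string{"/etc/example.conf": "a=1"})
	newFiles := fromMap(map[string]string{"/etc/example.conf": "a=2"})
	changed, err := contentChanged("/etc/example.conf", oldFiles, newFiles)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("changed:", changed)
}
```

Because the comparison only depends on the two reader functions, the same check can serve both update flows as long as each provides its own reader.
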
@mkenigs mkenigs force-pushed the integrate-update-master branch from a124451 to c1529b3 Compare March 30, 2022 12:09
@mkenigs mkenigs marked this pull request as ready for review March 30, 2022 12:11
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Mar 30, 2022

mkenigs commented Mar 30, 2022

@kikisdeliveryservice is this small enough now or should I split it again?


openshift-ci bot commented Mar 30, 2022

@mkenigs: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-disruptive | 1458d39df9dac81b728fb3c24fff3e20c50c2024 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-vsphere-upgrade | 1458d39df9dac81b728fb3c24fff3e20c50c2024 | link | false | /test e2e-vsphere-upgrade |
| ci/prow/e2e-aws-upgrade-single-node | 1458d39df9dac81b728fb3c24fff3e20c50c2024 | link | false | /test e2e-aws-upgrade-single-node |
| ci/prow/okd-e2e-aws | 1458d39df9dac81b728fb3c24fff3e20c50c2024 | link | false | /test okd-e2e-aws |
| ci/prow/e2e-aws-upgrade | c1529b3 | link | false | /test e2e-aws-upgrade |
| ci/prow/e2e-gcp-op-single-node | c1529b3 | link | false | /test e2e-gcp-op-single-node |
| ci/prow/e2e-gcp-single-node | c1529b3 | link | false | /test e2e-gcp-single-node |

@cgwalters
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Mar 31, 2022
@cgwalters
Member

Since this is targeting layering, I don't see any blockers to merge.
If there's stuff to address we can always do followups.
/lgtm

@openshift-ci openshift-ci bot added the lgtm label Mar 31, 2022

openshift-ci bot commented Mar 31, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, mkenigs

@openshift-merge-robot openshift-merge-robot merged commit 0ab40fc into openshift:layering Mar 31, 2022

mkenigs commented Mar 31, 2022

/cherry-pick master

@mkenigs mkenigs deleted the integrate-update-master branch March 31, 2022 19:57
@openshift-cherrypick-robot

@mkenigs: #2987 failed to apply on top of branch "master":

Applying: update.go: factor out setWorking()
Using index info to reconstruct a base tree...
M	pkg/daemon/update.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/daemon/update.go
CONFLICT (content): Merge conflict in pkg/daemon/update.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 update.go: factor out setWorking()
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
