
Conversation


mkenigs commented Feb 25, 2022

The first 6 commits factor out code from update(). The 7th calls all the factored-out code in experimentalUpdateLayeredConfig.

mkenigs marked this pull request as draft February 25, 2022 22:47
mkenigs (Author) commented on the diff:

Do we need to check if this is nil? It looks like we do it in setWorking but not in updateConfigAndState. I think we also might need to add some more of the code from updateConfigAndState.


mkenigs commented Feb 28, 2022

@cgwalters do we need to set the scheduler to BFQ before rebasing to an image? Also why don't we currently undo setting the scheduler to BFQ? Does it reset on reboot? What about for liveapply?

cgwalters replied:

> @cgwalters do we need to set the scheduler to BFQ before rebasing to an image? Also why don't we currently undo setting the scheduler to BFQ? Does it reset on reboot? What about for liveapply?

The BFQ thing was basically a failure on my part. I think actually layering will itself help significantly here because we will avoid transiently writing large amounts of data to disk. We can probably just drop BFQ then.

> Does it reset on reboot?

Yes.
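
For context, here is a minimal sketch (the helper name and device are assumptions, not the MCD's actual code) of what "setting the scheduler to BFQ" amounts to: a write to the device's sysfs knob. Because sysfs is not persistent, the kernel's default scheduler comes back on reboot, which is why nothing has to explicitly undo it.

// setIOSchedulerBFQ is a hypothetical helper (not the MCD's actual code)
// showing what switching to BFQ boils down to: writing the scheduler name
// into the device's sysfs queue/scheduler file. sysfs is not persistent,
// so the kernel default returns on the next boot.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func setIOSchedulerBFQ(device string) error {
	schedPath := filepath.Join("/sys/block", device, "queue", "scheduler")
	if err := os.WriteFile(schedPath, []byte("bfq"), 0644); err != nil {
		return fmt.Errorf("setting bfq scheduler on %s: %w", device, err)
	}
	return nil
}

func main() {
	// "sda" is just an example device; a real daemon would resolve the
	// root block device instead of hardcoding it.
	if err := setIOSchedulerBFQ("sda"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}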


jkyros commented Mar 2, 2022

So I cleaned some stuff up into constants in update.go instead of using the raw annotations, and added some SetDone calls so the pool would stay happy after rebase, but for the most part I stayed out of update.go. (You can kind of see where I was going over here; I didn't want to conflict with you.) For example:

if err := dn.nodeWriter.SetDone(dn.kubeClient.CoreV1().Nodes(), dn.nodeLister, dn.name, desiredConfig); err != nil {

Is this at a spot where I can merge your commits here and then we can stitch it back together for a demo tomorrow, or do you want me to hold off?
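
For reference, the shape of the constants cleanup mentioned above is roughly the following sketch; the names and values are illustrative guesses at the MCO's daemon annotation keys, not the branch's exact code.

// Illustrative only -- the idea of the cleanup, not the PR's exact code.
package constants

const (
	// Annotation keys the daemon reads/writes on the Node object.
	CurrentMachineConfigAnnotationKey     = "machineconfiguration.openshift.io/currentConfig"
	DesiredMachineConfigAnnotationKey     = "machineconfiguration.openshift.io/desiredConfig"
	MachineConfigDaemonStateAnnotationKey = "machineconfiguration.openshift.io/state"

	// Daemon states recorded in the state annotation; SetWorking and SetDone
	// flip between these so the pool controller knows when a node is finished.
	MachineConfigDaemonStateWorking = "Working"
	MachineConfigDaemonStateDone    = "Done"
)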

mkenigs force-pushed the integrate-update branch from 3507386 to 350dc62 on March 3, 2022 04:44

mkenigs commented Mar 3, 2022

Okay, I think this should be ready to merge if you don't mind some broken things. I certainly haven't tested it. Not sure how thoroughly you can review before demoing, but I'd love it if you at least glanced at the second-to-last commit (the one that touches experimentalUpdateLayeredConfig()). Things to keep working on:

  1. Disable the config drift monitor
  2. Un-revert your commit. The last commit I have should probably be squashed with the reverted one. I still have some questions about how some of the annotation stuff works.
  3. Handle setting done after we reboot (also related to the annotations)?
  4. Fix unit tests


jkyros commented Mar 3, 2022

I tried to merge it here: https://github.com/jkyros/machine-config-operator/tree/mcbs-minimize-mcd-try-merge

It built, but the cluster failed to bootstrap. I will be honest, though, I didn't give it a ton of review before I did that -- it was more "eh, let's see what happens!". I stayed up too late -- I'll look at it tomorrow morning :)

cgwalters commented:

Is the plan to push this to the layering branch of the MCO main git?


mkenigs commented Mar 3, 2022

I think so

return err
}

//defer os.Unlink("/run/ostree/auth.json")


Not sure how much this matters, but remember everything in /run goes away on reboot. It may be simpler to just leave it there?

mkenigs (Author) replied:

Do we want to leave it there if we apply live? Agreed, it probably doesn't matter much.

jkyros (Owner) replied:

I left it there in case I wanted to horse around with it manually -- I figured if I had to "drive it out of the ditch" because I broke it, I didn't want to have to go get credentials too :)

mkenigs (Author) replied:

Is there any risk to leaving credentials lying around? If not, I'll take out the removal I put back in.


I don't think there's a real risk here, but we should be sure the file ownership/permissions are root:root and 0600.
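
To make the permissions point concrete, here is a rough sketch that assumes the pull secret lands at /run/ostree/auth.json (the ostreeAuthFile constant mentioned in the commits). The daemon runs as root, so creating the file with mode 0600 yields root:root ownership with owner-only access, and anything under /run is gone after a reboot anyway.

// Sketch only: write the pull secret where rpm-ostree can find it, with
// owner-only permissions. The path and constant name are assumptions taken
// from the discussion above, not the PR's exact code.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

const ostreeAuthFile = "/run/ostree/auth.json"

func writePullSecret(secret []byte) error {
	if err := os.MkdirAll(filepath.Dir(ostreeAuthFile), 0755); err != nil {
		return fmt.Errorf("creating %s: %w", filepath.Dir(ostreeAuthFile), err)
	}
	// 0600 applies on create; chmod afterwards in case the file already
	// existed with looser permissions from an earlier run.
	if err := os.WriteFile(ostreeAuthFile, secret, 0600); err != nil {
		return fmt.Errorf("writing %s: %w", ostreeAuthFile, err)
	}
	return os.Chmod(ostreeAuthFile, 0600)
}

func main() {
	if err := writePullSecret([]byte(`{"auths":{}}`)); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}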

Comment on lines -2114 to +2128
var targetNamespace = "openshift-machine-config-operator"
var targetNamespace = ctrlcommon.MCONamespace


Looks like a good cleanup we could factor out into a prep patch now and merge to main/master.

return rawOut, nil
}

type Deployment struct {


Hmm one thing we could do is get a start on coreos/rpm-ostree#2389 by extracting the current MCO code to a logically separate package (e.g. pkg/rpmostree), but keep it in the MCO git to start.
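
One possible shape for that extraction, sketched under the assumption that the package wraps rpm-ostree status --json; the struct fields below are a partial, assumed subset of that JSON output rather than a complete binding.

// Sketch of a standalone pkg/rpmostree: a Deployment type plus a thin client
// that shells out to `rpm-ostree status --json`.
package rpmostree

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// Deployment is one entry in the "deployments" array of `rpm-ostree status --json`.
type Deployment struct {
	ID       string `json:"id"`
	OSName   string `json:"osname"`
	Checksum string `json:"checksum"`
	Version  string `json:"version"`
	Booted   bool   `json:"booted"`
}

type status struct {
	Deployments []Deployment `json:"deployments"`
}

// Client wraps the rpm-ostree CLI so the rest of the MCO never execs it directly.
type Client struct{}

// GetBootedDeployment returns the deployment the node is currently booted into.
func (c *Client) GetBootedDeployment() (*Deployment, error) {
	out, err := exec.Command("rpm-ostree", "status", "--json").Output()
	if err != nil {
		return nil, fmt.Errorf("running rpm-ostree status: %w", err)
	}
	var s status
	if err := json.Unmarshal(out, &s); err != nil {
		return nil, fmt.Errorf("parsing rpm-ostree status: %w", err)
	}
	for i := range s.Deployments {
		if s.Deployments[i].Booted {
			return &s.Deployments[i], nil
		}
	}
	return nil, fmt.Errorf("no booted deployment found")
}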


jkyros commented Mar 4, 2022

> Is the plan to push this to the layering branch of the MCO main git?

Yeah, unless you have a better idea. :)

We need somewhere where the merge/review bar is lower than our main branch and we'd be willing to tolerate the code being in some "imperfect intermediate states" on occasion -- somewhere where "I don't know that I'm totally sold on this, but it works for now" is okay.

(That, and I'd like this to be accessible enough that someone can build/play with the working PoC without having to build using code from 7 different open PRs because we're still haggling about the details.)

I know that comes with certain risks of too many changes stacking up in the branch, etc., so we'll have to pay attention and adjust if necessary.


mkenigs commented Mar 4, 2022

> I know that comes with certain risks of too many changes stacking up in the branch, etc., so we'll have to pay attention and adjust if necessary.

Will we be able to repeatedly/quickly merge master into layering? E.g., if PRs like openshift#2982 or openshift#2973 go into master.

cgwalters commented:

> We need somewhere where the merge/review bar is lower than our main branch and we'd be willing to tolerate the code being in some "imperfect intermediate states" on occasion -- somewhere where "I don't know that I'm totally sold on this, but it works for now" is okay.

💯 on this!

> Will we be able to repeatedly/quickly merge master into layering?

I think so...if that turns out to be annoying, we will figure out how to make it less annoying.

mkenigs added 13 commits March 7, 2022 19:25

- This code will be needed for layered updates
- Consolidate all code that will be needed for layered updates into calculatePostConfigChangeActionFromFiles. Add a reboot file exception for ssh keys to preserve the current behavior of not rebooting for passwd changes.
- isDrainRequired needs to support multiple types of file content, since layered updates will require ostree cat'ing a file while legacy updates read the contents directly from MachineConfigs. This is accomplished by passing isDrainRequired functions that read content from the old and new files (see the sketch after this list).
- This code will be needed for layered updates
- This will keep layered and non-layered update logging consistent
- Layered updates will only need to perform post config change actions. Comments about uncordoning the node and rebooting immediately were dropped. Are these comments out of date? Was code dropped, or am I missing something?
- Add a comment to Rebase
- Write pull secret to file defined in new constant ostreeAuthFile
- Don't return changed since there should be an error if nothing changed
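
As a rough illustration of the function-passing approach described in the isDrainRequired commit above (every name and signature here is invented for the sketch, not taken from the PR), the drain decision accepts two content readers: legacy updates can pass a reader backed by the MachineConfig's file contents, while layered updates pass one backed by ostree cat.

// Hypothetical sketch of passing content-reader functions so the same drain
// logic works for both MachineConfig-embedded content and ostree-hosted content.
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// contentReader returns the contents of a file in either the old or new state.
type contentReader func(path string) ([]byte, error)

// drainRequiredForFile decides whether a change to path requires draining,
// comparing old and new content obtained through the supplied readers.
func drainRequiredForFile(path string, readOld, readNew contentReader) (bool, error) {
	oldContent, err := readOld(path)
	if err != nil {
		return false, err
	}
	newContent, err := readNew(path)
	if err != nil {
		return false, err
	}
	// Toy policy: any content change to this path requires a drain.
	return !bytes.Equal(oldContent, newContent), nil
}

// ostreeCatReader reads a file out of an ostree commit, for layered updates.
func ostreeCatReader(checksum string) contentReader {
	return func(path string) ([]byte, error) {
		return exec.Command("ostree", "cat", checksum, path).Output()
	}
}

// machineConfigReader serves content from an in-memory MachineConfig file map,
// for legacy updates.
func machineConfigReader(files map[string][]byte) contentReader {
	return func(path string) ([]byte, error) {
		content, ok := files[path]
		if !ok {
			return nil, fmt.Errorf("%s not found in MachineConfig", path)
		}
		return content, nil
	}
}

func main() {
	// Usage example with the in-memory reader on both sides.
	oldFiles := map[string][]byte{"/etc/example.conf": []byte("a=1\n")}
	newFiles := map[string][]byte{"/etc/example.conf": []byte("a=2\n")}
	drain, err := drainRequiredForFile("/etc/example.conf",
		machineConfigReader(oldFiles), machineConfigReader(newFiles))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("drain required:", drain)
}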