Don't enable machine-config-daemon-host.service by default #1366
|
I didn't test that we're running both today yet, but @sinnykumari said she'd seen it. Investigating too. |
|
Logic makes sense to me, we'd have to backport this to 4.3 right? |
|
Oh yeah, they're totally racing today. Look at the master journal here from an aws run on the main openshift-release (not this PR): We're pulling the image twice, etc! Eww! Let's get this in! |
|
Hmm, I think what's going on here is just that the Notice the timestamps matching; I think it's just from the |
|
IOW, One thing we need to make sure though is that the |
OK you really made me pause here and verify. But no. The firstboot service doesn't do the "systemctl start" dance. That bit is only in the MCD. (When I say "MCD", I mean the thing running as a daemonset) Look at e.g. Both services were started up quite early on. |
Hmm OK, that's weird. Reading the code, here's the stack I see:
right? (But full disclosure, I didn't actually run this code.)
Ahh right, that's what I mean by:
I.e. it seems like we should just have the And then also we can have |
|
OK I think you're right and I was wrong - the "don't systemctl in -firstboot" was something I had planned to do and had in my head we were doing. We're still starting both services though. One thing I quickly checked is that So...in the current state, in fact I think the race here is that -firstboot might be killed by the reboot initiated by
Yeah I think so. |
e49c1fa to a086336
|
/approve |
Today the MCS serves both `/etc/pivot/image-pullspec` *and* `/etc/ignition-machine-config-encapsulated.json` - and we have to serve both so that using 4.1 bootimages works.

However, we absolutely shouldn't start *both* services on a modern 4.3+ system on firstboot - they will race, both trying to update, and one doing a reboot.

Only enable the `-firstboot.service` which will itself currently invoke `-host.service`, though in the future we should clean this up by having the firstboot code run rpm-ostree *directly*, since we're already on the host.

Motivated by discussions around RT kernel handling.
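The race described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not MCO code: the `Unit` class, its `invokes` field, and the helper functions are hypothetical names made up for illustration, and the full unit names are inferred from the `-firstboot.service` / `-host.service` shorthand used in the thread.

```python
from dataclasses import dataclass, field
from typing import List

# Unit names inferred from the thread's shorthand (assumption).
FIRSTBOOT = "machine-config-daemon-firstboot.service"
HOST = "machine-config-daemon-host.service"


@dataclass
class Unit:
    """Hypothetical, simplified stand-in for a systemd unit."""
    name: str
    enabled: bool  # True: systemd starts it on its own at boot
    invokes: List[str] = field(default_factory=list)  # units it starts itself


def started_at_boot(units: List[Unit]) -> List[str]:
    """Names of the units that systemd starts independently at boot."""
    return sorted(u.name for u in units if u.enabled)


def has_race(units: List[Unit]) -> bool:
    """More than one independently started updater means they can race."""
    return len(started_at_boot(units)) > 1


# Before the change: both units enabled, so both start on firstboot.
before = [Unit(FIRSTBOOT, True, [HOST]), Unit(HOST, True)]
# After the change: only -firstboot is enabled; it invokes -host itself.
after = [Unit(FIRSTBOOT, True, [HOST]), Unit(HOST, False)]

print(has_race(before), has_race(after))
```

In this model the "before" configuration reports a race and the "after" configuration does not, while `-host.service` remains reachable through the `invokes` list of `-firstboot.service`.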
a086336 to 57e4c67
|
Bumped to force retesting. |
|
/hold |
|
/retest |
|
Keeping m-c-d-host disabled seems like the right way to handle this. I verified it by looking at the service logs and their timestamps on a local cluster: during firstboot, the m-c-d-host service doesn't get started by default; it gets started by the m-c-d-firstboot service. /lgtm |
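The timestamp check described in this comment could be automated roughly as follows. This is a minimal sketch, assuming the start events have already been collected into `(timestamp, unit name)` pairs; the event format, the function names, and the sample data are all hypothetical and are not real journal output.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical event record: (ISO 8601 timestamp, unit name).
Event = Tuple[str, str]


def first_start(events: List[Event], unit: str) -> Optional[datetime]:
    """Earliest recorded start of `unit`, or None if it never started."""
    times = [datetime.fromisoformat(ts) for ts, name in events if name == unit]
    return min(times) if times else None


def host_starts_after_firstboot(events: List[Event],
                                firstboot: str, host: str) -> bool:
    """True if `host` never started, or started no earlier than `firstboot`."""
    host_time = first_start(events, host)
    if host_time is None:
        return True
    firstboot_time = first_start(events, firstboot)
    return firstboot_time is not None and host_time >= firstboot_time


# Made-up sample data for illustration only.
sample = [
    ("2020-01-10T12:00:05", "machine-config-daemon-firstboot.service"),
    ("2020-01-10T12:00:09", "machine-config-daemon-host.service"),
]
print(host_starts_after_firstboot(
    sample,
    "machine-config-daemon-firstboot.service",
    "machine-config-daemon-host.service",
))
```

Note that ordering alone does not prove which unit triggered the other; it only rules out the case where `-host` starts at boot independently, before `-firstboot`.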
|
/retest |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, jlebon, sinnykumari. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@cgwalters Shall we get this merged? |
|
/hold cancel |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
25 similar comments
|
@cgwalters: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Also, add machine-config-daemon-firstboot-v42.service so that new nodes created through machine-api come up as expected on a cluster upgraded from 4.2 to 4.3. Backport of PRs: - openshift#1366 - openshift#1706