Skip to content

Conversation

@sinnykumari
Copy link
Contributor

During firstboot machine-config-daemon running on host will read
/etc/ignition-machine-config-encapsulated.json file available on
node, process it and applies MachineConfig provided like
kernelArguments

During firstboot machine-config-daemon running on host will read
/etc/ignition-machine-config-encapsulated.json file available on
node, process it and applies MachineConfig provided like
kernelArguments
@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 24, 2019
@sinnykumari
Copy link
Contributor Author

/hold
This will go in once we have latest RHCOS bootimage available - openshift/installer#2547

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 24, 2019
@runcom
Copy link
Member

runcom commented Oct 25, 2019

/retest

can't wait to test this out 👍

@ericavonb
Copy link

/retest

@runcom
Copy link
Member

runcom commented Oct 28, 2019

/skip

@runcom
Copy link
Member

runcom commented Oct 28, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 28, 2019
@runcom
Copy link
Member

runcom commented Oct 29, 2019

I'm about to drop the hold once I run a 4.1->4.2 upgrade

@runcom
Copy link
Member

runcom commented Oct 29, 2019

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 29, 2019
Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Tested with manual payload on latest installer, the kargs is correctly applied and only 1 reboot is incurred due to pivot. Great work!

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: runcom, sinnykumari, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 29, 2019

@sinnykumari: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-scaleup-rhel7 f5c2c8a link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit e805081 into openshift:master Oct 29, 2019
@sinnykumari
Copy link
Contributor Author

Yay! Thank you all for testing and getting it merged.

@cgwalters
Copy link
Member

cgwalters commented Apr 30, 2020

I think this may have broken upgrade + scaleup from all "born 4.2" clusters (i.e. 4.2 bootimages) to 4.4, because 4.2 doesn't have /usr/libexec/machine-config-daemon firstboot-complete-machineconfig:

Apr 29 23:36:22 mffied-8v28p-w-c-z9vll.c.openshift-qe.internal machine-config-daemon[1306]: Error: unknown command "firstboot-complete-machineconfig" for "machine-config-daemon"

It won't affect 4.1 bootimages because that has pivot.service baked into the OS, which ends up running. It won't affect 4.3 (or later) bootimages because that has firstboot-complete-machineconfig (which is likely what we tested with in this PR).

What I mean by "upgrade + scaleup" is that the "born 4.2" cluster is upgraded to 4.4 while retaining 4.2 bootimages, and from there all further worker boots will break.

@cgwalters
Copy link
Member

cgwalters commented Apr 30, 2020

OK sketching out an outline for a fix:

  • Add infrastructure to in the MCO to look at clusterversion and determine if a cluster was "born" at 4.2
  • Change the MCS to look at the cluster "born version" and if it's 4.2, change the Ignition it serves so that we basically s/firstboot-complete-machineconfig/pivot/ in this template

One should be able to reproduce this problem btw by creating a new machineset even in a "born 4.4" cluster that manually points to the 4.2 bootimages. Which...is something we should probably change our CI to do right now for at least one cloud; for each of {4.1, 4.2, 4.3, ...} test having a worker with that bootimage version join the cluster.

@cgwalters
Copy link
Member

We might also be able to inspect the User-Agent version...I'm digging into that now.

cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Apr 30, 2020
It's super useful to see the bootimage version for debugging
things like https://bugzilla.redhat.com/show_bug.cgi?id=1829642
AKA
openshift#1215 (comment)
@cgwalters cgwalters mentioned this pull request Apr 30, 2020
@runcom
Copy link
Member

runcom commented Apr 30, 2020

We might also be able to inspect the User-Agent version...I'm digging into that now.

with the user-agent, it would become just a matter of serving the modified template w/o extra plumbing 🤔

@cgwalters
Copy link
Member

Oooh, or I bet it would work if we generated two systemd units with conditionals! Going to do a PR to sketch that out.

cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Apr 30, 2020
This is aiming to fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1829642
AKA
openshift#1215 (comment)

Basically we have our systemd units dynamically differentiate between
"4.2" and "4.3 or above" by looking at the aleph version.
@cgwalters
Copy link
Member

#1706

ashcrow pushed a commit to ashcrow/machine-config-operator that referenced this pull request Apr 30, 2020
This is aiming to fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1829642
AKA
openshift#1215 (comment)

Basically we have our systemd units dynamically differentiate between
"4.2" and "4.3 or above" by looking at the aleph version.
vrutkovs pushed a commit to vrutkovs/machine-config-operator that referenced this pull request Jun 5, 2020
It's super useful to see the bootimage version for debugging
things like https://bugzilla.redhat.com/show_bug.cgi?id=1829642
AKA
openshift#1215 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants