Add ContainerRuntime CRD and Controller #330
|
@umohnani8 Thanks for starting some work on docs! Could you retitle this PR "docs: Add ContainerRuntime.." so people don't worry there were some other changes in the PR like I did? :) |
|
/cc @rphillips |
|
@kikisdeliveryservice We are planning to add all the code to this PR before it merges :) @umohnani8 started out with the docs initially to make sure design is fine before we start writing code. |
|
Thanks for the info, @mrunalp ! |
|
@umohnani8 @sjenning and I are considering putting this into the KubeletConfig CR. Do you think this could fit into the KubeletConfig controller within the MCO? |
|
Did something change since we last discussed this on Thursday? 🙂
|
|
I doubt it changed 😃. I just didn't hear about it. |
|
Okay 😃 we decided to make this separate from the kubelet cr so it isn't too tied to Linux.
|
|
After talking with Clayton yesterday [issue], the crio.conf pause image needs to be customized as well with the quay.io/openshift/origin-pod:v4.0-[cvo versioned pod image]. Seems like the pause image could be rolled into this design/code-change as well. |
|
Yeah, I discussed with @umohnani8 yesterday; we will include the pod infra image as part of this work. |
|
cc: @chrisnegus for doc updates. |
All of this still seems to refer to KC or kubeconfig, right?
Yup, thanks! I missed it
Minor: s/machineconfig pool/containerruntime config/
Fixed
docs/ContainerRuntimeConfigDesign.md
Outdated
Minor: well, the MCD is not explicitly "instructed" (I think this is what #360 is suggesting). The MCD could in the future become smarter and not reboot for things it knows don't require a reboot.
docs/ContainerRuntimeConfigDesign.md
Outdated
Looks like a stray * here? Or did this refer to a footnote before?
Was a stray, thanks! Fixed
docs/ContainerRuntimeConfigDesign.md
Outdated
Hmm, why 99 here? This just has to be > 00 to override the base templates, right? The connotation with "99" is that no MC should override these files but we want to have that flexibility, right? (Of course, one could always have a "999", but that doesn't look as nice :)).
Right, it just has to be > 01 to override the base templates. Saw 99 being used, so just followed that. I can change it to 02.
I chose 99- for a managed MachineConfig. The default crio.conf should probably be at 01- to match the kubelet conf default machineconfig.
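The numeric-prefix discussion above can be illustrated with a minimal Go sketch. It assumes, as the comments imply, that MachineConfigs are applied in lexicographic name order and that a later name replaces files written by an earlier one. The `machineConfig` type, the `mergeByName` helper, and the config names are hypothetical and are not taken from the operator's code.

```go
package main

import (
	"fmt"
	"sort"
)

// machineConfig is a hypothetical, simplified stand-in for a MachineConfig:
// a name plus the files (path -> contents) it wants on disk.
type machineConfig struct {
	name  string
	files map[string]string
}

// mergeByName applies configs in lexicographic name order, so a file from a
// later-sorting name replaces the same path from an earlier-sorting one.
func mergeByName(configs []machineConfig) map[string]string {
	sorted := append([]machineConfig(nil), configs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].name < sorted[j].name })
	merged := map[string]string{}
	for _, mc := range sorted {
		for path, contents := range mc.files {
			merged[path] = contents
		}
	}
	return merged
}

func main() {
	// Hypothetical names: a 99- managed config and a 01- default config.
	merged := mergeByName([]machineConfig{
		{name: "99-worker-managed", files: map[string]string{"/etc/crio/crio.conf": "managed"}},
		{name: "01-worker-default", files: map[string]string{"/etc/crio/crio.conf": "default"}},
	})
	fmt.Println(merged["/etc/crio/crio.conf"])
}
```

Under that assumption any prefix that sorts after the default's prefix is enough to override it, and a 99- prefix still leaves room for intermediate configs.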
docs/ContainerRuntimeConfigDesign.md
Outdated
We need to add infra_image to this list.
This seems like it is more per machine set, is that correct?
I think configuring overlay is distinct from machine api. We also need this in UPI modes where machines are not necessarily present.
docs/ContainerRuntimeConfigDesign.md
Outdated
How does this overlap with the existing global config fields for this?
The fields right now are scoped to build time behavior and not runtime behavior. It’s possible we should collapse to a common config for both and have MCO render from that global config. I could see a benefit in letting us configure this at pool level. In particular, we could have a whitelist registry setting that says only images that run on control planes should come from this source.
docs/ContainerRuntimeConfigDesign.md
Outdated
What happens if this becomes a kube level setting in the future?
kubelet settings apply at the pod level while this setting is at the container level. We can advise admins to not set this or keep this unbounded once kubelet setting lands.
Kubelet setting is pod level.
We backported the flag to set a default per-pod pid limit to 3.11+ from 1.14 to address production issues.
Robert is working on enabling node-to-pod pid isolation.
The SIG Node KEP I wrote proposes a granular pid limit feature as a potential policy knob in the future, but at the pod and not the container boundary. If we do go to the container boundary in the kubelet, the crio value is the default absent the kubelet telling it a specific value.
docs/ContainerRuntimeConfigDesign.md
Outdated
When would an end user set the log level (vs when we tell someone to set it during debugging)?
This is more for admins enabling it for debugging.
To me this is about debugging, and wanting to canary a pool of nodes.
If we have a canary pool, the MCO could easily select just those nodes for debugging (but I'm not sure we want this exposed to everyone like this, at the risk of enabling debug on every node).
|
@umohnani8 one last comment, and then could you also squash your commits? |
changes made as requested, will allow others to approve.
|
Many thanks @umohnani8 the updated description & commit are great. Well done! |
|
@runcom you want her to squash the 2 into 1? We usually try to keep vendor file changes in their own commit. |
|
@umohnani8 I talked to @runcom: don't squash your commits, just update that Generated commit message to something like "add auto-generated files & vendor bump" |
|
What Kirsten said ^^ |
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
Add a new CRD and Controller that allows users to configure certain
options in crio.conf and storage.conf.
A template is used for the default values that can be changed with a CR.
When the CR is deleted, the values are reverted back to their defaults.
The following options can be configured in /etc/crio/crio.conf:
- pidsLimit
- logLevel
- logSizeMax
- infraImage
The following options can be configured in /etc/containers/storage.conf:
- overlaySize
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
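A minimal Go sketch of the "template defaults, overridden by a CR, reverted on delete" behaviour described in the commit message. The field names follow the option names listed above. The Go types, the `render` helper, and the default values in `main` are assumptions for illustration only and are not the controller's actual code.

```go
package main

import "fmt"

// runtimeOptions mirrors the option names listed in the commit message.
// The Go types chosen here are assumptions for illustration.
type runtimeOptions struct {
	PidsLimit   int64
	LogLevel    string
	LogSizeMax  string
	InfraImage  string
	OverlaySize string
}

// render starts from the template defaults and overlays every non-zero field
// of the custom resource. A nil CR (for example after the CR has been
// deleted) yields the defaults unchanged.
func render(defaults runtimeOptions, cr *runtimeOptions) runtimeOptions {
	out := defaults
	if cr == nil {
		return out
	}
	if cr.PidsLimit != 0 {
		out.PidsLimit = cr.PidsLimit
	}
	if cr.LogLevel != "" {
		out.LogLevel = cr.LogLevel
	}
	if cr.LogSizeMax != "" {
		out.LogSizeMax = cr.LogSizeMax
	}
	if cr.InfraImage != "" {
		out.InfraImage = cr.InfraImage
	}
	if cr.OverlaySize != "" {
		out.OverlaySize = cr.OverlaySize
	}
	return out
}

func main() {
	// Placeholder defaults, not the values shipped in the real template.
	defaults := runtimeOptions{PidsLimit: 1024, LogLevel: "error", InfraImage: "registry.example/pod:placeholder"}
	fmt.Printf("%+v\n", render(defaults, &runtimeOptions{PidsLimit: 2048, LogLevel: "debug"}))
	fmt.Printf("%+v\n", render(defaults, nil))
}
```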
|
@runcom this should be ready to merge once tests are green. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: runcom, umohnani8 |
|
/test e2e-aws-op |
|
/test e2e-aws |
|
/test e2e-aws-op |
path: "/etc/crio/crio.conf"
contents:
  inline: |
    # The "crio" table contains all of the server options.
It looks like this is overwriting the version from the RPM? Is it different?
My 2¢ is that it's best to have defaults live in the code, documentation for options in a man page or so.
Not objecting to this, we don't need to remove the lgtm but if this isn't different from the RPM version I'm not sure what we're gaining by writing it here again.
I made sure to match this to the one coming in from the RPM. But yeah this overrides the RPM one.
|
/test e2e-aws |
|
/test e2e-aws-op |
…atting
Two changes:
* Don't format the additional arguments on success. Callers set this
up with strings like:
could not generate the original Kubelet config: %v
that are only appropriate to the error condition. Exposing
additional details about non-error conditions would be nice, but
seems awkward without a broader refactoring.
* Fix the args[:1] -> args[1:] when picking the arguments to pass in
to get formatted. The previous implementation would use args[0] as
both the format string and the first argument to that format string.
Both of these issues date back to 1944fb2 (add status update,
2019-01-17, openshift#323).
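The slicing mistake in the second bullet can be reproduced in isolation. This is a sketch, not the operator's helper: `buggyMessage` and `fixedMessage` are hypothetical names, and only the `args[:1]` versus `args[1:]` difference comes from the commit message.

```go
package main

import "fmt"

// buggyMessage reproduces the mistake: args[:1] is a slice holding only
// args[0], so the format string is also passed as its own first argument.
func buggyMessage(args ...interface{}) string {
	format := args[0].(string)
	return fmt.Sprintf(format, args[:1]...)
}

// fixedMessage passes everything after the format string, which is what
// args[1:] selects.
func fixedMessage(args ...interface{}) string {
	format := args[0].(string)
	return fmt.Sprintf(format, args[1:]...)
}

func main() {
	format := "could not generate the original Kubelet config: %v"
	fmt.Println(buggyMessage(format, "boom"))
	fmt.Println(fixedMessage(format, "boom"))
}
```

With the buggy slice the real error value never reaches the message, and the format string is printed in place of it.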
Also bump the not-very-DRY duplicate in
pkg/controller/container-runtime-config/helpers which was added in
74ae3b3 (Add ContainerRuntime CRD and Controller, 2019-01-18, openshift#330),
although I haven't copied the unit tests over for that function copy.
I also haven't looked to see how many of these helpers should be
pulled out into a shared helper package.
The machine-config operator had a bug where MachineConfig entries led
the machine-config daemon (MCD) to lay down a storage.conf that
exactly matched the content installed by the containers-common RPM.
On update, the RHCOS machine pivots to a new OSTree image (defined in
the machine-os-content image referenced from the release image).
Seeing storage.conf content that matched the old OSTree image,
libostree replaced storage.conf with the version defined in the new
OSTree image [1]. Then, when the MCD comes back up post-pivot, it
sees the divergent storage.conf content and freaks out with logs like
[2]:
E1210 16:15:51.105286 11181 daemon.go:1350] content mismatch for file /etc/containers/storage.conf:
and the machine-config operator goes Degraded=True with
RequiredPoolsFailed "nodes are reporting degraded status on sync" [3].
The narrow machine-config fix was to annotate the storage.conf that it
writes, so that libostree doesn't touch the file on pivot [4]. This
addresses the storage.conf case, but leaves the MCD vulnerable to
other instances of "MCD writes exactly the OSTree contents to $FILE
and expects it to remain untouched during an OSTree pivot that bumps
the file". I'm not aware of a generic fix at the moment, although [5]
might be related. You can guard a cluster against the narrow bug by
setting a MachineConfig [6] or higher level object such as a
ContainerRuntimeConfig [7] that will cause the MCD to write a
storage.conf that diverges (even just by a comment or whitespace) from
the OSTree original.
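The failure mode described above can be modelled for a single file. This is a simplified sketch of the behaviour as the text describes it, not libostree's or the MCD's implementation: `mergeEtcFile` and `degradedAfterPivot` are hypothetical helpers and the file contents are made up.

```go
package main

import "fmt"

// mergeEtcFile sketches the pivot behaviour for one file: if the on-disk
// content equals the old image's default, the file counts as unmodified and
// the new image's default replaces it; otherwise the local content is kept.
func mergeEtcFile(oldDefault, newDefault, onDisk string) string {
	if onDisk == oldDefault {
		return newDefault
	}
	return onDisk
}

// degradedAfterPivot reports whether the content found after the pivot
// differs from what was written before it, which is the condition behind
// the "content mismatch" log line.
func degradedAfterPivot(oldDefault, newDefault, written string) bool {
	return mergeEtcFile(oldDefault, newDefault, written) != written
}

func main() {
	oldDefault := "driver = \"overlay\"\n"
	newDefault := "driver = \"overlay\"\n# comment added by a newer package\n"

	// Writing an exact copy of the old default: the pivot swaps the file.
	fmt.Println(degradedAfterPivot(oldDefault, newDefault, oldDefault))

	// Writing content that diverges, even by one comment line: file is kept.
	fmt.Println(degradedAfterPivot(oldDefault, newDefault, "# generated\n"+oldDefault))
}
```

The second call shows why a diverging comment is enough to guard a cluster against the narrow bug.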
Tracking the narrow fix through the various z streams:
The 4.1 machine-config bug was introduced in d2c44d7 [8], which landed
before 4.1.0-rc.0:
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.0-rc.0 | grep machine-config
machine-config-controller https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
machine-config-daemon https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
machine-config-operator https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
machine-config-server https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
setup-etcd-environment https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
$ git --no-pager log --oneline --first-parent de9998eb37 | grep d2c44d7
d2c44d7c Merge pull request openshift#330 from umohnani8/runtime
The 4.1 machine-config fix was [9], landed in 1301934 [10], which is
new in 4.1.34:
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.34-x86_64 | grep machine-config
machine-config-controller https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
machine-config-daemon https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
machine-config-operator https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
machine-config-server https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
setup-etcd-environment https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.31-x86_64 | grep machine-config
machine-config-controller https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
machine-config-daemon https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
machine-config-operator https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
machine-config-server https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
setup-etcd-environment https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
$ git --no-pager log --oneline --first-parent -2 f56d736e74a
f56d736e (origin/release-4.1) Merge pull request openshift#1147 from openshift-cherrypick-robot/cherry-pick-1114-to-release-4.1
1301934a Merge pull request openshift#1382 from vrutkovs/4.1-containers-conf-generated
The 4.2 machine-config fix was [2], landed in bd358bb [11], which is new
in 4.2.18:
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.16-x86_64 | grep machine-config
machine-config-operator https://github.com/openshift/machine-config-operator 31fed93186c9f84708f5cdfd0227ffe4f79b31cd
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.18-x86_64 | grep machine-config
machine-config-operator https://github.com/openshift/machine-config-operator 9366460085b2a24d825380759f554769ec5ab4f9
$ git --no-pager log --oneline --first-parent -2 9366460085
93664600 Merge pull request openshift#1362 from rphillips/fixes/1787581_4.2
bd358bb7 Merge pull request openshift#1323 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.2
The 4.3 machine-config fix was [12], landed in 9fd53bd [13], which
landed early enough for 4.3.0-rc.0:
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.0-rc.0-x86_64 | grep machine-config
machine-config-operator https://github.com/openshift/machine-config-operator 23a6e6fb37e73501bc3216183ef5e6ebb15efc7a
$ git --no-pager log --oneline --first-parent -8 23a6e6fb37
23a6e6fb Merge pull request openshift#1348 from openshift-cherrypick-robot/cherry-pick-1285-to-release-4.3
80c8aed7 Merge pull request openshift#1343 from retroflexer/cherry-pick-backup-restore-kube-static-resources
269990a3 Merge pull request openshift#1344 from openshift-cherrypick-robot/cherry-pick-1296-to-release-4.3
fd3ca395 Merge pull request openshift#1338 from runcom/fix-go-mod
ba304dbb Merge pull request openshift#1333 from openshift-cherrypick-robot/cherry-pick-1278-to-release-4.3
787f3fa9 Merge pull request openshift#1332 from runcom/reserved-cpus-4.3
2b85d6ba Merge pull request openshift#1329 from openshift-cherrypick-robot/cherry-pick-1314-to-release-4.3
9fd53bd5 Merge pull request openshift#1322 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.3
The 4.4 machine-config fix was [3] which has landed before any 4.4 RCs
have been cut. Even in 4.4, the generated note was the first content
touch to this template:
$ git --no-pager log --oneline --follow origin/release-4.4 -- templates/common/_base/files/container-storage.yaml
46c4e27a (origin/pr/1320) templates/container-storage: Add a "this is generated" note
47a6321c templates: Move container-storage.yaml into common/
74ae3b31 (origin/pr/330) Add ContainerRuntime CRD and Controller
(47a6321c was a pure rename).
So the MCD has been annotating storage.conf since 4.1.34, 4.2.18, and
all 4.3 and later releases. When has the RPM-installed storage.conf
changed? Figuring this part out is a bit awkward, because we need to
drill down machine-os-content -> RHCOS -> RPM -> file. For example,
from 4.2.16 -> 4.2.18 [14]:
$ oc image info --output json $(oc adm release info --image-for=machine-os-content quay.io/openshift-release-dev/ocp-release:4.2.16-x86_64) | jq -r .config.config.Labels.version
42.81.20200114.0
$ oc image info --output json $(oc adm release info --image-for=machine-os-content quay.io/openshift-release-dev/ocp-release:4.2.18-x86_64) | jq -r .config.config.Labels.version
42.81.20200203.1
$ ./differ.py --first-endpoint art --first-version 42.81.20200114.0 --second-endpoint art --second-version 42.81.20200203.1 | jq -r '.diff | keys | sort[]'
cri-o
ignition
libarchive
machine-config-daemon
openshift-clients
openshift-hyperkube
sqlite-libs
storage.conf is managed by the containers-common RPM, so no change
from 4.2.16 to 4.2.18, and that update will safely pull in the fixed
MCD without a surprising pivot change. Here are our changes to the
RPM across the various z streams:
$ for OCP in 4.1.1 4.1.23 4.1.24 4.1.31-x86_64 4.1.34-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.1/${RHCOS}/commitmeta.json" | jq -r '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .[2]')"; echo "${RHCOS} ${COMMON} ${OCP}"; done
410.8.20190606.0 0.1.32 4.1.1
410.8.20191030.0 0.1.32 4.1.23
410.81.20191112.2 0.1.37 4.1.24
410.81.20200114.0 0.1.37 4.1.31-x86_64
410.81.20200204.1 0.1.40 4.1.34-x86_64
$ for OCP in 4.2.0-rc.0 4.2.2 4.2.4 4.2.16-x86_64 4.2.18-x86_64 4.2.19-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.2/${RHCOS}/commitmeta.json" | jq -r '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .[2]')"; echo "${RHCOS} ${COMMON} ${OCP}"; done
42.80.20190930.1 0.1.32 4.2.0-rc.0
42.80.20191022.0 0.1.32 4.2.2
42.81.20191107.0 0.1.37 4.2.4
42.81.20200114.0 0.1.37 4.2.16-x86_64
42.81.20200203.1 0.1.37 4.2.18-x86_64
42.81.20200210.0 0.1.40 4.2.19-x86_64
$ for OCP in 4.3.0-rc.0-x86_64 4.3.3-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.3/${RHCOS}/x86_64/commitmeta.json" | jq -r '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .[2]')"; echo "${RHCOS} ${COMMON} ${OCP}"; done
43.81.202001072253.0 0.1.40 4.3.0-rc.0-x86_64
43.81.202002170853.0 0.1.40 4.3.3-x86_64
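For readers without jq, the same lookup can be sketched in Go. It assumes each entry of "rpmostree.rpmdb.pkglist" is an array of strings whose first element is the package name and whose third is the version, which is what the jq filter above relies on. `packageVersion` is a hypothetical helper, and the sample JSON in `main` is made up apart from the containers-common version taken from the table.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// packageVersion returns the version (third field) of the named package from
// the "rpmostree.rpmdb.pkglist" array of a commitmeta.json document.
func packageVersion(commitMeta []byte, name string) (string, error) {
	var meta struct {
		PkgList [][]string `json:"rpmostree.rpmdb.pkglist"`
	}
	if err := json.Unmarshal(commitMeta, &meta); err != nil {
		return "", err
	}
	for _, pkg := range meta.PkgList {
		if len(pkg) > 2 && pkg[0] == name {
			return pkg[2], nil
		}
	}
	return "", fmt.Errorf("package %q not found", name)
}

func main() {
	// Made-up sample; only the 0.1.37 version mirrors the table above.
	sample := []byte(`{"rpmostree.rpmdb.pkglist": [
		["example-package", "0", "1.0.0", "1.el8", "x86_64"],
		["containers-common", "1", "0.1.37", "1.el8", "x86_64"]
	]}`)
	version, err := packageVersion(sample, "containers-common")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(version)
}
```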
Fetching a source RPM for containers-common, e.g. from [15,16] shows
the source packages coming from skopeo. Checking [17]:
$ git --no-pager log --follow --oneline --stat=200 -M50% -- vendor/github.com/containers/storage/storage.conf
afaa9e7f Bump github.com/containers/storage from 1.15.1 to 1.15.2
vendor/github.com/containers/storage/storage.conf | 3 ---
1 file changed, 3 deletions(-)
39ff039b Image encryption/decryption support in skopeo
vendor/github.com/containers/storage/storage.conf | 44 +++++++++++++++++++++++++-------------------
1 file changed, 25 insertions(+), 19 deletions(-)
05ae513b Bump github.com/containers/buildah from 1.8.4 to 1.11.4
vendor/github.com/containers/storage/storage.conf | 7 -------
1 file changed, 7 deletions(-)
700b3102 update github.com/containers/{image,storage}
vendor/github.com/containers/storage/storage.conf | 8 ++++++++
1 file changed, 8 insertions(+)
033b2902 migrate to go modules
vendor/github.com/containers/storage/storage.conf | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 130 insertions(+)
$ git --no-pager log --follow --oneline --stat=200 -M50% 033b2902^ -- contrib/storage.conf
fe259105 add storage.conf and manpage in contrib/
contrib/storage.conf | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
$ for HASH in fe259105 033b2902 700b3102 05ae513b 39ff039b afaa9e7f; do git describe --contains "${HASH}"; done
v0.1.29~3^2
v0.1.38~14^2~2
v0.1.39~1
v0.1.41~25^2
v0.1.41~21^2
v0.1.41~12^2
So changes may have been made in 0.1.29 (when the file landed for the
first time, likely from wherever we store post-Git patches), and were
likely made in 0.1.38, 0.1.39, and 0.1.41.
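The mapping from `git describe --contains` output to release tags used above is mechanical: everything before the first `~` or `^` is the tag that contains the commit. A small sketch, with `containingTag` as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// containingTag strips the ~N and ^N ancestry suffixes that
// "git describe --contains" appends, leaving the tag name.
func containingTag(described string) string {
	if i := strings.IndexAny(described, "~^"); i >= 0 {
		return described[:i]
	}
	return described
}

func main() {
	// The describe outputs listed above, in the same order.
	for _, d := range []string{
		"v0.1.29~3^2", "v0.1.38~14^2~2", "v0.1.39~1",
		"v0.1.41~25^2", "v0.1.41~21^2", "v0.1.41~12^2",
	} {
		fmt.Println(d, "->", containingTag(d))
	}
}
```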
Comparing with our machine-os-content, that means vulnerable
transitions are:
* 4.1.* -> 4.1.34, since 4.1.31 -> 4.1.34 takes containers-common from
0.1.37 to 0.1.40, picking up the v0.1.38~14^2~2 and v0.1.39~1 bumps.
There may be no safe way to get to 4.1.34.
* 4.1.* -> 4.2... FIXME
* 4.2.16 and earlier -> 4.2.19, since 4.2.18 -> 4.2.19 takes
containers-common from 0.1.37 to 0.1.40, picking up the
v0.1.38~14^2~2 and v0.1.39~1 bumps. 4.2.16 and earlier -> 4.2.18 is
fine, because there were no RPM-induced storage.conf bumps. 4.2.18
-> 4.2.* is fine, because 4.2.18 has the patched machine-config
source.
* 4.2.16 and earlier -> 4.3, since 4.2.18 -> 4.3 takes
containers-common from 0.1.37 to 0.1.40, picking up the
v0.1.38~14^2~2 and v0.1.39~1 bumps. 4.2.18 -> 4.3 is fine, because
4.2.18 has the patched machine-config source.
* 4.3 -> 4.3 are fine, since they all have the patched machine-config
source.
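The reasoning in the list above reduces to a rule: an edge is risky when the source release lacks the patched machine-config code and the update crosses a containers-common version in which storage.conf changed. A sketch under those assumptions, where the `release` type and `vulnerable` helper are hypothetical, the release data is copied from the tables above, and the set of changed versions is the "likely" set inferred from the git history:

```go
package main

import "fmt"

// release records the two facts the analysis depends on.
type release struct {
	name        string
	patchedMCD  bool // machine-config code annotates storage.conf
	commonPatch int  // Z in containers-common 0.1.Z
}

// changedIn lists the 0.1.Z versions in which storage.conf likely changed.
var changedIn = []int{38, 39, 41}

// vulnerable reports whether updating from one release to another can hit
// the bug: unpatched source plus a storage.conf change picked up on the way.
func vulnerable(from, to release) bool {
	if from.patchedMCD {
		return false
	}
	for _, v := range changedIn {
		if from.commonPatch < v && v <= to.commonPatch {
			return true
		}
	}
	return false
}

func main() {
	r4131 := release{"4.1.31", false, 37}
	r4134 := release{"4.1.34", true, 40}
	r4216 := release{"4.2.16", false, 37}
	r4218 := release{"4.2.18", true, 37}
	r4219 := release{"4.2.19", true, 40}
	r430 := release{"4.3.0-rc.0", true, 40}

	edges := [][2]release{
		{r4131, r4134}, {r4216, r4218}, {r4216, r4219}, {r4218, r4219}, {r4216, r430},
	}
	for _, e := range edges {
		fmt.Printf("%s -> %s vulnerable: %v\n", e[0].name, e[1].name, vulnerable(e[0], e[1]))
	}
}
```

The sketch leaves out the 4.1.* -> 4.2 case, which the text marks as FIXME.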
So ideally this pull would block edges from 4.2.16 and earlier into
4.3. But because blocked-edges requires explicit to, I've just added
the 4.3.0 blocker (other 4.3.z releases either already blocked 4.2.*
or only give 4.2.18+ as update sources). I've also dropped 4.2.16
from the *-4.3 channels with a comment about this bug. There
shouldn't be much pushback on pulling the edge, because users can
still move from 4.2 to 4.3 via 4.2.19 -> 4.3.2.
Also simplify the wording on the GCP bug 1793635, which remains
unfixed.
[1]: openshift/machine-config-operator#1320 (comment)
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1782152#c5
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1781708#c0
[4]: https://github.com/openshift/machine-config-operator/pull/1320/files
[5]: openshift/machine-config-operator#1190
[6]: https://github.com/openshift/machine-config-operator/blob/13f0dda734262c3edbd23c007e42b7704125e88f/docs/MachineConfiguration.md
[7]: https://github.com/openshift/machine-config-operator/blob/13f0dda734262c3edbd23c007e42b7704125e88f/docs/ContainerRuntimeConfigDesign.md
[8]: openshift/machine-config-operator#330 (comment)
[9]: https://bugzilla.redhat.com/show_bug.cgi?id=1782153
[10]: openshift/machine-config-operator#1382 (comment)
[11]: openshift/machine-config-operator#1323 (comment)
[12]: https://bugzilla.redhat.com/show_bug.cgi?id=1782149
[13]: openshift/machine-config-operator#1322 (comment)
[14]: https://gitlab.cee.redhat.com/coretools/differ
Internal link, sorry :/ But you can also browse the history at:
https://releases-rhcos-art.cloud.privileged.psi.redhat.com/?stream=releases/rhcos-4.2&release=42.81.20200114.0 etc.
[15]: https://access.redhat.com/downloads/content/290/ver=4.2/rhel---8/4.2.0/x86_64/packages
[16]: https://access.redhat.com/downloads/content/rhel---8/x86_64/8841/containers-common/0.1.32-5.git1715c90.el8/x86_64/fd431d51/package
[17]: https://github.com/containers/skopeo/
The machine-config operator had a bug where MachineConfig entries lead
the machine-config daemon (MCD) to lay down a storage.conf that
exactly matched the content installed by the containers-common RPM.
On update, the RHCOS machine pivots to a new OSTree image (defined in
the machine-os-content image referenced from the release image).
Seeing storage.conf content that matched the old OSTree image,
libostree replaced storage.conf with the version defined in the new
OSTree image [1]. Then, when the MCD comes back up post-pivot, it
sees the divergent storage.conf content and freaks out with logs like
[2]:
E1210 16:15:51.105286 11181 daemon.go:1350] content mismatch for file /etc/containers/storage.conf:
and the machine-config operator goes Degraded=True with
RequiredPoolsFailed "nodes are reporting degraded status on sync" [3].
The narrow machine-config fix was to annotate storage.conf that it
writes, libostree doesn't touch the files on pivot [4]. This
addresses the storage.conf case, but leaves the MCD vulnerable to
other instances of "MCD writes exactly the OSTree contents to $FILE
and expects it to remain untouched during an OSTree pivot that bumps
the file". I'm not aware of a generic fix at the moment, although [5]
might be related. You can guard a cluster against the narrow bug by
setting a MachineConfig [6] or higher level object such as a
ContainerRuntimeConfig [7] that will cause the MCD to write a
storage.conf that diverges (even just by a comment or whitespace) from
the OSTree original.
Tracking the narrow fix through the various z streams:
The 4.1 machine-config bug was introduced in d2c44d7 [8], which landed
before 4.1.0-rc.0:
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.0-rc.0 | grep machine-config
machine-config-controller https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
machine-config-daemon https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
machine-config-operator https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
machine-config-server https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
setup-etcd-environment https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
$ git --no-pager log --oneline --first-parent de9998eb37 | grep d2c44d7
d2c44d7c Merge pull request openshift#330 from umohnani8/runtime
The 4.1 machine-config fix was [9], landed in 1301934 [10], which is
new in 4.1.34:
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.34-x86_64 | grep machine-config
machine-config-controller https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
machine-config-daemon https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
machine-config-operator https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
machine-config-server https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
setup-etcd-environment https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.31-x86_64 | grep machine-config
machine-config-controller https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
machine-config-daemon https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
machine-config-operator https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
machine-config-server https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
setup-etcd-environment https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
$ git --no-pager log --oneline --first-parent -2 f56d736e74a
f56d736e (origin/release-4.1) Merge pull request openshift#1147 from openshift-cherrypick-robot/cherry-pick-1114-to-release-4.1
1301934a Merge pull request openshift#1382 from vrutkovs/4.1-containers-conf-generated
The 4.2 machine-config fix was [2], landed in bd358bb [11], which is new
in 4.2.18:
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.16-x86_64 | grep machine-config
machine-config-operator https://github.com/openshift/machine-config-operator 31fed93186c9f84708f5cdfd0227ffe4f79b31cd
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.18-x86_64 | grep machine-config
machine-config-operator https://github.com/openshift/machine-config-operator 9366460085b2a24d825380759f554769ec5ab4f9
$ git --no-pager log --oneline --first-parent -2 9366460085
93664600 Merge pull request openshift#1362 from rphillips/fixes/1787581_4.2
bd358bb7 Merge pull request openshift#1323 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.2
The 4.3 machine-config fix was [12], landed in 9fd53bd [13], which
landed early enough for 4.3.0-rc.0:
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.0-rc.0-x86_64 | grep machine-config
machine-config-operator https://github.com/openshift/machine-config-operator 23a6e6fb37e73501bc3216183ef5e6ebb15efc7a
$ git --no-pager log --oneline --first-parent -8 23a6e6fb37
23a6e6fb Merge pull request openshift#1348 from openshift-cherrypick-robot/cherry-pick-1285-to-release-4.3
80c8aed7 Merge pull request openshift#1343 from retroflexer/cherry-pick-backup-restore-kube-static-resources
269990a3 Merge pull request openshift#1344 from openshift-cherrypick-robot/cherry-pick-1296-to-release-4.3
fd3ca395 Merge pull request openshift#1338 from runcom/fix-go-mod
ba304dbb Merge pull request openshift#1333 from openshift-cherrypick-robot/cherry-pick-1278-to-release-4.3
787f3fa9 Merge pull request openshift#1332 from runcom/reserved-cpus-4.3
2b85d6ba Merge pull request openshift#1329 from openshift-cherrypick-robot/cherry-pick-1314-to-release-4.3
9fd53bd5 Merge pull request openshift#1322 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.3
The 4.4 machine-config fix was [3], which landed before any 4.4 RCs
were cut. Even in 4.4, the "this is generated" note was the first
content change to this template:
$ git --no-pager log --oneline --follow origin/release-4.4 -- templates/common/_base/files/container-storage.yaml
46c4e27a (origin/pr/1320) templates/container-storage: Add a "this is generated" note
47a6321c templates: Move container-storage.yaml into common/
74ae3b31 (origin/pr/330) Add ContainerRuntime CRD and Controller
(47a6321c was a pure rename).
So the MCD has been annotating storage.conf since 4.1.34, 4.2.18, and
all 4.3 and later releases. When has the RPM-installed storage.conf
changed? Figuring this part out is a bit awkward, because we need to
drill down machine-os-content -> RHCOS -> RPM -> file. For example,
from 4.2.16 -> 4.2.18 [14]:
$ oc image info --output json $(oc adm release info --image-for=machine-os-content quay.io/openshift-release-dev/ocp-release:4.2.16-x86_64) | jq -r .config.config.Labels.version
42.81.20200114.0
$ oc image info --output json $(oc adm release info --image-for=machine-os-content quay.io/openshift-release-dev/ocp-release:4.2.18-x86_64) | jq -r .config.config.Labels.version
42.81.20200203.1
$ ./differ.py --first-endpoint art --first-version 42.81.20200114.0 --second-endpoint art --second-version 42.81.20200203.1 | jq -r '.diff | keys | sort[]'
cri-o
ignition
libarchive
machine-config-daemon
openshift-clients
openshift-hyperkube
sqlite-libs
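Whether a particular package is among the changed ones can be checked with an exact line match. A minimal sketch, assuming the key list has been saved one package per line (package_changed is a hypothetical helper, not part of differ.py):

```shell
# Hypothetical helper: report whether the package named in $1 appears in a
# one-package-per-line list read on stdin.
package_changed() {
  # -x: match the whole line, -F: fixed string, -q: no output
  if grep -qxF -- "$1"; then
    echo changed
  else
    echo unchanged
  fi
}

# Example with the package list printed above.
printf '%s\n' cri-o ignition libarchive machine-config-daemon \
  openshift-clients openshift-hyperkube sqlite-libs |
  package_changed containers-common
```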
storage.conf is managed by the containers-common RPM, which is not in
that list, so the file did not change from 4.2.16 to 4.2.18, and that
update will safely pull in the fixed MCD without a surprising pivot
change. Here are our changes to the RPM across the various z streams:
$ for OCP in 4.1.1 4.1.23 4.1.24 4.1.31-x86_64 4.1.34-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.1/${RHCOS}/commitmeta.json" | jq -r '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .[2]')"; echo "${RHCOS} ${COMMON} ${OCP}"; done
410.8.20190606.0 0.1.32 4.1.1
410.8.20191030.0 0.1.32 4.1.23
410.81.20191112.2 0.1.37 4.1.24
410.81.20200114.0 0.1.37 4.1.31-x86_64
410.81.20200204.1 0.1.40 4.1.34-x86_64
$ for OCP in 4.2.0-rc.0 4.2.2 4.2.4 4.2.16-x86_64 4.2.18-x86_64 4.2.19-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.2/${RHCOS}/commitmeta.json" | jq -r '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .[2]')"; echo "${RHCOS} ${COMMON} ${OCP}"; done
42.80.20190930.1 0.1.32 4.2.0-rc.0
42.80.20191022.0 0.1.32 4.2.2
42.81.20191107.0 0.1.37 4.2.4
42.81.20200114.0 0.1.37 4.2.16-x86_64
42.81.20200203.1 0.1.37 4.2.18-x86_64
42.81.20200210.0 0.1.40 4.2.19-x86_64
$ for OCP in 4.3.0-rc.0-x86_64 4.3.3-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.3/${RHCOS}/x86_64/commitmeta.json" | jq -r '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .[2]')"; echo "${RHCOS} ${COMMON} ${OCP}"; done
43.81.202001072253.0 0.1.40 4.3.0-rc.0-x86_64
43.81.202002170853.0 0.1.40 4.3.3-x86_64
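To spot where containers-common moves within a stream, the three-column output above can be scanned for rows whose second column differs from the previous row. A minimal sketch (common_bumps is a hypothetical helper; it assumes the "RHCOS COMMON OCP" column order printed by the loops above and expects rows from a single stream):

```shell
# Hypothetical helper: read "RHCOS COMMON OCP" rows on stdin and print each
# pair of consecutive releases between which containers-common changed.
common_bumps() {
  awk 'NR > 1 && $2 != prev_common {
         print prev_ocp " -> " $3 ": " prev_common " -> " $2
       }
       { prev_common = $2; prev_ocp = $3 }'
}

# Example with the tail of the 4.2 output above.
printf '%s\n' \
  '42.81.20200114.0 0.1.37 4.2.16-x86_64' \
  '42.81.20200203.1 0.1.37 4.2.18-x86_64' \
  '42.81.20200210.0 0.1.40 4.2.19-x86_64' |
  common_bumps
```

For these three rows the only bump reported is between 4.2.18 and 4.2.19.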
Fetching a source RPM for containers-common, e.g. from [15,16], shows
that it is built from the skopeo source package. Checking [17]:
$ git --no-pager log --follow --oneline --stat=200 -M50% -- vendor/github.com/containers/storage/storage.conf
afaa9e7f Bump github.com/containers/storage from 1.15.1 to 1.15.2
vendor/github.com/containers/storage/storage.conf | 3 ---
1 file changed, 3 deletions(-)
39ff039b Image encryption/decryption support in skopeo
vendor/github.com/containers/storage/storage.conf | 44 +++++++++++++++++++++++++-------------------
1 file changed, 25 insertions(+), 19 deletions(-)
05ae513b Bump github.com/containers/buildah from 1.8.4 to 1.11.4
vendor/github.com/containers/storage/storage.conf | 7 -------
1 file changed, 7 deletions(-)
700b3102 update github.com/containers/{image,storage}
vendor/github.com/containers/storage/storage.conf | 8 ++++++++
1 file changed, 8 insertions(+)
033b2902 migrate to go modules
vendor/github.com/containers/storage/storage.conf | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 130 insertions(+)
$ git --no-pager log --follow --oneline --stat=200 -M50% 033b2902^ -- contrib/storage.conf
fe259105 add storage.conf and manpage in contrib/
contrib/storage.conf | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
$ for HASH in fe259105 033b2902 700b3102 05ae513b 39ff039b afaa9e7f; do git describe --contains "${HASH}"; done
v0.1.29~3^2
v0.1.38~14^2~2
v0.1.39~1
v0.1.41~25^2
v0.1.41~21^2
v0.1.41~12^2
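The `~N` and `^M` suffixes in `git describe --contains` output describe the path back from a tag to the commit, so the first tag containing each change is whatever precedes the first `~` or `^`. A minimal sketch of that reduction (first_tag is a hypothetical helper name):

```shell
# Hypothetical helper: strip the ancestry suffix from a
# `git describe --contains` name, leaving the containing tag.
first_tag() {
  # Remove the longest suffix that starts with "~" or "^".
  printf '%s\n' "${1%%[~^]*}"
}

# Example with the names printed above.
for NAME in 'v0.1.29~3^2' 'v0.1.38~14^2~2' 'v0.1.39~1' 'v0.1.41~25^2'; do
  first_tag "${NAME}"
done
```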
So changes may have been made in 0.1.29 (when the file landed for the
first time, likely from wherever we store post-Git patches), and were
likely made in 0.1.38, 0.1.39, and 0.1.41. However, the skopeo and
derivative containers-common RPMs may have had patched versions of the
file tracked in dist-git [18]. Comparing the dist-git 4.1 tip with
the machine-config template:
$ git -C containers/skopeo remote -v | grep 'dist-git.*fetch'
dist-git git://pkgs.devel.redhat.com/rpms/skopeo.git (fetch)
$ git --no-pager -C containers/skopeo log --date=short --format='%ad %h %s' -2 dist-git/rhaos-4.1-rhel-8 -- storage.conf
2018-07-18 3757b210 add statx to seccomp.json to containers-config add seccomp.json to containers-config
2017-11-08 284f9024 Force storage.conf to default to overlay
$ git --no-pager -C containers/skopeo grep '^Version:' 3757b210
3757b210:skopeo.spec:Version: 0.1.31
$ diff -U3 <(git -C containers/skopeo cat-file -p 3757b210:storage.conf) <(sed 's/^ //' openshift/machine-config-operator/templates/common/_base/files/container-storage.yaml)
--- /dev/fd/63 2020-02-20 01:13:48.073704685 -0800
+++ /dev/fd/62 2020-02-20 01:13:48.073704685 -0800
@@ -1,3 +1,10 @@
+filesystem: "root"
+mode: 0644
+path: "/etc/containers/storage.conf"
+contents:
+ inline: |
+# This file is generated by the Machine Config Operator's containerruntimeconfig controller.
+#
# storage.conf is the configuration file for all tools
# that share the containers/storage libraries
# See man 5 containers-storage.conf for more information
So the machine-config master (5ed0aee72c) only differs from the old
0.1.31 RPM storage.conf by the "file is generated" marker.
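The diff above still shows the template's YAML header as added lines. A minimal sketch of extracting only the inline file body before comparing, assuming the header ends at the first "inline: |" line and that the body carries no meaningful indentation of its own (template_body is a hypothetical helper, not something in the machine-config repository):

```shell
# Hypothetical helper: print the inline body of a template such as
# container-storage.yaml, dropping everything through "inline: |" and
# stripping leading whitespace from the remaining lines.
template_body() {
  sed -e '1,/inline: |/d' -e 's/^[[:space:]]*//' "$1"
}

# Example with a small stand-in for the real template.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
filesystem: "root"
mode: 0644
path: "/etc/containers/storage.conf"
contents:
  inline: |
    # This file is generated by the Machine Config Operator's containerruntimeconfig controller.
    #
    # storage.conf is the configuration file for all tools
EOF
template_body "$tmp"
rm -f "$tmp"
```

Feeding this output to diff in place of the `sed 's/^ //'` process substitution would leave only real content differences, such as the generated-file comment.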
There does not seem to be any 4.2-specific content. Presumably
they're using the same rhaos-4.1-rhel-8 RPMs. 4.3 has some changes:
$ git --no-pager log --date=short --format='%ad %h %s' -2 --stat=80 dist-git/rhaos-4.3-rhel-8 -- storage.conf
2019-12-09 4a131916 skopeo-0.1.40-2.el8
storage.conf | 39 +++++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 10 deletions(-)
2019-10-08 13a4ce10 skopeo-1:0.1.40-0.1.gitf72e39f
storage.conf | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 114 insertions(+)
So it looks like we can ignore the dev skopeo repository, focus on the
dist-git skopeo repository, and say that before 0.1.40-2.el8 we had a
version of storage.conf in the RPMs that matched the unpatched
machine-config templates, and with 0.1.40-2.el8 and later the RPMs had
different content. Sanity checking via [19,20]:
$ diff -U3 <(rpm2cpio containers-common-0.1.32-5.git1715c90.el8.x86_64.rpm | cpio -i --to-stdout ./etc/containers/storage.conf 2>/dev/null) <(sed 's/^ //' templates/common/_base/files/container-storage.yaml)
--- /dev/fd/63 2020-02-20 01:36:23.031918968 -0800
+++ /dev/fd/62 2020-02-20 01:36:23.031918968 -0800
@@ -1,3 +1,10 @@
+filesystem: "root"
+mode: 0644
+path: "/etc/containers/storage.conf"
+contents:
+ inline: |
+# This file is generated by the Machine Config Operator's containerruntimeconfig controller.
+#
# storage.conf is the configuration file for all tools
# that share the containers/storage libraries
# See man 5 containers-storage.conf for more information
but I'm not clear on why the product pages are claiming
containers-common-0.1.32 for 4.1.34 [19,20].
FIXME
Comparing with our machine-os-content, that means vulnerable
transitions are:
* 4.1.* -> 4.1.34, since 4.1.31 -> 4.1.34 takes containers-common from
0.1.37 to 0.1.40, picking up the v0.1.38~14^2~2 and v0.1.39~1 bumps.
There may be no safe way to get to 4.1.34.
* 4.1.* -> 4.2... FIXME
* 4.2.16 and earlier -> 4.2.19, since 4.2.18 -> 4.2.19 takes
containers-common from 0.1.37 to 0.1.40, picking up the
v0.1.38~14^2~2 and v0.1.39~1 bumps. 4.2.16 and earlier -> 4.2.18 is
fine, because there were no RPM-induced storage.conf bumps. 4.2.18
-> 4.2.* is fine, because 4.2.18 has the patched machine-config
source.
* 4.2.16 and earlier -> 4.3, since 4.2.18 -> 4.3 takes
containers-common from 0.1.37 to 0.1.40, picking up the
v0.1.38~14^2~2 and v0.1.39~1 bumps. 4.2.18 -> 4.3 is fine, because
4.2.18 has the patched machine-config source.
* 4.3 -> 4.3 updates are fine, since they all have the patched machine-config
source.
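The rule behind this list can be sketched as: an update is at risk when the source release has the unpatched machine-config code and the pivot crosses from an RPM with the old storage.conf to one with the changed file. The sketch below assumes the change arrives with containers-common 0.1.40, as the dist-git history suggests, and takes plain X.Y.Z versions; both function names are hypothetical:

```shell
# Succeeds if the given containers-common version is 0.1.40 or later,
# which this sketch assumes is where storage.conf content changed.
has_new_storage_conf() {
  echo "$1" | awk -F. '{ exit !(($1 * 1000000 + $2 * 1000 + $3) >= 1040) }'
}

# classify SRC_PATCHED SRC_COMMON DST_COMMON
#   SRC_PATCHED: "yes" if the source release annotates storage.conf, else "no"
#   SRC_COMMON / DST_COMMON: containers-common versions before and after
classify() {
  if [ "$1" = no ] &&
     ! has_new_storage_conf "$2" &&
     has_new_storage_conf "$3"; then
    echo vulnerable
  else
    echo fine
  fi
}

classify no 0.1.37 0.1.40    # 4.2.16 -> 4.2.19
classify yes 0.1.37 0.1.40   # 4.2.18 -> 4.2.19
classify no 0.1.37 0.1.37    # 4.2.16 -> 4.2.18
```

This prints vulnerable, fine, fine, matching the 4.2 entries in the list above.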
So ideally this pull would block edges from 4.2.16 and earlier into
4.3. But because blocked-edges requires an explicit "to", I've just added
the 4.3.0 blocker (other 4.3.z releases either already blocked 4.2.*
or only give 4.2.18+ as update sources). I've also dropped 4.2.16
from the *-4.3 channels with a comment about this bug. There
shouldn't be much pushback on pulling the edge, because users can
still move from 4.2 to 4.3 via 4.2.19 -> 4.3.2.
Also simplify the wording on the GCP bug 1793635, which remains
unfixed.
[1]: openshift/machine-config-operator#1320 (comment)
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1782152#c5
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1781708#c0
[4]: https://github.com/openshift/machine-config-operator/pull/1320/files
[5]: openshift/machine-config-operator#1190
[6]: https://github.com/openshift/machine-config-operator/blob/13f0dda734262c3edbd23c007e42b7704125e88f/docs/MachineConfiguration.md
[7]: https://github.com/openshift/machine-config-operator/blob/13f0dda734262c3edbd23c007e42b7704125e88f/docs/ContainerRuntimeConfigDesign.md
[8]: openshift/machine-config-operator#330 (comment)
[9]: https://bugzilla.redhat.com/show_bug.cgi?id=1782153
[10]: openshift/machine-config-operator#1382 (comment)
[11]: openshift/machine-config-operator#1323 (comment)
[12]: https://bugzilla.redhat.com/show_bug.cgi?id=1782149
[13]: openshift/machine-config-operator#1322 (comment)
[14]: https://gitlab.cee.redhat.com/coretools/differ
Internal link, sorry :/ But you can also browse the history at:
https://releases-rhcos-art.cloud.privileged.psi.redhat.com/?stream=releases/rhcos-4.2&release=42.81.20200114.0 etc.
[15]: https://access.redhat.com/downloads/content/290/ver=4.2/rhel---8/4.2.0/x86_64/packages
[16]: https://access.redhat.com/downloads/content/rhel---8/x86_64/8841/containers-common/0.1.32-5.git1715c90.el8/x86_64/fd431d51/package
[17]: https://github.com/containers/skopeo/
[18]: http://pkgs.devel.redhat.com/cgit/rpms/skopeo/
[19]: https://access.redhat.com/downloads/content/290/ver=4.1/rhel---8/4.1.34/x86_64/packages
[20]: https://access.redhat.com/downloads/content/rhel---8/x86_64/8384/containers-common/0.1.32-5.git1715c90.el8/x86_64/fd431d51/package
machine-config-daemon https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
machine-config-operator https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
machine-config-server https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
setup-etcd-environment https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
$ git --no-pager log --oneline --first-parent -2 f56d736e74a
f56d736e (origin/release-4.1) Merge pull request openshift#1147 from openshift-cherrypick-robot/cherry-pick-1114-to-release-4.1
1301934a Merge pull request openshift#1382 from vrutkovs/4.1-containers-conf-generated
The 4.2 machine-config fix was [2], landed in bd358bb [11], which is new
in 4.2.18:
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.16-x86_64 | grep machine-config
machine-config-operator https://github.com/openshift/machine-config-operator 31fed93186c9f84708f5cdfd0227ffe4f79b31cd
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.18-x86_64 | grep machine-config
machine-config-operator https://github.com/openshift/machine-config-operator 9366460085b2a24d825380759f554769ec5ab4f9
$ git --no-pager log --oneline --first-parent -2 9366460085
93664600 Merge pull request openshift#1362 from rphillips/fixes/1787581_4.2
bd358bb7 Merge pull request openshift#1323 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.2
The 4.3 machine-config fix was [12], landed in 9fd53bd [13], which
landed early enough for 4.3.0-rc.0:
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.0-rc.0-x86_64 | grep machine-config
machine-config-operator https://github.com/openshift/machine-config-operator 23a6e6fb37e73501bc3216183ef5e6ebb15efc7a
$ git --no-pager log --oneline --first-parent -8 23a6e6fb37
23a6e6fb Merge pull request openshift#1348 from openshift-cherrypick-robot/cherry-pick-1285-to-release-4.3
80c8aed7 Merge pull request openshift#1343 from retroflexer/cherry-pick-backup-restore-kube-static-resources
269990a3 Merge pull request openshift#1344 from openshift-cherrypick-robot/cherry-pick-1296-to-release-4.3
fd3ca395 Merge pull request openshift#1338 from runcom/fix-go-mod
ba304dbb Merge pull request openshift#1333 from openshift-cherrypick-robot/cherry-pick-1278-to-release-4.3
787f3fa9 Merge pull request openshift#1332 from runcom/reserved-cpus-4.3
2b85d6ba Merge pull request openshift#1329 from openshift-cherrypick-robot/cherry-pick-1314-to-release-4.3
9fd53bd5 Merge pull request openshift#1322 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.3
The 4.4 machine-config fix was [3] which has landed before any 4.4 RCs
have been cut. Even in 4.4, the generated note was the first content
touch to this template:
$ git --no-pager log --oneline --follow origin/release-4.4 -- templates/common/_base/files/container-storage.yaml
46c4e27a (origin/pr/1320) templates/container-storage: Add a "this is generated" note
47a6321c templates: Move container-storage.yaml into common/
74ae3b31 (origin/pr/330) Add ContainerRuntime CRD and Controller
(47a6321c was a pure rename).
So the MCD has been annotating storage.conf since 4.1.34, 4.2.18, and
all 4.3 and later releases. When has the RPM-installed storage.conf
changed? Figuring this part out is a bit awkward, because we need to
drill down machine-os-content -> RHCOS -> RPM -> file. For example,
from 4.2.16 -> 4.2.18 [14]:
$ oc image info --output json $(oc adm release info --image-for=machine-os-content quay.io/openshift-release-dev/ocp-release:4.2.16-x86_64) | jq -r .config.config.Labels.version
42.81.20200114.0
$ oc image info --output json $(oc adm release info --image-for=machine-os-content quay.io/openshift-release-dev/ocp-release:4.2.18-x86_64) | jq -r .config.config.Labels.version
42.81.20200203.1
$ ./differ.py --first-endpoint art --first-version 42.81.20200114.0 --second-endpoint art --second-version 42.81.20200203.1 | jq -r '.diff | keys | sort[]'
cri-o
ignition
libarchive
machine-config-daemon
openshift-clients
openshift-hyperkube
sqlite-libs
storage.conf is managed by the containers-common RPM, which is not in
that diff, so storage.conf does not change from 4.2.16 to 4.2.18, and
that update will safely pull in the fixed
MCD without a surprising pivot change. Here are our changes to the
RPM across the various z streams:
$ for OCP in 4.1.1 4.1.16 4.1.17 4.1.23 4.1.24 4.1.28 4.1.29 4.1.31-x86_64 4.1.34-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.1/${RHCOS}/commitmeta.json" | jq -c '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .')"; echo "${COMMON} ${RHCOS} ${OCP}"; done
["containers-common","1","0.1.32","4.git1715c90.el8","x86_64"] 410.8.20190606.0 4.1.1
["containers-common","1","0.1.32","4.git1715c90.el8","x86_64"] 410.8.20190910.1 4.1.16
["containers-common","1","0.1.32","5.git1715c90.el8","x86_64"] 410.8.20190918.0 4.1.17
["containers-common","1","0.1.32","5.git1715c90.el8","x86_64"] 410.8.20191030.0 4.1.23
["containers-common","1","0.1.37","5.module+el8.1.0+4240+893c1ab8","x86_64"] 410.81.20191112.2 4.1.24
["containers-common","1","0.1.37","5.module+el8.1.0+4240+893c1ab8","x86_64"] 410.81.20191210.0 4.1.28
["containers-common","1","0.1.37","6.module+el8.1.0+4876+e678a192","x86_64"] 410.81.20191223.0 4.1.29
["containers-common","1","0.1.37","6.module+el8.1.0+4876+e678a192","x86_64"] 410.81.20200114.0 4.1.31-x86_64
["containers-common","1","0.1.40","8.module+el8.1.1+5351+506397b0","x86_64"] 410.81.20200204.1 4.1.34-x86_64
$ for OCP in 4.2.0-rc.0 4.2.2 4.2.4 4.2.12 4.2.13 4.2.18-x86_64 4.2.19-x86_64 4.2.20-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.2/${RHCOS}/commitmeta.json" | jq -c '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .')"; echo "${COMMON} ${RHCOS} ${OCP}"; done
["containers-common","1","0.1.32","5.git1715c90.el8","x86_64"] 42.80.20190930.1 4.2.0-rc.0
["containers-common","1","0.1.32","5.git1715c90.el8","x86_64"] 42.80.20191022.0 4.2.2
["containers-common","1","0.1.37","5.module+el8.1.0+4240+893c1ab8","x86_64"] 42.81.20191107.0 4.2.4
["containers-common","1","0.1.37","5.module+el8.1.0+4240+893c1ab8","x86_64"] 42.81.20191210.1 4.2.12
["containers-common","1","0.1.37","6.module+el8.1.0+4876+e678a192","x86_64"] 42.81.20191223.0 4.2.13
["containers-common","1","0.1.37","6.module+el8.1.0+4876+e678a192","x86_64"] 42.81.20200203.1 4.2.18-x86_64
["containers-common","1","0.1.40","8.module+el8.1.1+5351+506397b0","x86_64"] 42.81.20200210.0 4.2.19-x86_64
["containers-common","1","0.1.40","8.module+el8.1.1+5351+506397b0","x86_64"] 42.81.20200217.0 4.2.20-x86_64
$ for OCP in 4.3.0-rc.0-x86_64 4.3.0-x86_64 4.3.1-x86_64 4.3.2-x86_64 4.3.3-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.3/${RHCOS}/x86_64/commitmeta.json" | jq -c '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .')"; echo "${COMMON} ${RHCOS} ${OCP}"; done
["containers-common","1","0.1.40","2.el8","x86_64"] 43.81.202001072253.0 4.3.0-rc.0-x86_64
["containers-common","1","0.1.40","2.el8","x86_64"] 43.81.202001142154.0 4.3.0-x86_64
["containers-common","1","0.1.40","3.rhaos.el8","x86_64"] 43.81.202002032142.0 4.3.1-x86_64
["containers-common","1","0.1.40","8.module+el8.1.1+5351+506397b0","x86_64"] 43.81.202002110953.0 4.3.2-x86_64
["containers-common","1","0.1.40","8.module+el8.1.1+5351+506397b0","x86_64"] 43.81.202002170853.0 4.3.3-x86_64
Fetching a source RPM for containers-common, e.g. from [15,16] shows
the source packages coming from skopeo. Checking [17]:
$ git --no-pager log --follow --oneline --stat=200 -M50% -- vendor/github.com/containers/storage/storage.conf
afaa9e7f Bump github.com/containers/storage from 1.15.1 to 1.15.2
vendor/github.com/containers/storage/storage.conf | 3 ---
1 file changed, 3 deletions(-)
39ff039b Image encryption/decryption support in skopeo
vendor/github.com/containers/storage/storage.conf | 44 +++++++++++++++++++++++++-------------------
1 file changed, 25 insertions(+), 19 deletions(-)
05ae513b Bump github.com/containers/buildah from 1.8.4 to 1.11.4
vendor/github.com/containers/storage/storage.conf | 7 -------
1 file changed, 7 deletions(-)
700b3102 update github.com/containers/{image,storage}
vendor/github.com/containers/storage/storage.conf | 8 ++++++++
1 file changed, 8 insertions(+)
033b2902 migrate to go modules
vendor/github.com/containers/storage/storage.conf | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 130 insertions(+)
$ git --no-pager log --follow --oneline --stat=200 -M50% 033b2902^ -- contrib/storage.conf
fe259105 add storage.conf and manpage in contrib/
contrib/storage.conf | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
$ for HASH in fe259105 033b2902 700b3102 05ae513b 39ff039b afaa9e7f; do git describe --contains "${HASH}"; done
v0.1.29~3^2
v0.1.38~14^2~2
v0.1.39~1
v0.1.41~25^2
v0.1.41~21^2
v0.1.41~12^2
So changes may have been made in 0.1.29 (when the file landed for the
first time, likely from wherever we store post-Git patches), and were
likely made in 0.1.38, 0.1.39, and 0.1.41. However, the skopeo and
derivative containers-common RPMs may have had patched versions of the
file tracked in dist-git [18]. Comparing the dist-git 4.1 tip with
the machine-config template:
$ git -C containers/skopeo remote -v | grep 'dist-git.*fetch'
dist-git git://pkgs.devel.redhat.com/rpms/skopeo.git (fetch)
$ git --no-pager -C containers/skopeo log --date=short --format='%ad %h %s' -2 dist-git/rhaos-4.1-rhel-8 -- storage.conf
2018-07-18 3757b210 add statx to seccomp.json to containers-config add seccomp.json to containers-config
2017-11-08 284f9024 Force storage.conf to default to overlay
$ git --no-pager -C containers/skopeo grep '^Version:' 3757b210
3757b210:skopeo.spec:Version: 0.1.31
$ diff -U3 <(git -C containers/skopeo cat-file -p 3757b210:storage.conf) <(sed 's/^ //' openshift/machine-config-operator/templates/common/_base/files/container-storage.yaml)
--- /dev/fd/63 2020-02-20 01:13:48.073704685 -0800
+++ /dev/fd/62 2020-02-20 01:13:48.073704685 -0800
@@ -1,3 +1,10 @@
+filesystem: "root"
+mode: 0644
+path: "/etc/containers/storage.conf"
+contents:
+ inline: |
+# This file is generated by the Machine Config Operator's containerruntimeconfig controller.
+#
# storage.conf is the configuration file for all tools
# that share the containers/storage libraries
# See man 5 containers-storage.conf for more information
So the machine-config master (5ed0aee72c) only differs from the old
0.1.31 RPM storage.conf by the "file is generated" marker.
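Since the marker comment is the only difference, its presence is a cheap way to tell an annotated storage.conf from an RPM-identical one. A minimal sketch, assuming the marker text shown in the diff above; the function name is hypothetical and not part of the MCO:

```shell
# Hypothetical helper: succeed if the given storage.conf carries the
# "this is generated" marker added by the machine-config fix.
has_generated_marker() {
    # -q: no output, exit status only; -F: match the text literally.
    grep -qF "This file is generated by the Machine Config Operator" "$1"
}
```

For example, `has_generated_marker /etc/containers/storage.conf && echo annotated` on a node would print "annotated" only when the marker is present.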
There does not seem to be any 4.2-specific content. Presumably
they're using the same rhaos-4.1-rhel-8 RPMs. 4.3 has some changes:
$ git --no-pager log --date=short --format='%ad %h %s' -2 --stat=80 dist-git/rhaos-4.3-rhel-8 -- storage.conf
2019-12-09 4a131916 skopeo-0.1.40-2.el8
storage.conf | 39 +++++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 10 deletions(-)
2019-10-08 13a4ce10 skopeo-1:0.1.40-0.1.gitf72e39f
storage.conf | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 114 insertions(+)
So it looks like we can ignore the dev skopeo repository, focus on the
dist-git skopeo repository, and say that before 0.1.40-2.el8 we had a
version of storage.conf in the RPMs that matched the unpatched
machine-config templates, and with 0.1.40-2.el8 and later the RPMs had
different content. Can we check the RPMs to confirm?
The product pages are claiming containers-common-0.1.32 for 4.1.34
[19,20]. Those product pages are fed from RPM Errata reports, and ART
builds those Errata by sweeping RPM repositories in the vicinity of
the RHCOS builds. So there's a potential for races like:
1. RPM Errata sweep fires and grabs RPM A v1.
2. New RPM A v2 pushed to the repository.
3. RHCOS build hits repositories and grabs RPM A v2.
The RPMs referenced by releases-rhcos-art.cloud are reliable, but
actually tracking down the referenced RPMs to download them is
complicated (especially for module builds like containers-common).
But here are two RPM-lookup procedures that seem more reliable:
A. From [21]:
1. On [21], find the matching skopeo package,
e.g. skopeo-0.1.40-2.el8. Click through to the Advisory,
e.g. [22].
2. On [22], find the matching skopeo package, expand the CDN RPMs
section to see the containers-common RPM link, e.g. [23].
3. Click through to /etc/containers/storage.conf, e.g. [24].
4. See the sha256, e.g. a6423cca39d0cde0d6ee82163630d288e8876ab7d39d2678f6d86d804bf61044.
B. From [25]. This works better for module builds.
1. Search for the skopeo package from [25], e.g. [26], which takes
me to [27].
2. Find the matching package,
e.g. skopeo-0.1.37-5.module+el8.1.0+4240+893c1ab8, and click
through to [28].
3. Find the x86_64 containers-common RPM, and click through to info
[29]. Continue from step A.3.
Summarizing storage.conf digests for the various RPMs:
* containers-common-1:0.1.32-4.git1715c90.el8.x86_64
Used for 4.1.1 through 4.1.16.
ee7daca89532d5a80da391fc358776ec11eff256c497652c49505acc70b96822 [30]
* containers-common-1:0.1.32-5.git1715c90.el8.x86_64
Used for 4.1.17 through 4.1.23, 4.2.0-rc.0 through 4.2.2.
ee7daca89532d5a80da391fc358776ec11eff256c497652c49505acc70b96822 [31]
* containers-common-1:0.1.37-5.module+el8.1.0+4240+893c1ab8.x86_64
Used for 4.1.24 through 4.1.28, 4.2.4 through 4.2.12.
ee7daca89532d5a80da391fc358776ec11eff256c497652c49505acc70b96822 [32]
* containers-common-1:0.1.37-6.module+el8.1.0+4876+e678a192.x86_64
Used for 4.1.29 through 4.1.31, 4.2.13 through 4.2.18.
ee7daca89532d5a80da391fc358776ec11eff256c497652c49505acc70b96822 [33]
* containers-common-1:0.1.40-2.el8.x86_64
Used for 4.3.0-rc.0 through 4.3.0.
a6423cca39d0cde0d6ee82163630d288e8876ab7d39d2678f6d86d804bf61044 [24]
* containers-common-1:0.1.40-3.rhaos.el8.x86_64
Used for 4.3.1.
a6423cca39d0cde0d6ee82163630d288e8876ab7d39d2678f6d86d804bf61044 [34]
* containers-common-1:0.1.40-8.module+el8.1.1+5351+506397b0.x86_64
Used for 4.2.19, 4.2.20, and 4.1.34.
a6423cca39d0cde0d6ee82163630d288e8876ab7d39d2678f6d86d804bf61044 [35]
So there are only two versions in the RPMs, ee7daca895 used for all
4.1 and 4.2, and a6423cca39 used for all 4.3. That means that the
vulnerable transitions are 4.2.16 and earlier going into 4.3. It also
means that there's a potential for future trouble in transitions from
4.1.31 and earlier to a future 4.1 or 4.2 where the RPM-installed
content is different, and from 4.2.16 and earlier to a future 4.2
where the RPM-installed content is different, but that we have no such
4.1 or 4.2 changes at the moment.
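To check which of those two RPM versions a given storage.conf matches, something like the following might help. This is a sketch, not part of the fix: the digests are copied from the summary above, the function and variable names are hypothetical, and it assumes GNU coreutils' sha256sum is available.

```shell
# Digests of the two RPM-installed storage.conf versions listed above.
OLD_RPM_SHA256=ee7daca89532d5a80da391fc358776ec11eff256c497652c49505acc70b96822
NEW_RPM_SHA256=a6423cca39d0cde0d6ee82163630d288e8876ab7d39d2678f6d86d804bf61044

# Hypothetical helper: classify a storage.conf by its sha256 digest.
classify_storage_conf() {
    # sha256sum prints "<digest>  <path>"; keep only the digest.
    digest="$(sha256sum "$1" | cut -d' ' -f1)"
    case "${digest}" in
        "${OLD_RPM_SHA256}") echo "matches-old-rpm" ;;
        "${NEW_RPM_SHA256}") echo "matches-new-rpm" ;;
        *) echo "diverges" ;;
    esac
}
```

A file that prints "diverges" (for example one carrying the generated marker, or any MachineConfig-supplied comment) is not byte-identical to either RPM version.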
So ideally this pull would block edges from 4.2.16 and earlier into
4.3. This commit drops 4.2.16 from the *-4.3 channels with a comment
about this bug. This also explicitly blocks edges from 4.2 into
4.3.0, because 4.3.0 is the only 4.3 release which recommends 4.2.16
or earlier as an update edge.
$ for i in $(seq 0 3); do echo -n "$i "; oc adm release info "quay.io/openshift-release-dev/ocp-release:4.3.$i-x86_64" | grep Upgrades; done
0 Upgrades: 4.2.16, 4.3.0-rc.0, 4.3.0-rc.1, 4.3.0-rc.2, 4.3.0-rc.3
1 Upgrades: 4.2.18, 4.3.0-rc.0, 4.3.0-rc.3, 4.3.0
2 Upgrades: 4.2.19, 4.3.0, 4.3.1
3 Upgrades: 4.2.20, 4.3.0, 4.3.1, 4.3.2
There shouldn't be much pushback on pulling the edge, because users
can still move from 4.2 to 4.3 via 4.2.18 -> 4.3.1, both of which are
already in fast-4.3.
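The blocking rule can be written down as a small predicate. This is only a sketch of the "4.2.16 and earlier into 4.3" reading above; the function name is hypothetical, and it looks at version strings alone rather than consulting release metadata or channel files.

```shell
# Hypothetical helper: succeed if FROM -> TO is an edge this bug affects,
# i.e. FROM is 4.2.z with z <= 16 and TO is any 4.3 release.
is_vulnerable_edge() {
    from="$1"; to="$2"
    case "${to}" in
        4.3.*) ;;
        *) return 1 ;;
    esac
    case "${from}" in
        4.2.*) ;;
        *) return 1 ;;
    esac
    # Strip the "4.2." prefix, then anything after the patch number
    # (e.g. "-rc.0" or "-x86_64").
    patch="${from#4.2.}"
    patch="${patch%%[!0-9]*}"
    [ -n "${patch}" ] && [ "${patch}" -le 16 ]
}
```

With this, `is_vulnerable_edge 4.2.16 4.3.0` succeeds, while `is_vulnerable_edge 4.2.18 4.3.1` fails, matching the edge that remains available.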
Also simplify the wording on the GCP bug 1793635, which remains
unfixed.
[1]: openshift/machine-config-operator#1320 (comment)
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1782152#c5
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1781708#c0
[4]: https://github.com/openshift/machine-config-operator/pull/1320/files
[5]: openshift/machine-config-operator#1190
[6]: https://github.com/openshift/machine-config-operator/blob/13f0dda734262c3edbd23c007e42b7704125e88f/docs/MachineConfiguration.md
[7]: https://github.com/openshift/machine-config-operator/blob/13f0dda734262c3edbd23c007e42b7704125e88f/docs/ContainerRuntimeConfigDesign.md
[8]: openshift/machine-config-operator#330 (comment)
[9]: https://bugzilla.redhat.com/show_bug.cgi?id=1782153
[10]: openshift/machine-config-operator#1382 (comment)
[11]: openshift/machine-config-operator#1323 (comment)
[12]: https://bugzilla.redhat.com/show_bug.cgi?id=1782149
[13]: openshift/machine-config-operator#1322 (comment)
[14]: https://gitlab.cee.redhat.com/coretools/differ
Internal link, sorry :/ But you can also browse the history at:
https://releases-rhcos-art.cloud.privileged.psi.redhat.com/?stream=releases/rhcos-4.2&release=42.81.20200114.0 etc.
[15]: https://access.redhat.com/downloads/content/290/ver=4.2/rhel---8/4.2.0/x86_64/packages
[16]: https://access.redhat.com/downloads/content/rhel---8/x86_64/8841/containers-common/0.1.32-5.git1715c90.el8/x86_64/fd431d51/package
[17]: https://github.com/containers/skopeo/
[18]: http://pkgs.devel.redhat.com/cgit/rpms/skopeo/
[19]: https://access.redhat.com/downloads/content/290/ver=4.1/rhel---8/4.1.34/x86_64/packages
[20]: https://access.redhat.com/downloads/content/rhel---8/x86_64/8384/containers-common/0.1.32-5.git1715c90.el8/x86_64/fd431d51/package
[21]: https://errata.devel.redhat.com/package/show/skopeo
[22]: https://errata.devel.redhat.com/errata/content/46255
[23]: https://brewweb.engineering.redhat.com/brew/rpminfo?rpmID=7604818
[24]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7604818&filename=/etc/containers/storage.conf
[25]: https://brewweb.engineering.redhat.com/brew/search
[26]: https://brewweb.engineering.redhat.com/brew/search?match=glob&type=package&terms=skopeo
[27]: https://brewweb.engineering.redhat.com/brew/packageinfo?packageID=58395
[28]: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=971200
[29]: https://brewweb.engineering.redhat.com/brew/rpminfo?rpmID=7349205
[30]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=6958325&filename=/etc/containers/storage.conf
[31]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7334504&filename=/etc/containers/storage.conf
[32]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7349205&filename=/etc/containers/storage.conf
[33]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7550403&filename=/etc/containers/storage.conf
[34]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7727297&filename=/etc/containers/storage.conf
[35]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7656074&filename=/etc/containers/storage.conf
…ift-4.7-ose-cluster-samples-operator Updating ose-cluster-samples-operator builder & base images to be consistent with ART
Add a new CRD and Controller that allows users to configure
certain options in crio.conf and storage.conf. A template is
used for the default values, which can be changed with a CR. When
the CR is deleted, the values revert to their defaults.
The following options can be configured in /etc/crio/crio.conf:
The following options can be configured in /etc/containers/storage.conf:
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>