Verify a GPG signature on the release image digest from a root of trust within the payload #170
Conversation
Force-pushed from 69d04cc to 780e451
/test e2e-aws
Force-pushed from cf9f6de to ad61a26
Force-pushed from 7227456 to 83c0ffb
Simplified down to fix two problems:
Added another cvo_scenario test that showed different logic.
/retest
/retest
Are we OK with API changes? If so, #293 needs to be merged before this and I need to bump the API.
WTH /retest
Looks like there is a merge conflict @smarterclayton |
All operators should use the message policy fallback to grab a final error message, if necessary.
Only bring '-v' in because that's the one we care about.
Use the existing Atomic image signing protocol to read detached signatures for images by digest from a remote location so that release images can be verified before they are executed. Add a Verifier interface to the CVO that abstracts checking for verified updates. On start, check the payload for a config map with the annotation release.openshift.io/verification-config-map set (the value is ignored) and load the set of all public keys that must verify the payload, along with the HTTP or file store locations for the detached signatures. Every key must verify the signature for the payload to be accepted. A subsequent commit will leverage the Verifier to block downloading a new release image.
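As a rough illustration of the trust check described here (not the CVO's actual code): a minimal sketch of checking a detached GPG signature over a release digest with golang.org/x/crypto/openpgp. The function name and file-path parameters are hypothetical, and the real simple-signing format signs a JSON body that names the digest rather than the bare digest string.

```go
package verify

import (
	"bytes"
	"fmt"
	"os"

	"golang.org/x/crypto/openpgp"
)

// verifyDigest is an illustrative sketch: it checks one armored detached
// signature over the digest string against one armored public-key file.
// In the design above, every configured key must verify for the payload
// to be accepted.
func verifyDigest(pubKeyPath, sigPath, digest string) error {
	keyFile, err := os.Open(pubKeyPath)
	if err != nil {
		return err
	}
	defer keyFile.Close()

	keyring, err := openpgp.ReadArmoredKeyRing(keyFile)
	if err != nil {
		return fmt.Errorf("load keyring: %v", err)
	}

	sig, err := os.Open(sigPath)
	if err != nil {
		return err
	}
	defer sig.Close()

	// Treat the digest string itself as the signed content for this sketch.
	if _, err := openpgp.CheckArmoredDetachedSignature(keyring, bytes.NewBufferString(digest), sig); err != nil {
		return fmt.Errorf("signature for %s did not verify: %v", digest, err)
	}
	return nil
}
```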
The verifier is loaded at startup and passed to the payload retriever, which delegates to the verifier and returns an error if the digest is not authentic. If no verifier is configured, the default is to reject the payload as inauthentic. Move the initialization of the sync worker to after the verifier is loaded so that we can pass the verifier into the payload retriever, which makes the New() method slightly smaller and separates payload-related actions from controller-related actions.
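A minimal sketch of the shape this describes, assuming illustrative names (not necessarily the merged interface):

```go
package verify

import (
	"context"
	"errors"
)

// Verifier abstracts checking that a release image digest carries a valid
// signature from the configured root of trust.
type Verifier interface {
	// Verify returns nil only when the digest could be verified against
	// every required public key.
	Verify(ctx context.Context, releaseDigest string) error
}

// rejectVerifier is the default when no verification config map was found
// in the payload: every payload is treated as unverifiable.
type rejectVerifier struct{}

func (rejectVerifier) Verify(ctx context.Context, releaseDigest string) error {
	return errors.New("no verification keys configured, unable to verify release image digest")
}
```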
For safety, all unverified images are considered untrusted and the CVO should not apply them. An admin may override this safe default with force (which would be mapped to oc adm upgrade --force). When this happens, we should communicate clearly which updates were verified and which were not. The CVO, unless configured otherwise, assumes everything is unverified. The payload retriever uses the verifier to check the incoming image; both by-tag images and failed verification are reported to the user. If the admin then sets allowUnverifiedImages, the sync loop continues. Periodically, in the background, recheck any unverified reconciling payload against the verifier, in case the failure was a temporary state.
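Building on the Verifier sketch above, the retrieval decision might look roughly like this; the force parameter stands in for the admin override, and the field and function names are hypothetical:

```go
package verify

import (
	"context"
	"fmt"
)

// retrievePayload sketches the decision above: unverified images are
// rejected unless the admin explicitly forces the update, and the caller
// records whether verification succeeded.
func retrievePayload(ctx context.Context, v Verifier, digest string, force bool) (verified bool, err error) {
	if verr := v.Verify(ctx, digest); verr != nil {
		if !force {
			// Safe default: an unverified payload is never applied.
			return false, fmt.Errorf("release image %s is not trusted: %v", digest, verr)
		}
		// Admin override (mapped to `oc adm upgrade --force`): continue,
		// but surface verified=false so the state is visible to the admin.
		return false, nil
	}
	return true, nil
}
```

In this sketch the caller would record the returned verified value in status, so admins can see which updates were applied without verification.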
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abhinavdahiya, smarterclayton. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
@smarterclayton: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Known flake, force merging. |
Adding a client-side guard like the server-side guard [1] from openshift/cluster-version-operator@55e3cb450f (verifier: Add public key verification of release image digests, 2019-04-19, openshift/cluster-version-operator#170). This isn't that big a deal, because the server-side guard exists, but it does save users the effort of pushing a by-tag pullspec into ClusterVersion and then having to circle back around to notice that the cluster-version operator is complaining about the verification failure. The warning on --force (instead of a hard failure) is because Clayton considers blocking this flow completely to be a breaking API change, and also considers that failing those folks early is more invasive than letting them continue to slip through here and have some subset of them potentially blow up later if:

1. You ask to update to a by-tag pullspec.
2. Cluster updates.
3. Someone clobbers the tag you used in the registry to point it at a different release.
4. Cluster continues on, blissfully unaware.
5. CVO gets rescheduled for whatever reason.
6. Cluster pulls a fresh registry image for the new pod, but it's by-tag, so you get the new content.
7. CVO thinks it's still in reconciling mode, because the pullspec hasn't changed.
8. World explodes as the new manifests get applied in a parallel, randomized order.

Or maybe the new CVO defaults to starting in install mode or some such. But still, random release rollouts triggered by CVO pod restarts are more excitement than I'd wish on anyone, even folks who use --force. It would also be acceptable to have oc attempt to resolve the by-tag pullspec into a by-digest pullspec, but there's no guarantee that the host running 'oc' has access to the same registry that the in-cluster CRI-O will be pulling from, so getting consistent client-side resolution would be tricky.

[1]: https://github.com/openshift/cluster-version-operator/blame/89cb270523675e27a2f54918431170946636f5d5/pkg/verify/verify.go#L203-L204
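A client-side guard of the sort described above can be as small as checking whether the requested pullspec pins a digest. Here is a minimal, self-contained sketch (not oc's actual implementation; the pullspecs are placeholders):

```go
package main

import (
	"fmt"
	"strings"
)

// pinsDigest reports whether a pullspec pins the release by digest
// ("...@sha256:...") rather than by tag, so the referenced content
// cannot silently change underneath the cluster later on.
func pinsDigest(pullspec string) bool {
	return strings.Contains(pullspec, "@sha256:")
}

func main() {
	for _, spec := range []string{
		"registry.example.com/ocp/release:4.1.0",             // by-tag placeholder
		"registry.example.com/ocp/release@sha256:0123abcd",   // by-digest placeholder
	} {
		if !pinsDigest(spec) {
			// In the oc flow described above this would be a warning
			// unless --force is set.
			fmt.Printf("warning: %s is by-tag and cannot be verified\n", spec)
			continue
		}
		fmt.Printf("%s pins a digest\n", spec)
	}
}
```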
Commit f8eff25 (PR openshift#170) missed updating the doc string. Signed-off-by: Lalatendu Mohanty <[email protected]>
These docs have not been adjusted since the properties landed in ab4ff93 (Update ClusterVersion to have a 'force' update flag and track verified, 2019-04-22, openshift#293). The initial cluster-version operator implementation only used them for signature checks, openshift/cluster-version-operator@f8eff25191 (sync: Report whether updates are verified and allow admin override, 2019-04-20, openshift/cluster-version-operator#170). At that time, RetrievePayload would always attempt to verify the proposed release image, including both signature validation and "does this look like a real release image?" checks. If that verification succeeded, it would set 'verified: true'. If that verification failed, it would look at 'force': if 'force' was true, it would set 'verified: false' and proceed with the update, while if 'force' was false, it would reject the update.

Despite the API docs, availableUpdates never came into this in the CVO code. The only availableUpdates guards are client-side in 'oc', and the API docs are discussing the ClusterVersion API, not 'oc adm upgrade's --force API. In openshift/oc@0501d04ff1 (upgrade: Separate flags for safety instead of abusing force, 2019-09-27, openshift/oc#109), oc's client-side guards split --allow-explicit-upgrade (for updating to something not in availableUpdates) and --allow-upgrade-with-warnings (for updating despite client-side guards against Progressing and Degraded ClusterVersion conditions) away from --force, but again, that was never part of the ClusterVersion force property's CVO-side handling.

Since then, the CVO grew precondition handlers in openshift/cluster-version-operator@aceb5bc66f (pkg: add precondition handlers and perform these checks before accepting Update, 2019-08-26, openshift/cluster-version-operator#243). syncOnce ran all the precondition checks, which at that point was just the Upgradeable ClusterOperator and ClusterVersion condition check. If that check failed but 'force' was true, it would proceed with the update, while if 'force' was false, it would reject the update. In neither case would 'verified' be altered; it continued to only track the signature and "does this look like a real release image?" checks. The idea at the time was that these would fall under the "normal protections" phrasing (overridden by force) and not the "other forms of consistency checking" phrasing (whatever those were supposed to be).
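For comparison with the signature path, a sketch of the precondition handling described above, with hypothetical names (not the CVO's actual handler code):

```go
package verify

import (
	"context"
	"fmt"
)

// Precondition is any pre-update check, e.g. the Upgradeable condition check.
type Precondition func(ctx context.Context) error

// runPreconditions sketches the behavior described above: a failing
// precondition blocks the update unless force is set, and in neither case
// is the separate `verified` field touched.
func runPreconditions(ctx context.Context, checks []Precondition, force bool) error {
	for _, check := range checks {
		if err := check(ctx); err != nil {
			if force {
				// force overrides these "normal protections"; the real
				// operator would log and continue.
				continue
			}
			return fmt.Errorf("precondition failed: %v", err)
		}
	}
	return nil
}
```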
Implement:
- allowUnverifiedImages

In CI, the payload would have a signing key's public key mixed in to the payload (OKD, OCP CI). In OCP, we would mix in an official signing public key and the location of one or more stores to read from.

Questions identified during prototyping:
- oc adm release info --verify, which it should (prevents the payload from being unreachable or inconsistent). However, we'd have to wire through the pull secret into the container that runs oc.
- --force makes me feel better, especially when we do automated updates. I would rather have all of our testing using a signature rather than only what end users see.

TODO:
oc adm release info --verify, which it should (prevents the payload from being unreachable or inconsistent). However, we'd have to wire through the pull secret into the container that runs oc.--forcemakes me feel better, especially when we do automated updates. I would rather have all of our testing using a signature rather than only what end users see.TODO: