
Conversation

@wking
Member

@wking wking commented Apr 17, 2023

Sometimes folks point their ClusterVersion spec.upstream at a local OpenShift Update Service running in the same cluster.

a. Our docs recommend using the UpdateService status.policyEngineURI in the ClusterVersion spec.upstream.

b. The cluster-version operator loads Proxy information, as described here. Before this commit, it assumed that the set of trusted CAs in the trusted-ca-bundle ConfigMap was the complete trust set, both for requests through the proxy and requests to no-proxy endpoints.

c. The network operator's Proxy controller combined any user-requested trustedCA content with the system default trust bundle, but this did not include the current Ingress wildcard certificate CA.

d. From our docs:

The Ingress Operator generates a default certificate for an Ingress Controller to serve as a placeholder until you configure a custom default certificate. Do not use Operator-generated default certificates in production clusters.

When cluster admins:

  • Neglect the docs-recommended trusted Ingress certificate for (d), e.g. because they are testing in a non-production channel, and
  • Use the docs-recommended status.policyEngineURI for (a),

neither (c) nor (d) would recover for them, and the Cincinnati request would fail on missing TLS/X.509 trust.
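To make the trust assumption in (b) concrete, here is a generic Go illustration (not the cluster-version operator's actual code; the newClient name is made up): once RootCAs is populated only from a ConfigMap-style PEM bundle, nothing outside that bundle is trusted, so the operator-generated Ingress certificate is rejected unless its CA was explicitly added.

  package main

  import (
  	"crypto/tls"
  	"crypto/x509"
  	"net/http"
  )

  // newClient mirrors the pre-commit assumption: a configured PEM bundle is
  // treated as the complete trust set, excluding both the system roots and
  // the Ingress wildcard CA.
  func newClient(pemBundle []byte) *http.Client {
  	tlsConfig := &tls.Config{}
  	if len(pemBundle) > 0 {
  		pool := x509.NewCertPool() // empty pool; system roots are not included
  		pool.AppendCertsFromPEM(pemBundle)
  		tlsConfig.RootCAs = pool
  	}
  	return &http.Client{Transport: &http.Transport{TLSClientConfig: tlsConfig}}
  }

  func main() {
  	_ = newClient(nil) // with no bundle, Go falls back to the system trust store
  }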

It seems like this would be a generic issue that could be sorted at the network operator's (c), for all Proxy-consuming resources that might want to reach back into the cluster via the Ingress router. But as a cheap hack until something like that happens, users can use:

http://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph

to bypass TLS. Or, if they'd rather use encrypted communication, this commit allows them to use:

https://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph
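For example, assuming an UpdateService whose policy-engine Service is named sample-policy-engine in an openshift-update-service namespace (both names are hypothetical; substitute your own), pointing ClusterVersion at the in-cluster Service could look like:

  # hypothetical names; adjust to your UpdateService's policy-engine Service and namespace
  POLICY_ENGINE_SERVICE_NAME=sample-policy-engine
  NAMESPACE=openshift-update-service
  oc patch clusterversion version --type merge --patch \
    "{\"spec\":{\"upstream\":\"https://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph\"}}"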

This commit breaks a few layers of abstraction:

  • Hard-coding .cluster.local as a known no-proxy path, although that's formally documented here.
  • Neglecting similar handling for release signatures, because getTransport -> HTTPClient -> loadConfigMapVerifierDataFromUpdate doesn't currently allow for "but if the target URI is noProxy, skip the TLS customization, and fall back to default trust stores".
  • Requiring consumers to reach around status.policyEngineURI and talk directly to the underlying local Service.

But it's only a few lines, even if it's incomplete, awkward, and brittle. Folks concerned about any of those limitations can advocate for changes to the network operator's (c) so there's a generic decision around how to handle this kind of problem. Or they can explain why the network operator's current handling is appropriate, and only the cluster-version operator needs Ingress CA injection (and if so, whether that should happen only for Cincinnati requests, or also for signature requests).
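For reference, the substance of the change is a hostname-suffix check before the Cincinnati request is made; a minimal, self-contained sketch (the cincinnatiTransport name and the surrounding setup are illustrative, and only the suffix check reflects the actual diff):

  package main

  import (
  	"crypto/tls"
  	"net/http"
  	"net/url"
  	"strings"
  )

  // cincinnatiTransport drops the Proxy-derived TLS customization when the
  // upstream is an in-cluster *.cluster.local Service, falling back to the
  // default trust stores for that request.
  func cincinnatiTransport(upstream string, tlsConfig *tls.Config) (*http.Transport, error) {
  	upstreamURI, err := url.Parse(upstream)
  	if err != nil {
  		return nil, err
  	}
  	transport := &http.Transport{TLSClientConfig: tlsConfig}
  	if strings.HasSuffix(upstreamURI.Hostname(), ".cluster.local") {
  		transport.TLSClientConfig = nil
  	}
  	return transport, nil
  }

  func main() {
  	_, _ = cincinnatiTransport("https://sample-policy-engine.openshift-update-service.svc.cluster.local/api/upgrades_info/graph", &tls.Config{})
  }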

…nati

Sometimes folks point their ClusterVersion spec.upstream at a local
OpenShift Update Service running in the same cluster.

a. Our docs recommend using the UpdateService status.policyEngineURI
   as the ClusterVersion spec.upstream [1].

b. The cluster-version operator loads Proxy information, as described
   in [2].  Before this commit, it assumed that the set of trusted CAs
   in the trusted-ca-bundle ConfigMap was the complete trust set, both
   for requests through the proxy and requests to no-proxy endpoints.

c. The network operator's Proxy controller combined any user-requested
   trustedCA content with the system default trust bundle [3], but
   this did not include the current Ingress wildcard certificate CA.

d. From our docs [4,5]:

     The Ingress Operator generates a default certificate for an
     Ingress Controller to serve as a placeholder until you configure
     a custom default certificate. Do not use Operator-generated
     default certificates in production clusters.

When cluster admins:

* Neglect the docs-recommended trusted Ingress certificate for (d),
  e.g. because they are testing in a non-production channel, and
* Use the docs-recommended status.policyEngineURI for (a),

neither (c) nor (d) would recover for them, and the Cincinnati request
would fail on missing TLS/X.509 trust [6].

It seems like this would be a generic issue that could be sorted at
the network operator's (c), for all Proxy-consuming resources that
might want to reach back into the cluster via the Ingress router.  But
as a cheap hack until something like that happens, users can use:

  http://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph

to bypass TLS.  Or, if they'd rather use encrypted communication, this
commit allows them to use:

  https://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph

This commit breaks a few layers of abstraction:

* Hard-coding .cluster.local as a known no-proxy path, although that's
  formally documented in [3].
* Neglecting similar handling for release signatures, because
  getTransport -> HTTPClient -> loadConfigMapVerifierDataFromUpdate
  doesn't currently allow for "but if the target URI is noProxy, skip
  the TLS customization, and fall back to default trust stores".
* Requiring consumers to reach around status.policyEngineURI and talk
  directly to the underlying local Service.

But it's only a few lines, even if it's incomplete, awkward, and
brittle.  Folks concerned about any of those limitations can advocate
for changes to the network operator's (c) so there's a generic
decision around how to handle this kind of problem.  Or they can
explain why the network operator's current handling is appropriate,
and only the cluster-version operator needs Ingress CA injection (and
if so, whether that should happen only for Cincinnati requests, or
also for signature requests).

[1]: https://docs.openshift.com/container-platform/4.12/updating/updating-restricted-network-cluster/restricted-network-update-osus.html#update-service-create-service-cli_updating-restricted-network-cluster-osus
[2]: https://github.com/openshift/enhancements/blob/6b3209fa18ab3161429743550eed36391efc785f/enhancements/proxy/global-cluster-egress-proxy.md
[3]: https://github.com/openshift/api/blob/1b2161d23365fb5918167b2ba73e90ff80ca1805/config/v1/types_proxy.go#L50-L58
[4]: https://docs.openshift.com/container-platform/4.11/security/certificate_types_descriptions/ingress-certificates.html#location
[5]: https://github.com/openshift/openshift-docs/blame/53aa6335eb28cdc9ac0888b05002b374342f7b1e/security/certificate_types_descriptions/ingress-certificates.adoc#L24
[6]: https://issues.redhat.com/browse/OCPBUGS-1694
@openshift-ci-robot
Contributor

@wking: Jira Issue OCPBUGS-1694 is in a security level that is not in the allowed security levels for this repo.
Allowed security levels for this repo are:

  • Red Hat Employee
  • default

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Contributor

openshift-ci bot commented Apr 17, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 17, 2023
@LalatenduMohanty
Member

/assign @LalatenduMohanty

@LalatenduMohanty
Member

/hold Adding a hold because I want to review it before it is merged

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 18, 2023
@petr-muller
Member

Code is clear but I need to think a bit about the impact ;)

@DavidHurta
Contributor

/cc

@openshift-ci openshift-ci bot requested a review from DavidHurta April 19, 2023 17:41
@DavidHurta
Contributor

/hold I want to double check a few things.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 31, 2023
@LalatenduMohanty
Member

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 1, 2023
@LalatenduMohanty
Member

lifecycle frozen

@petr-muller
Member

/lifecycle frozen

@openshift-ci
Contributor

openshift-ci bot commented Aug 1, 2023

@petr-muller: The lifecycle/frozen label cannot be applied to Pull Requests.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Comment on lines +237 to +240
if strings.HasSuffix(upstreamURI.Hostname(), ".cluster.local") {
transport.TLSClientConfig = nil
}

Contributor


Instead of this "hack", we could use the default-ingress-cert ConfigMap from the openshift-config-managed namespace. For more information, here's the enhancement describing the ConfigMap (for even more curious folks, the Ingress certificates documentation contains some general information and workflow).

From the enhancement:

The ingress operator publishes the default certificate of the default IngressController in a ConfigMap for other operators to consume. This ConfigMap is named default-ingress-cert and exists in the openshift-config-managed namespace. The intended consumers are other operators that need to incorporate the default certificate into their trust bundles in order to connect to Route resources.

Regarding a potential fix to our bug:

The standard way to validate a certificate is to verify that the certificate is signed by a trusted CA certificate. Consumers therefore may expect the default-ingress-cert ConfigMap to include the CA certificate that signed the default certificate rather than the default certificate itself.

For Go-based clients, this is not a problem as the Go TLS implementation has looser certificate validation that can be satisfied by configuring the certificate itself in the trusted certificates pool. As the ConfigMap is not intended to be used outside of OpenShift's own operators, which are Go-based, publishing the certificate itself should not pose a problem. Furthermore, the default-ingress-cert ConfigMap is an internal API, and to the extent that we document it at all, we should document that it has the default certificate, not the signing CA certificate.

So maybe we can configure the func (optr *Operator) getTLSConfig() (*tls.Config, error) method to include the default ingress CA bundle in the root CAs.

Something like this could be added to the method (not tested):

	cm, err = optr.cmConfigManagedLister.Get("default-ingress-cert")
	if apierrors.IsNotFound(err) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}

	if cm.Data["ca-bundle.crt"] != "" {
		if ok := certPool.AppendCertsFromPEM([]byte(cm.Data["ca-bundle.crt"])); !ok {
			return nil, fmt.Errorf("unable to add ca-bundle.crt certificates")
		}
	} else {
		return nil, nil
	}

Testing the idea out using curl seems to work.

$ # setup env variables
$ NAMESPACE=openshift-update-service
$ NAME=sample
$ POLICY_ENGINE_GRAPH_URI="$(oc -n "${NAMESPACE}" get -o jsonpath='{.status.policyEngineURI}/api/upgrades_info/v1/graph{"\n"}' updateservice "${NAME}")"
$
$ # extract the default ingress CA bundle to the local machine
$ oc extract -n openshift-config-managed configmap/default-ingress-cert 
$
$ # copy the default ingress's CA bundle to the CVO container
$ oc cp ca-bundle.crt "openshift-cluster-version/$(oc get po -n openshift-cluster-version -o=name | sed "s/pod\///g"):/tmp/default-ingress-cert-ca-bundle.crt"
$
$ # running curl without specifying a CA bundle - self signed certificate in certificate chain error
$ oc exec -n openshift-cluster-version deployments/cluster-version-operator -- curl   "$POLICY_ENGINE_GRAPH_URI?channel=stable-4.10" 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
command terminated with exit code 60
$ 
$ # running curl using the default ingress CA bundle - success
$ oc exec -n openshift-cluster-version deployments/cluster-version-operator -- curl --cacert /tmp/default-ingress-cert-ca-bundle.crt  "$POLICY_ENGINE_GRAPH_URI?channel=stable-4.10" 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0{"version":1,"nodes":[{"version":"4.9.28","payload":"quay.io/openshift-release-dev/ocp-release@sha256:4084d94969b186e20189649b5affba7da59f7d1943e4e5bc7ef78b981eafb7a8","metadata":{"io.openshift.upgrades.graph.release.channels":"candidate-4.10,fast-4.10,stable-4.10,candidate-4.9,fast-4.9,stable-4.9","io.openshift.upgrades.graph.release.manifestref":"sha256:4084d94969b186e20189649b5affba7da59f7d1943e4e5bc7ef78b981eafb7a8","url":"https://access.redhat.com/errata/RHBA-2022:1245"}},{"version":"4.10.20","payload":"quay.io/openshift-release-dev/ocp-release@sha256:b89ada9261a1b257012469e90d7d4839d0d2f99654f5ce76394fa3f06522b600","metadata":{"io.openshift.upgrades.graph.release.channels":"candidate-4.10,eus-4.10,fast-4.10,stable-4.10,candidate-4.11,fast-4.11,stable-4.11,eus-4.12","io.openshift.upgrades.graph.release.manifestref":"sha256:b89ada9261a1b257012469e90d7d4839d0d2f99654f5ce76394fa3f06522b600","url":"https://access.redhat.com/errata/RHBA-2022:5172"}},{"version":"4.9.21","payload":"quay.io/openshift-release-dev/ocp-release@sha256:fd96300600f9585e5847f5855ca14e2b3cafbce12aefe3b3f52c5da10c4476eb","metadata":{"io.openshift.upgrades.graph.previous.remove_regex":"4\\.8\\..*","io.openshift.upgrades.graph.release.channels":"candidate-4.10,fast-4.10,stable-4.10,candidate-4.9...

Worth pointing out from the commit message:

It seems like this would be a generic issue that could be sorted at the network operator's (c), for all Proxy-consuming resources who might want to reach back into the cluster via the Ingress router. But as a cheap hack until something like that happens, users can use:

http://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph

to bypass TLS. Or, if they'd rather use encrypted communication, this commit allows them to use:

 https://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph

We have lost the ability to resolve service DNS names in #920. Unfortunately, this probably wouldn't work now 😢.

$ oc exec -n openshift-cluster-version deployments/cluster-version-operator -- curl   "https://sample-policy-engine.openshift-update-service.svc.cluster.local:80/api/upgrades_info/graph?channel=stable-4.10" 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: sample-policy-engine.openshift-update-service.svc.cluster.local
command terminated with exit code 6

@openshift-ci
Contributor

openshift-ci bot commented Sep 26, 2023

@wking: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2023
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 25, 2024
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Feb 25, 2024
@openshift-ci
Contributor

openshift-ci bot commented Feb 25, 2024

@openshift-bot: Closed this PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
