OCPBUGS-1694: pkg/cvo/availableupdates: Clear TLS config for *.cluster.local Cincinnati #927
Sometimes folks point their ClusterVersion spec.upstream at a local
OpenShift Update Service running in the same cluster.
a. Our docs recommend using the UpdateService status.policyEngineURI
as the ClusterVersion spec.upstream [1].
b. The cluster-version operator loads Proxy information, as described
in [2]. Before this commit, it assumed that the set of trusted CAs
in the trusted-ca-bundle ConfigMap was the complete trust set, both
for requests through the proxy and requests to no-proxy endpoints.
c. The network operator's Proxy controller combined any user-requested
trustedCA content with the system default trust bundle [3], but
this did not include the current Ingress wildcard certificate CA.
d. From our docs [4,5]:
The Ingress Operator generates a default certificate for an
Ingress Controller to serve as a placeholder until you configure
a custom default certificate. Do not use Operator-generated
default certificates in production clusters.
When cluster admins:
* Neglect the docs-recommended trusted Ingress certificate for (d),
e.g. because they are testing in a non-production channel, and
* Use the docs-recommended status.policyEngineURI for (a),
neither (c) nor (d) would recover for them, and the Cincinnati request
would fail on missing TLS/X.509 trust [6].
It seems like this would be a generic issue that could be sorted at
the network operator's (c), for all Proxy-consuming resources that
might want to reach back into the cluster via the Ingress router. But
as a cheap hack until something like that happens, users can use:
http://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph
to bypass TLS. Or, if they'd rather use encrypted communication, this
commit allows them to use:
https://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph
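The change itself is tiny. A minimal sketch of where it lands (the surrounding function and parameter names here are illustrative scaffolding; only the hostname check is from this commit):

// Sketch only: shows how the commit's check slots into building the
// Cincinnati transport. Scaffolding names are illustrative.
package example

import (
	"crypto/tls"
	"net/http"
	"net/url"
	"strings"
)

// buildCincinnatiTransport uses the Proxy-derived TLS trust for external
// upstreams, but clears it for in-cluster *.cluster.local upstreams so
// Go falls back to the default system trust stores.
func buildCincinnatiTransport(upstreamURI *url.URL, tlsConfig *tls.Config) *http.Transport {
	transport := &http.Transport{TLSClientConfig: tlsConfig}
	// These two lines are the commit's actual change.
	if strings.HasSuffix(upstreamURI.Hostname(), ".cluster.local") {
		transport.TLSClientConfig = nil
	}
	return transport
}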
This commit breaks a few layers of abstraction:
* Hard-coding .cluster.local as a known no-proxy path, although that's
formally documented in [3].
* Neglecting similar handling for release signatures, because
  getTransport -> HTTPClient -> loadConfigMapVerifierDataFromUpdate
  doesn't currently allow for "but if the target URI is noProxy, skip
  the TLS customization, and fall back to default trust stores" (a
  hypothetical sketch of such a check follows this list).
* Requiring consumers to reach around status.policyEngineURI and talk
directly to the underlying local Service.
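For illustration, one hypothetical shape for the missing noProxy-aware check mentioned above, using golang.org/x/net/http/httpproxy to evaluate the cluster Proxy's noProxy list (the function name and wiring are invented; nothing like this exists in getTransport today):

package example

import (
	"crypto/tls"
	"net/url"

	"golang.org/x/net/http/httpproxy"
)

// tlsConfigForTarget is a hypothetical noProxy-aware trust selector: it
// returns the customized TLS config for proxied targets, and nil
// (meaning "use the default trust stores") when the target matches the
// cluster Proxy's noProxy list, or when no proxy is configured at all.
func tlsConfigForTarget(target *url.URL, httpsProxy, noProxy string, custom *tls.Config) (*tls.Config, error) {
	cfg := &httpproxy.Config{HTTPSProxy: httpsProxy, NoProxy: noProxy}
	proxyURL, err := cfg.ProxyFunc()(target)
	if err != nil {
		return nil, err
	}
	if proxyURL == nil {
		return nil, nil // noProxy match: fall back to default trust stores
	}
	return custom, nil
}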
But it's only a few lines, even if it's incomplete, awkward, and
brittle. Folks concerned about any of those limitations can advocate
for changes to the network operator's (c) so there's a generic
decision around how to handle this kind of problem. Or they can
explain why the network operator's current handling is appropriate,
and only the cluster-version operator needs Ingress CA injection (and
if so, whether that should happen only for Cincinnati requests, or
also for signature requests).
[1]: https://docs.openshift.com/container-platform/4.12/updating/updating-restricted-network-cluster/restricted-network-update-osus.html#update-service-create-service-cli_updating-restricted-network-cluster-osus
[2]: https://github.com/openshift/enhancements/blob/6b3209fa18ab3161429743550eed36391efc785f/enhancements/proxy/global-cluster-egress-proxy.md
[3]: https://github.com/openshift/api/blob/1b2161d23365fb5918167b2ba73e90ff80ca1805/config/v1/types_proxy.go#L50-L58
[4]: https://docs.openshift.com/container-platform/4.11/security/certificate_types_descriptions/ingress-certificates.html#location
[5]: https://github.com/openshift/openshift-docs/blame/53aa6335eb28cdc9ac0888b05002b374342f7b1e/security/certificate_types_descriptions/ingress-certificates.adoc#L24
[6]: https://issues.redhat.com/browse/OCPBUGS-1694
@wking: Jira Issue OCPBUGS-1694 is in a security level that is not in the allowed security levels for this repo.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: wking. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/assign @LalatenduMohanty

/hold Adding a hold because I want to review it before it is merged

Code is clear but I need to think a bit about the impact ;)

/cc

/hold I want to double check a few things.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale

/remove-lifecycle stale

lifecycle frozen

/lifecycle frozen
if strings.HasSuffix(upstreamURI.Hostname(), ".cluster.local") {
	transport.TLSClientConfig = nil
}
Instead of this "hack", we could use the default-ingress-cert ConfigMap from the openshift-config-managed namespace. For more information, here's the enhancement describing the ConfigMap (for even more curious folks, the Ingress certificates documentation contains some general information and workflow).
From the enhancement:
The ingress operator publishes the default certificate of the default IngressController in a ConfigMap for other operators to consume. This ConfigMap is named default-ingress-cert and exists in the openshift-config-managed namespace. The intended consumers are other operators that need to incorporate the default certificate into their trust bundles in order to connect to Route resources.
Regarding a potential fix to our bug:
The standard way to validate a certificate is to verify that the certificate is signed by a trusted CA certificate. Consumers therefore may expect the default-ingress-cert ConfigMap to include the CA certificate that signed the default certificate rather than the default certificate itself.
For Go-based clients, this is not a problem as the Go TLS implementation has looser certificate validation that can be satisfied by configuring the certificate itself in the trusted certificates pool. As the ConfigMap is not intended to be used outside of OpenShift's own operators, which are Go-based, publishing the certificate itself should not pose a problem. Furthermore, the default-ingress-cert ConfigMap is an internal API, and to the extent that we document it at all, we should document that it has the default certificate, not the signing CA certificate.
So maybe we can configure the func (optr *Operator) getTLSConfig() (*tls.Config, error) method to include the default ingress CA bundle in the root CAs.
Something like this could be added to the method (not tested):
cm, err = optr.cmConfigManagedLister.Get("default-ingress-cert")
if apierrors.IsNotFound(err) {
	return nil, nil
}
if err != nil {
	return nil, err
}
if cm.Data["ca-bundle.crt"] != "" {
	if ok := certPool.AppendCertsFromPEM([]byte(cm.Data["ca-bundle.crt"])); !ok {
		return nil, fmt.Errorf("unable to add ca-bundle.crt certificates")
	}
} else {
	return nil, nil
}
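For the curious, a self-contained sketch of the same idea (the helper name is hypothetical, and it seeds the pool from the system roots, where the real getTLSConfig would use its trusted-CA-bundle pool):

package example

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// withDefaultIngressCA builds a TLS config whose RootCAs are the system
// trust stores plus the PEM bundle from the default-ingress-cert
// ConfigMap's ca-bundle.crt key (passed in here as ingressPEM).
func withDefaultIngressCA(ingressPEM []byte) (*tls.Config, error) {
	pool, err := x509.SystemCertPool()
	if err != nil {
		return nil, err
	}
	if len(ingressPEM) > 0 {
		if ok := pool.AppendCertsFromPEM(ingressPEM); !ok {
			return nil, fmt.Errorf("unable to add ca-bundle.crt certificates")
		}
	}
	return &tls.Config{RootCAs: pool}, nil
}

Testing the idea out using curl seems to work.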
$ # setup env variables
$ NAMESPACE=openshift-update-service
$ NAME=sample
$ POLICY_ENGINE_GRAPH_URI="$(oc -n "${NAMESPACE}" get -o jsonpath='{.status.policyEngineURI}/api/upgrades_info/v1/graph{"\n"}' updateservice "${NAME}")"
$
$ # extract the default ingress CA bundle to the local machine
$ oc extract -n openshift-config-managed configmap/default-ingress-cert
$
$ # copy the default ingress's CA bundle to the CVO container
$ oc cp ca-bundle.crt "openshift-cluster-version/$(oc get po -n openshift-cluster-version -o=name | sed "s/pod\///g"):/tmp/default-ingress-cert-ca-bundle.crt"
$
$ # running curl without specifying a CA bundle - self signed certificate in certificate chain error
$ oc exec -n openshift-cluster-version deployments/cluster-version-operator -- curl "$POLICY_ENGINE_GRAPH_URI?channel=stable-4.10"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
command terminated with exit code 60
$
$ # running curl using the default ingress CA bundle - success
$ oc exec -n openshift-cluster-version deployments/cluster-version-operator -- curl --cacert /tmp/default-ingress-cert-ca-bundle.crt "$POLICY_ENGINE_GRAPH_URI?channel=stable-4.10"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
{"version":1,"nodes":[{"version":"4.9.28","payload":"quay.io/openshift-release-dev/ocp-release@sha256:4084d94969b186e20189649b5affba7da59f7d1943e4e5bc7ef78b981eafb7a8","metadata":{"io.openshift.upgrades.graph.release.channels":"candidate-4.10,fast-4.10,stable-4.10,candidate-4.9,fast-4.9,stable-4.9","io.openshift.upgrades.graph.release.manifestref":"sha256:4084d94969b186e20189649b5affba7da59f7d1943e4e5bc7ef78b981eafb7a8","url":"https://access.redhat.com/errata/RHBA-2022:1245"}},{"version":"4.10.20","payload":"quay.io/openshift-release-dev/ocp-release@sha256:b89ada9261a1b257012469e90d7d4839d0d2f99654f5ce76394fa3f06522b600","metadata":{"io.openshift.upgrades.graph.release.channels":"candidate-4.10,eus-4.10,fast-4.10,stable-4.10,candidate-4.11,fast-4.11,stable-4.11,eus-4.12","io.openshift.upgrades.graph.release.manifestref":"sha256:b89ada9261a1b257012469e90d7d4839d0d2f99654f5ce76394fa3f06522b600","url":"https://access.redhat.com/errata/RHBA-2022:5172"}},{"version":"4.9.21","payload":"quay.io/openshift-release-dev/ocp-release@sha256:fd96300600f9585e5847f5855ca14e2b3cafbce12aefe3b3f52c5da10c4476eb","metadata":{"io.openshift.upgrades.graph.previous.remove_regex":"4\\.8\\..*","io.openshift.upgrades.graph.release.channels":"candidate-4.10,fast-4.10,stable-4.10,candidate-4.9...

Worth pointing out from the commit message:
It seems like this would be a generic issue that could be sorted at the network operator's (c), for all Proxy-consuming resources who might want to reach back into the cluster via the Ingress router. But as a cheap hack until something like that happens, users can use:
http://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph

to bypass TLS. Or, if they'd rather use encrypted communication, this commit allows them to use:
https://${POLICY_ENGINE_SERVICE_NAME}.${NAMESPACE}.svc.cluster.local/api/upgrades_info/graph
We have lost the ability to resolve service DNS names in #920. Unfortunately, this probably wouldn't work now 😢.
$ oc exec -n openshift-cluster-version deployments/cluster-version-operator -- curl "https://sample-policy-engine.openshift-update-service.svc.cluster.local:80/api/upgrades_info/graph?channel=stable-4.10"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (6) Could not resolve host: sample-policy-engine.openshift-update-service.svc.cluster.local
command terminated with exit code 6
@wking: all tests passed! Full PR test history. Your PR dashboard.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale

Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten

Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close

@openshift-bot: Closed this PR.