Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented May 15, 2020

Attempting to address the following use-cases:

As a narrow, CVO-upstream-only alternative to the generic #115 framework.

@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label May 15, 2020
@openshift-ci-robot
Copy link

@wking: This pull request references Bugzilla bug 1773419, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1773419: enhancements/update/cluster-version-operator-x509-upstream-trust: Propose a new enhancement

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label May 15, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
To complete the pull request process, please assign jwforres
You can assign the PR to them by writing /assign @jwforres in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking wking force-pushed the cluster-version-operator-x509-trust branch 2 times, most recently from 2a4a331 to 001a116 Compare May 15, 2020 06:11

# Cluster-version Operator X.509 Upstream Trust

This enhancement provides a mechanism for providing [the cluster-version operator][cluster-version-operator] (CVO) with an alternative trust-store and TLS configuration to use when connecting to [the update recommendation service][update-service] that overrides both the CVO's defaults and any [proxy configuration](../proxy/global-cluster-egress-proxy.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would it override the proxy configuration? if the proxy config is applicable (the host isn't in no_proxy) it should be used

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the proxy config is applicable (the host isn't in no_proxy) it should be used

This endpoint-level config seems more specific than the cluster-scoped Proxy config. If the user wants the Proxy config to apply, they can unset this config, right? But if the Proxy config took precedence, there would be no chance for the user to override for this specific endpoint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they want the endpoint specific CAs to take precedence, they should be listed in "noproxy". If the connection is going through the proxy, then the proxy CAs need to be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the connection is going through the proxy, then the proxy CAs need to be used.

How do we determine this, in a world where transparent, network-defined proxies may be set? Say you are in a cluster where upstreamTLS.trustedCA is set in ClusterVersion and Proxy has trustedCA but neither httpProxy nor httpsProxy. Which CA would you use connecting to an HTTPS upstream? I'd expect upstreamTLS.trustedCA, because why would the admin have set that unless they expected us to use it? Are you saying that, in the absence of httpProxy (for HTTP connections) or httpsProxy (for HTTPS connections), components should still be checking for their target in noProxy and say "I don't see my target in noProxy, so I guess there is some transparent network routing taking me to a proxy, so I'll attach this additional trusted CA bundle from the Proxy's trustedCA..."?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the admin configured a proxy ca for the cluster, you use it on the assumption that they know what they are doing and they put it there for a reason.

yes that means it becomes a valid CA for all connections, that is the power/danger of the proxy CA. (As consumed by most components today, those CAs will be considered valid even for hosts in no_proxy because they just add the proxy CAs to their bundle, so whether or not their outbound connection goes through the proxy, the CA is in the transport)

@deads2k am i missing something? do we need to be even more locked down on our use of proxy ca than we are today, such as to ensure that connections to "no proxy" hosts don't utilize the proxy CAs? I'm sure it would be ideal, it also seems extremely difficult.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking Why can't we use this enhancement only when network-defined proxies are not available. Any in case of proxy setups CVO going to an upstream Cincy is not an issue. So if Proxy takes the precedence over the endpoint specific CA it is not a blocker IMO.


This enhancement does not address signature retrieval.
Currently [the configured stores][update-keys] are hard-coded, so the CVO will only talk to an alternative TLS terminator when it is being [proxied](../proxy/global-cluster-egress-proxy.md).
The fact that the current proxy configuration does not clearly address proxies defined at the network-routing level, where `httpsProxy` may be empty but additional trust still needs to be injected is out of scope.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not "clearly" addressed, but i believe the answer is "set a dummy noproxy value, and set the proxy CAs, in the proxy config"

#### Transparent, network-level proxies

[rhbz#1773419][rhbz1773419] discusses issue when CVO upstream requests are not redirected via an `httpsProxy` setting, but are instead routed through a proxy via network-level configuration (e.g. all outgoing TCP traffic is routed through a proxy).
The CVO currently only injects the Proxy config's `trustedCA` data [when `httpsProxy` is non-empty][trustedCA-vs-httpsProxy], so there is currently no mechanism for configuring the CVO's trusted CA store in this situation, short of forcing an explicit value into `httpsProxy`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the CVO ought to consume the proxy CAs as long as they are defined, regardless of whether httpsProxy is set or not. That's a bug in the CVO's proxy config handling behavior. (orthogonal to the use case being solved by this enhancement)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the CVO ought to consume the proxy CAs as long as they are defined, regardless of whether httpsProxy is set or not

I'm in favor of using the Proxy's trustedCA for transparent, network-level proxies. But that has been historically unpopular. If you think we can build a consensus around consuming trustedCA regardless of httpProxy or httpsProxy, I'm happy to cover this case that way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that issue was closed because it was trying to use a field called "AdditionalCAs" in a way that wasn't well defined/aligned w/ the original intention. I don't think it's closure really means "we should never use proxy CAs for network proxies", it just means "the installer needs to make it clear which CAs are for proxies and which CAs are for "additional use".

The concern w/ that PR is that it was taking a field that was previously only consumed for proxy CAs when a proxy config was defined, and now always consuming it, when the field doc did not make that usage clear. It really had nothing to do with how the proxy CAs+config operated, just how the installer turned its inputs into a proxy CA/config.


The network operator supports copying trust bundles between ConfigMaps based on labels on the target ConfigMaps.
But because we are not using the Proxy configuration, the CVO will need to grow its own implementation to retrieve data from the referenced ConfigMap in the `openshift-config` namespace.
Because it is the CVO retrieving and consuming the data, there is no need to copy a ConfigMap over to volume-mount into the CVO pod.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need more details on how you'll do this. you need to copy the CA content over before you initialize your TLS transport that will be talking to the service in question. (not too hard, just want to make sure that's accounted for).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... you need to copy the CA content over before you initialize your TLS transport that will be talking to the service in question...

CVO does this today, using the ConfigMap referenced from Proxy's trustedCA (when httpsProxy is set). We'd just have to twiddle the existing logic slightly to slot the new properties in at higher precedence. Added links to that code to the enhancement with 47f96bb -> c5ba370.

Copy link
Member

@mhrivnak mhrivnak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. This achieves what we are looking for, which is the ability to run cincinnati on a management cluster, and then configure a number of spoke clusters to use it.

// Intermediate profiles are currently supported, and the maximum available MinTLSVersions
// is VersionTLS12.
// +optional
SecurityProfile *TLSSecurityProfile `json:"securityProfile,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would the CVO need a different security profile than the rest of the platform? Our existing value in apiservers controls all ingresses, auth, kube-apiserver, and perhaps kubelets soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would the CVO need a different security profile than the rest of the platform?

Say the local Cincy uses some weak cypher. Wouldn't we want a way to relax the CVO for Cincy connections without impacting its other connections? This seems like exactly the same sort of reasoning behind wanting per-target trust bundles.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we should not consider Cincinnati to CVO communication as other ingress communication in OpenShift.

@wking wking force-pushed the cluster-version-operator-x509-trust branch 5 times, most recently from 57900f5 to 519ea4d Compare May 15, 2020 16:50
…pose a new enhancement

Attempting to address the following use-cases:

* CVO pointed at a local Cincinnati (or other update recommendation
  service) signed by some local CA that the CVO does not trust out of
  the box.

* CVO routed through a TLS-terminating proxy signed by some local CA,
  but without the httpsProxy Proxy entry that would cause the CVO to
  use the Proxy's trustedCA [1].

As a narrow, CVO-upstream-only alternative to the generic openshift#115 framework.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1773419#c1
@wking wking force-pushed the cluster-version-operator-x509-trust branch from 519ea4d to faa4a20 Compare May 15, 2020 16:53
@rthallisey
Copy link

lgtm. Thanks @wking

@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 24, 2020
@openshift-bot
Copy link

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 23, 2020
@openshift-bot
Copy link

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Copy link

@openshift-bot: Closed this PR.

Details

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link

@wking: This pull request references Bugzilla bug 1773419. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.

Details

In response to this:

Bug 1773419: enhancements/update/cluster-version-operator-x509-upstream-trust: Propose a new enhancement

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants