@ahardin-rh ahardin-rh commented Nov 21, 2019

@ahardin-rh ahardin-rh added this to the Future Release milestone Nov 21, 2019
@ahardin-rh ahardin-rh self-assigned this Nov 21, 2019
@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 21, 2019
@openshift-docs-preview-bot

The preview will be available shortly at:

@ahardin-rh ahardin-rh force-pushed the OCP-certificates branch 2 times, most recently from 60c4d11 to ae29cbf Compare December 5, 2019 21:25
Contributor Author

@deads2k Can you please help me build out this section? The customer is asking for the information outlined here.

Contributor Author

@deads2k Can you please provide this information by end of week? The information we need for monitoring certificates includes:

  • The purpose
  • File path
  • Default expiration term
  • How to set custom expiration term
  • How to specify the expiration date of all certificates used in OpenShift when installing OpenShift
  • Which services use it
  • How to update/extend it
  • Will certificates that are about to expire be automatically renewed by the operator?

Contributor

I don't know anything about any monitoring certificates. Perhaps @s-urbaniak ?

There is some amount of client cert monitoring done by the kube api and we have alerts on this: https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/a7ee9d1abe1b1a3670a02ede1135cadb660b9d0c/alerts/kube_apiserver.libsonnet#L125-L148

I don't believe there is any serving cert monitoring. That is typically done via blackbox probes, which the monitoring team does not offer as part of the monitoring framework today (in the same way that teams can self-serve scraping/alerting with the Prometheus Operator CRDs).

Contributor

@brancz how do we document that? Is there a way to pull/show what Prometheus Operator CRDs rules the individual operators are providing?

Contributor Author

@brancz Can you please comment?

Contributor Author

@rphillips @sjenning Can you please review what is here for node certs so far? Thanks!

Contributor Author

@rphillips @sjenning Can you please provide feedback by end of week?

Contributor Author

@sanchezl Can you please review what I have in this section so far?

Contributor Author

@sanchezl Can you please review by end of week?

@ahardin-rh ahardin-rh force-pushed the OCP-certificates branch 2 times, most recently from 64ca725 to 47a8df6 Compare December 18, 2019 21:11
Contributor

@adellape adellape left a comment

A few minor drive-by comments from a cursory read.

Contributor

Should we talk about the CAs and their validity? How long they are valid, how to rotate them, and so on?

@rphillips @sjenning can either of you comment on this?

Contributor Author

@rphillips @sjenning Can you please comment?

Contributor

@ahardin-rh this is correct. We might want to add that once the cluster is installed the Node certificates are auto-rotated.

Contributor

xingxingxia commented Apr 1, 2020

@xingxingxia Can you please review the latest changes? Thank you!

@ahardin-rh what do you mean by "the latest changes"? Do you mean the change for #18254 (comment)? If yes, I reviewed http://file.rdu.redhat.com/~ahardin/12052019/OCP-certificates/authentication/certificate-types-descriptions.html#control-plane-certificates, and it LGTM. If you mean the rest of the PR content, then, as in #18254 (comment), it still needs review by the other subteam QE who are familiar with the areas they own. Update: @ahardin-rh, I already @'ed them in Slack, and they agreed to review as soon as they can.

Regarding 'metric-signer': I can't find it in the expected directory. Please correct me if I have misunderstood:
sh-4.2# cd /etc/ssl/
sh-4.2# ls
certs etcd
sh-4.2# cd etcd/
sh-4.2# ls
ca.crt system:etcd-peer:etcd-0.xxx.qe.gcp.devcluster.openshift.com.crt
metric-ca.crt system:etcd-peer:etcd-0.xx.qe.gcp.devcluster.openshift.com.key
root-ca.crt system:etcd-server:etcd-0.xx.qe.gcp.devcluster.openshift.com.crt
system:etcd-metric:etcd-0.xxx.qe.gcp.devcluster.openshift.com.crt system:etcd-server:etcd-0.xxx.qe.gcp.devcluster.openshift.com.key
system:etcd-metric:etcd-0.xxx.qe.gcp.devcluster.openshift.com.key
sh-4.2#

Contributor Author

@hexfusion Can you please review?

cc @pweil- @deads2k

Is this doc for OCP 4.3 only, or also for 4.4? If it is only for 4.3, it should be OK. In 4.4, there is no /etc/ssl/etcd directory:
sh-4.2# cd /etc/ssl
sh-4.2# ls
certs
sh-4.2# cd certs/
sh-4.2# ls
ca-bundle.crt ca-bundle.trust.crt

sunilcio commented Apr 2, 2020

Node certificates section LGTM for 4.4 , http://file.rdu.redhat.com/~ahardin/12052019/OCP-certificates/authentication/certificate-types-descriptions.html#node-certificates_ocp-certificates

@ahardin-rh I see bug 1800636 is targeting 4.3.z release, I see auto-rotation is implemented from 4.4. Please help clarify.

I think this is a good summary of what @ecordell and @awgreene have described.

If there are no objections to the wording I'm lgtm on this 👍

The Proxy certificates should be provided by the user. It is not clear what "managed by the system" means here.

Contributor Author

@danehans Can you please confirm? I may have gotten details mixed up.

Contributor

@ahardin-rh the RHCOS trust bundle is managed by the system, and the Proxy resource is used to add user-provided certs to the trust bundle. Cluster Network Operator merges the two into a combined bundle, and operators mount the bundle into their trust store.
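
As a rough illustration of the merge described above, the following Python sketch combines a system bundle and a user-provided bundle into one PEM bundle. It is not the Cluster Network Operator's implementation: the helper names `split_pem`, `merge_bundles`, and `fake_cert`, the de-duplication step, and the placeholder certificate bodies are all assumptions made for this example.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem(bundle: str) -> list[str]:
    """Return the individual certificate blocks found in a PEM bundle."""
    return PEM_CERT.findall(bundle)


def merge_bundles(system_bundle: str, user_bundle: str) -> str:
    """Concatenate two bundles, keeping order and skipping exact duplicates."""
    merged: list[str] = []
    for cert in split_pem(system_bundle) + split_pem(user_bundle):
        if cert not in merged:
            merged.append(cert)
    return "\n".join(merged) + "\n" if merged else ""


def fake_cert(body: str) -> str:
    # Placeholder block for the demo only; not a real certificate.
    return f"-----BEGIN CERTIFICATE-----\n{body}\n-----END CERTIFICATE-----"


# Hypothetical inputs: two "system" certs and two "user" certs, one shared.
system = fake_cert("AAAA") + "\n" + fake_cert("BBBB") + "\n"
user = fake_cert("BBBB") + "\n" + fake_cert("CCCC") + "\n"
combined = merge_bundles(system, user)
print(len(split_pem(combined)))
```

Skipping duplicates is a design choice for the sketch only; whether the real merge de-duplicates is not stated in this thread.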

The detailed renewal steps of Proxy cert are actually noted in the above Customization section:

Updating the user-provided trust bundle consists of either:

updating the PEM-encoded certificates in the ConfigMap referenced by trustedCA, or
...

Contributor Author

@danehans Can you please confirm?

lihongan commented Apr 3, 2020

The section Ingress certificates looks good. Thank you @ahardin-rh

@ahardin-rh
Contributor Author

Node certificates section LGTM for 4.4 , http://file.rdu.redhat.com/~ahardin/12052019/OCP-certificates/authentication/certificate-types-descriptions.html#node-certificates_ocp-certificates

@ahardin-rh I see bug 1800636 is targeting 4.3.z release, I see auto-rotation is implemented from 4.4. Please help clarify.

@sunilcio Thank you! It is my understanding that all of the content in this PR is applicable to 4.3.z. For example, I know that service CA auto-rotation is available as of 4.3.5.

cc @pweil-

Contributor

s/openshift-ingress-it /openshift-ingress-operator/

Contributor

@danehans danehans Apr 24, 2020

I'm confused by the rest of the text in this section. This looks like how MCO implements proxy trustedCA. Maybe the text should go into a new section such as modules/machine-certificates.adoc? cc: @runcom

The mechanism operators use for writing the trust bundle consists of:

  • The operator requests trust bundle injection by creating a ConfigMap in the operator's namespace with label config.openshift.io/inject-trusted-cabundle: "true". Here's an example of the Ingress Operator:
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    release.openshift.io/create-only: "true"
  labels:
    config.openshift.io/inject-trusted-cabundle: "true"
  name: trusted-ca
  namespace: openshift-ingress-operator
  • Cluster Network Operator injects the trusted CA bundle into this ConfigMap:
kind: ConfigMap
metadata:
  annotations:
    release.openshift.io/create-only: "true"
  labels:
    config.openshift.io/inject-trusted-cabundle: "true"
  name: trusted-ca
  namespace: openshift-ingress-operator
apiVersion: v1
data:
  ca-bundle.crt: |
    <PEM_ENCODED_TRUSTED_CA_CERTS>

ca-bundle.crt contains either the RHCOS trust bundle or the merged RHCOS/user-provided bundle.

  • If the operator makes egress requests, it will typically mount this ConfigMap to /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem. If the operand makes egress requests, the operator will plumb the contents of ca-bundle.crt into the operand's trust store, typically /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem.
  • The operator watches the ConfigMap for changes and updates the trust bundle accordingly.
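
The steps above can be sketched in Python with plain dictionaries standing in for ConfigMap objects. This is only a simulation under stated assumptions: `wants_injection`, `inject_bundle`, and `bundle_changed` are hypothetical helpers, not functions from any OpenShift component, and a real operator would talk to the Kubernetes API rather than copy dictionaries.

```python
import copy
import hashlib

INJECT_LABEL = "config.openshift.io/inject-trusted-cabundle"
BUNDLE_KEY = "ca-bundle.crt"


def wants_injection(configmap: dict) -> bool:
    """True if the ConfigMap carries the injection label set to "true"."""
    labels = configmap.get("metadata", {}).get("labels", {})
    return labels.get(INJECT_LABEL) == "true"


def inject_bundle(configmap: dict, bundle: str) -> dict:
    """Return a copy with the bundle under data; unlabeled input is unchanged."""
    result = copy.deepcopy(configmap)
    if wants_injection(result):
        result.setdefault("data", {})[BUNDLE_KEY] = bundle
    return result


def bundle_changed(old: dict, new: dict) -> bool:
    """Compare bundle digests so a watcher can decide whether to reload."""
    def digest(cm: dict) -> str:
        data = cm.get("data", {}).get(BUNDLE_KEY, "")
        return hashlib.sha256(data.encode()).hexdigest()
    return digest(old) != digest(new)


# Step 1: the operator requests injection with a labeled, empty ConfigMap.
requested = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
        "name": "trusted-ca",
        "namespace": "openshift-ingress-operator",
        "labels": {INJECT_LABEL: "true"},
    },
}
# Step 2: the bundle is written into the ConfigMap (simulated here).
injected = inject_bundle(requested, "<PEM_ENCODED_TRUSTED_CA_CERTS>")
# Step 4: a watcher notices that the bundle content differs.
print(bundle_changed(requested, injected))
```

Comparing digests rather than raw strings is just one way to detect a change; the thread does not say how operators actually implement the watch.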

@awgreene awgreene left a comment

2 small notes.

Comment on lines +18 to +24

Is this targeted at 4.4? This is changing with the introduction of OLM support for admission webhooks in 4.5

Comment on lines +18 to +20

I am unsure if this is necessary to call out, but OLM will not update the certificates of operators that it manages in proxy environments. These certificates must be managed by the user via the subscription config.

Contributor

As a user, the existence of a feature called the “cluster-wide proxy” makes an unambiguous promise that components managed by the cluster will use those settings. Cluster-wide proxy settings include a field for trust settings.

So this isn’t a minor point. Anything that breaks the promise made by calling the settings “cluster-wide” should be fixed. If it’s not going to work as advertised, that should be called out as clearly as possible to help set users’ expectations.

Or rename the feature to better indicate purpose. As it stands very few features that are documented as being part of an OpenShift Container Platform cluster actually follow these settings.

@obockows obockows left a comment

It's great that you have created something like this; customers and support engineers have been waiting for it. I would like to add my feedback, though.
Not long ago I was studying for the CKA (Certified Kubernetes Administrator) and I was using an outstanding course:
https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/
After that course, I fully understood all the correlations between certs in the K8s control plane.
What was very helpful was the author's spreadsheet:
https://github.com/mmumshad/kubernetes-the-hard-way/tree/master/tools
I understand that for OCP it would require a lot of effort, but maybe someday we could create a similar table to better visualize the correlations.

Do I understand it correctly that the alerting framework gives alerts when we have less than 5 minutes to the expiration?

"User-provided certificates " - are we talking here about "certificate-authority-data" from kubeconfig? or we are talking about 3rd party certificates we can add here?

maybe some link to the place where it's described how to do that?

I understand here it's secret/signing-key in namespace openshift-service-ca ?

ca-bundle.crt -> are we talking about configmap/signing-cabundle ?

I believe this part should be the most robust, because customers are very concerned about it (most likely due to OCP 3.x expirations).
So it should be described that first there is the bootstrap, and then we have new certs after ~24 hours that are valid for 20 days? (OCP 4.2?) And they are auto-rotated... when? If >= 20% of the lifetime is left?
We should also mention who signed them (I see Issuer: CN = kube-csr-signer_@1586557208) and where the signer is located (which namespace and which secrets/configmaps?)

I would describe here what is the purpose of client and server certs

Contributor

@xingxingxia xingxingxia Apr 28, 2020

@ahardin-rh I remember that in #18254 (comment) sanchezl requested changing openshift-ingress to openshift-config. Not sure why I am still seeing openshift-ingress :)
This PR seems cluttered with so many conversations. I suggest dividing the content into separate PRs and mentioning the corresponding team owners (QE, Dev, or whoever) to review their team's certs content.

Contributor

@xingxingxia xingxingxia May 8, 2020

@ahardin-rh, I didn't see the correction in your new update. Please change openshift-ingress to openshift-config here, thanks.

Contributor Author

Sorry about that! I thought I saved the change, but I guess not. It's updated now!

@ahardin-rh ahardin-rh merged commit 22aa928 into openshift:master May 12, 2020
@ahardin-rh
Contributor Author

/cherrypick enterprise-4.4

@openshift-cherrypick-robot

@ahardin-rh: new pull request created: #22061

@ahardin-rh
Contributor Author

/cherrypick enterprise-4.5

@openshift-cherrypick-robot

@ahardin-rh: new pull request created: #22062

Labels

branch/enterprise-4.4 branch/enterprise-4.5 peer-review-done Signifies that the peer review team has reviewed this PR size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.