Merged
5 changes: 4 additions & 1 deletion _topic_map.yml
@@ -336,6 +336,9 @@ Topics:
- Name: Understanding identity provider configuration
File: dedicated-understanding-authentication
Distros: openshift-dedicated
- Name: Certificate types and descriptions
File: certificate-types-descriptions
Distros: openshift-enterprise,openshift-webscale,openshift-origin
- Name: Configuring the internal OAuth server
File: configuring-internal-oauth
Distros: openshift-enterprise,openshift-webscale,openshift-origin
@@ -816,7 +819,7 @@ Topics:
File: images-other-jenkins-agent
- Name: Building and deploying a DPDK payload using the s2i image
File: cnf-building-and-deploying-a-dpdk-payload
Distros: openshift-webscale
Distros: openshift-webscale
---
Name: Applications
Dir: applications
166 changes: 166 additions & 0 deletions authentication/certificate-types-descriptions.adoc
@@ -0,0 +1,166 @@
[id="ocp-certificates"]
= Certificate types and descriptions
include::modules/common-attributes.adoc[]
:context: ocp-certificates

toc::[]

== Certificate validation

{product-title} monitors the validity of the cluster certificates that it issues
and manages. The {product-title} alerting framework has rules to help identify
when a certificate issue is about to occur. These rules consist of the following
checks:

* API server client certificate expiration is less than five minutes.
Contributor

@xingxingxia xingxingxia Mar 6, 2020

These rules consist of the following checks

Only one check is listed currently, so this seems incomplete.
For "API server client certificate expiration is less than five minutes", I'm not sure whether "less than five minutes" is correct. This needs Dev help.

Contributor Author

Thank you! I will work with Engineering to confirm.

Do I understand it correctly that the alerting framework gives alerts when we have less than 5 minutes to the expiration?


include::modules/user-provided-certificates-for-api-server.adoc[leveloffset=+1]

@ahardin-rh maybe, based on @deads2k comments in our email thread we could add a category called control plane certificates. That could state something like

  • managed by the system/rotated automatically
  • located in {a, b, c} namespaces
  • Note with known issue of expiration during certain corner cases with link to recovery of control plane cert doc for good measure

wdyt?

Contributor Author

@pweil- Sounds good to me!

Contributor Author

@pweil- The draft is updated. What specific namespaces should be listed? Thanks!

Thanks Ashley. Based on our discussion with @deads2k it sounds like these can be listed:

openshift-config-managed
openshift-kube-apiserver
openshift-kube-apiserver-operator
openshift-kube-controller-manager
openshift-kube-controller-manager-operator
openshift-kube-scheduler

include::modules/proxy-certificates.adoc[leveloffset=+1]
include::modules/service-ca-certificates.adoc[leveloffset=+1]
include::modules/node-certificates.adoc[leveloffset=+1]
include::modules/bootstrap-certificates.adoc[leveloffset=+1]
include::modules/etcd-certificates.adoc[leveloffset=+1]
include::modules/olm-certificates.adoc[leveloffset=+1]
include::modules/user-provided-certificates-for-default-ingress.adoc[leveloffset=+1]

== Ingress certificates

[discrete]
== Purpose

The Ingress Operator uses certificates for:

* Securing access to metrics for Prometheus.
* Securing access to routes.

[discrete]
== Location

To secure access to Ingress Operator and Ingress Controller metrics, the Ingress
Operator uses service serving certificates. The Operator requests a certificate
from the `service-ca` controller for its own metrics, and the `service-ca`
controller puts the certificate in a secret named `metrics-tls` in the
`openshift-ingress-operator` namespace. Additionally, the Ingress Operator
requests a certificate for each Ingress Controller, and the `service-ca`
controller puts the certificate in a secret named `router-metrics-certs-<name>`,
where `<name>` is the name of the Ingress Controller, in the
`openshift-ingress` namespace.

Each Ingress Controller has a default certificate that it uses for secured
routes that do not specify their own certificates. Unless you specify a custom
certificate, the Operator uses a self-signed certificate by default. The
Operator uses its own self-signed signing certificate to sign any default
certificate that it generates. The Operator generates this signing certificate
and puts it in a secret named `router-ca` in the `openshift-ingress-operator`
namespace. When the Operator generates a default certificate, it puts the default
Contributor

s/openshift-ingress-it /openshift-ingress-operator/

certificate in a secret named `router-certs-<name>` (where `<name>` is the name
of the Ingress Controller) in the `openshift-ingress` namespace.

[WARNING]
====
The Ingress Operator generates a default certificate for an Ingress Controller
to serve as a placeholder until you configure a custom default certificate. Do
not use Operator-generated default certificates in production clusters.
====

[discrete]
== Expiration

The expiration terms for the Ingress Operator's certificates are as follows:

* The expiration date for metrics certificates that the `service-ca` controller
creates is two years after the date of creation.
* The expiration date for the Operator's signing certificate is two years after
the date of creation.
* The expiration date for default certificates that the Operator generates is two
years after the date of creation.

You cannot specify custom expiration terms, either during or after installation
of {product-title}, for certificates that the Ingress Operator or `service-ca`
controller creates.
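
To check these terms on a running cluster, you can read the validity window from the certificate itself. The following is a minimal sketch rather than a documented procedure: `cert_dates` is a hypothetical helper, and the usage comment assumes an Ingress Controller named `default` whose secret stores the certificate under the `tls.crt` key.

```shell
# Hypothetical helper: print the validity window (notBefore and notAfter)
# of a PEM-encoded certificate read from standard input.
cert_dates() {
  openssl x509 -noout -dates
}

# Usage (assumes cluster access and an Ingress Controller named "default"):
#   oc get secret router-certs-default -n openshift-ingress \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_dates
```

The helper only reads the certificate. It does not change any cluster state.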

[discrete]
== Services

Prometheus uses the certificates that secure metrics.

The Ingress Operator uses its signing certificate to sign default certificates
that it generates for Ingress Controllers for which you do not set custom
default certificates.

Cluster components that use secured routes may use the default Ingress
Controller's default certificate.

Ingress to the cluster via a secured route uses the default certificate of the
Ingress Controller by which the route is accessed unless the route specifies
its own certificate.

[discrete]
== Management

Ingress certificates are managed by the user. See
xref:../authentication/certificates/replacing-default-ingress-certificate.adoc#replacing-default-ingress[Replacing
the default ingress certificate] for more information.

[discrete]
== Renewal

The `service-ca` controller automatically rotates the certificates that it
issues. However, you can manually rotate service serving certificates by
running `oc delete secret <secret>`.
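
Manual rotation can be wrapped in a small helper. This is only a sketch: `rotate_service_cert` is a hypothetical function name, and the secret and namespace in the usage comment are the ones named in the Location section.

```shell
# Hypothetical wrapper around `oc delete secret`; the name and arguments
# are illustrative and not part of any OpenShift tooling.
rotate_service_cert() {
  local secret="$1" namespace="$2"
  if [ -z "$secret" ] || [ -z "$namespace" ]; then
    echo "usage: rotate_service_cert <secret> <namespace>" >&2
    return 2
  fi
  # Deleting the secret prompts the service-ca controller to issue a new one.
  oc delete secret "$secret" -n "$namespace"
}

# Usage (assumes cluster access):
#   rotate_service_cert metrics-tls openshift-ingress-operator
```

Pods that mount the secret might need to be restarted before they pick up the new certificate.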

The Ingress Operator does not rotate its own signing certificate or the default
certificates that it generates. Operator-generated default certificates are
intended as placeholders for custom default certificates that you configure.

== Monitoring and cluster logging Operator component certificates

Monitoring components secure their traffic with service CA certificates. These
certificates are valid for 2 years and are replaced automatically on rotation of
the service CA, which is every 13 months.


If the certificate lives in the `openshift-monitoring` or `openshift-logging`
namespace, it is system managed and rotated automatically.

[discrete]
== Management

These certificates are managed by the system and not the user.

@s-urbaniak can we say here, that if it lives in openshift-monitoring it is system managed and rotated automatically?

@jcantrill same question but for logging namespaces.

sounds like @jcantrill confirmed this in our call today for the logging namespace

Contributor

@pweil hrm, somehow I missed this GitHub notification :-/ Yes, internal traffic service CA certificates are rotated by the system.

No worries, I'm the same way with GH alerts and BZ alerts. Too many to NOT miss at least one. Thanks for confirming.

@ahardin-rh looks like we can add that update for the openshift-monitoring namespace. Thanks!

Reconcile() calls CreateOrUpdateLogStore().
https://github.com/openshift/cluster-logging-operator/blob/01bf653aa2290b0230d164daa255d0c58c5f1a92/pkg/k8shandler/reconciler.go#L38

CreateOrUpdateLogStore() runs cert_generation.sh.
https://github.com/openshift/cluster-logging-operator/blob/57b091eb6ef0d1f2e263d5aa70e7d3e1b4666390/pkg/k8shandler/certificates.go#L127

cert_generation.sh checks the expiration dates by using the "openssl x509 -checkend 0" command, and regenerates certificates if they have already expired.
https://github.com/openshift/cluster-logging-operator/blob/ec11d6a5fe5f3c01c2793fc19793172bf51ad15b/scripts/cert_generation.sh

I confirmed certificates are regenerated as @jcantrill said.

However, I have the following concerns.

  • Pods will need to be recreated in order to reload the regenerated certificates. So, who will recreate them?
  • cert_generation.sh regenerates certificates only after they have expired. Other certificates (for example, the service-ca certificate) are regenerated a few months before their expiration date. cert_generation.sh should also regenerate certificates a few months before expiration.
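
The second concern can be illustrated with the argument to `-checkend`, which is a number of seconds: `openssl x509 -checkend N` exits non-zero when the certificate expires within the next N seconds. The following is a minimal sketch of an early-regeneration check. The `needs_regeneration` helper and the 90-day threshold in the usage comment are assumptions for illustration and are not part of cert_generation.sh.

```shell
# Hypothetical helper, not part of cert_generation.sh.
# Succeeds (returns 0) when the certificate file is missing or expires
# within the given number of days; a threshold of 0 reproduces the
# "already expired" check.
needs_regeneration() {
  local cert="$1" days="${2:-0}"
  [ -f "$cert" ] || return 0
  # -checkend N exits non-zero if the certificate expires within N seconds,
  # so the negation makes "expiring soon" the success case.
  ! openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert" >/dev/null
}

# Usage (threshold chosen arbitrarily for illustration):
#   if needs_regeneration "${WORKING_DIR}/ca.crt" 90; then
#     echo "regenerate ca.crt"
#   fi
```

A non-zero threshold would let the script regenerate certificates before they expire, which is the behavior requested above.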

Pods will need to be recreated in order to reload the regenerated certificates. So, who will recreate them?

For openshift-monitoring, it looks like the same problem will occur.
https://github.com/openshift/cluster-monitoring-operator/blob/0a458269aecb799b162aa9dc8fc61ccc0fa2be1b/pkg/manifests/tls.go#L68

I would appreciate it if you could help clear up my concerns.

Pods will need to be recreated in order to reload the regenerated certificates. So, who will recreate them?

For cluster-logging, it regenerates certificates by using cert_generation.sh.
So I tested the regeneration by modifying cert_generation.sh.

As a result:

  • The Elasticsearch pod is recreated automatically after certificates are generated.
  • Kibana's certificates are reloaded without recreating the pod.
  • Fluentd's certificates aren't reloaded. This is a problem.
  • The Kibana route becomes inaccessible because the route still has the old certificate. This is also a problem.

So, it looks like the cluster logging operator should be fixed.
I would appreciate any comments or suggestions.

The details are below.
I regenerated the certificates as follows.

[core@rhocp410-worker6 ~]$ sudo nsenter -t $(pgrep cluster-logging) --mount
[root@rhocp410-worker6 /]# cp -p /usr/bin/scripts/cert_generation.sh /usr/bin/scripts/cert_generation.sh.org^C
[root@rhocp410-worker6 /]# vi /usr/bin/scripts/cert_generation.sh
[root@rhocp410-worker6 /]# diff -u /usr/bin/scripts/cert_generation.sh.org /usr/bin/scripts/cert_generation.sh
--- /usr/bin/scripts/cert_generation.sh.org     2020-03-23 09:10:22.000000000 +0000
+++ /usr/bin/scripts/cert_generation.sh 2020-04-03 03:53:33.824057567 +0000
@@ -6,7 +6,7 @@
 NAMESPACE=$2
 CA_PATH=${CA_PATH:-$WORKING_DIR/ca.crt}
 LOG_STORE=$3
-REGENERATE_NEEDED=0
+REGENERATE_NEEDED=1

 function init_cert_files() {

@@ -20,7 +20,7 @@
 }

 function generate_signing_ca() {
-  if [ ! -f ${WORKING_DIR}/ca.crt ] || [ ! -f ${WORKING_DIR}/ca.key ] || ! openssl x509 -checkend 0 -noout -in ${WORKING_DIR}/ca.crt; then
+  if [ $REGENERATE_NEEDED -ne 0 ] || [ ! -f ${WORKING_DIR}/ca.crt ] || [ ! -f ${WORKING_DIR}/ca.key ] || ! openssl x509 -checkend 0 -noout -in ${WORKING_DIR}/ca.crt; then
     openssl req -x509 \
                 -new \
                 -newkey rsa:4096 \

Before Regeneration:

  [root@rhocp410-worker6 /]# cksum /tmp/ocp-clo/*
  ...
  3323181290 1850 /tmp/ocp-clo/ca.crt
  3600238151 2535 /tmp/ocp-clo/ca.db
  3849082317 20 /tmp/ocp-clo/ca.db.attr
  3849082317 20 /tmp/ocp-clo/ca.db.attr.old
  1917137176 2469 /tmp/ocp-clo/ca.db.old
  681125202 3272 /tmp/ocp-clo/ca.key
  1561848553 3 /tmp/ocp-clo/ca.serial.txt
  1557087854 3 /tmp/ocp-clo/ca.serial.txt.old
  ...

After Regeneration:

  [root@rhocp410-worker6 /]# cksum /tmp/ocp-clo/*
  ...
  3449268057 1850 /tmp/ocp-clo/ca.crt
  549398629 14196 /tmp/ocp-clo/ca.db
  3849082317 20 /tmp/ocp-clo/ca.db.attr
  3849082317 20 /tmp/ocp-clo/ca.db.attr.old
  3321310373 14130 /tmp/ocp-clo/ca.db.old
  652093969 3272 /tmp/ocp-clo/ca.key
  1218432204 3 /tmp/ocp-clo/ca.serial.txt
  1301580633 3 /tmp/ocp-clo/ca.serial.txt.old
  ...
  [root@rhocp410-worker6 /]# openssl x509 -in /tmp/ocp-clo/system.logging.fluentd.crt -text | grep Not
              Not Before: Apr  3 04:29:52 2020 GMT
              Not After : Apr  3 04:29:52 2022 GMT

Only the Elasticsearch pod was recreated after that.

[Fri Apr 03 13:30:18 root@api ~]# oc get pods -o wide
NAME                                           READY   STATUS        RESTARTS   AGE     IP            NODE                                             NOMINATED NODE   READINESS GATES
cluster-logging-operator-785b4dd44b-tc227      1/1     Running       0          18h     10.130.2.7    rhocp410-worker6.rhocp410.cluster.sub.nec.test   <none>           <none>
curator-1585884600-sng8l                       0/1     Error         0          60m     10.130.0.20   rhocp410-worker0.rhocp410.cluster.sub.nec.test   <none>           <none>
elasticsearch-cdm-pxnmds9o-1-6455fc7db-ml699   2/2     Terminating   0          5m14s   10.130.0.24   rhocp410-worker0.rhocp410.cluster.sub.nec.test   <none>           <none>
fluentd-t5vmb                                  1/1     Running       0          78m     10.131.2.18   rhocp410-worker5.rhocp410.cluster.sub.nec.test   <none>           <none>
fluentd-xv4vg                                  1/1     Running       0          78m     10.130.2.11   rhocp410-worker6.rhocp410.cluster.sub.nec.test   <none>           <none>
kibana-6ccd94b746-r4jdz                        2/2     Running       0          78m     10.129.0.23   rhocp410-worker2.rhocp410.cluster.sub.nec.test   <none>           <none>

[Fri Apr 03 13:30:46 root@api ~]# oc get pods -o wide
NAME                                           READY   STATUS              RESTARTS   AGE   IP            NODE                                             NOMINATED NODE   READINESS GATES
cluster-logging-operator-785b4dd44b-tc227      1/1     Running             0          19h   10.130.2.7    rhocp410-worker6.rhocp410.cluster.sub.nec.test   <none>           <none>
curator-1585884600-sng8l                       0/1     Error               0          60m   10.130.0.20   rhocp410-worker0.rhocp410.cluster.sub.nec.test   <none>           <none>
elasticsearch-cdm-pxnmds9o-1-6455fc7db-tw264   0/2     ContainerCreating   0          3s    <none>        rhocp410-worker0.rhocp410.cluster.sub.nec.test   <none>           <none>
fluentd-t5vmb                                  1/1     Running             0          78m   10.131.2.18   rhocp410-worker5.rhocp410.cluster.sub.nec.test   <none>           <none>
fluentd-xv4vg                                  1/1     Running             0          78m   10.130.2.11   rhocp410-worker6.rhocp410.cluster.sub.nec.test   <none>           <none>
kibana-6ccd94b746-r4jdz                        2/2     Running             0          78m   10.129.0.23   rhocp410-worker2.rhocp410.cluster.sub.nec.test   <none>           <none>

Elasticsearch and Kibana reloaded their certificates, but Fluentd is still using the old certificate.

bash-4.2$ curl https://elasticsearch.openshift-logging.svc:9200 -vvvs 2>&1 | grep -i date
*       start date: Apr 03 04:29:56 2020 GMT
*       expire date: Apr 03 04:29:56 2022 GMT

bash-4.2$ curl https://fluentd.openshift-logging.svc.cluster.local:24231 -vvvs 2>&1 | grep -i date
*       start date: Mar 31 11:01:54 2020 GMT
*       expire date: Mar 31 11:01:55 2022 GMT

bash-4.2$ curl https://kibana.openshift-logging.svc.cluster.local -vvvs 2>&1 | grep -i date
*       start date: Apr 03 04:29:54 2020 GMT
*       expire date: Apr 03 04:29:54 2022 GMT

The Kibana route became unavailable because it still has the old certificate.

[Fri Apr 03 13:38:49 root@api ~]# curl https://kibana-openshift-logging.apps.rhocp410.cluster.sub.nec.test -ks | grep available
      <h1>Application is not available</h1>
[Fri Apr 03 13:38:56 root@api ~]# oc get route -n openshift-logging kibana
NAME     HOST/PORT                                                     PATH   SERVICES   PORT    TERMINATION          WILDCARD
kibana   kibana-openshift-logging.apps.rhocp410.cluster.sub.nec.test          kibana     <all>   reencrypt/Redirect   None
[Fri Apr 03 13:39:00 root@api ~]# oc get route -n openshift-logging kibana -o jsonpath='{.spec.tls.caCertificate}'| openssl x509 -text | grep Not
            Not Before: Feb 19 11:13:50 2020 GMT
            Not After : Feb 17 11:13:50 2025 GMT

@k-keiichi since this is focused specifically on the documentation of certificates can you open a new BZ for each type of rotation bug you're seeing? That way it can be assigned to the appropriate team for direct comment.

It looks like you need to open one for monitoring and one for logging. Please include steps to recreate, and the info you posted above so the engineering teams can respond. Thanks!

@pweil
Thanks. I will create new BZs for the above comments.


== Control plane certificates

[discrete]
== Location

Control plane certificates are included in these namespaces:

* openshift-config-managed
* openshift-kube-apiserver
* openshift-kube-apiserver-operator
* openshift-kube-controller-manager
* openshift-kube-controller-manager-operator
* openshift-kube-scheduler
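
As a rough way to see what these namespaces hold, you could loop over them and list their TLS-type secrets. This is a sketch under assumptions: `list_control_plane_tls_secrets` is a hypothetical helper, and filtering on the `kubernetes.io/tls` secret type will not show certificates or CA bundles stored in config maps or in other secret types.

```shell
# Namespaces taken from the list above.
control_plane_namespaces="openshift-config-managed
openshift-kube-apiserver
openshift-kube-apiserver-operator
openshift-kube-controller-manager
openshift-kube-controller-manager-operator
openshift-kube-scheduler"

# Hypothetical helper: list TLS-type secrets in each control plane namespace.
list_control_plane_tls_secrets() {
  local ns
  for ns in $control_plane_namespaces; do
    oc get secrets -n "$ns" --field-selector type=kubernetes.io/tls
  done
}

# Usage (assumes cluster access):
#   list_control_plane_tls_secrets
```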

[discrete]
== Management

Control plane certificates are managed by the system and rotated automatically.

In the rare case that your control plane certificates have expired, see
xref:../backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-recovering-expired-certs[Recovering
from expired control plane certificates].

.Additional resources

* xref:../authentication/certificates/service-serving-certificate.adoc#add-service-serving[Manually rotate service serving certificates]
* xref:../authentication/certificates/service-serving-certificate.adoc#add-service-serving[Securing service traffic using service serving certificate secrets]
* xref:../backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-recovering-expired-certs[Recovering
from expired control plane certificates]
* xref:../networking/enable-cluster-wide-proxy.adoc#enable-cluster-wide-proxy[Configuring the cluster-wide proxy]
* xref:../authentication/certificates/api-server.adoc#api-server-certificates[Adding API server certificates]
* xref:../authentication/certificates/replacing-default-ingress-certificate.adoc#replacing-default-ingress[Replacing the default ingress certificate]
* xref:../nodes/nodes/nodes-nodes-working.adoc#nodes-nodes-working[Working with nodes]
* xref:../backup_and_restore/disaster_recovery/scenario-1-infra-recovery.adoc#dr-scenario-1-recover-master-hosts_dr-infrastructure-recovery[Recovering from lost master hosts]
Binary file added images/darkcircle-0.png
Binary file added images/darkcircle-10.png
Binary file added images/darkcircle-9.png
Binary file added images/ingress-certificates-workflow.png
38 changes: 38 additions & 0 deletions modules/bootstrap-certificates.adoc
@@ -0,0 +1,38 @@
// Module included in the following assemblies:
//
// * authentication/certificate-types-descriptions.adoc

[id="bootstrap-certificates_{context}"]
= Bootstrap certificates

[discrete]
== Purpose

The kubelet, in {product-title} 4 and later, uses the bootstrap certificate
located in `/etc/kubernetes/kubeconfig` to initially bootstrap. This is followed
by the
link:https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/#bootstrap-initialization[bootstrap
initialization process] and
link:https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/#authorize-kubelet-to-create-csr[authorization
of the kubelet to create a CSR].

In that process, the kubelet generates a CSR while communicating over the
bootstrap channel. The controller manager signs the CSR, resulting in a
certificate that the kubelet manages.

[discrete]
== Management

These certificates are managed by the system and not the user.

[discrete]
== Expiration

This bootstrap CA is valid for 10 years.

The kubelet-managed certificate is valid for one year and rotates automatically at
around the 80 percent mark of that one year.
Contributor

Any chance we can say this in terms of months?
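
In terms of months, 80 percent of a one-year validity period is about 9.6 months (0.8 × 12). The following sketch works through the same figure in days. It is illustrative arithmetic only: the variable names are hypothetical, and the kubelet decides the actual rotation time.

```shell
# Illustrative arithmetic; not how the kubelet computes its rotation time.
validity_days=365
rotation_percent=80

# Integer arithmetic: 365 * 80 / 100 = 292
rotation_day=$(( validity_days * rotation_percent / 100 ))

echo "Rotation begins around day ${rotation_day} of ${validity_days}"
# prints: Rotation begins around day 292 of 365
```

Day 292 falls roughly nine and a half months into the year, which matches the 9.6-month figure.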


[discrete]
== Customization

You cannot customize the bootstrap certificates.
41 changes: 41 additions & 0 deletions modules/etcd-certificates.adoc
@@ -0,0 +1,41 @@
// Module included in the following assemblies:
//
// * authentication/certificate-types-descriptions.adoc

[id="etcd-certificates_{context}"]
= etcd certificates

[discrete]
== Purpose

etcd certificates are signed by the etcd-signer; they come from a certificate

I would describe here what is the purpose of client and server certs

authority (CA) that is generated by the bootstrap process.

[discrete]
== Expiration

The CA certificates are valid for 10 years. The peer, client, and server
certificates are valid for three years.

[discrete]
== Management

These certificates are managed by the system and not the user.

[discrete]
== Services

etcd certificates are used for encrypted communication between etcd member
peers, as well as encrypted client traffic. The following certificates are
generated and used by etcd and other processes that communicate with etcd:

* Peer certificates: Used for communication between etcd members.
* Client certificates: Used for encrypted server-client communication. Client
certificates are currently used by the API server only, and no other service
should connect to etcd directly except for the proxy. Client secrets
(`etcd-client`, `etcd-metric-client`, `etcd-metric-signer`, and `etcd-signer`)
are added to the `openshift-config`, `openshift-monitoring`, and
`openshift-kube-apiserver` namespaces.
* Server certificates: Used by the etcd server for authenticating client requests.
* Metric certificates: All metric consumers connect to the proxy with
metric-client certificates.

Is this doc for OCP 4.3 only, or also for 4.4? If it is only for 4.3, it should be OK. In 4.4, there is no /etc/ssl/etcd directory:
sh-4.2# cd /etc/ssl
sh-4.2# ls
certs
sh-4.2# cd certs/
sh-4.2# ls
ca-bundle.crt ca-bundle.trust.crt

19 changes: 19 additions & 0 deletions modules/node-certificates.adoc
@@ -0,0 +1,19 @@

// Module included in the following assemblies:
//
// * authentication/certificates/certificate-reference.adoc

[id="node-certificates_{context}"]
= Node certificates
Contributor Author

@rphillips @sjenning Can you please review what is here for node certs so far? Thanks!

Contributor Author

@rphillips @sjenning Can you please provide feedback by end of week?

Contributor

Should we talk about the CAs and their validity? How long they are valid for, how to rotate them, etc.?

@rphillips @sjenning can either of you comment on this?

Contributor Author

@rphillips @sjenning Can you please comment?

Contributor

@ahardin-rh this is correct. We might want to add that once the cluster is installed the Node certificates are auto-rotated.


[discrete]
== Purpose

Node certificates are signed by the cluster; they come from a certificate

I believe this part should be the most robust, because customers are very concerned about it (most likely due to OCP 3.x expirations).
So it should be described that first comes the bootstrap, and then we have new certs after ~24 hours that are valid for 20 days? (OCP 4.2?) And they are auto-rotated... when? If >= 20% of the lifetime is left?
We should also mention who signed them (I see Issuer: CN = kube-csr-signer_@1586557208) and where the signer is located (which namespace and which secrets/configmaps?)

authority (CA) that is generated by the bootstrap process. Once the cluster is
installed, the node certificates are auto-rotated.

[discrete]
== Management

These certificates are managed by the system and not the user.
28 changes: 28 additions & 0 deletions modules/olm-certificates.adoc
@@ -0,0 +1,28 @@
// Module included in the following assemblies:
//
// * authentication/certificate-types-descriptions.adoc

[id="olm-certificates_{context}"]
= OLM certificates

I think this is a good summary of what @ecordell and @awgreene have described.

If there are no objections to the wording I'm lgtm on this 👍


[discrete]
== Management

All certificates for Operator Lifecycle Manager (OLM) components
(`olm-operator`, `catalog-operator`, `packageserver`, and
`marketplace-operator`) are managed by the system.

Operators installed via OLM can have certificates generated for them if they are
providing API services. `packageserver` is one example.

Certificates in the `openshift-operator-lifecycle-manager` namespace are managed
by OLM with the exception of certificates used by Operators that require a
validating or mutating webhook.
Comment on lines +18 to +20

I am unsure if this is necessary to call out, but OLM will not update the certificates of operators that it manages in proxy environments. These certificates must be managed by the user via the subscription config.

Contributor

As a user, the existence of a feature called the “cluster-wide proxy” makes an unambiguous promise that components managed by the cluster will use those settings. Cluster-wide proxy settings include a field for trust settings.

So this isn’t a minor point. Anything that breaks the promise made by calling it the settings “cluster-wide” should be fixed. If it’s not going to work as advertised it should be called out as clearly as possible to help set users’ expectations.

Or rename the feature to better indicate purpose. As it stands very few features that are documented as being part of an OpenShift Container Platform cluster actually follow these settings.


Operators that install validating or mutating webhooks must currently manage
those certificates themselves. They do not require the user to manage the
certificates.
Comment on lines +18 to +24

Is this targeted at 4.4? This is changing with the introduction of OLM support for admission webhooks in 4.5


OLM will not update the certificates of Operators that it manages in proxy
environments. These certificates must be managed by the user via the
subscription config.