diff --git a/CONVENTIONS.md b/CONVENTIONS.md
index 096a06581e..5896222a98 100644
--- a/CONVENTIONS.md
+++ b/CONVENTIONS.md
@@ -45,7 +45,7 @@ components that make up the project, but exceptions are allowed when necessary.
 
 The conventions are intended to help with consistency across the project. Users of
 the platform expect consistency in the experience and operation of the cluster.
-Developers of the platform expect consistency in the code to quickly identify issuses
+Developers of the platform expect consistency in the code to quickly identify issues
 across codebases. Consistency enables shared understanding, simplifies explanations,
 and reduces accidental complexity.
 
@@ -85,3 +85,86 @@ metal3-io](https://github.com/metal3-io/metal3-docs/blob/master/design/bare-meta
 ### API
 
 OpenShift APIs follow the [Kubernetes API conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md).
+
+### Operators
+
+The OpenShift project is an early adopter of, and makes extensive use of, [the
+operator pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/),
+and so it is incumbent on us to establish some conventions around operators.
+
+#### Taints and Tolerations
+
+An operator deployed by the Cluster Version Operator (CVO) should run on master
+nodes and therefore should tolerate the following taint:
+
+* `node-role.kubernetes.io/master`
+
+For example:
+
+```yaml
+spec:
+  template:
+    spec:
+      tolerations:
+      - key: node-role.kubernetes.io/master
+        operator: Exists
+        effect: NoSchedule
+```
+
+Tolerating this taint should suffice for the vast majority of core OpenShift
+operators. In exceptional cases, an operator may tolerate one or more of the
+following taints if doing so is necessary to form a functional Kubernetes node:
+
+* `node.kubernetes.io/disk-pressure`
+* `node.kubernetes.io/memory-pressure`
+* `node.kubernetes.io/network-unavailable`
+* `node.kubernetes.io/not-ready`
+* `node.kubernetes.io/pid-pressure`
+* `node.kubernetes.io/unreachable`
+
+Operators should not specify tolerations in their manifests for any of the taints in
+the above list without an explicit and credible justification.
+
+When an operator configures its operand, the operator likewise may specify
+tolerations for the aforementioned taints but should do so only as necessary and only
+with explicit justification.
+
+Note that the DefaultTolerationSeconds and PodTolerationRestriction admission plugins
+may add tolerations to an operator or operand beyond those that the operator has
+specified; DefaultTolerationSeconds, for example, adds time-bound tolerations.
+
+If appropriate, a CRD that corresponds to an operand may provide an API to allow
+users to specify a custom list of tolerations for that operand. For examples, see
+the
+[imagepruners.imageregistry.operator.openshift.io/v1](https://github.com/openshift/api/blob/34f54f12813aaed8822bb5bc56e97cbbfa92171d/imageregistry/v1/types_imagepruner.go#L67-L69),
+[configs.imageregistry.operator.openshift.io/v1](https://github.com/openshift/api/blob/34f54f12813aaed8822bb5bc56e97cbbfa92171d/imageregistry/v1/types.go#L82-L84),
+[builds.config.openshift.io/v1](https://github.com/openshift/api/blob/34f54f12813aaed8822bb5bc56e97cbbfa92171d/config/v1/types_build.go#L96-L99),
+and
+[ingresscontrollers.operator.openshift.io/v1](https://github.com/openshift/api/blob/34f54f12813aaed8822bb5bc56e97cbbfa92171d/operator/v1/types_ingress.go#L183-L191)
+APIs.
+
+An operand may tolerate all taints only in the following exceptional cases:
+
+* the operand is required to form a functional Kubernetes node, or
+* the operand is required to support workloads sourced from an internal or external registry that core components depend upon.
+
+In either case, the operand should tolerate all taints:
+
+```yaml
+spec:
+  template:
+    spec:
+      tolerations:
+      - operator: Exists
+```
+
+An example of an operand that matches the first case is kube-proxy, which is required
+for services to work. An example of an operand that matches the second case is the
+DNS node resolver, which adds an entry to the `/etc/hosts` file on all node hosts so
+that the container runtime is able to resolve the name of the cluster image registry;
+absent this entry in `/etc/hosts`, upgrades could fail to pull images of core
+components.
+
+If an operand meets neither of the two conditions listed above, it must not tolerate
+all taints. This constraint is enforced by [a CI test
+job](https://github.com/openshift/origin/blob/7d07adcf518a846b898cd0958b85f2daf624476a/test/extended/operators/tolerations.go).
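The reason a toleration consisting only of `operator: Exists` tolerates every taint follows from how Kubernetes matches tolerations against taints. The Go sketch below is a minimal illustration under stated assumptions. `Toleration` and `Taint` are hypothetical local stand-ins that mirror the fields of `corev1.Toleration` and `corev1.Taint`, so the sketch needs only the standard library. `toleratesTaint` mirrors the `Equal`/`Exists` matching rules of `Toleration.ToleratesTaint` in `k8s.io/api/core/v1`. `toleratesAllTaints` is a hypothetical helper showing the kind of check a CI job could run. None of this is the code of the linked origin test.

```go
package main

import "fmt"

// Hypothetical stand-ins mirroring corev1.Toleration and corev1.Taint,
// defined locally so the sketch compiles with the standard library alone.
type Toleration struct {
	Key      string
	Operator string // "Exists", "Equal", or "" (treated as "Equal")
	Value    string
	Effect   string // "" matches every effect
}

type Taint struct {
	Key    string
	Value  string
	Effect string
}

// toleratesTaint mirrors the Kubernetes rules for the Equal and Exists
// operators: an empty effect matches all effects, an empty key (valid only
// with Exists) matches all keys, and Exists ignores the taint's value.
func toleratesTaint(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "", "Equal":
		return tol.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

// toleratesAllTaints (hypothetical helper) reports whether any toleration
// is the wildcard "- operator: Exists" form with no key and no effect.
func toleratesAllTaints(tols []Toleration) bool {
	for _, t := range tols {
		if t.Key == "" && t.Operator == "Exists" && t.Effect == "" {
			return true
		}
	}
	return false
}

func main() {
	master := Toleration{Key: "node-role.kubernetes.io/master", Operator: "Exists", Effect: "NoSchedule"}
	wildcard := Toleration{Operator: "Exists"}
	notReady := Taint{Key: "node.kubernetes.io/not-ready", Effect: "NoExecute"}

	fmt.Println(toleratesTaint(master, Taint{Key: "node-role.kubernetes.io/master", Effect: "NoSchedule"})) // true
	fmt.Println(toleratesTaint(master, notReady))             // false
	fmt.Println(toleratesTaint(wildcard, notReady))           // true
	fmt.Println(toleratesAllTaints([]Toleration{master}))     // false
	fmt.Println(toleratesAllTaints([]Toleration{wildcard}))   // true
}
```

An empty `effect` matches every effect, and an empty `key` with `Exists` matches every key. As a result, the wildcard toleration defeats even `NoExecute` taints such as `node.kubernetes.io/not-ready`, while the master toleration is scoped to a single key and effect.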