Create single node control plane based on installer bootstrap #440
Conversation
> ### Non-Goals
>
> 1. Single ignition config that can be used for multiple clusters
I suspect that many will see this as valuable enough to build in from the start because of latency issues, but @crawford probably knows better.
> Demonstrate a prototype of creating a simple static Ignition file that boots an RHCOS machine and launches a basic Kube control plane
>
> ### Goals
I think goals need to enumerate the components we want running. I'll throw one possible starting point out:
- etcd
- kube-apiserver
- kube-controller-manager
- kube-scheduler
- oauth-apiserver
- oauth-server
- olm
- nothing else.
This gives a kube control plane.
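For orientation, each of those components runs as a static pod: a manifest the kubelet reads directly from disk, with no API server involved in scheduling it. A heavily trimmed, assumed skeleton (the real bootstrap manifests carry many more flags, volumes, and certificates):

```yaml
# Illustrative sketch only -- not a real bootstrap manifest.
# Static pod manifests are placed in the kubelet's manifest directory,
# e.g. /etc/kubernetes/manifests/kube-apiserver-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: registry.example.com/kube-apiserver:tag   # placeholder image
    command: ["kube-apiserver"]                      # real flags omitted
```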
@eranco74 let's list what is provided by bootstrap static pods.
On the bootstrap node we have these static pod YAMLs:
- etcd-member-pod.yaml
- kube-apiserver-pod.yaml
- kube-controller-manager-pod.yaml
- kube-scheduler-pod.yaml
- bootstrap-pod.yaml (cluster version operator)
- recycler-pod.yaml (doesn't seem relevant)
Running containers:

```
crictl ps | awk '{print $7}'
POD
kube-apiserver-insecure-readyz
kube-apiserver
kube-controller-manager
kube-scheduler
cluster-version-operator
etcd-metrics
etcd-member
```
Pods that show up with kubectl:

```
kubectl --kubeconfig auth/kubeconfig get pods -A
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   bootstrap-kube-apiserver-master1            2/2     Running   0          37m
kube-system   bootstrap-kube-controller-manager-master1   1/1     Running   0          37m
kube-system   bootstrap-kube-scheduler-master1            1/1     Running   0          37m
```
We also have machineconfigoperator-bootstrap-pod.yaml, which runs machine-config-server (it gets removed once the bootstrap manages to apply all the manifests: https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L371).
These are the static pod manifests we have on a (baremetal) OpenShift master node (3-node installation):
1. etcd-pod.yaml
2. kube-apiserver-pod.yaml
3. kube-controller-manager-pod.yaml
4. kube-scheduler-pod.yaml
5. coredns.yaml
6. haproxy.yaml
7. keepalived.yaml
8. mdns-publisher.yaml
9. recycler-pod.yaml
The keepalived/coredns/haproxy pods are there because you're looking at a baremetal-platform cluster. I guess when we try something similar with the `none` platform, the static pods will be aligned.
> 1. Create a single node cluster composed of static pods, similar to the installer bootstrap.
>
> ### Non-Goals
- running a single node means certain management activity is difficult/impossible with current operator design. We should indicate whether we want operators running and if so, for which parts
- should indicate whether or not this needs to be able to upgrade. again @crawford
> Initial POC - https://docs.google.com/document/d/1pWauEQXl__39fMeLBIQpPnBNXdd92JNOylAnk8LCW_M/edit?usp=sharing
>
> All certificates will be generated by the openshift installer.
I think the installer wants to stop creating certificates and would prefer to delegate to the operators themselves in rendering.
> 1. Create a single node cluster composed of static pods, similar to the installer bootstrap.
>
> ### Non-Goals
- decide whether or not cert rotation is important. If not, and if we decide to produce these static pods, it is possible for us to choose a different expiry, measured in years.
I guess we can start with 10 years validity and add the rotation later
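As a rough illustration of the trade-off discussed here (this is not the installer's actual certificate machinery, which is spread across the installer and operators), extending validity is just a matter of widening the expiry when a certificate is issued, e.g. with openssl:

```shell
# Hypothetical sketch only: mint a self-signed CA valid for ~10 years
# (3650 days) instead of the short-lived certs a normal cluster rotates.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout ca.key -out ca.crt \
  -days 3650 -subj "/CN=single-node-demo-ca"

# Confirm the certificate will still be valid 9 years from now
# (-checkend takes seconds; exit code 0 means still valid).
openssl x509 -in ca.crt -noout -checkend 283824000
```

The open question above still stands, though: every component that issues its own certificates would need to honor the longer expiry.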
> 1. Operators are not running in the cluster and we need a way to rotate all certificates with a bash script using oc, similar to this:
>    https://github.com/code-ready/snc/blame/master/kubelet-bootstrap-cred-manager-ds.yaml.in
>    Is this the best way to handle it?
What if we decide we don't rotate, but that clusters created in this mode create certificates good for X years instead of X days
Then we'll need to update all the parts that generate those certificates, right? There is no single place?
> ### Implementation Details/Notes/Constraints [optional]
>
> Initial POC - https://docs.google.com/document/d/1pWauEQXl__39fMeLBIQpPnBNXdd92JNOylAnk8LCW_M/edit?usp=sharing
Expand on the high-level flow here and whether or not it worked?
I recall suggesting that you
- install a cluster
- remove two nodes
- run an etcd recovery on the remaining node to get a good etcd
- see if it mostly works
If that mostly works, then I think we can talk about a possible path forward where the full configuration input is provided in manifests and operators render out the "finished" static pod instead of a bootstrap static pod. Or something similar. @crawford again.
Cluster downscale POC:
The cluster seems OK except for:
- openshift-ingress router: 1/2 in status Pending, since it's configured with 2 replicas and we have a single node.
- etcd-quorum-guard: 2/3 in status Pending, since it's configured with 3 replicas and we have a single node.

We ran openshift conformance tests (Feature:ProjectAPI) on the single node as well (6 pass, 0 skip (48.2s)).
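For reference, one way the pending router replicas could be resolved on such a cluster is by pinning the default IngressController to a single replica. This is an assumed sketch using the standard `operator.openshift.io/v1` API, not necessarily what the POC did:

```yaml
# Illustrative only: lower the default router to one replica so it can
# fully schedule on a single node (e.g. via `oc apply -f`).
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  replicas: 1
```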
We did another POC transforming the installer bootstrap node into a single node cluster (replaced the link with the POC details).
Main changes are:
- Put a little more emphasis on describing the installer interface
- Add more details to the summary and motivation section
- Mention the non-goal of being able to expand this cluster
- Mention we want to support users customizing this all-in-one config
- Copy the POC details from the Google doc so they are publicly visible
- Add an open question about whether the bootstrap static pods are suitable
> ## Proposal
>
> When a machine is booted with aio.ign, the aiokube systemd service is launched (similar to bootkube in the bootstrap ignition).
bootkube.sh
I'm wondering whether API-VIP is required (instead of relying on external DNS).
| - "@markmc" | ||
| creation-date: yyyy-mm-dd | ||
| last-updated: yyyy-mm-dd | ||
| status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced |
nit: fill in the dates and pick a status?
> # Single node installation
>
> Add a new `create aio-config` command to `openshift-installer` which
I am not wild about abbreviated command names, although I am ~ok with shorter aliases. I'd rather address long-command-name concerns with auto-complete scripts ;). Can we make this `create single-node-config` or some such?
Hmm, it's also not clear to me why you can't just use the existing `create ignition-configs` with an `install-config.yaml` requesting `replicas: 1` for the control plane and `replicas: 0` for compute. Why does this need a new subcommand?
The new sub-command is required since we want a new installation flow that allows installing the node without an auxiliary (bootstrap) node, just RHCOS + an Ignition config.
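For context, the install-config shape referred to above would look roughly like this (a sketch following the installer's standard `install-config.yaml` fields; the platform and domain values are placeholders):

```yaml
# Assumed illustration: one-node control plane, no workers.
apiVersion: v1
baseDomain: example.com
metadata:
  name: single-node
controlPlane:
  name: master
  replicas: 1
compute:
- name: worker
  replicas: 0
platform:
  none: {}
```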
> replaces:
> - "/enhancements/that-less-than-great-idea.md"
> superseded-by:
> - "/enhancements/our-past-effort.md"
nit: remove replaces and superseded-by unless you have more to put in them than the dummy placeholders.
Renamed `create aio-config` to `create single-node-config`
> # Single node installation
>
> Add a new `create single-node-config` command to `openshift-installer` which allows a user to create an `aio.ign` Ignition configuration which
what does aio signify?
All In One
> https://github.com/openshift/enhancements/pull/302
>
> ## Motivation
This feels like it's missing the use-case? Is it just for demoing something - I'm not really clear on the why for this...
This is a first step toward a zero-touch single-node cluster.
This enhancement describes a new single-node cluster profile for production use in "edge" deployments that are not considered to be resource-constrained, such as telecommunications bare metal environments.

Signed-off-by: Doug Hellmann <[email protected]>
@openshift-bot: Closed this PR.