
Conversation

@eranco74 (Contributor)

No description provided.

@openshift-ci-robot added the do-not-merge/work-in-progress label on Aug 18, 2020.

### Non-Goals

1. Single ignition config that can be used for multiple clusters
Contributor:

I suspect that many will see this as valuable enough to build in from the start because of latency issues, but @crawford probably knows better.


Demonstrate a prototype of creating a simple static Ignition file that boots an RHCOS machine and launches a basic Kube control plane

### Goals
Contributor:

I think goals need to enumerate the components we want running. I'll throw one possible starting point out:

  1. etcd
  2. kube-apiserver
  3. kube-controller-manager
  4. kube-scheduler
  5. oauth-apiserver
  6. oauth-server
  7. olm
  8. nothing else.

This gives a kube control plane.


@eranco74, let's list what is provided by the bootstrap static pods.

Contributor Author:

On the bootstrap node we have these static pod manifests:

  1. etcd-member-pod.yaml
  2. kube-apiserver-pod.yaml
  3. kube-controller-manager-pod.yaml
  4. kube-scheduler-pod.yaml
  5. bootstrap-pod.yaml (cluster version operator)
  6. recycler-pod.yaml (doesn't seem relevant)

Running containers:

```
crictl ps | awk '{print $7}'
POD
kube-apiserver-insecure-readyz
kube-apiserver
kube-controller-manager
kube-scheduler
cluster-version-operator
etcd-metrics
etcd-member
```

Pods that show up with kubectl:

```
kubectl --kubeconfig auth/kubeconfig get pods -A
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   bootstrap-kube-apiserver-master1            2/2     Running   0          37m
kube-system   bootstrap-kube-controller-manager-master1   1/1     Running   0          37m
kube-system   bootstrap-kube-scheduler-master1            1/1     Running   0          37m
```

We also have machineconfigoperator-bootstrap-pod.yaml, which runs machine-config-server (it gets removed once the bootstrap manages to apply all the manifests, see https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L371).
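
For reference, a quick way to reproduce this inventory on a live bootstrap node is sketched below; it assumes the standard kubelet static-pod directory and CRI-O runtime, nothing specific to this proposal:

```bash
# Static pod manifests the bootstrap flow drops on disk for the local kubelet
ls /etc/kubernetes/manifests/

# Containers actually running under CRI-O (same source as the crictl listing above)
sudo crictl ps
```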

Contributor Author:

These are the static pod manifests we have on a (bare metal) OpenShift master node (3-node installation):

1. etcd-pod.yaml
2. kube-apiserver-pod.yaml
3. kube-controller-manager-pod.yaml
4. kube-scheduler-pod.yaml
5. coredns.yaml
6. haproxy.yaml
7. keepalived.yaml
8. mdns-publisher.yaml
9. recycler-pod.yaml


The keepalived/coredns/haproxy pods are there because you're looking at a BM platform cluster. I guess when we try something similar with the none platform, the static pods will be aligned.


1. Create a single node cluster composed of static pods, similar to the installer bootstrap.

### Non-Goals
Contributor:

  1. Running a single node means certain management activity is difficult/impossible with the current operator design. We should indicate whether we want operators running and, if so, for which parts.
  2. We should indicate whether or not this needs to be able to upgrade. Again, @crawford.


Initial POC - https://docs.google.com/document/d/1pWauEQXl__39fMeLBIQpPnBNXdd92JNOylAnk8LCW_M/edit?usp=sharing

All certificates will be generated by the openshift installer.
Contributor:

I think the installer wants to stop creating certificates and would prefer to delegate to the operators themselves in rendering.


1. Create a single node cluster composed of static pods, similar to the installer bootstrap.

### Non-Goals
Contributor:

  1. decide whether or not cert rotation is important. If not, and if we decide to produce these static pods, it is possible for us to choose a different expiry, measured in years.

Contributor Author:

I guess we can start with 10 years validity and add the rotation later
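
If we do go with long-lived certificates, a quick sanity check of whatever validity window actually ends up baked in could look like the sketch below; the API hostname is a placeholder, not part of this proposal:

```bash
# Print the validity window of the serving certificate presented by the API endpoint
# (replace the host with the cluster's real API DNS name)
echo | openssl s_client -connect api.sno.example.com:6443 2>/dev/null | openssl x509 -noout -dates
```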


1. Operators are not running in the cluster and we need a way to rotate all certificates with a bash script using oc similar to this:
https://github.com/code-ready/snc/blame/master/kubelet-bootstrap-cred-manager-ds.yaml.in
Is this the best way to handle it?
Contributor:

What if we decide we don't rotate, but clusters created in this mode create certificates good for X years instead of X days?


Then we'll need to update all the parts that generate those certificates, right? There is no single place?
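
For context, one common piece of the oc-driven rotation script referenced above (not the linked script itself) is simply approving the pending CSRs that kubelets generate when they re-bootstrap. A minimal sketch, assuming a cluster-admin kubeconfig at the usual installer location:

```bash
# Approve any pending kubelet client/serving CSRs so re-bootstrapped kubelets get fresh certs
export KUBECONFIG=auth/kubeconfig
oc get csr -o name | xargs --no-run-if-empty oc adm certificate approve
```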


### Implementation Details/Notes/Constraints [optional]

Initial POC - https://docs.google.com/document/d/1pWauEQXl__39fMeLBIQpPnBNXdd92JNOylAnk8LCW_M/edit?usp=sharing
Contributor:

Explode a high level flow here and whether or not it worked?

I recall suggesting that you

  1. install a cluster
  2. remove two nodes
  3. run an etcd recovery on the remaining node to get a good etcd
  4. see if it mostly works

If that mostly works, then I think we can talk about a possible path forward where the full configuration input is provided in manifests and operators render out the "finished" static pod instead of a bootstrap static pod. Or something similar. @crawford again.
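
For what it's worth, one way to approximate that suggested flow with the standard OCP disaster-recovery scripts (node names and the backup path below are placeholders) might be:

```bash
# 1. Take an etcd backup while the control plane is still healthy (run on one master)
sudo /usr/local/bin/cluster-backup.sh /home/core/assets/backup

# 2. Remove two of the three control-plane nodes (placeholder names)
oc delete node master-1 master-2

# 3. On the surviving master, restore etcd as a single-member cluster from that backup
sudo /usr/local/bin/cluster-restore.sh /home/core/assets/backup
```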

Contributor Author:

Cluster downscale POC:
The cluster seems OK except for:

- openshift-ingress router: 1/2 in status Pending, since it's configured with 2 replicas and we have a single node.
- etcd-quorum-guard: 2/3 in status Pending, since it's configured with 3 replicas and we have a single node.

We also ran OpenShift conformance tests (Feature:ProjectAPI) on the single node (6 pass, 0 skip, 48.2s).
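
For the two Pending leftovers above, a hedged sketch of how they could be tuned for one node (namespaces and ownership vary by release, and the managing operators may fight back) is:

```bash
# Tell the ingress operator to run a single router replica
oc patch ingresscontroller/default -n openshift-ingress-operator \
  --type merge -p '{"spec":{"replicas": 1}}'

# Scale the quorum guard down; note its managing operator may scale it back up,
# and the namespace differs between releases
oc scale deployment/etcd-quorum-guard --replicas=1 -n openshift-machine-config-operator
```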

We did another POC transforming the installer bootstrap node into a single-node cluster (replaced the link with the POC details).

Main changes are:

- Put a little more emphasis on describing the installer interface
- Add more details to the summary and motivation section
- Mention the non-goal of being able to expand this cluster
- Mention we want to support users customizing this all-in-one config
- Copy the POC details from the Google doc so they are publicly visible
- Add an open question about whether the bootstrap static pods are suitable
@eranco74 marked this pull request as ready for review on August 19, 2020 15:38.
@openshift-ci-robot removed the do-not-merge/work-in-progress label on Aug 19, 2020.
## Proposal

When a machine is booted with aio.ign, the aiokube systemd service is
launched (similar to bootkube in the bootstrap ignition).
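
A purely hypothetical sketch of what such a unit might look like, modeled loosely on bootkube.service; the unit name, script path, and dependencies here are assumptions for illustration, not the actual design:

```bash
# Hypothetical aiokube.service, written as a shell heredoc for illustration only
cat > /etc/systemd/system/aiokube.service <<'EOF'
[Unit]
Description=Launch an all-in-one single-node OpenShift control plane
Wants=crio.service
After=crio.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/aiokube.sh
RemainAfterExit=true

[Install]
WantedBy=multi-user.target
EOF

systemctl enable aiokube.service
```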


bootkube.sh

@romfreiman:

I'm wondering whether API-VIP is required (instead of relying on external dns)

@eranco74 force-pushed the aio branch 2 times, most recently from eca5199 to bec93b0 on August 20, 2020 16:17.
- "@markmc"
creation-date: yyyy-mm-dd
last-updated: yyyy-mm-dd
status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced
Member:

nit: fill in the dates and pick a status?


# Single node installation

Add a new `create aio-config` command to `openshift-installer` which
Member:

I am not wild about abbreviated command names, although I am ~ok with shorter aliases. I'd rather address long-command-name concerns with auto-complete scripts ;). Can we make this `create single-node-config` or some such?

Member:

Hmm, it's also not clear to me why you can't just use the existing `create ignition-configs` with an `install-config.yaml` requesting `replicas: 1` for the control plane and `replicas: 0` for compute. Why does this need a new subcommand?

Contributor Author:

The new sub-command is required since we want a new installation flow that allows installing the node without an auxiliary (bootstrap) node, just RHCOS + Ignition.
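
For comparison, a minimal sketch of the reviewer's suggestion versus the proposed subcommand; the install-config below is deliberately abbreviated (it omits required fields such as platform, networking, and pullSecret), and the single-node-config command is the one proposed in this PR, not an existing installer command:

```bash
# Abbreviated install-config.yaml: single control-plane node, no workers
cat > install-config.yaml <<'EOF'
apiVersion: v1
baseDomain: example.com
metadata:
  name: sno
controlPlane:
  name: master
  replicas: 1
compute:
- name: worker
  replicas: 0
EOF

# Existing flow: still renders a separate bootstrap.ign alongside master.ign
openshift-install create ignition-configs

# Proposed flow: a single Ignition config, no auxiliary bootstrap node
openshift-install create single-node-config
```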

replaces:
- "/enhancements/that-less-than-great-idea.md"
superseded-by:
- "/enhancements/our-past-effort.md"
Member:

nit: remove `replaces` and `superseded-by` unless you have more to put in them than the dummy placeholders.

Renamed `create aio-config` to `create single-node-config`
@openshift-ci-robot:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: eranco74
To complete the pull request process, please assign dhellmann
You can assign the PR to them by writing /assign @dhellmann in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

# Single node installation

Add a new `create single-node-config` command to `openshift-installer` which
allows a user to create an `aio.ign` Ignition configuration which
Contributor:

what does aio signify?

Contributor Author:

All In One

https://github.com/openshift/enhancements/pull/302

## Motivation

Contributor:

This feels like it's missing the use-case? Is it just for demoing something - I'm not really clear on the why for this...

Contributor Author:

This is a first step toward a zero-touch single-node cluster.

romfreiman referenced this pull request in dhellmann/openshift-enhancements Oct 16, 2020
This enhancement describes a new single-node cluster profile for
production use in "edge" deployments that are not considered to be
resource-constrained, such as telecommunications bare metal
environments.

Signed-off-by: Doug Hellmann <[email protected]>
@openshift-bot:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on Dec 8, 2020.
@openshift-bot:

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 7, 2021.
@openshift-bot:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot:

@openshift-bot: Closed this PR.

