Conversation

@deads2k
Contributor

@deads2k deads2k commented May 1, 2020

/hold

This is only a thought experiment. I started down the path of seeing if I could quickly build this, but the smart way to build it requires a few foundational refactors, plus fake client libraries and fake indexer techniques that haven't been proven.

Let's talk about

  1. whether we're interested enough to invest at least a week before seeing fruit
  2. whether we agree that the approach results in a supportable static pod
  3. whether we think that the general idea of executing loops in order (with a cycle or two) is maintainable in the long run.

@smarterclayton @mfojtik @sttts @soltysh @hexfusion

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 1, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 1, 2020
Comment on lines +56 to +82
We can create a new kind of render command which takes existing inputs *and* config.openshift.io resources.
Similar to how we built the original disaster recovery for certificates, we can factor the command to run the various
control loops "in order".
We can initialize our control loops using fake clients and wire listeners to synthetically update indexers backing fake
listers.
This is like we do for unit tests, only wired into the update reactors for the client.
If we separate the reactive bits of the control loops, the informer watch triggers adn the like, from the data input bits
(I think this is possible), we can have very high fidelity.
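
As a rough sketch of that fake-client wiring (illustrative only; the namespace, object names, and structure here are assumptions, not the operator's actual code):

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/kubernetes/fake"
	corelisters "k8s.io/client-go/listers/core/v1"
	k8stesting "k8s.io/client-go/testing"
	"k8s.io/client-go/tools/cache"
)

func main() {
	client := fake.NewSimpleClientset()

	// The indexer that backs the fake lister, keyed the same way an informer
	// cache keys objects.
	indexer := cache.NewIndexer(cache.MetaNamespaceKeyFunc, cache.Indexers{})
	secretLister := corelisters.NewSecretLister(indexer)

	// Mirror every write that goes through the fake client into the indexer,
	// so the lister observes writes synchronously instead of via a watch.
	mirror := func(action k8stesting.Action) (bool, runtime.Object, error) {
		// CreateAction and UpdateAction both expose GetObject().
		obj := action.(k8stesting.CreateAction).GetObject()
		if err := indexer.Add(obj); err != nil {
			return true, nil, err
		}
		return false, nil, nil // fall through to the fake client's tracker
	}
	client.PrependReactor("create", "secrets", mirror)
	client.PrependReactor("update", "secrets", mirror)

	_, _ = client.CoreV1().Secrets("openshift-kube-apiserver").Create(
		context.TODO(),
		&corev1.Secret{ObjectMeta: metav1.ObjectMeta{
			Name: "serving-cert", Namespace: "openshift-kube-apiserver",
		}},
		metav1.CreateOptions{},
	)

	secrets, _ := secretLister.Secrets("openshift-kube-apiserver").List(labels.Everything())
	fmt.Println(len(secrets)) // 1 - the lister sees the write immediately
}
```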
In the kube-apiserver, the ordering would like this for instance:
1. cert-rotation - we need to create certs
2. encryption - this would need a special mode to say: just encrypt it right away
3. bound tokens - this creates some secrets for us
4. static-resources - this creates targets, SAs, and stuff
5. config observation - we need to set the operator observed config to be able to generate the final config.
6. target config - writes the kube-apiserver configmap
7. resource sync - copies bits from A to B
8. loop through config observation, target config, resource sync one more time (yeah, cycles)
9. revision controller
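
A hedged sketch of what driving the loops in that order could look like (none of these names exist in the operator; the reactive informer triggers are assumed to have been separated out as described above):

```go
package render // hypothetical package

import (
	"context"
	"fmt"
)

// syncFunc is one control loop's sync method with its informer/watch trigger
// stripped off, so the render command can drive it directly.
type syncFunc struct {
	name string
	run  func(ctx context.Context) error
}

// runInOrder drives each loop exactly once, in the order given.
func runInOrder(ctx context.Context, steps []syncFunc) error {
	for _, s := range steps {
		if err := s.run(ctx); err != nil {
			return fmt.Errorf("step %q failed: %w", s.name, err)
		}
	}
	return nil
}

// Usage, mirroring the ordering above (all step values illustrative):
//
//	steps := []syncFunc{certRotation, encryption, boundTokens, staticResources,
//		configObservation, targetConfig, resourceSync}
//	steps = append(steps, steps[4], steps[5], steps[6]) // one more cycle
//	steps = append(steps, revisionController)
//	err := runInOrder(ctx, steps)
```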

Now we do a couple neat things:
1. Export all content from the fake clients to produce resource manifests that will be created bootkube style against
the kube-apiserver.
Someone will have grown a dependency, and we know for sure that the next operator will require input from the previous one.
2. Wire up the fake clients to our installer command.
In theory, this command will create an exact copy of the "normal" kube-apiserver static pod that we create.

This leaves us supporting only one shape of static pod, which makes support of these static pods much easier.
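
For the export step, a minimal sketch of dumping the fake client's accumulated state as manifests (Secrets only; real code would iterate every resource type the loops write):

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes/fake"
	"sigs.k8s.io/yaml"
)

// exportSecretManifests writes every Secret accumulated in the fake client to
// dir as a YAML manifest, bootkube style.
func exportSecretManifests(ctx context.Context, client *fake.Clientset, dir string) error {
	secrets, err := client.CoreV1().Secrets(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for i := range secrets.Items {
		s := &secrets.Items[i]
		s.APIVersion, s.Kind = "v1", "Secret" // fake objects carry no TypeMeta
		data, err := yaml.Marshal(s)
		if err != nil {
			return err
		}
		path := fmt.Sprintf("%s/secret-%s-%s.yaml", dir, s.Namespace, s.Name)
		if err := os.WriteFile(path, data, 0o600); err != nil {
			return err
		}
	}
	return nil
}
```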
Contributor

Is this in theory supportable, so that during a 4.y release cycle we could define static versions that people can templatize (minimally) with things like on-disk certs? I.e. would this "shape" be roughly supportable, with limited flexibility to change, without having to change the existing operator dramatically?

Contributor Author

I think our first attempt would be to get people to run this installer with an input that looks exactly like what they would use in a "real" cluster. So we would accept a manifest containing their serving cert and a manifest containing their apiserver.config.openshift.io that says how to use it.

This would allow us...

  1. to have one set of code managing the user input
  2. to have a single external interface to the world instead of promising the shape of a static pod
  3. to let the cluster-admin test/confirm their changes in a real cluster and take those settings as input for producing a single-node cluster.

If we start trying to allow injection of on-disk certs, our on-disk static pods become an API we need to support.
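
A sketch of what seeding the fake clients from those user-supplied manifests could look like (hypothetical; decoding config.openshift.io types would additionally require registering the OpenShift API scheme, which client-go's default scheme does not include):

```go
package main

import (
	"os"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/kubernetes/fake"
	"k8s.io/client-go/kubernetes/scheme"
)

// seedFromInputs decodes the user's input manifests and pre-populates a fake
// clientset with them, so every control loop sees the same input it would see
// in a real cluster.
func seedFromInputs(paths []string) (*fake.Clientset, error) {
	var objs []runtime.Object
	decode := scheme.Codecs.UniversalDeserializer().Decode
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			return nil, err
		}
		obj, _, err := decode(data, nil, nil)
		if err != nil {
			return nil, err
		}
		objs = append(objs, obj)
	}
	return fake.NewSimpleClientset(objs...), nil
}
```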


## Open Questions [optional]

1. Do we even have a use-case for spending the time building this thought experiment?
Member

@mrguitar @imcleod do you have any?


The ILT is working on collecting use cases for single node clusters. Progress has been slow on my side due to other commitments, but we'll have this put together soon. In the absence of this, people keep wanting to take a single physical server and create a 3-node cluster with three VMs on the single node. The overhead of that is insane IMO. Regardless, single node clusters are on the roadmap, so please spend the cycles on this.

@ashcrow
Member

ashcrow commented May 1, 2020

/cc @arithx @LorbusChris

We can initialize our control loops using fake clients and wire listeners to synthetically update indexers backing fake
listers.
This is like we do for unit tests, only wired into the update reactors for the client.
If we separate the reactive bits of the control loops, the informer watch triggers adn the like, from the data input bits

adn the like -> and the like

This is like we do for unit tests, only wired into the update reactors for the client.
If we separate the reactive bits of the control loops, the informer watch triggers adn the like, from the data input bits
(I think this is possible), we can have very high fidelity.
In the kube-apiserver, the ordering would like this for instance:

would like this -> would be like this

A while back, Seth Jennings had a cool idea for trying to create a single node cluster using ignition.
The cluster would be non-configurable after "creation", non-upgradable, non-HA.
The cluster would only have etcd, kube-apiserver, kube-controller-manager, kube-scheduler.
This is a description of how we could generate supportable static pods.

What would be the outcome? I'm not really understanding 'static pod' and 'supportability' in this context. The summary and motivation lack a justification and a why...


## Motivation

Documenting a thought experiment about single node clusters.
Contributor

Single node Kubernetes clusters. Not OpenShift clusters.


Right, I'd be curious to know why we care about having a k8s cluster that is somewhat similar to an OpenShift cluster, but only somewhat. Is the goal to provide a k8s cluster whose control plane is managed similarly to the OpenShift control plane, or is there something else that I'm missing here?

Member

Will there be a way to start core operators of OpenShift?

@smarterclayton
Contributor

I will be referencing this from a broader discussion document that covers an approach to minimal edge clusters using basic control plane, no-operator deployments.

Contributor

@tnozicka tnozicka left a comment

@deads2k What is the benefit of creating a single node cluster that isn't OpenShift, doesn't have operators and isn't configurable? To me this feels like a different product but maybe I just missed the use case.

I like the idea of bootstrapping via static pods and ignition, but ideally to make a real cluster. That would cost more time, though.


### Restrictions
Some things become impractical once we cannot reconfigure the kube-apiserver; they include...
1. short lifespan of kcm and ksch client certificates - we can no longer rotate these
Contributor

Some would be rotated after expiry, as the recovery cert rotation is embedded.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 23, 2020
@romfreiman

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 24, 2020
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 24, 2021
@romfreiman

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 25, 2021
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 23, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 23, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Aug 22, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 22, 2021

@openshift-bot: Closed this PR.


In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
