
Machine States & Preboot Bootstrapping #997

Merged

Conversation

vincepri
Member

Signed-off-by: Vince Prignano [email protected]

What this PR does / why we need it:
This proposal outlines splitting up the Machine data model into two new providers/controllers along with the generic Machine controller. The two new providers are the Infrastructure provider and the Bootstrap provider. The infrastructure provider handles provisioning infrastructure (cloud, bare metal, virtual) and the bootstrap provider handles Kubernetes bootstrapping. This change will improve separation of concerns for the Machine controller and give more degrees of freedom to the user.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 10, 2019
@k8s-ci-robot k8s-ci-robot requested review from detiber and justinsb June 10, 2019 14:27
@vincepri
Member Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 10, 2019
@vincepri vincepri force-pushed the states-preboot-proposal branch from aca3142 to c3529cb Compare June 10, 2019 14:36
@justaugustus
Member

@k8s-ci-robot
Contributor

@justaugustus: GitHub didn't allow me to request PR reviews from the following users: khenidak.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @justaugustus @ritazh @khenidak @feiskyer @juan-lee

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@detiber
Member

detiber commented Jun 10, 2019

/assign @davidewatson @detiber @justinsb @timothysc

@davidewatson davidewatson left a comment

This is great work. IMHO, we should proceed with implementation to prove out the design.

My primary feedback is around the Phase field and whether it is really necessary to expose a state machine in order to achieve ordering. My understanding is that controllers should be locally ignorant of the global state (it is unstable after all) and instead ordering should be an emergent property of the system.

The reason I am of two minds on this is that:

  1. On the one hand, the current design parallels Pod Phases, which are already familiar since Machines were designed to parallel Pods. This might be an asset for developers and operators who need to learn these new CRDs. However, phases have a number of problems related to extensibility. Once phases are defined, it is inevitable that users and programs will come to depend on them.

  2. On the other, there has been a long-running debate on the best way to avoid exposing state machines. Cf. Eliminate Phase and simplify Conditions. For some recent background, I recommend watching Daniel Smith's talk at KubeCon EU 2019 [1].

  3. While I do think we need to stop writing controllers in terms of state machines, I don't think it would be too hard to refactor the code to rely on top-level fields or Conditions instead of phases.

[1] “The Kubernetes Control Plane ...For Busy People Who Like Pictures”, https://www.youtube.com/watch?v=zCXiXKMqnuE
https://static.sched.com/hosted_files/kccnceu19/c0/control%20plane%20in%20pictures%20final.pdf
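
To make the phase-versus-conditions contrast concrete, here is a minimal illustrative sketch; neither type below is the proposal's actual API, and the field names are assumptions.

```go
package example

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Phase-style status: one coarse, globally ordered state value.
type StatusWithPhase struct {
	// Phase is a single state-machine value, e.g. "Provisioning" or "Running".
	Phase string `json:"phase,omitempty"`
}

// Condition-style status: independent observations that separate controllers
// can set without agreeing on a single global ordering.
type Condition struct {
	Type               string      `json:"type"`   // e.g. "InfrastructureReady"
	Status             string      `json:"status"` // "True", "False", or "Unknown"
	Reason             string      `json:"reason,omitempty"`
	Message            string      `json:"message,omitempty"`
	LastTransitionTime metav1.Time `json:"lastTransitionTime,omitempty"`
}

type StatusWithConditions struct {
	Conditions []Condition `json:"conditions,omitempty"`
}
```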


## Summary

This proposal outlines splitting up the Machine data model into two new providers/controllers along with the generic Machine controller. The two new providers are the Infrastructure provider and the Bootstrap provider. The infrastructure provider handles provisioning infrastructure (cloud, bare metal, virtual) and the bootstrap provider handles Kubernetes bootstrapping. This change will improve separation of concerns for the Machine controller and give more degrees of freedom to the user.
@davidewatson davidewatson Jun 10, 2019

Hopefully, this will be my least interesting comment: it is easier to review markdown when a ~80-character line limit is observed. This makes it easier for the reviewer to point to the specific text they are commenting on. Cf. https://sembr.org/, https://rhodesmill.org/brandon/2012/one-sentence-per-line/

- Description: The state of the infrastructure provider.

```go
type Bootstrap struct
```
Contributor

If Bootstrap were a CRD and not just a type then this field could copy the ProviderSpec pattern:

type ProviderSpec struct {
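
Completing that fragment as a rough sketch, with the fields recalled from the v1alpha1 API and ProviderSpecSource reduced so the snippet stands alone; treat the details as approximate.

```go
package v1alpha1

import "k8s.io/apimachinery/pkg/runtime"

// ProviderSpec embeds provider-specific machine configuration either inline
// or via an external source.
type ProviderSpec struct {
	// Value is an inlined, serialized representation of the provider's
	// machine configuration.
	Value *runtime.RawExtension `json:"value,omitempty"`

	// ValueFrom points to an external source for the configuration.
	ValueFrom *ProviderSpecSource `json:"valueFrom,omitempty"`
}

// ProviderSpecSource is shown here in reduced form; the real v1alpha1 type
// has richer fields.
type ProviderSpecSource struct {
	MachineClass *string `json:"machineClass,omitempty"`
}
```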

@dhellmann dhellmann Jun 10, 2019

Isn't the Bootstrap.ConfigRef going to point to a provider-specific CR like you describe?

@davidewatson davidewatson Jun 10, 2019

Maybe? When I saw ConfigRef I assumed it pointed to a ConfigMap. Maybe it could use a better name...

Contributor

I agree. ProviderDataRef?

Member

I do like the idea of finding a better name, but I would like to avoid ProviderDataRef, since I think it could cause confusion based on having only a singular "Provider" in v1alpha1.

Contributor

I also found this separate structure confusing at first. I don't see the rationale, and as @dhellmann says, it is not consistent with other references. I second simplifying the model in this sense, unless the bootstrap structure has further motivation.

@ncdc ncdc Jun 12, 2019

The thought process was around yaml/json UX. Option 1:

spec:
  bootstrap:
    data: "xyz"
    configRef:
      apiVersion: foo/v1
      kind: Foo
      name: foo123
  ...

Option 2

spec:
  bootstrapData: "xyz"
  bootstrapConfigRef:
      apiVersion: foo/v1
      kind: Foo
      name: foo123
  ...

With Option 1, we're grouping multiple things related to bootstrapping together. With Option 2, you have to write "bootstrap" multiple times.

If everyone is largely in agreement with the concept of a reference to a custom resource for bootstrap configuration, and a field to hold the bootstrap data, could we do the following?

  1. Indicate that you agree w/the concept
  2. Continue to debate the name and structure, BUT
  3. After all other comments/issues/concerns are addressed, let the proposal merge in whatever state these bootstrap fields happen to be in, and we can continue to have the naming/structure discussion, if needed, after the proposal merges

Contributor

Per k8s API Naming Conventions:

The name of a field referring to another resource of kind Foo by name should be called fooName. The name of a field referring to another resource of kind Foo by ObjectReference (or subset thereof) should be called fooRef.

Therefore I think it should be bootstrapConfigRef and option 2 (to avoid stuttering).

Member

I prefer option 1, better grouping of related fields/data. @davidewatson This kinda falls outside of that naming convention (although I understand why you make the point) because there is no BootstrapConfig kind here, but rather provider-specific CRDs, called arbitrary names.

Contributor

That's fair.

@dhellmann dhellmann left a comment

I have several comments, but the only one that needs to be dealt with before I would be happy to approve this as a provisional KEP is the stuff that talks about the Phase field in the state transition section.

- Description: The state of the infrastructure provider.

```go
type Bootstrap struct
```
@dhellmann dhellmann Jun 10, 2019

Isn't the Bootstrap.ConfigRef going to point to a provider-specific CR like you describe?

- **To add**
- **Phase**
- Type: `MachinePhaseType`
- Description: Phase represents the current state of a Machine. This field is not meant to be used as a source of truth. Controllers working on Machines should always respect the documented transition conditions to determine the state.
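
For orientation, a sketch of what a string-typed phase like this could look like; the value set is assumed from the phases mentioned elsewhere in this thread (Provisioning, Provisioned, Running, ...), not the proposal's authoritative list.

```go
package example

// MachinePhaseType is a simple string type underlying the Phase field.
type MachinePhaseType string

// Example values only; see the state-transition section of the proposal for
// the documented conditions that actually determine state.
const (
	MachinePhasePending      MachinePhaseType = "pending"
	MachinePhaseProvisioning MachinePhaseType = "provisioning"
	MachinePhaseProvisioned  MachinePhaseType = "provisioned"
	MachinePhaseRunning      MachinePhaseType = "running"
	MachinePhaseDeleting     MachinePhaseType = "deleting"
	MachinePhaseFailed       MachinePhaseType = "failed"
)
```
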
Contributor

At one point we discussed removing the single "phase" field in favor of having a separate status field for each aspect we expect to be managed by a separate controller (see BootstrapPhase and InfrastructurePhase below). If we still want to do that, I can make those edits in a follow-up PR.

@davidewatson davidewatson Jun 10, 2019

You are right, top-level fields or conditions were discussed. Personally, I would like to consider the "outlook" model. History has proven this could be a long conversation though. Bias for action, at least while we are still in alpha... :)

Contributor

I second @dhellmann's proposal. Having conditions is more flexible, and considering we are moving to implement v1alpha2 as a means of learning, leaving room for extensions without forcing breaking changes is an interesting strategy.

Contributor

If we have individual "statuses" that we can represent as separate boolean values, +1 to that. I am opposed to an arbitrary array of strings for conditions, however.

Member Author

I'd definitely like to see a better approach to a string-based-Phase as a follow-up CAEP 👍

Contributor

@davidewatson I also like the outlook model here; I'd be interested in chatting about this as a potential follow-up.

- `Machine.ObjectMeta.DeletionTimestamp` is not `<nil>`

#### Expectations
- The associated Node should be drained and cordoned.
Contributor

Implementation question: Will the machine controller do that or do we expect each provider implementation to do it?

Member

Ideally we should be doing this in a centralized manner, especially since the underlying "infrastructure" provider here would not really have knowledge of Kubernetes itself as it does today.

Contributor

This needs to be the responsibility of a singular cluster-api component, presumably the machine controller.

Contributor

OK. There's also a NodeMaintenance operator that the KubeVirt team might be willing to bring further upstream. https://github.com/kubevirt/node-maintenance-operator

Contributor

And as it happens, that is already in progress: kubernetes/enhancements#1080

Contributor

As an end user, I want symmetry: If CAPI joins the node, I expect CAPI to gracefully remove that node, too.

At the KubeCon EU Contributors' Summit, I asked @lavalamp about the idea of a "server-side drain," and, from what I recall, he thought applying taints would have the desired effect.

I think we decided that the CAPI control plane has access to the workload cluster API. The Machine controller would have to (1) identify the node (it should be able to do that from the Machine object) and (2) apply Taint(s) to that node.
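
A minimal sketch of that idea, assuming a recent client-go and that the Machine controller has already resolved the Node name from the Machine object; the taint key is invented for illustration, and a real implementation would also need to handle pod eviction.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// markNodeForRemoval cordons the node and adds a NoSchedule taint through the
// workload cluster's API, approximating the "server-side drain" idea above.
func markNodeForRemoval(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}

	node.Spec.Unschedulable = true // cordon
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
		Key:    "example.io/machine-deleting", // hypothetical key, not a cluster-api convention
		Effect: corev1.TaintEffectNoSchedule,
	})

	_, err = cs.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```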

@sethp-nr
Contributor

Overall, I'm a big 👍 on the proposal, since I think it introduces some much-needed sequencing and layering in the current flow in a really nice way. I'm having a little trouble grasping how to map some of the problems we're working with today onto this model, though, so I'd appreciate some help understanding how these pieces will fit together.

My questions are around the Bootstrap.ConfigRef and InfrastructureRef contract – today using CAPA, my machine definition might look something like:

apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
...
spec:
  providerSpec:
    value:
      kubeadmConfiguration:
        join: <some config>

After this proposal, I understand that to be changing to something more like this:

apiVersion: cluster.k8s.io/v1alpha2
kind: Machine
metadata:
  name: foo-bar-xx2zy
...
spec:
  bootstrap:
    configRef:
      apiVersion: kubeadm.bootstrap.k8s.io/v1
      kind: KubeadmBootstrapConfig
      name: foo-bar-xx2zy
---
apiVersion: kubeadm.bootstrap.k8s.io/v1
kind: KubeadmBootstrapConfig
metadata:
  name: foo-bar-xx2zy
joinConfiguration: <some config>

And something similar for the Infrastructure-y side of things (including, for CAPA, things like subnet id). Is that the right mental model?

If so, what is the minimal constellation of objects I need to turn up a Node, assuming I want to accept the defaults for everything (or defer the decision to some other code)?

The other aspect I'm struggling with is around templates. I see this description for 1:N references:

For the MachineSet, the references to the Bootstrap and Infrastructure resources are references to templates. When the MachineSet Controller creates a new Machine replica, it retrieves the Bootstrap and Infrastructure resources referenced by the MachineSet’s Machine spec template and creates copies for that Machine to use exclusively.

I'm having trouble visualizing how the objects would all line up once they're copied; is there an example available? Also, would the MachineSet controller need to know to avoid copying e.g. Bootstrap.Data, or is it expected that the targets of template refs are distinguishable in some way from the perspective of the bootstrap/infrastructure controller?


## Motivation
- A singular provider that combines infrastructure provisioning and Kubernetes bootstrapping is too restrictive. Many current v1alpha1 providers target single cloud providers. If a user wants to install a specific Kubernetes distribution (Kubernetes, OpenShift, Rancher, etc.), it is left to the provider to add support for it. In practical terms, we are not able to guarantee that a user can install the same distribution regardless of infrastructure environment.
- Both users and developers are confused by the code organization split where all the controllers live in the cluster-api repository, but some controllers are part of the generic cluster-api pod (MachineDeployment, MachineSet), while other controllers are part of each provider’s pod (Cluster, Machine). We would like for the code location to correspond to the pod, which hopefully will reduce both user and developer confusion.
@vikaschoudhary16 vikaschoudhary16 Jun 11, 2019

Suggested rewording: Some generic controllers, like MachineDeployment and MachineSet, live in the cluster-api repo, while some provider-specific controllers, like Cluster and Machine, live in the provider-specific repo.

Deployment of controllers depends on choice. For example, in OpenShift, all controllers are deployed in a single pod. Vendoring of cluster-api is cumbersome though.

Member Author

I'm not sure if this actually has to change. The motivation is meant to define the current state of the world for kubernetes-sigs repositories and code organization. While we're aware that other implementations exist outside, the main focus is to provide a holistic view limited to the subproject's repositories.

@ingvagabund ingvagabund Jun 12, 2019

I don't see a direct implication of the second motivation point with respect to the Machine States & Preboot Bootstrapping title. I understand it's important to provide a layout and manifests that reflect the structure of the new, to-be-built cluster-api plane and the relations between its components, and also that it's up to each consumer to decide whether all the components of the cluster-api plane live inside the same pod or are distributed among multiple pods/deployments. Given that, the implication of the second motivation item extends beyond cluster-api, which makes it hard to predict what the right view of the cluster-api repository should look like. Still, it's important to provide some default view.

As I read the second item, it does not sufficiently express its intention in the scope of the proposal. Does the first part reflect the following rephrasing?

Both users and developers are confused by the code organization split where generic controllers (MachineDeployment, MachineSet) live fully inside the cluster-api repository, while others (Cluster, Machine) are partially implemented in the cluster-api repository but vendored into provider repositories that implement the provider-specific bits, thus giving rise to separate pods which both deploy all controllers instead of a single one.

If so, can we be more specific about what "We would like for the code location to correspond to the pod, which hopefully will reduce both user and developer confusion." actually means? Is it to create a completely new layout where the machine controller becomes fully generic, without injecting an actuator implementation? Will actuators be turned into separate controllers with their own life cycle, with a defined relation between the machine life cycle and the actuator? Or will machine controllers still be fully vendored into provider repositories as they are today, with the intention just being to change the manifests currently provided in the cluster-api repository?

EDIT: Asking the question in the context mentioned above so the motivation is clear enough without reading the rest of the proposal.

Member Author

@ingvagabund @vikaschoudhary16 I think this point is just poorly phrased and I'll need to rework it a little later or tomorrow. Thanks for the great feedback. I read it again multiple times and I think it's unclear what we initially wanted to communicate.

How does that sound?

Member

I understand the intention of this to be that a clearer separation of concerns between infrastructure and bootstrap providers will be reflected in code organisation and thus easier for developers to grok. It's the pod bit here that confuses me: that is a deployment issue, rather than a code organisation issue.

Contributor

Sounds good to me. Thank you!!!

Member

To be honest, I thought the wording made sense, though it could be refined. However, I want to make certain we are not conflating ideas.

Deployment of controllers depends on choice. For example, in OpenShift, all controllers are deployed in single pod. Vendoring of cluster-api is cumbersome though.

Is orthogonal.

Member Author

Reworded, ptal

@ncdc
Contributor

ncdc commented Jun 11, 2019

@sethp-nr

And something similar for the Infrastructure-y side of things (including, for CAPA, things like subnet id). Is that the right mental model?

Yes, that's the right way to think about it.

If so, what is the minimal constellation of objects I need to turn up a Node, assuming I want to accept the defaults for everything (or defer the decision to some other code)?

  • Machine
  • Something to configure bootstrap data, which could be:
    • You set machine.spec.bootstrap.data yourself
    • A bootstrap provider custom resource, such as KubeadmBootstrapConfig. The controller for this could generate defaults for unset fields (as it currently does in CAPA, for example).
  • An infrastructure configuration custom resource, such as AWSMachineConfig
    • This would specify the instance type, region, etc. for AWS

We haven't fully thought through how to minimize this for a "just give me the defaults" case. @detiber was thinking we could potentially have some specially-named ConfigMaps to use as templates for the bootstrap config resource and maybe even the infra config resource. If the bootstrap ref is unset, the machine controller could look up the ConfigMap and create a new instance of the resource contained inside it (and do the same thing for an unset infra ref, too).
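
A rough sketch of that ConfigMap-based defaulting idea; it is entirely hypothetical, with the ConfigMap name and data key invented for illustration, and is not a settled design.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/yaml"
)

// defaultBootstrapConfig creates a bootstrap config resource for a Machine
// whose spec.bootstrap.configRef is unset, using a manifest stored in a
// specially named ConfigMap as the template.
func defaultBootstrapConfig(ctx context.Context, c client.Client, namespace, machineName string) error {
	cm := &corev1.ConfigMap{}
	key := client.ObjectKey{Namespace: namespace, Name: "default-bootstrap-template"}
	if err := c.Get(ctx, key, cm); err != nil {
		return err
	}

	// The ConfigMap is assumed to hold a full manifest under a well-known key.
	obj := &unstructured.Unstructured{}
	if err := yaml.Unmarshal([]byte(cm.Data["template"]), &obj.Object); err != nil {
		return err
	}

	// Name the new resource after the Machine and create it.
	obj.SetNamespace(namespace)
	obj.SetName(machineName)
	return c.Create(ctx, obj)
}
```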

Re templates...

I'm having trouble visualizing how the objects would all line up once they're copied, is there an example available? Also, would the MachineSet controller need to know to avoid copying e.g Bootstrap.Data, or is it expected that the targets of template refs are distinguishable in some way from the perspective of the bootstrap/infrastructure controller?

If we have this partial MachineSet:

metadata:
  name: machine1
spec:
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: kubeadm.whatever/v1
          kind: KubeadmBootstrapConfig
          name: worker-template
        data: nil

the machine set controller would

  1. fetch worker-template, create a new KubeadmBootstrapConfig named ... maybe machine1 (to match the machine's name) (??? - we need name generation rules)
  2. create a new Machine with spec.bootstrap.configRef pointing to the newly created machine1 KubeadmBootstrapConfig

If the MachineSet has spec.template.spec.bootstrap.data filled in, I would interpret that as the user's intent to use the same bootstrap data for all Machine replicas created from that MachineSet.
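
A rough controller-runtime sketch of the copy step described in points 1-2 above; naming, ownership, and error handling are assumptions rather than settled rules.

```go
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cloneBootstrapTemplate sketches step 1: fetch the referenced template
// object (e.g. the "worker-template" KubeadmBootstrapConfig), copy its spec
// into a fresh object named after the new Machine, and create it. If
// spec.template.spec.bootstrap.data were set instead, the controller would
// skip this and copy the data onto the Machine directly.
func cloneBootstrapTemplate(ctx context.Context, c client.Client, namespace, apiVersion, kind, templateName, machineName string) error {
	gv, err := schema.ParseGroupVersion(apiVersion)
	if err != nil {
		return err
	}

	// Fetch the template referenced by the MachineSet's machine spec.
	tmpl := &unstructured.Unstructured{}
	tmpl.SetGroupVersionKind(gv.WithKind(kind))
	if err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: templateName}, tmpl); err != nil {
		return err
	}

	// Create a per-Machine copy; reusing the Machine's name here is exactly
	// the open naming question from the thread.
	instance := &unstructured.Unstructured{}
	instance.SetGroupVersionKind(gv.WithKind(kind))
	instance.SetNamespace(namespace)
	instance.SetName(machineName)
	instance.Object["spec"] = tmpl.Object["spec"]

	return c.Create(ctx, instance)
}
```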

Does this help clarify things?

@sethp-nr
Contributor

Thanks @ncdc, that helps. I'm not super familiar with the tradeoffs between using a TypedReference vs. an embedded RawExtension (having seen neither before approaching the CAP* projects 😄) – I see in the proposal that validation is a concern, as is authz. Are there others?

I'm also interested in evolving the "I don't want to choose my infrastructure/bootstrapper" story; it would be nice(tm) for our higher-level tooling to be independent of that decision making, but I wouldn't put that above implementing what we've got here.

Re templating: I am also interested in how the naming rules will shake out! Lining up the name with the name of the Machine feels reasonable to me, but I do worry a little bit about conflicts and mistakes and race conditions, etc. That should all shake out from the implementation, though.

I also wonder if it would be worth adding a blurb about the bootstrap + infrastructure providers' responsibilities saying that they should avoid allocating scarce resources in response to a new/partial resource, since that one might be a template and won't end up hooked up to a Machine?


#### As a Kubernetes operator, I’d like to provide custom bootstrap data without the use of a Kubernetes controller.

#### As a Kubernetes operator, I’d like to monitor the progress of fulfilling a Machine and understand what errors, if any, have been reported by the controllers involved.
Contributor

Is this looking for the phases from the above diagrams (Provisioning, Provisioned, Running, etc.) or something more detailed? For example, a more complex on-prem environment might have separate steps for requisitioning a machine, bringing up the machine, installing a base, etc. that are understandable and relevant to the operator.

Member Author

For the first iteration, this is expressed by the Phase field as you mentioned. In the future, we might want to allow for more details to be bubbled up in the Status.


#### As an infrastructure provider author, I would like to build a controller to manage provisioning machines using tools of my own choosing.

#### As an infrastructure provider author, I would like to build a controller to manage provisioning machines without being restricted to a CRUD API.
Contributor

What business/usage reasons cause CRUD to be a restriction?

@vincepri vincepri Jun 14, 2019

I'll defer to @dhellmann, who contributed the above user story.

Contributor

Bare metal hosts can't be created or deleted, so we've had to simulate doing those in those particular methods of our current actuator. "Creating" a bare metal host involves a provisioning operation that may take as long as 20 minutes. Ideally we would have some way to continue to report status to the Machine during that time, which is easier if we have a reconciliation loop than if we have to fit those updates into the CRUD API. "Deleting" the host also takes many minutes while the storage is wiped.
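
To illustrate the reconciliation-versus-CRUD point, here is a rough sketch under invented types (Host, HostClient, Machine are placeholders, not any real provider's API): each reconcile pass observes the host, records progress, and requeues instead of blocking inside a single Create call.

```go
package example

import (
	"context"
	"time"
)

// Illustrative-only types; not from cluster-api or any real provider.
type Host struct {
	ProvisioningState    string
	LastOperationMessage string
}

type HostClient interface {
	Get(ctx context.Context, name string) (*Host, error)
}

type MachineStatus struct {
	Phase   string
	Message string
}

type Machine struct {
	Name   string
	Status MachineStatus
}

// reconcileBareMetalHost shows how a level-triggered loop can keep reporting
// status while a ~20 minute provisioning operation is in flight, then stop
// requeueing once the host is ready.
func reconcileBareMetalHost(ctx context.Context, hosts HostClient, m *Machine) (time.Duration, error) {
	host, err := hosts.Get(ctx, m.Name)
	if err != nil {
		return 0, err
	}

	switch host.ProvisioningState {
	case "provisioning":
		// Surface intermediate progress instead of blocking inside a Create call.
		m.Status.Phase = "Provisioning"
		m.Status.Message = host.LastOperationMessage
		return 30 * time.Second, nil // check again shortly
	case "ready":
		m.Status.Phase = "Provisioned"
		return 0, nil // no requeue needed
	default:
		return time.Minute, nil
	}
}
```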


Figure 2 is a simplified diagram of the configuration of a Machine using a Kubeadm bootstrap provider. Machine.Spec.Bootstrap.ConfigRef points to a bootstrap provider-specific custom resource; in this example, a KubeadmBootstrapConfig called controlplane-0.

![Figure 3](https://i.imgur.com/dKUJoSf.png)
Contributor

Figures 3, 4, 5, 6, 7, and 8 all have the same title (with different numbers). This is confusing since they have different content. Maybe the titles could be changed to highlight the focus of the individual diagrams?

- Description: The state of the infrastructure provider.

```go
type Bootstrap struct
```
Contributor

Can we document more examples of what Bootstrap controllers might look like? I'm concerned this is too theoretical right now and we don't fully understand the needs to know if it'll satisfy our goals.

For example, the KubeadmBootstrap approach is somehow generating an output (user data / cloud-init); however, as a community we have not yet established a) whether OS images are expected to support a single input/user-data format, b) whether these bootstrap controllers are supposed to know who/what they're generating the data for (to influence its output format), or c) whether providers are going to end up needing to transform these bootstrap-specific outputs.

Also, it's not clear to me how bootstrapping (or with KubeadmBootstrap) would be dealing with the concerns around installing a "specific Kubernetes distribution". Is that going to be a part of the kubeadm bootstrap spec? Is that some other undocumented convention which is expected to be shared across bootstrap implementers?

Contributor

My understanding is that having the logic of the bootstrap configuration separated into its own controller(s) avoids the need to fix which user data format is expected. Each infrastructure provider will support its own format(s), and the different controllers (ideally independent of the infrastructure) will provide the bootstrap data in a specific format, which could be used in different infrastructures. This is the approach used, for instance, in the Gardener project for handling bootstrap.

Member

I'm going to be honest... Almost 0 specs strictly adhere to their original design due to discovery that occurs during implementation. The goal here is to enumerate a limited set of options, pick one, go and iterate.

I fully expect we will make mistakes, and that is ok, we will refine over time. So long as we've done a reasonable level of diligence.


Figure 2 is a simplified diagram of the configuration of a Machine using a Kubeadm bootstrap provider. Machine.Spec.Bootstrap.ConfigRef points to a bootstrap provider-specific custom resource; in this example, a KubeadmBootstrapConfig called controlplane-0.

![Figure 3](https://i.imgur.com/dKUJoSf.png)
Contributor

In figure 3, the first three "opt" boxes that overlap the blue column all have the same condition ("[Required only if the object hasn't been seen before]"). I think all three of them could be a single box, since if one of them happens, then all of them happen.

Similarly, the two opt boxes that overlap both columns share a single condition, and could also be combined into a single box.

- **To add**
- **ConfigRef [optional]**
- Type: `*corev1.TypedLocalObjectReference`
- Description: ConfigRef is a reference to a bootstrap provider-specific resource that holds configuration details. The reference is optional to allow users/operators to specify Bootstrap.Data without the need of a controller.
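
Putting the fields described above together, a sketch of the Bootstrap struct as discussed in this thread; it mirrors the proposal text and is not necessarily the merged API.

```go
package v1alpha2

import corev1 "k8s.io/api/core/v1"

// Bootstrap groups the bootstrap-related fields on a Machine.
type Bootstrap struct {
	// ConfigRef is an optional reference to a bootstrap provider-specific
	// resource holding configuration details. Leaving it unset lets an
	// operator supply Data directly, without a controller.
	ConfigRef *corev1.TypedLocalObjectReference `json:"configRef,omitempty"`

	// Data contains the bootstrap data (e.g. cloud-init), either generated
	// by a bootstrap provider or supplied by the user.
	Data *string `json:"data,omitempty"`
}
```
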
Contributor

Having one reference assumes there's only one bootstrap provider involved. However, consider the following actual use case:

  • One controller provides the basic OS configuration (e.g. cloud-init)
  • Another controller provides patches for the OS configuration depending on, for example, the machine class, to accommodate some specific workload (e.g. kernel settings).

These two configurations could be made by the same controller, but separating these two concerns makes more sense, as the generic OS configuration could be used in multiple scenarios and providers.

Therefore, the ConfigRef could be a list of named references (a map) so that each controller knows which reference(s) to set/access.

The need for multiple references could be extended to other stages, like providing infrastructure configuration before actual provisioning of the machine.

Contributor

I don't think this proposal prevents that possibility. I had imagined that being a bootstrap provider problem, something like:

kind: Machine
...
  configRef:
    apiVersion: mergable.something/v1
    kind: MergableBootstrapConfig
    name: my-config
---
apiVersion: mergable.something/v1
kind: MergableBootstrapConfig
spec:
  my-base-config:
      apiVersion: kubeadm.whatever/v1
      kind: KubeadmBootstrapConfig
      name: base-config
  my-workload-tuning-config:
      apiVersion: kernel.config/v1
      kind: SomeCustomKernelConfig
      name: workload-tuning

with the implementation of whatever controller reads the MergableBootstrapConfig resources defining the merge semantics and preserving the contract with the consumer of the .Data.

Contributor

Yes, this is possible if ONE provider manages that, but not if two independent providers must collaborate.

Contributor

I don't follow what you mean. My example had three bootstrap providers: Kubeadm, SomeCustomKernel, and a third "merging" provider whose responsibility it is to broker the interaction between the other two. Wouldn't Kubeadm and SomeCustomKernel count as independent?

Contributor

My point is that all three of these controllers must agree on working on a common CRD which is not specified as part of CAPI, so there's some dependency between them.

Member

Another controller provides patches for the OS configuration depending on, for example, the machine class, to accommodate some specific workload (e.g. kernel settings).

That's a pretty wild scenario, if you are doing this I'd recommend against it. The recommended approach would be to pre-bake your images and simplify your bootstrap sequence.

Contributor

Another controller provides patches for the OS configuration depending on, for example, the machine class, to accommodate some specific workload (e.g. kernel settings).

That's a pretty wild scenario, if you are doing this I'd recommend against it. The recommended approach would be to pre-bake your images and simplify your bootstrap sequence.

This is not a wild scenario. This is essentially what we do. Provisioning an instance has nothing to do with creating the artifacts necessary to provide to a cloud (such as user-data), nor does it have much to do with taking an action on that instance afterwards. Assuming the 'preboot' portion is the same controller as the 'postboot' portion is not a good fit.

Member

Software provisioning is its own fractal tool chain. We should allow extensions to enable it, but it's never been a primary design of cluster api since epoch, and should be completely separate from cluster API.

Contributor

I think I'd like to see the simple case working before we deal with the complexity of merging configurations in the way described here.

Contributor

Software provisioning is its own fractal tool chain. We should allow extensions to enable it, but it's never been a primary design of cluster api since epoch, and should be completely separate from cluster API.

I understand that the value of providing this kind of extension mechanism is precisely to allow customization for diverse use cases without having to deal with them in CAPI. The bootstrap controllers live outside CAPI.

What I understand CAPI should ensure, within reasonable boundaries of complexity, is to offer an extension mechanism which is flexible enough to handle diverse use cases. In this case, changing from one single element to a map of elements doesn't seem to be utterly complex.

I agree, however, with @dhellmann that we can iterate over the design.

### Non-Goals
1. To modify the Cluster object.
1. To fully implement machine state lifecycle hooks. This must be part of a future proposal that builds on these proposed changes.
1. To set up OS configuration, install software or any other features related to image building.
Contributor

However, the example we use for bootstrap configuration is cloud-init, which allows setting up OS configuration and installing required software, therefore making this distinction arbitrary.

Contributor

True, but cloud-init is not a universal standard (e.g. we're considering building an ignition-driven bootstrap provider). I read that statement as saying it's not the Machine's responsibility to define what software need be installed, but instead that belongs to the interaction between the bootstrap provider and the thing that ultimately provisions the resources.

Contributor

Maybe there's a misunderstanding here about who uses the bootstrap information. It is provided by the bootstrap controller and used by the provider-specific infrastructure provisioner. Both of these elements live outside CAPI. Therefore, I don't see the point of being opinionated with respect to what the bootstrap data contains or how it is used.

- Description: Bootstrap is a reference to a local struct which encapsulates fields to configure the Machine’s bootstrapping mechanism.
- **InfrastructureRef [required]**
- Type: `corev1.TypedLocalObjectReference`
- Description: InfrastructureRef is a required reference to a Custom Resource Definition offered by an infrastructure provider.
Contributor

Did you mean a "Custom Resource" (an object) or a "Custom Resource Definition" (a schema)?

@vincepri vincepri force-pushed the states-preboot-proposal branch from f3a79a2 to 1f0ce6d Compare June 18, 2019 14:38
@vincepri
Member Author

/retest

@vincepri vincepri force-pushed the states-preboot-proposal branch from 1f0ce6d to 9872bcd Compare June 20, 2019 17:04
@timothysc timothysc left a comment

/lgtm
/approve

I see no 'major blockers' and if they arise we will refine.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 21, 2019
@davidewatson
Contributor

/lgtm
/approve

There may be significant changes necessary as we iterate (e.g. consider the outlook model, break apart the interdependence between the Machine CRD and the machine infrastructure and bootstrapping controllers, etc.). That said, I think it is important for us to iterate. The year is almost half over...

@ncdc
Contributor

ncdc commented Jun 21, 2019

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidewatson, ncdc, timothysc, vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [davidewatson,timothysc,vincepri]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
