-
Notifications
You must be signed in to change notification settings - Fork 585
[OCPCLOUD-1416] Add ControlPlaneMachineSet to Machine API group #1112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OCPCLOUD-1416] Add ControlPlaneMachineSet to Machine API group #1112
Conversation
78d6932 to
db4ef19
Compare
| @@ -0,0 +1,244 @@ | |||
| package v1beta1 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we won't change this and it will be on by default, it shoudl be v1. I'm thinking about writing a script to ban new APIs that try to be beta :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we won't change this
Now that is a good question, we are currently trying to make this as compatible with Cluster API as possible to support future work there, but, we will definitely want to be adding fields to this in the future and possibly deprecating some of it depending on how the Cluster API project stabilises their (v1beta1) API.
I've written a bunch about this in the enhancement which I thinks we may actually want to evolve this in the future.
For example, depending on how things go, we may want to move the current template into a discriminated union (which I believe could be achieved on upgrade using a conversion webhook) to allow a new machineTemplate or some other field to be introduced to support CAPI as well.
I've not really seen how other OpenShift APIs have actually evolved, so I'd be keen to learn more about this and what we are signing ourselves up for with these changes that we are predicting
| // failure domain field. This will be overriden when the Machines | ||
| // are created based on the FailureDomains field. | ||
| // +kubebuilder:validation:Required | ||
| Spec MachineSpec `json:"spec"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
! not a reference, an actual MachineSpec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I hate this too! But, it matches our existing APIs (compare this CRD with the MachineSet CRD, they are very very similar). If we had a chance to start our API again, I would love to have a separate CRD for the MachineSpec that is then referenced (as CAPI does), but I don't think we can easily change our existing API to do this and I know historically there were a few strong voices against the proliferation of CRDs within the MAPI project.
I think it would be weird/confusing to have this as a v1 with a reference when the rest of our Machine API just inlines the specs
| // More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata | ||
| // Labels are required to match the ControlPlaneMachineSet selector. | ||
| // +kubebuilder:validation:Required | ||
| ObjectMeta metav1.ObjectMeta `json:"metadata"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What information is valid to set here and why would I set it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches the pattern of the MachineSet, where the labels and annotations in this objectmeta become the labels and annotations on the newly minted machines (much like in a deployment/replicaset/pod relationship).
I'm gonna take a stab and say you're going to suggest we trim this to our own objectmeta object which supports only the fields we need? (We sort of did this in MachineSet already, can't recall exactly the history of that, will do some archeology and come back)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we added that because of https://bugzilla.redhat.com/show_bug.cgi?id=1702089, for some reason the CRD generation wasn't generating a schema for the objectmeta back then and so the solution was to embed it in our own types so we could get schema validation. We copied from CAPI so that's why we have the fields we have, even though we only actually use labels and annotations 🤔
| Template ControlPlaneMachineSetTemplate `json:"template"` | ||
| } | ||
|
|
||
| type ControlPlaneMachineSetTemplate struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this being used as a MachineSet that has different RBAC permissions? Would this be more naturally modelled as a MachineSet that only the controller you have managing this resource is allowed to update directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great question which I hope the [alternatives section(https://github.com/openshift/enhancements/pull/1008/files#diff-167c14da636a95920dc7f720fd0d0be7490305b02914e4113ee9aebf2f51a3bdR931-R965) in the enhancement can help to answer.
Yes, this is a bit like MachineSets, but with previous iterations of this enhancement which used a collection of MachineSets, there were an awful lots of caveats, gotchas and risks of "user does something" and we don't really have a safe way to react. Having spoken with several of the staff engineers while taking over this work they shared concerns similar to mine that by layering MachineSets, we are exposing a resource in the middle that isn't needed and, given this is the control plane we are trying to manage, could be dangerous if a customer goes in and starts modifying it
ac4505e to
37c3a48
Compare
143d6d8 to
0058f53
Compare
|
As this project is not part of the no-ff process, I'll apply the labels manually /label docs-approved |
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| ) | ||
|
|
||
| // +genclient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-namespaced. If someone wants to create an externally managed control plane variant, that would be a different type so they can evolve independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with this statement
There are a number of arguments for making this namespaced:
- The rest of the Machine API is namespaced, it is confusing to have part of the API non-namespaced
- We have an existing design to migrate users from MAPI to CAPI, this mirrors resources and converts them to work with Cluster API. By making this a cluster scoped resource we would have to create some new migration mechanism specifically for this resource, which then would need documenting separately to the rest of Machine APi
- If this is cluster scoped, we cannot use it with CAPI. CAPI Control Plane implementations (which we intended to make this), must be namespaced - if we make this cluster scoped, there is no point having the discriminated union over future Machine types, we would have to start again to implement a CAPI version of this (this is a technical limitation, Cluster API will add an owner reference to the ControlPlane implementation CR for a namespaced object, you cannot have an owner reference from a namespaced to a cluster scoped object)
- This project has already been designed such that it could be used in a centralised machine management or Cluster API style management/guest cluster pattern, making it cluster scoped would prevent us from leveraging this within those use cases and we would have to create an identical resource with an identical implementation to make this work with multiple instances within the management cluster
- Restricting this to cluster scoped seems to add a whole bunch of restrictions which in the future I think we will come to regret, as it will inevitable cause a lot of work for us in the future duplicating this to make it work multiple times
I'm aware of your concern with the fact that logically there should only be one of these in an HA cluster, I believe we can resolve that by restricting it initially to a single instance in a single namespace, we already restrict Machine API to a single namespace, this is something users are familiar with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counterpoints in favor cluster scoped include making it impossible to let a user create one in the wrong namespace, making it impossible to accidentally grant privileges to namespace level users by using an uncareful *, making it impossible to use a different namespace that may eventually have an impact, and being at a scope that is universally recognized as privleged.
In addition, given that migration would still be one to one it could still be done with a namespaced reference.
With regard to "but we could use the same type for a different topology management mode", it seems extremely likely that the set of capabilities and control desired in self-managed versus externally managed be different as we've seen with other operators where the constraints imposed by external management are different. This is likely to lead to logically divergent schemas that you will you attempt to force into a single type.
However, in spite of all that, I am willing to let this play out if you really want it. But when you start developing divergent schemas, you will need a plan either maintain two different schemas (and thus a very confusing controller on top of it) or develop some killer validation routines that manage to avoid the cyclical re-bootstrapping problem.
| // Currently, the only valid value is machine.v1beta1.machine.openshift.io. | ||
| // +unionDiscriminator | ||
| // +kubebuilder:default:="machines.v1beta1.machine.openshift.io" | ||
| // +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no benefit to optional, let's go required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given there's only one valid value, I thought making it optional and defaulting it to that one valid value would be beneficial to users.
Is removing a default later in the cycle considered a breaking change? Eg once we add a separate machine type to this enum?
| // OpenShiftMachineV1Beta1Machine defines the template for creating Machines | ||
| // from the v1beta1.machine.openshift.io API group. | ||
| // +kubebuilder:validation:Required | ||
| OpenShiftMachineV1Beta1Machine *OpenShiftMachineV1Beta1MachineTemplate `json:"machines.v1beta1.machine.openshift.io,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually know. is this a valid json key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell, it seems to be, I managed to create a resource with Kind and this CRD installed, and I can get the value out using jq, eg this works, you just have to make sure you've quoted the key
kubectl get controlplanemachineset cluster -o json | jq ".spec.template.\"machines.v1beta1.machine.openshift.io\""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I am wondering if some libraries won't handle this well, perhaps using underscores would be less likely to cause issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I've switched over to underscores which seems to work well as well, I suspect there are json libraries that split on periods so probably not the best choice
| // +optional | ||
| Type ControlPlaneMachineSetStrategyType `json:"type,omitempty"` | ||
|
|
||
| // This is left as a struct to allow future rolling update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for noting this. It would not have occurred to me otherwise.
| } | ||
|
|
||
| // AWSFailureDomain configures failure domain information for the AWS platform | ||
| type AWSFailureDomain struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a "pick one" type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, in this struct you can have one, or the other, or both, is completely flexible and up to the user. I think in most cases users will need to use both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validation will need to enforce that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you enforce "you must set at least one of these" with kubebuilder validation tags? Perhaps MinProperties does this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yep, just checked that and it works, will double check other placements as well, they all have only one field so will have to make sure they're all required
| // +listType=map | ||
| // +listMapKey=type | ||
| // +optional | ||
| Conditions []metav1.Condition `json:"conditions,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: conditions first please.
0058f53 to
1a56ef5
Compare
|
@deads2k I've switched the Just got one outstanding question on the default, which I've left alone for now, PTAL #1112 (comment) |
1a56ef5 to
4388a4d
Compare
|
Actually, the more I think about this, this resource has to be namespace scoped, PTAL #1112 (comment) |
4388a4d to
73953e9
Compare
73953e9 to
89e1891
Compare
I suggest against taking this approach and instead building the API that will allow evolution of different things to be different. I've left a comment above explaining why, but I will let this play out. /approve |
89e1891 to
94fcd74
Compare
|
Made some minor changes adding missing optional tags for the status fields and making the |
elmiko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one spelling error, otherwise looks good to me
machine/v1/types_aws.go
Outdated
| // a validation error. | ||
| // +union | ||
| type AWSResourceReference struct { | ||
| // Type determines how the reference will fetch the AWS resouce. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Type determines how the reference will fetch the AWS resouce. | |
| // Type determines how the reference will fetch the AWS resource. |
94fcd74 to
f162692
Compare
elmiko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
f162692 to
bf40233
Compare
|
@JoelSpeed: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, elmiko, JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR adds the new
ControlPlaneMachineSetCRD to the machine api group as described in openshift/enhancements#1008To provide an example, a valid ControlPlaneMachineSet will look something like