Add enhancement for AWS Placement Groups in Machine API #995
Conversation
/uncc @imcleod @kbsingh
jeana-redhat
left a comment
Noted a few things while reading - pretty sure my questions don't need to be addressed until closer to actually creating product docs for this.
> ### User Stories
>
> - As an OpenShift cluster administrator, I would like to use a spread placement group to ensure that my critical
>   workloads are unlikely to suffer from hardware failures simultanesouly.
Suggested change:
> - workloads are unlikely to suffer from hardware failures simultanesouly.
> + workloads are unlikely to suffer from hardware failures simultaneously.
> - As an OpenShift cluster administrator, I would like to use a spread placement group to ensure that my critical
>   workloads are unlikely to suffer from hardware failures simultanesouly.
> - As an operator of HPC workloads on OpenShift, I would like to use a cluster placement group to ensure that there is
> HPC
I don't see this term in existing product docs. Google tells me it stands for "high-performance computing", I'm assuming that is the meaning here?
Yeah, that's my bad, I should have spelled that out on first usage, will update. You're correct.
All good. Mostly I am wondering if we refer to this in prod docs with some other term, since it seems like the idea of HPC ought to have come up. But that's for me to dig in on :)
elmiko
left a comment
This generally reads well to me @JoelSpeed. I added a comment about the CAO (Cluster Autoscaler Operator) that we discussed during our team standup. My only other comment is that I think it would be nice to call more attention to the idea that this feature is highly sensitive to user configuration, but I'm not sure where that might fit, maybe drawbacks?
> There are currently no viable alternatives. This feature is unique in AWS and there isn't an alternative way for users
> to make use of the benefits provided by placement groups.
I was trying to highlight that I don't think there's another way to place groups of instances together in AWS, so there is no alternative. I'll try to reword this.
> When using Cluster placement groups it is recommended to launch all instances of the same instance type and if
> possible, to launch instances simultaneously to prevent the likelihood that capacity limits are reached within the
> placement group.
maybe "AWS recommends launching ..." rather than "it is recommended to launch ..." to make it clear that we aren't recommending that users do something that we don't actually let them do...
> ###### Limitations of Spread Placement Groups
>
> - Spread placement groups are not supported for Dedicated Instances.
maybe clarify somehow that this isn't about "OpenShift Dedicated"? (likewise below)
Will prefix with AWS for now, perhaps linking to the AWS docs might help
> ### Open Questions
>
> - Do we need to propogate user defined tags from the Infrastructure resource onto the placement groups we create?
Suggested change:
> - Do we need to propogate user defined tags from the Infrastructure resource onto the placement groups we create?
> + Do we need to propagate user defined tags from the Infrastructure resource onto the placement groups we create?
> // This parameter is only used when a Machine is being created and the named
> // placement group does not exist.
It seems like maybe if the placement group already exists, that it should ensure that it has the correct type, and error out if not?
Otherwise someone might accidentally use the same GroupName for two different MachineSets with different placement needs.
(If MachineSets and placement groups were 1-to-1 then the placement group name could just be based off the MachineSet name...)
> It seems like maybe if the placement group already exists, that it should ensure that it has the correct type, and error out if not?

We could implement that, though at the moment the group type is optional, so would it make sense to ignore the check if the placement group type is empty?

> Otherwise someone might accidentally use the same GroupName for two different MachineSets with different placement needs.

It is valid to use the same placement group across multiple machinesets, though yes, I can see the intention here. I think if the user has set an intended type, then checking against this should be safe. The get-out-of-jail card is to either fix the type or remove the type field.

> (If MachineSets and placement groups were 1-to-1 then the placement group name could just be based off the MachineSet name...)

This isn't necessarily true as there are valid use cases for using the same placement group across multiple machinesets, have mentioned this elsewhere.
sdodson
left a comment
Very well put together enhancement, just a few questions and comments.
> ### Non-Goals
>
> - Allowing this feature to be leveraged during installation (day 0)
>   - Users will be able to modify MachineSets after manifest creation if required but this will not be exposed as a
if I render manifests and then modify them, will I be able to set a PlacementGroup?
Yes for worker nodes, this won't work for control plane machines though as we don't use MAPI to create them. Will clarify this statement to show this is workers only
>   - Users will be able to modify MachineSets after manifest creation if required but this will not be exposed as a
>     part of the installation configuration before manifest generation
> - Support for Placement Groups for Control Plane Machines
>   - Users may replace their control plane at a later date using future automation if they wish to have their control
If you claim this, I'd like to see exactly how this will happen. If you choose to permanently exclude it, you don't have to design it.
Heh, so the future automation I'm referring to is what I'm designing in #1008, now that the PR is up, I could link to this doc I guess but I think given how early that other enhancement is in review, I might wait a bit
>   ensure that there is
>   minimal latency between instances within my cluster, allowing maximal throughput of my workloads.
> - As an operator of a large scale distributed and replicated data store, I would like to use a partition placement
>   group to ensure that different shards of my data are replicated within different fault domains within the same
is this a current customer requirement or just explaining how else this could be used? I'm ok with the latter, but I'd like to understand
This one is not a current customer requirement as far as I understand. The customer ask we have that's driving this is for using Cluster placement groups, which is the second user story.
I want to implement this feature fully so we've explored all of the use cases and added user stories for each.
> The changes are outlined below:
>
> type Placement struct {
this is part of AWSMachineProviderConfig. How many of these are present in our standard installation?
This is part of AWSMachineProviderConfig, it is a singleton within the provider config.
On AWS, we create 3 MachineSets and 3 control plane Machines, each of those will contain an AWSMachineProviderConfig, and the MachineSets each create 1 Machine by default.
If a user wants to add placement groups during install, they probably want different placement groups per MachineSet, so I'd expect them to make up to 3 edits during the manifest phase
The installer will create 1 MachineSet for each availability zone. By default, the installer will include all availability zones in a region. So, taking us-east-1 as an example, not 3 edits but rather 6 edits.
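For illustration, each of those edits is a small addition to the placement stanza of a generated worker MachineSet manifest. A minimal sketch, assuming the inline groupName/groupType fields proposed at this point in the review; the file path and group name are hypothetical:

```yaml
# openshift/99_openshift-cluster-api_worker-machineset-0.yaml (illustrative path)
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
spec:
  template:
    spec:
      providerSpec:
        value:
          placement:
            region: us-east-1
            availabilityZone: us-east-1a
            groupName: mycluster-pg-us-east-1a  # hypothetical placement group name
            groupType: Cluster
```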
> // +optional
> GroupType AWSPlacementGroupType `json:"groupType,omitempty"`
>
> // PartitionCount specifies the number of partitions for a Partition placement
This description really makes this sound like it should be part of a discriminated union.
So I've added the type as an explicit option so we know what the customer wants us to do. As there (currently) is no configuration apart from for partition types, I didn't think we would need to add another struct level to make it a proper discriminated union, but I guess there's no harm in it.

To clarify, your suggestion is:

```golang
type Placement struct {
	Region           string
	AvailabilityZone string
	Tenancy          InstanceTenancy
	GroupType        AWSPlacementGroupType
	GroupName        string
	Partition        *AWSPartitionPlacement
}

type AWSPartitionPlacement struct {
	Count  int32
	Number int32
}
```
I'm wondering if @deads2k was suggesting that the entire placement group policy should be a discriminated union?
That example makes sense to me, but I would suggest breaking it out into a separate type, which could be inlined:

```golang
type Placement struct {
	// ...
	GroupUnion `json:",inline"`
}

// +union
type GroupUnion struct {
	// +unionDiscriminator
	// +kubebuilder:validation:Required
	// +required
	GroupType AWSPlacementGroupType

	// +optional
	Cluster *AWSClusterPlacement

	// +optional
	Partition *AWSPartitionPlacement

	// +optional
	Spread *AWSSpreadPlacement
}

type AWSClusterPlacement struct{}

type AWSPartitionPlacement struct {
	Count  int32
	Number int32
}

type AWSSpreadPlacement struct{}
```
I can do this yep, though, I'm not hugely familiar with the union types, does this mean users have to set one of the structs to be non-empty? Or can they just set the type for Cluster or Spread and omit the struct? Also, do we even need to add those structs to the API if they are empty/not required right now?
The user can specify the discriminator, or specify one of the structs, or both. I think you can omit the empty structs and corresponding fields if you prefer.
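As a minimal sketch of what that means for manifests (field names follow the Go snippet above and are assumptions, not a final API), a user could set just the discriminator, or the discriminator together with the matching member struct:

```yaml
# Discriminator only - sufficient for Cluster or Spread, which carry no extra configuration.
placement:
  groupName: my-placement-group  # hypothetical name
  groupType: Cluster
---
# Discriminator plus the matching member struct - needed when Partition options are set.
placement:
  groupName: my-placement-group
  groupType: Partition
  partition:
    count: 3
```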
> The changes are outlined below:
>
> type Placement struct {
can you link to the current docs for me?
> type Placement struct {
>     // Existing Fields
The existing fields in Placement are:
- `Region string json:"region,omitempty"`
- `AvailabilityZone string json:"availabilityZone,omitempty"`
- `Tenancy InstanceTenancy json:"tenancy,omitempty"`

Can you describe how these interact with cluster, partition, and spread options? Are they expected to be empty when setting a groupName? Does Spread make sense if there's only one AZ?
Region and AvailabilityZone remain the same; there's no effect, apart from that you may want to create multiple machinesets within a single AZ to make use of a partition placement group, though we would normally just spread machines across AZs.

Tenancy is affected because you can't use dedicated hosts with certain placement group types; this is called out in the sections where I talk about each type of placement group.

> Does Spread make sense if there's only one AZ?

Yes! Placement groups are a way of defining how your EC2 instances are laid out within an AZ, as opposed to across AZs. For spread in particular, each AZ in AWS is divided into 7 sections, each section having independent power and network for HA. When you create a spread placement group, it will place at most 1 instance in each of the 7 sections. If you use the same PG across 2 AZs, you then in theory have 14 sections, 7 in each AZ.
> types without detriment.
>
> Spread placement groups have a limited capacity. You can only launch seven instances within a Spread placement group
> within a single availability zone.
Can I use it for multiple availability zones normally?
There's nothing to stop you using it across multiple availability zones, but a MachineSet is tied to a single AZ, so each MachineSet gains a limit of seven instances when it has a spread placement group (unless they create more than one machineset in the same zone, in which case, fewer than 7 I would expect)
> // placement group does not exist.
> // Valid values are "Cluster", "Partition", "Spread", and omitted, which means
> // no opinion and the platform chooses a good default which may change over time.
> // The current default value is "Spread".
is "spread" equivalent to our current behavior? I honestly don't know.
No. The current behaviour is that an ec2 instance we create can go anywhere within the AZ. Just gonna paste a response from another comment as I think it explains the difference:

> For spread in particular, each AZ in AWS is divided into 7 sections, each section having independent power and network for HA. When you create a spread placement group, it will place at most 1 instance in each of the 7 sections. If you use the same PG across 2 AZs, you then in theory have 14 sections, 7 in each AZ.

In our current behaviour, we could quite easily create 5 instances and have them all end up in the same section of the datacenter, meaning a power failure in that section would wipe out all instances.

I've discussed this with @sdodson and we agreed that there isn't actually a sane default. If the value is omitted and we need to create a placement group, that should be considered an error; I've updated the description here.
> // PartitionNumber specifies the numbered partition in which instances should be launched.
> // This value is only used in conjunction with a `Partition` placement Group.
> // It is recommended to only use this value if multiple MachineSets share
> // a single Placement Group, in which case, each MachineSet should represent an individual partition number.
Can you provide an example of using this later in the enhancement? I think it's too big for godoc here, but I don't quite see it.
Yep, have added a yaml example to the Partition Placement Groups heading
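For readers of this thread, the example is roughly of the following shape, a sketch mirroring the snippet quoted later in this review (the group name is illustrative): three MachineSets share one partition placement group, each pinned to its own partition number.

```yaml
# providerSpec placement for the first of three MachineSets sharing one partition placement group.
placement:
  availabilityZone: us-east-1a
  group:
    name: my-partition-placement-group
  partitionNumber: 1  # the other two MachineSets would use 2 and 3
```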
> As Spread placement groups keep instances in different racks, they can be expanded over time and can also mix instance
> types without detriment.
>
> Spread placement groups have a limited capacity. You can only launch seven instances within a Spread placement group
when someone tries to scale beyond this, how will the failure be presented to the user?
An error will be presented to them on the Machine, which will have moved into the Failed phase; this should be sufficient to prompt them that they can't add a new instance to the placement group.
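As a sketch of how that surfaces (assuming the usual Machine API status fields; the exact message text comes from AWS and will vary), the failed Machine would report something like:

```yaml
# Illustrative status of a Machine that could not be launched into a full spread placement group.
status:
  phase: Failed
  errorMessage: >-
    error launching instance: insufficient capacity in placement group
```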
> To ensure we leave user accounts clean, we should remove any placement group that is no longer required.
>
> Each time a Machine is removed, we must check whether it belongs to a placement group.
> If it belongs to a placement group, we must also check that there are no other instances within the placement group.
why should we delete a placement group that we did not create?
We shouldn't, we will only delete placement groups that look like they were created by MAPI (tagged appropriately) and also only in the case when we can recreate the PG if required later (ie we know what groupType it is)
Does this mean we will still leak a placement group in the following scenario?
- User specifies a placement group on a Machine or MachineSet.
- OpenShift creates the placement group.
- User creates an instance outside of OpenShift and assigns it to the placement group.
- User deletes the Machine or MachineSet from step 1.
- OpenShift observes that the placement group is still in use and does not delete it.
- User deletes the instance from step 3.
Maybe that's fine since the user is doing funky things in this example. Anyway, I guess the installer's cluster deprovisioning logic would still clean up any unused placement group with the cluster tag.
> Maybe that's fine since the user is doing funky things in this example. Anyway, I guess the installer's cluster deprovisioning logic would still clean up any unused placement group with the cluster tag.

You're right, we would "leak" it in this case, but this isn't particularly bad as placement groups are a no-cost feature of AWS; the only problem would be if the customer leaked a lot of placement groups by doing this process and then reached the 500 group limit.

The installer deprovision should catch it though, as I'd already implemented and tested that before I implemented the removal logic in the POC.
> If using the Cluster Autoscaler, the Cluster Autoscaler will detect the failed launch and will, after some period (15
> minutes by default), attempt to scale a different group of instances to fulfil the capacity gap, if it is able to.
>
> We recommend that users leverage multiple MachineSets, each with their own Placement Group to minimise the likelihood
"minimise" how very british. :) I don't think we ever made a doc declaring american spelling as standard, but now I secretly suspect those people who add spell checkers of trying to backdoor it!
Entirely tongue in cheek, the spelling is fine :)
> I don't think we ever made a doc declaring american spelling as standard, but now I secretly suspect those people who add spell checkers of trying to backdoor it!

If I spell it with a z (pronounced "zed") then I get a red squiggly line under it ;)

I have wondered about whether we had an official spelling recommendation. I know product docs have to be american english, but this isn't customer facing (or is it?)

Until I'm forced to write and speak in american english, I'll stick to british and keep pronouncing quay in the british way as well ;)
> how very british

The Britishism that trips me up the most is the parsimonious spelling of "fulfil".

> I have wondered about whether we had an official spelling recommendation, I know product docs have to be american english but this isn't customer facing (or is it?)

Upstream has a similar guideline:

> The English-language documentation uses U.S. English spelling and grammar.

https://kubernetes.io/docs/contribute/style/style-guide/

I don't know whether that applies to enhancements though, and anyway, we're not upstream.

> Until I'm forced to write and speak in american english, I'll stick to british and keep pronouncing quay in the british way as well ;)

For what it's worth, I think you have parts of the US on your side with that one (I think the British pronunciation is common in the northeastern states).
> Using existing webhooks that inspect the Machine provider configuration, we can warn users if they create a MachineSet
> with a Spread placement group and attempt to scale it over seven instances.
> This however is not fool-proof, as the user may not have set the placement group type, or may have set it incorrectly
if we require everything to be set ahead of time, we could add this protection?
We don't have required fields within the provider specs, especially with optional features such as this, so I'm not sure we could reasonably enforce things to be set ahead of time. I can imagine customers frequently tweak their machineset definitions, delete all the current machines, and let MAPI create new ones based on the new configuration.
> We may also wish to extend the Cluster Autoscaler Operator to inspect MachineAutoscalers and their targets.
> If any MachineAutoscaler targeted a MachineSet with a spread placement group, it could warn if the maximum replica
> count is set higher than 7. This however is non-trivial as there are currently no webhooks for the Cluster Autoscaler
We create this resource via the API, right? Why can't we enforce it during creation?
We could if we had webhooks, but we don't today, and I'd rather not expand the scope of this enhancement to introduce them; it doesn't seem justified for a single validation like this.
The MachineAutoscaler is a separate CR to the MachineSet, so to check this, we need to have something look up the MachineSet based on the object reference, then check the spec of that and report back whether it's using a spread group or not. While that's entirely possible, there's nothing to stop the end user changing the definition of their machineset during the lifecycle of the cluster; they could create the MachineAutoscaler, then configure the placement groups later.
sdodson
left a comment
Aside from the new comments it looks fine to me, API review notwithstanding.
> // +optional
> GroupType AWSPlacementGroupType `json:"groupType,omitempty"`
>
> // PartitionCount specifies the number of partitions for a Partition placement
> I'm wondering if @deads2k was suggesting that the entire placement group policy should be a discriminated union?

I've updated based on the last couple of comments and the union suggestion.
> // When a Machine is deleted, if this leaves an empty Placement Group, the group is also
> // deleted.
> // +optional
> GroupName string `json:"groupName,omitempty"`
If I have three machinesets and each contains an AWSMachineProviderConfig and I specify a GroupName of "BestGroupEver" in each machineset, what happens?
That's allowed and is a valid use case. If you read through the partition placement groups section, it mentions having a partition placement group with multiple partitions and having each partition represented by the group number.
There isn't a negative effect to having multiple MachineSets with the same group name, if it already exists, it will just be used as is
> There isn't a negative effect to having multiple MachineSets with the same group name, if it already exists, it will just be used as is

Given three different definitions of the GroupName, which of those definitions is accurate? Based on what I'm seeing here, I gather that it is racy.

Using this as both the definition of the PlacementGroup and the reference to which PlacementGroup to use makes the usage of this API inconsistent depending on read order. The same resources played against two different clusters can have two different results. This is a problem with the API.

As an alternative, why not have one resource that describes the PlacementGroup and simply reference those PlacementGroups from here?
> Given three different definitions of the GroupName, which of those definitions is accurate? Based on what I'm seeing here, I gather that it is racy.

I think you're right, this would cause a problem for users as one of the MachineSets would come up successfully and the others, as we have it implemented now, would fail with errors.

> As an alternative, why not have one resource that describes the PlacementGroup and simply reference those PlacementGroups from here.

This is something we could do, though it would be a lot more effort to get done (will need to review team planning/product requirements) and does add complexity for the users. A concern that has come up before is proliferation of the number of CRDs within OpenShift; does this tiny feature warrant adding a new CRD? Is that acceptable?

I'll have a few conversations today and gather feedback. FWIW, to avoid a lot of the complexity here about preventing misconfiguration, having a new CRD is probably the best thing to do. I'm wondering if we should investigate the idea of the AWSCluster resource from Cluster API, which gathers a lot of information like this; if that's a singleton within the cluster and has a list of placement group configs, it could manage those and the other "cluster level" infrastructure.
> GroupName string `json:"groupName,omitempty"`
>
> // AWSPlacementGroupUnion embeds the placement group configuration within the placement.
> AWSPlacementGroupUnion `json:",inline"`
What if I specify the same group three times as in https://github.com/openshift/enhancements/pull/995/files#r798856729 and each one specifies a different GroupType?
We are planning (I will double check if this made it into the enhancement or just comments) to check the group type against the group we find on AWS. If the user specifies a cluster PG and we find a spread PG, we will return an error and tell them they need to fix the Machine configuration
> // +kubebuilder:validation:Minimum:=1
> // +kubebuilder:validation:Maximum:=7
> // +optional
> Count int32 `json:"count,omitempty"`
What happens if different machinesets specify this number differently for the same GroupName?
From the above statement, "This value is only observed when creating a placement group", I would infer that the parameters applied would be those specified in the first reconciled machineset that is configured to use the group.
The more I read and come to understand how placement groups work, the more it seems to me that placement groups really should be configured as separate config objects from machines/machinesets. Defining placement groups within machinesets makes the semantics extremely confusing. As it is, both the basic usage and behaviors of placement groups as well as edge cases (such as the ones David brings up) are difficult to understand.
> The more I read and come to understand how placement groups work, the more it seems to me that placement groups really should be configured as separate config objects from machines/machinesets. Defining placement groups within machinesets makes the semantics extremely confusing. As it is, both the basic usage and behaviors of placement groups as well as edge cases (such as the ones David brings up) are difficult to understand.

I could envisage a separate CRD for this that then gets created, though that's a lot of work and means more controllers and more complication from a user perspective, so I think there's a balance to make here.
The upstream community in Azure already have similar concepts being created by the Machine controller so we have followed that pattern for now.

> What happens if different machinesets specify this number differently for the same GroupName?

We can validate that this configuration is matching the existing placement group and error out if it isn't; this would enforce the count and type to be consistent across multiple MachineSets.
> // AWSPartitionPlacementGroupType is the "Partition" placement group type.
> // Partition placement groups reduce the likelihood of hardware failures
> // disrupting your application's availability.
> // Partition placement groups are recommended for use with large scale
> // distributed and replicated workloads.
It isn't clear to me from this godoc what "Partition" actually does. From my reading of the implementation details, if I understand them correctly, "Partition" defines two very different behaviors depending on whether number is specified or not:

- When `number` is specified, "Partition" behaves like "Cluster" except it places the instances within the `number`th partition.
- When `number` is not specified, "Partition" behaves like "Spread" except it restricts instances to racks belonging to partitions of the placement group.

Is that correct?
Sort of.

> When number is specified, "Partition" behaves like "Cluster" except it places the instances within the numberth partition.

The additional behaviour here is that if you had two cluster groups, they may overlap and instances in one group may be on the same physical hardware as the other. With partitions, each partition is distinct, so you are guaranteed that an instance in partition 1 is not on the same hardware as partition 2.

> When number is not specified, "Partition" behaves like "Spread" except it restricts instances to racks belonging to partitions of the placement group.

This is right, the only difference is that in spread you can only have 1 instance per hardware failure domain (the 7 partitions in each AZ), where in a partition it spreads them equally across however many partitions you choose.
> We will introduce new fields within the `Placement` section of the AWS Machine Provider Config which will allow users
> to specify which placement group to create instances within.
>
> If the placement group does not exist, it will be created based upon the group type and, if applicable partition count provided.
This sentence is missing either a comma or a conjunction. I would guess the former, but I want to make sure there isn't some missing information here.

Suggested change:
> - If the placement group does not exist, it will be created based upon the group type and, if applicable partition count provided.
> + If the placement group does not exist, it will be created based upon the group type and, if applicable, partition count provided.

or:
> + If the placement group does not exist, it will be created based upon the group type and [missing words here], if applicable partition count provided.
First suggestion, missing comma is what I had intended
> // GroupName specifies the name of an AWS placement group to create the Machine within.
> // If the group specified does not exist, it will be created based on the GroupType
> // paramenter.
Suggested change:
> - // paramenter.
> + // parameter.
> // +union
> type AWSPlacementGroupUnion struct {
> // GroupType specifies the type of AWS placement group to use for this Machine.
> // This parameter is only used when a Machine is being created and the named
"Machine or MachineSet", right? Or do I have a fundamental misunderstanding of this API?
Suggested change:
> - // This parameter is only used when a Machine is being created and the named
> + // This parameter is only used when a Machine or MachineSet is being created and the named
MachineSet is to a Machine as a ReplicaSet is to a Pod, so we don't actually create the PG until the first Machine in a MachineSet is created
> // GroupType specifies the type of AWS placement group to use for this Machine.
> // This parameter is only used when a Machine is being created and the named
> // placement group does not exist.
> // Valid values are "Cluster", "Partition", "Spread".
> // If no value is given and a placement group is required to be created, this is
> // considered an error.
> // Note: If the value of this field is "Spread", instanceTenancy may not be set
> // to "dedicated".
Note that the godoc on the AWSPlacementGroupType type definitions does not get included in the generated Swagger, so any documentation from the const definitions that you want the user to see in the generated Swagger should be repeated here for the field definition. That said, I don't know whether users really look to Swagger for API documentation, so maybe this isn't important. (Where do users turn for MAPI documentation? Do most users just read the Go definitions?)
We don't currently have the Machine ProviderSpecs documented very well, typically we have guides that instruct users to turn on a particular feature. I imagine there will be a new page or section for placement groups under which we will talk about the new fields and the way that the structure should be created within the MachineSet.
Our docs writer would normally grab this information from the enhancement IIUC (cc @jeana-redhat)
> In particular this means checking that the `groupType` has been set on the Machine configuration, if this is not set,
> it is likely that Machine API did not create the placement group (and wouldn't be able to create a new placement group)
I'm confused because you wrote the following above:

> Should the user not define the type, we will return an error and place the Machine into a Failed state.

So if groupName is specified and groupType is omitted, then the Machine API definitely didn't create the placement group, right?
> So if groupName is specified and groupType is omitted, then the Machine API definitely didn't create the placement group, right?

I believe so, yes. Though, having now talked about making sure that the placement group we are using is the correct kind for the customer, I think we want to require name and type and then perform validation that the group is of the type they are expecting. I think I missed this while updating from previous feedback.
> - You can create a maximum of 500 placement groups per account in each Region.
> - The name that you specify for a placement group must be unique within your AWS account for the Region.
> - An instance can be launched in one placement group at a time; it cannot span multiple placement groups.
> - You cannot launch AWS Dedicated Hosts in placement groups.
Above, you wrote the following:

> - A partition placement group with AWS Dedicated Instances can have a maximum of two partitions.

I don't know how to reconcile the "general limitation" on line 336 with the cited limitation for "Partition". Is there a distinction between "AWS Dedicated Hosts" and "AWS Dedicated Instances"? Is the user able to launch an AWS Dedicated Host and then put it in a placement group after it is launched?
This was copied from the AWS docs, which, having checked, of course contradict themselves. I think the general limitation is "You should assume this", and then, looking at partition in particular, "you can now use this but only for this type of placement group" (Dedicated hosts/instances are synonyms).
To thicken the plot, it looks like there are two different pages for Amazon EC2 Dedicated Hosts and Amazon EC2 Dedicated Instances. Quite confusing unfortunately; are we really sure they are synonyms? The instanceTenancy type also has a "dedicated" and a "host" value here, could they highlight this difference already?
Reading the two pages, they appear to me to describe the same thing
> This failure will prevent adding additional compute capacity to the cluster and in turn, will result in potentially
> pending workloads.
> As this feature is only supported on worker nodes, we do not expect any detrimental impact on the control plane.
It may not be supported, but is it possible for users to configure this on the control-plane Machines, or will we block this somehow?
Not supported for the moment, this is blocked purely on the fact that we aren't going to implement this in the installer. I've added some notes about control plane machines to the future implementation and drawbacks sections
> - Support for Placement Groups for Control Plane Machines
>   - Users may replace their control plane at a later date using future automation if they wish to have their control
>     plane within a placement group
> - Support for moving Machines between placement groups
Does this mean it is not possible to change the placement group of an instance after it has been launched?
You can, but not via Machine API. To change the PG, you have to shut down the instance on AWS, then move it, and then restart it. The AWS docs say that this can't be done, but I've been informed by AWS engineers that it can be done.
@Miciah @deads2k Thanks for the reviews, I've updated the enhancement to clarify some of the points you made and responded in the comments for each. The important change is we will now require group name and group type and validate these so that if users do share the config across multiple groups, they definitely get the correct type of group. Will add the group count for partition into that validation as well.
> // GroupName specifies the name of an AWS placement group to create the Machine within.
> // If the group specified does not exist, it will be created based on the GroupType
> // parameter.
> // When a Machine is deleted, if this leaves an empty Placement Group, the group is also
We should call out in this comment that only Placement Groups created by the cluster will be deleted.
> installer generated machineset manifests before the cluster is created.
> We do not intend to support placement groups for control plane machines at all at this point.
>
> We will however need to update the installer cluster deprovisioning logic, as currently it does not know how to remove
If the user adds an instance to a placement group created by the cluster, then the destroyer will never complete as it will continually fail to delete the placement group. That does not sound like a terrible problem to me, but I did want to call it out. It is a similar issue to not being able to delete a VPC to which the user has added an instance.
I've added a note under this to call that out, I guess the only thing we can do is try to make sure a useful error is returned in this case?
> - ec2:DescribePlacementGroups
> - ec2:CreatePlacementGroup
ec2:DeletePlacementGroup, too, right?
If so, then this permission also needs to be added to the permissions checked by the installer since it will be needed by the destroyer.
Good catch, I hadn't done any testing of the deletion (or implemented it) when this was written so it slipped through. It is needed though so will add a note.
Do you happen to have a link to roughly where the list of permissions the installer checks is sourced from? Can raise a PR
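Assuming that note lands, the permission list in the enhancement would presumably end up as follows (these are the standard EC2 IAM action names):

```yaml
- ec2:DescribePlacementGroups
- ec2:CreatePlacementGroup
- ec2:DeletePlacementGroup
```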
> ### Open Questions
>
> - Do we need to propagate user defined tags from the Infrastructure resource onto the placement groups we create?
I would suspect so.
I've already created a PR for this so this isn't actually an open question anymore, will remove
> ## Proposal
>
> We will introduce a new CRD, `AWSPlacementGroup` within the `machine.openshift.io/v1` API group and a new
> new object reference within the `Placement` section of the AWS Machine Provider Config which will allow users
Suggested change:
> - new object reference within the `Placement` section of the AWS Machine Provider Config which will allow users
> + object reference within the `Placement` section of the AWS Machine Provider Config which will allow users
> // provided by attached IAM role where the actuator is running.
> // +optional
> CredentialsSecret *corev1.LocalObjectReference `json:"credentialsSecret,omitempty"`
> // Note the partition count of a placement group cannot be changed after creation.
> // If unset, AWS will provide a default partition count.
> // This default is currently 2.
> // Note: If instanceTenancy is set to "dedicated", the maximum number of partitions
This is the Tenancy field under Placement, right?
https://github.com/openshift/api/blob/b632c5fc10c0ea86cb18577db9388109a43d465f/machine/v1beta1/types_awsprovider.go#L189
It would be nice if we could make it more obvious that it is that one. Maybe we could specify its full path?
> // PartitionNumber specifies the numbered partition in which instances should be launched.
> // It is recommended to only use this value if multiple MachineSets share
> // a single Placement Group, in which case, each MachineSet should represent an individual partition number.
> // If unset, when a Partition placement group is used, AWS will attempt to
> // distribute instances evenly between partitions.
> // +kubebuilder:validation:Minimum:=1
> // +kubebuilder:validation:Maximum:=7
> // +optional
> PartitionNumber int32 `json:"number,omitempty"`
> }
My understanding (but correct me if I'm wrong here) after reading the AWS docs is that the partition number is a concept only for Partition placement groups.
With that in mind, what happens if the user specifies a PartitionNumber but then references an AWSPlacementGroup of an AWSPlacementGroupType which doesn't have the concept of a partition number? What is the behaviour we intend to pursue? We could maybe add a note on that in the spec comment above.
We should return an error to the user in that case; they probably didn't mean to set the partition number, I would guess.
👍
> availabilityZone: us-east-1a
> group:
>   name: my-partition-placement-group
> partitionNumber: 1 | 2 | 3 # This value will vary across the three MachineSets, either 1, 2 or 3
So my understanding is that once a partition placement group is created (and its initial count specified), then machines can be created within the various logical partitions (correct me if I'm wrong here).

A couple of follow up questions to this:
- The count of the desired logical partitions must be specified at creation time, but where these partitions are within a region is not pin-pointed by the creation step. How does this work? Is a partition automatically assigned to an AZ depending on where the first instance that we put into the partition lives? Or does a single logical partition allow, for example, 2 instances that live in two different AZs to be in the same logical partition? And if the latter is true, how does this count towards the partitions per Availability Zone cap?
- Do you envision us allowing different MachineSets binding to the same logical partition number? If yes, what if the two MachineSets' instances are in different AZs? (This then ties back to the previous question.)
So you can have 7 partitions maximum, but that is 7 partitions per availability zone in actuality.

So there is a partition numbered 1 in AZ A and in AZ B, and you can put the machines into partition 1 on each of these, but because they are separate AZs, they are separate failure domains already. It's best to just think of placement groups as a way to organise machines within a single AZ.

Machines in AZ A P1 and P2 are protected from hardware failures by being in different redundancy zones within the AZ; AZ A P1 and AZ B P1 are protected because they are in separate AZs (data centers).
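As a sketch (names illustrative, shape matching the example quoted earlier), two MachineSets in different AZs can both target partition 1 of the same group and still sit in independent failure domains:

```yaml
# MachineSet for us-east-1a
placement:
  availabilityZone: us-east-1a
  group:
    name: my-partition-placement-group
  partitionNumber: 1
---
# MachineSet for us-east-1b
placement:
  availabilityZone: us-east-1b
  group:
    name: my-partition-placement-group
  partitionNumber: 1
```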
Ok I see. Makes sense, thank you.
> To ensure we leave user accounts clean, we should remove any placement group that is no longer required.
> If a user deletes a managed `AWSPlacementGroup`, we will remove it from AWS provided it has no instances registered
> within it. To enable this clean up logic a finalizer will need to be added to each `AWSPlacementGroup` to allow the
> controller to remove the instance.
By instance here you mean the "physical" instance of the AWSPlacementGroup, not an AWS EC2 instance within the group, correct?
I mean the placement group on AWS, will clarify
damdo
left a comment
Thanks for the answers and for including the suggestions @JoelSpeed
/lgtm
Fedosin
left a comment
/lgtm
Force-pushed da5f106 to bff8f1f.
I've squashed the commits here so that we can get this merged.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Fedosin, sdodson
@JoelSpeed: all tests passed!
This PR adds a new enhancement describing how we will integrate support for AWS Placement Groups into the Machine API.
Notably, as we have a specific customer requirement, we are not planning to introduce this into the installer initially as this is not required by the customer. This has been left as a future implementation and is noted within the enhancement.