feat(ecs): Autoscaling Group Capacity Provider #9192
Conversation
I am ready for the first round. Please take a look. Thanks.
Converted to draft. Working on the circular dependency issue addressed in aws/containers-roadmap#631 (comment).
Question: In order for #5471 to be considered closed, does CloudFormation need to support setting a custom capacity provider on a service?
It would be great if AWS::ECS::Service can support
And behind the scenes the ...
To break the circular dependency, maybe we should read the cluster name from Parameter Store rather than referencing it directly in the user data (aws-cdk/packages/@aws-cdk/aws-ecs/lib/cluster.ts, lines 197 to 198 in dac9bb3).
Parameter Store doesn't seem to work. Trying to use the ...
Thank you so much for contributing❤️ I believe this will be very good news for our ECS customers!
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

// create the 1st capacity provider with on-demand t3.large instances
const cp = cluster.addCapacityProvider('CP', {
A more general question: is it possible to merge addCapacityProviderConfiguration into addCapacityProvider so that users don't need to create a capacity provider and then register it to a cluster? I would expect cluster.addCapacityProvider to already create and register a capacity provider to that cluster and return the created capacity provider (which is similar to what users would expect for cluster.addCapacity).
I have been exploring this but haven't made it work yet. Let me explain my current status; I am looking for your advice.
addCapacityProvider() is easy, as we just need to create the AWS::ECS::CapacityProvider resource, but registering multiple capacity providers to the cluster is still challenging. We probably have two options here:
- Simply associate the CP with the cluster via the CapacityProviders and DefaultCapacityProviderStrategy properties of AWS::ECS::Cluster (https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ecs-cluster.html#cfn-ecs-cluster-capacityproviders). However, we'll have a circular dependency (aws/containers-roadmap#631 (comment)). An option is like this: aws/containers-roadmap#631 (comment), but it might cause breaking changes. Another solution could be to get the cluster name from Parameter Store with a parameter name like ${stackName}-${id}-clusterName and run aws ssm get-parameters in the UserData of the instance node, but I haven't tried this yet. It would still be a breaking change, but it might be a better solution than the above.
- The second option is what I've been working on now: I am creating a new CapacityProviderConfiguration custom resource. When we addCapacityProvider(), the CP will be added into capacityProviders: CapacityProviders[] of the cluster instance, and on cluster creation we synthesize all the added capacity providers and run new CapacityProviderConfiguration inside the Cluster class. But in this case the CapacityProviderConfiguration custom resource can hardly addDependency on all the capacity providers, which will lead to the error below:
Failed to delete resource. Error: An error occurred (UpdateInProgressException) when calling the PutClusterCapacityProviders operation: The specified cluster is in a busy state. Cluster attachments must be in UPDATE_COMPLETE or UPDATE_FAILED state before they can be updated. Wait and try again.
I believe the 2nd option is the way to go, but I am still struggling with how to addDependency on the capacity providers for the CapacityProviderConfiguration
resource to ensure all CPs are ready before we configure them to the cluster.
I am not sure if I am on the right track and would appreciate it if you could share your feedback. What do you think?
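For the dependency wiring specifically, one possible shape (a minimal sketch assuming the CapacityProviderConfiguration custom resource and the capacityProviders array described above; none of these names are confirmed API) is to defer creating the configuration until all providers are registered and then add an explicit dependency on each one:

```ts
import * as cdk from '@aws-cdk/core';

// Sketch only: these stand in for the constructs discussed above.
declare const capacityProviders: cdk.Construct[];           // collected by addCapacityProvider()
declare const capacityProviderConfiguration: cdk.Construct; // custom resource that calls PutClusterCapacityProviders

// Ensure every capacity provider (and, transitively, its AutoScalingGroup) is fully
// created before the custom resource tries to attach them to the cluster.
for (const cp of capacityProviders) {
  capacityProviderConfiguration.node.addDependency(cp);
}
```

If the configuration resource is only instantiated late (for example from the cluster's prepare phase), the loop can simply run over whatever providers have been registered by that point.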
RE the circular dependency.
I believe we need to break the following two circular dependencies before we can simply register the CP to the cluster from native CFN's perspective.
cluster -> cp -> addCapacity() -> addAutoScalingGroup() -> cluster
aws-cdk/packages/@aws-cdk/aws-ecs/lib/cluster.ts
Lines 198 to 199 in d31d4db
// Tie instances to cluster
autoScalingGroup.addUserData(`echo ECS_CLUSTER=${this.clusterName} >> /etc/ecs/ecs.config`);
cluster -> cp -> addCapacity() -> addAutoScalingGroup() -> InstanceDrainHook -> cluster
aws-cdk/packages/@aws-cdk/aws-ecs/lib/cluster.ts
Lines 258 to 265 in d31d4db
if (!options.taskDrainTime || options.taskDrainTime.toSeconds() !== 0) {
  new InstanceDrainHook(autoScalingGroup, 'DrainECSHook', {
    autoScalingGroup,
    cluster: this,
    drainTime: options.taskDrainTime,
    topicEncryptionKey: options.topicEncryptionKey,
  });
}
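To make the first chain concrete, the Parameter Store idea floated earlier in the thread would look roughly like this (a sketch only, not necessarily what this PR ends up doing): publish the cluster name to SSM at deploy time and read it back in the instance user data, so the ASG never references the cluster construct directly.

```ts
import * as autoscaling from '@aws-cdk/aws-autoscaling';
import * as ecs from '@aws-cdk/aws-ecs';
import * as ssm from '@aws-cdk/aws-ssm';
import * as cdk from '@aws-cdk/core';

// Sketch only: stand-ins for the cluster and the ASG created by addCapacity().
declare const cluster: ecs.Cluster;
declare const autoScalingGroup: autoscaling.AutoScalingGroup;

const stack = cdk.Stack.of(cluster);

// Publish the cluster name under a deterministic parameter name.
const param = new ssm.StringParameter(stack, 'ClusterNameParam', {
  parameterName: `/${stack.stackName}/clusterName`,
  stringValue: cluster.clusterName,
});
param.grantRead(autoScalingGroup.role);

// Resolve the name at boot time instead of interpolating this.clusterName into the
// user data, which is what closes the cycle in chain 1 above.
autoScalingGroup.addUserData(
  `echo ECS_CLUSTER=$(aws ssm get-parameter --name /${stack.stackName}/clusterName ` +
  `--region ${stack.region} --query Parameter.Value --output text) >> /etc/ecs/ecs.config`,
);
```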
OK. Now I am able to resolve the two circular dependencies.
Will add comments inline for discussion.
/**
 * Whether to enable the managed scaling. This value will be overrided to be True
 * if the `managedTerminationProtection` is enabled.
I wonder if, instead of overriding it to be true, we should throw an error saying that when managedTerminationProtection is enabled, managedScaling cannot be disabled, or something like that. And also have a test case for this error.
The reason is that the default values for both managedTerminationProtection and managedScaling are true, and when a user sets managedScaling to false it means they don't want it enabled. If we override managedScaling to true, it might result in something users don't expect (or that contradicts their intention).
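A sketch of that guard (addCapacityProvider and these property names are what this PR proposes, so the exact shape may differ):

```ts
// Sketch of the guard; in the construct this would run against the props argument.
function validateManagedScaling(props: { managedScaling?: boolean; managedTerminationProtection?: boolean }) {
  const managedScaling = props.managedScaling ?? true;
  const managedTerminationProtection = props.managedTerminationProtection ?? true;
  if (managedTerminationProtection && !managedScaling) {
    throw new Error('managedScaling cannot be disabled when managedTerminationProtection is enabled');
  }
}
```

And a matching unit test for the error could look like:

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as ecs from '@aws-cdk/aws-ecs';
import * as cdk from '@aws-cdk/core';

test('throws when managedTerminationProtection is enabled but managedScaling is disabled', () => {
  const stack = new cdk.Stack();
  const vpc = new ec2.Vpc(stack, 'Vpc');
  const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

  expect(() => cluster.addCapacityProvider('CP', {
    capacityOptions: { instanceType: new ec2.InstanceType('t3.large') },
    managedScaling: false,
    managedTerminationProtection: true,
  })).toThrow(/managedScaling cannot be disabled/);
});
```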
@MrArnoldPalmer do you have any idea on how to resolve the resource updating issue?
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

// create the 1st capacity provider with on-demand t3.large instances
cluster.addCapacityProvider('CP', {
  capacityOptions: {
    instanceType: new ec2.InstanceType('t3.large'),
    minCapacity: 2,
  },
  managedScaling: true,
  managedTerminationProtection: true,
  defaultStrategy: { base: 1, weight: 1 },
});

// create the 2nd capacity provider with ec2 spot t3.large instances
cluster.addCapacityProvider('CPSpot', {
  capacityOptions: {
    instanceType: new ec2.InstanceType('t3.large'),
    minCapacity: 1,
    spotPrice: '0.1',
  },
  managedScaling: true,
  managedTerminationProtection: true,
  defaultStrategy: { weight: 3 },
});
The integ testing is doing great now, but it fails to destroy with the errors:
Looks like there are some constraints:
- When deleting the AWS::ECS::Cluster, all container instances should not be active or draining.
- Cannot delete a capacity provider while it is associated with a cluster.
It's interesting as the two conditions look like a mutual conflict.
Can we dissociate the CP from the cluster, then try to delete the CP, then delete the cluster?
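If the association is owned by a custom resource, its Delete path could do exactly that before the cluster itself goes away. A rough sketch of such a handler (TypeScript with the JS SDK purely for illustration; the function name and wiring here are assumptions, not code from this PR):

```ts
import { ECS } from 'aws-sdk';

const ecsClient = new ECS();

// Sketch: on Delete, detach every capacity provider from the cluster first, so the
// capacity providers can then be deleted without hitting the
// "cannot delete a capacity provider while it is associated with a cluster" constraint.
export async function onDelete(clusterName: string): Promise<void> {
  await ecsClient.putClusterCapacityProviders({
    cluster: clusterName,
    capacityProviders: [],
    defaultCapacityProviderStrategy: [],
  }).promise();
}
```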
When deleting the AWS::ECS::Cluster, all container instances should not be active or draining.
Do you think this has something to do with removing the dependency of InstanceDrainHook on the ECS cluster?
Can we dissociate the CP from the cluster, then try to delete the CP, then delete the cluster?
Encountered a circular dependency because an IAM policy is referring to aws-cdk/packages/@aws-cdk/aws-ecs/lib/cluster.ts, lines 115 to 119 in 64798b1.
Hi! When would this be available?
This was addressed by #14386. Closing.
feat(ecs): Autoscaling Group Capacity Provider
This PR adds Autoscaling Group Capacity Provider support for Amazon ECS.
Closes: #5471
Considerations

- Use addCapacityProvider to create the CapacityProvider resource. Behind the scenes, it calls addCapacity for this cluster and uses the returned AutoscalingGroup to provision the CapacityProvider, so we can leverage the existing CapacityOptions.
- When managedTerminationProtection is enabled, which is the default behavior, behind the scenes the CapacityProvider construct will create an EnforcedInstanceProtection custom resource which performs the following required steps on create (a minimal sketch of this handler follows the list):
  a. set NewInstancesProtectedFromScaleIn=True for this ASG
  b. set ProtectedFromScaleIn=True for all existing instances in this ASG
- When managedTerminationProtection is enabled, on CapacityProvider resource deletion the EnforcedInstanceProtection custom resource will do the following:
  a. set NewInstancesProtectedFromScaleIn=False for this ASG
  b. set ProtectedFromScaleIn=False for all existing instances in this ASG
  In this case, if the ASG is going to terminate together with the CapacityProvider, the instances can successfully be terminated; otherwise the whole stack will be pending.
- Use a CapacityProviderConfiguration custom resource to break the circular dependency and configure the capacity providers as well as the strategies for the cluster.
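For illustration, the create/delete behaviour of that EnforcedInstanceProtection handler maps onto two Auto Scaling API calls; a minimal sketch (JS SDK, illustrative only; the actual custom resource code in this PR may be organized differently):

```ts
import { AutoScaling } from 'aws-sdk';

const autoscalingClient = new AutoScaling();

// Sketch of what the EnforcedInstanceProtection custom resource does:
// `protect = true` on create, `protect = false` on delete.
export async function setScaleInProtection(asgName: string, protect: boolean): Promise<void> {
  // a. toggle scale-in protection for instances launched in the future
  await autoscalingClient.updateAutoScalingGroup({
    AutoScalingGroupName: asgName,
    NewInstancesProtectedFromScaleIn: protect,
  }).promise();

  // b. toggle scale-in protection for all instances that already exist in the ASG
  const groups = await autoscalingClient.describeAutoScalingGroups({
    AutoScalingGroupNames: [asgName],
  }).promise();
  const instanceIds = (groups.AutoScalingGroups[0]?.Instances ?? []).map(i => i.InstanceId);

  if (instanceIds.length > 0) {
    await autoscalingClient.setInstanceProtection({
      AutoScalingGroupName: asgName,
      InstanceIds: instanceIds,
      ProtectedFromScaleIn: protect,
    }).promise();
  }
}
```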
Known Issues

Updating the AWS::ECS::CapacityProvider with the same AutoscalingGroup will encounter the error:
Looks like when updating the AWS::ECS::CapacityProvider resource, a new AWS::ECS::CapacityProvider with the same ASG provider will be created first for replacement and will immediately fail, because the same ASG is not allowed for two CapacityProviders. No idea how to work around it at this moment.

Sample
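Based on the integ test snippet earlier in the conversation, usage looks roughly like this (illustrative; not the verbatim sample from the PR description):

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as ecs from '@aws-cdk/aws-ecs';
import * as cdk from '@aws-cdk/core';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'capacity-provider-demo');
const vpc = new ec2.Vpc(stack, 'Vpc', { maxAzs: 2 });
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

// capacity provider backed by on-demand t3.large instances
cluster.addCapacityProvider('CP', {
  capacityOptions: {
    instanceType: new ec2.InstanceType('t3.large'),
    minCapacity: 2,
  },
  managedScaling: true,
  managedTerminationProtection: true,
  defaultStrategy: { base: 1, weight: 1 },
});

// capacity provider backed by EC2 Spot t3.large instances
cluster.addCapacityProvider('CPSpot', {
  capacityOptions: {
    instanceType: new ec2.InstanceType('t3.large'),
    minCapacity: 1,
    spotPrice: '0.1',
  },
  managedScaling: true,
  managedTerminationProtection: true,
  defaultStrategy: { weight: 3 },
});
```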
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license