feat(ecs): Autoscaling Group Capacity Provider #9192
Conversation
I am ready for the first round. Please take a look. Thanks.
Converted to draft. Working on the circular dependency issue addressed in aws/containers-roadmap#631 (comment).
Question: In order for #5471 to be considered closed, does CloudFormation need to support setting a custom capacity provider on a service?
It would be great if AWS::ECS::Service can support
And behind the scenes the ...
To break the circular dependency, maybe we should read the cluster name from Parameter Store rather than referencing it directly in the user data (aws-cdk/packages/@aws-cdk/aws-ecs/lib/cluster.ts, lines 197 to 198 in dac9bb3).
Parameter Store doesn't seem to work. Trying to use the ...
Thank you so much for contributing❤️ I believe this will be very good news for our ECS customers!
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

// create the 1st capacity provider with on-demand t3.large instances
const cp = cluster.addCapacityProvider('CP', {
A more general question: is it possible to merge addCapacityProviderConfiguration into addCapacityProvider so that users don't need to create a capacity provider and then register it to a cluster? I would expect cluster.addCapacityProvider to already create and register a capacity provider to that cluster and return the created capacity provider (which is similar to what users would expect for cluster.addCapacity).
I have been exploring this but haven't made it work yet. Let me explain my current status; I am looking for your advice.
addCapacityProvider() is easy, as we just need to create the AWS::ECS::CapacityProvider resource, but registering multiple capacity providers to the cluster is still challenging. We probably have two options here:
- Simply associate the CP with the cluster via the CapacityProviders and DefaultCapacityProviderStrategy properties of AWS::ECS::Cluster (https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ecs-cluster.html#cfn-ecs-cluster-capacityproviders). However, we'll have a circular dependency (aws/containers-roadmap#631 (comment)). An option is like this: aws/containers-roadmap#631 (comment), but it might cause breaking changes. Another solution could be to get the cluster name from Parameter Store with a parameter name like ${stackName}-${id}-clusterName and run aws ssm get-parameters in the UserData of the instance node, but I haven't tried this yet. It would still be a breaking change, but it might be a better solution than the above.
- The second option is what I've been working on now: I am creating a new CapacityProviderConfiguration custom resource. When we addCapacityProvider(), the CP will be added into capacityProviders: CapacityProviders[] of the cluster instance, and on cluster creation we synthesize all the added capacity providers and run new CapacityProviderConfiguration inside the Cluster class. But in this case the CapacityProviderConfiguration custom resource can hardly addDependency on all the capacity providers, which will lead to the error below:
Failed to delete resource. Error: An error occurred (UpdateInProgressException) when calling the PutClusterCapacityProviders operation: The specified cluster is in a busy state. Cluster attachments must be in UPDATE_COMPLETE or UPDATE_FAILED state before they can be updated. Wait and try again.
I believe the 2nd option is the way to go, but I am still struggling with how to addDependency on the capacity providers for the CapacityProviderConfiguration
resource to ensure all CPs are ready before we configure them to the cluster.
I am not sure if I am on the right track and would appreciate it if you could share your feedback. What do you think?
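For the dependency wiring specifically, one possible shape (a minimal sketch assuming the CapacityProviderConfiguration custom resource and the capacityProviders array described above; none of these names are confirmed API) is to defer creating the configuration until all providers are registered and then add an explicit dependency on each one:

```ts
import * as cdk from '@aws-cdk/core';

// Sketch only: these stand in for the constructs discussed above.
declare const capacityProviders: cdk.Construct[];           // collected by addCapacityProvider()
declare const capacityProviderConfiguration: cdk.Construct; // custom resource that calls PutClusterCapacityProviders

// Ensure every capacity provider (and, transitively, its AutoScalingGroup) is fully
// created before the custom resource tries to attach them to the cluster.
for (const cp of capacityProviders) {
  capacityProviderConfiguration.node.addDependency(cp);
}
```

If the configuration resource is only instantiated late (for example from the cluster's prepare phase), the loop can simply run over whatever providers have been registered by that point.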
RE the circular dependency.
I believe we need to break the following two circular dependencies before we can simply register the CP to the cluster from native CFN's perspective.
cluster -> cp -> addCapacity() -> addAutoScalingGroup() -> cluster
aws-cdk/packages/@aws-cdk/aws-ecs/lib/cluster.ts
Lines 198 to 199 in d31d4db
// Tie instances to cluster
autoScalingGroup.addUserData(`echo ECS_CLUSTER=${this.clusterName} >> /etc/ecs/ecs.config`);
cluster -> cp -> addCapacity() -> addAutoScalingGroup() -> InstanceDrainHook -> cluster
aws-cdk/packages/@aws-cdk/aws-ecs/lib/cluster.ts
Lines 258 to 265 in d31d4db
if (!options.taskDrainTime || options.taskDrainTime.toSeconds() !== 0) {
  new InstanceDrainHook(autoScalingGroup, 'DrainECSHook', {
    autoScalingGroup,
    cluster: this,
    drainTime: options.taskDrainTime,
    topicEncryptionKey: options.topicEncryptionKey,
  });
}
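To make the first chain concrete, the Parameter Store idea floated earlier in the thread would look roughly like this (a sketch only, not necessarily what this PR ends up doing): publish the cluster name to SSM at deploy time and read it back in the instance user data, so the ASG never references the cluster construct directly.

```ts
import * as autoscaling from '@aws-cdk/aws-autoscaling';
import * as ecs from '@aws-cdk/aws-ecs';
import * as ssm from '@aws-cdk/aws-ssm';
import * as cdk from '@aws-cdk/core';

// Sketch only: stand-ins for the cluster and the ASG created by addCapacity().
declare const cluster: ecs.Cluster;
declare const autoScalingGroup: autoscaling.AutoScalingGroup;

const stack = cdk.Stack.of(cluster);

// Publish the cluster name under a deterministic parameter name.
const param = new ssm.StringParameter(stack, 'ClusterNameParam', {
  parameterName: `/${stack.stackName}/clusterName`,
  stringValue: cluster.clusterName,
});
param.grantRead(autoScalingGroup.role);

// Resolve the name at boot time instead of interpolating this.clusterName into the
// user data, which is what closes the cycle in chain 1 above.
autoScalingGroup.addUserData(
  `echo ECS_CLUSTER=$(aws ssm get-parameter --name /${stack.stackName}/clusterName ` +
  `--region ${stack.region} --query Parameter.Value --output text) >> /etc/ecs/ecs.config`,
);
```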
OK. Now I am able to resolve the two circular dependencies.
Will add comments inline for discussion.
/**
 * Whether to enable the managed scaling. This value will be overrided to be True
 * if the `managedTerminationProtection` is enabled.
I wonder if, instead of overriding it to be true, we should throw an error saying that when managedTerminationProtection is enabled, managedScaling cannot be disabled, or something like that. And also have a test case for this error.
The reason is that the default values for both managedTerminationProtection and managedScaling are true, and when a user sets managedScaling to false it means they don't want it enabled. If we override managedScaling to true, it might result in something users don't expect (or that contradicts their intention).
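A sketch of that guard (addCapacityProvider and these property names are what this PR proposes, so the exact shape may differ):

```ts
// Sketch of the guard; in the construct this would run against the props argument.
function validateManagedScaling(props: { managedScaling?: boolean; managedTerminationProtection?: boolean }) {
  const managedScaling = props.managedScaling ?? true;
  const managedTerminationProtection = props.managedTerminationProtection ?? true;
  if (managedTerminationProtection && !managedScaling) {
    throw new Error('managedScaling cannot be disabled when managedTerminationProtection is enabled');
  }
}
```

And a matching unit test for the error could look like:

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as ecs from '@aws-cdk/aws-ecs';
import * as cdk from '@aws-cdk/core';

test('throws when managedTerminationProtection is enabled but managedScaling is disabled', () => {
  const stack = new cdk.Stack();
  const vpc = new ec2.Vpc(stack, 'Vpc');
  const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

  expect(() => cluster.addCapacityProvider('CP', {
    capacityOptions: { instanceType: new ec2.InstanceType('t3.large') },
    managedScaling: false,
    managedTerminationProtection: true,
  })).toThrow(/managedScaling cannot be disabled/);
});
```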
@MrArnoldPalmer do you have any idea on how to resolve the resource updating issue?
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

// create the 1st capacity provider with on-demand t3.large instances
cluster.addCapacityProvider('CP', {
  capacityOptions: {
    instanceType: new ec2.InstanceType('t3.large'),
    minCapacity: 2,
  },
  managedScaling: true,
  managedTerminationProtection: true,
  defaultStrategy: { base: 1, weight: 1 },
});

// create the 2nd capacity provider with ec2 spot t3.large instances
cluster.addCapacityProvider('CPSpot', {
  capacityOptions: {
    instanceType: new ec2.InstanceType('t3.large'),
    minCapacity: 1,
    spotPrice: '0.1',
  },
  managedScaling: true,
  managedTerminationProtection: true,
  defaultStrategy: { weight: 3 },
});
The integ testing is doing great now, but it fails to destroy with the errors:
Looks like there are some constraints:
- When deleting the AWS::ECS::Cluster, all container instances should not be active or draining.
- Cannot delete a capacity provider while it is associated with a cluster.
It's interesting as the two conditions look like a mutual conflict.
Can we dissociate the CP from the cluster, then try to delete the CP, then delete the cluster?
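If the association is owned by a custom resource, its Delete path could do exactly that before the cluster itself goes away. A rough sketch of such a handler (TypeScript with the JS SDK purely for illustration; the function name and wiring here are assumptions, not code from this PR):

```ts
import { ECS } from 'aws-sdk';

const ecsClient = new ECS();

// Sketch: on Delete, detach every capacity provider from the cluster first, so the
// capacity providers can then be deleted without hitting the
// "cannot delete a capacity provider while it is associated with a cluster" constraint.
export async function onDelete(clusterName: string): Promise<void> {
  await ecsClient.putClusterCapacityProviders({
    cluster: clusterName,
    capacityProviders: [],
    defaultCapacityProviderStrategy: [],
  }).promise();
}
```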
When deleting the AWS::ECS::Cluster, all container instances should not be active or draining.
Do you think this has something to do with removing the dependency of InstanceDrainHook on the ECS cluster?
Can we dissociate the CP from the cluster, then try to delete the CP, then delete the cluster?
Encountered a circular dependency because an IAM policy is referring to aws-cdk/packages/@aws-cdk/aws-ecs/lib/cluster.ts, lines 115 to 119 in 64798b1.
Hi! When would this be available?
This was addressed by #14386. Closing.
feat(ecs): Autoscaling Group Capacity Provider
This PR adds Autoscaling Group Capacity Provider support for Amazon ECS.
Closes: #5471
Considerations

- Use addCapacityProvider to create the CapacityProvider resource. Behind the scenes, it calls addCapacity for this cluster and uses the returned AutoscalingGroup to provision the CapacityProvider, so we can leverage the existing CapacityOptions.
- When managedTerminationProtection is enabled, which is the default behavior, behind the scenes the CapacityProvider construct will create an EnforcedInstanceProtection custom resource which performs the following required steps on create (a minimal sketch of this handler follows the list):
  a. set NewInstancesProtectedFromScaleIn=True for this ASG
  b. set ProtectedFromScaleIn=True for all existing instances in this ASG
- When managedTerminationProtection is enabled, on CapacityProvider resource deletion the EnforcedInstanceProtection custom resource will do the following:
  a. set NewInstancesProtectedFromScaleIn=False for this ASG
  b. set ProtectedFromScaleIn=False for all existing instances in this ASG
  In this case, if the ASG is going to terminate together with the CapacityProvider, the instances can successfully be terminated; otherwise the whole stack will be pending.
- Use a CapacityProviderConfiguration custom resource to break the circular dependency and configure the capacity providers as well as the strategies for the cluster.
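For illustration, the create/delete behaviour of that EnforcedInstanceProtection handler maps onto two Auto Scaling API calls; a minimal sketch (JS SDK, illustrative only; the actual custom resource code in this PR may be organized differently):

```ts
import { AutoScaling } from 'aws-sdk';

const autoscalingClient = new AutoScaling();

// Sketch of what the EnforcedInstanceProtection custom resource does:
// `protect = true` on create, `protect = false` on delete.
export async function setScaleInProtection(asgName: string, protect: boolean): Promise<void> {
  // a. toggle scale-in protection for instances launched in the future
  await autoscalingClient.updateAutoScalingGroup({
    AutoScalingGroupName: asgName,
    NewInstancesProtectedFromScaleIn: protect,
  }).promise();

  // b. toggle scale-in protection for all instances that already exist in the ASG
  const groups = await autoscalingClient.describeAutoScalingGroups({
    AutoScalingGroupNames: [asgName],
  }).promise();
  const instanceIds = (groups.AutoScalingGroups[0]?.Instances ?? []).map(i => i.InstanceId);

  if (instanceIds.length > 0) {
    await autoscalingClient.setInstanceProtection({
      AutoScalingGroupName: asgName,
      InstanceIds: instanceIds,
      ProtectedFromScaleIn: protect,
    }).promise();
  }
}
```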
Known Issues

Updating the AWS::ECS::CapacityProvider with the same AutoscalingGroup will encounter the error:
Looks like when updating the AWS::ECS::CapacityProvider resource, a new AWS::ECS::CapacityProvider with the same ASG provider will be created first for replacement and will immediately fail, because the same ASG is not allowed for two CapacityProviders. No idea how to work around it at this moment.

Sample
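Based on the integ test snippet earlier in the conversation, usage looks roughly like this (illustrative; not the verbatim sample from the PR description):

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as ecs from '@aws-cdk/aws-ecs';
import * as cdk from '@aws-cdk/core';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'capacity-provider-demo');
const vpc = new ec2.Vpc(stack, 'Vpc', { maxAzs: 2 });
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

// capacity provider backed by on-demand t3.large instances
cluster.addCapacityProvider('CP', {
  capacityOptions: {
    instanceType: new ec2.InstanceType('t3.large'),
    minCapacity: 2,
  },
  managedScaling: true,
  managedTerminationProtection: true,
  defaultStrategy: { base: 1, weight: 1 },
});

// capacity provider backed by EC2 Spot t3.large instances
cluster.addCapacityProvider('CPSpot', {
  capacityOptions: {
    instanceType: new ec2.InstanceType('t3.large'),
    minCapacity: 1,
    spotPrice: '0.1',
  },
  managedScaling: true,
  managedTerminationProtection: true,
  defaultStrategy: { weight: 3 },
});
```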
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license