Skip to content

Conversation

@Bio2hazard
Copy link
Contributor

@Bio2hazard Bio2hazard commented Mar 12, 2021

fixes #13506

Description

After the container instance is set to draining, the tasks running on it transition from RUNNING > DEACTIVATING > STOPPING > DEPROVISIONING > STOPPED.

The current way of counting running tasks via instance['runningTasksCount'] + instance['pendingTasksCount'] does not include tasks in those transitional states, leading to the EC2 instance being terminated prematurely.

Verification

I have verified the change by manually updating the automatically created drain hook lambda and then running a ASG refresh.

I ran the test with additional debug output to compare the old logic of runningTasksCount + pendingTasksCount and the new logic that fetches the status of the tasks.

I interleaved the logs from the ECS events, application running in the task and the drain hook lambda:

2021-03-11T15:56:52.608-08:00	Instance i-1234567890abcdefg has container instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:56:52.649-08:00	Instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has task ARNs arn:aws:ecs:us-west-2:123456789012:task/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:57:03.018-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:03.051-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.215-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.280-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:15.280-08:00 	service fooservice has stopped 1 running tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:23.438-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:23.490-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:33.632-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:33.690-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:43.853-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:43.890-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:46.000-08:00 	service fooservice has started 1 tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:46.000-08:00 	(service fooservice, taskSet ecs-svc/1234567890abcdefghi) has begun draining connections on 2 tasks.
2021-03-11T15:57:46.000-08:00 	service fooservice deregistered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:46.000-08:00 	service fooservice deregistered 1 targets in target-group fooservice-target
2021-03-11T15:57:54.032-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:54.090-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:58.000-08:00 	service fooservice registered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:58.000-08:00 	service fooservice registered 1 targets in target-group fooservice-target
2021-03-11T15:58:04.242-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:04.270-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:14.430-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:14.470-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:24.611-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:24.650-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:34.796-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:34.850-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:44.999-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:45.030-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:49.000-08:00 	app received SIGTERM
2021-03-11T15:58:54.000-08:00 	service fooservice has reached a steady state.
2021-03-11T15:58:55.170-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00	Terminating instance i-1234567890abcdefg

The logs show that the new approach allows ecs to drain connections, deregister the target and respect the deregistrationDelay ( set to 1 minute in this case ).

The old approach would have terminated the EC2 instance 23 seconds prior to ECS even deregistering the target, leading to 502 errors.

Pull Request Checklist

  • Testing
    I was not able to find any tests validating the functionality of the lambda. However, I have updated expected.json files to expect the new lambda function code.
  • Docs - Not Applicable
    No previously documented behavior has changed
  • Title and Description
  • Sensitive Modules (requires 2 PR approvers) - Not Applicable

Impact

End users utilizing ECS on EC2 with capacity provided by an ASG will see an increase in instance termination time, however the process is now much safer, respects the ALBs deregistrationDelay and will reduce connection errors.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

…3506

fixes aws#13506

After the container instance is set to draining, the tasks running on it transition from RUNNING > DEACTIVATING > STOPPING > DEPROVISIONING > STOPPED.

The current way of counting running tasks via `instance['runningTasksCount'] + instance['pendingTasksCount']` does not include tasks in those transitional states, leading to the EC2 instance being terminated prematurely.

No unit tests were added as the lambda code is not currently covered by any tests I could find.

I have verified the change by manually updating the automatically created drain hook lambda and then running a ASG refresh.

I ran the test with additional debug output to compare the old logic of `runningTasksCount + pendingTasksCount` and the new logic that fetches the status of the tasks.

I interleaved the logs from the ECS events, application running in the task and the drain hook lambda:

```
2021-03-11T15:56:52.608-08:00	Instance i-1234567890abcdefg has container instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:56:52.649-08:00	Instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has task ARNs arn:aws:ecs:us-west-2:123456789012:task/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:57:03.018-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:03.051-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.215-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.280-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:15.280-08:00 	service fooservice has stopped 1 running tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:23.438-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:23.490-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:33.632-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:33.690-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:43.853-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:43.890-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:46.000-08:00 	service fooservice has started 1 tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:46.000-08:00 	(service fooservice, taskSet ecs-svc/1234567890abcdefghi) has begun draining connections on 2 tasks.
2021-03-11T15:57:46.000-08:00 	service fooservice deregistered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:46.000-08:00 	service fooservice deregistered 1 targets in target-group fooservice-target
2021-03-11T15:57:54.032-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:54.090-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:58.000-08:00 	service fooservice registered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:58.000-08:00 	service fooservice registered 1 targets in target-group fooservice-target
2021-03-11T15:58:04.242-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:04.270-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:14.430-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:14.470-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:24.611-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:24.650-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:34.796-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:34.850-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:44.999-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:45.030-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:49.000-08:00 	app received SIGTERM
2021-03-11T15:58:54.000-08:00 	service fooservice has reached a steady state.
2021-03-11T15:58:55.170-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00	Terminating instance i-1234567890abcdefg
```
@gitpod-io
Copy link

gitpod-io bot commented Mar 12, 2021

@github-actions github-actions bot added the @aws-cdk/aws-lambda Related to AWS Lambda label Mar 12, 2021
I noticed that the expected output tests contained the lambda code, so I updated them accordingly.
@nija-at
Copy link
Contributor

nija-at commented Mar 17, 2021

Hello -

Sorry about the delay in reviewing PRs. We are experiencing an increased backlog of items that need our attention.
We will get to your PR as soon as we are able to. Apologies for the delay.

@nija-at nija-at added @aws-cdk/aws-ecs Related to Amazon Elastic Container and removed @aws-cdk/aws-lambda Related to AWS Lambda labels Mar 18, 2021
@nija-at nija-at assigned MrArnoldPalmer and unassigned nija-at Mar 18, 2021
@SoManyHs SoManyHs changed the title fix(aws-ecs): drain hook lambda allows tasks to stop gracefully #13506 fix(aws-ecs): drain hook lambda allows tasks to stop gracefully Mar 18, 2021
@github-actions github-actions bot added the @aws-cdk/aws-lambda Related to AWS Lambda label Mar 18, 2021
@nija-at nija-at removed the @aws-cdk/aws-lambda Related to AWS Lambda label Mar 19, 2021
@nija-at nija-at removed their assignment Mar 19, 2021
SoManyHs
SoManyHs previously approved these changes Mar 19, 2021
Copy link
Contributor

@SoManyHs SoManyHs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, thank you so much for this fix! Definitely an oversight to not count any task in a transitional state -- I think that some of those states were added the original lambda integration was added to the ECS construct library. Your contribution is very much appreciated!

@SoManyHs
Copy link
Contributor

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Mar 19, 2021

Command update: success

Branch already up to date

@mergify mergify bot dismissed SoManyHs’s stale review March 19, 2021 23:42

Pull request has been modified.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: 4164737
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

mergify bot commented Mar 20, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 3e1148e into aws:master Mar 20, 2021
eladb pushed a commit that referenced this pull request Mar 24, 2021
fixes #13506

### Description 

After the container instance is set to draining, the tasks running on it transition from RUNNING > DEACTIVATING > STOPPING > DEPROVISIONING > STOPPED.

The current way of counting running tasks via `instance['runningTasksCount'] + instance['pendingTasksCount']` does not include tasks in those transitional states, leading to the EC2 instance being terminated prematurely.


### Verification 

I have verified the change by manually updating the automatically created drain hook lambda and then running a ASG refresh.

I ran the test with additional debug output to compare the old logic of `runningTasksCount + pendingTasksCount` and the new logic that fetches the status of the tasks.

I interleaved the logs from the ECS events, application running in the task and the drain hook lambda:

```
2021-03-11T15:56:52.608-08:00	Instance i-1234567890abcdefg has container instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:56:52.649-08:00	Instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has task ARNs arn:aws:ecs:us-west-2:123456789012:task/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:57:03.018-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:03.051-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.215-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.280-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:15.280-08:00 	service fooservice has stopped 1 running tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:23.438-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:23.490-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:33.632-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:33.690-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:43.853-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:43.890-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:46.000-08:00 	service fooservice has started 1 tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:46.000-08:00 	(service fooservice, taskSet ecs-svc/1234567890abcdefghi) has begun draining connections on 2 tasks.
2021-03-11T15:57:46.000-08:00 	service fooservice deregistered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:46.000-08:00 	service fooservice deregistered 1 targets in target-group fooservice-target
2021-03-11T15:57:54.032-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:54.090-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:58.000-08:00 	service fooservice registered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:58.000-08:00 	service fooservice registered 1 targets in target-group fooservice-target
2021-03-11T15:58:04.242-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:04.270-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:14.430-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:14.470-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:24.611-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:24.650-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:34.796-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:34.850-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:44.999-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:45.030-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:49.000-08:00 	app received SIGTERM
2021-03-11T15:58:54.000-08:00 	service fooservice has reached a steady state.
2021-03-11T15:58:55.170-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00	Terminating instance i-1234567890abcdefg
```

The logs show that the new approach allows ecs to drain connections, deregister the target and respect the `deregistrationDelay` ( set to 1 minute in this case ). 

The old approach would have terminated the EC2 instance 23 seconds prior to ECS even deregistering the target, leading to 502 errors.

### Pull Request Checklist
- [x]  Testing 
    I was not able to find any tests validating the functionality of the lambda. However, I have updated `expected.json` files to expect the new lambda function code.
- [ ] Docs - *Not Applicable*
    No previously documented behavior has changed
- [x] Title and Description 
- [ ] Sensitive Modules (requires 2 PR approvers) - *Not Applicable*

### Impact
End users utilizing ECS on EC2 with capacity provided by an ASG will see an increase in instance termination time, however the process is now much safer, respects the ALBs `deregistrationDelay` and will reduce connection errors.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
hollanddd pushed a commit to hollanddd/aws-cdk that referenced this pull request Aug 26, 2021
…13559)

fixes aws#13506

### Description 

After the container instance is set to draining, the tasks running on it transition from RUNNING > DEACTIVATING > STOPPING > DEPROVISIONING > STOPPED.

The current way of counting running tasks via `instance['runningTasksCount'] + instance['pendingTasksCount']` does not include tasks in those transitional states, leading to the EC2 instance being terminated prematurely.


### Verification 

I have verified the change by manually updating the automatically created drain hook lambda and then running a ASG refresh.

I ran the test with additional debug output to compare the old logic of `runningTasksCount + pendingTasksCount` and the new logic that fetches the status of the tasks.

I interleaved the logs from the ECS events, application running in the task and the drain hook lambda:

```
2021-03-11T15:56:52.608-08:00	Instance i-1234567890abcdefg has container instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:56:52.649-08:00	Instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has task ARNs arn:aws:ecs:us-west-2:123456789012:task/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:57:03.018-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:03.051-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.215-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.280-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:15.280-08:00 	service fooservice has stopped 1 running tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:23.438-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:23.490-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:33.632-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:33.690-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:43.853-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:43.890-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:46.000-08:00 	service fooservice has started 1 tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:46.000-08:00 	(service fooservice, taskSet ecs-svc/1234567890abcdefghi) has begun draining connections on 2 tasks.
2021-03-11T15:57:46.000-08:00 	service fooservice deregistered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:46.000-08:00 	service fooservice deregistered 1 targets in target-group fooservice-target
2021-03-11T15:57:54.032-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:54.090-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:58.000-08:00 	service fooservice registered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:58.000-08:00 	service fooservice registered 1 targets in target-group fooservice-target
2021-03-11T15:58:04.242-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:04.270-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:14.430-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:14.470-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:24.611-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:24.650-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:34.796-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:34.850-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:44.999-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:45.030-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:49.000-08:00 	app received SIGTERM
2021-03-11T15:58:54.000-08:00 	service fooservice has reached a steady state.
2021-03-11T15:58:55.170-08:00	OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00	NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00	Terminating instance i-1234567890abcdefg
```

The logs show that the new approach allows ecs to drain connections, deregister the target and respect the `deregistrationDelay` ( set to 1 minute in this case ). 

The old approach would have terminated the EC2 instance 23 seconds prior to ECS even deregistering the target, leading to 502 errors.

### Pull Request Checklist
- [x]  Testing 
    I was not able to find any tests validating the functionality of the lambda. However, I have updated `expected.json` files to expect the new lambda function code.
- [ ] Docs - *Not Applicable*
    No previously documented behavior has changed
- [x] Title and Description 
- [ ] Sensitive Modules (requires 2 PR approvers) - *Not Applicable*

### Impact
End users utilizing ECS on EC2 with capacity provided by an ASG will see an increase in instance termination time, however the process is now much safer, respects the ALBs `deregistrationDelay` and will reduce connection errors.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

@aws-cdk/aws-ecs Related to Amazon Elastic Container

Projects

None yet

Development

Successfully merging this pull request may close these issues.

aws-ecs: Drain Hook Lambda does not wait for instances to finish draining

5 participants