Commit 3e1148e
fix(aws-ecs): drain hook lambda allows tasks to stop gracefully (#13559)
fixes #13506
### Description
After the container instance is set to draining, the tasks running on it transition from RUNNING > DEACTIVATING > STOPPING > DEPROVISIONING > STOPPED.
The current way of counting running tasks via `instance['runningTasksCount'] + instance['pendingTasksCount']` does not include tasks in those transitional states, leading to the EC2 instance being terminated prematurely.
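The two counting strategies can be contrasted in a short Python sketch that uses boto3-style ECS client calls (`describe_container_instances`, `list_tasks`, `describe_tasks`). This is an illustration under stated assumptions, not the code merged in this commit. The function and constant names are hypothetical, and `ecs` is assumed to be a boto3 ECS client or any object exposing the same methods. Per the ECS API reference, `runningTasksCount` counts tasks whose *desired* status is `RUNNING`. That is why it can drop to 0 while a task's *last* status is still `DEACTIVATING` or `STOPPING`.

```python
from typing import Any, Dict, List

# Any lastStatus other than STOPPED (RUNNING, DEACTIVATING, STOPPING,
# DEPROVISIONING, ...) means the task still occupies the instance.
TERMINAL_STATUS = "STOPPED"
DESCRIBE_TASKS_LIMIT = 100  # DescribeTasks accepts at most 100 tasks per call


def old_task_count(ecs: Any, cluster: str, instance_arn: str) -> int:
    """Previous approach: counters reported on the container instance itself."""
    instance = ecs.describe_container_instances(
        cluster=cluster, containerInstances=[instance_arn]
    )["containerInstances"][0]
    return instance["runningTasksCount"] + instance["pendingTasksCount"]


def list_task_arns(ecs: Any, cluster: str, instance_arn: str) -> List[str]:
    """Collect task ARNs on the instance, following nextToken pagination.

    Hypothetical helper. Call it before draining: ListTasks filters on
    desiredStatus=RUNNING by default, so tasks ECS has already been told to
    stop are not returned afterwards.
    """
    arns: List[str] = []
    kwargs: Dict[str, str] = {"cluster": cluster, "containerInstance": instance_arn}
    while True:
        page = ecs.list_tasks(**kwargs)
        arns.extend(page["taskArns"])
        if not page.get("nextToken"):
            return arns
        kwargs["nextToken"] = page["nextToken"]


def live_task_count(ecs: Any, cluster: str, task_arns: List[str]) -> int:
    """Status-based approach (sketch): count tasks whose lastStatus is not STOPPED."""
    live = 0
    for i in range(0, len(task_arns), DESCRIBE_TASKS_LIMIT):
        batch = task_arns[i:i + DESCRIBE_TASKS_LIMIT]
        tasks = ecs.describe_tasks(cluster=cluster, tasks=batch)["tasks"]
        live += sum(1 for task in tasks if task["lastStatus"] != TERMINAL_STATUS)
    return live
```

Listing the task ARNs once, before draining, and then polling only their `lastStatus` is one way to sidestep the `desiredStatus` filter on `ListTasks`. An ARN that `describe_tasks` no longer returns is reported under `failures` rather than `tasks`, so it is simply not counted.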
### Verification
I verified the change by manually updating the automatically created drain hook Lambda and then running an ASG instance refresh.
I ran the test with additional debug output to compare the old logic (`runningTasksCount + pendingTasksCount`) with the new logic, which fetches the status of each task.
I interleaved the logs from the ECS service events, the application running in the task, and the drain hook Lambda:
```
2021-03-11T15:56:52.608-08:00 Instance i-1234567890abcdefg has container instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:56:52.649-08:00 Instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has task ARNs arn:aws:ecs:us-west-2:123456789012:task/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:57:03.018-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:03.051-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.215-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.280-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:15.280-08:00 service fooservice has stopped 1 running tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:23.438-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:23.490-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:33.632-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:33.690-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:43.853-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:43.890-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:46.000-08:00 service fooservice has started 1 tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:46.000-08:00 (service fooservice, taskSet ecs-svc/1234567890abcdefghi) has begun draining connections on 2 tasks.
2021-03-11T15:57:46.000-08:00 service fooservice deregistered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:46.000-08:00 service fooservice deregistered 1 targets in target-group fooservice-target
2021-03-11T15:57:54.032-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:54.090-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:58.000-08:00 service fooservice registered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:58.000-08:00 service fooservice registered 1 targets in target-group fooservice-target
2021-03-11T15:58:04.242-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:04.270-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:14.430-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:14.470-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:24.611-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:24.650-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:34.796-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:34.850-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:44.999-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:45.030-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:49.000-08:00 app received SIGTERM
2021-03-11T15:58:54.000-08:00 service fooservice has reached a steady state.
2021-03-11T15:58:55.170-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00 Terminating instance i-1234567890abcdefg
```
The logs show that the new approach allows ECS to drain connections, deregister the targets, and respect the `deregistrationDelay` (set to 1 minute in this case).
The old approach would have terminated the EC2 instance 23 seconds prior to ECS even deregistering the target, leading to 502 errors.
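The timings quoted above can be checked against the log timestamps. The following small Python calculation copies the relevant timestamps verbatim from the log. The variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the interleaved log above.
first_old_zero = datetime.fromisoformat("2021-03-11T15:57:23.438-08:00")  # OLD first reports 0 tasks
deregistered = datetime.fromisoformat("2021-03-11T15:57:46.000-08:00")    # targets deregistered
sigterm = datetime.fromisoformat("2021-03-11T15:58:49.000-08:00")         # app received SIGTERM
new_zero = datetime.fromisoformat("2021-03-11T15:58:55.210-08:00")        # NEW first reports 0 tasks

early_by = (deregistered - first_old_zero).total_seconds()  # how early OLD would have terminated
drain_window = (sigterm - deregistered).total_seconds()     # deregistration until SIGTERM
extra_wait = (new_zero - first_old_zero).total_seconds()    # extra time NEW keeps the instance

print(f"OLD would terminate {early_by:.1f}s before deregistration")  # 22.6s
print(f"deregistration -> SIGTERM: {drain_window:.0f}s")             # 63s
print(f"NEW waits {extra_wait:.1f}s longer than OLD")                # 91.8s
```

The 63-second gap between deregistration and SIGTERM is roughly consistent with the 1-minute `deregistrationDelay`. The roughly 92 extra seconds are the added termination time described under Impact.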
### Pull Request Checklist
- [x] Testing
I was not able to find any tests validating the functionality of the lambda. However, I have updated `expected.json` files to expect the new lambda function code.
- [ ] Docs - *Not Applicable*
No previously documented behavior has changed.
- [x] Title and Description
- [ ] Sensitive Modules (requires 2 PR approvers) - *Not Applicable*
### Impact
End users running ECS on EC2 with capacity provided by an ASG will see longer instance termination times. However, the process is now much safer: it respects the ALB target group's `deregistrationDelay` and reduces connection errors.
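The longer termination time follows from the hook's shape. It polls until the task count reaches zero, roughly every 10 seconds in the log above, and only then completes the lifecycle action. The following minimal Python sketch assumes a boto3 Auto Scaling client and a lifecycle event carrying `LifecycleHookName`, `AutoScalingGroupName`, and `LifecycleActionToken`. `wait_then_continue` and `count_live_tasks` are hypothetical names. The 10-second default is inferred from the log spacing, not taken from the merged code.

```python
import time
from typing import Any, Callable


def wait_then_continue(
    autoscaling: Any,
    lifecycle_event: dict,
    count_live_tasks: Callable[[], int],
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until no live tasks remain, then let the ASG terminate the instance.

    Hypothetical helper; returns the number of polls performed.
    """
    polls = 0
    while True:
        polls += 1
        if count_live_tasks() == 0:
            break
        sleep(poll_seconds)
    # Signal the Auto Scaling group that the lifecycle hook may proceed.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=lifecycle_event["LifecycleHookName"],
        AutoScalingGroupName=lifecycle_event["AutoScalingGroupName"],
        LifecycleActionToken=lifecycle_event["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
    )
    return polls
```

Because the wait is now bounded only by how long the tasks take to stop, a loop like this needs the Lambda timeout and the lifecycle hook's timeout to leave room for the deregistration delay.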
----
*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Parent commit: 62a91b7
17 files changed (+44, -24) under `packages/@aws-cdk`:
- `aws-ecs-patterns/test/ec2`
- `aws-ecs/lib/drain-hook/lambda-source`
- `aws-ecs/test/ec2`
- `aws-events-targets/test/ecs`
- `aws-stepfunctions-tasks/test/ecs`