Adding PartOf in ECS Service's Docker Service dependency #4277
Summary
Update `ecs.service` to behave more like it did before 1.85.0 while keeping the critical resiliency improvements introduced in #4233.
Implementation details
Added `PartOf=docker.service` to the following files:
- packaging/amazon-linux-ami-integrated/ecs.service
- packaging/generic-deb-integrated/debian/ecs.service
- packaging/generic-rpm-integrated/ecs.service
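For reference, here is a minimal sketch of where the directive sits in each unit file. Only the `PartOf=docker.service` line comes from this change. The rest of the packaged `[Unit]` section is not reproduced here and is elided as a comment. Per the systemd.unit documentation, `PartOf=` propagates stop and restart actions from `docker.service` to this unit, but not start actions, and the dependency is one-way.

```ini
[Unit]
# ...existing [Unit] directives from the packaged ecs.service are unchanged and omitted here...
# Added by this change: stopping or restarting docker.service also stops or restarts ecs.service.
# PartOf= does not propagate start, so starting docker.service alone does not start ecs.service.
PartOf=docker.service
```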
This addition ensures that `ecs.service` stops when Docker is stopped manually and does not automatically come back up. It also ensures that restarting Docker restarts `ecs.service`, as it did before 1.85.0.
Testing
Setup
Built a custom AMI using `amzn2-ami-ecs-hvm-2.0.20230428-x86_64-ebs` as a template.
Behavior
Verified that the behavior matches pre-1.85.0 behavior without reverting the resiliency changes.
Resiliency
Simulated the race condition behind the issue addressed in #4233.
Simulated a Docker socket failure while launching ECS by running the following command on a test instance launched from the specified custom AMI:
```sh
sudo systemctl start ecs & sudo systemctl restart docker.socket &
```
This reproduces the failure because the two commands run in parallel, so `ecs.service` occasionally fails when `docker.socket` is restarting at the moment ECS checks for its Docker dependency.
The result was a success: `ecs.service` came back up automatically. In comparison, the old AMI without the updated package was not able to bring `ecs.service` back up in the same failure scenario.
Load Testing
Launched 15000 instances from the specified custom AMI and verified that all 15000 of them registered with ECS.
New tests cover the changes: yes
Description for the changelog
Revert ecs.service behavior while maintaining resiliency
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.