-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug where pause container was not always cleaned up #2824
Fix bug where pause container was not always cleaned up #2824
Conversation
8f97f8b
to
23db43c
Compare
// checkTearDownPauseContainer idempotently tears down the pause container network when the pause container's known | ||
//or desired status is stopped. | ||
func (engine *DockerTaskEngine) checkTearDownPauseContainer(task *apitask.Task) { | ||
for _, container := range task.Containers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of iterating through all containers, can we check if task is not awsvpc enabled (GetENI == nil) and return quickly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it time or readability that would be improved? In terms of time, there can be a max of 10 containers so we would gain a few nanoseconds :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a practice, just like to return quickly wherever possible :) also, containers count could be increased in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. changed.
23db43c
to
4229491
Compare
Summary
This fixes a bug that can prevent the pause container network to be cleaned up when ECS Agent is restarted due to
dockerd
restarts. Failing to clean the pause container network might cause issues for future tasks to start.When
dockerd
is restarted, all running containers (along with ECS Agent) are stopped. Afterdockerd
and ECS Agent restart successfully, the latter proceeds to stop all the tasks that were running previous to the restart since it realizes the respective containers are no longer running. When this happens, the pause container network is not cleaned up since the workflow is different to the normal operations.Implementation details
Do a check after the managed tasks is stopped to see if the pause container still needs to be cleaned up. Most of the times the pause container has already been cleaned up by the time the managed task stops, but as mentioned above, there are scenarios where this might not happen.
Since now we are invoking container cleanup in more than one place (previously it only happened in
stopContainer
method), the operation was made idempotent by storing a flag in the container to indicate whether that container was torn down.For now that flag is only used for the pause container, but might be useful for other containers that need explicit teardown in the future.
Testing
There are a number of unit and integration tests that check for proper pause container cleanup. Those tests should still pass since the previous logic is not modified and a few new assertions are added to check if the new tear down flag was set.
In addition, manually tested that pause container network is cleaned under normal circumstances as well as after docker restarts.
New tests cover the changes: yes
Description for the changelog
Fix bug where pause container was not always cleaned up
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.