Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the issue where agent could create the same container twice #939

Merged
merged 1 commit into from
Sep 13, 2017

Conversation

richardpen
Copy link

@richardpen richardpen commented Aug 23, 2017

Summary

Fix the issue if agent was restarted before the docker create API returned, the agent could create duplicate containers for the task.

Implementation details

Originally the agent stores the generated container name in memory before actually calling CreateContainer API, see here. And this information was only stored in the taskToID struct, due to lacking of docker ID here. But the taskToID struct wasn't stored in the state file by statemanager, so when the agent restart it will lose this information and then it will try to create the container with another name again.

This PR make sure the the generated container name can be restored from agent state file if agent restarted before the CreateContainer API returned. Thus ensure the container won't be created twice. Also the agent would detect the container if it was created during the period of restarting, see here.

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -out amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass
  • Manually verified the state when agent was restarted at the create.

New tests cover the changes:
yes

Description for the changelog

Bug - Fix an issue where a container could be created twice.

Licensing

This contribution is under the terms of the Apache 2.0 License:
yes

@@ -236,6 +236,7 @@ func (engine *DockerTaskEngine) synchronizeState() {
log.Warn("Could not find matching container for expected", "name", cont.DockerName)
} else {
cont.DockerID = describedCont.ID
cont.Container.SetKnownStatus(api.ContainerCreated)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to look here and set the state based on what comes back. It could be created, running, or stopped.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it falls into this condition dockerID=="", the container status shouldn't be > created. Otherwise the dockerID field won't be empty. But have a check here will definitely add some confidence.

@@ -562,9 +563,17 @@ func (engine *DockerTaskEngine) createContainer(task *api.Task, container *api.C
// we have to do this in create, not start, because docker no longer handles
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this comment about HostConfig above line 579 instead of here? It's less clear what the comment is applying to now.

Copy link
Contributor

@aaithal aaithal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. please address @samuelkarp's comment.

for _, v := range containerMap {
if v.Container.Name == container.Name {
dockerContainerName = v.DockerName
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should/can add a break here as well.

@richardpen richardpen force-pushed the fix-duplicate-create branch 4 times, most recently from 7bb0ffa to 1bec8f9 Compare August 29, 2017 20:50
@@ -250,18 +250,25 @@ func (state *DockerTaskEngineState) AddContainer(container *api.DockerContainer,
}

if container.DockerID != "" {
// Update the container id to the state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a unit test covering the if and else if code paths?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

@jhaynes
Copy link
Contributor

jhaynes commented Aug 31, 2017

Can you update the CHANGELOG.md also as part of this PR?

@richardpen richardpen force-pushed the fix-duplicate-create branch from 1bec8f9 to bd3883a Compare August 31, 2017 23:26
CHANGELOG.md Outdated
@@ -1,5 +1,8 @@
# Changelog

##
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave this as blank? We could consider adding a tentative version string with a * annotation to indicate that this is not yet released (Example: 1.14.5* or 1.15.0*). Or, we can add a placeholder release tag (Example: UNRELEASED). What do you think?

cc @jhaynes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with a placeholder for now, and we'll fill in the actual version as part of the release process. "UNRELEASED", "IN-PROGRESS", or "PENDING" would be my suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote "UNRELEASED"


// TestAddContainer tests first add container with docker name and
// then add the container with dockerID
func TestAddContainer(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename the test to be more descriptive? You can use what you have written in the comments as well. Example:TestAddContainerNameAndID.

@richardpen richardpen force-pushed the fix-duplicate-create branch 2 times, most recently from 0a62385 to b18e674 Compare September 1, 2017 17:52
Copy link
Contributor

@aaithal aaithal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the minor comments I made before merging this.

CHANGELOG.md Outdated
@@ -1,5 +1,8 @@
# Changelog

## UNRELEASED
* Bug - Fix a race condition where a container can be created twice when agent restart. [#939](https://github.com/aws/amazon-ecs-agent/pull/939)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: restarts

// we die before 'createContainer' returns because we can inspect by
// name
engine.state.AddContainer(&api.DockerContainer{DockerName: dockerContainerName, Container: container}, task)
seelog.Infof("Created container name mapping for task %s - %s -> %s", task, container, dockerContainerName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change task to task.Arn? We don't need the whole task to be logged here.

    Save the generated docker container name and when agent restart
    restore the agent state, when agent try to create the container it
    will reuse the name so that no duplicate container will be created.
@richardpen richardpen merged commit 49c36c7 into aws:dev Sep 13, 2017
@jhaynes jhaynes mentioned this pull request Sep 30, 2017
@richardpen richardpen deleted the fix-duplicate-create branch November 21, 2017 01:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants