engine: wire context into methods that invoke Docker API call #1329

Merged (1 commit) Apr 13, 2018

Conversation

@haikuoliu (Contributor) commented Apr 4, 2018

This PR fixes issue #1212.

Summary

Pass the context into the Docker API calls instead of creating a context inside each Docker API call.
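
A minimal sketch of the before/after shape (simplified, with illustrative names; not a verbatim excerpt from the change):

// Before: the client method created its own context internally.
func (dg *dockerGoClient) StopContainer(dockerID string, timeout time.Duration) DockerContainerMetadata {
	ctx, cancel := context.WithTimeout(context.TODO(), timeout)
	defer cancel()
	// ... issue the Docker API call with ctx ...
}

// After: the caller passes a parent context (e.g. the task engine's ctx) and the
// method derives the per-call timeout context from it, so cancelling the parent
// also cancels the in-flight Docker call.
func (dg *dockerGoClient) StopContainer(ctx context.Context, dockerID string, timeout time.Duration) DockerContainerMetadata {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	// ... issue the Docker API call with ctx ...
}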

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -out amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes: yes

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@petderek (Contributor) left a comment

Thanks for doing this!

ctx, cancel := context.WithTimeout(context.TODO(), timeout)
defer cancel()

timeout time.Duration,
Contributor

We don't need this 'timeout'; we have the context. It's only used in a log function, so let's remove it from the function signature.

Contributor Author

When returning a timeout error, the timeout value is needed. How would we get this value?

Contributor

If we really want to print the timeout as part of the error, we should assemble it in the caller (which already has both of these pieces of information).

Contributor Author

I think logging the timeout is important, because most of the timeout values are not defined by users, so most of the time they have no idea how long they've waited for the API call.

But assembling the error in the caller requires at least three lines of the same code, and this same piece of code would be spread wherever the Docker client API is called, and would require more code change. What do you think?
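
For illustration, the caller-side assembly under discussion would look roughly like this at every call site (hypothetical sketch, not code from the PR):

metadata := client.CreateContainer(ctx, config, hostConfig, dockerContainerName)
if ctx.Err() == context.DeadlineExceeded {
	metadata.Error = &DockerTimeoutError{createContainerTimeout, "created"}
}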

Contributor

You're right. I looked at the calling class (which is embedded in the transition function map) and I don't think it makes sense to change that at this point.

Contributor Author

OK, cool, I'll make the change accordingly.

Contributor

For now, let's update the documentation for all of these methods to say that timeout is only used to generate the error and shouldn't be used for anything else. We can also add a TODO for changing these methods to instead accept an error type or error string as a param to use when a timeout error occurs.
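
Something along these lines for the interface declarations (a sketch of the requested doc comment and TODO, not the final wording):

// StopContainer stops the container identified by the name provided. The timeout
// parameter is only used to assemble the returned timeout error message and
// should not be used for anything else.
// TODO: accept an error type or error string as a param instead of the timeout.
StopContainer(string, time.Duration, context.Context) DockerContainerMetadata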

Contributor Author

OK, sounds good.

defer cancel()

timeout time.Duration,
ctx context.Context) DockerContainerMetadata {
Contributor

I don't think switching on this error type is a good abstraction. The caller of this function holds onto the context -- they already know if the context was cancelled instead of timed out.

We should just return the same kind of error here -- that will simplify your testing in this case.

 	case <-ctx.Done():
 		err := ctx.Err()
 		if err == context.DeadlineExceeded {
 			return DockerContainerMetadata{Error: &DockerTimeoutError{timeout, "created"}}
 		}
                return DockerContainerMetadata{Error: &CannotCreateContainerError{err}}

Contributor Author

Sounds reasonable, so can I just return the timeout error here without checking the error type from the context?

Contributor

Precisely!

StartContainer(string, time.Duration) DockerContainerMetadata
// StartContainer starts the container identified by the name provided. A timeout value and a context should be
// provided for the request.
StartContainer(string, time.Duration, context.Context) DockerContainerMetadata
Contributor

Actually, this is all repetitive as far as I can tell. Let's get rid of time.Duration in all of the signatures.

Contributor Author

Please see the comments above.

@richardpen left a comment

Can you explain which go routine won't return?

defer mockCtrl.Finish()
mockDone := make(chan struct{})
mockContext := NewMockContext(mockCtrl)
mockContext.EXPECT().Done().Return(mockDone).AnyTimes()


Why are those AnyTimes? Shouldn't this be called just once or twice?

Contributor Author

I think it should be called twice, because CreateContainer will also call InspectContainer, where a new child context is generated. But the tests using the mock context will be deleted anyway.

Contributor

My opinion: the number of times the context is invoked is an implementation detail and not part of the tested function's contract, i.e. it's not something we would want the test to fail over.

Contributor Author

Yes, I agree with Derek's opinion. I hadn't anticipated this kind of problem before the implementation.

@haikuoliu (Contributor Author) commented Apr 5, 2018

Here is the situation where the goroutine problem shows up in tests, taking TestCreateContainerTimeout as an example.

In CreateContainer (call it goroutine1), another goroutine (call it goroutine2) is created to execute createContainer, and goroutine2 performs the real Docker API call, the one that we've mocked out. The problem is that goroutine2 keeps running after the mocked Docker API call is done, and thus calls containerMetadata and InspectContainer, which results in a missing-call error if we don't handle this correctly.

The previous implementation used a channel to make goroutine2 hang there forever, to be eventually killed by GC (I guess), but it's not a good thing to keep so many goroutines around, so my following commit will make sure that goroutine2 exits appropriately.
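
Roughly, the shape of the fix is to let goroutine2 finish and exit on its own instead of parking it on a channel forever. A sketch of the pattern (not the agent's exact code):

func (dg *dockerGoClient) CreateContainer(ctx context.Context /* , ... */, timeout time.Duration) DockerContainerMetadata {
	response := make(chan DockerContainerMetadata, 1) // buffered, so goroutine2 never blocks on send
	go func() {
		// goroutine2: performs the real (or mocked) Docker API call
		response <- dg.createContainer(ctx /* , ... */)
	}()
	select {
	case metadata := <-response:
		return metadata
	case <-ctx.Done():
		// goroutine1 returns the timeout error immediately; goroutine2 still
		// exits on its own once createContainer returns.
		return DockerContainerMetadata{Error: &DockerTimeoutError{timeout, "created"}}
	}
}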

@haikuoliu haikuoliu changed the title [WIP] engine: wire context into methods that invokes Docker API call [WIP] engine: wire context into methods that invoke Docker API call Apr 5, 2018
@haikuoliu haikuoliu force-pushed the branch_context_timeout branch from 5319a93 to 26a5aaf on April 6, 2018 02:18
@haikuoliu (Contributor Author)

Update: use a real context in the tests; for the Docker API calls, return the timeout error directly when the context is done.

@haikuoliu haikuoliu changed the title [WIP] engine: wire context into methods that invoke Docker API call engine: wire context into methods that invoke Docker API call Apr 6, 2018
CHANGELOG.md Outdated
@@ -1,6 +1,7 @@
# Changelog

## 1.17.3-dev
* Enhancement - Wire context into methods that invoke Docker API call to fix flaky tests [#1329](https://github.com/aws/amazon-ecs-agent/pull/1321)
@yhlee-aws (Contributor) Apr 6, 2018

I don't think we normally update Changelog for internal test fixes like this. Also, 1.17.3 just got released so it shouldn't be under 1.17.3-dev.

Contributor

yeah, only things that impact the customer go into the changelog. I don't think this warrants an entry in there.

Contributor Author

OK, I will remove it.

@@ -117,8 +118,10 @@ func (manager *metadataManager) Create(config *docker.Config, hostConfig *docker

// Update updates the metadata file after container starts and dynamic metadata is available
func (manager *metadataManager) Update(dockerID string, task *api.Task, containerName string) error {
ctx, cancel := context.WithTimeout(context.TODO(), inspectContainerTimeout)
Contributor

You should modify Update( to expect an arg for context.Context as well. The DockerTaskEngine struct already has a context, which it can pass in here.

Contributor Author

OK, will change accordingly.

ctx, cancel := context.WithTimeout(context.TODO(), timeout)
defer cancel()

timeout time.Duration,
Contributor

For now, let's update the documentation for all of these methods to say that timeout is only used to generate the error and shouldn't be used for anything else. We can also add a TODO for changing these methods to instead accept an error type or error string as a param to use when a timeout error occurs.

// Context was canceled even though there was no timeout. Send
// back an error.
return DockerContainerMetadata{Error: &CannotCreateContainerError{err}}
// Context has expired, send back the DockerTimeoutError
@aaithal (Contributor) Apr 6, 2018

Why was this modified? I think there's a case to be made for the previous code, where we distinguish between cancelled and timed out contexts.

Contributor Author

As Derek said, when the caller wants to cancel the context, the caller already knows the reason; the logic that handles timeout vs. cancel does nothing but report the reason in the error.

Contributor

That's not strictly true. If you look at https://github.com/aws/amazon-ecs-agent/pull/1329/files#diff-b9aa12c3425c34d6f330f28407f779f6R835, you'll see that the caller (owner of the context) is not the entity that's dealing with the error. I think there's value in the way we were previously handling errors until we figure out a better alternative.

Contributor Author

OK, cool, will change accordingly.

StopContainer(string, time.Duration) DockerContainerMetadata
// StopContainer stops the container identified by the name provided. A timeout value and a context should be provided
// for the request.
StopContainer(string, time.Duration, context.Context) DockerContainerMetadata
Contributor

Here, and in all methods, please make sure that context.Context is the first parameter to all the methods (which is the convention in Go).
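
i.e., something like this (a sketch of the requested ordering, not the final signatures):

StartContainer(context.Context, string, time.Duration) DockerContainerMetadata
StopContainer(context.Context, string, time.Duration) DockerContainerMetadata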

Contributor Author

OK

@haikuoliu haikuoliu force-pushed the branch_context_timeout branch from 26a5aaf to 0d56a84 on April 8, 2018 06:07
@haikuoliu (Contributor Author)

Update: revised some function signatures and comments of the Docker client API, the changelog, and the Update function in manager.go.

@@ -361,18 +360,20 @@ func TestCreateContainerTimeout(t *testing.T) {
mockDocker, client, _, _, _, done := dockerClientSetup(t)
defer done()

warp := make(chan time.Time)
ctx, cancel := context.WithTimeout(context.TODO(), xContainerLongTimeout)
defer cancel()
Contributor

Since we expect the mock call to happen and we do a cancel() there, this cancel() can be removed.

Contributor Author

OK, will remove it.

if err == context.DeadlineExceeded {
return &DockerTimeoutError{removeContainerTimeout, "removing"}
}
return &CannotRemoveContainerError{err}
Contributor

Since we are not using this error type anymore, it can be removed.

Contributor

Giving it some thought, you can capture the error here and do

if err != nil {
    return &CannotRemoveContainerError{err}
}

@sharanyad sharanyad added this to the 1.18.0 milestone Apr 9, 2018
@haikuoliu haikuoliu force-pushed the branch_context_timeout branch from 0d56a84 to 25f0391 on April 10, 2018 17:13
@haikuoliu (Contributor Author) commented Apr 10, 2018

Update: as suggested by @aaithal, added the if check for the timeout error back in the docker client, and kept the old timeout tests because we don't want to simulate the context timeout error using cancel. Also rebased onto the dev branch.

@@ -346,8 +348,9 @@ func (engine *DockerTaskEngine) synchronizeContainerStatus(container *api.Docker
}
return
}

currentState, metadata := engine.client.DescribeContainer(container.DockerID)
ctx, cancel := context.WithCancel(engine.ctx)


DescribeContainer basically calls InspectContainer; you may need to pass inspectContainerTimeout into the context. Same below.

Contributor Author

After the update for this PR is done, there is no need to pass the inspectContainerTimeout anymore.

LoadImage(io.Reader, time.Duration) error
// value and a context should be provided for the request.
RemoveImage(context.Context, string, time.Duration) error
LoadImage(context.Context, io.Reader, time.Duration) error
Contributor

minor: this'd also be a good opportunity to add documentation for LoadImage, which is missing it.

Contributor Author

ok, added.

@@ -471,18 +472,13 @@ func (dg *dockerGoClient) getAuthdata(image string, authData *ecr.RegistryAuthen
return authConfig, nil
}

func (dg *dockerGoClient) CreateContainer(config *docker.Config,
// The timeout parameter here is only used to assemble the return error message
Contributor

lint: Comment should start with // CreateContainer ... This applies to multiple methods in this file

Contributor Author

ok, done.

@@ -345,7 +345,9 @@ func (imageManager *dockerImageManager) deleteImage(imageID string, imageState *
return
}
seelog.Infof("Removing Image: %s", imageID)
err := imageManager.client.RemoveImage(imageID, removeImageTimeout)
ctx, cancel := context.WithTimeout(context.Background(), removeImageTimeout)
Contributor

You shouldn't be using context.Background() here. Instead, it should be wired all the way from performPeriodicImageCleanup().

Contributor Author

Previously, the RemoveImage function of the docker client used context.Background() directly, so I used context.Background() in the caller. Do you want me to make the change?

Contributor

yes please

@@ -346,8 +348,9 @@ func (engine *DockerTaskEngine) synchronizeContainerStatus(container *api.Docker
}
return
}

currentState, metadata := engine.client.DescribeContainer(container.DockerID)
ctx, cancel := context.WithCancel(engine.ctx)
Contributor

why are you doing this? Can't you just use engine.ctx directly?

Contributor Author

Will change them all to engine.ctx.

@@ -392,7 +395,9 @@ func (engine *DockerTaskEngine) CheckTaskState(task *api.Task) {
if !ok {
continue
}
status, metadata := engine.client.DescribeContainer(dockerContainer.DockerID)
ctx, cancel := context.WithCancel(engine.ctx)
Contributor

Same question as above. Can't you just use engine.ctx directly?

Contributor Author

See comments above.

@@ -825,8 +830,9 @@ func (engine *DockerTaskEngine) createContainer(task *api.Task, container *api.C
task.Arn, container.Name, mderr)
}
}

metadata := client.CreateContainer(config, hostConfig, dockerContainerName, createContainerTimeout)
ctx, cancel := context.WithTimeout(engine.ctx, createContainerTimeout)
Contributor

This is not an optimal way of doing this. All of the responsibility for creating the child context for timing out should live within the docker_client methods. Please change all of these methods so that context creation happens within the docker_client methods; only the parent for those contexts needs to flow from here.
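
i.e., roughly this split (illustrative sketch):

// caller: only the parent context flows down
metadata := client.CreateContainer(engine.ctx, config, hostConfig, dockerContainerName, createContainerTimeout)

// docker_client: the timeout child context is created here, next to the API call
func (dg *dockerGoClient) CreateContainer(ctx context.Context, /* other args, */ timeout time.Duration) DockerContainerMetadata {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	// ... Docker API call using ctx ...
}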

Contributor Author

See comments above.

@@ -48,7 +49,9 @@ func loadFromFile(path string, dockerClient engine.DockerClient, fs os.FileSyste
return errors.Wrapf(err,
"pause container load: failed to read pause container image: %s", path)
}
if err := dockerClient.LoadImage(pauseContainerReader, engine.LoadImageTimeout); err != nil {
ctx, cancel := context.WithTimeout(context.TODO(), engine.LoadImageTimeout)
Contributor

You shouldn't need to create a new context here using context.TODO(). You should instead wire it from the "app" package (via loader.LoadImage).

Contributor Author

ok, sounds good.

@@ -123,7 +124,9 @@ func NewDockerStatsEngine(cfg *config.Config, client ecsengine.DockerClient, con

// synchronizeState goes through all the containers on the instance to synchronize the state on agent start
func (engine *DockerStatsEngine) synchronizeState() error {
listContainersResponse := engine.client.ListContainers(false, ecsengine.ListContainersTimeout)
ctx, cancel := context.WithTimeout(context.Background(), ecsengine.ListContainersTimeout)
Contributor

Here too, you shouldn't need to create a new context. Just use engine.ctx as the parent and let the child context creation happen in ListContainers

Contributor Author

There is no ctx attribute on the stats engine, and creating a context for just one method seems redundant. So instead of creating a timeout context from the background context, I will pass the background context directly to the method.

Contributor

You can create one :) Let it be set from app.ctx.
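
i.e., something like this (sketch; the exact signature is illustrative):

// the stats engine keeps a context handed in at init time, rooted in the app's context
func (engine *DockerStatsEngine) MustInit(ctx context.Context /* , ... */) error {
	engine.ctx = ctx
	// ...
}

// later calls then derive from engine.ctx instead of context.Background():
listContainersResponse := engine.client.ListContainers(engine.ctx, false, ecsengine.ListContainersTimeout)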

Contributor Author

ok, cool

@petderek (Contributor) left a comment

Again, thanks for doing this 😄

@haikuoliu haikuoliu force-pushed the branch_context_timeout branch from 25f0391 to ce2ea51 on April 11, 2018 03:47
@haikuoliu (Contributor Author)

Update: added a ctx attribute to the stats engine; moved the creation of the timeout context from the caller into the docker client API; wired the context from the caller instead of using context.TODO() or context.Background().

@aaithal (Contributor) left a comment

Overall looks good. I have some comments related to tests.

if err != nil {
return api.ContainerStatusNone, DockerContainerMetadata{Error: CannotDescribeContainerError{err}}
}
return dockerStateToState(dockerContainer.State), metadataFromContainer(dockerContainer)
}

func (dg *dockerGoClient) InspectContainer(dockerID string, timeout time.Duration) (*docker.Container, error) {
// InspectContainer uses the timeout parameter only to assemble the return error message
// TODO: replace the timeout with a string type error message
Contributor

Can you get rid of this TODO (and others that were added)? It's out of date now.

Contributor Author

This is the one that I added. We talked about changing the timeout parameter into some kind of error message, and adding a TODO for that purpose.

Contributor

Right, that was when timeout was only being used for generating error messages. But now it's actually used for creating the context, so that's not needed anymore.

Contributor Author

Oh I see, you are right!

@@ -957,7 +955,7 @@ func (engine *DockerTaskEngine) buildCNIConfigFromTaskContainer(task *api.Task,
if !ok {
return nil, errors.New("engine: failed to find the pause container")
}
containerInspectOutput, err := engine.client.InspectContainer(pauseContainer.DockerName, inspectContainerTimeout)
containerInspectOutput, err := engine.client.InspectContainer(engine.ctx, pauseContainer.DockerName, inspectContainerTimeout)
Contributor

nit: this can be split into multiple lines to improve readability

Contributor Author

OK

@@ -363,28 +362,19 @@ func TestCreateContainerTimeout(t *testing.T) {
defer done()

warp := make(chan time.Time)
Contributor

Is warp used/effective anymore? Can we get rid of that as well (here and in other methods)?

Contributor Author

It's used to keep the goroutine from finishing so that we don't have to handle the logic after the mocked function returns. I will remove warp and add a return value for the mocked function instead.

@@ -479,21 +468,17 @@ func TestStopContainerTimeout(t *testing.T) {
defer done()

warp := make(chan time.Time)
wait := &sync.WaitGroup{}
wait.Add(1)
mockDocker.EXPECT().StopContainerWithContext("id", uint(client.config.DockerStopTimeout/time.Second), gomock.Any()).Do(func(x, y, z interface{}) {
Contributor

I think this needs .MaxTimes(1) as well, and so do all the other EXPECT() calls where we are actually letting these timeouts fire (wherever xContainerShortTimeout is used).

Contributor Author

OK, added .MaxTimes(1) to all timeout tests in docker client.

}).MaxTimes(1)
metadata := client.CreateContainer(config.Config, nil, config.Name, xContainerShortTimeout)
metadata := client.CreateContainer(context.TODO(), config.Config, nil, config.Name, xContainerShortTimeout)
Contributor

In all of the tests where you have passed context.TODO() directly, please modify those to something like this:

ctx, cancel := context.WithCancel(context.TODO())
defer cancel()

// pass ctx

Otherwise, we'll have goroutine leaks because of uncancelled contexts in edge cases.

Contributor Author

OK, sure.

imageManager.client.RemoveImage(test1Image1Name, imageRemovalTimeout)
imageManager.client.RemoveImage(test1Image2Name, imageRemovalTimeout)
imageManager.client.RemoveImage(test1Image3Name, imageRemovalTimeout)
imageManager.client.RemoveContainer(context.TODO(), "test1", removeContainerTimeout)
Contributor

Just create context once as per my earlier comments and use it everywhere.

Contributor Author

OK

imageManager.client.RemoveImage(test2Image1Name, imageRemovalTimeout)
imageManager.client.RemoveImage(test2Image2Name, imageRemovalTimeout)
imageManager.client.RemoveImage(test2Image3Name, imageRemovalTimeout)
imageManager.client.RemoveContainer(context.TODO(), "test1", removeContainerTimeout)
Contributor

Just create context once as per my earlier comments and use it everywhere.

Contributor Author

OK

client.EXPECT().RemoveImage(container.Image, removeImageTimeout).Return(errors.New("error removing image")).AnyTimes()
imageManager.removeUnusedImages()
client.EXPECT().RemoveImage(gomock.Any(), container.Image, removeImageTimeout).Return(errors.New("error removing image")).AnyTimes()
imageManager.removeUnusedImages(context.TODO())
Contributor

Same comment as in the other file about using cancellable context everywhere in this file

@@ -52,7 +53,7 @@ func TestLoadFromFileWithReaderError(t *testing.T) {
mockfs := mock_os.NewMockFileSystem(ctrl)
mockfs.EXPECT().Open(pauseTarballPath).Return(nil, errors.New("Dummy Reader Error"))

err = loadFromFile(pauseTarballPath, client, mockfs)
err = loadFromFile(context.TODO(), pauseTarballPath, client, mockfs)
Contributor

Same comment as in the other file about using cancellable context here

@@ -86,7 +87,7 @@ func TestStatsEngineWithExistingContainersWithoutHealth(t *testing.T) {

// Simulate container start prior to listener initialization.
time.Sleep(checkPointSleep)
err = engine.MustInit(taskEngine, defaultCluster, defaultContainerInstance)
err = engine.MustInit(context.TODO(), taskEngine, defaultCluster, defaultContainerInstance)
Contributor

Same comment as in the other file about using cancellable context here

@@ -52,6 +52,7 @@ func TestStatsEngineAddRemoveContainers(t *testing.T) {
mockDockerClient.EXPECT().Stats(gomock.Any(), gomock.Any()).Return(mockStatsChannel, nil).AnyTimes()

engine := NewDockerStatsEngine(&cfg, nil, eventStream("TestStatsEngineAddRemoveContainers"))
engine.ctx = context.TODO()
Contributor

Same comment as in the other file about using cancellable context here

@haikuoliu (Contributor Author)

Update: addressed all of the feedback on the code comments and tests.

@aaithal (Contributor) left a comment

Almost there. Still have a couple of comments. Also, can you make sure that conflicts are resolved and all the tests are passing?

@@ -71,7 +71,9 @@ func TestImagePullRemoveDeadlock(t *testing.T) {
}()

go func() {
imageManager.(*dockerImageManager).removeUnusedImages()
ctx, cancel := context.WithCancel(context.TODO())
Contributor

Can you move the context creation and the defer cancel() outside of the goroutine?

Contributor Author

OK

@@ -1055,7 +1073,9 @@ func TestConcurrentRemoveUnusedImages(t *testing.T) {
for i := 0; i < numRoutines; i++ {
go func() {
<-ok
imageManager.removeUnusedImages()
ctx, cancel := context.WithCancel(context.TODO())
Contributor

Can you move the context creation and the defer cancel() outside of the for loop?

Contributor Author

OK, only one context should be created and used here.

@@ -143,7 +146,7 @@ func TestStatsEngineWithNewContainersWithoutHealth(t *testing.T) {
},
testTask)

err = engine.MustInit(taskEngine, defaultCluster, defaultContainerInstance)
err = engine.MustInit(context.TODO(), taskEngine, defaultCluster, defaultContainerInstance)
Contributor

You should use a cancellable context here similar to other test files (applies throughout this file)

Contributor Author

OK, there are a lot of context creations in the tests and I missed some; I will double-check.

@haikuoliu haikuoliu force-pushed the branch_context_timeout branch from d073713 to 7fc5ffe on April 13, 2018 01:17
@haikuoliu (Contributor Author)

Update: minor refactoring and conflict resolution; all tests (including functional and integration tests) pass.

@aaithal (Contributor) left a comment

super glad that we paid this particular tech debt! :shipit: 🚀 🚀

@haikuoliu haikuoliu merged commit b251bca into aws:dev Apr 13, 2018
@haikuoliu (Contributor Author)

Just pushed the commit, thank you guys!👍
