engine: wire context into methods that invoke Docker API call #1329
Conversation
Thanks for doing this!
agent/engine/docker_client.go
Outdated
ctx, cancel := context.WithTimeout(context.TODO(), timeout)
defer cancel()

timeout time.Duration,
We don't need this timeout -- we have the context. It's only used in a log function -- let's remove it from the function signature.
When returning a timeout error, the timeout value is needed; how do we get this value?
If we really want to print the timeout as part of the error, we should assemble it in the caller (which already has both of these pieces of information).
I think logging the timeout is important, because most of the timeout values are not defined by users, so most of the time they have no idea how long they've waited for the API call.
But assembling the error in the caller requires at least three lines of the same code, and that same piece of code would be spread wherever the Docker client API is called, and it would require more code change. What do you think?
You're right. I looked at the calling class (which is embedded in the transition function map) and I don't think it makes sense to change that at this point.
OK, cool, I'll make the change accordingly.
For now, let's update the documentation for all of these methods to say that timeout is only used to generate the error and shouldn't be used for anything else. We can also add a TODO for changing these methods to instead accept an error type or error string as a param to use when a timeout error occurs.
OK, sounds good.
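For illustration, the documentation and TODO being discussed might look roughly like this on the dockerGoClient method (parameter order matches the diff above; the wording is a sketch, not necessarily the exact comment that landed in the PR):

// CreateContainer creates a container in the Docker daemon. The timeout
// parameter here is only used to assemble the return error message and
// shouldn't be used for anything else.
// TODO: change this method to accept an error type or error string as a
// param to use when a timeout error occurs.
func (dg *dockerGoClient) CreateContainer(config *docker.Config,
	hostConfig *docker.HostConfig,
	name string,
	timeout time.Duration,
	ctx context.Context) DockerContainerMetadata {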
agent/engine/docker_client.go
Outdated
defer cancel()

timeout time.Duration,
ctx context.Context) DockerContainerMetadata {
I don't think switching on this error type is a good abstraction. The caller of this function holds onto the context -- they already know if the context was cancelled instead of timed out.
We should just return the same kind of error here -- that will simplify your testing in this case.
case <-ctx.Done():
err := ctx.Err()
if err == context.DeadlineExceeded {
return DockerContainerMetadata{Error: &DockerTimeoutError{timeout, "created"}}
}
return DockerContainerMetadata{Error: &CannotCreateContainerError{err}}
Sounds reasonable, so can I just return the timeout error here without checking the error type from the context?
Precisely!
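A minimal sketch of the simplified handling agreed on here (note that the DeadlineExceeded check gets added back later in this review): when the context is done, return the timeout error without inspecting ctx.Err(), since the caller owns the context and already knows whether it cancelled it or let it expire.

case <-ctx.Done():
	// The caller holds the context and can tell cancellation from timeout
	// itself, so just report a timeout here.
	return DockerContainerMetadata{Error: &DockerTimeoutError{timeout, "created"}}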
agent/engine/docker_client.go
Outdated
StartContainer(string, time.Duration) DockerContainerMetadata
// StartContainer starts the container identified by the name provided. A timeout value and a context should be
// provided for the request.
StartContainer(string, time.Duration, context.Context) DockerContainerMetadata
Actually, this is all repetitive as far as I can tell. Let's get rid of time.Duration in all of the signatures.
Please see the comments above.
Can you explain which go routine won't return?
agent/engine/docker_client_test.go
Outdated
defer mockCtrl.Finish()
mockDone := make(chan struct{})
mockContext := NewMockContext(mockCtrl)
mockContext.EXPECT().Done().Return(mockDone).AnyTimes()
Why are those AnyTimes()? Shouldn't this be called just once or twice?
I think it should be called twice, because CreateContainer will also call InspectContainer, where a new child context will be generated. But the test using the mock context will be deleted anyway.
My opinion: the number of times the context is invoked is an implementation detail and not part of the tested function's contract. I.e., it's not something we would want the test to fail for.
Yes, I agree with Derek's opinion; I didn't anticipate this kind of problem before the implementation.
Situation where the goroutine problem happens in the test: the previous implementation uses a channel to make the
Force-pushed from 5319a93 to 26a5aaf
Update: use real context in the test; for docker api call, return timeout error directly when context is done
CHANGELOG.md
Outdated
@@ -1,6 +1,7 @@
# Changelog

## 1.17.3-dev
* Enhancement - Wire context into methods that invoke Docker API call to fix flaky tests [#1329](https://github.com/aws/amazon-ecs-agent/pull/1321)
I don't think we normally update Changelog for internal test fixes like this. Also, 1.17.3 just got released so it shouldn't be under 1.17.3-dev.
yeah, only things that impact the customer go into the changelog. I don't think this warrants an entry in there.
OK, I will remove it.
agent/containermetadata/manager.go
Outdated
@@ -117,8 +118,10 @@ func (manager *metadataManager) Create(config *docker.Config, hostConfig *docker

// Update updates the metadata file after container starts and dynamic metadata is available
func (manager *metadataManager) Update(dockerID string, task *api.Task, containerName string) error {
ctx, cancel := context.WithTimeout(context.TODO(), inspectContainerTimeout)
You should modify Update() to expect a context.Context arg as well. The DockerTaskEngine struct already has a context, which it can pass in here.
OK, will change accordingly.
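A sketch of the signature change being asked for here, assuming the rest of the parameter list stays as in the diff above:

// Update updates the metadata file after the container starts and dynamic metadata is available.
// The context comes from the caller (DockerTaskEngine already holds one) rather than being
// created here from context.TODO().
func (manager *metadataManager) Update(ctx context.Context, dockerID string, task *api.Task, containerName string) error {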
agent/engine/docker_client.go
Outdated
// Context was canceled even though there was no timeout. Send
// back an error.
return DockerContainerMetadata{Error: &CannotCreateContainerError{err}}
// Context has expired, send back the DockerTimeoutError
Why was this modified? I think there's a case to be made for the previous code, where we distinguish between cancelled and timed out contexts.
As Derek said, when the caller wants to cancel the context, the caller already knows the reason; the logic to handle the timeout or cancel does nothing but report the reason for the error.
That's not strictly true. If you look at https://github.com/aws/amazon-ecs-agent/pull/1329/files#diff-b9aa12c3425c34d6f330f28407f779f6R835, you'll see that the caller (owner of the context) is not the entity that's dealing with the error. I think there's value in the way we were previously handling errors until we figure out a better alternative.
OK, cool, will change accordingly.
agent/engine/docker_client.go
Outdated
StopContainer(string, time.Duration) DockerContainerMetadata
// StopContainer stops the container identified by the name provided. A timeout value and a context should be provided
// for the request.
StopContainer(string, time.Duration, context.Context) DockerContainerMetadata
Here, and in all methods, please make sure that context.Context is the first parameter (which is the convention in Go).
OK
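For illustration, the interface methods then read roughly like this, with the context moved to the front (the RemoveImage and LoadImage lines later in this conversation already follow this ordering):

// StartContainer starts the container identified by the name provided. A context and
// a timeout value should be provided for the request.
StartContainer(context.Context, string, time.Duration) DockerContainerMetadata
// StopContainer stops the container identified by the name provided. A context and
// a timeout value should be provided for the request.
StopContainer(context.Context, string, time.Duration) DockerContainerMetadata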
Force-pushed from 26a5aaf to 0d56a84
Update: some function signatures and comments of Docker client API, change log, Update function in manager.go
agent/engine/docker_client_test.go
Outdated
@@ -361,18 +360,20 @@ func TestCreateContainerTimeout(t *testing.T) {
mockDocker, client, _, _, _, done := dockerClientSetup(t)
defer done()

warp := make(chan time.Time)
ctx, cancel := context.WithTimeout(context.TODO(), xContainerLongTimeout)
defer cancel()
Since we expect the mock call to happen and we do a cancel() there, this cancel() can be removed.
OK, will remove it.
agent/engine/docker_client.go
Outdated
if err == context.DeadlineExceeded {
    return &DockerTimeoutError{removeContainerTimeout, "removing"}
}
return &CannotRemoveContainerError{err}
Since we are not using this error type anymore, it can be removed.
Giving it some thought, you can capture the error here and do
if err != nil {
return &CannotRemoveContainerError{err}
}
Force-pushed from 0d56a84 to 25f0391
Update: as suggested by @aaithal, add the if check for the timeout error back in the docker client, and go with the old timeout tests because we don't want to simulate a context timeout error using cancel. Also rebase the dev branch.
agent/engine/docker_task_engine.go
Outdated
@@ -346,8 +348,9 @@ func (engine *DockerTaskEngine) synchronizeContainerStatus(container *api.Docker
}
return
}

currentState, metadata := engine.client.DescribeContainer(container.DockerID)
ctx, cancel := context.WithCancel(engine.ctx)
DescribeContainer basically calls InspectContainer, so you may need to pass inspectContainerTimeout to the context. Same below.
After the update for this PR is done, there is no need to pass the inspectContainerTimeout anymore.
agent/engine/docker_client.go
Outdated
LoadImage(io.Reader, time.Duration) error
// value and a context should be provided for the request.
RemoveImage(context.Context, string, time.Duration) error
LoadImage(context.Context, io.Reader, time.Duration) error
minor: This'd also be a good opportunity to add documentation for LoadImage, which is missing it.
ok, added.
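The added documentation could read something like this (a sketch; the exact wording in the PR may differ):

// LoadImage loads an image from the reader provided. A timeout value and a
// context should be provided for the request.
LoadImage(context.Context, io.Reader, time.Duration) error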
agent/engine/docker_client.go
Outdated
@@ -471,18 +472,13 @@ func (dg *dockerGoClient) getAuthdata(image string, authData *ecr.RegistryAuthen
return authConfig, nil
}

func (dg *dockerGoClient) CreateContainer(config *docker.Config,
// The timeout parameter here is only used to assemble the return error message
lint: the comment should start with // CreateContainer .... This applies to multiple methods in this file.
ok, done.
agent/engine/docker_image_manager.go
Outdated
@@ -345,7 +345,9 @@ func (imageManager *dockerImageManager) deleteImage(imageID string, imageState *
return
}
seelog.Infof("Removing Image: %s", imageID)
err := imageManager.client.RemoveImage(imageID, removeImageTimeout)
ctx, cancel := context.WithTimeout(context.Background(), removeImageTimeout)
You shouldn't be using context.Background() here. Instead, it should be wired all the way from performPeriodicImageCleanup().
Previously, the RemoveImage function of the docker client used context.Background() directly, so I used context.Background() in the caller. Do you want me to make that change?
yes please
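A rough sketch of the wiring, with the bodies trimmed down and the performPeriodicImageCleanup signature assumed rather than copied from the PR: the parent context flows from the cleanup loop through removeUnusedImages into deleteImage, and only the Docker client applies removeImageTimeout.

func (imageManager *dockerImageManager) performPeriodicImageCleanup(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			imageManager.removeUnusedImages(ctx)
		case <-ctx.Done():
			return
		}
	}
}

func (imageManager *dockerImageManager) deleteImage(ctx context.Context, imageID string, imageState *image.ImageState) {
	seelog.Infof("Removing Image: %s", imageID)
	// The client derives its own timeout-bounded child context from ctx.
	if err := imageManager.client.RemoveImage(ctx, imageID, removeImageTimeout); err != nil {
		// handle the error as before
		return
	}
}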
agent/engine/docker_task_engine.go
Outdated
@@ -346,8 +348,9 @@ func (engine *DockerTaskEngine) synchronizeContainerStatus(container *api.Docker
}
return
}

currentState, metadata := engine.client.DescribeContainer(container.DockerID)
ctx, cancel := context.WithCancel(engine.ctx)
Why are you doing this? Can't you just use engine.ctx directly?
Will change them all to engine.ctx.
agent/engine/docker_task_engine.go
Outdated
@@ -392,7 +395,9 @@ func (engine *DockerTaskEngine) CheckTaskState(task *api.Task) {
if !ok {
continue
}
status, metadata := engine.client.DescribeContainer(dockerContainer.DockerID)
ctx, cancel := context.WithCancel(engine.ctx)
Same question as above. Can't you just use engine.ctx directly?
See comments above.
agent/engine/docker_task_engine.go
Outdated
@@ -825,8 +830,9 @@ func (engine *DockerTaskEngine) createContainer(task *api.Task, container *api.C
task.Arn, container.Name, mderr)
}
}

metadata := client.CreateContainer(config, hostConfig, dockerContainerName, createContainerTimeout)
ctx, cancel := context.WithTimeout(engine.ctx, createContainerTimeout)
This is not an optimal way of doing this. All of the responsibility for creating the child context that times out should live within the docker_client methods. Please change all of these methods so that context creation happens within the docker_client methods; only the parent for those contexts needs to flow from here.
See comments above.
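In other words, something along these lines: the engine only hands over its own context, and the timeout-bounded child context is created inside the docker_client method (a sketch, not the exact final code):

// In docker_client.go: the method owns the child context.
func (dg *dockerGoClient) CreateContainer(ctx context.Context, config *docker.Config,
	hostConfig *docker.HostConfig, name string, timeout time.Duration) DockerContainerMetadata {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	// ... make the API call using ctx and return its metadata ...
}

// In docker_task_engine.go: just pass the engine's context through.
metadata := client.CreateContainer(engine.ctx, config, hostConfig, dockerContainerName, createContainerTimeout)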
agent/eni/pause/pause_linux.go
Outdated
@@ -48,7 +49,9 @@ func loadFromFile(path string, dockerClient engine.DockerClient, fs os.FileSyste
return errors.Wrapf(err,
"pause container load: failed to read pause container image: %s", path)
}
if err := dockerClient.LoadImage(pauseContainerReader, engine.LoadImageTimeout); err != nil {
ctx, cancel := context.WithTimeout(context.TODO(), engine.LoadImageTimeout)
You shouldn't need to create a new context here using context.TODO(). You should instead wire it from the "app" package (via loader.LoadImage).
ok, sounds good.
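A sketch of what that could look like in pause_linux.go once the context is wired down from the app package via the loader, instead of being created locally (the body here is simplified; the reader variable and error wrapping mirror the diff above):

func loadFromFile(ctx context.Context, path string, dockerClient engine.DockerClient, fs os.FileSystem) error {
	pauseContainerReader, err := fs.Open(path)
	if err != nil {
		return errors.Wrapf(err,
			"pause container load: failed to read pause container image: %s", path)
	}
	// ctx flows in from the caller; LoadImage applies engine.LoadImageTimeout itself.
	return dockerClient.LoadImage(ctx, pauseContainerReader, engine.LoadImageTimeout)
}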
agent/stats/engine.go
Outdated
@@ -123,7 +124,9 @@ func NewDockerStatsEngine(cfg *config.Config, client ecsengine.DockerClient, con

// synchronizeState goes through all the containers on the instance to synchronize the state on agent start
func (engine *DockerStatsEngine) synchronizeState() error {
listContainersResponse := engine.client.ListContainers(false, ecsengine.ListContainersTimeout)
ctx, cancel := context.WithTimeout(context.Background(), ecsengine.ListContainersTimeout)
Here too, you shouldn't need to create a new context. Just use engine.ctx as the parent and let the child context creation happen in ListContainers.
There is no ctx attribute on the stats engine, and creating a context for just one method seems redundant. So instead of creating a timeout context from the background context, I will pass the background context directly to the method.
You can create one :) Let it be set from app.ctx.
ok, cool
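A sketch of that shape (parameter types are assumptions, not the exact PR code): the stats engine stores the context handed to MustInit from app.ctx and uses it when talking to the Docker client.

type DockerStatsEngine struct {
	ctx context.Context
	// ... existing fields ...
}

func (engine *DockerStatsEngine) MustInit(ctx context.Context, taskEngine ecsengine.TaskEngine, cluster, containerInstanceArn string) error {
	engine.ctx = ctx
	// ... existing initialization ...
	return nil
}

func (engine *DockerStatsEngine) synchronizeState() error {
	// ListContainers derives its own timeout-bounded child context from engine.ctx.
	listContainersResponse := engine.client.ListContainers(engine.ctx, false, ecsengine.ListContainersTimeout)
	// ... existing processing of listContainersResponse ...
	return nil
}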
Again, thanks for doing this 😄
Force-pushed from 25f0391 to ce2ea51
Update: add ctx attribute to the stats engine; move the creation of the timeout context from the caller to the docker client API; wire context from the caller instead of using context.TODO() or context.Background().
Overall looks good. I have some comments related to tests.
agent/engine/docker_client.go
Outdated
if err != nil {
    return api.ContainerStatusNone, DockerContainerMetadata{Error: CannotDescribeContainerError{err}}
}
return dockerStateToState(dockerContainer.State), metadataFromContainer(dockerContainer)
}

func (dg *dockerGoClient) InspectContainer(dockerID string, timeout time.Duration) (*docker.Container, error) {
// InspectContainer uses the timeout parameter only to assemble the return error message
// TODO: replace the timeout with a string type error message
Can you get rid of this TODO (and others that were added)? It's out of date now.
This is the one that I added; we talked about changing the timeout parameter into some kind of error message and adding a TODO for this purpose.
Right, that was when timeout was only being used for generating error messages. But now it's actually used for creating the context, so that's not needed anymore.
Oh I see, you are right!
agent/engine/docker_task_engine.go
Outdated
@@ -957,7 +955,7 @@ func (engine *DockerTaskEngine) buildCNIConfigFromTaskContainer(task *api.Task,
if !ok {
return nil, errors.New("engine: failed to find the pause container")
}
containerInspectOutput, err := engine.client.InspectContainer(pauseContainer.DockerName, inspectContainerTimeout)
containerInspectOutput, err := engine.client.InspectContainer(engine.ctx, pauseContainer.DockerName, inspectContainerTimeout)
nit: this can be split into multiple lines to improve readability
OK
agent/engine/docker_client_test.go
Outdated
@@ -363,28 +362,19 @@ func TestCreateContainerTimeout(t *testing.T) {
defer done()

warp := make(chan time.Time)
Is warp used/effective anymore? Can we get rid of that as well (here and in other methods)?
It's used to keep the goroutine from returning so that we don't have to handle the logic after the mocked function returns. I will remove warp and add a return value for the mocked function instead.
agent/engine/docker_client_test.go
Outdated
@@ -479,21 +468,17 @@ func TestStopContainerTimeout(t *testing.T) {
defer done()

warp := make(chan time.Time)
wait := &sync.WaitGroup{}
wait.Add(1)
mockDocker.EXPECT().StopContainerWithContext("id", uint(client.config.DockerStopTimeout/time.Second), gomock.Any()).Do(func(x, y, z interface{}) {
I think this needs .MaxTimes(1) as well, as do all other EXPECT() calls where we are actually letting these timeouts expire (wherever xContainerShortTimeout is used).
OK, added .MaxTimes(1) to all timeout tests in the docker client.
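For example, on the StopContainer expectation from the diff above, the cap might look like this (a sketch; identifiers are taken from the surrounding diff):

mockDocker.EXPECT().
	StopContainerWithContext("id", uint(client.config.DockerStopTimeout/time.Second), gomock.Any()).
	MaxTimes(1) // with xContainerShortTimeout the call may be skipped entirely, so don't require it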
agent/engine/docker_client_test.go
Outdated
}).MaxTimes(1)
metadata := client.CreateContainer(config.Config, nil, config.Name, xContainerShortTimeout)
metadata := client.CreateContainer(context.TODO(), config.Config, nil, config.Name, xContainerShortTimeout)
In all of the tests where you have passed context.TODO() directly, please modify those to be something like this:
ctx, cancel := context.WithCancel(context.TODO())
defer cancel()
// pass ctx
Otherwise, we'll have goroutine leaks because of an uncancelled context in edge cases.
OK, sure.
imageManager.client.RemoveImage(test1Image1Name, imageRemovalTimeout)
imageManager.client.RemoveImage(test1Image2Name, imageRemovalTimeout)
imageManager.client.RemoveImage(test1Image3Name, imageRemovalTimeout)
imageManager.client.RemoveContainer(context.TODO(), "test1", removeContainerTimeout)
Just create context once as per my earlier comments and use it everywhere.
OK
imageManager.client.RemoveImage(test2Image1Name, imageRemovalTimeout)
imageManager.client.RemoveImage(test2Image2Name, imageRemovalTimeout)
imageManager.client.RemoveImage(test2Image3Name, imageRemovalTimeout)
imageManager.client.RemoveContainer(context.TODO(), "test1", removeContainerTimeout)
Just create context once as per my earlier comments and use it everywhere.
OK
client.EXPECT().RemoveImage(container.Image, removeImageTimeout).Return(errors.New("error removing image")).AnyTimes()
imageManager.removeUnusedImages()
client.EXPECT().RemoveImage(gomock.Any(), container.Image, removeImageTimeout).Return(errors.New("error removing image")).AnyTimes()
imageManager.removeUnusedImages(context.TODO())
Same comment as in the other file about using cancellable context everywhere in this file
agent/eni/pause/pause_linux_test.go
Outdated
@@ -52,7 +53,7 @@ func TestLoadFromFileWithReaderError(t *testing.T) {
mockfs := mock_os.NewMockFileSystem(ctrl)
mockfs.EXPECT().Open(pauseTarballPath).Return(nil, errors.New("Dummy Reader Error"))

err = loadFromFile(pauseTarballPath, client, mockfs)
err = loadFromFile(context.TODO(), pauseTarballPath, client, mockfs)
Same comment as in the other file about using cancellable context here
agent/stats/engine_integ_test.go
Outdated
@@ -86,7 +87,7 @@ func TestStatsEngineWithExistingContainersWithoutHealth(t *testing.T) {

// Simulate container start prior to listener initialization.
time.Sleep(checkPointSleep)
err = engine.MustInit(taskEngine, defaultCluster, defaultContainerInstance)
err = engine.MustInit(context.TODO(), taskEngine, defaultCluster, defaultContainerInstance)
Same comment as in the other file about using cancellable context here
agent/stats/engine_test.go
Outdated
@@ -52,6 +52,7 @@ func TestStatsEngineAddRemoveContainers(t *testing.T) {
mockDockerClient.EXPECT().Stats(gomock.Any(), gomock.Any()).Return(mockStatsChannel, nil).AnyTimes()

engine := NewDockerStatsEngine(&cfg, nil, eventStream("TestStatsEngineAddRemoveContainers"))
engine.ctx = context.TODO()
Same comment as in the other file about using cancellable context here
Update: addressed all comments about the comments and tests.
Almost there. Still have a couple of comments. Also, can you make sure that conflicts are resolved and all the tests are passing?
@@ -71,7 +71,9 @@ func TestImagePullRemoveDeadlock(t *testing.T) {
}()

go func() {
imageManager.(*dockerImageManager).removeUnusedImages()
ctx, cancel := context.WithCancel(context.TODO())
Can you move the context creation and defer cancel() outside of the goroutine?
OK
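I.e. roughly:

ctx, cancel := context.WithCancel(context.TODO())
defer cancel()
go func() {
	imageManager.(*dockerImageManager).removeUnusedImages(ctx)
}()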
@@ -1055,7 +1073,9 @@ func TestConcurrentRemoveUnusedImages(t *testing.T) {
for i := 0; i < numRoutines; i++ {
go func() {
<-ok
imageManager.removeUnusedImages()
ctx, cancel := context.WithCancel(context.TODO())
Can you move the context creation and defer cancel() outside of the for loop?
OK, here only one context should be created and used.
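So the test ends up shaped roughly like this, with a single cancellable context shared by all the goroutines:

ctx, cancel := context.WithCancel(context.TODO())
defer cancel()
for i := 0; i < numRoutines; i++ {
	go func() {
		<-ok
		imageManager.removeUnusedImages(ctx)
	}()
}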
agent/stats/engine_integ_test.go
Outdated
@@ -143,7 +146,7 @@ func TestStatsEngineWithNewContainersWithoutHealth(t *testing.T) {
},
testTask)

err = engine.MustInit(taskEngine, defaultCluster, defaultContainerInstance)
err = engine.MustInit(context.TODO(), taskEngine, defaultCluster, defaultContainerInstance)
You should use a cancellable context here similar to other test files (applies throughout this file)
OK, there are too many context creations in the tests and I missed some; I will double-check.
Force-pushed from d073713 to 7fc5ffe
Update: minor refactor and resolve the conflicts; all tests (including functional tests and integration tests) pass.
super glad that we paid this particular tech debt! 🚀 🚀
Just pushed the commit, thank you guys! 👍
This PR fixes issue #1212.
Summary
Pass the context into the Docker API calls instead of creating a context inside each Docker API call.
Testing
- Builds on Linux (make release)
- Builds on Windows (go build -out amazon-ecs-agent.exe ./agent)
- Unit tests on Linux (make test) pass
- Unit tests on Windows (go test -timeout=25s ./agent/...) pass
- Integration tests on Linux (make run-integ-tests) pass
- Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
- Functional tests on Linux (make run-functional-tests) pass
- Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes: yes
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.