-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timeout for Docker Version API #1363
Conversation
Updated go-dockerclient to get the VersionWithContext() method, which helps with adding timeouts against the Docker Version() API, which can occasionally timeout.
This change adds a timeout for Docker client's Version API. Additionally, the Docker version returned in saved in the task engine. This helps with avoiding additional unnecessarily calls to the Docker Version API everytime the agent connects to the backend. This should help with issues seen in aws#1331 Finally, the dockerclient package has been refactored to extract all timeout related constants to dockerclient package from the dockerapi package to avoid circular dependencies.
@@ -292,7 +292,6 @@ | |||
name = "golang.org/x/net" | |||
packages = [ | |||
"context", | |||
"context/ctxhttp", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice, I think go docker client finally moved to builtin context
. Out of this PR: probably we can get rid of golang.org/x/
dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of this PR: probably we can get rid of
golang.org/x/
dependency?
Not sure, there are many sub-packages that we have a dependency on. Which one in particular you want to get rid of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the golang.org/x/net
can be removed.
agent/engine/docker_task_engine.go
Outdated
@@ -1126,7 +1131,18 @@ func (engine *DockerTaskEngine) State() dockerstate.TaskEngineState { | |||
|
|||
// Version returns the underlying docker version. | |||
func (engine *DockerTaskEngine) Version() (string, error) { | |||
return engine.client.Version() | |||
if engine.dockerVersion != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we need a lock for this method? Also, do we need a retry for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs a retry. Looking at the usage, there's a remote possibility of a race condition. I'll add a scoped down mutex just for this for now. We can expand it's scope to the whole of DockerTaskEngine
if needed.
client, err := dg.dockerClient() | ||
if err != nil { | ||
return "", err | ||
} | ||
info, err := client.Version() | ||
info, err := client.VersionWithContext(derivedCtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we not listening to the ctx.Done
channel like the other Docker APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other APIs require special error handling on timeouts. They all require the output from Docker APIs to be massaged in some way and are invoked in an internal async method. Version()
doesn't need any such semantics.
// if the client version returns a MinAPIVersion and APIVersion, then use it to return | ||
// all the Docker clients between MinAPIVersion and APIVersion, else try pinging | ||
// the clients in getKnownAPIVersions | ||
var minAPIVersion, apiVersion string | ||
// get a Docker client with the default supported version | ||
client, err := newVersionedClient(endpoint, string(minDockerAPIVersion)) | ||
if err == nil { | ||
clientVersion, err := client.Version() | ||
derivedCtx, cancel := context.WithTimeout(ctx, dockerclient.VersionTimeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't just call client.Version
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Version()
is defined on dockerGoClient
. This one is dockeriface.Client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we not calling dockerGoClient
's Version
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's not there. dockerGoClient
belongs to the engine
package and is only used by DockerTaskEngine
CHANGELOG.md
Outdated
@@ -1,5 +1,8 @@ | |||
# Changelog | |||
|
|||
## 1.17.4-dev | |||
* Bug - Fixed a bug where Docker Version() API takes never returns by adding a timeout [#1363](https://github.com/aws/amazon-ecs-agent/pull/1363) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: API never returns by..
Moved the logic of caching Docker daemon version string out of the DockerTaskEngine to dockerGoClient struct. This makes it much more reusable across the agent. Also, got rid of the unused SetDockerClient() method in the DockerTaskEngine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -1,5 +1,8 @@ | |||
# Changelog | |||
|
|||
## 1.17.4-dev | |||
* Bug - Fixed a bug where Docker Version() API never returns by adding a timeout [#1363](https://github.com/aws/amazon-ecs-agent/pull/1363) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we put Version() in `` ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters. I can modify it if you really insist (prefer not to 😃 ).
// if the client version returns a MinAPIVersion and APIVersion, then use it to return | ||
// all the Docker clients between MinAPIVersion and APIVersion, else try pinging | ||
// the clients in getKnownAPIVersions | ||
var minAPIVersion, apiVersion string | ||
// get a Docker client with the default supported version | ||
client, err := newVersionedClient(endpoint, string(minDockerAPIVersion)) | ||
if err == nil { | ||
clientVersion, err := client.Version() | ||
derivedCtx, cancel := context.WithTimeout(ctx, dockerclient.VersionTimeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we not calling dockerGoClient
's Version
instead?
Summary
Add timeout for Docker Version API
Implementation details
b1dae3d dockerclient: add timeout for Version API
This change adds a timeout for Docker client's Version API.
Additionally, the Docker version returned in saved in the task engine.
This helps with avoiding additional unnecessarily calls to the Docker
Version API everytime the agent connects to the backend. This should
help with issues seen in ecs-agent 1.17.2 stop connection to AWS and start reporting expired credential to task #1331
Finally, the dockerclient package has been refactored to extract all
timeout related constants to dockerclient package from the dockerapi
package to avoid circular dependencies.
63f0650 updated go-dickerclient for VersionWithContext
Updated go-dockerclient to get the VersionWithContext() method, which
helps with adding timeouts against the Docker Version() API, which can
occasionally timeout.
Testing
make release
)go build -out amazon-ecs-agent.exe ./agent
)make test
) passgo test -timeout=25s ./agent/...
) passmake run-integ-tests
) pass.\scripts\run-integ-tests.ps1
) passmake run-functional-tests
) pass.\scripts\run-functional-tests.ps1
) passNew tests cover the changes:
Description for the changelog
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.