-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distinct startContainerTimouts for windows/linux #1321
Conversation
agent/config/config.go
Outdated
parsedStartTimeout := parseEnvVariableDuration("ECS_CONTAINER_START_TIMEOUT") | ||
if parsedStartTimeout >= minimumContainerStartTimeout { | ||
containerStartTimeout = parsedStartTimeout | ||
} else if parsedStartTimeout != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the check parsedStartTimeout != 0
needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parsedStartTimeout
is initialized by calling parseEnvVariableDuration
. parseEnvVariableDuration
will parse the ECS_CONTAINER_START_TIMEOUT
, if ECS_CONTAINER_START_TIMEOUT
is in the correct format, then the function will return the parsed time duration, otherwise it will return a time duration of 0s, and log a warning at the same time. So if parsedStartTimeout
is 0s (note that parsedStartTimeout == 0
will return true if parsedStartTimeout
is 0s), then we don't need to log it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't quite understand what's happening here. Might be simpler to do something like this:
if parsedStartTimeout < minimumContainerStartTimeout {
log.Infof(...)
parsedStartTimeout = parsedStartTimeout
}
return containerStartTimeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things will go wrong when ECS_CONTAINER_START_TIMEOUT
is set to be 0s, I will delete the parsedStartTimeout != 0
, thanks!
@@ -779,6 +779,27 @@ func TestDockerStopTimeout(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestContainerStartTimeout(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this test is necessary since this is already tested in config_test.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will remove this test then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please update README file as well?
CHANGELOG.md
Outdated
@@ -1,6 +1,7 @@ | |||
# Changelog | |||
|
|||
## 1.17.3-dev | |||
* Enhancement - Distinct startContainerTimouts for windows/linux |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also state that a new config variable for this has been introduced as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sure, I will update both the change log and README file
agent/config/config.go
Outdated
@@ -563,7 +577,11 @@ func (cfg *Config) validateAndOverrideBounds() error { | |||
} | |||
|
|||
if cfg.DockerStopTimeout < minimumDockerStopTimeout { | |||
return fmt.Errorf("Invalid negative DockerStopTimeout: %v", cfg.DockerStopTimeout.String()) | |||
return fmt.Errorf("Invalid DockerStopTimeout: %v", cfg.DockerStopTimeout.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please follow the convention on error strings from https://github.com/golang/go/wiki/CodeReviewComments#error-strings (here and everywhere else)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks for the reference, I will revise it accordingly.
agent/config/config.go
Outdated
parsedStartTimeout := parseEnvVariableDuration("ECS_CONTAINER_START_TIMEOUT") | ||
if parsedStartTimeout >= minimumContainerStartTimeout { | ||
containerStartTimeout = parsedStartTimeout | ||
} else if parsedStartTimeout != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't quite understand what's happening here. Might be simpler to do something like this:
if parsedStartTimeout < minimumContainerStartTimeout {
log.Infof(...)
parsedStartTimeout = parsedStartTimeout
}
return containerStartTimeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also make sure that copyright headers of all the files you've touched has year set to 2018?
README.md
Outdated
@@ -162,6 +162,7 @@ additional details on each available environment variable. | |||
| `ECS_APPARMOR_CAPABLE` | `true` | Whether AppArmor is available on the container instance. | `false` | `false` | | |||
| `ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION` | 10m | Time to wait to delete containers for a stopped task. If set to less than 1 minute, the value is ignored. | 3h | 3h | | |||
| `ECS_CONTAINER_STOP_TIMEOUT` | 10m | Time to wait for the container to exit normally before being forcibly killed. | 30s | 30s | | |||
| `ECS_CONTAINER_START_TIMEOUT` | 10m | Time to wait for the container to start normally before being forcibly killed. | 3m | 8m | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time to wait for the container to start normally before being forcibly killed
is a bit misleading. How about Timeout before giving up on starting a container
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sounds good.
seelog.Warnf("Discarded invalid value for docker stop timeout, parsed as: %v", parsedStopTimeout) | ||
} | ||
return dockerStopTimeout | ||
} | ||
|
||
func getContainerStartTimeout() time.Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still not clear to me what's happening in this method. If I specify the value as "-1m", this method returns 0
, correct? IMO, this method should never return a value that is less than minimumContainerStartTimeout
. Can you please modify this to be something like this: #1321 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will make the change according to that comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update and add comments to getContainerStartTimeout and getDockerStopTimeout functions.
agent/config/config.go
Outdated
@@ -563,7 +577,11 @@ func (cfg *Config) validateAndOverrideBounds() error { | |||
} | |||
|
|||
if cfg.DockerStopTimeout < minimumDockerStopTimeout { | |||
return fmt.Errorf("Invalid negative DockerStopTimeout: %v", cfg.DockerStopTimeout.String()) | |||
return fmt.Errorf("invalid DockerStopTimeout: %v", cfg.DockerStopTimeout.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this more human readable? "config: invalid value for stop timeout..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sure
agent/config/config.go
Outdated
} | ||
|
||
if cfg.ContainerStartTimeout < minimumContainerStartTimeout { | ||
return fmt.Errorf("invalid ContainerStartTimeout: %v", cfg.ContainerStartTimeout.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this more human readable? "config: invalid value for container start timeout..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
agent/config/config_test.go
Outdated
assert.Zero(t, conf.ContainerStartTimeout) | ||
} | ||
|
||
func TestInvalidContainerStartTimeout(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we actually need this test as it does not add much value. Creating a config object should always be from the DefaultConfig() or envConfig() or other methods similar to those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this test will be removed.
CHANGELOG.md
Outdated
@@ -1,6 +1,7 @@ | |||
# Changelog | |||
|
|||
## 1.17.3-dev | |||
* Enhancement - Distinct startContainerTimouts for windows/linux, introduce a new environment variable `ECS_CONTAINER_START_TIMEOUT` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"introduce a new environment variable ECS_CONTAINER_START_TIMEOUT
to make it configurable" - might be better
nit: typo startContainerTimeouts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
agent/engine/engine_integ_test.go
Outdated
@@ -41,6 +41,7 @@ import ( | |||
|
|||
const ( | |||
testDockerStopTimeout = 2 * time.Second | |||
testContainerStartTimeout = 4 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sure
agent/config/config_unix.go
Outdated
@@ -31,6 +32,10 @@ const ( | |||
// Default cgroup memory system root path, this is the default used if the | |||
// path has not been configured through ECS_CGROUP_PATH | |||
defaultCgroupPath = "/sys/fs/cgroup" | |||
// DefaultContainerStartTimeout specifies the value for container start timeout duration | |||
DefaultContainerStartTimeout = 3 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a private variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will make it private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test the start container time cost on windows instances with various parameters (instance type, container memory, container cpu, etc.), and set the default startContainerTimeouts value for windows accordingly. Also make the startContainerTimeouts configurable (via ECS_CONTAINER_START_TIMEOUT environment variable) by users.
3fc6520
to
291b5cb
Compare
Summary
This PR aims to resolve issue #1243.
Test the start container time cost on windows instances with various parameters (instance type, container memory, container cpu, etc.), set the DefaultContainerStartTimeout and minimumContainerStartTimeout (the purpose of having this value is to be consistent with current design, for more info, please refer to minimumDockerStopTimeout in agent/config/config.go) for windows and minimumContainerStartTimeout for linux accordingly.
Also make the ContainerStartTimeout configurable (via ECS_CONTAINER_START_TIMEOUT environment variable) by users.
Implementation details
Remove the startContainerTimeout in the engine package, add ContainerStartTimeout in config package with distincted default value for windows and linux, initialize ContainerStartTimeout from ECS_CONTAINER_START_TIMEOUT environment variable.
The details of the tests are listed below:
The experiment is basically run a microsoft/windowsservercore image and record the executing time of calling the start container function. Below is some stats data collected from tests running on two types of windows ECS instances:
For instance with type of t2.2xlarge (8vCpus, 32GB Memory):
For instance with type of t2.small (1vCpu, 2GB Memory):
According to the test results, the DefaultContainerStartTimeout is set to be 8min (we see the data point greater than 10min as an outlier here), and minimumContainerStartTimeout is set to be 2min for windows.
As for Linux, the minimumContainerStartTimeout is evaluated to be 0.75min by doing a simple math: 8/2=3/x, where x is the minimumContainerStartTimeout.
Testing
make release
)go build -out amazon-ecs-agent.exe ./agent
)make test
) passgo test -timeout=25s ./agent/...
) passmake run-integ-tests
) pass.\scripts\run-integ-tests.ps1
) passmake run-functional-tests
) pass.\scripts\run-functional-tests.ps1
) passNew tests cover the changes: yes
Description for the changelog
Distinct startContainerTimouts for windows/linux, introduce a new environment variable
ECS_CONTAINER_START_TIMEOUT
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.