Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinct startContainerTimouts for windows/linux #1321

Merged
merged 1 commit into from
Mar 30, 2018

Conversation

haikuoliu
Copy link
Contributor

@haikuoliu haikuoliu commented Mar 27, 2018

Summary

This PR aims to resolve issue #1243.

Test the start container time cost on windows instances with various parameters (instance type, container memory, container cpu, etc.), set the DefaultContainerStartTimeout and minimumContainerStartTimeout (the purpose of having this value is to be consistent with current design, for more info, please refer to minimumDockerStopTimeout in agent/config/config.go) for windows and minimumContainerStartTimeout for linux accordingly.

Also make the ContainerStartTimeout configurable (via ECS_CONTAINER_START_TIMEOUT environment variable) by users.

Implementation details

Remove the startContainerTimeout in the engine package, add ContainerStartTimeout in config package with distincted default value for windows and linux, initialize ContainerStartTimeout from ECS_CONTAINER_START_TIMEOUT environment variable.

The details of the tests are listed below:
The experiment is basically run a microsoft/windowsservercore image and record the executing time of calling the start container function. Below is some stats data collected from tests running on two types of windows ECS instances:

For instance with type of t2.2xlarge (8vCpus, 32GB Memory):

Test No Container Type (n MB Memory, m CPU units) Number of Containers Min Time cost Max Time cost
1 500, 5 10 17.8625275s 47.4986617s
2 500, 50 10 16.2744884s 29.9281325s
3 500, 5 50 16.3174464s 1m4.6216574s
4 500, 50 50 15.4874602s 2m27.371695s

For instance with type of t2.small (1vCpu, 2GB Memory):

Test No Container Type (n MB Memory, m CPU units) Number of Containers Min Time cost Max Time cost
1 500, 5 2 2m6.280508s 4m54.6708702s
2 500, 50 2 581.9843ms 26.2772675s
3 200, 5 5 1m56.2047989s >10min
4 200, 10 5 1m42.5461838s 3m58.6534033s
5 200, 20 5 5m12.3617754s 6m42.608242s
6 200, 30 5 51.607131s 1m11.1757965s
7 200, 40 5 36.9133632s 1m30.3344617s
8 200, 50 5 25.5652818s 30.2141681s

According to the test results, the DefaultContainerStartTimeout is set to be 8min (we see the data point greater than 10min as an outlier here), and minimumContainerStartTimeout is set to be 2min for windows.
As for Linux, the minimumContainerStartTimeout is evaluated to be 0.75min by doing a simple math: 8/2=3/x, where x is the minimumContainerStartTimeout.

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -out amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes: yes

Description for the changelog

Distinct startContainerTimouts for windows/linux, introduce a new environment variable ECS_CONTAINER_START_TIMEOUT

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@haikuoliu haikuoliu added this to the 1.17.3 milestone Mar 27, 2018
parsedStartTimeout := parseEnvVariableDuration("ECS_CONTAINER_START_TIMEOUT")
if parsedStartTimeout >= minimumContainerStartTimeout {
containerStartTimeout = parsedStartTimeout
} else if parsedStartTimeout != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the check parsedStartTimeout != 0 needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parsedStartTimeout is initialized by calling parseEnvVariableDuration. parseEnvVariableDuration will parse the ECS_CONTAINER_START_TIMEOUT, if ECS_CONTAINER_START_TIMEOUT is in the correct format, then the function will return the parsed time duration, otherwise it will return a time duration of 0s, and log a warning at the same time. So if parsedStartTimeout is 0s (note that parsedStartTimeout == 0 will return true if parsedStartTimeout is 0s), then we don't need to log it again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't quite understand what's happening here. Might be simpler to do something like this:

if parsedStartTimeout < minimumContainerStartTimeout {
 log.Infof(...) 
 parsedStartTimeout = parsedStartTimeout
}

return containerStartTimeout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things will go wrong when ECS_CONTAINER_START_TIMEOUT is set to be 0s, I will delete the parsedStartTimeout != 0, thanks!

@@ -779,6 +779,27 @@ func TestDockerStopTimeout(t *testing.T) {
}
}

func TestContainerStartTimeout(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test is necessary since this is already tested in config_test.go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will remove this test then.

Copy link
Contributor

@aaithal aaithal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please update README file as well?

CHANGELOG.md Outdated
@@ -1,6 +1,7 @@
# Changelog

## 1.17.3-dev
* Enhancement - Distinct startContainerTimouts for windows/linux
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also state that a new config variable for this has been introduced as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sure, I will update both the change log and README file

@@ -563,7 +577,11 @@ func (cfg *Config) validateAndOverrideBounds() error {
}

if cfg.DockerStopTimeout < minimumDockerStopTimeout {
return fmt.Errorf("Invalid negative DockerStopTimeout: %v", cfg.DockerStopTimeout.String())
return fmt.Errorf("Invalid DockerStopTimeout: %v", cfg.DockerStopTimeout.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please follow the convention on error strings from https://github.com/golang/go/wiki/CodeReviewComments#error-strings (here and everywhere else)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for the reference, I will revise it accordingly.

parsedStartTimeout := parseEnvVariableDuration("ECS_CONTAINER_START_TIMEOUT")
if parsedStartTimeout >= minimumContainerStartTimeout {
containerStartTimeout = parsedStartTimeout
} else if parsedStartTimeout != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't quite understand what's happening here. Might be simpler to do something like this:

if parsedStartTimeout < minimumContainerStartTimeout {
 log.Infof(...) 
 parsedStartTimeout = parsedStartTimeout
}

return containerStartTimeout

Copy link
Contributor

@aaithal aaithal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also make sure that copyright headers of all the files you've touched has year set to 2018?

README.md Outdated
@@ -162,6 +162,7 @@ additional details on each available environment variable.
| `ECS_APPARMOR_CAPABLE` | `true` | Whether AppArmor is available on the container instance. | `false` | `false` |
| `ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION` | 10m | Time to wait to delete containers for a stopped task. If set to less than 1 minute, the value is ignored. | 3h | 3h |
| `ECS_CONTAINER_STOP_TIMEOUT` | 10m | Time to wait for the container to exit normally before being forcibly killed. | 30s | 30s |
| `ECS_CONTAINER_START_TIMEOUT` | 10m | Time to wait for the container to start normally before being forcibly killed. | 3m | 8m |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to wait for the container to start normally before being forcibly killed is a bit misleading. How about Timeout before giving up on starting a container.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good.

seelog.Warnf("Discarded invalid value for docker stop timeout, parsed as: %v", parsedStopTimeout)
}
return dockerStopTimeout
}

func getContainerStartTimeout() time.Duration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still not clear to me what's happening in this method. If I specify the value as "-1m", this method returns 0, correct? IMO, this method should never return a value that is less than minimumContainerStartTimeout. Can you please modify this to be something like this: #1321 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will make the change according to that comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update and add comments to getContainerStartTimeout and getDockerStopTimeout functions.

@@ -563,7 +577,11 @@ func (cfg *Config) validateAndOverrideBounds() error {
}

if cfg.DockerStopTimeout < minimumDockerStopTimeout {
return fmt.Errorf("Invalid negative DockerStopTimeout: %v", cfg.DockerStopTimeout.String())
return fmt.Errorf("invalid DockerStopTimeout: %v", cfg.DockerStopTimeout.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this more human readable? "config: invalid value for stop timeout..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sure

}

if cfg.ContainerStartTimeout < minimumContainerStartTimeout {
return fmt.Errorf("invalid ContainerStartTimeout: %v", cfg.ContainerStartTimeout.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this more human readable? "config: invalid value for container start timeout..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

assert.Zero(t, conf.ContainerStartTimeout)
}

func TestInvalidContainerStartTimeout(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we actually need this test as it does not add much value. Creating a config object should always be from the DefaultConfig() or envConfig() or other methods similar to those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this test will be removed.

CHANGELOG.md Outdated
@@ -1,6 +1,7 @@
# Changelog

## 1.17.3-dev
* Enhancement - Distinct startContainerTimouts for windows/linux, introduce a new environment variable `ECS_CONTAINER_START_TIMEOUT`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"introduce a new environment variable ECS_CONTAINER_START_TIMEOUT to make it configurable" - might be better
nit: typo startContainerTimeouts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@@ -41,6 +41,7 @@ import (

const (
testDockerStopTimeout = 2 * time.Second
testContainerStartTimeout = 4 * time.Minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sure

@@ -31,6 +32,10 @@ const (
// Default cgroup memory system root path, this is the default used if the
// path has not been configured through ECS_CGROUP_PATH
defaultCgroupPath = "/sys/fs/cgroup"
// DefaultContainerStartTimeout specifies the value for container start timeout duration
DefaultContainerStartTimeout = 3 * time.Minute

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a private variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will make it private

@adnxn adnxn mentioned this pull request Mar 30, 2018
8 tasks
Copy link
Contributor

@sharanyad sharanyad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Test the start container time cost on windows instances with
various parameters (instance type, container memory, container
cpu, etc.), and set the default startContainerTimeouts value for
windows accordingly.

Also make the startContainerTimeouts configurable (via ECS_CONTAINER_START_TIMEOUT
environment variable) by users.
@haikuoliu haikuoliu force-pushed the branch_startcontainertimeout branch from 3fc6520 to 291b5cb Compare March 30, 2018 18:22
@haikuoliu haikuoliu merged commit eb9560b into aws:dev Mar 30, 2018
@haikuoliu haikuoliu deleted the branch_startcontainertimeout branch April 12, 2018 00:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants