V1.26.0 stage #1917

petderek · 2019-02-28T23:48:00Z

Summary

Updating Agent to 1.26.0. Changelog:

Feature - Startup order can now be explicitly set via DependsOn field in the Task Definition #1904
Feature - Containers in a task can now have individual start and stop timeouts #1904
Feature - AWS App Mesh CNI plugin support #1898
Enhancement - Containers with links and volumes defined will now shutdown in the correct order #1904
Bug - Image cleanup errors fixed #1897

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Merge 'dev' branch into 'container-ordering' branch

* The start timeout is governed by the agent and is the context timeout for the StartContainer API call.
* The stop timeout is the parameter passed to StopContainer; it is the time the Docker daemon waits after a StopContainer call before issuing a SIGKILL to the container.

Add log driver secret to LogConfig and agent capability

go vet fixes for unkeyed fields and copying lock value

Mongo images are replaced with redis and crux images. Using Mongo images was leading to unpredictable results in functional and integration tests every time the Mongo images were updated on Docker Hub.

The 'Success' condition on a dependency container allows a target container to start only when the dependency container has exited successfully with exit code 0. The 'Complete' condition allows a target container to start only when the dependency container has exited, regardless of exit code. The 'Healthy' condition allows a target container to start only when the dependency container is reported to be healthy.
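The three conditions can be sketched as a simple predicate in Go. The `containerState` type and `conditionMet` function are hypothetical names chosen for this illustration and are not taken from the agent's code.

```go
package main

import "fmt"

// containerState is a hypothetical, simplified view of a dependency container.
type containerState struct {
	Exited   bool
	ExitCode int
	Healthy  bool
}

// conditionMet reports whether the dependency satisfies the given condition.
// Unknown conditions are treated as unmet in this sketch.
func conditionMet(condition string, dep containerState) bool {
	switch condition {
	case "SUCCESS":
		return dep.Exited && dep.ExitCode == 0
	case "COMPLETE":
		return dep.Exited
	case "HEALTHY":
		return dep.Healthy
	default:
		return false
	}
}

func main() {
	failed := containerState{Exited: true, ExitCode: 1}
	fmt.Println(conditionMet("SUCCESS", failed))  // exited, but not with code 0
	fmt.Println(conditionMet("COMPLETE", failed)) // exited, any exit code
}
```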

…to pending test process

Prior to this commit, we only tracked state we explicitly tried to change when the task was starting. We did not respond to the event stream or any other source of information from Docker. This meant that while we were waiting for certain dependency conditions ("SUCCESS", "COMPLETE", or "HEALTHY"), the task progression logic did not update the agent-internal model of container state. Since we rely on that state for determining when conditions are met, tasks would get stuck in infinite startup loops. This change adds a call to engine.checkTaskState(), which explicitly updates any changed container state. We only call this function if we know that we are waiting on the aforementioned subset of dependency conditions.

Co-authored-by: Utsa Bhattacharjya <[email protected]>

engine: adding poll function during progressTask

We now apply shutdown order in any dependency case, including dependsOn directives, links, or volumes. This means that the agent will now make a best-effort attempt to shut down containers in the inverse order they were created. For example, a container using a link for communication will wait until the linked container has terminated before terminating itself. Likewise, a container named in another container's dependsOn directive will wait until that dependent container terminates.

One note about the current implementation: dependencies aren't assumed to be transitive, so if a chain exists such that:

A -> B -> C

container "C" will shut down before "B", but it won't check status against container "A" explicitly. If A depends on C, we expect:

A -> B -> C
A -> C

The lack of transitive dependency logic is consistent with startup order as of this commit.
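The non-transitive shutdown check for dependsOn can be illustrated with a small Go sketch. It uses hypothetical container names (web, app, db) and a hypothetical `canStop` helper, and only models the dependsOn case described above: a container may stop once every container that directly names it in dependsOn has stopped. It is not the agent's dependencygraph code.

```go
package main

import "fmt"

// canStop reports whether the container `name` may stop. Every container that
// directly lists `name` in its dependsOn must already be stopped. Only direct
// dependents are checked; there is no transitive walk.
func canStop(name string, dependsOn map[string][]string, stopped map[string]bool) bool {
	for container, deps := range dependsOn {
		if stopped[container] {
			continue
		}
		for _, dep := range deps {
			if dep == name {
				return false
			}
		}
	}
	return true
}

func main() {
	// web depends on app, and app depends on db.
	dependsOn := map[string][]string{
		"web": {"app"},
		"app": {"db"},
	}
	stopped := map[string]bool{}

	fmt.Println(canStop("db", dependsOn, stopped))  // app is still running
	fmt.Println(canStop("web", dependsOn, stopped)) // nothing depends on web

	stopped["web"] = true
	fmt.Println(canStop("app", dependsOn, stopped)) // its only dependent has stopped
}
```

Note that in this sketch db never looks at web directly; it only waits for app, which mirrors the lack of transitive checks mentioned in the commit message.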

The link / volume dependency tests are now affected by shutdown order, so the tests now take longer. Previously, a test would take a maximum of 30s (the default docker stop timeout for agent). Now, since the containers stop in order, it will take a maximum of 30s * n, where n is the number of containers. Increasing the test timeout is a short-term fix until we have granular start/stop timeouts plumbed through the ECS service.

Instead of explicitly checking against many conditions, we now validate that the expected condition has progressed beyond started. This mirrors prior behavior in the codebase and reduces cyclomatic complexity.

dependencygraph: Enforce shutdown order

The ‘StartTimeout’ will now only serve as the time duration after which a dependency will not be resolved, if a container has a dependency on another container and the condition is ‘SUCCESS’, ‘HEALTHY’ or ‘COMPLETE’. For example:
• If a container A has a dependency on container B and the condition is ‘SUCCESS’, the StartTimeout for container B will roughly be the time required for it to exit successfully with exit code 0.
• If a container A has a dependency on container B and the condition is ‘COMPLETE’, the StartTimeout for container B will roughly be the time required for it to exit.
• If a container A has a dependency on container B and the condition is ‘HEALTHY’, the StartTimeout for container B will roughly be the time required for it to emit a ‘HEALTHY’ status.
If the StartTimeout is exceeded in any of the above cases, container A will not be able to transition to ‘CREATED’ status. This effectively reverts the implementation of StartTimeout in commit 79bd517.

This is the first batch of integration tests for container ordering. The tests handle the basic use cases for each of the conditions that introduce new behavior into agent (HEALTHY, COMPLETE, SUCCESS).

Container ordering integ tests

* The "START" dependency condition has been changed to "CREATE", as it waits for the dependency to at least get created.
* The "RUNNING" dependency condition has been changed to "START", as it waits for the dependency to get started.

Here, before resolving the dependency for a target container, we check whether the time duration (StartTimeout) specified by the user for the dependency container has expired. For example:
* If a target container 'A' has a dependency on container 'B' and the dependency condition is 'SUCCESS', then the dependency will not be resolved if B times out before exiting successfully with exit code 0.
* If a target container 'A' has a dependency on container 'B' and the dependency condition is 'COMPLETE', then the dependency will not be resolved if B times out before exiting.
* If a target container 'A' has a dependency on container 'B' and the dependency condition is 'HEALTHY', then the dependency will not be resolved if B times out before emitting 'Healthy' status.
The advantage of this is that the user will get to know that something is wrong with the task if the task is stuck in pending.

Dependency Condition Naming change:

Remove the functionality of StartTimeout as Docker API Start Timeout

* Remove need to pull 'latest' server core: By removing the :latest tag from all windowsservercore containers, we will have the tests use the container that's already baked into the AMI.
* Remove dependency on golang and python containers: We are removing the need to use any containers other than servercore and nanoserver. This reduces the number of downloads needed and the number of builds that happen before the tests start running.
* Explicit timeouts on order tests: The ordering tests are broken at the moment, so we are capping them with a fixed timeout.

Faster windows test

Fix amazon-vpc-cni-plugins sha check.

Adding functional tests for container ordering

increase linux integ tests timeout

update amazon-vpc-cni-plugins sha

yumex93 · 2019-02-28T23:51:41Z

The cni plugin name is aws-appmesh. I think we should use either AWS App Mesh or aws-appmesh.

yumex93 · 2019-03-01T00:25:25Z

TestStateManager failed in linux unit test

petderek · 2019-03-01T02:06:03Z

Unit tests are still flaking even after updating to faster drives. Unit tests are still passing in travis.

This timeout was fine when working on a dev box, but it's not quite as happy in automated testing.

See changelog entries for complete changes.

ubhattacharjya · 2019-03-01T06:14:47Z

The flaky Windows functional tests 'TestOOMContainer', 'TestV3TaskEndpointTags', and 'TestV3TaskEndpointDefaultNetworkMode' passed on a local EC2 Windows instance when I tried to reproduce the failures.

Reference: #1869

ubhattacharjya · 2019-03-01T06:32:13Z

The ARM functional tests failed because of the flaky test 'TestTelemetry'; I have tested on an ARM instance with the latest code.

Reference: #1903

petderek · 2019-03-06T17:55:20Z

Windows test was fixed in #1919

ubhattacharjya and others added 30 commits January 30, 2019 13:31

Model changes for ACS for container ordering

e36e2c8

Adding adapter to have volumes/links use DependsOn field

5fbfab0

Refactor dependency graph code for volume/links dependency resolution

5c3fa51

Merge branch 'dev' into container-ordering feature

680db40

Add log driver secret to LogConfig and agent capability

4e818ef

Merge pull request aws#1853 from ubhattacharjya/mergeBranch

2cf4734

Merge 'dev' branch into 'container-ordering' branch

statemanager: add container start/stop statemanger

3265bf9

Merge pull request aws#1818 from tommyhahn/branch_logging_driver

e325804

Add log driver secret to LogConfig and agent capability

Merge pull request aws#1849 from adnxn/dev-granular-timeouts

f4da625

readme: update config level container timeouts

0e8a379

cleaned travis config format, and upgraded to go 1.11

9ab5eff

go vet fixes for unkeyed fields and copying lock value

Replace mongo docker images with crux and redis

7b9b9fc

Mongo images are replaced with redis and crux images. Using Mongo images was leading to unpredictable results in functional and integration tests every time the Mongo images were updated on Docker Hub.

Revert "Add log driver secret to LogConfig and agent capability" due …

9708e51

…to pending test process

Merge pull request aws#1876 from petderek/container-ordering-task-sync

3bc095b

engine: adding poll function during progressTask

dependencygraph: simplify container start logic

89bb2e8

Instead of explicitly checking against many conditions, we now validate that the expected condition has progressed beyond started. This mirrors prior behavior in the codebase and reduces cyclomatic complexity.

Merge pull request aws#1866 from petderek/container-ordering-feature

582327c

dependencygraph: Enforce shutdown order

engine: add ordering integration tests

5bac35e

This is the first batch of integration tests for container ordering. The tests handle the basic use cases for each of the conditions that introduce new behavior into agent (HEALTHY, COMPLETE, SUCCESS).

Merge pull request aws#1881 from petderek/container-ordering-integ-tests

af9e1f5

Container ordering integ tests

Dependency Condition Naming change:

4ac84e4

* The "START" dependency condition has been changed to "CREATE", as it waits for the dependency to at least get created.
* The "RUNNING" dependency condition has been changed to "START", as it waits for the dependency to get started.

Merge pull request aws#1882 from ubhattacharjya/Naming

7ba5947

Dependency Condition Naming change:

Merge pull request aws#1880 from ubhattacharjya/ChangeStartTimeout

83c928b

Remove the functionality of StartTimeout as Docker API Start Timeout

Merge pull request aws#1886 from petderek/fast-windows-test

ee7a419

Faster windows test

ubhattacharjya and others added 7 commits February 28, 2019 19:10

Increase docker stop timeout to 5 seconds

f1f4453

increase linux integ tests timeout

f60b03c

Merge pull request aws#1911 from suneyz/dev

d73de56

Fix amazon-vpc-cni-plugins sha check.

Merge pull request aws#1889 from ubhattacharjya/FunctionalTests

e39f661

Adding functional tests for container ordering

update amazon-vpc-cni-plugins sha

78d1921

Merge pull request aws#1913 from yumex93/dev

7642368

increase linux integ tests timeout

Merge pull request aws#1914 from suneyz/dev

e7af32a

update amazon-vpc-cni-plugins sha

petderek force-pushed the v1.26.0-stage branch from 0be3255 to a032c87 February 28, 2019 23:53

yumex93 approved these changes Feb 28, 2019

petderek force-pushed the v1.26.0-stage branch from a032c87 to 73e50a1 March 1, 2019 00:10

petderek added bot/test and removed bot/test labels Mar 1, 2019

petderek added 2 commits February 28, 2019 19:54

Adjusting timeouts on a windows test

9a351b8

This timeout was fine when working on a dev box, but it's not quite as happy in automated testing.

Update to 1.26.0

ebac220

See changelog entries for complete changes.

petderek force-pushed the v1.26.0-stage branch from 73e50a1 to ebac220 March 1, 2019 03:58

petderek added this to the 1.26.0 milestone Mar 1, 2019

petderek added staging Trigger staging workflow and removed bot/test labels Mar 1, 2019

petderek merged commit ebac220 into aws:master Mar 7, 2019
