Update to 1.26.0 #1916
Closed
Merge 'dev' branch into 'container-ordering' branch
* The start timeout is governed by the agent and is the context timeout for the StartContainer API call.
* The stop timeout is the parameter passed to StopContainer and is the time the Docker daemon waits after a StopContainer call before issuing a SIGKILL to the container.
Add log driver secret to LogConfig and agent capability
go vet fixes for unkeyed fields and copying lock value
Mongo images are replaced with redis and crux images. Using Mongo images led to unpredictable results in functional and integration tests every time the Mongo images were updated on Docker Hub.
* 'Success' condition on a dependency container will allow a target container to start only when the dependency container has exited successfully with exit code 0.
* 'Complete' condition on a dependency container will allow a target container to start only when the dependency container has exited.
* 'Healthy' condition on a dependency container will allow a target container to start only when the dependency container is reported to be healthy.
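The three conditions can be expressed as a small predicate. A minimal sketch, assuming a simplified container state; `containerState` and `dependencySatisfied` are hypothetical names for illustration and do not mirror the agent's dependency graph code.

```go
package main

import "fmt"

// containerState is a simplified, hypothetical view of a dependency container.
type containerState struct {
	Exited   bool
	ExitCode int
	Healthy  bool
}

const (
	conditionSuccess  = "SUCCESS"
	conditionComplete = "COMPLETE"
	conditionHealthy  = "HEALTHY"
)

// dependencySatisfied reports whether the target container may start, given
// the condition it places on the dependency container.
func dependencySatisfied(condition string, dep containerState) bool {
	switch condition {
	case conditionSuccess:
		// The dependency must have exited with exit code 0.
		return dep.Exited && dep.ExitCode == 0
	case conditionComplete:
		// The dependency must have exited; the exit code is irrelevant.
		return dep.Exited
	case conditionHealthy:
		// The dependency must be reported healthy.
		return dep.Healthy
	default:
		// Unknown conditions are never treated as satisfied in this sketch.
		return false
	}
}

func main() {
	failed := containerState{Exited: true, ExitCode: 1}
	fmt.Println(dependencySatisfied(conditionComplete, failed)) // true
	fmt.Println(dependencySatisfied(conditionSuccess, failed))  // false
}
```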
…to pending test process
Prior to this commit, we only tracked state we explicitly tried to change when the task was starting. We did not respond to the event stream or any other source of information from Docker. This means that when we are waiting for certain dependency conditions ("SUCCESS", "COMPLETE", or "HEALTHY"), the task progression logic does not update the agent-internal model of container state. Since we rely on that state to determine when conditions are met, tasks would get stuck in infinite startup loops. This change adds a call to engine.checkTaskState(), which explicitly updates any changed container state. We only call this function if we know that we are waiting on the aforementioned subset of dependency conditions.

Co-authored-by: Utsa Bhattacharjya <[email protected]>
engine: adding poll function during progressTask
We now apply shutdown order in any dependency case, including dependsOn directives, links, or volumes. This means that the agent will now make a best-effort attempt to shut down containers in the inverse of the order in which they were created. For example, a container using a link for communication will wait until the linked container has terminated before terminating itself. Likewise, a container named in another container's dependsOn directive will wait until that dependent container terminates.

One note about the current implementation is that the dependencies aren't assumed to be transitive, so if a chain exists such that:

A -> B -> C

Container "C" will shut down before "B", but it won't check status against container "A" explicitly. If A depends on C, we expect:

A -> B -> C
A -> C

The lack of transitive dependency logic is consistent with startup order as of this commit.
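The non-transitive check can be sketched like this. It is an illustration only: `canStop` and the `mustStopFirst` map are hypothetical, and the map encodes one reading of the chain above (C stops first, then B, then A).

```go
package main

import "fmt"

// canStop reports whether name may stop now. mustStopFirst maps a container to
// the containers that have to terminate before it (hypothetical structure).
// Only direct entries are checked; nothing is inferred transitively.
func canStop(name string, mustStopFirst map[string][]string, stopped map[string]bool) bool {
	for _, other := range mustStopFirst[name] {
		if !stopped[other] {
			return false
		}
	}
	return true
}

func main() {
	// One reading of the chain: C stops first, then B, then A.
	chain := map[string][]string{"B": {"C"}, "A": {"B"}}
	stopped := map[string]bool{}
	fmt.Println(canStop("C", chain, stopped), canStop("B", chain, stopped), canStop("A", chain, stopped))

	// A only looks at B. To make A wait on C as well, C has to be listed explicitly.
	explicit := map[string][]string{"B": {"C"}, "A": {"B", "C"}}
	onlyB := map[string]bool{"B": true}
	fmt.Println(canStop("A", chain, onlyB), canStop("A", explicit, onlyB))
}
```

With only the chain declared, A is released as soon as B has stopped, regardless of C. Declaring the extra edge is what makes A wait on C too.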
The link / volume dependency tests are now affected by shutdown order, so the tests now take longer. Previously, they would take a maximum of 30s (the default docker stop timeout for agent). Now, since the containers stop in order, they will take a maximum of 30s * n, where n is the number of containers. Increasing the test timeout is a short-term fix until we have granular start/stop timeouts plumbed through the ECS service.
Instead of explicitly checking against many conditions, we now validate that the expected condition has progressed beyond started. This mirrors prior behavior in the codebase and reduces cyclomatic complexity.
dependencygraph: Enforce shutdown order
The ‘StartTimeout’ will now only serve as the time duration after which a container's dependency on another container will not be resolved, when the condition is ‘SUCCESS’, ‘HEALTHY’ or ‘COMPLETE’. For example:
• If a container A has a dependency on container B and the condition is ‘SUCCESS’, the StartTimeout for container B will roughly be the time required for it to exit successfully with exit code 0.
• If a container A has a dependency on container B and the condition is ‘COMPLETE’, the StartTimeout for container B will roughly be the time required for it to exit.
• If a container A has a dependency on container B and the condition is ‘HEALTHY’, the StartTimeout for container B will roughly be the time required for it to emit a ‘HEALTHY’ status.
If the StartTimeout is exceeded in any of the above cases, container A will not be able to transition to ‘CREATED’ status. This effectively reverts the implementation of StartTimeout in commit: 79bd517
This is the first batch of integration tests for container ordering. The tests handle the basic use cases for each of the conditions that introduce new behavior into the agent (HEALTHY, COMPLETE, SUCCESS).
Container ordering integ tests
* "START" Dependency Condition has been changed to "CREATE" as it waits for the dependency to at least get created.
* "RUNNING" Dependency Condition has been changed to "START" as it waits for the dependency to get started.
Here, we check whether the time duration (StartTimeout) specified by the user for a container has expired before resolving the dependency for the target container. For example,
* if a target container 'A' has a dependency on container 'B' and the dependency condition is 'SUCCESS', then the dependency will not be resolved if B times out before exiting successfully with exit code 0.
* if a target container 'A' has a dependency on container 'B' and the dependency condition is 'COMPLETE', then the dependency will not be resolved if B times out before exiting.
* if a target container 'A' has a dependency on container 'B' and the dependency condition is 'HEALTHY', then the dependency will not be resolved if B times out before emitting 'Healthy' status.
The advantage of this is that the user will get to know that something is wrong with the task if the task is stuck in pending.
Dependency Condition Naming change:
Remove the functionality of StartTimeout as Docker API Start Timeout
* Remove need to pull 'latest' server core
  By removing the :latest tag from all windowsservercore containers, we will have the tests use the container that's already baked into the AMI.
* Remove dependency on golang and python containers
  We are removing the need to use any containers other than servercore and nanoserver. This reduces the number of downloads needed and the number of builds that happen before the tests start running.
* Explicit timeouts on order tests
  The ordering tests are broken at the moment, so we are capping them with a fixed timeout.
Faster windows test
Checking dependency resolution after timeout and successful exit check
1. Add proxy config into acs model
2. Convert acs model to app mesh config in task
3. Pass app mesh config from task to app mesh plugin and invoke add, del command on network setup and clean
1. Add dockerfile to build amazon-vpc-cni-plugins
2. Add build target in makefile for amazon-vpc-cni-plugins
If the user didn't specify the agent's task metadata endpoint IP or the instance metadata endpoint IP in an appmesh-enabled task, the agent should add these two default IPs to the appmesh egressIgnoredIPs field, as we don't want traffic to them to be redirected to the Envoy proxy.
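A minimal sketch of that defaulting step. The function name `withDefaultIgnoredIPs` is hypothetical, and the two addresses are an assumption of this sketch (the commonly documented link-local addresses for the ECS task metadata and EC2 instance metadata endpoints); the commit message itself does not spell them out.

```go
package main

import "fmt"

// Assumed defaults for illustration only; see the note above.
const (
	taskMetadataEndpointIP     = "169.254.170.2"
	instanceMetadataEndpointIP = "169.254.169.254"
)

// withDefaultIgnoredIPs returns a copy of the user-supplied egress ignored IPs
// with the two default endpoint IPs appended when they are missing.
func withDefaultIgnoredIPs(userIPs []string) []string {
	result := append([]string{}, userIPs...)
	for _, ip := range []string{taskMetadataEndpointIP, instanceMetadataEndpointIP} {
		found := false
		for _, existing := range result {
			if existing == ip {
				found = true
				break
			}
		}
		if !found {
			result = append(result, ip)
		}
	}
	return result
}

func main() {
	fmt.Println(withDefaultIgnoredIPs(nil))
	fmt.Println(withDefaultIgnoredIPs([]string{"10.0.0.1", "169.254.170.2"}))
}
```

Copying the input slice keeps the caller's configuration untouched, and the membership check avoids duplicate entries when the user already listed one of the defaults.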
This should eventually make windows tests faster to run. Fixes a bug where a task context cancel causes an infinite steady state loop. Previously, if the context expired, waitSteady() would spin forever since the timeout no longer worked. This introduces a check for context expiration earlier in the code.
ACS Model change for container ordering
The default helper function will now allocate 1024 cpu shares or 100% cpu-percent on windows. This will enable the windows based tests to finish in more predictable ways. When Windows tests were constrained, simple tasks like "sleep 10" were taking much longer than the expected 10 seconds.
Adding Integ Tests for Granular Stop Timeout
If we set "prefer-cached" as the pull behavior, then the PullStartedAt and PullStoppedAt fields may not be present in the endpoint. This causes tests to fail. This change logs an error but prevents that specific failure case.
Merge branch 'container-ordering-feature' into dev
Change task server endpoint to 127.0.0.1
Fix windows validator test
Merge appmesh to dev branch
update nginx version
aws#1890 was merged using rebase and merge which is wrong, and dev is still "56 commits ahead, 1 commit behind master." Fixing by merging again.
Fix merge master to dev "update to 1.25.3"
See changelog entries for complete changes.
sharanyad
reviewed
Feb 28, 2019
@@ -1,5 +1,12 @@
# Changelog

## 1.26.0
Can we make the changelog items more descriptive?
Something like:
* Enable start and stop ordering for containers in a task
* Container level configurable start and stop container timeouts
Not sure what this is: "Shutdown order is now observed"
Summary
Updating Agent to 1.26.0. Changelog:
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.