Deployment successful even though pulling the new image failed #413
I redeployed a service on two instances. On one instance, one of the container images could not be pulled (due to what is actually an ECR problem), but the ECS agent considered the deployment successful.

The result is that the service has reached a steady state, but it is serving two different versions. In my opinion that is a blocker, because the deployment actually failed and you do not notice.

instance A:
XXX.dkr.ecr.us-east-1.amazonaws.com/XXX/nginx   develop   c4c0b6fa9ad2   14 minutes ago   182.7 MB

instance B:
XXX.dkr.ecr.us-east-1.amazonaws.com/XXX/nginx   develop   e29ed85a1df4   8 days ago   182.7 MB

This is from the ECS agent log:

Both instances are running:
ECS agent: 1.9.0
Docker: 1.9.1

Is this the expected behavior? Do you need any more information?

Thanks
Fabian

Comments
Hi @fabfuel, thank you for reporting this issue. The ECS Agent behavior of starting a container even when the image pull fails, provided the image already exists on the host, is by design. The intent is to reduce the impact of not being able to pull an image during failures (such as networking events). For now, we recommend updating your task definition with a new image tag per deployment to ensure the correct image is used. However, we do recognize that this workaround could lead to an explosion in the number of tags that need to be maintained. Could you please let us know if the proposal below helps you resolve your issue?

Pull Behavior Proposal
It should be possible to configure the ECS Agent to customize the image pull behavior. The default behavior when this configuration is not set will be the same as the current behavior.
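As a minimal sketch of the unique-tag-per-deployment workaround mentioned above (the account, cluster, service, and build-number names here are placeholders, not taken from this thread):

    # Push the image under a tag that is unique to this deployment.
    docker tag my-app:latest XXX.dkr.ecr.us-east-1.amazonaws.com/my-app:build-42
    docker push XXX.dkr.ecr.us-east-1.amazonaws.com/my-app:build-42

    # Register a new task definition revision that references the new tag,
    # then point the service at the latest revision. Since no host has this
    # tag cached yet, a failed pull cannot silently fall back to a stale image.
    aws ecs register-task-definition --family my-app \
      --container-definitions '[{"name":"web","image":"XXX.dkr.ecr.us-east-1.amazonaws.com/my-app:build-42","memory":256,"essential":true}]'
    aws ecs update-service --cluster my-cluster --service my-app --task-definition my-app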
Hi @aaithal, in production I only deploy images with specific version tags derived from the VCS tags, but the reported issue causes problems in our testing and staging environments and breaks our continuous deployment workflow. The proposal is great. It's exactly what would solve this issue. Best
Just saw that …
@fabfuel yes, you're right. I'll mark this as a feature request for now. We'll let you know when we have an update on the issue.
@fabfuel I think I may be experiencing a related issue, but I believe it is more of a bug than a feature request. In my case I have an ECS instance (Amazon ECS-Optimized Amazon Linux AMI 2016.03.b) that does not have my-app:my-docker-tag in its local cache. The image definitely exists in ECR, and other ECS instances in the same ASG are running it successfully. I get the same series of log messages repeatedly in the ECS agent log, and no container ever successfully starts on the affected instance. So far my solution has just been to terminate instances in this state, but that is not a great solution. Example of two repetitions in the ECS agent log:
Hello @jhovell, thanks for reporting this! Could you open a new issue for this? It doesn't appear to be related to the initially reported issue, so I think we should track it separately.
@jhovell I have one suggestion that may help. When the instance gets into this state, can you attempt a manual docker pull of the affected image? Let us know once you've performed that test. Feel free to add these comments to the new issue you create. Thanks!
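For reference, a sketch of that manual test (assuming AWS CLI v2; the registry address is a placeholder):

    # Log the Docker daemon in to ECR, then try pulling the exact image
    # the task definition references.
    aws ecr get-login-password --region us-east-1 \
      | docker login --username AWS --password-stdin XXX.dkr.ecr.us-east-1.amazonaws.com
    docker pull XXX.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag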
@juanrhenals I opened a separate issue & tried your suggestion. Details here:
+1
An additional use-case to add to @aaithal's proposal: …
Any movement on this? We build Docker images using tags that are unique per version, so if an image xxx:nnn is successfully pulled at one time, it will always be the same. We have a workload scenario where we often want to run tens of thousands of small tasks, and we have noticed that the ECS agent is pretty slow in starting them; one of the things holding it up seems to be that it keeps checking whether there are any updates to the image, which there never are. The suggested "once" setting would make sense for us.
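For illustration, one way to get such per-version tags (a sketch; the repository URL is a placeholder):

    # Derive an immutable tag from the VCS revision, so a tag that was
    # pulled successfully once can never point at different content.
    TAG=$(git rev-parse --short HEAD)
    docker build -t XXX.dkr.ecr.us-east-1.amazonaws.com/my-app:"$TAG" .
    docker push XXX.dkr.ecr.us-east-1.amazonaws.com/my-app:"$TAG"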
An additional use-case to add to @aaithal's proposal:
|
We've released this feature with the ECS_IMAGE_PULL_BEHAVIOR agent configuration option.
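For anyone finding this later, a minimal sketch of the instance-level setting (per the agent docs, the supported values include default, always, once, and prefer-cached):

    # /etc/ecs/ecs.config: set the agent's image pull behavior.
    # "once" pulls a tag only if it is not already cached on the host.
    ECS_IMAGE_PULL_BEHAVIOR=once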