
Deployment successful even though pulling the new image failed #413

Closed
fabfuel opened this issue May 26, 2016 · 13 comments

fabfuel commented May 26, 2016

I redeployed a service on two instances. On one instance, one of the container images could not be pulled (actually an ECR problem), but the ECS agent still considered the deployment successful.
The result is that the service has reached a steady state but is serving two different image versions. In my opinion that's a blocker, because the deployment actually failed and you do not notice.

instance A:
XXX.dkr.ecr.us-east-1.amazonaws.com/XXX/nginx develop c4c0b6fa9ad2 14 minutes ago 182.7 MB

instance B:
XXX.dkr.ecr.us-east-1.amazonaws.com/XXX/nginx develop e29ed85a1df4 8 days ago 182.7 MB

This is from the ecs agent log:

2016-05-26T10:00:09Z [INFO] Error while pulling container; will try to run anyways module="TaskEngine" task="XXX-staging:38 arn:aws:ecs:eu-central-1:XXX:task/8ce9da0c-51a9-483f-9185-c2c76e4c5dd1, Status: (NONE->RUNNING) Containers: [nginx (PULLED->RUNNING),app (RUNNING->RUNNING),]" err="Error: image XXX/nginx:develop not found"

Both instances are running:
ECS agent: 1.9.0
Docker: 1.9.1

Is this the expected behavior?
Do you need any more information?

Thanks
Fabian

aaithal (Contributor) commented May 27, 2016

Hi @fabfuel,

Thank you for reporting this issue. The ECS Agent starting a container even when the image pull fails, as long as the image already exists on the host, is by design: the intent is to reduce the impact of pull failures (such as transient networking events) on task launches. For now, we recommend updating your task definition with a new image tag per deployment to ensure the correct image is used.

However, we recognize that this workaround could lead to an explosion in the number of tags that need to be maintained. Could you please let us know whether the proposal below would resolve your issue?

Pull Behavior Proposal

It should be possible to configure the ECS Agent to customize the pull behavior. The ECS_AGENT_PULL_BEHAVIOR environment variable would be used for this purpose. The following values could be used to modify this behavior:

  • default: Every time a container is started as part of a task launch, the ECS Agent attempts to pull the image. If the pull fails, the ECS Agent tries to start the container from the Docker image cache anyway, assuming the image has not changed.
  • always: Every time a container is started, the ECS Agent attempts to pull the image. If the pull fails, the task fails.
  • never: The ECS Agent never attempts to pull the image. If the image is not already present in the Docker image cache, the task fails.

The default behavior when this configuration is not set will be the same as setting ECS_AGENT_PULL_BEHAVIOR to default.
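Under this proposal, opting in to strict pulls would be a one-line instance setting. A hypothetical sketch (the variable name comes from the proposal above; the feature as eventually released used ECS_IMAGE_PULL_BEHAVIOR instead):

```shell
# Hypothetical fragment of /etc/ecs/ecs.config, per the proposal above.
# With 'always', a failed pull would fail the task instead of silently
# starting a stale cached copy of the image.
ECS_AGENT_PULL_BEHAVIOR=always
```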

fabfuel (Author) commented May 27, 2016

Hi @aaithal,
thanks for the quick response!

In production I deploy only specific version tags, derived from the VCS version tags, but the reported issue causes problems in our testing and staging environments and breaks our continuous deployment workflow.

The proposal is great; it's exactly what would solve this issue.
The setting could also be set in /etc/ecs/ecs.config, right?

Best
Fabian

fabfuel (Author) commented May 27, 2016

Just saw that /etc/ecs/ecs.config is used as an --env-file. So yes! 😄
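For context on why that works: the ECS init scripts on the ECS-optimized AMI pass /etc/ecs/ecs.config to the agent container as a Docker --env-file, so each plain KEY=VALUE line becomes an environment variable inside the agent. A rough sketch (/tmp is used so the snippet is self-contained, and ECS_AGENT_PULL_BEHAVIOR is the proposed variable, not a released one):

```shell
# Write a config file in --env-file format (plain KEY=VALUE lines, no quoting).
cat > /tmp/ecs.config <<'EOF'
ECS_CLUSTER=staging
ECS_AGENT_PULL_BEHAVIOR=always
EOF

# The agent is then launched roughly like this (abbreviated):
#   docker run --env-file=/etc/ecs/ecs.config ... amazon/amazon-ecs-agent:latest
# Each line becomes one environment variable in the agent container:
grep -c '=' /tmp/ecs.config   # prints: 2
```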

aaithal (Contributor) commented May 27, 2016

@fabfuel yes, you're right. I'll mark this as a feature request for now. We'll let you know when we have an update on the issue.

jhovell commented Jun 2, 2016

@fabfuel I think I may be experiencing a related issue, though I believe it is a bug rather than a feature request. In my case, an ECS instance (Amazon ECS-Optimized Amazon Linux AMI 2016.03.b) does not have my-app:my-docker-tag in its local cache. The image definitely exists in ECR, and other ECS instances in the same ASG are running it successfully. I get the same series of log messages repeatedly in the ECS agent log, and no container ever successfully starts on the affected instance. So far my only workaround has been to terminate instances in this state, which is not a great solution.

Example of 2 repetitions in the ECS agent log:

2016-06-02T01:00:05Z [INFO] Error while pulling container; will try to run anyways module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" err="Error: image my-app:my-docker-tag not found"
2016-06-02T01:00:05Z [INFO] Creating container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING)"
2016-06-02T01:00:05Z [INFO] Created container name mapping for task my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),] - my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING) -> ecs-my-app-178-my-app-94ae8b9ee9818bca2f00
2016-06-02T01:00:05Z [INFO] Saving state! module="statemanager"
2016-06-02T01:00:06Z [INFO] Created docker container for task my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]: my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING) -> 
2016-06-02T01:00:06Z [INFO] Error transitioning container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING)" state="CREATED"
2016-06-02T01:00:06Z [WARN] Error with docker; stopping container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (CREATED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (CREATED->RUNNING)" err="no such image"
2016-06-02T01:00:06Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c Status:STOPPED Reason: SentStatus:NONE}"
2016-06-02T01:00:06Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE"
2016-06-02T01:00:06Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c -> STOPPED, Known Sent: NONE"
2016-06-02T01:00:06Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE"
2016-06-02T01:00:06Z [INFO] Sending task change module="eventhandler" event="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c -> STOPPED, Known Sent: NONE" change="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c -> STOPPED, Known Sent: NONE"
2016-06-02T01:00:07Z [INFO] Saving state! module="statemanager"
2016-06-02T01:00:11Z [INFO] Pulling container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (NONE->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (NONE->RUNNING)"
2016-06-02T01:00:14Z [INFO] Removing container module="TaskEngine" task="my-app:177 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/e58288b6-257c-4ced-9cfc-c52be1a91a8e, Status: (STOPPED->STOPPED) Containers: [my-app (STOPPED->STOPPED),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:136-2016.21.1-9215214) (STOPPED->STOPPED)"
2016-06-02T01:00:16Z [INFO] Error transitioning container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (NONE->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (NONE->RUNNING)" state="PULLED"
2016-06-02T01:00:16Z [INFO] Error while pulling container; will try to run anyways module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" err="Error: image my-app:my-docker-tag not found"
2016-06-02T01:00:16Z [INFO] Creating container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING)"
2016-06-02T01:00:16Z [INFO] Created container name mapping for task my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),] - my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING) -> ecs-my-app-178-my-app-ce93b6eacdcaeac4e901
2016-06-02T01:00:16Z [INFO] Saving state! module="statemanager"
2016-06-02T01:00:16Z [INFO] Created docker container for task my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]: my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING) -> 
2016-06-02T01:00:16Z [INFO] Error transitioning container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING)" state="CREATED"
2016-06-02T01:00:16Z [WARN] Error with docker; stopping container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (CREATED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (CREATED->RUNNING)" err="no such image"
2016-06-02T01:00:16Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 Status:STOPPED Reason: SentStatus:NONE}"
2016-06-02T01:00:16Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE"
2016-06-02T01:00:16Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 -> STOPPED, Known Sent: NONE"
2016-06-02T01:00:16Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE"
2016-06-02T01:00:16Z [INFO] Sending task change module="eventhandler" event="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 -> STOPPED, Known Sent: NONE" change="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 -> STOPPED, Known Sent: NONE"
2016-06-02T01:00:17Z [INFO] Saving state! module="statemanager"
2016-06-02T01:00:21Z [INFO] Removing container module="TaskEngine" task="sunsim:39 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/9d1136bf-7cf8-4105-a033-6dce097e869d, Status: (STOPPED->STOPPED) Containers: [sunsim (STOPPED->STOPPED),]" container="sunsim(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/sunsim:19-1.7.0-50b9426) (STOPPED->STOPPED)"
2016-06-02T01:00:24Z [INFO] Pulling container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/0802ee37-87ad-48ab-a4bf-57a929a62e70, Status: (NONE->RUNNING) Containers: [my-app (NONE->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (NONE->RUNNING)"
2016-06-02T01:00:27Z [INFO] Saving state! module="statemanager"
2016-06-02T01:00:29Z [INFO] Error transitioning container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/0802ee37-87ad-48ab-a4bf-57a929a62e70, Status: (NONE->RUNNING) Containers: [my-app (NONE->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (NONE->RUNNING)" state="PULLED"
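An instance stuck in this loop is easy to detect mechanically, since the same CannotPullContainerError recurs for the same image. A minimal sketch (the log path and the threshold of two repetitions are my own choices, and the sample lines are condensed from the log above):

```shell
# Condensed sample of a wedged agent log; on a real instance this would be
# /var/log/ecs/ecs-agent.log rather than a file in /tmp.
cat > /tmp/ecs-agent.log <<'EOF'
2016-06-02T01:00:06Z [INFO] Adding event ... Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found ...
2016-06-02T01:00:16Z [INFO] Adding event ... Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found ...
EOF

# Two or more repetitions of the same pull error suggest the instance is wedged.
FAILS=$(grep -c 'CannotPullContainerError' /tmp/ecs-agent.log)
if [ "$FAILS" -ge 2 ]; then
  echo "instance looks stuck: $FAILS repeated pull failures"
fi
```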

juanrhenals (Contributor) commented:

Hello @jhovell,

Thanks for reporting this! Could you open a new issue for this? It doesn't appear to be related to the initially reported issue, so I think we should track it separately.

juanrhenals (Contributor) commented:

@jhovell I have one suggestion that may help. When the instance gets into this state, can you attempt a manual docker pull of the image that is failing to be pulled via the agent? If the manual pull succeeds, it points to a potential issue with the Agent that we need to investigate.

Let us know once you've performed that test.

Feel free to add these comments to the new issue you create.

Thanks!
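The suggested manual check might look like the sketch below (2016-era ECR login syntax; the repository URL and image name are placeholders taken from the logs in this thread):

```shell
# Extract the image reference from the agent's error line...
LOG_LINE='err="Error: image my-app:my-docker-tag not found"'
IMAGE=$(echo "$LOG_LINE" | sed -n 's/.*image \([^ ]*\) not found.*/\1/p')
echo "$IMAGE"   # prints: my-app:my-docker-tag

# ...then, on the affected instance, try the pull by hand:
#   $(aws ecr get-login --region us-east-1)
#   docker pull ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag
# If the manual pull succeeds while the agent's pull keeps failing,
# that points at the agent rather than at ECR.
```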

jhovell commented Jun 2, 2016

@juanrhenals I opened a separate issue & tried your suggestion. Details here:

#415

bo1984 commented Nov 8, 2016

+1

samuelkarp (Contributor) commented:

An additional use-case to add to @aaithal's proposal:

  • once: The first time an image is referenced, it is pulled. On subsequent references (while the image has not been removed), no pull is performed. This covers customers who create a new tag for every build and whose tags never change.

perhallstroem commented:

Any movement on this? We build Docker images with tags that are unique per version, so if an image xxx:nnn is successfully pulled once, it will always be the same. We have a workload scenario where we often want to run tens of thousands of small tasks, and have noticed that the ECS agent is pretty slow in starting them; one of the things holding it up seems to be that it keeps checking for updates to the image, of which there are never any. The suggested "once" setting would make sense for us.

haikuoliu (Contributor) commented:

An additional use-case to add to @aaithal's proposal:

  • prefer-cached: If there is a cached image, no pull is performed; otherwise the image is pulled. This covers customers who pre-install images and don't want to send too many requests to the registry.
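Taken together, the behaviors discussed in this thread reduce to a small decision rule. An illustrative sketch of the semantics only (my own paraphrase, not the agent's actual Go implementation):

```shell
# Illustrative decision rule for the four pull behaviors discussed in this
# thread. Usage: should_pull BEHAVIOR IN_CACHE(yes|no) PULLED_BEFORE(yes|no)
should_pull() {
  behavior=$1; in_cache=$2; pulled_before=$3
  case "$behavior" in
    always)        echo yes ;;  # pull every launch; a failed pull fails the task
    once)          if [ "$pulled_before" = yes ]; then echo no; else echo yes; fi ;;
    prefer-cached) if [ "$in_cache" = yes ]; then echo no; else echo yes; fi ;;
    *)             echo yes ;;  # default: pull, but fall back to cache on failure
  esac
}

should_pull once yes yes          # prints: no  (already pulled; tags never change)
should_pull prefer-cached no no   # prints: yes (cache miss, so pull)
```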

haikuoliu (Contributor) commented:

We've released this feature with the default, always, once, and prefer-cached behaviors. See details here. This issue will be closed.

@haikuoliu haikuoliu added this to the 1.18.0 milestone May 28, 2018
edibble21 pushed a commit to edibble21/amazon-ecs-agent that referenced this issue Jul 9, 2021
Merge branch 'feature/ecs-anywhere-ga' into dev