
Deployment successful even though pulling the new image failed #413

Closed
fabfuel opened this issue May 26, 2016 · 13 comments

fabfuel commented May 26, 2016

I redeployed a service on two instances. On one instance, one of the container images could not be pulled (actually an ECR problem), but the ECS agent still considered the deployment successful.
The result is that the service has reached a steady state but is serving two different image versions. In my opinion that's a blocker, because the deployment actually failed and you do not notice.

instance A:
XXX.dkr.ecr.us-east-1.amazonaws.com/XXX/nginx develop c4c0b6fa9ad2 14 minutes ago 182.7 MB

instance B:
XXX.dkr.ecr.us-east-1.amazonaws.com/XXX/nginx develop e29ed85a1df4 8 days ago 182.7 MB

This is from the ecs agent log:

2016-05-26T10:00:09Z [INFO] Error while pulling container; will try to run anyways module="TaskEngine" task="XXX-staging:38 arn:aws:ecs:eu-central-1:XXX:task/8ce9da0c-51a9-483f-9185-c2c76e4c5dd1, Status: (NONE->RUNNING) Containers: [nginx (PULLED->RUNNING),app (RUNNING->RUNNING),]" err="Error: image XXX/nginx:develop not found"

Both instances are running:
ECS agent: 1.9.0
Docker: 1.9.1

Is this the expected behavior?
Do you need any more information?

Thanks
Fabian

aaithal (Contributor) commented May 27, 2016

Hi @fabfuel,

Thank you for reporting this issue. The ECS Agent starting a container even when the image pull fails, as long as the image already exists on the host, is by design: the intent is to reduce the impact of pull failures (such as transient networking events) on task launches. For now, we recommend updating your task definition with a new image tag per deployment to ensure the correct image is used.

However, we recognize that this workaround could lead to an explosion in the number of tags that need to be maintained. Could you please let us know whether the proposal below would resolve your issue?

Pull Behavior Proposal

It should be possible to configure the ECS Agent to customize the pull behavior. The ECS_AGENT_PULL_BEHAVIOR environment variable would be used for this purpose. The following values could be used to modify this behavior:

  • default: Every time a container is started as part of a task launch, the ECS Agent attempts to pull the image. If the pull fails, the ECS Agent tries to start the container from the Docker image cache anyway, assuming the image has not changed.
  • always: Every time a container is started, the ECS Agent attempts to pull the image. If the pull fails, the task fails.
  • never: The ECS Agent never attempts to pull the image. If the image is not already present in the Docker image cache, the task fails.

The default behavior when this configuration is not set will be the same as setting ECS_AGENT_PULL_BEHAVIOR to default.
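Under this proposal, opting in to strict pulls would be a one-line instance setting. A hypothetical sketch (the variable name comes from the proposal above; the feature as eventually released used ECS_IMAGE_PULL_BEHAVIOR instead):

```shell
# Hypothetical fragment of /etc/ecs/ecs.config, per the proposal above.
# With 'always', a failed pull would fail the task instead of silently
# starting a stale cached copy of the image.
ECS_AGENT_PULL_BEHAVIOR=always
```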

fabfuel (Author) commented May 27, 2016

Hi @aaithal,
thanks for the quick response!

In production I deploy only specific version tags, derived from the VCS version tags, but the reported issue causes problems in our testing and staging environments and breaks our continuous deployment workflow.

The proposal is great; it's exactly what would solve this issue.
The setting could also be set in /etc/ecs/ecs.config, right?

Best
Fabian

fabfuel (Author) commented May 27, 2016

Just saw that /etc/ecs/ecs.config is used as an --env-file. So yes! 😄
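For context on why that works: the ECS init scripts on the ECS-optimized AMI pass /etc/ecs/ecs.config to the agent container as a Docker --env-file, so each plain KEY=VALUE line becomes an environment variable inside the agent. A rough sketch (/tmp is used so the snippet is self-contained, and ECS_AGENT_PULL_BEHAVIOR is the proposed variable, not a released one):

```shell
# Write a config file in --env-file format (plain KEY=VALUE lines, no quoting).
cat > /tmp/ecs.config <<'EOF'
ECS_CLUSTER=staging
ECS_AGENT_PULL_BEHAVIOR=always
EOF

# The agent is then launched roughly like this (abbreviated):
#   docker run --env-file=/etc/ecs/ecs.config ... amazon/amazon-ecs-agent:latest
# Each line becomes one environment variable in the agent container:
grep -c '=' /tmp/ecs.config   # prints: 2
```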

aaithal (Contributor) commented May 27, 2016

@fabfuel yes, you're right. I'll mark this as a feature request for now. We'll let you know when we have an update on the issue.

jhovell commented Jun 2, 2016

@fabfuel I think I may be experiencing a related issue, though I believe it is a bug rather than a feature request. In my case, an ECS instance (Amazon ECS-Optimized Amazon Linux AMI 2016.03.b) does not have my-app:my-docker-tag in its local cache. The image definitely exists in ECR, and other ECS instances in the same ASG are running it successfully. I get the same series of log messages repeatedly in the ECS agent log, and no container ever successfully starts on the affected instance. So far my only workaround has been to terminate instances in this state, which is not a great solution.

Example of 2 repetitions in the ECS agent log:

2016-06-02T01:00:05Z [INFO] Error while pulling container; will try to run anyways module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" err="Error: image my-app:my-docker-tag not found"
2016-06-02T01:00:05Z [INFO] Creating container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING)"
2016-06-02T01:00:05Z [INFO] Created container name mapping for task my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),] - my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING) -> ecs-my-app-178-my-app-94ae8b9ee9818bca2f00
2016-06-02T01:00:05Z [INFO] Saving state! module="statemanager"
2016-06-02T01:00:06Z [INFO] Created docker container for task my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]: my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING) -> 
2016-06-02T01:00:06Z [INFO] Error transitioning container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING)" state="CREATED"
2016-06-02T01:00:06Z [WARN] Error with docker; stopping container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c, Status: (NONE->RUNNING) Containers: [my-app (CREATED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (CREATED->RUNNING)" err="no such image"
2016-06-02T01:00:06Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c Status:STOPPED Reason: SentStatus:NONE}"
2016-06-02T01:00:06Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE"
2016-06-02T01:00:06Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c -> STOPPED, Known Sent: NONE"
2016-06-02T01:00:06Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE"
2016-06-02T01:00:06Z [INFO] Sending task change module="eventhandler" event="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c -> STOPPED, Known Sent: NONE" change="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/33d42123-4132-452b-bfb3-4741af658e7c -> STOPPED, Known Sent: NONE"
2016-06-02T01:00:07Z [INFO] Saving state! module="statemanager"
2016-06-02T01:00:11Z [INFO] Pulling container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (NONE->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (NONE->RUNNING)"
2016-06-02T01:00:14Z [INFO] Removing container module="TaskEngine" task="my-app:177 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/e58288b6-257c-4ced-9cfc-c52be1a91a8e, Status: (STOPPED->STOPPED) Containers: [my-app (STOPPED->STOPPED),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:136-2016.21.1-9215214) (STOPPED->STOPPED)"
2016-06-02T01:00:16Z [INFO] Error transitioning container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (NONE->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (NONE->RUNNING)" state="PULLED"
2016-06-02T01:00:16Z [INFO] Error while pulling container; will try to run anyways module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" err="Error: image my-app:my-docker-tag not found"
2016-06-02T01:00:16Z [INFO] Creating container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING)"
2016-06-02T01:00:16Z [INFO] Created container name mapping for task my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),] - my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING) -> ecs-my-app-178-my-app-ce93b6eacdcaeac4e901
2016-06-02T01:00:16Z [INFO] Saving state! module="statemanager"
2016-06-02T01:00:16Z [INFO] Created docker container for task my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]: my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING) -> 
2016-06-02T01:00:16Z [INFO] Error transitioning container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (PULLED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (PULLED->RUNNING)" state="CREATED"
2016-06-02T01:00:16Z [WARN] Error with docker; stopping container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455, Status: (NONE->RUNNING) Containers: [my-app (CREATED->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (CREATED->RUNNING)" err="no such image"
2016-06-02T01:00:16Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 Status:STOPPED Reason: SentStatus:NONE}"
2016-06-02T01:00:16Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE"
2016-06-02T01:00:16Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 -> STOPPED, Known Sent: NONE"
2016-06-02T01:00:16Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 my-app -> STOPPED, Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found, Known Sent: NONE"
2016-06-02T01:00:16Z [INFO] Sending task change module="eventhandler" event="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 -> STOPPED, Known Sent: NONE" change="TaskChange: arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/d586ef99-28a7-405e-acef-71f60e57c455 -> STOPPED, Known Sent: NONE"
2016-06-02T01:00:17Z [INFO] Saving state! module="statemanager"
2016-06-02T01:00:21Z [INFO] Removing container module="TaskEngine" task="sunsim:39 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/9d1136bf-7cf8-4105-a033-6dce097e869d, Status: (STOPPED->STOPPED) Containers: [sunsim (STOPPED->STOPPED),]" container="sunsim(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/sunsim:19-1.7.0-50b9426) (STOPPED->STOPPED)"
2016-06-02T01:00:24Z [INFO] Pulling container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/0802ee37-87ad-48ab-a4bf-57a929a62e70, Status: (NONE->RUNNING) Containers: [my-app (NONE->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (NONE->RUNNING)"
2016-06-02T01:00:27Z [INFO] Saving state! module="statemanager"
2016-06-02T01:00:29Z [INFO] Error transitioning container module="TaskEngine" task="my-app:178 arn:aws:ecs:us-west-2:ecs-cluster-account-id:task/0802ee37-87ad-48ab-a4bf-57a929a62e70, Status: (NONE->RUNNING) Containers: [my-app (NONE->RUNNING),]" container="my-app(ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag) (NONE->RUNNING)" state="PULLED"
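An instance stuck in this loop is easy to detect mechanically, since the same CannotPullContainerError recurs for the same image. A minimal sketch (the log path and the threshold of two repetitions are my own choices, and the sample lines are condensed from the log above):

```shell
# Condensed sample of a wedged agent log; on a real instance this would be
# /var/log/ecs/ecs-agent.log rather than a file in /tmp.
cat > /tmp/ecs-agent.log <<'EOF'
2016-06-02T01:00:06Z [INFO] Adding event ... Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found ...
2016-06-02T01:00:16Z [INFO] Adding event ... Reason CannotPullContainerError: Error: image my-app:my-docker-tag not found ...
EOF

# Two or more repetitions of the same pull error suggest the instance is wedged.
FAILS=$(grep -c 'CannotPullContainerError' /tmp/ecs-agent.log)
if [ "$FAILS" -ge 2 ]; then
  echo "instance looks stuck: $FAILS repeated pull failures"
fi
```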

juanrhenals (Contributor) commented:

Hello @jhovell,

Thanks for reporting this! Could you open a new issue for this? It doesn't appear to be related to the initially reported issue, so I think we should track it separately.

juanrhenals (Contributor) commented:

@jhovell I have one suggestion that may help. When the instance gets into this state, can you attempt a manual docker pull of the image that is failing to be pulled via the agent? If the manual pull succeeds, it points to a potential issue with the Agent that we need to investigate.

Let us know once you've performed that test.

Feel free to add these comments to the new issue you create.

Thanks!
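The suggested manual check might look like the sketch below (2016-era ECR login syntax; the repository URL and image name are placeholders taken from the logs in this thread):

```shell
# Extract the image reference from the agent's error line...
LOG_LINE='err="Error: image my-app:my-docker-tag not found"'
IMAGE=$(echo "$LOG_LINE" | sed -n 's/.*image \([^ ]*\) not found.*/\1/p')
echo "$IMAGE"   # prints: my-app:my-docker-tag

# ...then, on the affected instance, try the pull by hand:
#   $(aws ecr get-login --region us-east-1)
#   docker pull ecr-repository-account-id.dkr.ecr.us-east-1.amazonaws.com/my-app:my-docker-tag
# If the manual pull succeeds while the agent's pull keeps failing,
# that points at the agent rather than at ECR.
```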

jhovell commented Jun 2, 2016

@juanrhenals I opened a separate issue & tried your suggestion. Details here:

#415

bo1984 commented Nov 8, 2016

+1

samuelkarp (Contributor) commented:

An additional use-case to add to @aaithal's proposal:

  • once: The first time an image is referenced, it is pulled. On subsequent references (while the image has not been removed), no pull is performed. This covers customers who create a new tag for every build and whose tags never change.

perhallstroem commented:

Any movement on this? We build Docker images with tags that are unique per version, so if an image xxx:nnn is successfully pulled once, it will always be the same. We have a workload scenario where we often want to run tens of thousands of small tasks, and have noticed that the ECS agent is pretty slow in starting them; one of the things holding it up seems to be that it keeps checking for updates to the image, of which there are never any. The suggested "once" setting would make sense for us.

haikuoliu (Contributor) commented:

An additional use-case to add to @aaithal's proposal:

  • prefer-cached: If there is a cached image, no pull is performed; otherwise the image is pulled. This covers customers who pre-install images and don't want to send too many requests to the registry.
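Taken together, the behaviors discussed in this thread reduce to a small decision rule. An illustrative sketch of the semantics only (my own paraphrase, not the agent's actual Go implementation):

```shell
# Illustrative decision rule for the four pull behaviors discussed in this
# thread. Usage: should_pull BEHAVIOR IN_CACHE(yes|no) PULLED_BEFORE(yes|no)
should_pull() {
  behavior=$1; in_cache=$2; pulled_before=$3
  case "$behavior" in
    always)        echo yes ;;  # pull every launch; a failed pull fails the task
    once)          if [ "$pulled_before" = yes ]; then echo no; else echo yes; fi ;;
    prefer-cached) if [ "$in_cache" = yes ]; then echo no; else echo yes; fi ;;
    *)             echo yes ;;  # default: pull, but fall back to cache on failure
  esac
}

should_pull once yes yes          # prints: no  (already pulled; tags never change)
should_pull prefer-cached no no   # prints: yes (cache miss, so pull)
```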

haikuoliu (Contributor) commented:

We've released this feature with the default, always, once, and prefer-cached behaviors. See details here. This issue will be closed.

@haikuoliu haikuoliu added this to the 1.18.0 milestone May 28, 2018
edibble21 pushed a commit to edibble21/amazon-ecs-agent that referenced this issue Jul 9, 2021
Merge branch 'feature/ecs-anywhere-ga' into dev