awsvpc task / agent 1.29.1 / eni - ecs failed to start task no errors in console #2193

Closed
FLavalliere opened this issue Sep 6, 2019 · 12 comments

@FLavalliere

I had commented in ecs-cni, but it probably should have been here:

aws/amazon-ecs-cni-plugins#93

Observed Behavior

The ECS task failed to start without any error message. (awsvpc trunking wasn't enabled on the VM.)

ECS agent version: 1.29.1

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xbb32fd]

goroutine 209 [running]:
github.com/aws/amazon-ecs-agent/agent/api/eni.(*ENIAttachment).StopAckTimer(0xc000400ab0)
        /go/src/github.com/aws/amazon-ecs-agent/agent/api/eni/eniattachment.go:99 +0x6d
github.com/aws/amazon-ecs-agent/agent/eventhandler.setTaskAttachmentSent(0xc0005184b0)
        /go/src/github.com/aws/amazon-ecs-agent/agent/eventhandler/task_handler_types.go:217 +0x50
github.com/aws/amazon-ecs-agent/agent/eventhandler.(*sendableEvent).send(0xc0005184b0, 0x1246f80, 0x1246f90, 0x120f935, 0xf, 0x14156a0, 0xc0003e25a0, 0xc00051d4d0, 0x7fdf6b97c0a0, 0xc0003020e0, ...)
        /go/src/github.com/aws/amazon-ecs-agent/agent/eventhandler/task_handler_types.go:161 +0x30e
github.com/aws/amazon-ecs-agent/agent/eventhandler.(*taskSendableEvents).submitFirstEvent(0xc0005b3880, 0xc000299050, 0x1402860, 0xc000256bd0, 0x0, 0x0, 0x0)
        /go/src/github.com/aws/amazon-ecs-agent/agent/eventhandler/task_handler.go:368 +0x58f
github.com/aws/amazon-ecs-agent/agent/eventhandler.(*TaskHandler).submitTaskEvents.func1(0x0, 0x0)
        /go/src/github.com/aws/amazon-ecs-agent/agent/eventhandler/task_handler.go:295 +0x10f
github.com/aws/amazon-ecs-agent/agent/utils/retry.RetryWithBackoffCtx(0x140e8a0, 0xc000044018, 0x1402860, 0xc000256bd0, 0xc00051d530, 0x1248e38, 0x0)
        /go/src/github.com/aws/amazon-ecs-agent/agent/utils/retry/retry.go:46 +0x85
github.com/aws/amazon-ecs-agent/agent/utils/retry.RetryWithBackoff(...)
        /go/src/github.com/aws/amazon-ecs-agent/agent/utils/retry/retry.go:29
github.com/aws/amazon-ecs-agent/agent/eventhandler.(*TaskHandler).submitTaskEvents(0xc000299050, 0xc0005b3880, 0x14156a0, 0xc0003e25a0, 0xc0003c1b60, 0x57)
        /go/src/github.com/aws/amazon-ecs-agent/agent/eventhandler/task_handler.go:287 +0x147
created by github.com/aws/amazon-ecs-agent/agent/eventhandler.(*taskSendableEvents).sendChange
        /go/src/github.com/aws/amazon-ecs-agent/agent/eventhandler/task_handler.go:326 +0x24f
@FLavalliere FLavalliere changed the title awsvcp 1.29.1 - eni? awsvcp 1.29.1 - eni - ecs failed to start task no errors in console Sep 6, 2019
@FLavalliere FLavalliere changed the title awsvcp 1.29.1 - eni - ecs failed to start task no errors in console awsvpc task / agent 1.29.1 / eni - ecs failed to start task no errors in console Sep 6, 2019
@fenxiong
Contributor

fenxiong commented Sep 7, 2019

Hi,
Sorry to hear that you are having this issue. Can you collect the logs on the instance with the ECS logs collector and send them to fenxiong AT amazon.com? Thanks.

Also, from the trace you gave, it looks like the error might be caused by the agent not initializing a field upon restart. Did you happen to restart the instance or the agent before getting this failure?

@FLavalliere
Author

I did restart the agent before receiving that error, but the ECS task wasn't working, so I googled it and found this thread. I basically changed the log level to debug, and the error occurred upon task registration.

Note: I upgraded my VMs to ECS agent 1.30 and the error no longer occurs.
I guess this can be closed, as 1.30 seems to have resolved the issue.

@fenxiong
Contributor

fenxiong commented Sep 9, 2019

I think we do have a bug, even in 1.30. It does not affect normal use, only an edge case where you restart the agent right after launching an awsvpc task (in that case the agent might crash as you saw before). We will work on fixing the edge case.

@FLavalliere
Author

FLavalliere commented Sep 23, 2019

Hello, here is some interesting behavior with 1.30.0.

Our setup kickstarts an EC2 instance with ecs-agent, and everything is functional.
When we try to submit an awsvpc-enabled container, the task simply never starts and shows as "Stopped".

Somehow, if we go on the host and restart the ECS container using
initctl stop ecs; initctl start ecs;

the ECS agent is then able to provision the ECS tasks.

I believe this might only happen on VMs with awsvpc trunking enabled.

On a VM without awsvpc trunking we didn't have the issue.

Also, with awsvpc trunking we see some weird packet-drop behavior (though it might be due to our custom iptables rules): connections get accepted but then hang, or packets get dropped. Strange.

@fenxiong
Contributor

> Our setup kickstarts an EC2 instance with ecs-agent, and everything is functional.
> When we try to submit an awsvpc-enabled container, the task simply never starts and shows as "Stopped".

Can you collect the logs on the instance (e.g. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-logs-collector.html) and send them to fenxiong AT amazon.com so that I can take a look? Thanks.

> Also, with awsvpc trunking we see some weird packet-drop behavior (though it might be due to our custom iptables rules): connections get accepted but then hang, or packets get dropped. Strange.

Does that only happen when you turn on awsvpc trunking? Do you have an easy way to reproduce the issue?

@FLavalliere
Author

So, I've tested the 1.32 patch. It seems there are still errors after a while, not sure why.

A few days ago I reprovisioned some instances to get to 1.32.0 (using the ECS AMI V1, not the ECS-optimized Amazon Linux 2 AMI).

I was not able to place an awsvpc task, so I stopped the ecs-agent and restarted it with debug logging on.

I had to kill the task and get it re-created. Somehow the agent couldn't re-provision the task on that node.
I forced a few task placements onto that node in order to get some debug information.

Here is the log:

error_awsvpc.log

Hopefully that helps the investigation.

@fenxiong
Contributor

fenxiong commented Oct 2, 2019

The logs show that the agent is not able to launch the task because it's not able to find the pause container image amazon/amazon-ecs-pause:0.1.0 on the instance, which is required to launch an awsvpc task:

2019-09-30T18:38:10Z [WARN] Managed task [arn:aws:ecs:us-east-1:XXXXXXXXX:task/Test-Cluster/f44057cc15a7492c8fd2c011fb8c6908]: error creating container [~internal~ecs~pause]; marking its desired status as STOPPED: Error: No such image: amazon/amazon-ecs-pause:0.1.0
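
To spot this condition in a larger agent log, a small scanner along these lines could help. It is only a sketch: the regular expression is modeled on the single WARN line quoted above, the function name `find_missing_images` is made up for illustration, and other agent versions may word the message differently.

```python
import re

# Pattern modeled on the one WARN line quoted in this thread (assumption:
# other agent versions may format this message differently).
NO_SUCH_IMAGE = re.compile(
    r"Managed task \[(?P<task>[^\]]+)\]: error creating container "
    r"\[(?P<container>[^\]]+)\].*No such image: (?P<image>\S+)"
)


def find_missing_images(log_text):
    """Return (task ARN, container name, image) for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = NO_SUCH_IMAGE.search(line)
        if match:
            hits.append((match["task"], match["container"], match["image"]))
    return hits
```

Feeding it the contents of the agent log returns one tuple per "No such image" warning, which makes it easy to see which image the agent expected to find.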

I am not certain how you ended up in that situation (i.e. able to place a task on an instance but no pause container image on it). This image should be loaded into Docker by the agent when the agent starts. Did you by any chance manually remove the amazon/amazon-ecs-pause:0.1.0 image from Docker on the instance? Can you manually check whether amazon/amazon-ecs-pause:0.1.0 image is on that instance by running docker images? If the image indeed does not exist, can you try restarting the agent on the instance again to see if the image shows up in Docker?
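
The manual check with docker images can also be scripted. The following is a rough sketch, assuming the Docker CLI is on the PATH of the instance; the helper names `has_image` and `list_local_images` are invented for this example and are not part of the agent.

```python
import subprocess

# Image name taken from the log line above.
PAUSE_IMAGE = "amazon/amazon-ecs-pause:0.1.0"


def has_image(listing, image=PAUSE_IMAGE):
    """Check a 'repository:tag' listing (one image per line) for an image."""
    return image in {line.strip() for line in listing.splitlines()}


def list_local_images():
    """Ask the Docker CLI for local images as 'repository:tag' lines."""
    result = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the instance, `has_image(list_local_images())` returning `False` would suggest the pause image is gone and the agent needs a restart to load it again.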

@FLavalliere
Author

Hmm, good catch. I looked through the log file a bit too quickly.

We have a cron job that deletes unused images (I know ECS does that too, but we have other running processes that may generate images outside of ECS).

Does the docker image pull of amazon/amazon-ecs-pause:0.1.0 only happen during the ecs-agent initialization?

@FLavalliere
Author

OK, yes, from what I can see the pause image is only added/extracted at ECS agent initialization:

https://github.com/aws/amazon-ecs-agent/blob/a7f810405c7a56085e7c68bb3f1f55b165ed29a4/agent/app/agent.go#L263

@fenxiong
Contributor

fenxiong commented Oct 2, 2019

> OK, yes, from what I can see the pause image is only added/extracted at ECS agent initialization...

Yes, that's right. The agent doesn't pull this image from anywhere. Instead, it loads the image into Docker (from a tar file bundled within the agent image itself) when it starts, and assumes the image exists when running an awsvpc task. So you might want to modify your cron job to skip deleting this image.
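
One way the cleanup job could skip the image is to filter its deletion list against a set of protected prefixes. This is a minimal sketch, not the actual cron job: the `PROTECTED_PREFIXES` tuple and the `images_safe_to_delete` helper are hypothetical, and protecting everything under `amazon/` is an assumption that is broader than the single pause image.

```python
# Hypothetical allowlist: any image whose name starts with one of these
# prefixes is never handed to the deletion step.
PROTECTED_PREFIXES = ("amazon/",)


def images_safe_to_delete(unused_images, protected=PROTECTED_PREFIXES):
    """Drop any 'repository:tag' entry that starts with a protected prefix."""
    return [img for img in unused_images if not img.startswith(protected)]
```

The job would then remove only the names this function returns, so the pause image survives between agent restarts.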

@FLavalliere
Author

Indeed, I'll add an exception for any amazon/ images 👍

I guess we can close this bug.

@fenxiong
Contributor

fenxiong commented Oct 2, 2019

Sounds good 👍 Closing this.

@fenxiong fenxiong closed this as completed Oct 2, 2019