Always run DeleteVeth on cleanup #101
Conversation
Prior to this change, we would skip running DeleteVeth on cleanup if the CNI plugins were not provided with an IPAM config to delete. The agent codebase, however, deliberately omits the IPAM config when it is only cleaning up a single task; see https://github.com/aws/amazon-ecs-agent/blame/71a0f6765cc469913a0b09728196feb0d0d33e37/agent/engine/docker_task_engine.go#L1779 and https://github.com/aws/amazon-ecs-agent/blame/71a0f6765cc469913a0b09728196feb0d0d33e37/agent/engine/docker_task_engine.go#L1760.
With this change, when we skip the IPAM delete we no longer return immediately from the del function. Instead, the code falls through to the engine.DeleteVeth call so that veth interfaces are always cleaned up when a task is cleaned up.
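To make the fix concrete, here is a minimal sketch of the post-change cleanup flow. The engine, netConfig, and del signatures below are simplified stand-ins rather than the plugin's actual types; only the call names RunIPAMPluginDel and DeleteVeth and the skip log message come from the change itself.

package main

import (
	"fmt"
	"log"
)

// Simplified stand-ins for the plugin's engine and network config; only the
// control flow mirrors the change described above.
type ipamConfig struct{ Type string }
type netConfig struct{ IPAM *ipamConfig }
type engine struct{}

func (engine) RunIPAMPluginDel(ipamType string, stdinData []byte) error { return nil }
func (engine) DeleteVeth(ifName string) error                           { return nil }

// del sketches the new cleanup path: the IPAM DEL is skipped when no IPAM
// config was passed in, but execution now falls through to DeleteVeth
// instead of returning early.
func del(eng engine, conf *netConfig, stdinData []byte, ifName string) error {
	if conf.IPAM != nil {
		if err := eng.RunIPAMPluginDel(conf.IPAM.Type, stdinData); err != nil {
			return err
		}
	} else {
		log.Println("IPAM configuration not found, skip DEL for IPAM")
	}
	return eng.DeleteVeth(ifName)
}

func main() {
	// No IPAM config: previously this returned early, now the veth is still removed.
	fmt.Println(del(engine{}, &netConfig{}, nil, "vethExample"))
}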
Overall note: it would be helpful to have more detail about the testing that was specific to this change (if any).
detailLog("Running IPAM plugin DEL", args, conf, "")
err = engine.RunIPAMPluginDel(conf.IPAM.Type, args.StdinData)
if err != nil {
    return err
Do we want to log a warning here and continue on with veth deletion? I believe IPAM resource management is done separately from the actual veth management, and even if one of them fails we can still attempt the other.
That's a good point. I'm not sure under what conditions the IPAM delete will fail, but I can change it to continue on to the veth delete regardless.
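A compact sketch of the behavior suggested above, using the same kind of stand-in types as the earlier sketch: an IPAM DEL failure is logged and tolerated so the veth cleanup still runs. The delTolerant name and the example arguments are illustrative, not the plugin's API.

package main

import "log"

// Stand-in engine; only the two call names come from the diff above.
type engine struct{}

func (engine) RunIPAMPluginDel(ipamType string, stdinData []byte) error { return nil }
func (engine) DeleteVeth(ifName string) error                           { return nil }

// delTolerant treats an IPAM DEL error as non-fatal: it is logged as a
// warning and cleanup proceeds to DeleteVeth regardless.
func delTolerant(eng engine, ipamType string, stdinData []byte, ifName string) error {
	if err := eng.RunIPAMPluginDel(ipamType, stdinData); err != nil {
		log.Printf("warning: IPAM DEL failed, continuing with veth cleanup: %v", err)
	}
	return eng.DeleteVeth(ifName)
}

func main() {
	// Illustrative plugin type and interface names only.
	_ = delTolerant(engine{}, "ecs-ipam", nil, "vethExample")
}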
Pulling in cni plugins change for fixing veth cleanup behavior. See aws/amazon-ecs-cni-plugins#101 for more details.
err = engine.RunIPAMPluginDel(conf.IPAM.Type, args.StdinData)
if err != nil {
    return err
detailLogInfo("IPAM configuration not found, skip DEL for IPAM", args, conf, "")
We are leaking the task IP address when taking this path.
I'm not sure this is the right fix. It probably doesn't cause additional harm, but the log level should be higher than Info. Leaking the IP address is a serious failure.
Agreed, but I think in that case there might be two issues within the del function. This PR is fixing one of the two (the skipped DeleteVeth call). As for the other issue, it seems like we might need to synchronize on what makes sense for the agent to pass to the CNI plugins.
That being said, it's still not clear to me how the IPAM CNI plugin fits into this, since it never logs anything.
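To illustrate what leaking the task IP address means here, a toy allocator sketch follows; it is a conceptual stand-in, not the ecs-ipam plugin's data store, and the addresses are illustrative. The IPAM DEL is what returns a task's address to the pool, so skipping it permanently consumes one address per cleaned-up task.

package main

import (
	"errors"
	"fmt"
)

// pool is a toy IPAM allocator used only to illustrate the leak: real ECS
// IPAM state lives elsewhere, but the allocate/release contract is the same.
type pool struct {
	free      []string
	allocated map[string]string // taskID -> IP
}

func newPool(ips ...string) *pool {
	return &pool{free: ips, allocated: map[string]string{}}
}

func (p *pool) allocate(taskID string) (string, error) {
	if len(p.free) == 0 {
		return "", errors.New("address pool exhausted")
	}
	ip := p.free[0]
	p.free = p.free[1:]
	p.allocated[taskID] = ip
	return ip, nil
}

// release is what the IPAM DEL ultimately does; if cleanup skips it, the
// address stays allocated forever even though the task is gone.
func (p *pool) release(taskID string) {
	if ip, ok := p.allocated[taskID]; ok {
		delete(p.allocated, taskID)
		p.free = append(p.free, ip)
	}
}

func main() {
	p := newPool("169.254.172.2", "169.254.172.3") // example addresses only
	ip, _ := p.allocate("task-1")
	fmt.Println("allocated", ip)
	// Cleanup without the IPAM DEL: release is never called, so the
	// address is leaked and the pool shrinks by one.
	fmt.Println("free addresses left:", len(p.free))
}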
Summary
Prior to this change, we would skip running DeleteVeth on cleanup if the CNI plugins were not provided with an IPAM config to delete.
The agent codebase, however, deliberately omits the IPAM config when it is only cleaning up a single task; see the includeIPAMConfig bool in
https://github.com/aws/amazon-ecs-agent/blame/71a0f6765cc469913a0b09728196feb0d0d33e37/agent/engine/docker_task_engine.go#L1779
and https://github.com/aws/amazon-ecs-agent/blame/71a0f6765cc469913a0b09728196feb0d0d33e37/agent/engine/docker_task_engine.go#L1760.
With this change, when we skip the IPAM delete, we no longer return immediately from the del function. Instead, the code falls through to the engine.DeleteVeth call so that veth interfaces are always cleaned up when a task is cleaned up.
This change should prevent "exchange full" errors that can occur when awsvpc tasks are not cleaned up properly and the container instance reaches 1023 veth interfaces on the instance's ecs-bridge bridge (the bridge used by awsvpc network mode tasks).
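As a rough diagnostic, and not part of this PR, one could count how many interfaces are attached to ecs-bridge to see how close an instance is to the 1023-port bridge limit. This sketch assumes a Linux host and the github.com/vishvananda/netlink package.

package main

import (
	"fmt"
	"log"

	"github.com/vishvananda/netlink"
)

func main() {
	// The bridge name used by awsvpc network mode tasks, per the summary above.
	bridge, err := netlink.LinkByName("ecs-bridge")
	if err != nil {
		log.Fatalf("lookup ecs-bridge: %v", err)
	}

	links, err := netlink.LinkList()
	if err != nil {
		log.Fatalf("list links: %v", err)
	}

	// Count interfaces enslaved to the bridge; a Linux bridge supports at
	// most 1023 ports, after which attaching another veth fails with
	// "exchange full".
	count := 0
	for _, l := range links {
		if l.Attrs().MasterIndex == bridge.Attrs().Index {
			count++
		}
	}
	fmt.Printf("%d of 1023 bridge ports in use on ecs-bridge\n", count)
}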
Testing
New tests cover the changes:
Functional test suite run manually on staging/testing ECS clusters.
Description for the changelog
Bugfix: Always run DeleteVeth on cleanup, fixes veth "exchange full" errors
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.