Ignore pods with status.phase=Succeeded when watching IP addresses#5412

Merged

kleimkuhler merged 1 commit into linkerd:main from fpetkovski:ld-5394 on Jan 12, 2021

Conversation

@fpetkovski
Contributor

Ignore pods with status.phase=Succeeded when watching IP addresses

When a pod terminates successfully, some CNIs will assign its IP address
to newly created pods. This can lead to duplicate pod IPs in the same
Kubernetes cluster.

Filter out pods which are in a Succeeded phase since they are not
routable anymore.

Fixes #5394

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
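The core of the change is a phase check applied wherever pods are indexed by IP address. Below is a minimal, self-contained Go sketch of that idea; the `Pod` type, `PodPhase` constants, and `indexByIP` function here are simplified stand-ins for illustration (the real code uses the `k8s.io/api/core/v1` types), not the actual linkerd implementation:

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes API types.
// Assumption: the real code uses k8s.io/api/core/v1 and its PodPhase constants.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
)

type Pod struct {
	Name  string
	Phase PodPhase
	PodIP string
}

// indexByIP builds an IP -> pod index, skipping pods in the Succeeded
// phase: their IPs may already have been reassigned by the CNI to a
// newly created pod, so indexing them would produce duplicate entries.
func indexByIP(pods []Pod) map[string]Pod {
	index := make(map[string]Pod)
	for _, pod := range pods {
		if pod.Phase == PodSucceeded {
			continue // terminated pod; its IP may now belong to a newer pod
		}
		index[pod.PodIP] = pod
	}
	return index
}

func main() {
	// A completed Job pod and a new Running pod sharing the same IP.
	pods := []Pod{
		{Name: "job-old", Phase: PodSucceeded, PodIP: "10.0.0.5"},
		{Name: "web-new", Phase: PodRunning, PodIP: "10.0.0.5"},
	}
	idx := indexByIP(pods)
	fmt.Println(idx["10.0.0.5"].Name) // prints "web-new"
}
```

Without the phase filter, whichever pod was indexed last would win the slot for `10.0.0.5`, which is exactly the duplicate-IP symptom described in #5394.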

// Ignore terminated pods,
// their IPs can be reused for new Running pods
if podTerminated(pod) {
@fpetkovski (Contributor Author)

Not 100% sure if this change is needed. Looking forward to any feedback.

Member

I think we want to omit terminated pods any time we are using the pod ip index. So yes, I think this makes sense.

@adleong
Member

adleong commented Dec 21, 2020

This looks great to me. @kforsthoevel are you able to test this branch and confirm if it fixes #5394 for you?

@kforsthoevel

Thanks. We will try it out after the holidays.

@thomaspeitz

@adleong - Is there any chance you could improve your build process by pushing the branch images? Or are they already pushed and I just don't see them? The ones used during the build seem to be tagged correctly but not pushed.

That would help us a lot with testing. Otherwise we would build and push those images ourselves.

Greetings :)

@kleimkuhler
Contributor

@thomaspeitz It's not exactly what you're asking for, but we do have something like this available if you clone the repo.

There is a script, bin/install-pr, that will install the images uploaded during the CI process and load them into a local cluster. We use this for testing PRs, which sounds like what you're trying to do here.

For this PR, I can test in a k3d cluster with the following:

❯ env GITHUB_TOKEN='..' bin/install-pr --k3d 5412
..
INFO[0004] DONE                                         
/home/kevin/Projects/linkerd/linkerd2

Linkerd CLI available:
/home/kevin/Projects/linkerd/linkerd2/target/release/linkerd2-cli-git-65f0d802-linux-amd64

Then you can install it and test:

./target/release/linkerd2-cli-git-65f0d802-linux-amd64 install | kubectl apply -f -

The value of GITHUB_TOKEN is your GitHub personal access token. If you don't pass --k3d or --kind, it assumes a regular k8s cluster.

@fpetkovski
Contributor Author

fpetkovski commented Jan 5, 2021

The way I tested the change was by first running the bin/docker-build script, which creates a unique tag based on the currently checked-out commit SHA. I then retagged the destination image and pushed it to a public Docker Hub repo, from where I pulled it into a remote, non-production Kubernetes cluster.

@thomaspeitz

Thanks for your input!

We will continue tomorrow and give you feedback as soon as possible.

@thomaspeitz

We upgraded to the newest edge version, replaced the destination container with the updated controller image, and are now monitoring the logs closely in case any problems happen. We'll get back to you at the beginning of next week with test results (or earlier in case of problems). Thanks for your work! :)

@kforsthoevel

@adleong, @fpetkovski The patch has been running for almost a week now and we haven't seen the error anymore. It seems to have fixed the issue.

Thank you for the quick fix.

@kleimkuhler (Contributor) left a comment

Looks great, thanks @fpetkovski

@adleong (Member) left a comment

Awesome! Thanks @kforsthoevel for confirming!

@kleimkuhler kleimkuhler merged commit 40192e2 into linkerd:main Jan 12, 2021
@kivagant-ba

What version includes the fix?

@kleimkuhler
Contributor

This only recently merged so it is not part of a release yet. We do edge releases at the end of each week, so edge-20.1.2 will include it, and that should be available by Thursday or Friday.

@kivagant-ba

kivagant-ba commented Jan 12, 2021

Another question: can the fix be included as a hotfix in the last couple of stable versions, without requiring everyone to upgrade to a release with all of the new functionality? It is quite a critical bug: it causes issues where pods cannot even start because liveness probes cannot pass, and it probably causes hidden issues where traffic cannot be balanced properly (I guess this is what is happening when we see the WARNs in the logs).

@kleimkuhler
Contributor

@kivagant-ba Sure, we'll release a stable-2.9.2 with this fix. I'll leave a comment here when it has been released; I'm going to start prepping it today.

kleimkuhler pushed a commit that referenced this pull request Jan 13, 2021
…5412)

Ignore pods with status.phase=Succeeded when watching IP addresses

When a pod terminates successfully, some CNIs will assign its IP address
to newly created pods. This can lead to duplicate pod IPs in the same
Kubernetes cluster.

Filter out pods which are in a Succeeded phase since they are not 
routable anymore.

Fixes #5394

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
@kleimkuhler
Contributor

stable-2.9.2 has been released with this fix.

@kivagant-ba

Thank you, @kleimkuhler !



Development

Successfully merging this pull request may close these issues.

Traffic stops when two pods have the same IP address

6 participants