Flannel and kube-proxy race for postrouting chain #20391
Comments
Users are hitting this issue. (referenced issue #22717)
Another option could be along these lines: #22717 (comment)
We're hitting this, too. Does anyone have a workaround? I think we could get by switching kube-proxy to the userspace proxy.
Can flannel maybe ONLY install this rule if the policy for POSTROUTING is DROP? @steveej what do you think? We need to coordinate a bit...
@mwhooker If you are still experiencing the issue, try the masquerade command sketched below. If that isn't where your env file is stored, you can just use the IP/subnet on the flannel.1 interface, as that should match. Just switch out the 10.2.0.0/16s in the original command. Alternatively, just edit /var/run/flannel/subnet.env in the previous command accordingly. Re the userspace proxy, it would work as a short-term bandaid, but it won't scale as well, so I'd limit its use to testing.
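A minimal sketch of the kind of command being discussed, assuming the flannel overlay is 10.2.0.0/16 and the subnet env file lives at /var/run/flannel/subnet.env (both taken from the comment above); treat it as illustrative rather than the exact original:

```sh
# Load FLANNEL_NETWORK / FLANNEL_SUBNET as written by flanneld (path may differ).
source /var/run/flannel/subnet.env

# Masquerade traffic leaving the overlay, but leave pod-to-pod traffic alone.
# Swap in your own range for 10.2.0.0/16 (or use $FLANNEL_NETWORK from above).
iptables -t nat -A POSTROUTING -s 10.2.0.0/16 ! -d 10.2.0.0/16 -j MASQUERADE
```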
@martynd yes, that's the comment. I looked at that already. I can try again, but it will have to wait until later this week. Thanks for the thought, but still looking for alternate ideas.
I've come across a similar issue to this one.
@steveej ping?
cc @tomdee
I'll put my head together with @tomdee to figure this out.
There is no "official way", but there is a way that should just work. Run kube-up with NETWORK_PROVIDER=flannel (https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/config-default.sh#L134).
Someone needs to add that masq rule (flannel-io/flannel#318)
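For reference, a hedged example of that kube-up invocation; KUBERNETES_PROVIDER=gce and the script path are assumptions about a typical GCE bring-up, not quoted from the comment:

```sh
# Bring up a GCE cluster with flannel as the network provider.
# NETWORK_PROVIDER is read from cluster/gce/config-default.sh (linked above).
NETWORK_PROVIDER=flannel KUBERNETES_PROVIDER=gce ./cluster/kube-up.sh
```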
I don't think it's right that flannel does an ACCEPT. A much better option would be a RETURN. As @thockin points out above, that could be an issue if the default policy for the POSTROUTING chain was DROP, but as far as I understand it, that would be an extremely strange thing to do. I'm going to put up a PR for flannel to change ACCEPT to RETURN, and that should resolve this bug.
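A sketch of the distinction being proposed, assuming the rule lives in a user-defined FLANNEL chain that POSTROUTING jumps to (the chain name and the 10.244.0.0/16 range are placeholders, not the literal flannel rules):

```sh
# Placeholder chain and jump, just to make the example self-contained.
iptables -t nat -N FLANNEL
iptables -t nat -A POSTROUTING -j FLANNEL

# ACCEPT is terminal for the whole nat/POSTROUTING traversal: once it matches,
# kube-proxy's MASQUERADE rules appended later are never evaluated.
iptables -t nat -A FLANNEL -s 10.244.0.0/16 -d 10.244.0.0/16 -j ACCEPT

# RETURN, by contrast, only pops back to the calling POSTROUTING chain, so the
# packet still reaches rules added after the jump (including kube-proxy's).
#iptables -t nat -A FLANNEL -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
```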
Perhaps it could use ACCEPT/RETURN based on the default policy? Probably a better fit that way.
Any updates on this issue?
@changleicn It looks like I fixed this back in May in flannel, so maybe this isn't an issue any more?
Happening for me in a fresh "kubeadm init" installation. My workaround is finding the "RETURN" rule in "POSTROUTING" that "ignores" intra-pod traffic and simply deleting it (see the sketch below). Works fine until k8s restarts... Would someone care to explain why this rule is needed in the first place? Is it just for optimization?
Versions: (system is arm64)
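A hedged illustration of that workaround; the rule number is only an example, so check the listing first:

```sh
# List nat/POSTROUTING with rule numbers and find the RETURN rule that matches
# intra-pod (pod-CIDR to pod-CIDR) traffic.
iptables -t nat -L POSTROUTING --line-numbers -n

# Delete it by number (2 is just an example). Flannel re-adds the rule when it
# restarts, which is why the workaround doesn't survive restarts.
iptables -t nat -D POSTROUTING 2
```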
@tomdee I don't have a flannel install up right now to poke at, but maybe we can hash out a protocol for this between kube and flannel? Is flannel still installing rules into POSTROUTING unilaterally?
@thockin sounds good. I'd need to go and check, but yes, I think flannel is still installing POSTROUTING rules unilaterally.
Should we plan a Slack meeting or hangout or something, maybe next week or the week after?
Has there been any progress on this? I think we may also be hitting this, but I'm not sure. All pod-to-pod interactions across nodes seem to be fine when using their service, but host-to-service works until the service gives back a pod running on the host. If that makes sense :)
Bumping this again. I am not able to use the service endpoints from inside a pod of the service.
Should I be able to access ServiceA from inside PodA?
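A quick, hypothetical check of that scenario; the lowercase pod/service names, default namespace, port 80, and the assumption that the pod image ships wget are all illustrative:

```sh
# From inside PodA, hit ServiceA via its cluster DNS name. With the POSTROUTING
# problem described here, this tends to hang exactly when ServiceA load-balances
# back to PodA itself (the hairpin case).
kubectl exec -it poda -- wget -qO- -T 5 http://servicea.default.svc.cluster.local:80/
```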
Same issue. Any update?
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Hey @phillydogg28, @abemusic and @jl-dos, we are facing the same issue and were wondering if any of you had found a workaround or a resolution? Happening on GKE with no plugins on both 1.9.4-gke.1 and 1.7.12-gke.1. Node/Master OS is ContainerOS.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten
A somewhat related note: changing to a pure Calico stack (only Calico, without flannel) solved my race condition issue in a big environment.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @rikatz, do you have any guidelines on what you did to migrate from Flannel to Calico?
Just installed a k8s v1.24.7 cluster with flannel v0.20.1 and the issue still seems to exist:
DNS only works if the CoreDNS pod runs on the same node as the client. Pod-to-pod communication via services is broken ...
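A hypothetical way to reproduce that symptom; the pod name, busybox image, and the kubernetes.default lookup are assumptions rather than details from the report:

```sh
# Run a throwaway pod and resolve an in-cluster name. With the POSTROUTING issue
# described here, the lookup only succeeds when the pod lands on the same node
# as a CoreDNS replica.
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local
```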
But that is a flannel problem, isn't it? It should be reported on its repo.
@aojea you are right, it is specific to flannel, not to k8s. Please disregard my post.
No worries, sometimes it's the other way around.
More recent versions of flannel, when started with the --ip-masq flag, force a jump to the FLANNEL chain, where there's an "ACCEPT all traffic from node subnet" rule, i.e. something like:
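(A hedged reconstruction; the per-node subnet 10.1.15.0/24 is a placeholder, and the exact match flags live in the linked ipmasq.go rather than in this sketch.)

```sh
# Accept anything sourced from the node's flannel subnet; ACCEPT ends
# nat/POSTROUTING processing for that packet.
iptables -t nat -A FLANNEL -s 10.1.15.0/24 -j ACCEPT
```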
From: https://github.com/coreos/flannel/blob/master/network/ipmasq.go#L32
Since ACCEPT is a built-in target like DROP, it'll stop processing any of the kube service rules. This manifests as a bug that looks like misconfigured hairpin mode, i.e. if a pod gets load-balanced to itself when accessing the service DNS name, packets get dropped because of a martian source.
@kubernetes/goog-cluster how do we coordinate? Is it safe to always have kube-proxy prepend?