azure: sometimes turning off hardware TX checksum offloading is necessary #790
Comments
Here are tcpdumps of TX on and TX off: https://gist.github.com/philips/d680458e7cc2a36b16989a20df5ae0c2
This is a "normal" tcpdump from this machine to a cloud CDN for comparison: https://gist.github.com/philips/e3acde0cda317ec134900323cc5d924b#file-gistfile1-txt-L44-L55
We had to write a service to disable TX checksum offloading for Tectonic clusters on Azure [0]; it would be much nicer and more maintainable to do this in flannel configuration via a ConfigMap than with systemd units.

[0] coreos/tectonic-installer#1586

Below are some iperf benchmarks we ran to compare how different MTU and TX checksum offloading combinations perform. We found that disabling TX checksum offloading while keeping the default MTU does not incur a major performance penalty. By only disabling TX checksum offloading, we can take advantage of any future changes to the Azure network MTU without having to modify existing deployments. This will be the most maintainable workaround going forward.
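A systemd-based workaround along these lines might look like the following sketch. The unit file name, description, and target wiring here are assumptions for illustration (the linked installer PR has the actual implementation); `flannel.1` is flannel's default VXLAN interface name, and `ethtool -K <dev> tx off` is the standard way to disable TX checksum offloading:

```ini
# disable-tx-offload.service (hypothetical name): turn off TX checksum
# offloading on flannel's VXLAN device once the network is up.
[Unit]
Description=Disable TX checksum offloading on flannel.1
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -K flannel.1 tx off

[Install]
WantedBy=multi-user.target
```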
iperf.yaml:
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: iperf
  namespace: default
  labels:
    app: iperf
spec:
  replicas: 1
  template:
    metadata:
      name: iperf
      labels:
        app: iperf
    spec:
      containers:
      - name: iperf-server
        image: networkstatic/iperf3
        args:
        - -s
        ports:
        - containerPort: 5201
          protocol: TCP
      nodeSelector:
        node-role.kubernetes.io/node: ""
---
apiVersion: v1
kind: Service
metadata:
  name: iperf
  namespace: default
  labels:
    app: iperf
spec:
  type: NodePort
  selector:
    app: iperf
  ports:
  - name: iperf
    protocol: TCP
    port: 5201
    nodePort: 31211
---
apiVersion: batch/v1
kind: Job
metadata:
  name: iperf
  namespace: default
  labels:
    app: iperf
spec:
  template:
    metadata:
      name: iperf
      labels:
        app: iperf
    spec:
      hostNetwork: true
      containers:
      - name: iperf
        image: networkstatic/iperf3
        args: ["-c", "green.westcentralus.cloudapp.azure.com", "-p", "31211", "-t", "30", "-V"]
      restartPolicy: Never
```

I got the following results:

MTU 1350 tx on:
MTU 1350 tx off:
MTU 1500 tx off:
MTU 2000 tx off:
Native host networking, MTU 1500 tx on:
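For reference on the MTU values benchmarked above: flannel's VXLAN backend encapsulates each packet, adding the commonly cited 50 bytes of overhead (outer Ethernet 14 + IPv4 20 + UDP 8 + VXLAN 8). A quick arithmetic sanity check of the outer packet size each tested MTU produces (the 50-byte figure is an assumption for standard VXLAN over IPv4, not taken from this thread):

```shell
# Outer packet size for each flannel-interface MTU tested above,
# assuming 50 bytes of VXLAN encapsulation overhead.
vxlan_overhead=50
for mtu in 1350 1500 2000; do
  echo "flannel MTU $mtu -> outer packet $((mtu + vxlan_overhead)) bytes"
done
```

This is why a flannel MTU at or above the host MTU (e.g. 1500 or 2000 on a 1500-byte host link) forces fragmentation of the encapsulated packets, while a reduced MTU like 1350 leaves headroom.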
@philips @squat Do you think this issue should stay open? Can or should this be fixed in flannel? I could see a few options for resolving this.
Oh, it's so much worse than that. It depends on the underlying Hyper-V OS; the networking driver for some versions of Windows supports checksum offloading only for TCP (hence breaking vxlan), while newer ones support both TCP and UDP. Given that it's not really a flannel problem, this can probably just be closed.
Expected Behavior
Connecting to HTTPS services on Kubernetes works on Azure every time.
Current Behavior
Something like this will hang on Azure sometimes.
Possible Solution
This fixes it:
Steps to Reproduce (for bugs)
Context
Some more potential explanation, though I don't totally understand it.
flannel's code could do this too, as weave does: https://github.com/weaveworks/weave/pull/2307/files#diff-4af03c67995550022d5fd9e1f8e72e8bR25