Proposal to allow mtu changes #926

---
title: allow-mtu-changes
authors:
  - "@juanluisvaladas"
  - "@jcaamano"
reviewers:
  - "@danwinship"
  - "@dcbw"
  - "@knobunc"
  - "@msherif1234"
approvers:
  - TBD
creation-date: 2021-10-07
last-updated: 2021-10-14
status: provisional
---

# Allow MTU changes

This enhancement adds the capability to the Cluster Network Operator (CNO) to
change the MTU post-installation.

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Customers may need to change the MTU post-installation. However, these changes
are not trivial and may cause downtime, so the CNO currently forbids them.

We propose a procedure that is launched on demand. It runs pods on every node
of the cluster and makes the necessary changes in an ordered and coordinated
manner, keeping the service disruption as short as possible; if it completes
within a reasonable time of 10 minutes, it should stay well under the typical
TCP timeout interval.

## Motivation

While cluster administrators usually set the MTU correctly during
installation, they sometimes need to change it afterwards, for example because
the underlay changed or because the MTU was set incorrectly at install time.

### Goals

* Allow the MTU to be changed post-install on OVN-Kubernetes.

### Non-goals

* Change the MTU without service disruption.
* Other safe or unsafe configuration changes.

## Proposal

The CNO monitors changes to the operator configuration. When it detects an MTU
change:

1. Set the `clusteroperator/network` conditions:
   - Progressing: true
   - Upgradeable: false
2. Check that the MTU value is valid, within theoretical min/max values.
3. Check that all the nodes are in Ready state.
4. Deploy pods on every node with `restartPolicy: Never` which are responsible
   for validating the preconditions (a sketch of such a check follows below).
   If the preconditions are met, the pod will exit with code 0. Some of the
   preconditions that we will check are:
   - The underlay network supports the intended MTU value.
5. Once all the previous pods finish successfully, deploy another set of pods
   with `restartPolicy: Never` on every node that will handle the actual change
   of the MTU (explained in more detail below). Wait for them to be ready and
   running.
6. Ensure that the configmap ovnkube-config is synchronized with the new MTU
   value.
7. If any of the previous steps (1-6) was unsuccessful, the CNO will set the
   `clusteroperator/network` conditions to:
   - Progressing: false
   - Degraded: true
   Update the operator configuration status with a description of the problem.
   At this point the process is interrupted and manual intervention is required.
8. Force a rollout of the ovnkube-node daemonset. This will ensure
   ovn-kubernetes uses the new MTU value for new pods, as well as set the new
   MTU on managed node interfaces like ovn-k8s-mp0, ovn-k8s-gw0 (local gateway
   mode) and related routes.
9. Set the new MTU value in the applied-cluster config map AND wait for the
   pods of step 5 to complete successfully.
10. If any of the previous steps (8, 9) failed, reboot the node and wait for
    the kubelet to report Ready again.
    If this step fails, set conditions to:
    - Progressing: false
    - Degraded: true
    Update the operator configuration status with a description of the problem.
11. Upon completion, set conditions to:
    - Progressing: false
    - Degraded: false

> **Review discussion (step 8):**
> **Contributor:** So this means the ovnkube upgrade is totally out of sync
> with the pod-level upgrades, and every node needs to wait for every other
> node to finish its pod-level upgrades before any of them can do the
> ovnkube-level upgrade. A better approach might be: instead of having CNO
> force a re-rollout of the DaemonSet, just have the step 5 pod kill the local
> ovnkube-node process, forcing it to be restarted. And then it could even
> choose to do that step before or after the pod-level fixes, depending on
> which direction the MTU is changing in.
>
> **Author:** The thing is that if we restart ovn-kube after the step 5 pod
> changes the MTUs, there is a time in between where new pods may allocate with
> the old MTU. That's why we restart ovn-kube first; we'll do the roll-out with
> max unavailability so that it is quick, and then we let the step 5 pod
> proceed with the MTU changes. Yes, it is out of sync, but hopefully quick
> enough.

> **Review discussion (step 10):**
> **Contributor:** If any of the previous steps (8, 9) failed on any nodes,
> drain and reboot the failed nodes, one at a time, and wait for each one to be
> reporting as Ready again.
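
A minimal sketch of the kind of check the step 2 / step 4 validation could
perform, assuming the `vishvananda/netlink` library, a 100-byte Geneve
allowance, and `br-ex` as the underlay interface (all illustrative assumptions
rather than settled implementation choices):

```go
package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

const (
	// Hypothetical lower bound, for illustration only.
	minMTU = 576
	// Typical headroom OVN-Kubernetes needs for Geneve encapsulation (IPv4).
	geneveOverhead = 100
)

// validateMTU sketches the step 2 / step 4 checks: the requested cluster MTU
// must be within sane bounds and must fit into the node's underlay interface
// once the Geneve encapsulation overhead is accounted for.
func validateMTU(requested int, underlayIface string) error {
	link, err := netlink.LinkByName(underlayIface)
	if err != nil {
		return fmt.Errorf("looking up underlay interface %s: %w", underlayIface, err)
	}
	maxMTU := link.Attrs().MTU - geneveOverhead

	if requested < minMTU || requested > maxMTU {
		return fmt.Errorf("requested MTU %d outside allowed range [%d, %d]",
			requested, minMTU, maxMTU)
	}
	return nil
}

func main() {
	// Hypothetical invocation: the values would come from the operator configuration.
	if err := validateMTU(1400, "br-ex"); err != nil {
		fmt.Println("precondition failed:", err)
	}
}
```

A real precondition pod would likely also probe the underlay path between
nodes with don't-fragment packets rather than trusting the local interface MTU
alone.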

The steps to change the MTU performed by the pods of previous step 5 are:

1. So that we don't have nodes doing things at different times and we have
   everything synchronized, the pods will wait until the MTU value in the
   applied-cluster configmap changes.
2. Enter every network namespace. If an interface `eth0` exists in that
   namespace with an IP address within the pod subnet, change the MTU of both
   ends of the veth pair (sketched below).
3. If any of the previous steps (1-2) failed, the pod will exit with code 1;
   if all were successful, it will exit with code 0.

> **Review discussion:**
> **Contributor:** should this pod detect if there are MTU problems after
> migration is complete, and post an event or something to indicate if it was
> successful or not?
>
> **Author:** What do you mean with detect? If openshift has a verification
> procedure to health check deployments then we probably can suggest in the
> documentation to run it after this procedure.
>
> **Contributor:** I mean that in your step 5: you are going to deploy another
> set of pods that do the configuration. I'm wondering if these pods will
> remain for some time after configuration and if they can run a healthcheck
> until all of the nodes are finished updating MTU. The check could be pinging
> from this pod to other "configuration pods" on other nodes with max MTU. If
> it doesn't come up after some time, then maybe an event can be posted or
> something to indicate to the user that the MTU change failed.
>
> **Author:** Usually when you do a new deployment you run some verification to
> check that the deployment has been done correctly and that the cluster is
> healthy. If this exists for openshift we can suggest in documentation to run
> it again. Otherwise, it would probably be better to have a different set of
> pods with a liveness probe or the like rather than adding to these specific
> pods.
>
> **Contributor:** we have the network check target pods, but I was thinking
> something specifically scoped to the MTU change to give the user a signal
> that the MTU update worked as part of the MTU update process itself. Like the
> pods that you launch for doing the MTU upgrade exit successfully and log some
> message like MTU upgrade complete, or if they check network connectivity and
> something is now broken, they either crash or post an event to their pod
> saying MTU upgrade problem. If you think it's not necessary then that's fine
> to ignore.
>
> **Author:** I would probably then use the network check target pods and
> enhance that for any specific MTU verification we think we need to do. Do you
> know where I can check them out? These MTU change pods only change the MTU of
> pods, which is an operation for which we should know definitively if it
> succeeded or not, and is only one step of a 3-step process which also
> includes changing the host sdn interfaces MTU and the host external
> interfaces MTU, so I feel that a final verification of the MTU in these pods
> could be out of place.
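
A sketch of step 2 above, assuming the pod network namespaces are exposed to a
privileged, host-networked pod under `/var/run/netns` and using the
`vishvananda/netns` and `vishvananda/netlink` libraries; the namespace path
layout, the elided pod-subnet check, and the interface names are assumptions
for illustration:

```go
package main

import (
	"fmt"
	"path/filepath"

	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

// setPodMTU sets the new MTU on eth0 inside one pod network namespace and on
// its host-side veth peer. A real implementation would first verify that
// eth0's address is within the pod subnet, as described in step 2 above.
func setPodMTU(nsPath string, mtu int) error {
	ns, err := netns.GetFromPath(nsPath)
	if err != nil {
		return fmt.Errorf("opening netns %s: %w", nsPath, err)
	}
	defer ns.Close()

	// netlink handle scoped to the pod's network namespace.
	h, err := netlink.NewHandleAt(ns)
	if err != nil {
		return err
	}
	defer h.Delete()

	eth0, err := h.LinkByName("eth0")
	if err != nil {
		return err // no eth0 in this namespace: not a pod we manage
	}
	if err := h.LinkSetMTU(eth0, mtu); err != nil {
		return fmt.Errorf("setting MTU on pod side: %w", err)
	}

	// For a veth, the kernel exposes the peer's ifindex (iflink); the peer
	// lives in the host namespace, where this program runs.
	peer, err := netlink.LinkByIndex(eth0.Attrs().ParentIndex)
	if err != nil {
		return err
	}
	if err := netlink.LinkSetMTU(peer, mtu); err != nil {
		return fmt.Errorf("setting MTU on host side: %w", err)
	}
	return nil
}

func main() {
	// Hypothetical walk over the pod namespaces exposed on the host.
	paths, _ := filepath.Glob("/var/run/netns/*")
	for _, p := range paths {
		if err := setPodMTU(p, 1400); err != nil {
			fmt.Println(p, err)
		}
	}
}
```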

An administrator should be able to deploy a machine-config object to change
the node MTU as well. If the MTU is being increased, this will be done at the
beginning of the procedure; if it is being decreased, at the end of the
procedure.
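
To make the ordering concrete with assumed values: when increasing the cluster
MTU, the node interfaces would be raised first (e.g. to 9000) so that they can
already carry the larger overlay packets by the time pods start using them;
when decreasing, the node interfaces would be lowered last (e.g. from 9000 to
1500), only after the overlay has stopped sending packets that would no longer
fit. The specific numbers here are illustrative only.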

### User Stories

#### As an administrator, I want to change the node MTU

An administrator should be able to deploy a machine-config object that
configures the node MTU permanently. Ideally this would be achieved through
the ability to run configure-ovs with an MTU parameter. configure-ovs should
change the MTU of br-ex and ovs-if-phys0 with the least impact on the existing
configuration to avoid any unnecessary disruption. This change should persist
across reboots.

#### As an administrator, I want to change the cluster network MTU

An administrator should be able to change the cluster network MTU through a
CNO configuration change. This would encompass the following tasks:

##### Implement a pod that changes the actual MTU on running pods

> **Review discussion:**
> **Contributor:** So again, the parts talking about implementation details
> don't belong in "User Stories". And they're redundant with what you've
> already said, so you can just remove them.
>
> **Author:** I will move them to a specific section. The thing is that there
> is no way to map (user) stories here to (non-user) stories in Jira.

Implement a pod that changes the actual MTU for both ends of the veth pair for
pods hosted on the node where the pod runs, as described in the proposal, and
in the least possible time.

##### Add support in ovnkube-node to reset MTU on start

Make sure that upon restart, ovnkube-node resets the MTU on all the relevant
interfaces, like ovn-k8s-mp0, ovn-k8s-gw0, br-int, as well as related routes
that currently have an MTU set.
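
A rough sketch of what such a reset could look like using the
`vishvananda/netlink` library; the interface list and the rule for which
routes to touch are assumptions for illustration, not the actual ovnkube-node
code:

```go
package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// resetManagedMTU sets the new MTU on one OVN-Kubernetes managed interface and
// refreshes any routes through it that pin an explicit MTU metric.
func resetManagedMTU(ifaceName string, mtu int) error {
	link, err := netlink.LinkByName(ifaceName)
	if err != nil {
		return err
	}
	if err := netlink.LinkSetMTU(link, mtu); err != nil {
		return err
	}

	routes, err := netlink.RouteList(link, netlink.FAMILY_ALL)
	if err != nil {
		return err
	}
	for _, r := range routes {
		// Only touch routes that already carry an MTU different from the new one.
		if r.MTU == 0 || r.MTU == mtu {
			continue
		}
		r.MTU = mtu
		if err := netlink.RouteReplace(&r); err != nil {
			return fmt.Errorf("updating route %v: %w", r, err)
		}
	}
	return nil
}

func main() {
	for _, iface := range []string{"ovn-k8s-mp0", "ovn-k8s-gw0"} {
		if err := resetManagedMTU(iface, 1400); err != nil {
			fmt.Println(iface, err)
		}
	}
}
```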

##### Add support in CNO for MTU change coordination

Add support in CNO to allow and coordinate the MTU change for OVN-Kubernetes
as described in the proposal.
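
For illustration, the conditions the proposal has the CNO set when the change
starts (step 1) could be built roughly as below; the `Reason` and `Message`
strings are made up for this sketch, and the code that writes the conditions
back to the ClusterOperator status is omitted:

```go
package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// mtuChangeStartedConditions builds the conditions from step 1 of the proposal:
// Progressing=true and Upgradeable=false while the MTU change is in flight.
func mtuChangeStartedConditions() []configv1.ClusterOperatorStatusCondition {
	now := metav1.Now()
	return []configv1.ClusterOperatorStatusCondition{
		{
			Type:               configv1.OperatorProgressing,
			Status:             configv1.ConditionTrue,
			Reason:             "MTUMigration", // hypothetical reason string
			Message:            "Cluster network MTU change in progress",
			LastTransitionTime: now,
		},
		{
			Type:               configv1.OperatorUpgradeable,
			Status:             configv1.ConditionFalse,
			Reason:             "MTUMigration",
			Message:            "Upgrades are blocked while the MTU change is in progress",
			LastTransitionTime: now,
		},
	}
}

func main() {
	for _, c := range mtuChangeStartedConditions() {
		fmt.Printf("%s=%s: %s\n", c.Type, c.Status, c.Message)
	}
}
```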

### Implementation Details/Notes/Constraints

## Design Details

### Open Questions

* If changing the MTU on a node fails, do we have a guarantee that we can
  still reboot the node?

### Test Plan

We will create the following tests:

1. An HTTPS server with a very large certificate, and multiple clients on
   different nodes doing a single HTTPS request. The acceptance criterion is
   that TLS negotiation succeeds and the HTTPS request returns 200 after every
   MTU change.

Packet loss, TCP retransmissions, increased latency, reduced bandwidth and
connectivity loss are considered acceptable while the change is happening.

While the previous test is running, we will decrease the MTU, and once that
has finished we will increase it back to its previous value.

This test will be two new jobs in CI, one for IPv4 and another for IPv6, that
will be launched on demand.
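
A minimal sketch of the client side of this test, using only the Go standard
library; the URL, timeout and TLS settings are placeholders:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// checkOnce performs one HTTPS request against the test server and treats
// anything other than a completed TLS handshake plus a 200 response as failure.
func checkOnce(url string) error {
	client := &http.Client{
		// Generous timeout: the MTU change may cause retransmissions.
		Timeout: 2 * time.Minute,
		Transport: &http.Transport{
			// The test server uses a self-signed certificate in this sketch.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	if err := checkOnce("https://mtu-test-server.example.svc:8443/"); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("PASS")
}
```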

### Risks and Mitigations

* If unexpected problems occur during this procedure, the mitigation is an
  automated node reboot. The worst possible outcome is a full unplanned reboot
  of the cluster. Documentation should warn about these possible consequences.
  An alternate implementation with planned reboots is described in the
  Alternatives section.
* Even though the procedure takes place under the absolute TCP timeout
  interval, applications might have their own timeout implementation. Service
  disruption and how applications handle it is a risk that might need to be
  considered on a per-application basis but that cannot be reasonably scoped
  in this enhancement.
* During the procedure, different MTUs will be used throughout the cluster.
  The next section analyzes the consequences in detail.

#### Running the cluster with different MTUs

In the process of a `live` change of the MTU, there are going to be traffic
endpoints temporarily using different MTU values. In general, if the path MTU
to an endpoint is known, fragmentation will occur or the application will be
informed that it is trying to send larger packets than possible so that it can
adjust. Additionally, connection-oriented protocols, such as TCP, usually
negotiate their segment size based on the lower MTU of the endpoints at
connection time.
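
As a worked example of that negotiation (values for illustration): with IPv4,
an endpoint with a 1500-byte MTU typically advertises an MSS of 1460 (1500
minus 20 bytes of IP header and 20 bytes of TCP header), while a peer already
moved to a 1400-byte MTU advertises 1360; both sides then send segments no
larger than the smaller value, 1360 bytes. Note that this happens at
connection establishment, so connections opened before the MTU change keep the
segment size they originally negotiated.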

So generally, different MTUs on endpoints affect ongoing connection-oriented
traffic, or connection-less traffic when the known destination MTU is not the
actual destination MTU. In this case, the most likely scenario is that traffic
larger than the destination MTU is dropped on the receiving end by OVS.

There are circumstances that prevent an endpoint from being aware of the
actual MTU to a destination, which depends on Path MTU discovery and specific
ICMP `FRAG_NEEDED` messages:

* A firewall is blocking these ICMP messages or the ICMP messages are not
  being relayed to the endpoint.
* There is no router between the endpoints generating these ICMP messages.

> **Review comment (on the ICMP `FRAG_NEEDED` dependence):** which in general
> seem to not work over OVS

Let's look at different scenarios.

##### Node to node

On the receiving end, a NIC driver might size its buffers in relation to the
configured MTU and drop larger packets before they are handed off to the
system.

Past that, OVN-K sets up flows in OVS br-ex for packets that are larger than
the pod MTU and sends them off to the network stack to generate ICMP
`FRAG_NEEDED` messages. If these packets exceed the MTU of br-ex, they will be
dropped by OVS and never reach the network stack. Otherwise they will reach
the network stack but not generate ICMP `FRAG_NEEDED` messages, as the network
stack only does so for traffic being forwarded and not for traffic with that
node as the final destination.

As there is generally no router in between two cluster nodes, more than likely
a node would not be aware of the path MTU to another node.

##### Node to pod

As explained before, the network stack receives packets sized between the pod
MTU and the host MTU and might cause ICMP `FRAG_NEEDED` messages to be sent to
the originating node, such that a node might become aware of the proper path
MTU when reaching out to a pod. Otherwise, traffic larger than the pod MTU
will be dropped by OVS.

##### Pod to Node

On this datapath, OVS at the destination node will drop the larger packets
without generating ICMP `FRAG_NEEDED` messages, as the node is the final
destination of the traffic. The originating pod is never aware of the actual
path MTU.

##### Pod to Pod

This traffic is encapsulated with Geneve. The Geneve driver might drop it and
generate ICMP `FRAG_NEEDED` messages back to the originating pod if it is
trying to send packets that would not fit in the originating node MTU once
encapsulated. But OVN is not prepared to relay these ICMP messages back to the
originating pod, so it would not be aware of an appropriate MTU to use.

On the receiving end, OVS would drop the packet silently if larger than the
destination MTU of the veth interface. Even if this were not the case, the
veth driver itself would drop the packet silently if it exceeds the MTU of the
pod's end of the veth pair.
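
To put numbers on this (illustrative, based on the usual OVN-Kubernetes
guidance of leaving roughly 100 bytes of headroom for Geneve encapsulation on
IPv4): with node interfaces at 1500 bytes the pod MTU is typically at most
1400, and a full-sized 1400-byte pod packet becomes roughly a 1500-byte frame
once encapsulated. If one node's interface has already been lowered while a
pod elsewhere is still sending at the old size, the encapsulated packet no
longer fits and is dropped as described above, without the sending pod ever
learning a smaller usable MTU.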

## Alternatives

### New ovn-k setting: `routable-mtu`

OVN-Kube Node, upon start, sets the `routable-mtu` on all the host routes and
on all created pod routes. This will make all node-wide traffic effectively
use that MTU value even though the interfaces might be configured with a
higher MTU. Then, with a double rolling reboot procedure, it should be
possible to change the MTU with no service disruption.

> **Review discussion:**
> **Contributor:** So this sounds better than the proposed solution... why
> aren't we doing it this way?
>
> **Author:** A double rolling reboot seemed unacceptable. But I don't know
> what is the latest stance on it. Perhaps @vpickard can comment on this.
>
> **Reviewer:** Yes, I was concerned that 2 reboots would not be acceptable
> from a customer perspective. @mcurry-rh What are your thoughts on having to
> perform 2 reboots to change the mtu?
>
> **Author:** Replicating the feedback we got from @mcurry-rh
>
> **Contributor:** So, if the cluster is actually "broken" because of the bad
> MTU, then having to do two reboots isn't that bad since you're probably not
> running anything useful anyway. And if it's not broken, then the MTU change
> probably isn't urgent, and the procedure doesn't actually require that the
> two reboots happen back-to-back; they could happen 24 hours apart or
> something. (Right? The cluster is stable/consistent in the inter-reboot
> phase?) So we could even just make it so that the CNO doesn't initiate any
> rolling reboots itself, it just does:
> So then the admin could schedule two sets of rolling reboots on consecutive
> nights, or even just make the config change and then forget about it, and
> the first change would complete the next time they did a z-stream update and
> the second change would complete after the next update after that.
>
> **Author:** Yes.
>
> **Contributor:** "So, if the cluster is actually 'broken' because of the bad
> MTU" -- That is not always a safe assumption. One case we had was where a
> customer had a large, running cluster and wanted to add new nodes. But the
> new nodes were on OpenShift and they needed to drop the MTU to allow for the
> VxLAN header in the OSP networking. I assume most cases will be like that,
> otherwise they could just reinstall...
>
> **Contributor:** Hence the "if"

Decrease example:
* Set in ovn-config a `routable-mtu` setting lower than the `mtu` setting.
* Do a rolling reboot. As nodes restart they will effectively use the lower
  MTU, but since the actual interface MTUs did not change they will not drop
  traffic coming from other nodes.
* Set in ovn-config a `mtu` equal to `routable-mtu`, or replace `mtu` with the
  `routable-mtu` value and remove the latter.
* Do a rolling reboot. As nodes restart they will do so with the interfaces
  configured with the expected MTU. As other nodes are already effectively
  using this MTU setting, no traffic drop is expected.
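
As a hypothetical decrease from 1400 to 1200 to make this concrete: first set
`routable-mtu: 1200` while leaving `mtu: 1400`; after the first rolling reboot
every node routes traffic at 1200 but its interfaces still accept 1400, so
nothing its peers send is too large for it. Then set `mtu: 1200` (dropping
`routable-mtu`) and do the second rolling reboot, which only shrinks
interfaces that no longer receive anything larger than 1200. The values are
illustrative only.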

Increase example:
* Set in ovn-config the actual `mtu` as `routable-mtu` and a new `mtu` setting
  higher than `routable-mtu`.
* Do a rolling reboot. Nodes will restart with the higher MTU setting
  configured on their interfaces but will still be effectively using the lower
  MTU.
* Set in ovn-config `routable-mtu` equal to the new `mtu`, or simply remove
  `routable-mtu`.
* Do a rolling reboot. As nodes restart they will use the higher MTU. As other
  nodes already have this MTU set on their interfaces, no drops are expected.

> **Review discussion:**
> **Contributor:** This is confusing... I would suggest just using some numbers
> in your example.
>
> **Author:** Prototyped it in ovn-kubernetes/ovn-kubernetes#2654, perhaps the
> description I gave there is easier to understand.
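
One possible reading with assumed numbers, for an increase from 1400 to 8900:
set `routable-mtu: 1400` and `mtu: 8900`; the first rolling reboot raises the
interfaces to 8900 while routes keep pinning traffic at 1400, so no node emits
packets its peers cannot yet receive. Then drop `routable-mtu` (or set it to
8900) and do the second rolling reboot, after which traffic actually flows at
8900. Both the numbers and the exact final configuration step are assumptions
for illustration, pending the prototype referenced above.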

This procedure should be coordinated with changing the MTU setting on br-ex
and its physical port.

> **Review discussion:**
> **Contributor:** I think we had wanted to do this for openshift-sdn too?
>
> **Author:** I think we do, but I understood that not at the same time and
> not for 4.10 anyway, so I did not cover it in this enhancement just because
> of time constraints and the fact that I don't know anything about it.