-
Notifications
You must be signed in to change notification settings - Fork 532
Proposal to allow MTU changes with rolling reboots #963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal to allow MTU changes with rolling reboots #963
Conversation
2d0b147 to
30af4e8
Compare
|
Not ready for review yet, working through some corner cases. |
6e56799 to
55f7f88
Compare
|
/cc @yuqi-zhang |
600ec24 to
a77fe74
Compare
yuqi-zhang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions from the MCO perspective:
- Is there a configuration change required from the MCO perspective? You do mention creating a dropin, but does does that actually affect anything? Or is it just being used as a reboot marker? The later comments of removing all the MachineConfigs seem to indicate no.
And some other questons inline:
| The CNO should clean up MachineConfig objects that are no longer needed in the | ||
| procedure which deserves a closer look since this might trigger unwanted | ||
| reboots. This happens at two times in the procedure: | ||
| * `cno-set-target-mtu-<TARGET_MTU>-<pool>` MachineConfig object is removed when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MCO doesn't have a batch changes concept. If you really want, you could just modify the existing MachineConfig from A to B (so instead of creating cno-unset-target-mtu-<pool>, you can simply modify cno-set-target-mtu-<TARGET_MTU>-<pool> to the new desired state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuqi-zhang I agree that it is preferable to just modify the contents but...
How can we know when a node has been updated with a MachineConfig object with specific contents?? The only way I know is to parse the rendered MachineConfig object and look there for the contents of the source MachineConfig. This is not very practical.
Whereas if we use different MachineConfig objects with deterministic names, we can just check if that name is listed in the source of the MachineConfigPool.
| * `cno-set-target-mtu-<TARGET_MTU>-<pool>` MachineConfig object is removed when | ||
| `cno-unset-target-mtu-<pool>` MachineConfig object is applied which overall | ||
| causes one intended reboot. | ||
| * `cno-unset-target-mtu-<pool>` MachineConfig object is removed when the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, wait if this originally existed, why not just delete the above machineconfig to roll out the second change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only want to derive CNO status from MCP status when machine state is changing due to CNO actions. If we simply remove the MachineConfig then we don't know if the MCP is being updated because of that or any other change unrelated to CNO.
58ae1ff to
026b04c
Compare
trozet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good. Couple of minor comments and would add a detailed procedure about the manual alternative method that may be used in the short term (due to time constraint) and then later the more complex automated CNO and machine config bits can be added. Summary of manual steps:
- configure CNO with routable-mtu and mtu
- reboot the nodes manually
- unset routable-mtu
- reboot the nodes again
| * `MTU`, if set, shall be configured as MTU on br-ex and ovs-if-phys0 | ||
| interfaces. | ||
| * `ROUTABLE_MTU`, if set, shall be configured on br-ex link scoped routes. | ||
| * If `TARGET_MTU` is unset, the previous `TARGET_MTU` should be configured as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about "if set"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing. TARGET_MTU is just included to be aware of it's previous value when it is unset.
| operator configuration. If not, consider the MTU change unsafe and don't | ||
| proceed. Otherwise, consider this to be the target host MTU value or assume a | ||
| default target host MTU value of the target MTU value + 100. | ||
| 3. Check that the target host MTU values are within the minimum and maximum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are the minimum and maximum values determined? is that just the minimum of a set of {current host value, OVN value, target value}?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the maximum and minimum MTU values the node's physical nic can be configured with. This are annotated on the node through NFD and are obtained for example from maxmtu and minmtu of ip -d link show
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah OK. It would be good to add that to the doc I think.
|
|
||
| The CNO monitors changes on the operator configuration and coordinates the MTU | ||
| change. When it detects a MTU change in the operator configuration: | ||
| 1. Check if an annotation `mtu-migration` is set in the operator. If not, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the first time that 'mtu-migration' is mentioned, but there is no explanation of what it is. Is this the target MTU value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for the host. While I do mention the annotation before without specifics, it is introduced here. But if you follow through the procedure I think it's clear what it is for.
| will not have impact since the cluster was already effectively using that MTU | ||
| due to the routes MTU being previously set. | ||
|
|
||
| ### MTU increase example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the examples, really helps.
|
|
||
| ### Open Questions | ||
|
|
||
| * The proposed procedure is completely automated. Should we consider making some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the answer to this is yes and that should be listed as an alternative approach that might be used in the short term. You can add it to the Implementation Details/Notes/Constraints section or Alternatives. You can remove this question here.
| * We might not be aware at any given time of br-ex link scoped routes if these | ||
| are dynamically assigned through IPv6 RA. NetworkManager dispatcher does not | ||
| seem to trigger an event for route modifications resulting from IPv6 RA. | ||
| * Should the host MTU change outlive OVN-K? For example, if migrating back to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we migrate back to openshift-sdn, br-ex and the associated connections will be deleted, so whatever MTU the host interfaces connections were configured with will be used. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be a rollback to the old MTU and I guess that's what I am asking, would that be desirable or maybe not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I noticed the title of this talks about MTU change generically which makes me thing openshift-sdn is included, but the details of the proposal talk about ovnk changes. Are we trying to support this for openshift-sdn and ovnk or ovnk only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the epic says openshift-sdn as well. We met offline and came up with a plan to make it work for openshift-sdn as well.
@yuqi-zhang The dropin causes configure-ovs to use specified MTU parameters within, so if does affect things. The MTU itself persists in nmconnection files that configure-ovs handles (or elsewhere if we need to) so the actual dropin or MCO doe snot need to persist so it is removed at the end. |
026b04c to
dc23bfb
Compare
5f3932a to
2976a2a
Compare
|
|
||
| The CNO monitors changes on the operator network configuration. When it detects | ||
| a change in `Network.Spec.Migration`: | ||
| 1. Check that `MTU.Network.From` and `MTU.Network.To` are not independently set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this say "or different". Shouldn't what you are migrating From and migrating To be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Actually To and From can be equal to attempt to rollback a MTU migration that is in progress when something goes wrong.
| New `MTU.Network.From`, `MTU.Network.To` and `MTU.Machine.To` fields will be | ||
| added in the existing `NetworkMigration` type which is used in CNO's cluster and | ||
| operator network configurations and will contain the CNI and host MTU values to | ||
| migrate from and to to respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: to to
0290e21 to
db1219d
Compare
trozet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: trozet The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
db1219d to
f6b6753
Compare
f6b6753 to
7887e55
Compare
|
/lgtm |
This supersedes #926 and expands on the alternative of performing the MTU change through rolling reboots.
/cc @mccv1r0
For MTU changes
/cc @trozet
For OVN kubernetes
/cc @danwinship @knobunc @dcbw
For general thoughts