Skip to content

Conversation

@jcaamano
Copy link
Contributor

This supersedes #926 and expands on the alternative of performing the MTU change through rolling reboots.

/cc @mccv1r0
For MTU changes
/cc @trozet
For OVN kubernetes
/cc @danwinship @knobunc @dcbw
For general thoughts

@jcaamano jcaamano changed the title Proposal to allow MTU changes Proposal to allow MTU changes with rolling reboots Nov 23, 2021
@jcaamano jcaamano force-pushed the change-mtu-with-reboots branch from 2d0b147 to 30af4e8 Compare November 23, 2021 10:47
@jcaamano
Copy link
Contributor Author

Not ready for review yet, working through some corner cases.

@jcaamano jcaamano changed the title Proposal to allow MTU changes with rolling reboots [WIP] Proposal to allow MTU changes with rolling reboots Nov 23, 2021
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 23, 2021
@jcaamano jcaamano force-pushed the change-mtu-with-reboots branch 4 times, most recently from 6e56799 to 55f7f88 Compare November 24, 2021 11:29
@jcaamano jcaamano changed the title [WIP] Proposal to allow MTU changes with rolling reboots Proposal to allow MTU changes with rolling reboots Nov 24, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 24, 2021
@jcaamano
Copy link
Contributor Author

/cc @yuqi-zhang
For any possible comments related to MCO usage.

@openshift-ci openshift-ci bot requested a review from yuqi-zhang November 24, 2021 11:32
@jcaamano jcaamano force-pushed the change-mtu-with-reboots branch 4 times, most recently from 600ec24 to a77fe74 Compare November 25, 2021 17:24
Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions from the MCO perspective:

  1. Is there a configuration change required from the MCO perspective? You do mention creating a dropin, but does does that actually affect anything? Or is it just being used as a reboot marker? The later comments of removing all the MachineConfigs seem to indicate no.

And some other questons inline:

The CNO should clean up MachineConfig objects that are no longer needed in the
procedure which deserves a closer look since this might trigger unwanted
reboots. This happens at two times in the procedure:
* `cno-set-target-mtu-<TARGET_MTU>-<pool>` MachineConfig object is removed when
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MCO doesn't have a batch changes concept. If you really want, you could just modify the existing MachineConfig from A to B (so instead of creating cno-unset-target-mtu-<pool>, you can simply modify cno-set-target-mtu-<TARGET_MTU>-<pool> to the new desired state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuqi-zhang I agree that it is preferable to just modify the contents but...

How can we know when a node has been updated with a MachineConfig object with specific contents?? The only way I know is to parse the rendered MachineConfig object and look there for the contents of the source MachineConfig. This is not very practical.

Whereas if we use different MachineConfig objects with deterministic names, we can just check if that name is listed in the source of the MachineConfigPool.

* `cno-set-target-mtu-<TARGET_MTU>-<pool>` MachineConfig object is removed when
`cno-unset-target-mtu-<pool>` MachineConfig object is applied which overall
causes one intended reboot.
* `cno-unset-target-mtu-<pool>` MachineConfig object is removed when the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, wait if this originally existed, why not just delete the above machineconfig to roll out the second change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only want to derive CNO status from MCP status when machine state is changing due to CNO actions. If we simply remove the MachineConfig then we don't know if the MCP is being updated because of that or any other change unrelated to CNO.

@jcaamano jcaamano force-pushed the change-mtu-with-reboots branch 3 times, most recently from 58ae1ff to 026b04c Compare November 30, 2021 19:39
Copy link
Contributor

@trozet trozet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good. Couple of minor comments and would add a detailed procedure about the manual alternative method that may be used in the short term (due to time constraint) and then later the more complex automated CNO and machine config bits can be added. Summary of manual steps:

  1. configure CNO with routable-mtu and mtu
  2. reboot the nodes manually
  3. unset routable-mtu
  4. reboot the nodes again

* `MTU`, if set, shall be configured as MTU on br-ex and ovs-if-phys0
interfaces.
* `ROUTABLE_MTU`, if set, shall be configured on br-ex link scoped routes.
* If `TARGET_MTU` is unset, the previous `TARGET_MTU` should be configured as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about "if set"?

Copy link
Contributor Author

@jcaamano jcaamano Dec 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing. TARGET_MTU is just included to be aware of it's previous value when it is unset.

operator configuration. If not, consider the MTU change unsafe and don't
proceed. Otherwise, consider this to be the target host MTU value or assume a
default target host MTU value of the target MTU value + 100.
3. Check that the target host MTU values are within the minimum and maximum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are the minimum and maximum values determined? is that just the minimum of a set of {current host value, OVN value, target value}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the maximum and minimum MTU values the node's physical nic can be configured with. This are annotated on the node through NFD and are obtained for example from maxmtu and minmtu of ip -d link show

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah OK. It would be good to add that to the doc I think.


The CNO monitors changes on the operator configuration and coordinates the MTU
change. When it detects a MTU change in the operator configuration:
1. Check if an annotation `mtu-migration` is set in the operator. If not,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the first time that 'mtu-migration' is mentioned, but there is no explanation of what it is. Is this the target MTU value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for the host. While I do mention the annotation before without specifics, it is introduced here. But if you follow through the procedure I think it's clear what it is for.

will not have impact since the cluster was already effectively using that MTU
due to the routes MTU being previously set.

### MTU increase example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the examples, really helps.


### Open Questions

* The proposed procedure is completely automated. Should we consider making some
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the answer to this is yes and that should be listed as an alternative approach that might be used in the short term. You can add it to the Implementation Details/Notes/Constraints section or Alternatives. You can remove this question here.

* We might not be aware at any given time of br-ex link scoped routes if these
are dynamically assigned through IPv6 RA. NetworkManager dispatcher does not
seem to trigger an event for route modifications resulting from IPv6 RA.
* Should the host MTU change outlive OVN-K? For example, if migrating back to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we migrate back to openshift-sdn, br-ex and the associated connections will be deleted, so whatever MTU the host interfaces connections were configured with will be used. Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be a rollback to the old MTU and I guess that's what I am asking, would that be desirable or maybe not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I noticed the title of this talks about MTU change generically which makes me thing openshift-sdn is included, but the details of the proposal talk about ovnk changes. Are we trying to support this for openshift-sdn and ovnk or ovnk only?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the epic says openshift-sdn as well. We met offline and came up with a plan to make it work for openshift-sdn as well.

@jcaamano
Copy link
Contributor Author

jcaamano commented Dec 1, 2021

A few questions from the MCO perspective:

  1. Is there a configuration change required from the MCO perspective? You do mention creating a dropin, but does does that actually affect anything? Or is it just being used as a reboot marker? The later comments of removing all the MachineConfigs seem to indicate no.

And some other questons inline:

@yuqi-zhang The dropin causes configure-ovs to use specified MTU parameters within, so if does affect things. The MTU itself persists in nmconnection files that configure-ovs handles (or elsewhere if we need to) so the actual dropin or MCO doe snot need to persist so it is removed at the end.

@jcaamano jcaamano force-pushed the change-mtu-with-reboots branch from 026b04c to dc23bfb Compare December 1, 2021 19:25
@jcaamano jcaamano force-pushed the change-mtu-with-reboots branch 6 times, most recently from 5f3932a to 2976a2a Compare December 1, 2021 20:33

The CNO monitors changes on the operator network configuration. When it detects
a change in `Network.Spec.Migration`:
1. Check that `MTU.Network.From` and `MTU.Network.To` are not independently set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this say "or different". Shouldn't what you are migrating From and migrating To be different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Actually To and From can be equal to attempt to rollback a MTU migration that is in progress when something goes wrong.

New `MTU.Network.From`, `MTU.Network.To` and `MTU.Machine.To` fields will be
added in the existing `NetworkMigration` type which is used in CNO's cluster and
operator network configurations and will contain the CNI and host MTU values to
migrate from and to to respectively.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: to to

@jcaamano jcaamano force-pushed the change-mtu-with-reboots branch 8 times, most recently from 0290e21 to db1219d Compare December 2, 2021 11:48
Copy link
Contributor

@trozet trozet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 2, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2021
@knobunc
Copy link
Contributor

knobunc commented Dec 2, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 2, 2021
@jcaamano jcaamano force-pushed the change-mtu-with-reboots branch from db1219d to f6b6753 Compare December 2, 2021 15:38
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 2, 2021
@jcaamano jcaamano force-pushed the change-mtu-with-reboots branch from f6b6753 to 7887e55 Compare December 2, 2021 15:58
@knobunc
Copy link
Contributor

knobunc commented Dec 2, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 2, 2021
@openshift-merge-robot openshift-merge-robot merged commit e65f8c4 into openshift:master Dec 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants