
OCPBUGS-55962: Inter advertised UDN isolation configurable#2569

Closed
pperiyasamy wants to merge 2 commits into openshift:master from pperiyasamy:inter_udn_isolation-configurable

Conversation

@pperiyasamy
Member

No description provided.

@openshift-ci openshift-ci Bot requested review from jcaamano and tssurya May 12, 2025 14:49
@pperiyasamy pperiyasamy force-pushed the inter_udn_isolation-configurable branch from b8124a9 to 2f9dcd9 Compare May 12, 2025 14:52
@pperiyasamy
Member Author

/assign @kyrtapz @tssurya @jcaamano @trozet

@pperiyasamy pperiyasamy force-pushed the inter_udn_isolation-configurable branch 3 times, most recently from b8fad75 to b09dee2 Compare May 22, 2025 08:43
@pperiyasamy pperiyasamy changed the title [DNM] Inter advertised UDN isolation configurable OCPBUGS-55962: Inter advertised UDN isolation configurable May 22, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 22, 2025
@openshift-ci-robot
Contributor

@pperiyasamy: This pull request references Jira Issue OCPBUGS-55962, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @zhaozhanqi

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label May 22, 2025
@openshift-ci openshift-ci Bot requested a review from zhaozhanqi May 22, 2025 08:46
Contributor

@kyrtapz kyrtapz left a comment

Looks good overall.
I think you are missing a check in configureAdvertisedNetworkIsolation as well.

Comment thread go-controller/pkg/util/multi_network.go Outdated
// UDNLooseIsolation allows communication between two advertised UDN networks.
UDNLooseIsolation string = "loose"
// UDNSecureIsolation drops communication between two advertised UDN networks.
UDNSecureIsolation string = "secure"
Contributor

NIT: I would remove this variable since it is not used anywhere.

Member Author

done.

Comment thread go-controller/pkg/util/multi_network.go Outdated
return "", ""
}

// IsLooseUDNIsolation returns true if two UDN networks are not configured to be
Contributor

I would mention that this concerns pod-to-pod isolation on advertised networks. The host->UDN isolation is still in place with this PR.

Member Author

updated the comment.

@pperiyasamy pperiyasamy force-pushed the inter_udn_isolation-configurable branch from b09dee2 to 45d72b1 Compare May 22, 2025 14:46
@pperiyasamy
Member Author

I think you are missing a check in configureAdvertisedNetworkIsolation as well.

OK @kyrtapz, I thought there would be no harm in leaving those address sets in place, but I have now added the check to configureAdvertisedNetworkIsolation and CleanupStaleNetworks.

@pperiyasamy
Member Author

/retest

@kyrtapz
Contributor

kyrtapz commented May 27, 2025

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 27, 2025
@openshift-ci
Contributor

openshift-ci Bot commented May 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kyrtapz, pperiyasamy
Once this PR has been reviewed and has the lgtm label, please ask for approval from jcaamano. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +533 to +545
if util.IsLooseUDNIsolation() {
klog.Infof("Skip creating global advertised networks addressset in loose UDN isolation mode")
return nil
}
Contributor

Shouldn't we actually ensure the address set is not there if we start with loose UDN isolation?

Contributor

My assumption was that moving ovn-k to loose mode would only be "supported" when there are no advertised networks. If that is not the case, I agree we need to add the necessary cleanup.

Member Author

yes, that was my assumption as well, AFAIU this is a stopgap arrangement for testing loose UDN isolation mode in a lab. This commit will be removed once it's implemented in upstream with a proper UDN isolation specific API.

Contributor

If this is an assumption, can we pass it by @trozet in case he is aware of something else?

Contributor

@jcaamano jcaamano May 29, 2025

So toggling this knob won't do anything for networks that were already advertised. It would only affect networks that became advertised after.

Contributor

I think my current opinion is that we should do this upstream with a global config flag to enable/disable "RoutedUDNIsolation", default enabled. In the future we might make this a per RA thing, where it might be "UDNIsolation": default strict, maybe another mode is "AllowExternalRouting".

I think the code should sync the current state if the flag is flipped on or off. So if someone toggles the flag on an existing cluster, it should remove the address set, ACL, whatever that was configured to enforce isolation.
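The sync-on-toggle behavior described in this comment could be sketched roughly as below. This is only an illustration, not code from this PR or from ovn-kubernetes: `isolationState` and `syncIsolation` are hypothetical names, and the in-memory struct merely stands in for the real address set and tier-0 ACLs.

```go
package main

import "fmt"

// isolationState is a hypothetical in-memory stand-in for the objects that
// enforce isolation: the global advertised networks address set and the
// per-network tier-0 ACLs. It is not a real ovn-kubernetes type.
type isolationState struct {
	addressSetPresent bool
	networksWithACLs  map[string]bool
}

// syncIsolation converges the state toward the flag value on every call.
// With isolation enabled it ensures the address set exists and that exactly
// the advertised networks carry ACLs; with isolation disabled it removes
// everything, including leftovers from a previous run.
func syncIsolation(s *isolationState, isolationEnabled bool, advertised []string) {
	if !isolationEnabled {
		s.addressSetPresent = false
		s.networksWithACLs = map[string]bool{}
		return
	}
	s.addressSetPresent = true
	wanted := make(map[string]bool, len(advertised))
	for _, name := range advertised {
		wanted[name] = true
	}
	s.networksWithACLs = wanted
}

func main() {
	state := &isolationState{networksWithACLs: map[string]bool{}}
	networks := []string{"blue", "red"} // hypothetical network names

	// Flag on: isolation objects are created for existing networks.
	syncIsolation(state, true, networks)
	fmt.Println(state.addressSetPresent, len(state.networksWithACLs))

	// Flag flipped off on a running cluster: stale objects are removed.
	syncIsolation(state, false, networks)
	fmt.Println(state.addressSetPresent, len(state.networksWithACLs))
}
```

The point of the sketch is that the function is driven by the desired state rather than by network events, so flipping the flag also affects networks that were advertised before the change.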

Member Author

@trozet @jcaamano I agree with your thoughts; here is the upstream PR as suggested above: ovn-kubernetes/ovn-kubernetes#5276. PTAL.

Contributor

I have two concerns with the upstream approach:

  • lack of time: we are going to start backporting to 4.19 as soon as we lift the feature gate in 4.20, which hopefully is any day now, and I am not sure we can start doing it without this in place. Unless we change plans, of course.
  • changing the behavior of the config flag down the line: the isolation we really want to have configurable is the one for traffic that egresses the cluster. This current implementation is a big hammer that just isolates all traffic regardless of whether it egresses the cluster or not. When we get the implementation right, we would change the behavior of the config flag. That can be confusing in general.

So the idea was doing a downstream temporary implementation while we get the upstream implementation right.

Comment on lines +371 to +390
if util.IsLooseUDNIsolation() {
klog.Infof("The network %s is configured with loose isolation mode, skip deleting tier-0 drop ACL rule",
bnc.GetNetworkName())
return nil
}
Contributor

Shouldn't we actually ensure the ACL is not there if we start with loose UDN isolation?

By default, ovnk isolates advertised UDN networks from each other,
but there is a requirement to disable this isolation so that BGP routing
functionality can be tested between different UDN networks. Hence this commit
consumes the UDN_ISOLATION_MODE env variable and determines the isolation
mode accordingly. By default it uses secure mode to isolate the networks;
this can be overridden by CNO via a config map.

Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

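As a rough illustration of the mechanism described in the commit message above, the sketch below reads the mode from the environment and falls back to secure mode. Only the `UDN_ISOLATION_MODE` variable name and the `loose`/`secure` values come from this PR; the helper `isolationModeFrom`, the case-insensitive matching, and the fallback for unrecognized values are assumptions, not the PR's actual implementation of `IsLooseUDNIsolation`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const (
	// Env variable name as given in the commit message.
	udnIsolationModeEnv = "UDN_ISOLATION_MODE"
	// Mode values as quoted in the review of multi_network.go.
	udnLooseIsolation  = "loose"
	udnSecureIsolation = "secure"
)

// isolationModeFrom normalizes a raw setting. Assumption: anything other
// than "loose" (ignoring case and surrounding whitespace) means secure,
// so an unset or mistyped value never weakens isolation.
func isolationModeFrom(raw string) string {
	if strings.EqualFold(strings.TrimSpace(raw), udnLooseIsolation) {
		return udnLooseIsolation
	}
	return udnSecureIsolation
}

// isLooseUDNIsolation is a hypothetical stand-in for the PR's
// util.IsLooseUDNIsolation; the real function body is not shown in the thread.
func isLooseUDNIsolation() bool {
	return isolationModeFrom(os.Getenv(udnIsolationModeEnv)) == udnLooseIsolation
}

func main() {
	fmt.Println(isolationModeFrom(""))      // unset value defaults to secure
	fmt.Println(isolationModeFrom("loose")) // explicit opt-out of isolation
	fmt.Println(isLooseUDNIsolation())      // depends on the process environment
}
```

Defaulting to secure for unknown input keeps the behavior unchanged for clusters that never set the variable.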
The loose UDN isolation option may be rolled out later, when BGP advertised
networks already exist, so this commit cleans up the tier-0 pass and drop
ACLs belonging to existing networks and also deletes the global advertised
networks address set.

Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
@pperiyasamy pperiyasamy force-pushed the inter_udn_isolation-configurable branch from 45d72b1 to a10b7b3 Compare June 2, 2025 12:52
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2025
@openshift-ci
Contributor

openshift-ci Bot commented Jun 2, 2025

New changes are detected. LGTM label has been removed.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2025
@openshift-merge-robot
Contributor

PR needs rebase.

@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci Bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 1, 2025
@openshift-ci
Contributor

openshift-ci Bot commented Oct 9, 2025

@pperiyasamy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-vsphere-ovn-techpreview | a10b7b3 | link | false | /test e2e-vsphere-ovn-techpreview |
| ci/prow/e2e-openstack-ovn | a10b7b3 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-aws-ovn-hypershift-kubevirt | a10b7b3 | link | false | /test e2e-aws-ovn-hypershift-kubevirt |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade | a10b7b3 | link | true | /test 4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-rt-upgrade | a10b7b3 | link | true | /test 4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-rt-upgrade |
| ci/prow/security | a10b7b3 | link | false | /test security |
| ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview | a10b7b3 | link | false | /test e2e-aws-ovn-hypershift-conformance-techpreview |
| ci/prow/e2e-aws-ovn-local-to-shared-gateway-mode-migration | a10b7b3 | link | true | /test e2e-aws-ovn-local-to-shared-gateway-mode-migration |
| ci/prow/e2e-aws-ovn-upgrade-local-gateway | a10b7b3 | link | true | /test e2e-aws-ovn-upgrade-local-gateway |
| ci/prow/4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade | a10b7b3 | link | true | /test 4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade |
| ci/prow/4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade | a10b7b3 | link | true | /test 4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade |
| ci/prow/4.21-upgrade-from-stable-4.20-images | a10b7b3 | link | true | /test 4.21-upgrade-from-stable-4.20-images |

@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci Bot closed this Nov 9, 2025
@openshift-ci
Contributor

openshift-ci Bot commented Nov 9, 2025

@openshift-bot: Closed this PR.

Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

8 participants