
Enforce udn ns label #4912

Merged
tssurya merged 11 commits into ovn-kubernetes:master from trozet:enforce_udn_ns_annotation
Jan 17, 2025

@trozet trozet commented Dec 13, 2024

Contributor

Require namespace label for primary UDN
To use a primary UDN, a namespace must now carry the
k8s.ovn.org/primary-user-defined-network label, and that label must be set
when the namespace is created.
The label cannot be updated; it can only be added at creation time.
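A minimal sketch of the immutability rule, written as plain Go over label maps. This is not the PR's actual enforcement mechanism (the PR ships an admissionregistration.k8s.io/v1 manifest for that); validateNamespaceUpdate is a hypothetical name, and treating a value change the same way as an add/remove is an assumption drawn from "cannot be updated".

```go
package main

import "fmt"

// primaryUDNLabel is the namespace label introduced by this PR.
const primaryUDNLabel = "k8s.ovn.org/primary-user-defined-network"

// validateNamespaceUpdate (hypothetical) rejects any UPDATE that adds,
// removes, or changes the label, so it can only be set at creation time.
func validateNamespaceUpdate(oldLabels, newLabels map[string]string) error {
	oldVal, hadLabel := oldLabels[primaryUDNLabel]
	newVal, hasLabel := newLabels[primaryUDNLabel]
	switch {
	case hadLabel != hasLabel:
		return fmt.Errorf("the %q label cannot be added/removed after the namespace was created", primaryUDNLabel)
	case hadLabel && oldVal != newVal:
		return fmt.Errorf("the %q label value cannot be changed", primaryUDNLabel)
	}
	return nil
}

func main() {
	created := map[string]string{primaryUDNLabel: ""}
	// Removing the label after creation is rejected.
	fmt.Println(validateNamespaceUpdate(created, map[string]string{}))
	// Leaving it untouched is accepted.
	fmt.Println(validateNamespaceUpdate(created, created))
}
```

Comparing presence in the old and new objects is what makes "added at creation time only" enforceable: a CREATE has no old object to compare against, so only updates are constrained.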

The following conditions hold:

  1. If the namespace is missing the label and a pod is created, the pod
    attaches to the default network.
  2. If the namespace is missing the label and a primary UDN or CUDN that
    matches the namespace is created, the UDN/CUDN reports an error
    status and the NAD is not generated.
  3. If the namespace is missing the label and a primary UDN/CUDN exists,
    a pod in the namespace is still created and attached to the default
    network.
  4. If the namespace has the label but no primary UDN/CUDN exists, pod
    creation in the namespace fails until the UDN/CUDN is created.

Also includes some fixes to unit tests that were brought to light by
this PR. For example, the layer 2 multi-network tests were adding
invalid annotations for node-subnets, etc.

Related OCP jira:
https://issues.redhat.com/browse/OCPBUGS-42609

@trozet trozet requested a review from a team as a code owner December 13, 2024 23:28
@trozet trozet requested a review from cathy-zhou December 13, 2024 23:28
@github-actions github-actions Bot added the area/unit-testing Issues related to adding/updating unit tests label Dec 13, 2024
trozet commented Dec 13, 2024

Contributor Author

Note: the e2es will fail because they have not been updated to add the annotation yet, but that's fine for now, since it lets us see the guard working as expected.

@trozet trozet requested review from npinaeva and removed request for cathy-zhou December 14, 2024 17:31
@tssurya tssurya added feature/user-defined-networks All PRs related to User defined network segmentation kind/bug All issues that are bugs and PRs opened to fix bugs labels Dec 14, 2024
Comment thread go-controller/pkg/clustermanager/userdefinednetwork/controller_helper.go Outdated
@trozet trozet force-pushed the enforce_udn_ns_annotation branch from faeaa83 to ff0e5a7 December 17, 2024 03:18
@github-actions github-actions Bot added feature/egress-ip Issues related to EgressIP feature area/e2e-testing labels Dec 17, 2024
Comment thread test/e2e/network_segmentation.go Outdated
Comment thread go-controller/pkg/util/util.go Outdated
Comment thread go-controller/pkg/clustermanager/userdefinednetwork/controller_helper.go Outdated
Comment thread go-controller/pkg/types/const.go Outdated
@trozet trozet force-pushed the enforce_udn_ns_annotation branch from ff0e5a7 to 474c68d December 18, 2024 02:51
Comment thread test/e2e/network_segmentation.go

@maiqueb maiqueb left a comment

Contributor

I just have a small question about the namespace teardown.

I also share @ormergi's opinion about the label name (primary UDN vs. user defined primary network).

Other than that, LGTM.

@ormergi ormergi left a comment

Contributor

Thanks for the PR, please see my inline comments.

Comment thread test/e2e/network_segmentation.go
Comment thread test/e2e/network_segmentation.go Outdated
Comment on lines +233 to +236
_, err := cs.CoreV1().Namespaces().Create(context.Background(), &v1.Namespace{
ObjectMeta: metav1.ObjectMeta{
-Name: defaultNetNamespace,
+Name:   defaultNetNamespace,
+Labels: map[string]string{RequiredUDNNamespaceLabel: ""},

@ormergi ormergi Dec 18, 2024

Contributor

nit: Commenting here, but this applies to other places as well.
Please consider introducing a helper to create the test namespace. It will reduce the noise everywhere the namespace is created and make it easier to introduce additional changes to such namespaces.

_, err := cs.CoreV1().Namespaces().Create(context.Background(), newTestNamespace(defaultNetNamespace))...
...
func newTestNamespace(name string) *v1.Namespace {
	return &v1.Namespace{ObjectMeta: metav1.ObjectMeta{
		Name:   name,
		Labels: map[string]string{RequiredUDNNamespaceLabel: ""},
	}}
}


By("create the new target namespace")
-_, err = cs.CoreV1().Namespaces().Create(context.Background(), &v1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: testNewNs}}, metav1.CreateOptions{})
+_, err = cs.CoreV1().Namespaces().Create(context.Background(), &v1.Namespace{

@ormergi ormergi Dec 18, 2024

Contributor

Q: It seems this test creates a CUDN with role=secondary in the BeforeEach; does it fail without this change?

Contributor Author

No, I just looked for every place we create a namespace and added the label for the network segmentation tests. It's not obvious to me which tests use a secondary or a primary network without going through each one, and I don't have time to do that.


By("create new namespace")
-_, err := cs.CoreV1().Namespaces().Create(context.Background(), &v1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: testNewNs}}, metav1.CreateOptions{})
+_, err := cs.CoreV1().Namespaces().Create(context.Background(), &v1.Namespace{

@ormergi ormergi Dec 18, 2024

Contributor

Q: It seems this test creates a CUDN with role=secondary in the BeforeEach; does it fail without this change?

}
}

func invalidTestNamespace(name string) *corev1.Namespace {

Contributor

super nit: I would define this function to use testNamespace instead of the other way around.
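One way to read this suggestion is to derive the invalid fixture from the valid one, so any later change to testNamespace is inherited automatically. The PR's real helpers return *corev1.Namespace; to keep the sketch runnable without Kubernetes modules it uses a hypothetical stand-in Namespace type and an assumed requiredUDNNamespaceLabel constant.

```go
package main

import "fmt"

// Namespace is a minimal stand-in for corev1.Namespace.
type Namespace struct {
	Name   string
	Labels map[string]string
}

const requiredUDNNamespaceLabel = "k8s.ovn.org/primary-user-defined-network"

// testNamespace returns a namespace that carries the required label.
func testNamespace(name string) *Namespace {
	return &Namespace{
		Name:   name,
		Labels: map[string]string{requiredUDNNamespaceLabel: ""},
	}
}

// invalidTestNamespace starts from testNamespace and strips the label, so
// both fixtures stay identical apart from the label.
func invalidTestNamespace(name string) *Namespace {
	ns := testNamespace(name)
	delete(ns.Labels, requiredUDNNamespaceLabel)
	return ns
}

func main() {
	fmt.Println(testNamespace("test").Labels)
	fmt.Println(invalidTestNamespace("test").Labels)
}
```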

udn := testUDN()
expectedNAD := testNAD()
-c = newTestController(renderNadStub(expectedNAD), udn)
+c = newTestController(renderNadStub(expectedNAD), udn, testNamespace("test"))

Contributor

Question: do these tests fail if we don't pass the test namespace object?

Contributor Author

In almost all cases, yes, because it needs to check the namespace. It kind of makes sense for the test with a NAD to also include a namespace, though.

)

func (c *Controller) updateNAD(obj client.Object, namespace string) (*netv1.NetworkAttachmentDefinition, error) {
if utiludn.IsPrimaryNetwork(template.GetSpec(obj)) {

Contributor

Does it make sense to move the label check under the existing primary-network spec check? (line 54)

Contributor

I'm assuming we just want to bail out from this asap ...

ormergi commented Dec 18, 2024

Contributor

I had some suggestion around tests, but they can be done on a follow-up PR, sorry for the noise.

Also a question about the label check on the UDN controllers but I think I got it.

LGTM.

@trozet trozet force-pushed the enforce_udn_ns_annotation branch from 474c68d to e0309a2 December 18, 2024 14:31
maiqueb previously approved these changes Dec 18, 2024

@maiqueb maiqueb left a comment

Contributor

From my perspective, this is good.

Thanks for the prompt fix to the nasty race we had.

verbs: ["create", "patch", "update"]

---
apiVersion: admissionregistration.k8s.io/v1

Contributor

@trozet, just as a reminder: this will require a separate PR downstream (CNO).

Contributor Author

I also need to make the e2es backward compatible downstream before we merge this.

@kyrtapz kyrtapz changed the title Enforce udn ns annotation Enforce udn ns label Dec 19, 2024
@trozet trozet force-pushed the enforce_udn_ns_annotation branch 4 times, most recently from d5bfe48 to 03632a5 December 20, 2024 22:48
name: nadName,
topology: "layer3",
-cidr: fmt.Sprintf("%s,%s", userDefinedNetworkIPv4Subnet, userDefinedNetworkIPv6Subnet),
+cidr: correctCIDRFamily(userDefinedNetworkIPv4Subnet, userDefinedNetworkIPv6Subnet),

Contributor

++
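The diff swaps the hard-coded "%s,%s" dual-stack string, which always sent both families, for correctCIDRFamily, whose body is not shown in this thread. A minimal sketch of the idea, assuming the caller already knows which IP families the cluster has enabled (the real helper takes only the two subnets, so correctCIDRFamilySketch and its boolean parameters are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// correctCIDRFamilySketch keeps only the subnets whose IP family is enabled
// on the cluster and joins them in the comma-separated form a NAD expects.
func correctCIDRFamilySketch(ipv4Enabled, ipv6Enabled bool, v4CIDR, v6CIDR string) string {
	var cidrs []string
	if ipv4Enabled {
		cidrs = append(cidrs, v4CIDR)
	}
	if ipv6Enabled {
		cidrs = append(cidrs, v6CIDR)
	}
	return strings.Join(cidrs, ",")
}

func main() {
	// Example subnets chosen for illustration only.
	v4, v6 := "10.128.0.0/16", "fd00:10:128::/60"
	fmt.Println(correctCIDRFamilySketch(true, false, v4, v6)) // single-stack IPv4
	fmt.Println(correctCIDRFamilySketch(true, true, v4, v6))  // dual-stack
}
```

Filtering by family avoids the "IPv6 subnet on an IPv4 cluster" mistake that one of the later commits fixes.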

}
cachedNetwork := e.getNetworkFromPodAssignment(podKey)
err = e.nodeZoneState.DoWithLock(statusToRemove.Node, func(key string) error {
if cachedNetwork == nil {

Contributor

Hmm, so for deletion of the EIP status we rely on the NAD that determines the active network still being present in that namespace? That sounds wrong, because the NAD or namespace could already be gone, given an order of events we don't control...
The fix looks correct to me, but the fact that not a single unit test had to be changed worries me :D
@martinkennelly, can we track this somewhere as a TODO? We need to add a test for the deletion sequence when a NAD doesn't exist and ensure nothing breaks...

Meanwhile, @trozet, why was GetActiveNetworkForNamespace not working in this specific instance? I assume this is some CI issue you happened to fix in this PR, right?

Contributor Author

Well, the tests do fail now; that's how I caught it :) It's because the old getActiveNetworkForNamespace behavior assumed the default network if a NAD didn't exist. Now it no longer does that if the namespace has the label. So an error was being thrown here during the test, and the status was not being removed in the e2es, causing them to fail. I'm not sure whether the false positive was just enough to clean things up correctly for egress IP; I'll defer to @martinkennelly on that.

tssurya commented Jan 13, 2025

Contributor

https://github.com/ovn-kubernetes/ovn-kubernetes/actions/runs/12693489980/job/35419153714?pr=4912

Summarizing 2 Failures:
  [FAIL] Network Segmentation a user defined primary network with multicast feature enabled for namespace should be able to send multicast UDP traffic between nodes [It] with primary layer3 UDN
  /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/network_segmentation.go:734
  [FAIL] Network Segmentation a user defined primary network with multicast feature enabled for namespace should be able to send multicast UDP traffic between nodes [It] with primary layer2 UDN
  /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/network_segmentation.go:734

Ran 79 of 428 Specs in 3145.252 seconds
FAIL! -- 77 Passed | 2 Failed | 1 Flaked | 2 Pending | 347 Skipped

multicast tests are failing here

trozet and others added 10 commits January 13, 2025 16:23
k8s.ovn.org/user-defined-network is now required to be labeled on a
namespace at namespace creation time in order to use a primary UDN. The
following conditions are true:

1. If the namespace is missing the label and a pod is created, it attaches
   to the default network.
2. If the namespace is missing the label and a primary UDN or CUDN is
   created that matches that namespace, the UDN/CUDN will report an error
   status and the NAD will not be generated.
3. If the namespace is missing the label and a primary UDN/CUDN exists,
   a pod in the namespace will be created and attached to the default
   network.
4. If the namespace has the label and a primary UDN/CUDN does not exist,
   a pod in the namespace will fail creation until the UDN/CUDN is
   created.

Also includes some fixes to unit tests that were brought to light by
this PR. For example, the layer 2 multi-network tests were adding
invalid annotations for node-subnets, etc.

Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Patryk Diak <pdiak@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Was using ipv6 on ipv4 cluster.

Signed-off-by: Tim Rozet <trozet@redhat.com>
EgressIP was depending on getActiveNetworkFromNamespace to work, or
would fail to remove egressIP status.

Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Test ensures that a pod will still come up when a UDN exists, but the
UDN required label is missing on the namespace. The pod will be wired to
the default cluster network.

Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
@trozet trozet force-pushed the enforce_udn_ns_annotation branch from 9361363 to b8c3d78 January 13, 2025 21:23
trozet commented Jan 13, 2025

Contributor Author

Replying to @tssurya ("multicast tests are failing here"):

Yeah, I see them failing for IPv6 in the latest run, and this time we have logs and the e2e DBs. I fixed these tests to use correctCIDRFamily, and now they seem to fail. I looked at the DBs and don't see anything obvious: all 3 servers were created and I see them attached to the UDN, but traffic seems to fail. @dceara, do you mind taking a look at this, please?

https://github.com/ovn-kubernetes/ovn-kubernetes/actions/runs/12756338404/job/35555521085?pr=4912

trozet commented Jan 14, 2025

Contributor Author

hmm some new flake that I haven't seen before:

2025-01-14T21:27:42.0290463Z [FAIL] OVN master EgressIP Operations cluster default network On node DELETE [secondary host network] should perform proper OVN transactions when namespace and pod is created after node egress label switch [It] interconnect enabled; node1 in global and node2 in remote zones
2025-01-14T21:27:42.0293085Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/ovn/egressip_test.go:3296

Looks legit, as the next hop for the stale egress node was not removed:

2025-01-14T21:26:07.5804576Z   the missing elements were
2025-01-14T21:26:07.5805165Z       <[]*libovsdb.testDataMatcher | len:2, cap:2>: [
2025-01-14T21:26:07.5805641Z           {
2025-01-14T21:26:07.5806393Z               expected: <*nbdb.LogicalRouterPolicy | 0xc02c2a3180>{
2025-01-14T21:26:07.5807258Z                   UUID: "reroute-UUID",
2025-01-14T21:26:07.5807891Z                   Action: "reroute",
2025-01-14T21:26:07.5808495Z                   BFDSessions: nil,
2025-01-14T21:26:07.5809076Z                   ExternalIDs: {
2025-01-14T21:26:07.5809752Z                       "ip-family": "ip4",
2025-01-14T21:26:07.5810481Z                       "network": "default",
2025-01-14T21:26:07.5811681Z                       "k8s.ovn.org/name": "egressip_egressip-namespace/egress-pod",
2025-01-14T21:26:07.5813730Z                       "k8s.ovn.org/id": "default-network-controller:EgressIP:100:egressip_egressip-namespace/egress-pod:ip4:default",
2025-01-14T21:26:07.5815499Z                       "k8s.ovn.org/owner-controller": "default-network-controller",
2025-01-14T21:26:07.5816549Z                       "k8s.ovn.org/owner-type": "EgressIP",
2025-01-14T21:26:07.5817282Z                       "priority": "100",
2025-01-14T21:26:07.5817783Z                   },
2025-01-14T21:26:07.5818432Z                   Match: "ip4.src == 10.128.0.15",
2025-01-14T21:26:07.5819044Z                   Nexthop: nil,
2025-01-14T21:26:07.5819700Z                   Nexthops: ["100.88.0.3"],
2025-01-14T21:26:07.5820280Z                   Options: nil,
2025-01-14T21:26:07.5820842Z                   Priority: 100,
2025-01-14T21:26:07.5821268Z               },
2025-01-14T21:26:07.5821733Z               ignoreUUID: false,
2025-01-14T21:26:07.5822137Z           },
2025-01-14T21:26:07.5822450Z           {
2025-01-14T21:26:07.5823148Z               expected: <*nbdb.LogicalRouter | 0xc02c2ea9c0>{
2025-01-14T21:26:07.5824411Z                   UUID: "fca81e0b-99d1-4181-bcac-31a0e5f5f893 [ovn_cluster_router-UUID]",
2025-01-14T21:26:07.5825170Z                   Copp: nil,
2025-01-14T21:26:07.5825705Z                   Enabled: nil,
2025-01-14T21:26:07.5826274Z                   ExternalIDs: nil,
2025-01-14T21:26:07.5826887Z                   LoadBalancer: nil,
2025-01-14T21:26:07.5827554Z                   LoadBalancerGroup: nil,
2025-01-14T21:26:07.5828293Z                   Name: "ovn_cluster_router",
2025-01-14T21:26:07.5828856Z                   Nat: nil,
2025-01-14T21:26:07.5829386Z                   Options: nil,
2025-01-14T21:26:07.5830035Z                   Policies: [
2025-01-14T21:26:07.5830632Z                       "reroute-UUID",
2025-01-14T21:26:07.5831878Z                       "9a60bd0d-1012-4865-b98c-1e1723f22335 [default-no-reroute-UUID]",
2025-01-14T21:26:07.5833289Z                       "68da3de2-a809-425b-81df-2ad6158121b1 [no-reroute-service-UUID]",
2025-01-14T21:26:07.5834646Z                       "661e7434-239d-4862-a161-2c2de5e0efb3 [no-reroute-node-UUID]",
2025-01-14T21:26:07.5836401Z                       "35a3fb4e-fe39-4d88-986e-ce543597eee3 [default-no-reroute-reply-traffic]",
2025-01-14T21:26:07.5837134Z                   ],
2025-01-14T21:26:07.5837609Z                   Ports: nil,
2025-01-14T21:26:07.5838203Z                   StaticRoutes: nil,
2025-01-14T21:26:07.5838648Z               },
2025-01-14T21:26:07.5839123Z               ignoreUUID: false,
2025-01-14T21:26:07.5839518Z           },
2025-01-14T21:26:07.5839821Z       ]
2025-01-14T21:26:07.5840143Z   the extra elements were
2025-01-14T21:26:07.5840615Z       <[]interface {} | len:2, cap:2>: [
2025-01-14T21:26:07.5841295Z           <*nbdb.LogicalRouterPolicy | 0xc02c2a3680>{
2025-01-14T21:26:07.5842110Z               UUID: "e2efbc3b-602e-40a6-b0b9-46924b5951b9",
2025-01-14T21:26:07.5842734Z               Action: "reroute",
2025-01-14T21:26:07.5843262Z               BFDSessions: nil,
2025-01-14T21:26:07.5843765Z               ExternalIDs: {
2025-01-14T21:26:07.5844528Z                   "k8s.ovn.org/owner-type": "EgressIP",
2025-01-14T21:26:07.5845354Z                   "network": "default",
2025-01-14T21:26:07.5846424Z                   "k8s.ovn.org/name": "egressip_egressip-namespace/egress-pod",
2025-01-14T21:26:07.5847184Z                   "priority": "100",
2025-01-14T21:26:07.5847801Z                   "ip-family": "ip4",
2025-01-14T21:26:07.5849608Z                   "k8s.ovn.org/id": "default-network-controller:EgressIP:100:egressip_egressip-namespace/egress-pod:ip4:default",
2025-01-14T21:26:07.5851117Z                   "k8s.ovn.org/owner-controller": "default-network-controller",
2025-01-14T21:26:07.5851739Z               },
2025-01-14T21:26:07.5852304Z               Match: "ip4.src == 10.128.0.15",
2025-01-14T21:26:07.5852857Z               Nexthop: nil,
2025-01-14T21:26:07.5853507Z               Nexthops: ["100.88.0.3", "10.128.0.2"],
2025-01-14T21:26:07.5854068Z               Options: nil,
2025-01-14T21:26:07.5854558Z               Priority: 100,

I don't think it is related to this PR, but @martinkennelly heads up

https://github.com/ovn-kubernetes/ovn-kubernetes/actions/runs/12776376936/job/35614904024?pr=4912

Fixes test "should be able to send multicast UDP traffic between nodes"
which was failing in IPv6 lane due to bugs with an older iperf version.
Updates the test case to bind iperf to the right interface (eth0 or
ovn-udn1) depending on the test.

Test "should be able to receive multicast IGMP query" is skipped on
IPv6. I tried to fix it, but it doesn't seem to work. I left some notes
there so someone can follow up later to fix the test and unskip it.

Signed-off-by: Tim Rozet <trozet@redhat.com>
@trozet trozet force-pushed the enforce_udn_ns_annotation branch from 8504fe1 to da702b8 January 15, 2025 02:01
trozet commented Jan 15, 2025

Contributor Author

Filed #4965 for the dualstack failure. @dceara has seen this before as well.

@tssurya tssurya left a comment

Contributor

I have not reviewed all the new commits; I will pause until we are sure we are getting this in.

Comment thread test/e2e/e2e.go
retryTimeout = 40 * time.Second // polling timeout
rolloutTimeout = 10 * time.Minute
agnhostImage = "registry.k8s.io/e2e-test-images/agnhost:2.26"
agnhostImageNew = "registry.k8s.io/e2e-test-images/agnhost:2.53"

Contributor

Why are we adding a new image version here instead of bumping agnhostImage from 2.26?

Contributor

Tim said that when he updated it to 2.53 the EIP tests started failing, so, for lack of time, he added the 2.53 version alongside 2.26.

udnNamespace.Labels = map[string]string{}
_, err = cs.CoreV1().Namespaces().Update(context.TODO(), udnNamespace, metav1.UpdateOptions{})
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("The 'k8s.ovn.org/primary-user-defined-network' label cannot be added/removed after the namespace was created"))

Contributor

thanks for these tests

Comment thread test/e2e/util.go
}
}

// newLatestAgnhostPod returns a pod that uses the newer agnhost image. The image's binary supports various subcommands

Contributor

Not sure why we added a new util and image instead of updating the default image we use?

return true
})

// remove load balancer groups

Contributor

Can you remind me what the bug was? For UDNs we were not deleting the LBGs? If you push again, please add it to the commit description.

@tssurya tssurya Jan 17, 2025

Contributor

Tim explained it in Slack: when UDNs were deleted, we were leaving behind stale LBGs.

tssurya commented Jan 17, 2025

Contributor

The tools job seems to be failing:

ovn-trace from pod to IP indicates success from pod-test-6d94fd4497-g59sm to 8.8.8.8
ovs-appctl ofproto/trace pod to IP indicates success from pod-test-6d94fd4497-g59sm to 8.8.8.8
ovn-detrace pod to external IP indicates success from pod-test-6d94fd4497-g59sm to 8.8.8.8
Running: ./scripts/../../go-controller/_output/go/bin/ovnkube-trace -src-namespace ovn-kubetrace-pod-test -src pod-test-6d94fd4497-g59sm -dst-ip 8.8.8.8 -udp -loglevel 0
ovn-trace from pod to IP indicates success from pod-test-6d94fd4497-g59sm to 8.8.8.8
ovs-appctl ofproto/trace pod to IP indicates success from pod-test-6d94fd4497-g59sm to 8.8.8.8
ovn-detrace pod to external IP indicates success from pod-test-6d94fd4497-g59sm to 8.8.8.8
Run ovnkube-trace from all egressip-pod-test pods to 8.8.8.8
Running: ./scripts/../../go-controller/_output/go/bin/ovnkube-trace -src-namespace egressip-kubetrace-pod-test -src egressip-pod-test-f96c669c5-7hht7 -dst-ip 8.8.8.8 -tcp -loglevel 0
ovn-trace from pod to IP indicates success from egressip-pod-test-f96c669c5-7hht7 to 8.8.8.8
ovs-appctl ofproto/trace pod to IP indicates failure from egressip-pod-test-f96c669c5-7hht7 to 8.8.8.8
make: *** [Makefile:47: tools] Error 255
make: Leaving directory '/home/runner/work/ovn-kubernetes/ovn-kubernetes/test'
Error: Process completed with exit code 2.

@tssurya tssurya left a comment

Contributor

/lgtm

lbGroups = append(lbGroups, &nbdb.LoadBalancerGroup{UUID: lbGroupUUID})
}
if err := libovsdbops.DeleteLoadBalancerGroups(oc.nbClient, lbGroups); err != nil {
klog.Errorf("Failed to delete load balancer groups on network: %q, error: %v", oc.GetNetworkName(), err)

Contributor

Is best effort with no retries intentional?

tssurya commented Jan 17, 2025

Contributor

Summarizing 1 Failure:
  [FAIL] Network Segmentation ClusterUserDefinedNetwork CRD Controller pod connected to ClusterUserDefinedNetwork [BeforeEach] CR & managed NADs cannot be deleted when being used
  /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/network_segmentation.go:2007

Ran 56 of 431 Specs in 1801.979 seconds

known flake, merging this PR

@tssurya tssurya merged commit b52daaa into ovn-kubernetes:master Jan 17, 2025
EdDev commented Jan 19, 2025


I'm giving this feedback late, but AFAIU this does not seem to be in sync with the original UDN requirements.

Per the design user-stories [1]:

As a user, I want to be able to request a unique, primary network for my namespace without having to get administrator permission.

A project-admin should be able to enable a UDN network without cluster-admin assistance.
But with the change in this PR, only a cluster-admin can now create a namespace that has UDN abilities.
This seems problematic for projects (e.g. virtualization) that depend on this.

I am not 100% clear if this change is indeed limiting the user story above and/or if it was considered.

[1] https://github.com/openshift/enhancements/blob/4065a3c36352bec19fd144daa526e4c24fbd1aed/enhancements/network/user-defined-network-segmentation.md?plain=1#L77-L78

maiqueb added a commit to maiqueb/fosdem2025-p-udn that referenced this pull request Jan 20, 2025
Since ovn-kubernetes/ovn-kubernetes#4912 was
merged, the namespaces *must* feature the
`k8s.ovn.org/primary-user-defined-network` annotation for plumbing the
UDN.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
RamLavi added a commit to RamLavi/ipam-extensions that referenced this pull request Jan 23, 2025
Aligning with Ovn-kubernetes change [0], now namespaces that deploy
primary-udn workloads need to be created with a specific label.

[0] ovn-kubernetes/ovn-kubernetes#4912

Signed-off-by: Ram Lavi <ralavi@redhat.com>

Labels

area/e2e-testing area/unit-testing Issues related to adding/updating unit tests feature/egress-ip Issues related to EgressIP feature feature/kubevirt-live-migration All issues related to kubevirt live migration feature/user-defined-networks All PRs related to User defined network segmentation kind/bug All issues that are bugs and PRs opened to fix bugs

9 participants