Enforce udn ns label#4912
Conversation
|
Note: e2es will fail because they have not been updated to add the annotation, but that's fine for now to see the guard work as we expect |
faeaa83 to
ff0e5a7
Compare
ff0e5a7 to
474c68d
Compare
ormergi
left a comment
There was a problem hiding this comment.
Thanks of the PR, Please see my inline comments
| _, err := cs.CoreV1().Namespaces().Create(context.Background(), &v1.Namespace{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: defaultNetNamespace, | ||
| Name: defaultNetNamespace, | ||
| Labels: map[string]string{RequiredUDNNamespaceLabel: ""}, |
There was a problem hiding this comment.
nit: Commenting here but it related to other places as well.
Please consider introducing a helper to create the test namespace, it will reduce the noise in all places that create the namespace, and make it easier to introduce additional changes to such namespaces.
_, err := cs.CoreV1().Namespaces().Create(context.Background(), newTestNamespace(defaultNetNamespace))...
...
func newTestNamesapce(name string) *Namespace {
return &v1.Namespace{ ObjectMeta: metav1.ObjectMeta{
Name: name,
Labels: map[string]string{RequiredUDNNamespaceLabel: ""},
}}
|
|
||
| By("create the new target namespace") | ||
| _, err = cs.CoreV1().Namespaces().Create(context.Background(), &v1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: testNewNs}}, metav1.CreateOptions{}) | ||
| _, err = cs.CoreV1().Namespaces().Create(context.Background(), &v1.Namespace{ |
There was a problem hiding this comment.
Q: It seem this test create CUDN with role=secondary at the before-each, does it fail w/o this change?
There was a problem hiding this comment.
no, i just looked for everywhere we create namespace and added it for net seg tests. It's not obvious to me without going through each test what is secondary or primary. I dont have time to do that.
|
|
||
| By("create new namespace") | ||
| _, err := cs.CoreV1().Namespaces().Create(context.Background(), &v1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: testNewNs}}, metav1.CreateOptions{}) | ||
| _, err := cs.CoreV1().Namespaces().Create(context.Background(), &v1.Namespace{ |
There was a problem hiding this comment.
Q: It seem this test create CUDN with role=secondary at the before-each, does it fail w/o this change?
| } | ||
| } | ||
|
|
||
| func invalidTestNamespace(name string) *corev1.Namespace { |
There was a problem hiding this comment.
super nit: I would define this function to use testNamespace instead of the other way around.
| udn := testUDN() | ||
| expectedNAD := testNAD() | ||
| c = newTestController(renderNadStub(expectedNAD), udn) | ||
| c = newTestController(renderNadStub(expectedNAD), udn, testNamespace("test")) |
There was a problem hiding this comment.
Question: are these tests fail if we dont pass the test-namespace object?
There was a problem hiding this comment.
in almost all cases yes cause it needs to check the namespace, kidn of makes sense for the test with NAD to also include a namespace though
| ) | ||
|
|
||
| func (c *Controller) updateNAD(obj client.Object, namespace string) (*netv1.NetworkAttachmentDefinition, error) { | ||
| if utiludn.IsPrimaryNetwork(template.GetSpec(obj)) { |
There was a problem hiding this comment.
Does it make sense to move the label check under the existing primary-network spec check? (line 54)
There was a problem hiding this comment.
I'm assuming we just want to bail out from this asap ...
|
I had some suggestion around tests, but they can be done on a follow-up PR, sorry for the noise. Also a question about the label check on the UDN controllers but I think I got it. LGTM. |
474c68d to
e0309a2
Compare
maiqueb
left a comment
There was a problem hiding this comment.
From my perspective, this is good.
Thanks for the prompt fix to the nasty race we had.
e0309a2 to
da2bfa4
Compare
| verbs: ["create", "patch", "update"] | ||
|
|
||
| --- | ||
| apiVersion: admissionregistration.k8s.io/v1 |
There was a problem hiding this comment.
@trozet just as a reminder, this will require a separate PR downstream(CNO).
There was a problem hiding this comment.
i also need to get the e2es backward compatible downstream before we merge this
d5bfe48 to
03632a5
Compare
| name: nadName, | ||
| topology: "layer3", | ||
| cidr: fmt.Sprintf("%s,%s", userDefinedNetworkIPv4Subnet, userDefinedNetworkIPv6Subnet), | ||
| cidr: correctCIDRFamily(userDefinedNetworkIPv4Subnet, userDefinedNetworkIPv6Subnet), |
| } | ||
| cachedNetwork := e.getNetworkFromPodAssignment(podKey) | ||
| err = e.nodeZoneState.DoWithLock(statusToRemove.Node, func(key string) error { | ||
| if cachedNetwork == nil { |
There was a problem hiding this comment.
hmm for deletion of EIP status we rely on the NAD being present in that namespace that determines the active network? that does sound wrong cause the NAD / namespace could just be gone i guess in order of events we don't control...
the fix looks correct to me but the fact that not a single unit test had to be changed worries me :D
@martinkennelly can we track this somewhere in TODOs as we need to add a test for the sequence of events of deletion if a NAD doesn't exist and ensure nothing breaks...
meanwhile @trozet why was GetActiveNetworkForNamespace not working in this specific instances? I assume this is some CI issue you happened to fix in this PR right?
There was a problem hiding this comment.
well the tests do fail now, thast how i caught it :) It's because the old getActiveNetworkForNamespace behavior would assume default network if a NAD didn't exist. Now it wont do that anymore if the namespace has the label. So an error was getting thrown here during the test and the status was not being removed in the e2es, causing them to fail. I'm not sure if the false positive was just enough to clean up stuff correctly for egress IP. Will defer to @martinkennelly for that.
multicast tests are failing here |
e45d1ce to
9361363
Compare
k8s.ovn.org/user-defined-network is now required to be labeled on a namespace at namespace creation time in order to use a primary UDN. The following conditions are true: 1. If namespace is missing the label, and a pod is created, it attaches to default network. 2. If the namespace is missing the label, and a primary UDN or CUDN is created that matches that namespace, the UDN/CUDN will report error status and the NAD will not be generated. 3. If the namespace is missing the label, and a primary UDN/CUDN exists, a pod in the namespace will be created and attached to default network. 4. If the namespace has the label, and a primary UDN/CUDN does not exist a pod in the namespace will fail creation until the UDN/CUDN is created. Also includes some fixes to unit tests that were brought to light by this PR. For example, the layer 2 multi-network tests were adding invalid annotations for node-subnets, etc. Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Patryk Diak <pdiak@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Was using ipv6 on ipv4 cluster. Signed-off-by: Tim Rozet <trozet@redhat.com>
EgressIP was depending on getActiveNetworkFromNamespace to work, or would fail to remove egressIP status. Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Test ensures that a pod will still come up when a UDN exists, but the UDN required label is missing on the namespace. The pod will be wired to the default cluster network. Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
9361363 to
b8c3d78
Compare
yeah I see them failing for ipv6, in the latest run and this time we have logs and e2e dbs. I fixed these tests to use the correctCIDRFamily and now they seem to fail. I looked at the dbs and I dont see anything obvious. All the 3 servers were created and I see them attached to the UDN, but traffic seems to fail. @dceara do you mind taking a look into this please? |
|
hmm some new flake that I haven't seen before: 2025-01-14T21:27:42.0290463Z �[38;5;9m[FAIL]�[0m �[0mOVN master EgressIP Operations cluster default network �[38;5;243mOn node DELETE �[0m[secondary host network] should perform proper OVN transactions when namespace and pod is created after node egress label switch �[38;5;9m�[1m[It] interconnect enabled; node1 in global and node2 in remote zones�[0m looks legit as the next hop for the stale egress node was not removed: I don't think it is related to this PR, but @martinkennelly heads up |
Fixes test "should be able to send multicast UDP traffic between nodes" which was failing in IPv6 lane due to bugs with an older iperf version. Updates the test case to bind iperf to the right interface (eth0 or ovn-udn1) depending on the test. Test "should be able to receive multicast IGMP query" is skipped on IPv6. I tried to fix it, but it doesn't seem to work. I left some notes there so someone can follow up later to fix the test and unskip it. Signed-off-by: Tim Rozet <trozet@redhat.com>
8504fe1 to
da702b8
Compare
tssurya
left a comment
There was a problem hiding this comment.
I have not reviewed all the new commits, will pause till we are sure we are getting this in
| retryTimeout = 40 * time.Second // polling timeout | ||
| rolloutTimeout = 10 * time.Minute | ||
| agnhostImage = "registry.k8s.io/e2e-test-images/agnhost:2.26" | ||
| agnhostImageNew = "registry.k8s.io/e2e-test-images/agnhost:2.53" |
There was a problem hiding this comment.
why are we not changing Image to be 26 and added a new version here?
There was a problem hiding this comment.
Tim said when he updated it to 53 EIP tests started failing, in lieu of time he added the 53 version on top of 26.
| udnNamespace.Labels = map[string]string{} | ||
| _, err = cs.CoreV1().Namespaces().Update(context.TODO(), udnNamespace, metav1.UpdateOptions{}) | ||
| Expect(err).To(HaveOccurred()) | ||
| Expect(err.Error()).To(ContainSubstring("The 'k8s.ovn.org/primary-user-defined-network' label cannot be added/removed after the namespace was created")) |
| } | ||
| } | ||
|
|
||
| // newLatestAgnhostPod returns a pod that uses the newer agnhost image. The image's binary supports various subcommands |
There was a problem hiding this comment.
not sure why we added new util and image instead of updating this for the default image we use?
| return true | ||
| }) | ||
|
|
||
| // remove load balancer groups |
There was a problem hiding this comment.
can you remind me what the bug was? For UDNs we were not deleting the LBGs? In case you push please add it to the commit description?
There was a problem hiding this comment.
Tim explained it in slack..when UDNs were deleted we were leaving behind stale LBGs
|
tools seems to be failing: |
| lbGroups = append(lbGroups, &nbdb.LoadBalancerGroup{UUID: lbGroupUUID}) | ||
| } | ||
| if err := libovsdbops.DeleteLoadBalancerGroups(oc.nbClient, lbGroups); err != nil { | ||
| klog.Errorf("Failed to delete load balancer groups on network: %q, error: %v", oc.GetNetworkName(), err) |
There was a problem hiding this comment.
best effort with no retries is intentional?
| retryTimeout = 40 * time.Second // polling timeout | ||
| rolloutTimeout = 10 * time.Minute | ||
| agnhostImage = "registry.k8s.io/e2e-test-images/agnhost:2.26" | ||
| agnhostImageNew = "registry.k8s.io/e2e-test-images/agnhost:2.53" |
There was a problem hiding this comment.
Tim said when he updated it to 53 EIP tests started failing, in lieu of time he added the 53 version on top of 26.
known flake, merging this PR |
|
I am feedbacking too late here, but AFAIU this seems not to be in sync with the original UDN requirements. Per the design user-stories [1]:
A project-admin should be able to enable a UDN network without a cluster-admin assistance. I am not 100% clear if this change is indeed limiting the user story above and/or if it was considered. |
Since ovn-kubernetes/ovn-kubernetes#4912 was merged, the namespaces *must* feature the `k8s.ovn.org/primary-user-defined-network` annotation for plumbing the UDN. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Aligning with Ovn-kubernetes change [0], now namespaces that deploy primary-udn workloads need to be created with a specific label. [0] ovn-kubernetes/ovn-kubernetes#4912 Signed-off-by: Ram Lavi <ralavi@redhat.com>
Require namespace label for primary UDN
k8s.ovn.org/primary-user-defined-network is now required to be labeled on a
namespace at namespace creation time in order to use a primary UDN.
The label may not be updated and may only be added at creation time.
The following conditions are true:
to default network.
created that matches that namespace, the UDN/CUDN will report error
status and the NAD will not be generated.
a pod in the namespace will be created and attached to default
network.
a pod in the namespace will fail creation until the UDN/CUDN is
created.
Also includes some fixes to unit tests that were brought to light by
this PR. For example, the layer 2 multi-network tests were adding
invalid annotations for node-subnets, etc.
Related OCP jira:
https://issues.redhat.com/browse/OCPBUGS-42609