UDN workloads improvements #399
| - objectTemplate: udn_l2.yml | ||
| replicas: 1 | ||
| {{ end }} | ||
| waitOptions: |
As discussed with @mohit-sheth, this check is not valid to verify that a UDN is ready
@rsevilla87 @mohit-sheth @jtaleric I think you can use NetworkAllocationSucceeded instead of NetworkCreated. With my 72 CUDNs, I have observed some UDNs taking 30 seconds to report the NetworkAllocationSucceeded status; see this gist: https://gist.github.com/venkataanil/9a046c3e86400a7a4248fd3b01c32fb5
The P99 pod-ready latency closely matches that:
time="2026-02-23 19:00:18" level=info msg="udn-density-l3-pods: Initialized 99th: 1000 ms max: 1000 ms avg: 16 ms" file="base_measurement.go:127"
time="2026-02-23 19:00:18" level=info msg="udn-density-l3-pods: Ready 99th: 34000 ms max: 36000 ms avg: 23252 ms" file="base_measurement.go:127"
time="2026-02-23 19:00:18" level=info msg="udn-density-l3-pods: PodReadyToStartContainers 99th: 34000 ms max: 36000 ms avg: 23252 ms" file="base_measurement.go:127"
time="2026-02-23 19:00:18" level=info msg="udn-density-l3-pods: ContainersStarted 99th: 34404 ms max: 36086 ms avg: 23311 ms" file="base_measurement.go:127"
time="2026-02-23 19:00:18" level=info msg="udn-density-l3-pods: PodScheduled 99th: 469 ms max: 1000 ms avg: 12 ms" file="base_measurement.go:127"
time="2026-02-23 19:00:18" level=info msg="udn-density-l3-pods: ContainersReady 99th: 34000 ms max: 36000 ms avg: 23252 ms" file="base_measurement.go:127"
So I think we can use NetworkAllocationSucceeded. I assume it means OVNK has finished creating resources in NBDB. At the least, this can be confirmation for us to proceed with creating pods, though OVS flows might not be programmed yet.
I have also tried a Go program (https://gist.github.com/venkataanil/ddc254653d95863d1d60bb0dc827212a, generated with AI assistance) that watches for logical switches and routers in every node's NBDB for the corresponding UDN. During this testing it showed a similar latency of 34 sec; see the output of the run: https://gist.github.com/venkataanil/d5d513be46d6377f7a8751f669ce2490
Before running this Go program, make each node's NBDB listen on TCP port 6641:
oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node -o name | \
  xargs -I{} oc exec -n openshift-ovn-kubernetes -c nbdb {} -- \
  ovn-nbctl set-connection ptcp:6641:0.0.0.0 punix:/var/run/ovn/ovnnb_db.sock
Then run the Go program like this:
go run ./testudnstatus2.go --kubeconfig=/root/mno/kubeconfig
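The readiness check suggested above (wait for NetworkAllocationSucceeded rather than NetworkCreated) could be sketched roughly as follows. This is only an illustration, not the code in this PR or in the linked gist: the `condition` and `udnObject` types, the `conditionTrue` helper, and the sample JSON are all hypothetical, and it assumes the UDN reports its conditions under `status.conditions` with `type` and `status` fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition holds the two fields this check needs (assumed layout).
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
}

// udnObject is a hypothetical, trimmed view of a UDN object.
type udnObject struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// conditionTrue reports whether the object JSON carries the given
// condition type with status "True". A missing condition counts as false.
func conditionTrue(raw []byte, condType string) (bool, error) {
	var obj udnObject
	if err := json.Unmarshal(raw, &obj); err != nil {
		return false, err
	}
	for _, c := range obj.Status.Conditions {
		if c.Type == condType {
			return c.Status == "True", nil
		}
	}
	return false, nil
}

func main() {
	// Made-up sample: the network exists but allocation has not finished.
	sample := []byte(`{"status":{"conditions":[
		{"type":"NetworkCreated","status":"True"},
		{"type":"NetworkAllocationSucceeded","status":"False"}]}}`)

	ready, err := conditionTrue(sample, "NetworkAllocationSucceeded")
	if err != nil {
		panic(err)
	}
	fmt.Println("ready:", ready) // prints "ready: false"
}
```

With this sample, a check on NetworkCreated alone would already pass, which is the gap described in the thread.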
Signed-off-by: Raul Sevilla <rsevilla@redhat.com>
Type of change
Description
This refactoring improves the reliability of UDN workload testing by ensuring proper network setup validation and standardizing configuration patterns across different UDN workload types.
Related Tickets & Documents