OCPBUGS-8080,OCPBUGS-8280,OCPBUGS-8278,OCPBUGS-7988,OCPBUGS-7932,OCPBUGS-9990: Downstream Merge [10-Mar-2023] #1574
Conversation
The NAD controller would wait for the workers to terminate before shutting down the worker queue, while the workers could in turn wait endlessly for the queue to be shut down. As a result, the NAD controller never actually stopped. Also improved how the network controllers are stopped: we now wait until the NAD controller workers have effectively terminated. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
This commit adds a section to the user documentation explaining how to configure the following data for a pod's attachment:
- IP address
- MAC address
- pod interface name
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
The markdown was not being properly rendered, since `>` is the character for blockquotes. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
error and never returned. That would prevent retry on failure. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Update "egress firewall with node selector updates during node update" to work for both gateway modes. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
…twork-selection-elements docs, multi-homing: secondary attachment config via net-selection-elements
https://kubernetes.io/blog/2023/02/06/k8s-gcr-io-freeze-announcement/ Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
The default transaction timeout is 10 seconds; it can be reached when we delete all egress firewall ACLs during the migration from switches to port groups. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
stale acls. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
The MetricResourceRetryFailuresCount metric is registered in both NetworkControllerManager and Node modes. When both modes are specified, the second registration fails with the following error:
panic: duplicate metrics collector registration attempted
goroutine 234 [running]:
github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0x1e1a662?, {0xc000e35620?, 0x1, 0x15?})
/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/prometheus/client_golang/prometheus/registry.go:403 +0x7f
github.com/prometheus/client_golang/prometheus.MustRegister(...)
/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/github.com/prometheus/client_golang/prometheus/registry.go:178
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/metrics.RegisterMasterFunctional()
/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/metrics/master.go:385 +0x6b8
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/network-controller-manager.(*networkControllerManager).configureMetrics(0xc0003f61e0, 0x48fe77?)
/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/network-controller-manager/network_controller_manager.go:359 +0x36
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/network-controller-manager.(*networkControllerManager).Init(0xc0003f61e0)
/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/network-controller-manager/network_controller_manager.go:298 +0x25
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/network-controller-manager.(*networkControllerManager).Start.func1({0x214bc88, 0xc000e5c1c0})
/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/network-controller-manager/network_controller_manager.go:220 +0x14f
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:211 +0x11b
Signed-off-by: Zenghui Shi <zshi@redhat.com>
Batch potentially big transaction on egress firewall ACLs migration.
first argument is nil. Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
As @tssurya found, the min/max of 1 for destinations was hardcoded in the CRD j2 file and was never in types.go. Fix nodeSelector to take a pointer to a label selector so that max properties works as expected. Also add an additional test case. Signed-off-by: Tim Rozet <trozet@redhat.com>
fedora: Update OVN to latest release - 23.03.0.
Fix NAD controller Stop
Fix duplicated metric registration
Ensure pods can connect to an idled ClusterIP service at the first attempt without getting any TCP Reject or connection timeout. Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
Signed-off-by: Dan Williams <dcbw@redhat.com>
When a handler is added to an informer we must deliver to it all existing objects of its type to match the Shared Informer behavior. For queued informers we spread the adds out over an array of initial-add-specific queues, so that we don't dump a whole set of objects into the queues that other informers are also receiving events from; only the handler being added should get this set of initial objects. Unfortunately the initial add logic to spread objects across the initial-add-specific queues took its data from the regular queues, which caused initial add queues to become unbalanced when the regular queues were processing lots of objects. To fix that, abstract queue management out into a queueMap struct and use that same struct for both regular event delivery and the initial add event delivery. Thus the same logic gets run on different sets of queues, rather than just one set of queues (regular events) which provides the wrong result for the other set (initial add). Signed-off-by: Dan Williams <dcbw@redhat.com>
Instead of making the actual handlers do the queue management push that down into queueMap itself. Signed-off-by: Dan Williams <dcbw@redhat.com>
Egress firewall fix retry
Fix egress firewall CRD
In the hairpin SNAT e2e test case I was pinning the nodePort of the service created, when that was not really necessary. So when CI re-runs this test in times of flake, we run into:
2023-03-07T09:33:02.1112035Z [1mSTEP[0m: creating a TCP service service-for-pods with type=NodePort in namespace service-hairpin-test-2286
2023-03-07T09:33:02.1162862Z Mar 7 09:33:02.116: FAIL: unable to create service: service-for-pods, err: Failed to create service service-for-pods service-hairpin-test-2286: Service "service-for-pods" is invalid: spec.ports[0].nodePort: Invalid value: 32766: provided port is already allocated
2023-03-07T09:33:02.1163500Z Unexpected error:
2023-03-07T09:33:02.1163792Z <*errors.withStack | 0xc000f8d188>: {
2023-03-07T09:33:02.1164046Z error: {
2023-03-07T09:33:02.1164293Z cause: {
2023-03-07T09:33:02.1164587Z ErrStatus: {
2023-03-07T09:33:02.1165066Z TypeMeta: {Kind: "", APIVersion: ""},
2023-03-07T09:33:02.1165422Z ListMeta: {
2023-03-07T09:33:02.1165788Z SelfLink: "",
2023-03-07T09:33:02.1166216Z ResourceVersion: "",
2023-03-07T09:33:02.1166603Z Continue: "",
2023-03-07T09:33:02.1167076Z RemainingItemCount: nil,
2023-03-07T09:33:02.1167368Z },
2023-03-07T09:33:02.1167724Z Status: "Failure",
2023-03-07T09:33:02.1168903Z Message: "Service \"service-for-pods\" is invalid: spec.ports[0].nodePort: Invalid value: 32766: provided port is already allocated",
2023-03-07T09:33:02.1169419Z Reason: "Invalid",
2023-03-07T09:33:02.1169724Z Details: {
2023-03-07T09:33:02.1170235Z Name: "service-for-pods",
2023-03-07T09:33:02.1170621Z Group: "",
2023-03-07T09:33:02.1171007Z Kind: "Service",
2023-03-07T09:33:02.1171345Z UID: "",
2023-03-07T09:33:02.1171687Z Causes: [
2023-03-07T09:33:02.1172001Z {
2023-03-07T09:33:02.1172560Z Type: "FieldValueInvalid",
2023-03-07T09:33:02.1173522Z Message: "Invalid value: 32766: provided port is already allocated",
2023-03-07T09:33:02.1174529Z Field: "spec.ports[0].nodePort",
2023-03-07T09:33:02.1174889Z },
2023-03-07T09:33:02.1175181Z ],
2023-03-07T09:33:02.1175605Z RetryAfterSeconds: 0,
2023-03-07T09:33:02.1175885Z },
2023-03-07T09:33:02.1176188Z Code: 422,
2023-03-07T09:33:02.1176523Z },
2023-03-07T09:33:02.1176734Z },
2023-03-07T09:33:02.1177371Z msg: "Failed to create service service-for-pods service-hairpin-test-2286",
2023-03-07T09:33:02.1177681Z },
2023-03-07T09:33:02.1178319Z stack: [0x170e01c, 0x1780a72, 0x76cbb1, 0x76c5a5, 0x76bc9b, 0x76f2aa, 0x76eca7, 0x78f548, 0x78f265, 0x78e905, 0x790cd2, 0x79cfe9, 0x79cdf6, 0x17914c5, 0x520182, 0x46d7e1],
2023-03-07T09:33:02.1178610Z }
Let's just let random ports get assigned. The test can query the service and figure out the nodePort.
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
When addLogicalPortToNetwork() fails after it has successfully allocated pod IPs and updated the pod annotation, new pod IPs can be allocated to the pod during retry if the informer cache has not caught up with the latest pod annotation update. This would leak the old IPs allocated to the pod. Signed-off-by: Yun Zhou <yunz@nvidia.com>
@trozet @jluhrsen Attempting a fix for the upgrade jobs: openshift/release#37558
@trozet: Overrode contexts on behalf of trozet: ci/prow/4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade, ci/prow/4.14-upgrade-from-stable-4.13-images, ci/prow/4.14-upgrade-from-stable-4.13-local-gateway-e2e-aws-ovn-upgrade, ci/prow/4.14-upgrade-from-stable-4.13-local-gateway-images
/retest e2e-aws-ovn-upgrade-local-gateway
/test e2e-aws-ovn-upgrade-local-gateway
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dcbw, npinaeva, trozet, zshi-redhat
/hold cancel
/tide refresh
@zshi-redhat: All pull requests linked via external trackers have merged. The following Jira issues have been moved to the MODIFIED state: OCPBUGS-8080, OCPBUGS-8280, OCPBUGS-8278, OCPBUGS-7988, OCPBUGS-7932, OCPBUGS-9990.
We added a new package, github.com/ovn-org/ovn-kubernetes/go-controller/pkg/util/batching, starting in openshift/ovn-kubernetes#1574. Since then we have not been running UTs, because the parser breaks: we currently escape only github.com/ovn-org/ovn-kubernetes/go-controller/pkg/util. So what happens is:
PKGS=$(go list -mod vendor -f '{{if len .TestGoFiles}} {{.ImportPath}} {{end}}' ${PKGS:-./cmd/... ./pkg/... ./hybrid-overlay/...} | xargs)
PKGS=${PKGS//"github.com/ovn-org/ovn-kubernetes/go-controller/pkg/util"/ }
PKGS=$PKGS make test NOROOT=TRUE
This leaves PKGS containing "/batching", which leads to "stat /batching: directory not found" and blocks the UTs from running. This PR fixes this by escaping util/batching as well. If we want to enable this in the future we can do that; for now this fixes the regression introduced in the tests. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Apply to the YAMLs for 4.11, 4.12, and 4.13 the fix for running unit tests downstream that was merged to the master branch with commit 42124ca. We added a new package, github.com/ovn-org/ovn-kubernetes/go-controller/pkg/util/batching, starting in openshift/ovn-kubernetes#1574. Since then we have not been running UTs, because the parser breaks: we currently escape only github.com/ovn-org/ovn-kubernetes/go-controller/pkg/util. So what happens is:
PKGS=$(go list -mod vendor -f '{{if len .TestGoFiles}} {{.ImportPath}} {{end}}' ${PKGS:-./cmd/... ./pkg/... ./hybrid-overlay/...} | xargs)
PKGS=${PKGS//"github.com/ovn-org/ovn-kubernetes/go-controller/pkg/util"/ }
PKGS=$PKGS make test NOROOT=TRUE
This leaves PKGS containing "/batching", which leads to "stat /batching: directory not found" and blocks the UTs from running. This PR fixes this by escaping util/batching as well. If we want to enable this in the future we can do that; for now this fixes the regression introduced in the tests. Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
clean merge.