Skip to content

OCPBUGS-15544: Add adminpolicybasedexternalroutes rights for ovnkube-node.#1867

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
npinaeva:add-apbroute-node-rbac
Jul 8, 2023
Merged

OCPBUGS-15544: Add adminpolicybasedexternalroutes rights for ovnkube-node.#1867
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
npinaeva:add-apbroute-node-rbac

Conversation

@npinaeva
Copy link
Contributor

@npinaeva npinaeva commented Jul 5, 2023

It shouldn't change the object, therefore get, list, watch should be enough.
Initally added here #1765
Without this change non-OVN-IC deployments have errors like

current.log:2023-06-30T13:10:51.027146349Z E0630 13:10:51.027130    2903 reflector.go:148] github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/adminpolicybasedroute/v1/apis/informers/externalversions/factory.go:116: Failed to watch *v1.AdminPolicyBasedExternalRoute: failed to list *v1.AdminPolicyBasedExternalRoute: adminpolicybasedexternalroutes.k8s.ovn.org is forbidden: User "system:serviceaccount:openshift-ovn-kubernetes:ovn-kubernetes-node" cannot list resource "adminpolicybasedexternalroutes" in API group "k8s.ovn.org" at the cluster scope

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 5, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@npinaeva npinaeva marked this pull request as ready for review July 5, 2023 18:37
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2023
@npinaeva
Copy link
Contributor Author

npinaeva commented Jul 5, 2023

@openshift-ci openshift-ci bot requested review from jcaamano and tssurya July 5, 2023 18:41
@npinaeva
Copy link
Contributor Author

npinaeva commented Jul 6, 2023

/retest-required

@npinaeva
Copy link
Contributor Author

npinaeva commented Jul 6, 2023

looks like this PR solves memory leak problem: you can pick one job from this PR e.g. https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/1867/pull-ci-openshift-cluster-network-operator-master-e2e-azure-ovn/1676661712515764224 and same job for another pr e.g. https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/1866/pull-ci-openshift-cluster-network-operator-master-e2e-azure-ovn/1676458790066589696
then spin up PromeCleus pod, and look at mem usage with node_namespace_pod_container:container_memory_rss{pod=~"ovnkube-.*", container="ovnkube-node"}
you will see old metrics (max 410 MB)
image
vs new metrics (max 60 MB)
image

@npinaeva npinaeva changed the title Add adminpolicybasedexternalroutes rights for ovnkube-node. OCPBUGS-15544: Add adminpolicybasedexternalroutes rights for ovnkube-node. Jul 6, 2023
@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jul 6, 2023
@openshift-ci-robot
Copy link
Contributor

@npinaeva: This pull request references Jira Issue OCPBUGS-15544, which is invalid:

  • expected the bug to target the "4.14.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

It shouldn't change the object, therefore get, list, watch should be enough.
Initally added here #1765
Without this change non-OVN-IC deployments have errors like

current.log:2023-06-30T13:10:51.027146349Z E0630 13:10:51.027130    2903 reflector.go:148] github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/adminpolicybasedroute/v1/apis/informers/externalversions/factory.go:116: Failed to watch *v1.AdminPolicyBasedExternalRoute: failed to list *v1.AdminPolicyBasedExternalRoute: adminpolicybasedexternalroutes.k8s.ovn.org is forbidden: User "system:serviceaccount:openshift-ovn-kubernetes:ovn-kubernetes-node" cannot list resource "adminpolicybasedexternalroutes" in API group "k8s.ovn.org" at the cluster scope

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jordigilh
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 6, 2023
@npinaeva
Copy link
Contributor Author

npinaeva commented Jul 6, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jul 6, 2023
@openshift-ci-robot
Copy link
Contributor

@npinaeva: This pull request references Jira Issue OCPBUGS-15544, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @anuragthehatter

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from anuragthehatter July 6, 2023 07:22
@jcaamano
Copy link
Contributor

jcaamano commented Jul 6, 2023

perhaps add it to bindata/network/ovn-kubernetes/common/007-rbac-cluster-reader.yaml as well?

@npinaeva npinaeva force-pushed the add-apbroute-node-rbac branch from 2beadee to 0a43705 Compare July 6, 2023 08:27
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 6, 2023
@npinaeva
Copy link
Contributor Author

npinaeva commented Jul 6, 2023

nice catch! I didn't even know that file exists :)

@jcaamano
Copy link
Contributor

jcaamano commented Jul 6, 2023

Would this be a good time to add adminpolicybasedexternalroutes/status for the write verbs?
That's what we usually add for controllers with CRDs that update status.

@jcaamano
Copy link
Contributor

jcaamano commented Jul 6, 2023

Would this be a good time to add adminpolicybasedexternalroutes/status for the write verbs? That's what we usually add for controllers with CRDs that update status.

I guess we can do this after. PTAL @jordigilh

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 6, 2023
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 6, 2023
@jordigilh
Copy link
Contributor

Would this be a good time to add adminpolicybasedexternalroutes/status for the write verbs? That's what we usually add for controllers with CRDs that update status.

I guess we can do this after. PTAL @jordigilh

/lgtm /approve

Yeah, would make sense to have only access to the status in the controller rather than the whole object. I will raise another PR. Thanks for the suggestion 😄

@npinaeva
Copy link
Contributor Author

npinaeva commented Jul 6, 2023

@jcaamano write verbs are only used by masters (before we default to ic) and it was added here #1765

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 64ab110 and 2 for PR HEAD 0a43705 in total

@jordigilh
Copy link
Contributor

@jcaamano write verbs are only used by masters (before we default to ic) and it was added here #1765

I guess what he means is to change this line to point to the status
https://github.com/jordigilh/cluster-network-operator/blob/cce5df4245f16688caca5b1c543376f181c0be00/bindata/network/ovn-kubernetes/common/003-rbac-controller.yaml#L90

- apiGroups: ["k8s.ovn.org"]
  resources:
  - egressfirewalls
...
...
  - "adminpolicybasedexternalroutes/status"

@jcaamano
Copy link
Contributor

jcaamano commented Jul 6, 2023

@jcaamano write verbs are only used by masters (before we default to ic) and it was added here #1765

I guess what he means is to change this line to point to the status https://github.com/jordigilh/cluster-network-operator/blob/cce5df4245f16688caca5b1c543376f181c0be00/bindata/network/ovn-kubernetes/common/003-rbac-controller.yaml#L90

- apiGroups: ["k8s.ovn.org"]
  resources:
  - egressfirewalls
...
...
  - "adminpolicybasedexternalroutes/status"

Yeah, you need to take care also in the code that you are just updating the status (edit: using the UpdateStatus method)

But it's just the same with egressips, egresssvcs, ... so we can also do this as a general effort.

edit: and we still need the read permission on the overall resource

@jordigilh
Copy link
Contributor

@jcaamano write verbs are only used by masters (before we default to ic) and it was added here #1765

I guess what he means is to change this line to point to the status https://github.com/jordigilh/cluster-network-operator/blob/cce5df4245f16688caca5b1c543376f181c0be00/bindata/network/ovn-kubernetes/common/003-rbac-controller.yaml#L90

- apiGroups: ["k8s.ovn.org"]
  resources:
  - egressfirewalls
...
...
  - "adminpolicybasedexternalroutes/status"

Yeah, you need to take care also in the code that you are just updating the status (edit: using the UpdateStatus method)

But it's just the same with egressips, egresssvcs, ... so we can also do this as a general effort.

edit: and we still need the read permission on the overall resource

Yep, it is already using UpdateStatus().

@dcbw
Copy link
Contributor

dcbw commented Jul 6, 2023

@npinaeva awesome that it fixes the memleak thing... can you put the details of why this happens into the commit message too though?

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 329e328 and 1 for PR HEAD 0a43705 in total

It shouldn't change the object, therefore get, list, watch should be
enough.

The memory leak was hapenning because ovn-k doesn't fail when a
controller can't watch resources, but it references other objects
like pods and namespaces that have their own workqueues and can't
be handled because the main watcher didn't start, thus the queues
are overflowed.

Add adminpolicybasedexternalroutes to cluster-reader permissions
together with the other k8s.ovn.org resources

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
@npinaeva npinaeva force-pushed the add-apbroute-node-rbac branch from 0a43705 to 98c7509 Compare July 6, 2023 17:05
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 6, 2023
@npinaeva
Copy link
Contributor Author

npinaeva commented Jul 6, 2023

added some memory leak details to the comment

@jcaamano
Copy link
Contributor

jcaamano commented Jul 7, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 7, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcaamano, jordigilh, npinaeva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 660d5bc and 2 for PR HEAD 98c7509 in total

@npinaeva
Copy link
Contributor Author

npinaeva commented Jul 7, 2023

/retest-required

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 3a5af01 and 1 for PR HEAD 98c7509 in total

@dcbw
Copy link
Contributor

dcbw commented Jul 8, 2023

/override ci/prow/e2e-metal-ipi-ovn-ipv6
must-gather failed, not the tests

/override ci/prow/e2e-vsphere-ovn-windows

 time="2023-07-07T19:40:43Z" level=info msg="  Found ClusterServiceVersion \"openshift-windows-machine-config-operator/windows-machine-config-operator.v9.0.0\" phase: Installing"
time="2023-07-07T19:41:29Z" level=fatal msg="Failed to run packagemanifests: error waiting for CSV to install: etcdserver: request timed out\n"
Waiting for deployment "windows-machine-config-operator" rollout to finish: 0 of 1 updated replicas are available...
error: timed out waiting for the condition

/override ci/prow/e2e-gcp-ovn
Fixed by openshift/ovn-kubernetes#1747

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 8, 2023

@dcbw: Overrode contexts on behalf of dcbw: ci/prow/e2e-gcp-ovn, ci/prow/e2e-metal-ipi-ovn-ipv6, ci/prow/e2e-vsphere-ovn-windows

Details

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-ipv6
must-gather failed, not the tests

/override ci/prow/e2e-vsphere-ovn-windows

time="2023-07-07T19:40:43Z" level=info msg="  Found ClusterServiceVersion \"openshift-windows-machine-config-operator/windows-machine-config-operator.v9.0.0\" phase: Installing"
time="2023-07-07T19:41:29Z" level=fatal msg="Failed to run packagemanifests: error waiting for CSV to install: etcdserver: request timed out\n"
Waiting for deployment "windows-machine-config-operator" rollout to finish: 0 of 1 updated replicas are available...
error: timed out waiting for the condition

/override ci/prow/e2e-gcp-ovn
Fixed by openshift/ovn-kubernetes#1747

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 8, 2023

@npinaeva: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-network-mtu-migration-ovn-ipv4 98c7509 link false /test e2e-network-mtu-migration-ovn-ipv4
ci/prow/e2e-vsphere-ovn-dualstack 98c7509 link false /test e2e-vsphere-ovn-dualstack
ci/prow/e2e-ovn-hybrid-step-registry 98c7509 link false /test e2e-ovn-hybrid-step-registry
ci/prow/e2e-openstack-ovn 98c7509 link false /test e2e-openstack-ovn
ci/prow/e2e-ovn-ipsec-step-registry 98c7509 link false /test e2e-ovn-ipsec-step-registry

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 47c5d16 into openshift:master Jul 8, 2023
@openshift-ci-robot
Copy link
Contributor

@npinaeva: Jira Issue OCPBUGS-15544: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-15544 has been moved to the MODIFIED state.

Details

In response to this:

It shouldn't change the object, therefore get, list, watch should be enough.
Initally added here #1765
Without this change non-OVN-IC deployments have errors like

current.log:2023-06-30T13:10:51.027146349Z E0630 13:10:51.027130    2903 reflector.go:148] github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/adminpolicybasedroute/v1/apis/informers/externalversions/factory.go:116: Failed to watch *v1.AdminPolicyBasedExternalRoute: failed to list *v1.AdminPolicyBasedExternalRoute: adminpolicybasedexternalroutes.k8s.ovn.org is forbidden: User "system:serviceaccount:openshift-ovn-kubernetes:ovn-kubernetes-node" cannot list resource "adminpolicybasedexternalroutes" in API group "k8s.ovn.org" at the cluster scope

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants