Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛: securitygroup: fix comparison of ingress rules sets. #5024

Merged
merged 3 commits into from
Jul 10, 2024

Conversation

r4f4
Copy link
Contributor

@r4f4 r4f4 commented Jun 13, 2024

What type of PR is this?
/kind bug

What this PR does / why we need it:

We are comparing sets that are not compatible, so rules will always be revoked and authorized during reconciliation. We need to expand the rules from the spec so there is one rule for each item in cidrBlock/sourceSecurityGroupIds.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5023

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emojis
  • adds unit tests
  • adds or updates e2e tests

Release note:

Fix comparison of ingress rules during Security Group reconciliation to avoid unnecessary revokes and authorizes.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority labels Jun 13, 2024
@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 13, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @r4f4. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 13, 2024

Without this change, I'd see a constant stream of revoke-authorize ingress rules in the debug logs [1]. Now I see the rules authorized once and no revokes (will add logs once I have more signal from openshift CI).

[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/8545/pull-ci-openshift-installer-master-e2e-aws-ovn/1801267394513997824/artifacts/e2e-aws-ovn/ipi-install-install/artifacts/.openshift_install-1718293235.log

Copy link
Contributor

@mtulio mtulio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM, very good fix/improvement in the provision phase. 🎉

@r4f4 Do you think we are covered in the ReconcileSecurityGroups or would be hard to reproduce in new case as revoke is in the same function, and/or would be nice to create a test dedicated to exercise expandIngressRules()?

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 13, 2024

Do you think we are covered in the ReconcileSecurityGroups or would be hard to reproduce in new case as revoke is in the same function, and/or would be nice to create a test dedicated to exercise expandIngressRules()?

Good call. I can, at a minimum, add unit tests for the expandIngressRules() function. Will do that while I wait for CI results.

r4f4 added 2 commits June 13, 2024 21:37
We are comparing sets that are not compatible, so rules will always be
revoked and authorized during reconciliation. We need to expand the
rules from the spec so there is one rule for each item in
cidrBlock/sourceSecurityGroupIds.
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 13, 2024
@r4f4
Copy link
Contributor Author

r4f4 commented Jun 13, 2024

Updated: add a few fixes and unit tests.

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 14, 2024

Openshift CI results are all green! openshift/installer#8596. Looking at the logs, the only rule being revoked is the one we explicitely removed from the cluster spec:

time="2024-06-13T21:27:14Z" level=debug msg="I0613 21:27:14.705844     356 securitygroups.go:181] \"Revoked ingress rules from security group\" controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" namespace=\"openshift-cluster-api-guests\" name=\"ci-op-x5n2dn57-2d061-fxccx\" reconcileID=\"1eebbaa2-4a91-478f-91df-6d0cf652d71c\" cluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" revoked-ingress-rules=[{\"description\":\"Bootstrap SSH Access\",\"protocol\":\"tcp\",\"fromPort\":22,\"toPort\":22,\"cidrBlocks\":[\"0.0.0.0/0\"]}] security-group-id=\"sg-0193bb8269722f9e3\""
time="2024-06-13T21:27:14Z" level=debug msg="I0613 21:27:14.705926     356 securitygroups.go:150] \"second pass security group reconciliation\" controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" namespace=\"openshift-cluster-api-guests\" reconcileID=\"1eebbaa2-4a91-478f-91df-6d0cf652d71c\" cluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" group-id=\"sg-06d4855d6bc549b25\" name=\"ci-op-x5n2dn57-2d061-fxccx-lb\" role=\"lb\""

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/8596/pull-ci-openshift-installer-master-altinfra-e2e-aws-ovn/1801355744365252608/artifacts/e2e-aws-ovn/ipi-install-install/artifacts/.openshift_install-1718314761.log

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 14, 2024

@damdo would you mind enabling tests on this PR?

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 14, 2024

/cc @alexander-demicev
I see you've recently changed this code. WDYT of this fix?

This test checks that target ingress rules that have already been
authorized are not revoked and then authorized again.
@r4f4
Copy link
Contributor Author

r4f4 commented Jun 16, 2024

@mtulio following your suggestion, I've added a unit test to check that we are not revoking ingress rules unnecessarily.

@mtulio
Copy link
Contributor

mtulio commented Jun 18, 2024

/lgtm

@k8s-ci-robot
Copy link
Contributor

@mtulio: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@nrb
Copy link
Contributor

nrb commented Jun 20, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 20, 2024
@r4f4
Copy link
Contributor Author

r4f4 commented Jun 20, 2024

Tests are all green here as well!

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 20, 2024

/test ?

@k8s-ci-robot
Copy link
Contributor

@r4f4: The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-aws-build
  • /test pull-cluster-api-provider-aws-build-docker
  • /test pull-cluster-api-provider-aws-test
  • /test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-aws-apidiff-main
  • /test pull-cluster-api-provider-aws-e2e
  • /test pull-cluster-api-provider-aws-e2e-blocking
  • /test pull-cluster-api-provider-aws-e2e-clusterclass
  • /test pull-cluster-api-provider-aws-e2e-conformance
  • /test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-aws-e2e-eks
  • /test pull-cluster-api-provider-aws-e2e-eks-gc
  • /test pull-cluster-api-provider-aws-e2e-eks-testing

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-aws-apidiff-main
  • pull-cluster-api-provider-aws-build
  • pull-cluster-api-provider-aws-build-docker
  • pull-cluster-api-provider-aws-test
  • pull-cluster-api-provider-aws-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 20, 2024

/test pull-cluster-api-provider-aws-e2e pull-cluster-api-provider-aws-e2e-blocking

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 20, 2024

aws-e2e failure doesn't seem to be cause by the PR changes:

   Should've eventually succeeded creating an AWS CloudFormation stack
  Expected
      <bool>: false
  to be true
  In [SynchronizedBeforeSuite] at: /home/prow/go/src/sigs.k8s.io/cluster-api-provider-aws/test/e2e/shared/suite.go:147 @ 06/20/24 17:24:37.437 

/test pull-cluster-api-provider-aws-e2e

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 20, 2024

aws-e2e: the 2 e2e failures don't look related to the PR changes but will retest again to be sure.

Test from this PR

In the log of this PR no revokes until the cluster is deleted.

Test from another PR

Looking at an old test log:

I0611 16:19:08.741022       1 securitygroups.go:179] "Revoked ingress rules from security group" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="self-hosted-yr5v5v/self-hosted-70i0jt-klt6x" namespace="self-hosted-yr5v5v" name="self-hosted-70i0jt-klt6x" reconcileID="df380180-8af8-45d2-a1c0-fd7d1108558e" cluster="self-hosted-yr5v5v/self-hosted-70i0jt" revoked-ingress-rules=[{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]},{"description":"bgp (calico)","protocol":"tcp","fromPort":179,"toPort":179,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"bgp (calico)","protocol":"tcp","fromPort":179,"toPort":179,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]},{"description":"IP-in-IP (calico)","protocol":"4","fromPort":0,"toPort":0,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"IP-in-IP (calico)","protocol":"4","fromPort":0,"toPort":0,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]}] security-group-id="sg-02a4d2faaf7102a28"
I0611 16:19:09.000272       1 securitygroups.go:193] "Authorized ingress rules in security group" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="self-hosted-yr5v5v/self-hosted-70i0jt-klt6x" namespace="self-hosted-yr5v5v" name="self-hosted-70i0jt-klt6x" reconcileID="df380180-8af8-45d2-a1c0-fd7d1108558e" cluster="self-hosted-yr5v5v/self-hosted-70i0jt" authorized-ingress-rules=[{"description":"bgp (calico)","protocol":"tcp","fromPort":179,"toPort":179,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]},{"description":"IP-in-IP (calico)","protocol":"4","fromPort":-1,"toPort":65535,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208","sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]}] security-group-id="sg-02a4d2faaf7102a28"

So the following rules are being revoked:

[{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]}

Then being authorized as:

{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208","sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]}

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 20, 2024

/test pull-cluster-api-provider-aws-e2e

1 similar comment
@r4f4
Copy link
Contributor Author

r4f4 commented Jun 21, 2024

/test pull-cluster-api-provider-aws-e2e

@r4f4
Copy link
Contributor Author

r4f4 commented Jun 21, 2024

/assign @nrb

Tests look good. This is ready for review.

Copy link
Contributor

@alexander-demicev alexander-demicev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks taking care of this
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 21, 2024
@richardcase
Copy link
Member

Thanks @r4f4

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardcase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 10, 2024
@k8s-ci-robot k8s-ci-robot merged commit d375b7d into kubernetes-sigs:main Jul 10, 2024
25 checks passed
r4f4 added a commit to r4f4/installer that referenced this pull request Jul 10, 2024
* Brings in this fix
  kubernetes-sigs/cluster-api-provider-aws#5024
  to avoid unnecessary revoke-authorize of ingress rules.
r4f4 added a commit to r4f4/installer that referenced this pull request Jul 14, 2024
* Brings in this fix
  kubernetes-sigs/cluster-api-provider-aws#5024
  to avoid unnecessary revoke-authorize of ingress rules.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ingress rules reconciliation is not comparing equivalent sets
6 participants