Skip to content

Make VNet and NSGs reconcile/delete async#1684

Closed
CecileRobertMichon wants to merge 4 commits into
kubernetes-sigs:mainfrom
CecileRobertMichon:async-vnets
Closed

Make VNet and NSGs reconcile/delete async#1684
CecileRobertMichon wants to merge 4 commits into
kubernetes-sigs:mainfrom
CecileRobertMichon:async-vnets

Conversation

@CecileRobertMichon
Copy link
Copy Markdown
Contributor

@CecileRobertMichon CecileRobertMichon commented Sep 14, 2021

What type of PR is this?
/kind cleanup
/kind feature

What this PR does / why we need it: Follow up on #1610 and implementation of #1541 for virtual networks and security groups (bundled together because they were both part of the original #1610 PR and share the managed vnet logic).

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Make VNet and NSGs reconcile/delete async

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Sep 14, 2021
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from cecilerobertmichon after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 14, 2021
}, nil
}

// TODO: review this logic and make sure it is what we want. It seems incorrect to skip rules that don't have a certain protocol, etc.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic smells fishy to me, not sure why we're doing a continue when the rule has certain properties. That being said I didn't want to introduce bug fixes/behavior changes as part of this already large refactor so I kept the logic as is from https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1684/files#diff-5bb14c98c52ecfa3dad6cf4745c9378b7624fdc60f0910343534ddac18408fb1L115

@CecileRobertMichon
Copy link
Copy Markdown
Contributor Author

/retest

@CecileRobertMichon CecileRobertMichon force-pushed the async-vnets branch 3 times, most recently from d50304a to 4b651a6 Compare September 14, 2021 18:13
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 18, 2021
@shysank shysank mentioned this pull request Sep 20, 2021
23 tasks
Comment thread azure/services/securitygroups/securitygroups_test.go Outdated
Comment thread azure/services/securitygroups/securitygroups.go Outdated
@karuppiah7890
Copy link
Copy Markdown
Contributor

@CecileRobertMichon I was looking at the implementation of async reconcile of NSGs in this PR to replicate it for public ips to fix #1716 . I have a question regarding the status updates of NSGs. I see that we are storing only one error as a result for the UpdatePutStatus call. I was wondering about the case where we have multiple errors - errors which are non operation-not-done errors . I see that we would just store the last such non operation-not-done errors and then never overwrite it with any possible operation-not-done-errors. Why not capture all the errors which are non operation-not-done errors? So that we can show them all here -

conditions.MarkFalse(s.AzureCluster, condition, infrav1.FailedReason, clusterv1.ConditionSeverityError, "%s failed to create or update. err: %s", service, err.Error())

Comment thread azure/services/securitygroups/securitygroups.go Outdated
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2021
@CecileRobertMichon CecileRobertMichon force-pushed the async-vnets branch 3 times, most recently from bcedf3d to 674701c Compare November 5, 2021 20:15
@CecileRobertMichon CecileRobertMichon force-pushed the async-vnets branch 3 times, most recently from e0c47c5 to dd601d8 Compare November 8, 2021 18:50
@CecileRobertMichon CecileRobertMichon force-pushed the async-vnets branch 2 times, most recently from c08c457 to 9a1bb8c Compare November 9, 2021 01:27
@CecileRobertMichon
Copy link
Copy Markdown
Contributor Author

Rebased/tests updated/fixed issues, this should be ready to go!

@CecileRobertMichon
Copy link
Copy Markdown
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 9, 2021
@CecileRobertMichon
Copy link
Copy Markdown
Contributor Author

Can't repro the private cluster deletion issue locally. Trying one more time...

/retest

@CecileRobertMichon CecileRobertMichon force-pushed the async-vnets branch 2 times, most recently from 2ac0140 to c7b57f0 Compare November 16, 2021 19:35
Comment thread azure/scope/cluster.go Outdated
func (s *ClusterScope) IsVnetManaged(ctx context.Context) (bool, error) {
var err error
if s.cache.IsVnetManaged == nil {
vnetSvc := virtualnetworks.New(s)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shysank PTAL. I had to go back on our conversation in #1610 (comment) because it doesn't work for Delete() as we call services in the opposite order as Reconcile() (eg. the subnets need to be deleted before the vnet is deleted) so this means the vnet service isn't the first one to be reconciled which means we have to do this before we get to the vnet service. I'm not sure how to work around that constraint without calling the virtual network from either the scope or other services (eg. from the subnet service). Thoughts?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @devigned as well

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we think of IsManaged as something that identifies whether that resource is managed by capz instead of checking IsVnetManaged? Every Service can implement IsManaged behaviour, and internally a service can depend on another service or another service's client to determine whether it is managed or not. The cache will be shared among all services. In this way, we can make clear abstractions. Concretely: SubnetSvc.IsManaged() will check if vnet is managed either through virtualNetworksClient (we can write a helper if it is used in more than one place) or by exposing virtualNetworksSvc.IsManaged, and updates the cache through scope which can be used by subsequent services. wdyt?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me try some things once #1874 is merged and I can rebase on top of it

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 2, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 10, 2021
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@CecileRobertMichon: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 13, 2021
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@CecileRobertMichon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-e2e-windows 7a76e16 link /test pull-cluster-api-provider-azure-e2e-windows
pull-cluster-api-provider-azure-ci-entrypoint 5069658 link false /test pull-cluster-api-provider-azure-ci-entrypoint
pull-cluster-api-provider-azure-e2e-windows-dockershim 5069658 link true /test pull-cluster-api-provider-azure-e2e-windows-dockershim
pull-cluster-api-provider-azure-verify 5069658 link true /test pull-cluster-api-provider-azure-verify
pull-cluster-api-provider-azure-e2e 5069658 link true /test pull-cluster-api-provider-azure-e2e
pull-cluster-api-provider-azure-test 5069658 link true /test pull-cluster-api-provider-azure-test
pull-cluster-api-provider-azure-apidiff 5069658 link false /test pull-cluster-api-provider-azure-apidiff
pull-cluster-api-provider-azure-build 5069658 link true /test pull-cluster-api-provider-azure-build
pull-cluster-api-provider-azure-coverage 5069658 link false /test pull-cluster-api-provider-azure-coverage

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@CecileRobertMichon
Copy link
Copy Markdown
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 13, 2021
@CecileRobertMichon
Copy link
Copy Markdown
Contributor Author

/close

replaced by #1921 and #1918

will tackle refactoring the IsManaged logic separately once all services are async.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@CecileRobertMichon: Closed this PR.

Details

In response to this:

/close

replaced by #1921 and #1918

will tackle refactoring the IsManaged logic separately once all services are async.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants