Skip to content

E2E: waiting for services to be deleted before proceeding#2157

Merged
k8s-ci-robot merged 1 commit into
kubernetes-sigs:mainfrom
CecileRobertMichon:svc-delete-e2e
Mar 10, 2022
Merged

E2E: waiting for services to be deleted before proceeding#2157
k8s-ci-robot merged 1 commit into
kubernetes-sigs:mainfrom
CecileRobertMichon:svc-delete-e2e

Conversation

@CecileRobertMichon
Copy link
Copy Markdown
Contributor

What type of PR is this?
/kind failing-test

What this PR does / why we need it: following up on this slack thread, the Delete() client doesn't actually wait for a service to be deleted, it simply triggers the delete. cloud-provider-azure has a Finalizer on the Service resource as it needs to reconcile load balancers before the service can be deleted, which means it can take several minutes (sometimes even more) for the service to actually be gone. However, cloud-provider only reconciles one service at a time. This means before the ELB service starts reconciling, the ILB service first needs to be done deleting. By not waiting for the ILB to fully delete, we run into occasional flakes as the timeout for creating the ELB service also includes deleting the previous service.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 10, 2022
Comment thread test/e2e/azure_lb.go
return nil
}, deleteOperationTimeout, retryableOperationSleepBetweenRetries).Should(Succeed())
}, retryableOperationTimeout, retryableOperationSleepBetweenRetries).Should(Succeed())
Logf("waiting for the external LB service to be deleted: %s", elbService.Name)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this to be consistent but need to check if it adds a lot of time to e2e duration since we don't strictly need to wait on this one (it's the last service to be reconciled)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so we have actually not been validating ELB delete (ILBs and ELBs are ultimately the same Azure LoadBalancer type, so not super critical, but I agree it's best-practice to not make an exception here)

@CecileRobertMichon
Copy link
Copy Markdown
Contributor Author

/assign @jackfrancis

Copy link
Copy Markdown
Contributor

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2022
@CecileRobertMichon
Copy link
Copy Markdown
Contributor Author

CecileRobertMichon commented Mar 10, 2022

@jackfrancis I got this flake:

Timed out after 300.001s.
Expected success, but got an error:
    <*errors.fundamental | 0xc0003ffb30>: {
        msg: "no machine has started to drain",
        stack: [0x1ca6ff4, 0x4d4145, 0x4d36c5, 0x7b4931, 0x7b4c99, 0x7b54e5, 0x7b4a2b, 0x1ca6d69, 0x1ca6528, 0x1cc6fd3, 0x7a538c, 0x7b1abe, 0x1cc6f51, 0x79171a, 0x7910e5, 0x79017b, 0x796569, 0x795f47, 0x7a3005, 0x7a2d25, 0x7a2565, 0x7a4812, 0x7b1125, 0x7b0f3e, 0x1cc14de, 0x517702, 0x46d2a1],
    }
    no machine has started to drain
/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_machinepool_drain.go:185

which suggests a new flake after #2123 merged

/retest

@jackfrancis
Copy link
Copy Markdown
Contributor

@CecileRobertMichon indeed.

I'll open a PR to disable those drain tests. Recall they were never actually running w/ validation turned on, so we need to go through the process of tuning them and actually implementing them in a useable way.

@jackfrancis
Copy link
Copy Markdown
Contributor

@CecileRobertMichon see #2160

Copy link
Copy Markdown
Contributor Author

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2022
@k8s-ci-robot k8s-ci-robot merged commit 5cf5985 into kubernetes-sigs:main Mar 10, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.3 milestone Mar 10, 2022
@CecileRobertMichon CecileRobertMichon deleted the svc-delete-e2e branch February 17, 2023 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants