[WIP] Increase parallelism for e2e tests #1816

Closed
wants to merge 4 commits into from

Conversation

shysank
Contributor

shysank commented Nov 1, 2021

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

none

@k8s-ci-robot added labels on Nov 1, 2021:
  • release-note-none (denotes a PR that doesn't merit a release note)
  • do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress)
  • cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
  • size/XS (denotes a PR that changes 0-9 lines, ignoring generated files)
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign cecilerobertmichon after the PR has been reviewed.
You can assign the PR to them by writing /assign @cecilerobertmichon in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added labels on Nov 1, 2021:
  • area/provider/azure (issues or PRs related to azure provider)
  • sig/cluster-lifecycle (categorizes an issue or PR as relevant to SIG Cluster Lifecycle)
@shysank
Contributor Author

shysank commented Nov 1, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@k8s-ci-robot added size/S (denotes a PR that changes 10-29 lines, ignoring generated files) and removed size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) on Nov 1, 2021
@shysank
Contributor Author

shysank commented Nov 1, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 2, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 3, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 4, 2021

In two of the tests that failed in the last run, the Azure cluster objects were never reconciled by AzureClusterController and remained in the provisioning state. This failure has shown up fairly consistently across most of the runs.

@k8s-ci-robot added size/M (denotes a PR that changes 30-99 lines, ignoring generated files) and removed size/S (denotes a PR that changes 10-29 lines, ignoring generated files) on Nov 4, 2021
@shysank
Contributor Author

shysank commented Nov 4, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 5, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 5, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 5, 2021

@shysank
Contributor Author

shysank commented Nov 5, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 5, 2021

One more pass: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/1816/pull-cluster-api-provider-azure-capi-e2e/1456648404498124800 in ~1 hr 30 min. 2/3 passed. Cautiously optimistic. Trying again.

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 5, 2021

Uh oh, it failed again (https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/1816/pull-cluster-api-provider-azure-capi-e2e/1456676924385398784). Again, the Azure cluster objects are not being reconciled. Perhaps the bootstrap cluster is overloaded with watches?

@shysank
Contributor Author

shysank commented Nov 5, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 5, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 6, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 6, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 6, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 6, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 6, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 6, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 8, 2021

4 out of the last 5 tests passed. The one failure was due to an MHC flake. It looks like increasing the concurrency has resolved the AzureCluster reconciliation issue. I'm going to revert the other hacks, set the concurrency properly in the test, and retry a few times.
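
For readers following along, here is a toy model of why raising concurrency could unblock objects stuck in provisioning. It is only a sketch under assumptions: it assumes "concurrency" means the number of reconcile workers draining a FIFO queue, and the function name `start_times` and the durations are hypothetical. It is not CAPZ or controller-runtime code, and the thread does not say which setting was actually changed.

```python
import heapq


def start_times(durations, workers):
    """Return the time at which each queued item starts being processed.

    Items are taken in FIFO order by whichever worker becomes free first.
    `durations` are hypothetical per-item processing times in seconds.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    starts = []
    for duration in durations:
        t = heapq.heappop(free_at)  # earliest available worker
        starts.append(t)
        heapq.heappush(free_at, t + duration)
    return starts


if __name__ == "__main__":
    # Hypothetical workload: three slow reconciles queued ahead of a new object.
    durations = [600, 600, 600, 5]
    for n in (1, 2, 4):
        print(f"workers={n}: last object starts at t={start_times(durations, n)[-1]}s")
```

With a single worker the last object waits behind every slow item ahead of it. With more workers it is picked up much sooner, which would be consistent with the improvement seen here but does not prove that this was the cause.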

@shysank force-pushed the ginkgo_parallel_nodes branch from 7177356 to 8769bd6 on November 8, 2021 21:24
@k8s-ci-robot added size/S (denotes a PR that changes 10-29 lines, ignoring generated files) and removed size/M (denotes a PR that changes 30-99 lines, ignoring generated files) on Nov 8, 2021
@shysank
Contributor Author

shysank commented Nov 8, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 8, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 9, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 9, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 9, 2021

A Ginkgo node sometimes stops responding (onsi/ginkgo#206) and the run times out after 4 hours. This happens mostly for the quick start spec. I'm going to add -debug and try again to find the cause, as suggested in the issue.

@shysank
Contributor Author

shysank commented Nov 9, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 9, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 9, 2021

Message: "admission webhook \"validation.kubeadmcontrolplane.controlplane.cluster.x-k8s.io\" denied the request: KubeadmControlPlane.controlplane.cluster.x-k8s.io \"kcp-upgrade-9hh6vc-control-plane\" is invalid: spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.extraArgs: Forbidden: cannot be modified"

This error is back again :(

@shysank force-pushed the ginkgo_parallel_nodes branch from 9364a6c to 2ff77ac on November 10, 2021 01:53
@shysank
Contributor Author

shysank commented Nov 10, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 10, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Contributor Author

shysank commented Nov 10, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@k8s-ci-robot
Contributor

@shysank: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                 Commit   Details  Required  Rerun command
pull-cluster-api-provider-azure-e2e-exp   2ff77ac  link     false     /test pull-cluster-api-provider-azure-e2e-exp
pull-cluster-api-provider-azure-capi-e2e  2ff77ac  link     false     /test pull-cluster-api-provider-azure-capi-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@CecileRobertMichon
Contributor

I think we can close this now that we somewhat solved our problem by removing upgrade from spec

Still something that might be worth revisiting later but for now

/close

@k8s-ci-robot
Contributor

@CecileRobertMichon: Closed this PR.

In response to this:

I think we can close this now that we somewhat solved our problem by removing upgrade from spec

Still something that might be worth revisiting later but for now

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
  • area/provider/azure (issues or PRs related to azure provider)
  • cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
  • do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress)
  • release-note-none (denotes a PR that doesn't merit a release note)
  • sig/cluster-lifecycle (categorizes an issue or PR as relevant to SIG Cluster Lifecycle)
  • size/S (denotes a PR that changes 10-29 lines, ignoring generated files)

3 participants