🏃 Add e2e tests for all flavors #798
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sbueringer. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.

/test ?
@sbueringer: The following commands are available to trigger jobs:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-cluster-api-provider-openstack-e2e-test

/test pull-cluster-api-provider-openstack-e2e-test
13 similar comments
Force-pushed: 75441a5 → c05394e (compare)
/test pull-cluster-api-provider-openstack-e2e-test
|
@sbueringer: The specified target(s) for the `/test` command were not found.

/test pull-cluster-api-provider-openstack-conformance-test
Force-pushed: 2410d12 → 4562b14 (compare)
/test pull-cluster-api-provider-openstack-make-conformance

Also dropped the commit here, so it's ready for review/merge now. @jichenjc @hidekazuna @prankul88 PTAL if you have some time :)

/test pull-cluster-api-provider-openstack-conformance-test
```go
	}

	// Delete other things
	if err = networkingService.DeleteOrphanedPorts(); err != nil {
```
Force-pushed: 4562b14 → b16b7ab (compare)
/test pull-cluster-api-provider-openstack-e2e-test
Force-pushed: b16b7ab → af3f553 (compare)
Some final polishing (and subsequent linter fixes...), but now I'm done.
jichenjc left a comment:

Overall a quick review, some nit questions.
```go
		if orphanedPort.DeviceOwner != "" {
			continue
		}
```
I'd think we should at least log this behavior.

Yup, I'll add log statements where we do (i.e. delete) something.
```go
	}

	var projectID string
	if clientOpts.AuthInfo != nil {
```
Is it possible that clientOpts.AuthInfo is nil? If so, projectID will be "": will that impact follow-up features?

I think it cannot be nil, but it would have a very high impact if it were. With an empty project ID we would just list and delete orphaned ports depending on the rights of the user. This means that with some kind of admin / domain admin credentials in the clouds.yaml it would delete stuff all over the OpenStack deployment. I think we should fail hard here if we're unable to determine a project ID.
test/e2e/shared/exec_test.go (outdated)

```diff
-	machineIP: "10.6.0.230",
-	bastionIP: "172.24.4.58",
+	machineIP: "10.6.0.209",
+	bastionIP: "172.24.4.10",
```
I'm not sure why we need such a change. Do we have a hard dependency on these IPs?

This was just for debugging locally; the test is not executed (t.Skip() above). It's just way easier to develop the exec method locally when you don't have to run the whole e2e suite to do it :) I'll roll back the IPs, I just changed them while debugging the func.

I was just curious, not a big deal.

Absolutely fine, happy to explain :)
```diff
 if ! gcloud compute networks describe "${GCP_NETWORK_NAME}" --project "${GCP_PROJECT}" > /dev/null;
 then
-	gcloud compute networks create --project "$GCP_PROJECT" "${GCP_NETWORK_NAME}" --subnet-mode auto --quiet
+	gcloud compute networks create --project "$GCP_PROJECT" "${GCP_NETWORK_NAME}" --subnet-mode custom
```

Ah, I got this from testing locally. I would like to keep it anyway: the idea is to not automatically create a lot of subnets, as GCP does with subnet mode auto.
/test pull-cluster-api-provider-openstack-e2e-test

@jichenjc thanks for the review, good findings :) Should all be fixed now, PTAL.

/test pull-cluster-api-provider-openstack-e2e-test (small fixup, necessary because of the change to the subnet)

/test pull-cluster-api-provider-openstack-e2e-test

@sbueringer please help squash the commits, then I think we can merge this.
Force-pushed: 6270053 → dfd6283 (compare)

@jichenjc done :)
```go
	}

	for _, orphanedPort := range orphanedPorts {
		if orphanedPort.DeviceOwner != "" {
```
We shouldn't delete anything which is not created by ClusterAPI. This statement would lead to deletion of any ports which were not created by ClusterAPI but were created for other purposes (either manually or via some other automation).

One common use case for this: in cases where a loadbalancer is not available (or the user does not use it for whatever reason), and instead a VIP (Virtual IP) along with keepalived is used for load balancing, an OpenStack port is created (to reserve the IP and to create proper routes) but is not attached to any VM. Since this port is not attached to any VM, it has device_owner == "".

The DeleteOrphanedPorts function would delete this port unnecessarily and would cause service disruption elsewhere.
@rustycl0ck Understood. So I assume there is no real way to clean up those ports safely. I'll drop this from the e2e tests for now (see my next comment). I'm not sure how we can get rid of the problems during cluster deletion, though. I'll check whether I can get the e2e tests clean without this change, but I'm not sure it's possible.

I'm not entirely sure about the general statement "we shouldn't delete anything which is not created by ClusterAPI". In my opinion there are cases where it's totally valid. One case is when the cluster as a whole should be deleted and somebody has to clean up the loadbalancers which were created by the cloud provider openstack. This is also done in CAPA, and the CCM itself is unable to delete them as it no longer exists.

Are those service interruptions really an issue when the whole cluster is deleted? Of course somebody can build entirely different infrastructure in the same OpenStack tenant as the CAPO-managed cluster, but I'm not sure I want to support this.
My use case was referring to a setup I worked on, where a single OpenStack network (IPv6) was being used by multiple teams/products. While I was trying to use CAPI to set up a k8s cluster there, a few other teams had used the VIP approach for making their applications HA (since a loadbalancer wasn't available). Just FYI, the other products weren't k8s based, but legacy VM-based systems/applications.

#723 was caused at cluster creation time, where the port is created first, but if the image (or the ssh key or the flavor) does not exist, server creation errors out and the port is left behind without any cleanup. This seems to have been addressed correctly in #705 (where a rollback is performed if server creation failed). Similarly, we should try to fix the root cause instead of performing a general cleanup.

I think the correct way to fix this would be to figure out why this error occurred in the first place. From the following code block: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/master/pkg/cloud/services/compute/instance.go#L631-L648, I don't see how/why the port would have been left behind while the server got deleted successfully. If we can find the root cause, we can fix the problem in a cleaner way.
@rustycl0ck Thx for providing some more details :)

I think overall you're right. The better approach would be to clean up the ports wherever they're orphaned. I'm currently testing whether this PR also works without the DeleteOrphanedPorts func: #816. I assume this will be successful, and then I'll drop the func from the current PR.

Out of curiosity (I'm not sure what is possible with OpenStack), did you have:
- a separate project but were using the same network as the other teams, or
- have you been using the same network but separate projects?
There was a problem hiding this comment.
It was a single project and same network, because all the applications (k8s hosted as well as legacy VM based) had to interact with each other (on a private network internally).
There was a problem hiding this comment.
As I feared the tests are now failing because they are timing out because the cluster deletion does not work: https://prow.k8s.io/log?container=test&id=1376784008490258432&job=pull-cluster-api-provider-openstack-e2e-test
I would suggest merging the PR anyway without the DeleteOrphanedPort func. I would rather have those tests with a known issue on master and unblock a lot of other PRs instead of having to rely only on the conformance tests.Those tests are not mandatory yet and I try to find the bug in a follow-up PR.
What do you think?
(@jichenjc )
There was a problem hiding this comment.
Got the exact results. Looks like the error only happens when server are not successfully created right now. I'll check if I can just fix this...
There was a problem hiding this comment.
Fixed it here: 2e012c6
Manual tests against a devstack running AWS were green. I'll squash and re-run the tests on this PR
Force-pushed: 2e012c6 → 9fa2481 (compare)
/test pull-cluster-api-provider-openstack-e2e-test

/lgtm thanks :)

tests are green, so:
What this PR does / why we need it:
This PR adds e2e tests for all our flavors (and some additional ones).
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #577
Fixes #723
Fixes #741

Notes:
This PR can already be reviewed, but it won't be merged before the test framework and linter PRs; it also contains those commits/PRs.
/hold