
🏃 Add e2e tests for all flavors#798

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:master from sbueringer:pr-add-more-e2e-tests
Mar 30, 2021
Conversation

@sbueringer
Member

@sbueringer sbueringer commented Mar 20, 2021

What this PR does / why we need it:
This PR adds e2e tests for all our flavors (and some additional ones).

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #577
Fixes #723
Fixes #741

Notes:

This PR can already be reviewed, but it won't be merged before the test framework PR and the linter PR; it also contains those commits/PRs.

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 20, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 20, 2021
@sbueringer
Member Author

/test ?

@k8s-ci-robot
Contributor

@sbueringer: The following commands are available to trigger jobs:

  • /test pull-cluster-api-provider-openstack-build
  • /test pull-cluster-api-provider-openstack-test
  • /test pull-cluster-api-provider-openstack-e2e-test
  • /test pull-cluster-api-provider-openstack-make-conformance

Use /test all to run the following jobs:

  • pull-cluster-api-provider-openstack-build
  • pull-cluster-api-provider-openstack-test
Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 20, 2021
@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-e2e-test

@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-e2e-test

13 similar comments

@sbueringer sbueringer force-pushed the pr-add-more-e2e-tests branch from 75441a5 to c05394e Compare March 22, 2021 05:41
@sbueringer sbueringer changed the title [WIP]🏃 Add e2e tests for all flavors 🏃 Add e2e tests for all flavors Mar 22, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2021
@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-e2e-test

@k8s-ci-robot
Contributor

@sbueringer: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test pull-cluster-api-provider-openstack-build
  • /test pull-cluster-api-provider-openstack-test
  • /test pull-cluster-api-provider-openstack-e2e-test
  • /test pull-cluster-api-provider-openstack-conformance-test

Use /test all to run the following jobs:

  • pull-cluster-api-provider-openstack-build
  • pull-cluster-api-provider-openstack-test
Details

In response to this:

/test pull-cluster-api-provider-openstack-make-conformance

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-conformance-test

@sbueringer sbueringer force-pushed the pr-add-more-e2e-tests branch from 2410d12 to 4562b14 Compare March 26, 2021 06:02
@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-make-conformance
/test pull-cluster-api-provider-openstack-e2e-test

Also dropped the commit here, so ready for review/merge now.

@jichenjc @hidekazuna @prankul88 PTAL, if you have some time :)

@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-conformance-test

}

// Delete other things
if err = networkingService.DeleteOrphanedPorts(); err != nil {
Member Author

This should fix: #723

@sbueringer sbueringer force-pushed the pr-add-more-e2e-tests branch from 4562b14 to b16b7ab Compare March 26, 2021 08:08
@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-e2e-test
/test pull-cluster-api-provider-openstack-conformance-test

@sbueringer sbueringer force-pushed the pr-add-more-e2e-tests branch from b16b7ab to af3f553 Compare March 26, 2021 08:15
@sbueringer
Member Author

Some final polishing (and subsequent linter fixes...), but now I'm done.
/test pull-cluster-api-provider-openstack-e2e-test
/test pull-cluster-api-provider-openstack-conformance-test

@sbueringer sbueringer mentioned this pull request Mar 26, 2021
3 tasks
Contributor

@jichenjc jichenjc left a comment

Overall looks good; a few nit questions.

if orphanedPort.DeviceOwner != "" {
continue
}

Contributor

I'd think we should at least log this behavior.

Member Author

Yup, I'll add log statements wherever we do (i.e. delete) something.
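The skip-plus-logging behavior discussed above could be sketched roughly as follows. This is a minimal illustration with a simplified, hypothetical Port type, not the actual CAPO code:

```go
package main

import "fmt"

// Port is a simplified stand-in for the Neutron port type used by CAPO.
type Port struct {
	ID          string
	DeviceOwner string
}

// selectOrphanedPorts returns the ports considered orphaned under the rule
// from the review thread: only ports with an empty DeviceOwner qualify.
// Everything else is skipped, and each decision is logged.
func selectOrphanedPorts(ports []Port) []Port {
	var orphaned []Port
	for _, p := range ports {
		if p.DeviceOwner != "" {
			fmt.Printf("skipping port %s: device owner %q\n", p.ID, p.DeviceOwner)
			continue
		}
		fmt.Printf("deleting orphaned port %s\n", p.ID)
		orphaned = append(orphaned, p)
	}
	return orphaned
}

func main() {
	ports := []Port{
		{ID: "a", DeviceOwner: "compute:nova"},
		{ID: "b", DeviceOwner: ""},
	}
	orphaned := selectOrphanedPorts(ports)
	fmt.Println(len(orphaned)) // prints 1: only port "b" is selected
}
```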

}

var projectID string
if clientOpts.AuthInfo != nil {
Contributor

Is it possible that clientOpts.AuthInfo is nil? If so, projectID will be "". Will that impact follow-up features?

Member Author

I think it cannot be nil, but the impact is very high if it is. With an empty project ID we would just list and delete orphaned ports depending only on the rights of the user.

This means that when you have some kind of admin / domain admin credentials in the clouds.yaml, it would delete resources all over the OpenStack cloud.

I think we should fail hard here if we're unable to determine a project ID.

Contributor

ok, thanks for the info
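The fail-hard behavior proposed above could look like the following sketch. The AuthInfo struct here is a hypothetical simplification for illustration, not the real clientconfig type:

```go
package main

import (
	"errors"
	"fmt"
)

// AuthInfo is a simplified stand-in for the auth options parsed from
// clouds.yaml; only the field relevant to this discussion is modeled.
type AuthInfo struct {
	ProjectID string
}

// resolveProjectID returns the project ID, failing hard when it cannot be
// determined: scoping the orphaned-port cleanup to an empty project ID would
// match ports across the whole cloud for users with admin credentials.
func resolveProjectID(authInfo *AuthInfo) (string, error) {
	if authInfo == nil || authInfo.ProjectID == "" {
		return "", errors.New("unable to determine project ID, refusing to continue")
	}
	return authInfo.ProjectID, nil
}

func main() {
	if _, err := resolveProjectID(nil); err != nil {
		fmt.Println("error:", err)
	}
	id, _ := resolveProjectID(&AuthInfo{ProjectID: "demo-project"})
	fmt.Println(id) // prints demo-project
}
```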

machineIP: "10.6.0.230",
bastionIP: "172.24.4.58",
machineIP: "10.6.0.209",
bastionIP: "172.24.4.10",
Contributor

I am not sure why we need this change? Do we have a hard dependency on these IPs?

Member Author

This was just for debugging locally. The test is not executed (t.Skip() above). It's just much easier to develop the exec method locally when you don't have to run the whole e2e suite to do it :)

I'll roll back the IPs; I just changed them while debugging the func.

Contributor

I was just curious, not a big deal.

Member Author

@sbueringer sbueringer Mar 27, 2021

Absolutely fine, happy to explain :)

if ! gcloud compute networks describe "${GCP_NETWORK_NAME}" --project "${GCP_PROJECT}" > /dev/null;
then
gcloud compute networks create --project "$GCP_PROJECT" "${GCP_NETWORK_NAME}" --subnet-mode auto --quiet
gcloud compute networks create --project "$GCP_PROJECT" "${GCP_NETWORK_NAME}" --subnet-mode custom
Member Author

Ah, I got this from testing locally. I would like to keep it anyway. The idea is not to automatically create a lot of subnets, as GCP does with subnet mode auto.

@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-e2e-test
/test pull-cluster-api-provider-openstack-conformance-test

@jichenjc thx for the review, good findings :) Should be all fixed now ptal

@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-e2e-test
/test pull-cluster-api-provider-openstack-conformance-test

(small fixup, was necessary because of the change to the subnet)

@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-e2e-test
/test pull-cluster-api-provider-openstack-conformance-test

@jichenjc
Contributor

@sbueringer please help to squash commit then I think we might merge this

@sbueringer sbueringer force-pushed the pr-add-more-e2e-tests branch from 6270053 to dfd6283 Compare March 30, 2021 03:27
@sbueringer
Member Author

@sbueringer please help to squash commit then I think we might merge this

@jichenjc done :)

}

for _, orphanedPort := range orphanedPorts {
if orphanedPort.DeviceOwner != "" {
Contributor

We shouldn't delete anything which is not created by ClusterAPI. This statement would lead to deletion of any ports which were not created by ClusterAPI but were created for other purposes (either manually or via some other automation).

One common use case for this: in cases where a loadbalancer is not available (or the user does not use it for whatever reason), and instead a VIP (Virtual IP) along with keepalived is used for load balancing, an openstack port is created (to reserve the IP and to create proper routes) but is not attached to any VM. Since this openstack port is not attached to any VM, it has device_owner == "".

The DeleteOrphanedPorts function would delete this port unnecessarily and would cause service disruption elsewhere.
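One way to avoid deleting unrelated VIP/keepalived ports like the one described above is to require a cluster-specific marker in addition to an empty device owner. The following is only an illustrative sketch; the name-prefix convention used here is an assumption, not the actual CAPO ownership check:

```go
package main

import (
	"fmt"
	"strings"
)

// Port is a simplified stand-in for a Neutron port.
type Port struct {
	Name        string
	DeviceOwner string
}

// isDeletable treats a port as orphaned only when it is both unattached
// (empty device owner) and carries the cluster's naming prefix, so unrelated
// ports such as keepalived VIPs survive the cleanup.
func isDeletable(p Port, clusterPrefix string) bool {
	return p.DeviceOwner == "" && strings.HasPrefix(p.Name, clusterPrefix)
}

func main() {
	vip := Port{Name: "keepalived-vip", DeviceOwner: ""}
	leaked := Port{Name: "test-cluster-node-0", DeviceOwner: ""}
	fmt.Println(isDeletable(vip, "test-cluster-"))    // prints false
	fmt.Println(isDeletable(leaked, "test-cluster-")) // prints true
}
```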

Member Author

@sbueringer sbueringer Mar 30, 2021

@rustycl0ck Understood. So I assume there is no real way to clean up those ports safely. I'll drop this from the e2e tests for now (see my next comment). I'm not sure how we can get rid of the problems during cluster deletion, though. I'll see if I can get the e2e tests clean without this change, but I'm not sure it's possible.

Member Author

@sbueringer sbueringer Mar 30, 2021

I'm not entirely sure about the general statement "we shouldn't delete anything which is not created by ClusterAPI". In my opinion there are cases where it's totally valid. One case is when the cluster as a whole should be deleted and somebody has to clean up the load balancers which have been created by cloud-provider-openstack. This is also done in CAPA, and the CCM itself is unable to delete them as it doesn't exist anymore.

Are those service interruptions really an issue when the whole cluster is deleted? Of course somebody can build entirely different infrastructure in the same OpenStack tenant as the CAPO-managed cluster, but I'm not sure I want to support this.

Contributor

My use case was referring to a setup that I worked on, where a single openstack network (IPv6) was being used by multiple teams/products. While I was trying to use CAPI to set up a k8s cluster there, a few other teams had used the VIP approach for making their applications HA (since a loadbalancer wasn't available). Just FYI, the other products weren't k8s based, but legacy VM based systems/applications.

#723 was caused at the time of cluster creation where the port is created first, but if the image (or the ssh key or the flavor) does not exist, server creation would error out and the port would be left behind without any cleanup on server creation failure. This issue seems to have been addressed in #705 correctly (where a rollback is performed if the server creation failed). Similarly, we should try to fix the root cause instead of performing a general cleanup.

I think the correct way to fix this would be to figure out why this error occurred in the first place.
From the following code block: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/master/pkg/cloud/services/compute/instance.go#L631-L648, I don't see how/why the port would have been left behind while the server got deleted successfully. If we can find the root cause, we can fix the problem in a cleaner way.

Member Author

@sbueringer sbueringer Mar 30, 2021

@rustycl0ck Thx for providing some more details :).

I think overall you're right. The better approach would be to clean up the ports wherever they're orphaned. I'm currently testing if this PR also works without the DeleteOrphanedPorts func: #816
I assume this will be successful, and then I'll drop the func from the current PR.

Out of curiosity (I'm not sure what is possible with OpenStack), did you have:

  • the same project and network as the other teams, or
  • the same network but separate projects?

Contributor

It was a single project and same network, because all the applications (k8s hosted as well as legacy VM based) had to interact with each other (on a private network internally).

Member Author

As I feared, the tests are now failing because they time out: cluster deletion does not work: https://prow.k8s.io/log?container=test&id=1376784008490258432&job=pull-cluster-api-provider-openstack-e2e-test

I would suggest merging the PR anyway without the DeleteOrphanedPorts func. I would rather have those tests on master with a known issue and unblock a lot of other PRs, instead of having to rely only on the conformance tests. Those tests are not mandatory yet, and I'll try to find the bug in a follow-up PR.

What do you think?
(@jichenjc )

Member Author

Got the exact same results. Looks like the error currently only happens when servers are not created successfully. I'll check if I can just fix this...

Member Author

@sbueringer sbueringer Mar 30, 2021

Fixed it here: 2e012c6
Manual tests against a devstack running on AWS were green. I'll squash and re-run the tests on this PR.

@sbueringer sbueringer force-pushed the pr-add-more-e2e-tests branch from 2e012c6 to 9fa2481 Compare March 30, 2021 09:28
@sbueringer
Member Author

/test pull-cluster-api-provider-openstack-e2e-test
/test pull-cluster-api-provider-openstack-conformance-test

@jichenjc
Contributor

/lgtm

thanks :)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 30, 2021
@sbueringer
Member Author

tests are green, so:
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2021
@k8s-ci-robot k8s-ci-robot merged commit 6227252 into kubernetes-sigs:master Mar 30, 2021

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

4 participants