🏃 Add e2e tests for all flavors #798
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sbueringer. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.

/test ?
@sbueringer: The following commands are available to trigger jobs:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-cluster-api-provider-openstack-e2e-test

/test pull-cluster-api-provider-openstack-e2e-test
13 similar comments
Force-pushed: 75441a5 → c05394e (compare)
/test pull-cluster-api-provider-openstack-e2e-test
|
@sbueringer: The specified target(s) for the `/test` command were not found.

/test pull-cluster-api-provider-openstack-conformance-test
Force-pushed: 2410d12 → 4562b14 (compare)
/test pull-cluster-api-provider-openstack-make-conformance

Also dropped the commit here, so it's ready for review/merge now. @jichenjc @hidekazuna @prankul88 PTAL if you have some time :)

/test pull-cluster-api-provider-openstack-conformance-test
```go
	}

	// Delete other things
	if err = networkingService.DeleteOrphanedPorts(); err != nil {
```
Force-pushed: 4562b14 → b16b7ab (compare)
/test pull-cluster-api-provider-openstack-e2e-test
Force-pushed: b16b7ab → af3f553 (compare)
Some final polishing (and subsequent linter fixes...), but now I'm done.
jichenjc left a comment:

Overall a quick review, some nit questions.
```go
		if orphanedPort.DeviceOwner != "" {
			continue
		}
```
I'd think we should at least log this behavior.

Yup, I'll add log statements where we do (i.e. delete) something.
```go
	}

	var projectID string
	if clientOpts.AuthInfo != nil {
```
Is it possible that clientOpts.AuthInfo is nil? If so, projectID will be "": will that impact follow-up features?

I think it cannot be nil, but it would have a very high impact if it were. With an empty project ID we would just list and delete orphaned ports depending on the rights of the user. This means that with some kind of admin / domain admin credentials in the clouds.yaml it would delete stuff all over the OpenStack deployment. I think we should fail hard here if we're unable to determine a project ID.
test/e2e/shared/exec_test.go (outdated)

```diff
-	machineIP: "10.6.0.230",
-	bastionIP: "172.24.4.58",
+	machineIP: "10.6.0.209",
+	bastionIP: "172.24.4.10",
```
I'm not sure why we need such a change. Do we have a hard dependency on these IPs?

This was just for debugging locally; the test is not executed (t.Skip() above). It's just way easier to develop the exec method locally when you don't have to run the whole e2e suite to do it :) I'll roll back the IPs, I just changed them while debugging the func.

I was just curious, not a big deal.

Absolutely fine, happy to explain :)
```diff
 if ! gcloud compute networks describe "${GCP_NETWORK_NAME}" --project "${GCP_PROJECT}" > /dev/null;
 then
-	gcloud compute networks create --project "$GCP_PROJECT" "${GCP_NETWORK_NAME}" --subnet-mode auto --quiet
+	gcloud compute networks create --project "$GCP_PROJECT" "${GCP_NETWORK_NAME}" --subnet-mode custom
```

Ah, I got this from testing locally. I would like to keep it anyway: the idea is to not automatically create a lot of subnets, as GCP does with subnet mode auto.
/test pull-cluster-api-provider-openstack-e2e-test

@jichenjc thanks for the review, good findings :) Should all be fixed now, PTAL.

/test pull-cluster-api-provider-openstack-e2e-test (small fixup, necessary because of the change to the subnet)

/test pull-cluster-api-provider-openstack-e2e-test

@sbueringer please help squash the commits, then I think we can merge this.
Force-pushed: 6270053 → dfd6283 (compare)

@jichenjc done :)
```go
	}

	for _, orphanedPort := range orphanedPorts {
		if orphanedPort.DeviceOwner != "" {
```
We shouldn't delete anything which is not created by ClusterAPI. This statement would lead to deletion of any ports which were not created by ClusterAPI but were created for other purposes (either manually or via some other automation).

One common use case for this: in cases where a loadbalancer is not available (or the user does not use it for whatever reason), and instead a VIP (Virtual IP) along with keepalived is used for load balancing, an OpenStack port is created (to reserve the IP and to create proper routes) but is not attached to any VM. Since this port is not attached to any VM, it has device_owner == "".

The DeleteOrphanedPorts function would delete this port unnecessarily and would cause service disruption elsewhere.
@rustycl0ck Understood. So I assume there is no real way to clean up those ports safely. I'll drop this from the e2e tests for now (see my next comment). I'm not sure how we can get rid of the problems during cluster deletion, though. I'll check whether I can get the e2e tests clean without this change, but I'm not sure it's possible.

I'm not entirely sure about the general statement "we shouldn't delete anything which is not created by ClusterAPI". In my opinion there are cases where it's totally valid. One case is when the cluster as a whole should be deleted and somebody has to clean up the loadbalancers which were created by the cloud provider openstack. This is also done in CAPA, and the CCM itself is unable to delete them as it no longer exists.

Are those service interruptions really an issue when the whole cluster is deleted? Of course somebody can build entirely different infrastructure in the same OpenStack tenant as the CAPO-managed cluster, but I'm not sure I want to support this.
My use case was referring to a setup I worked on, where a single OpenStack network (IPv6) was being used by multiple teams/products. While I was trying to use CAPI to set up a k8s cluster there, a few other teams had used the VIP approach for making their applications HA (since a loadbalancer wasn't available). Just FYI, the other products weren't k8s based, but legacy VM-based systems/applications.

#723 was caused at cluster creation time, where the port is created first, but if the image (or the ssh key or the flavor) does not exist, server creation errors out and the port is left behind without any cleanup. This seems to have been addressed correctly in #705 (where a rollback is performed if server creation failed). Similarly, we should try to fix the root cause instead of performing a general cleanup.

I think the correct way to fix this would be to figure out why this error occurred in the first place. From the following code block: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/master/pkg/cloud/services/compute/instance.go#L631-L648, I don't see how/why the port would have been left behind while the server got deleted successfully. If we can find the root cause, we can fix the problem in a cleaner way.
@rustycl0ck Thx for providing some more details :)

I think overall you're right. The better approach would be to clean up the ports wherever they're orphaned. I'm currently testing whether this PR also works without the DeleteOrphanedPorts func: #816. I assume this will be successful, and then I'll drop the func from the current PR.

Out of curiosity (I'm not sure what is possible with OpenStack), did you have:
- a separate project but were using the same network as the other teams, or
- have you been using the same network but separate projects?
There was a problem hiding this comment.
It was a single project and same network, because all the applications (k8s hosted as well as legacy VM based) had to interact with each other (on a private network internally).
There was a problem hiding this comment.
As I feared the tests are now failing because they are timing out because the cluster deletion does not work: https://prow.k8s.io/log?container=test&id=1376784008490258432&job=pull-cluster-api-provider-openstack-e2e-test
I would suggest merging the PR anyway without the DeleteOrphanedPort func. I would rather have those tests with a known issue on master and unblock a lot of other PRs instead of having to rely only on the conformance tests.Those tests are not mandatory yet and I try to find the bug in a follow-up PR.
What do you think?
(@jichenjc )
There was a problem hiding this comment.
Got the exact results. Looks like the error only happens when server are not successfully created right now. I'll check if I can just fix this...
There was a problem hiding this comment.
Fixed it here: 2e012c6
Manual tests against a devstack running AWS were green. I'll squash and re-run the tests on this PR
Force-pushed: 2e012c6 → 9fa2481 (compare)
/test pull-cluster-api-provider-openstack-e2e-test

/lgtm thanks :)

tests are green, so:
What this PR does / why we need it:
This PR adds e2e tests for all our flavors (and some additional ones).
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #577
Fixes #723
Fixes #741

Notes:
This PR can already be reviewed, but it won't be merged before the test framework and linter PRs; it also contains those commits/PRs.
/hold