
@mdbooth
Contributor

@mdbooth mdbooth commented Sep 3, 2021

What this PR does / why we need it:

Fixes multi-network support by making 'primary' IP selection deterministic.

Currently contains only an E2E test which reproduces the bug and therefore fails. Note that the current behaviour is non-deterministic, so it is possible the test may occasionally pass. It should usually fail, though.

Fixes #926

/hold

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 3, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mdbooth
To complete the pull request process, please assign detiber after the PR has been reviewed.
You can assign the PR to them by writing /assign @detiber in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mdbooth
Contributor Author

mdbooth commented Sep 3, 2021

/test

@k8s-ci-robot
Contributor

@mdbooth: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-openstack-build
  • /test pull-cluster-api-provider-openstack-test
  • /test pull-cluster-api-provider-openstack-e2e-test

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-openstack-conformance-test

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-openstack-build
  • pull-cluster-api-provider-openstack-test
  • pull-cluster-api-provider-openstack-e2e-test

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mdbooth
Contributor Author

mdbooth commented Sep 3, 2021

/test all

@mdbooth
Contributor Author

mdbooth commented Sep 3, 2021

Something doesn't look right here: the multi-network test appears to fail (successfully), but the suite reports that a different test failed instead: "capo-e2e: e2e tests MachineDeployment misconfigurations Should fail to create MachineDeployment with invalid subnet or invalid availability zone". Need to investigate.

@jichenjc
Contributor

jichenjc commented Sep 5, 2021

I took a quick look at the test result and it seems like a familiar pattern. I know our tests run in parallel, so I'm not sure whether it's affected by the newly added test case.

Let's give it another try.

@jichenjc
Contributor

jichenjc commented Sep 5, 2021

/test pull-cluster-api-provider-openstack-e2e-test

@mdbooth
Contributor Author

mdbooth commented Sep 6, 2021

I looked at the logs, and it's kinda interesting. We seem to get a different IP every time we reconcile an OpenStackMachine, so you see the logs full of deleting an LB member because its IP address changed and then failing to add a new one. However, eventually it picks the right one and succeeds. Each member will eventually come up, but it won't stay up. This is enough to make the create/delete cluster test pass, though.

I don't want to over-complicate this test just to exercise a weird non-deterministic bug which we're about to fix. It's still a reasonable test IMHO, so I'll probably leave it. I might look at a unit test instead.
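
For anyone following along, here is a minimal Go sketch of the failure mode described above. It assumes the non-determinism comes from iterating a Go map of per-network addresses, which nothing in this thread confirms. The names `addressesByNetwork`, `firstIPByMapOrder`, `primaryIP` and the network names are hypothetical and not taken from the CAPO code base.

```go
package main

import (
	"fmt"
	"sort"
)

// addressesByNetwork is a hypothetical stand-in for the addresses a server
// reports when it is attached to several networks (network name -> IPs).
type addressesByNetwork map[string][]string

// firstIPByMapOrder mimics the assumed buggy behaviour: Go does not define
// map iteration order, so the returned IP can differ between calls.
func firstIPByMapOrder(addrs addressesByNetwork) string {
	for _, ips := range addrs {
		if len(ips) > 0 {
			return ips[0]
		}
	}
	return ""
}

// primaryIP selects deterministically: the first address on the preferred
// network if there is one, otherwise the first address on the
// alphabetically first network that has any address.
func primaryIP(addrs addressesByNetwork, preferred string) string {
	if ips := addrs[preferred]; len(ips) > 0 {
		return ips[0]
	}
	names := make([]string, 0, len(addrs))
	for name := range addrs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if ips := addrs[name]; len(ips) > 0 {
			return ips[0]
		}
	}
	return ""
}

func main() {
	// Illustrative addresses only.
	addrs := addressesByNetwork{
		"cluster-net": {"10.6.0.12"},
		"extra-net":   {"192.168.50.7"},
	}

	// Simulate repeated reconciles with the map-order selection.
	seen := map[string]bool{}
	for i := 0; i < 100; i++ {
		seen[firstIPByMapOrder(addrs)] = true
	}
	fmt.Println("distinct IPs from map-order selection:", len(seen))
	fmt.Println("deterministic primary IP:", primaryIP(addrs, "cluster-net"))
}
```

With a selection like `primaryIP`, consecutive reconciles would agree on the address, so a load balancer member would have no reason to be deleted and re-added.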

@jichenjc
Contributor

jichenjc commented Sep 7, 2021

/test pull-cluster-api-provider-openstack-e2e-test

OK, let's give it another try. We also need to check the logs you mentioned so that we can open an issue to track this.

@k8s-ci-robot
Contributor

@mdbooth: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-cluster-api-provider-openstack-e2e-test | df88060 | link | /test pull-cluster-api-provider-openstack-e2e-test |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mdbooth
Contributor Author

mdbooth commented Sep 7, 2021

It's OK: now that I think of it, the behaviour is obvious and expected given the bug that we're already tracking in #926. This test is really just intended to reproduce that bug. It does reproduce it, but that still doesn't cause the test to fail.

You can find examples here: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-openstack/984/pull-cluster-api-provider-openstack-e2e-test/1435117248539791360/artifacts/clusters/bootstrap/controllers/capo-controller-manager/capo-controller-manager-6c94d569b7-gjwp5/manager.log

e.g.

I0907 06:36:37.880119       1 loadbalancer.go:279] controller-runtime/manager/controller/openstackmachine "msg"="Deleting load balancer member (because the IP of the machine changed)" "cluster"="cluster-e2e-rsuqi3" "machine"="cluster-e2e-rsuqi3-control-plane-hvjf6" "namespace"="e2e-rsuqi3" "openStackCluster"="cluster-e2e-rsuqi3" "openStackMachine"="cluster-e2e-rsuqi3-control-plane-6vj5q" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackMachine" "name"="k8s-clusterapi-cluster-e2e-rsuqi3-cluster-e2e-rsuqi3-kubeapi-6443-cluster-e2e-rsuqi3-control-plane-6vj5q"

@mdbooth
Contributor Author

mdbooth commented Sep 30, 2021

Closed in favour of #1004

@mdbooth mdbooth closed this Sep 30, 2021
@mdbooth mdbooth deleted the multi-network branch September 30, 2021 21:05

Labels

  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.)
  • do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Instance.IP is set randomly when a server has multiple networks

3 participants