Skip to content

Conversation

@mandre
Copy link
Member

@mandre mandre commented Nov 6, 2024

Supersedes #326 and #330

k8s-ci-robot and others added 30 commits April 14, 2024 23:24
The v1alpha7 tests were previously testing v1alpha6.
…_outdated_warnings

📖  remove outdated warnings
This commit makes security linting easier by never setting a TLS version
outside v1.2 or v1.3, even in case of an unacceptable user input.
🌱 Refactoring: never assign unacceptable TLS versions
…sion-proposal

📖 Proposal for microversion support
Also adds fuzz tests which would have caught this and any similar
issues.
🐛 Fix webhook panic when adding managed security groups
🐛 Remove a duplication for setting default port settings
It's some work we're preparing to build a new controller in charge of
creating servers and their dependencies (ports, etc). We don't want it
to depent on the OpenStack Machine object.
Drop this after v0.11, as these are no longer present in main.

Generated with:

    cd hack/codegen
    go mod tidy
    go work vendor
    cd ../../orc/hack/codegen
    go mod tidy
    go work vendor
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 6, 2024
@openshift-ci-robot
Copy link

@mandre: This pull request references Jira Issue OCPBUGS-43892, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Supersedes #326 and #330

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Nov 6, 2024
@openshift-ci openshift-ci bot requested review from MaysaMacedo and gryf November 6, 2024 14:15
return nil, fmt.Errorf("listing control plane machines: %w", err)
}

if mapiMachines.Items == nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be

  if len(mapiMachines.Items) == 0 {

Comparing a slice to nil is almost never correct.

When getting the subnet from machines, let's return an error if no
control plane machine was found. This will make it clearer when
debugging.
@mandre
Copy link
Member Author

mandre commented Nov 8, 2024

Looking at the logs from the last run, I see a couple of issues:

Errors in capo-controller-manager

E1107 04:01:08.489937       1 kind.go:71] "if kind is a CRD, it should be installed before calling Start" err="no matches for kind \"Image\" in version \"openstack.k-orc.cloud/v1alpha1\"" logger="controller-runtime.source.EventHandler" kind="Image.openstack.k-orc.cloud"

Also some cert issues

2024/11/07 04:01:13 http: TLS handshake error from 10.129.0.2:33060: remote error: tls: bad certificate

Eventually, CAPO stops:

I1107 04:03:08.481743       1 server.go:249] "Shutting down webhook server with timeout of 1 minute" logger="controller-runtime.webhook"
I1107 04:03:08.482153       1 internal.go:544] "Stopping and waiting for HTTP servers"
I1107 04:03:08.482178       1 server.go:68] "shutting down server" name="health probe" addr="[::]:9440"
I1107 04:03:08.482205       1 server.go:254] "Shutting down metrics server with timeout of 1 minute" logger="controller-runtime.metrics"
I1107 04:03:08.482229       1 internal.go:548] "Wait completed, proceeding to shutdown the manager"
E1107 04:03:08.487353       1 main.go:285] "problem running manager" err="failed to wait for image caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.Image" logger="setup"

That likely explains the webhook errors we see in the capi-controller-manager logs:

E1107 03:37:25.937789       1 controller.go:329] "Reconciler error" err="failed to update Machines: failed to update InfrastructureMachine openshift-cluster-api/openstack-machineset-zqtqb: failed to update openshift-cluster-api/openstack-machineset-zqtqb: failed to apply OpenStackMachine openshift-cluster-api/openstack-machineset-zqtqb: Internal error occurred: failed calling webhook \"validation.openstackmachine.infrastructure.cluster.x-k8s.io\": failed to call webhook: Post \"https://capo-webhook-service.openshift-cluster-api.svc:443/validate-infrastructure-cluster-x-k8s-io-v1beta1-openstackmachine?timeout=10s\": no endpoints available for service \"capo-webhook-service\"" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" MachineSet="openshift-cluster-api/openstack-machineset" namespace="openshift-cluster-api" name="openstack-machineset" reconcileID="b55c74c9-f2b5-4dd5-af07-939d897ee5ec"

@mdbooth
Copy link

mdbooth commented Nov 8, 2024

We load the ORC CRDs from the default kustomization base in CAPO here: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/7f7f0f0fd51307bd9e87925212d6c672f625dfbd/config/default/kustomization.yaml#L16-L17

However, I vaguely recall we don't use the default kustomization base downstream because it pulls in too many things we don't want which we then have to take out again. i.e. It's less code to duplicate it. If that's correct, we probably just need to do the same for ORC in our downstream kustomization base.

@openshift-ci
Copy link

openshift-ci bot commented Nov 8, 2024

@mandre: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/security 44509be link false /test security

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@EmilienM
Copy link
Member

EmilienM commented Nov 8, 2024

/approve

@EmilienM
Copy link
Member

EmilienM commented Nov 8, 2024

discussed with Matt offline, we decided to ship it.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 8, 2024
@openshift-ci
Copy link

openshift-ci bot commented Nov 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: EmilienM

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit f8d9625 into openshift:main Nov 8, 2024
@openshift-merge-bot openshift-merge-bot bot deleted the sync-main-martin branch November 8, 2024 14:30
@openshift-ci-robot
Copy link

@mandre: Jira Issue OCPBUGS-43892: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-43892 has been moved to the MODIFIED state.

Details

In response to this:

Supersedes #326 and #330

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@mdbooth
Copy link

mdbooth commented Nov 8, 2024

After the fact, but
/lgtm

@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

Distgit: openstack-cluster-api-controllers
This PR has been included in build openstack-cluster-api-controllers-container-v4.18.0-202411081611.p0.gf8d9625.assembly.stream.el9.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.