Skip to content

Comments

[CORS-2666] Add Nutanix support for CPMSO#200

Merged
openshift-merge-robot merged 1 commit intoopenshift:mainfrom
thunderboltsid:add-ntnx-cpmso
Jun 12, 2023
Merged

[CORS-2666] Add Nutanix support for CPMSO#200
openshift-merge-robot merged 1 commit intoopenshift:mainfrom
thunderboltsid:add-ntnx-cpmso

Conversation

@thunderboltsid
Copy link
Contributor

@thunderboltsid thunderboltsid commented Apr 21, 2023

This adds basic support for Nutanix control plane machines for the purpose of self-healing control plane nodes. We would enhance this at a later point to add failure domain support.

How was this tested?

  1. Create a CPMSO image with the changes in this PR and use the installer built from installer#7119
  2. Create a new release image based on a recent 4.14 release image using oc adm release new --from-release registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-04-19-125337 cluster-control-plane-machine-set-operator=image-registry:cluster-control-plane-machine-set-operator --to-image image-registry:release and set the release image override environment variable to the newly created release image using export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=image-registry:release
  3. Create a install-config using openshift-installer create install-config
  4. Run openshift-installer create manifests and verify ControlPlaneMachineSet manifest generated is as expected
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: null
  labels:
    machine.openshift.io/cluster-api-cluster: openshiftsid-wvs7z
  name: cluster
  namespace: openshift-machine-api
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: openshiftsid-wvs7z
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  strategy: {}
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: ""
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: openshiftsid-wvs7z
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        lifecycleHooks: {}
        metadata: {}
        providerSpec:
          value:
            apiVersion: machine.openshift.io/v1
            bootType: ""
            categories: null
            cluster:
              type: uuid
              uuid: 0005b0f1-8f43-a0f2-02b7-3cecef193712
            credentialsSecret:
              name: nutanix-credentials
            image:
              name: openshiftsid-wvs7z-rhcos
              type: name
            kind: NutanixMachineProviderConfig
            memorySize: 16Gi
            metadata:
              creationTimestamp: null
            project:
              type: ""
            subnets:
            - type: uuid
              uuid: c7938dc6-7659-453e-a688-e26020c68e43
            systemDiskSize: 120Gi
            userDataSecret:
              name: master-user-data
            vcpuSockets: 8
            vcpusPerSocket: 1
status: {}
  1. Create a cluster successfully using openshift-install create cluster
$ oc get machines -A
NAMESPACE               NAME                              PHASE         TYPE   REGION    ZONE    AGE
openshift-machine-api   openshiftsid-v2tv8-master-0       Running       AHV    Unnamed   ganon   22m
openshift-machine-api   openshiftsid-v2tv8-master-1       Running       AHV    Unnamed   ganon   22m
openshift-machine-api   openshiftsid-v2tv8-master-2       Running       AHV    Unnamed   ganon   22m
openshift-machine-api   openshiftsid-v2tv8-worker-6xc8s   Running       AHV    Unnamed   ganon   12m
openshift-machine-api   openshiftsid-v2tv8-worker-c57mc   Running       AHV    Unnamed   ganon   12m
openshift-machine-api   openshiftsid-v2tv8-worker-pnc5c   Running   AHV    Unnamed   ganon   12m
  1. Verify that the ControlPlaneMachineSet is Active and deleting ControlPlane and Worker machines leads to creation of new ones
$ oc -n openshift-machine-api get controlplanemachineset
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE    AGE
cluster   3         3         3       3                       Active   75m

$ oc -n openshift-machine-api delete machine openshiftsid-v2tv8-master-0
machine.machine.openshift.io "openshiftsid-v2tv8-master-0" deleted

$ oc -n openshift-machine-api delete machine openshiftsid-v2tv8-worker-pnc5c
machine.machine.openshift.io "openshiftsid-v2tv8-worker-pnc5c" deleted

$ oc -n openshift-machine-api get machines
NAME                                PHASE     TYPE   REGION    ZONE    AGE
openshiftsid-v2tv8-master-1         Running   AHV    Unnamed   ganon   75m
openshiftsid-v2tv8-master-2         Running   AHV    Unnamed   ganon   75m
openshiftsid-v2tv8-master-4rpfd-0   Running   AHV    Unnamed   ganon   24m
openshiftsid-v2tv8-worker-6xc8s     Running   AHV    Unnamed   ganon   65m
openshiftsid-v2tv8-worker-c57mc     Running   AHV    Unnamed   ganon   65m
openshiftsid-v2tv8-worker-ps5tn     Running   AHV    Unnamed   ganon   21m

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 21, 2023
@openshift-ci openshift-ci bot requested review from damdo and odvarkadaniel April 21, 2023 11:55
@thunderboltsid thunderboltsid force-pushed the add-ntnx-cpmso branch 3 times, most recently from 9ed95a8 to 57ea80a Compare April 21, 2023 12:36
@thunderboltsid
Copy link
Contributor Author

/test e2e-nutanix-ovn
/test e2e-nutanix-sdn

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 24, 2023

@thunderboltsid: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test e2e-aws-operator
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-etcd-scaling
  • /test e2e-aws-ovn-serial
  • /test e2e-aws-ovn-upgrade
  • /test fmt
  • /test generate
  • /test images
  • /test lint
  • /test unit
  • /test vendor
  • /test vet

The following commands are available to trigger optional jobs:

  • /test e2e-aws-periodic-pre
  • /test e2e-azure-operator
  • /test e2e-azure-ovn-etcd-scaling
  • /test e2e-gcp-operator
  • /test e2e-gcp-ovn-etcd-scaling
  • /test e2e-vsphere-ovn-etcd-scaling

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-aws-operator
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-aws-ovn
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-aws-ovn-etcd-scaling
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-aws-ovn-serial
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-aws-ovn-upgrade
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-fmt
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-generate
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-images
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-lint
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-unit
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-vendor
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-vet
Details

In response to this:

/test e2e-nutanix-ovn
/test e2e-nutanix-sdn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@thunderboltsid thunderboltsid changed the title [WIP] Add Nutanix support for CPMSO Add Nutanix support for CPMSO Apr 24, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 24, 2023
@thunderboltsid
Copy link
Contributor Author

thunderboltsid commented Apr 24, 2023

@damdo @JoelSpeed Any suggestions regarding the linting error?

pkg/controllers/controlplanemachinesetgenerator/utils.go:63:40: `genericControlPlaneMachineSetSpec` - `replicas` always receives `replicas` (`3`) (unparam)
[18](https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-control-plane-machine-set-operator/200/pull-ci-openshift-cluster-control-plane-machine-set-operator-main-lint/1649391925758791680#1:build-log.txt%3A18)
func genericControlPlaneMachineSetSpec(replicas int32, clusterID string) machinev1builder.ControlPlaneMachineSetSpecApplyConfiguration {

Should we just refactor that function to remove replicas as a parameter?

@thunderboltsid
Copy link
Contributor Author

/assign @damdo
/assign @JoelSpeed

@thunderboltsid
Copy link
Contributor Author

/test e2e-nutanix-ovn

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 28, 2023

@thunderboltsid: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test e2e-aws-operator
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-etcd-scaling
  • /test e2e-aws-ovn-serial
  • /test e2e-aws-ovn-upgrade
  • /test fmt
  • /test generate
  • /test images
  • /test lint
  • /test unit
  • /test vendor
  • /test vet

The following commands are available to trigger optional jobs:

  • /test e2e-aws-periodic-pre
  • /test e2e-azure-operator
  • /test e2e-azure-ovn-etcd-scaling
  • /test e2e-gcp-operator
  • /test e2e-gcp-ovn-etcd-scaling
  • /test e2e-openstack-operator
  • /test e2e-vsphere-ovn-etcd-scaling

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-aws-operator
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-aws-ovn
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-aws-ovn-etcd-scaling
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-aws-ovn-serial
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-aws-ovn-upgrade
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-fmt
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-generate
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-images
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-lint
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-unit
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-vendor
  • pull-ci-openshift-cluster-control-plane-machine-set-operator-main-vet
Details

In response to this:

/test e2e-nutanix-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JoelSpeed
Copy link
Contributor

@damdo @JoelSpeed Any suggestions regarding the linting error?

pkg/controllers/controlplanemachinesetgenerator/utils.go:63:40: `genericControlPlaneMachineSetSpec` - `replicas` always receives `replicas` (`3`) (unparam)
[18](https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-control-plane-machine-set-operator/200/pull-ci-openshift-cluster-control-plane-machine-set-operator-main-lint/1649391925758791680#1:build-log.txt%3A18)
func genericControlPlaneMachineSetSpec(replicas int32, clusterID string) machinev1builder.ControlPlaneMachineSetSpecApplyConfiguration {

Should we just refactor that function to remove replicas as a parameter?

Lets add a nolint for now, in the future we may expand to support 5 replicas so we will want the parameter there

@JoelSpeed
Copy link
Contributor

Can we please make sure tests are added for Nutanix, similar to the tests for the other platforms

@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 25, 2023
@thunderboltsid
Copy link
Contributor Author

/retest-required

@thunderboltsid thunderboltsid changed the title Add Nutanix support for CPMSO [CORS-2666] Add Nutanix support for CPMSO May 26, 2023
@thunderboltsid
Copy link
Contributor Author

/retest-required

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a linter unparam complaint, you can add a nolint:unparam to that line to fix that, otherwise I think this is good to go

@thunderboltsid
Copy link
Contributor Author

There's a linter unparam complaint, you can add a nolint:unparam to that line to fix that, otherwise I think this is good to go

Thanks for the suggestion. The lint check is now passing. Waiting on the e2e now.

@thunderboltsid
Copy link
Contributor Author

/retest-required

@thunderboltsid
Copy link
Contributor Author

/test e2e-nutanix-ovn

@thunderboltsid
Copy link
Contributor Author

/test e2e-nutanix-ovn

@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 3, 2023
@thunderboltsid thunderboltsid force-pushed the add-ntnx-cpmso branch 2 times, most recently from 59f55f0 to 7380fda Compare June 3, 2023 23:38
@thunderboltsid
Copy link
Contributor Author

@JoelSpeed @damdo Added NutanixProviderConfig type in order to make the e2e tests pass. PTAL

This adds basic support for Nutanix control plane machines for the purpose of
self-healing control plane nodes. We would enhance this at a later point to add
failure domain support.
@thunderboltsid
Copy link
Contributor Author

/retest-required

3 similar comments
@thunderboltsid
Copy link
Contributor Author

/retest-required

@thunderboltsid
Copy link
Contributor Author

/retest-required

@damdo
Copy link
Member

damdo commented Jun 5, 2023

/retest-required

@damdo
Copy link
Member

damdo commented Jun 5, 2023

/test e2e-azure-operator

@damdo
Copy link
Member

damdo commented Jun 5, 2023

/test e2e-azure-ovn-etcd-scaling

@thunderboltsid
Copy link
Contributor Author

thunderboltsid commented Jun 5, 2023

/test e2e-azure-ovn-etcd-scaling

This doesn't seem to be marked as required and looking at the job history, it barely ever passes. Any reason we're retesting this?

update: seems to have passed.

@JoelSpeed
Copy link
Contributor

The E2E job name should be e2e-nutanix-operator not e2e-nutanix-ovn, but that can be fixed separately, tests are green, no additional feedback on the code from me
/approve
/assign @damdo for final LGTM

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 5, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 5, 2023

@thunderboltsid: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-vsphere-ovn-etcd-scaling bd95749 link false /test e2e-vsphere-ovn-etcd-scaling
ci/prow/e2e-openstack-operator bd95749 link false /test e2e-openstack-operator

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Member

@damdo damdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your efforts here @thunderboltsid !
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 5, 2023
@deepsm007
Copy link

/label jra/valid-bug

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 12, 2023

@deepsm007: The label(s) /label jra/valid-bug cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, downstream-change-needed, rebase/manual, approved, backport-risk-assessed, bugzilla/valid-bug, cherry-pick-approved, jira/valid-bug, staff-eng-approved. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

Details

In response to this:

/label jra/valid-bug

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@deepsm007
Copy link

/label jira/valid-bug

@openshift-ci openshift-ci bot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Jun 12, 2023
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD cc291dc and 2 for PR HEAD bd95749 in total

@openshift-merge-robot openshift-merge-robot merged commit 684bae1 into openshift:main Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants