Skip to content

OCPBUGS-4466: Add check for compact-cluster install on GCP, AWS & Azure#8226

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift:masterfrom
sadasu:sadasu-bug-4466
Apr 26, 2024
Merged

OCPBUGS-4466: Add check for compact-cluster install on GCP, AWS & Azure#8226
openshift-merge-bot[bot] merged 1 commit intoopenshift:masterfrom
sadasu:sadasu-bug-4466

Conversation

@sadasu
Copy link
Contributor

@sadasu sadasu commented Apr 2, 2024

Add additional checks around detecting 0 worker machinesets when generating terraform.tfvars for compact clusters.

This bug was raised when https://issues.redhat.com/browse/CORS-2420 was tested. The associated feature https://issues.redhat.com/browse/OCPSTRAT-341 had epics for AWS, GCP, Azure and vSphere. Support for compact clusters was not implemented and hence this is a good way error out of the install with an informational message to the customer (instead of crashing).

@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Apr 2, 2024
@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Jira Issue OCPBUGS-4466, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Add additional checks around detecting 0 worker machinesets when generating terraform.tfvars for compact clusters.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Apr 2, 2024
@openshift-ci openshift-ci bot requested review from andfasano and bfournie April 2, 2024 13:05
@sadasu
Copy link
Contributor Author

sadasu commented Apr 2, 2024

/jira refresh

@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Jira Issue OCPBUGS-4466, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sadasu
Copy link
Contributor Author

sadasu commented Apr 2, 2024

/test ?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 2, 2024

@sadasu: The following commands are available to trigger required jobs:

  • /test agent-integration-tests
  • /test altinfra-images
  • /test aro-unit
  • /test e2e-agent-compact-ipv4
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-edge-zones-manifest-validation
  • /test e2e-aws-ovn-upi
  • /test e2e-azure-ovn
  • /test e2e-azure-ovn-upi
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-upi
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack-ovn
  • /test e2e-vsphere-ovn
  • /test e2e-vsphere-upi
  • /test gofmt
  • /test golint
  • /test govet
  • /test images
  • /test okd-images
  • /test okd-scos-images
  • /test okd-unit
  • /test okd-verify-codegen
  • /test openstack-manifests
  • /test shellcheck
  • /test terraform-images
  • /test terraform-verify-vendor
  • /test tf-lint
  • /test unit
  • /test verify-codegen
  • /test verify-vendor
  • /test yaml-lint

The following commands are available to trigger optional jobs:

  • /test altinfra-e2e-aws-custom-security-groups
  • /test altinfra-e2e-aws-ovn
  • /test altinfra-e2e-aws-ovn-fips
  • /test altinfra-e2e-aws-ovn-imdsv2
  • /test altinfra-e2e-aws-ovn-localzones
  • /test altinfra-e2e-aws-ovn-proxy
  • /test altinfra-e2e-aws-ovn-public-ipv4-pool
  • /test altinfra-e2e-aws-ovn-shared-vpc
  • /test altinfra-e2e-aws-ovn-shared-vpc-edge-zones
  • /test altinfra-e2e-aws-ovn-single-node
  • /test altinfra-e2e-aws-ovn-wavelengthzones
  • /test altinfra-e2e-azure-capi-ovn
  • /test altinfra-e2e-gcp-capi-ovn
  • /test altinfra-e2e-gcp-ovn-byo-network-capi
  • /test altinfra-e2e-gcp-ovn-secureboot-capi
  • /test altinfra-e2e-gcp-ovn-xpn-capi
  • /test altinfra-e2e-ibmcloud-capi-ovn
  • /test altinfra-e2e-nutanix-capi-ovn
  • /test altinfra-e2e-openstack-capi-ovn
  • /test altinfra-e2e-vsphere-capi-ovn
  • /test altinfra-e2e-vsphere-capi-static-ovn
  • /test altinfra-e2e-vsphere-capi-zones
  • /test azure-ovn-marketplace-images
  • /test e2e-agent-compact-ipv4-appliance
  • /test e2e-agent-compact-ipv4-appliance-diskimage
  • /test e2e-agent-compact-ipv4-none-platform
  • /test e2e-agent-ha-dualstack
  • /test e2e-agent-sno-ipv4-pxe
  • /test e2e-agent-sno-ipv6
  • /test e2e-aws-custom-security-groups
  • /test e2e-aws-overlay-mtu-ovn-1200
  • /test e2e-aws-ovn-edge-zones
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-imdsv2
  • /test e2e-aws-ovn-proxy
  • /test e2e-aws-ovn-public-ipv4-pool
  • /test e2e-aws-ovn-public-subnets
  • /test e2e-aws-ovn-shared-vpc
  • /test e2e-aws-ovn-shared-vpc-edge-zones
  • /test e2e-aws-ovn-single-node
  • /test e2e-aws-ovn-upgrade
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-upi-proxy
  • /test e2e-azure-ovn-resourcegroup
  • /test e2e-azure-ovn-shared-vpc
  • /test e2e-azurestack
  • /test e2e-azurestack-upi
  • /test e2e-crc
  • /test e2e-gcp-ovn-shared-vpc
  • /test e2e-gcp-ovn-xpn
  • /test e2e-gcp-secureboot
  • /test e2e-gcp-upgrade
  • /test e2e-gcp-upi-xpn
  • /test e2e-ibmcloud-ovn
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi-ovn
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-swapped-hosts
  • /test e2e-metal-ipi-ovn-virtualmedia
  • /test e2e-metal-single-node-live-iso
  • /test e2e-nutanix-ovn
  • /test e2e-openstack-ccpmso
  • /test e2e-openstack-ccpmso-zone
  • /test e2e-openstack-dualstack
  • /test e2e-openstack-dualstack-upi
  • /test e2e-openstack-externallb
  • /test e2e-openstack-nfv-intel
  • /test e2e-openstack-proxy
  • /test e2e-vsphere-static-ovn
  • /test e2e-vsphere-upi-zones
  • /test e2e-vsphere-zones
  • /test e2e-vsphere-zones-techpreview
  • /test okd-e2e-agent-compact-ipv4
  • /test okd-e2e-agent-ha-dualstack
  • /test okd-e2e-agent-sno-ipv6
  • /test okd-e2e-aws-ovn
  • /test okd-e2e-aws-ovn-upgrade
  • /test okd-e2e-gcp
  • /test okd-e2e-gcp-ovn-upgrade
  • /test okd-e2e-vsphere
  • /test okd-scos-e2e-aws-ovn
  • /test okd-scos-e2e-aws-upgrade
  • /test okd-scos-e2e-gcp
  • /test okd-scos-e2e-gcp-ovn-upgrade
  • /test okd-scos-e2e-vsphere
  • /test okd-scos-unit
  • /test okd-scos-verify-codegen
  • /test tf-fmt

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-installer-master-altinfra-images
  • pull-ci-openshift-installer-master-aro-unit
  • pull-ci-openshift-installer-master-e2e-aws-ovn
  • pull-ci-openshift-installer-master-gofmt
  • pull-ci-openshift-installer-master-golint
  • pull-ci-openshift-installer-master-govet
  • pull-ci-openshift-installer-master-images
  • pull-ci-openshift-installer-master-okd-e2e-aws-ovn-upgrade
  • pull-ci-openshift-installer-master-okd-images
  • pull-ci-openshift-installer-master-okd-scos-images
  • pull-ci-openshift-installer-master-okd-scos-unit
  • pull-ci-openshift-installer-master-okd-scos-verify-codegen
  • pull-ci-openshift-installer-master-okd-unit
  • pull-ci-openshift-installer-master-okd-verify-codegen
  • pull-ci-openshift-installer-master-shellcheck
  • pull-ci-openshift-installer-master-tf-fmt
  • pull-ci-openshift-installer-master-tf-lint
  • pull-ci-openshift-installer-master-unit
  • pull-ci-openshift-installer-master-verify-codegen
  • pull-ci-openshift-installer-master-verify-vendor
  • pull-ci-openshift-installer-master-yaml-lint
Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sadasu
Copy link
Contributor Author

sadasu commented Apr 2, 2024

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 2, 2024
@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Jira Issue OCPBUGS-4466, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianli-wei

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from jianli-wei April 2, 2024 13:15
@sadasu sadasu force-pushed the sadasu-bug-4466 branch 2 times, most recently from bc1cef3 to aefd976 Compare April 8, 2024 20:59
@sadasu
Copy link
Contributor Author

sadasu commented Apr 8, 2024

/test e2e-gcp-ovn

@sadasu sadasu force-pushed the sadasu-bug-4466 branch from aefd976 to febec0a Compare April 9, 2024 13:55
@sadasu
Copy link
Contributor Author

sadasu commented Apr 9, 2024

/test e2e-gcp-ovn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we cannot check this during install-config validation because this happens after the manifests have been generated and (possibly) edited by users.
If this becomes a pattern, we should consider adding a manifest validation stage.

Copy link
Contributor Author

@sadasu sadasu Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r4f4 yes. I was surprised too that we allow customers to edit the manifests post creation but don't check if it has been modified in a sane way.

Also, this fix is added in the code path to generate terraform inputs. Let us invest the time to figure out how to add a manifest validation stage within the CAPI flow so we detect modified manifests and validate them.

@sadasu
Copy link
Contributor Author

sadasu commented Apr 22, 2024

/retest-required

1 similar comment
@sadasu
Copy link
Contributor Author

sadasu commented Apr 25, 2024

/retest-required

Copy link
Contributor

@patrickdillon patrickdillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

This LGTM, but left a suggestion. Let me know if you need another tag.

fyi we can handle pre-provisioning validation in https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/platformprovisioncheck.go
but i think handling this here is perfectly fine.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 25, 2024
@sadasu
Copy link
Contributor Author

sadasu commented Apr 25, 2024

fyi we can handle pre-provisioning validation in https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/platformprovisioncheck.go but i think handling this here is perfectly fine.

I think this location has the advantage that it validates any updates to manifests too.

Copy link
Contributor

@r4f4 r4f4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you've decided to add the check for aws, azure, and gcp?

@sadasu sadasu changed the title OCPBUGS-4466: Improve error handling around compact-cluster install on GCP OCPBUGS-4466: Add check for compact-cluster install on GCP, AWS & Azure Apr 25, 2024
@sadasu
Copy link
Contributor Author

sadasu commented Apr 25, 2024

So you've decided to add the check for aws, azure, and gcp?

Yes, although the bug was raised only for GCP.

@sadasu
Copy link
Contributor Author

sadasu commented Apr 25, 2024

/retest

Copy link
Contributor

@r4f4 r4f4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// 2. workers = 0, masters schedulable, valid compact cluster but currently unsupported on GCP

Are we sure this statement also applies to AWS and Azure?

Worker machinesets generated, during create manifests can be deleted
before creating the cluster. Detect this case and take action based
on whether masters are marked as schedulable.
This fixes compact cluster installs on GCP, AWS and Azure.
@sadasu
Copy link
Contributor Author

sadasu commented Apr 25, 2024

// 2. workers = 0, masters schedulable, valid compact cluster but currently unsupported on GCP

Are we sure this statement also applies to AWS and Azure?

Yes. This bug was raised when https://issues.redhat.com/browse/CORS-2420 was tested. The associated feature https://issues.redhat.com/browse/OCPSTRAT-341 has epics for AWS, GCP, Azure and vSphere. Support for compact clusters was not implemented and hence this is a good way error out and inform the customer (instead of crashing).

@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Jira Issue OCPBUGS-4466, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianli-wei

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Add additional checks around detecting 0 worker machinesets when generating terraform.tfvars for compact clusters.

This bug was raised when https://issues.redhat.com/browse/CORS-2420 was tested. The associated feature https://issues.redhat.com/browse/OCPSTRAT-341 had epics for AWS, GCP, Azure and vSphere. Support for compact clusters was not implemented and hence this is a good way error out of the install with an informational message to the customer (instead of crashing).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sadasu
Copy link
Contributor Author

sadasu commented Apr 25, 2024

/test e2e-gcp-ovn

Copy link
Contributor

@r4f4 r4f4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@sadasu
Copy link
Contributor Author

sadasu commented Apr 26, 2024

/test e2e-gcp-ovn

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 26, 2024
@sadasu
Copy link
Contributor Author

sadasu commented Apr 26, 2024

/retest-required

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 26, 2024

@sadasu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-e2e-aws-ovn-upgrade 94eeb5d link false /test okd-e2e-aws-ovn-upgrade

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 35acc7c into openshift:master Apr 26, 2024
@openshift-ci-robot
Copy link
Contributor

@sadasu: Jira Issue OCPBUGS-4466: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-4466 has been moved to the MODIFIED state.

Details

In response to this:

Add additional checks around detecting 0 worker machinesets when generating terraform.tfvars for compact clusters.

This bug was raised when https://issues.redhat.com/browse/CORS-2420 was tested. The associated feature https://issues.redhat.com/browse/OCPSTRAT-341 had epics for AWS, GCP, Azure and vSphere. Support for compact clusters was not implemented and hence this is a good way error out of the install with an informational message to the customer (instead of crashing).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-robot
Copy link
Contributor

Fix included in accepted release 4.16.0-0.nightly-2024-04-29-222758

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants