@barbacbd
Contributor

installconfig/gcp:

** Add a check during Validate() for the base domain of the public zone.
** Validation tests updated.

@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jul 17, 2025
@openshift-ci-robot
Contributor

@barbacbd: This pull request references Jira Issue OCPBUGS-59430, which is invalid:

  • expected the bug to target the "4.20.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from patrickdillon and tthvo July 17, 2025 14:29
@barbacbd
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jul 17, 2025
@openshift-ci-robot
Contributor

@barbacbd: This pull request references Jira Issue OCPBUGS-59430, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianli-wei

@openshift-ci openshift-ci bot requested a review from jianli-wei July 17, 2025 14:41
Member

@tthvo tthvo left a comment

Looks good when I tested with an invalid baseDomain: The installer fails with an error.

$ ./openshift-install create manifests --dir=. --log-level=debug
DEBUG OpenShift Installer unreleased-master-12069-gd54eceed15eab76361351f681b6975a65926c507-dirty 
DEBUG Built from commit d54eceed15eab76361351f681b6975a65926c507 
DEBUG Fetching Master Machines...                  
DEBUG Loading Master Machines...                   
DEBUG   Loading Cluster ID...                      
DEBUG     Loading Install Config...                
DEBUG       Loading SSH Key...                     
DEBUG       Loading Base Domain...                 
DEBUG         Loading Platform...                  
DEBUG       Loading Cluster Name...                
DEBUG         Loading Base Domain...               
DEBUG         Loading Platform...                  
DEBUG       Loading Pull Secret...                 
DEBUG       Loading Platform...                    
WARNING Release Image Architecture not detected. Release Image Architecture is unknown 
INFO Credentials loaded from file "/home/thvo/.gcp/osServiceAccount.json" 
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: baseDomain: Invalid value: "gcp.devcluster.openshift.com": baseDomain: Internal error: no matching public DNS Zone found 

I just have some small questions below :D

allErrs = append(allErrs, ValidateCredentialMode(client, ic)...)
allErrs = append(allErrs, validatePreexistingServiceAccount(client, ic)...)
if err := ValidatePreExistingPublicDNS(client, ic); err != nil {
	allErrs = append(allErrs, field.Invalid(field.NewPath("baseDomain"), ic.BaseDomain, err.Error()))
Member

I noticed this block was never reached if I used a non-existing domain.

if IsNotFound(err) {
	return field.NotFound(field.NewPath("baseDomain"), fmt.Sprintf("Private DNS Zone (%s/%s)", ic.Platform.GCP.ProjectID, ic.BaseDomain))
}

I guess the returned error is a custom error instead of a gcpError.

return nil, errors.New("no matching public DNS Zone found")

Member

Should we fix the error check too? Or since the installer fails with error anyway, it's no big deal? 🤔

Contributor

@sadasu sadasu Jul 17, 2025

I see that the error check in ValidatePreExistingPublicDNS() is similar to the one done in ValidatePrivateDNSZone().
Within ValidatePreExistingPublicDNS(), should

if IsNotFound(err) {

at https://github.com/openshift/installer/blob/main/pkg/asset/installconfig/gcp/validation.go#L401 be

if zone == nil {

Also, in the same method, should checkRecordSets() be called when zone != nil? I concluded that after reading https://github.com/openshift/installer/blob/main/pkg/asset/installconfig/gcp/validation.go#L389-L392

Member

I also lean towards reporting the error as "not-found" instead of "internalError" 🤔

ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config:baseDomain: Invalid value: "gcp.devcluster.openshift.com": baseDomain: Internal error: no matching public DNS Zone found

Though, the final message seems clear enough to know the basedomain doesn't exist...I am OK either way :D

Contributor Author

@sadasu If you look at GetDNSZone, there is a reason the public and private checks are similar but not the same. If we search for a public zone, we must find it; otherwise it is an error (this check only runs during external installs). If we look for a private zone and it does not exist, no harm, no foul: that means we are safe to create one. So the distinction is subtle, but the two cases are very different.

Contributor Author

@tthvo I switched to a NotFound error. The reason I used InternalError is that it allows you to add an error message where the NotFound error type does not.

gcpClient.EXPECT().GetKeyRing(gomock.Any(), &validKeyRing).Return(validKeyRingRet, nil).AnyTimes()
gcpClient.EXPECT().GetKeyRing(gomock.Any(), &invalidKeyRing).Return(nil, fmt.Errorf("failed to find key ring invalidKeyRingName: data")).AnyTimes()

gcpClient.EXPECT().GetDNSZone(gomock.Any(), validProjectName, validBaseDomain, true).Return(&dns.ManagedZone{Name: "zone-name"}, nil).AnyTimes()
Member

nit: maybe we can use the variable validPublicDNSZone (i.e., the zone name is defined in validPublicZone) already defined above for these mocks?

gcpClient.EXPECT().GetKeyRing(gomock.Any(), &invalidKeyRing).Return(nil, fmt.Errorf("failed to find key ring invalidKeyRingName: data")).AnyTimes()

gcpClient.EXPECT().GetDNSZone(gomock.Any(), validProjectName, validBaseDomain, true).Return(&dns.ManagedZone{Name: "zone-name"}, nil).AnyTimes()
gcpClient.EXPECT().GetDNSZone(gomock.Any(), invalidProjectName, validBaseDomain, true).Return(&dns.ManagedZone{Name: "zone-name"}, nil).AnyTimes()
Member

Since the project is invalid (i.e., using invalidProjectName), this mock should return an error, right?

Contributor Author

No, that isn't the point of that test. I had to add that mock because another test was failing for other reasons. We could change it, but I didn't want to change what that test was exercising.

Member

@tthvo tthvo left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 17, 2025
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 18, 2025
@patrickdillon
Contributor

This validation should probably be part of the platform provisioning check rather than the installconfig asset validation. If it's part of the asset check, the DNS zone must exist when the installconfig is validated, which may be overly strict for UPI. With UPI you could create the domain after the installconfig.

@jianli-wei
Contributor

/verified-by

@jianli-wei
Contributor

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Jul 21, 2025
@openshift-ci-robot
Contributor

@barbacbd: This pull request references Jira Issue OCPBUGS-59430, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianli-wei

@jianli-wei
Contributor

/verified by jiwei

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jul 21, 2025
@openshift-ci-robot
Contributor

@jianli-wei: This PR has been marked as verified by jiwei. Jira issue(s) in the title of this PR will be moved to the VERIFIED state on merge.

Contributor

@patrickdillon patrickdillon left a comment

This validation should probably be part of the platform provisioning check rather than the installconfig asset validation. If it's part of the asset check, the DNS zone must exist when the installconfig is validated, which may be overly strict for UPI. With UPI you could create the domain after the installconfig.

No, never mind, I was wrong. To correctly generate the DNS manifests we need to make API calls, so even for UPI the public base domain needs to exist when create manifests is called. Adding the validation as is done here is the correct approach.

Just one small change requested and this looks good to me.

@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Jul 22, 2025
…alid

installconfig/gcp:

** Add a check during Validate() for the base domain of the public zone.
** Validation tests updated.
@barbacbd
Contributor Author

With the latest changes, the NotFound error check makes sense. The following example output was observed when the base domain in the install-config was deliberately set to a bad value.

ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: baseDomain: Not found: "Public DNS Zone (openshift-dev-installer/badinstaller.gcp.devcluster.openshift.com)" 

@openshift-ci
Contributor

openshift-ci bot commented Jul 24, 2025

@barbacbd: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                              Commit   Details  Required  Rerun command
ci/prow/e2e-vsphere-ovn-multi-network                  1c9589a  link     false     /test e2e-vsphere-ovn-multi-network
ci/prow/e2e-vsphere-externallb-ovn                     1c9589a  link     false     /test e2e-vsphere-externallb-ovn
ci/prow/e2e-vsphere-host-groups-ovn-custom-no-upgrade  1c9589a  link     false     /test e2e-vsphere-host-groups-ovn-custom-no-upgrade
ci/prow/e2e-vsphere-ovn-multi-network-techpreview      1c9589a  link     false     /test e2e-vsphere-ovn-multi-network-techpreview
ci/prow/e2e-gcp-secureboot                             1c9589a  link     false     /test e2e-gcp-secureboot

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@patrickdillon
Contributor

/approve

@openshift-ci
Contributor

openshift-ci bot commented Jul 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 24, 2025
Member

@tthvo tthvo left a comment

/lgtm

I can reproduce the outcome too :D

ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: baseDomain: Not found: "Public DNS Zone (openshift-dev-installer/idk.this.domain)"

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit fc9c0c6 into openshift:main Jul 24, 2025
21 of 26 checks passed
@openshift-ci-robot
Contributor

@barbacbd: Jira Issue OCPBUGS-59430: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-59430 has been moved to the MODIFIED state.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-installer
This PR has been included in build ose-installer-container-v4.20.0-202507241814.p0.gfc9c0c6.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-baremetal-installer
This PR has been included in build ose-baremetal-installer-container-v4.20.0-202507241814.p0.gfc9c0c6.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-artifacts
This PR has been included in build ose-installer-artifacts-container-v4.20.0-202507241814.p0.gfc9c0c6.assembly.stream.el9.
All builds following this will include this PR.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR

7 participants