-
Notifications
You must be signed in to change notification settings - Fork 1.5k
OCPBUGS-59430: Fail install config validation when base domain is invalid #9840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@barbacbd: This pull request references Jira Issue OCPBUGS-59430, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@barbacbd: This pull request references Jira Issue OCPBUGS-59430, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
tthvo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good when I tested with an invalid baseDomain: The installer fails with an error.
$ ./openshift-install create manifests --dir=. --log-level=debug
DEBUG OpenShift Installer unreleased-master-12069-gd54eceed15eab76361351f681b6975a65926c507-dirty
DEBUG Built from commit d54eceed15eab76361351f681b6975a65926c507
DEBUG Fetching Master Machines...
DEBUG Loading Master Machines...
DEBUG Loading Cluster ID...
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
WARNING Release Image Architecture not detected. Release Image Architecture is unknown
INFO Credentials loaded from file "/home/thvo/.gcp/osServiceAccount.json"
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: baseDomain: Invalid value: "gcp.devcluster.openshift.com": baseDomain: Internal error: no matching public DNS Zone found I just have some small questions below :D
| allErrs = append(allErrs, ValidateCredentialMode(client, ic)...) | ||
| allErrs = append(allErrs, validatePreexistingServiceAccount(client, ic)...) | ||
| if err := ValidatePreExistingPublicDNS(client, ic); err != nil { | ||
| allErrs = append(allErrs, field.Invalid(field.NewPath("baseDomain"), ic.BaseDomain, err.Error())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed this block was never reached if I used a non-existing domain.
installer/pkg/asset/installconfig/gcp/validation.go
Lines 419 to 421 in 97030df
| if IsNotFound(err) { | |
| return field.NotFound(field.NewPath("baseDomain"), fmt.Sprintf("Private DNS Zone (%s/%s)", ic.Platform.GCP.ProjectID, ic.BaseDomain)) | |
| } |
I guess the returned error is a custom error instead of a gcpError.
| return nil, errors.New("no matching public DNS Zone found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we fix the error check too? Or since the installer fails with error anyway, it's no big deal? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the error check in ValidatePreExistingPublicDNS() is similar to the one done in ValidatePrivateDNSZone().
Within ValidatePreExistingPublicDNS(), should
if IsNotFound(err) {
if zone == nil {
Also, in the same method, should checkRecordSets() be called when zone != nil ? I concluded that after reading https://github.com/openshift/installer/blob/main/pkg/asset/installconfig/gcp/validation.go#L389-L392
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also lean towards reporting the error as "not-found" instead of "internalError" 🤔
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config:baseDomain: Invalid value: "gcp.devcluster.openshift.com": baseDomain: Internal error: no matching public DNS Zone foundThough, the final message seems clear enough to know the basedomain doesn't exist...I am OK either way :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sadasu If you look at GetDNSZone there is a reason that the public and private are similar but not the same. If we go searching for a public zone we must find it other wise it is an error (this is only run during external installs). If we go looking for a private zone and it does not exist then no harm no foul. That means we are safe to create one. So it is subtle but also very different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tthvo I switched to a NotFound error. The reason I used InternalError is that it allows you to add an error message where the NotFound error type does not.
| gcpClient.EXPECT().GetKeyRing(gomock.Any(), &validKeyRing).Return(validKeyRingRet, nil).AnyTimes() | ||
| gcpClient.EXPECT().GetKeyRing(gomock.Any(), &invalidKeyRing).Return(nil, fmt.Errorf("failed to find key ring invalidKeyRingName: data")).AnyTimes() | ||
|
|
||
| gcpClient.EXPECT().GetDNSZone(gomock.Any(), validProjectName, validBaseDomain, true).Return(&dns.ManagedZone{Name: "zone-name"}, nil).AnyTimes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we can use the variable validPublicDNSZone (i.e zone name is defined in validPublicZone) already defined above for these mocks?
| gcpClient.EXPECT().GetKeyRing(gomock.Any(), &invalidKeyRing).Return(nil, fmt.Errorf("failed to find key ring invalidKeyRingName: data")).AnyTimes() | ||
|
|
||
| gcpClient.EXPECT().GetDNSZone(gomock.Any(), validProjectName, validBaseDomain, true).Return(&dns.ManagedZone{Name: "zone-name"}, nil).AnyTimes() | ||
| gcpClient.EXPECT().GetDNSZone(gomock.Any(), invalidProjectName, validBaseDomain, true).Return(&dns.ManagedZone{Name: "zone-name"}, nil).AnyTimes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the project is invalid (i.e. using invalidProjectName), this should mock return error, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no that isnt the point of that test. The reason I had to throw that in there was because of another test failing for other reasons. We could change it but I didnt want to change what that test was running.
tthvo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
This validation should probably be part of the platform provisioning check rather than the installconfig asset validation. If it’s part of the asset check that requires the DNS zone to exist when validating the installconfig, which may be overly strict for UPI. With UPI you could create the domain after the installconfig. |
|
/verified-by |
|
/label qe-approved |
|
@barbacbd: This pull request references Jira Issue OCPBUGS-59430, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified by jiwei |
|
@jianli-wei: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
patrickdillon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation should probably be part of the platform provisioning check rather than the installconfig asset validation. If it’s part of the asset check that requires the DNS zone to exist when validating the installconfig, which may be overly strict for UPI. With UPI you could create the domain after the installconfig.
No, nevermind, I was wrong. To correctly generate the DNS manifests we need to make API calls, so even for UPI the public basedomain needs to exist when create manifests is called. Adding the validation as is done here is the correct approach.
Just one small change requested and this looks good to me.
…alid installconfig/gcp: ** Add a check during Validate() for the base domain of the public zone. ** Validation tests updated.
|
With the latest, the NotFound Error check will make sense. The following example output was observed when the base domain was purposefully bad in the install-config. |
|
@barbacbd: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: patrickdillon The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
tthvo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
I can reproduce the outcome too :D
ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: baseDomain: Not found: "Public DNS Zone (openshift-dev-installer/idk.this.domain)"
fc9c0c6
into
openshift:main
|
@barbacbd: Jira Issue OCPBUGS-59430: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-59430 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[ART PR BUILD NOTIFIER] Distgit: ose-installer |
|
[ART PR BUILD NOTIFIER] Distgit: ose-baremetal-installer |
|
[ART PR BUILD NOTIFIER] Distgit: ose-installer-artifacts |
installconfig/gcp:
** Add a check during Validate() for the base domain of the public zone.
** Validation tests updated.