Fix issues with None platform #764

Conversation
		Namespace: hcp.Namespace,
		Name:      hcp.Name,
	},
}
Another fix for this is to make ReconcileCAPIInfraCR return an object instead of nil, so the InfrastructureRef can be populated, but I'm not sure if that's a valid approach (since in the None case, we're not creating any Infrastructure, we'd end up returning and presumably not persisting a generic Cluster object?)
Related prior discussion along these lines: #719 (comment)
If some platforms don't even imply a CAPI Infra or Cluster resource at all, that to me seems like further justification for a contract more like:
type Platform interface {
	DesiredCAPIInfrastructure(...) (client.Object, error)
	DesiredCAPICluster(capiInfrastructure client.Object, ...) (client.Object, error)
}

Where the call side (the hostedcluster controller) handles persistence (if the implementation returns anything to persist), and where the implementation may return nil for either if the platform doesn't require those resources.
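To make the proposed contract concrete, here is a minimal runnable sketch of the call side skipping persistence when a platform returns nil. All type and function names here are illustrative stand-ins, not the actual hypershift types (the real code would use client.Object and createOrUpdate):

```go
package main

import "fmt"

// Object is a stand-in for client.Object, for illustration only.
type Object interface{ GetName() string }

// Platform is the contract sketched above: either method may return
// nil when the platform needs no such resource.
type Platform interface {
	DesiredCAPIInfrastructure() (Object, error)
	DesiredCAPICluster(infra Object) (Object, error)
}

// nonePlatform models the None platform: no CAPI resources at all.
type nonePlatform struct{}

func (nonePlatform) DesiredCAPIInfrastructure() (Object, error) { return nil, nil }
func (nonePlatform) DesiredCAPICluster(Object) (Object, error)  { return nil, nil }

// reconcile shows the call-side persistence: only persist what the
// platform actually returns, so None persists nothing.
func reconcile(p Platform) ([]string, error) {
	var persisted []string
	infra, err := p.DesiredCAPIInfrastructure()
	if err != nil {
		return nil, err
	}
	if infra != nil {
		persisted = append(persisted, infra.GetName())
	}
	cluster, err := p.DesiredCAPICluster(infra)
	if err != nil {
		return nil, err
	}
	if cluster != nil {
		persisted = append(persisted, cluster.GetName())
	}
	return persisted, nil
}

func main() {
	persisted, _ := reconcile(nonePlatform{})
	fmt.Println("objects persisted for None platform:", len(persisted))
}
```

With this shape the controller never has to special-case the platform type; a nil return simply means "nothing to persist".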
If there's no infraCR it means there's no CAPI support, so please just return early at the top of the func:

if infraCR == nil {
	return nil
}
Even better, only run this chunk when infraCR != nil:

// Reconcile the CAPI Cluster resource
capiCluster := controlplaneoperator.CAPICluster(controlPlaneNamespace.Name, hcluster.Spec.InfraID)
_, err = createOrUpdate(ctx, r.Client, capiCluster, func() error {
	return reconcileCAPICluster(capiCluster, hcluster, hcp, infraCR)
})

No need to create the cluster CR at all.
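The suggested guard can be sketched as a runnable toy; infraObject and reconcileCAPICluster here are illustrative stand-ins for the real client.Object and controller helper, not the actual hypershift signatures:

```go
package main

import "fmt"

// infraObject stands in for the platform's CAPI infra CR.
type infraObject struct{ name string }

// reconcileCAPICluster sketches the early return: a nil infra CR
// means no CAPI support (the None platform), so the CAPI Cluster CR
// is never created.
func reconcileCAPICluster(infraCR *infraObject) (created bool, err error) {
	if infraCR == nil {
		// None platform: nothing to reconcile.
		return false, nil
	}
	// ... createOrUpdate the CAPI Cluster resource referencing infraCR ...
	return true, nil
}

func main() {
	created, _ := reconcileCAPICluster(nil)
	fmt.Println("cluster CR created for nil infraCR:", created)
}
```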
This makes me wonder about the Platform.ReconcileCAPIInfraCR interface method. Here are its docs:
// ReconcileCAPIInfraCR is called during HostedCluster reconciliation prior to reconciling the CAPI Cluster CR.
// Implementations should use the given input and client to create and update the desired state of the
// platform infrastructure CAPI CR, which will then be referenced by the CAPI Cluster CR.
// TODO (alberto): Pass createOrUpdate construct instead of client.
ReconcileCAPIInfraCR(ctx context.Context, c client.Client, createOrUpdate upsert.CreateOrUpdateFN,

Sounds like we're saying that if the platform doesn't support CAPI, then ReconcileCAPIInfraCR should return nil. If that's the case, we should document it, along with the follow-on implications (does it mean this controller won't call CAPIProviderDeploymentSpec?). Seems like there should be a way to make this clearer. For example, what if CAPI was split out into a different interface, like:
type CAPIPlatform interface {
	ReconcileCAPIInfraCR()
	CAPIProviderDeploymentSpec()
}

and in the controller, only call those methods at the appropriate time if the platform implements CAPI support. Then ReconcileCAPIInfraCR() returning nil would be considered an error.
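The "only call those methods if the platform implements CAPI support" part maps naturally onto a Go type assertion against the optional interface. A minimal sketch, with illustrative stand-in types rather than the real hypershift interfaces:

```go
package main

import "fmt"

// Platform is a stand-in for the base platform interface.
type Platform interface{ Name() string }

// CAPIPlatform is the optional extension interface: only platforms
// with CAPI support implement it.
type CAPIPlatform interface {
	ReconcileCAPIInfraCR() error
}

type awsPlatform struct{}

func (awsPlatform) Name() string                { return "AWS" }
func (awsPlatform) ReconcileCAPIInfraCR() error { return nil }

type nonePlatform struct{}

func (nonePlatform) Name() string { return "None" }

// supportsCAPI uses a type assertion, so the controller only calls
// the CAPI methods when the platform actually implements them.
func supportsCAPI(p Platform) bool {
	_, ok := p.(CAPIPlatform)
	return ok
}

func main() {
	for _, p := range []Platform{awsPlatform{}, nonePlatform{}} {
		fmt.Printf("%s supports CAPI: %v\n", p.Name(), supportsCAPI(p))
	}
}
```

Under this split, a platform that implements CAPIPlatform but returns nil from ReconcileCAPIInfraCR really would be an error, as suggested above.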
Thoughts?
Yeah, that all makes total sense to me as a consolidation step. This is intentionally loose atm so we can identify unknowns, platform peculiarities and better boundaries while introducing #760 and gathering empirical feedback.
I'd suggest we iterate to tighten the contract and go more rigid with something like what you're suggesting after merging this PR and #760.
	infra.Status.PlatformStatus = &configv1.PlatformStatus{}
	infra.Status.PlatformStatus.Type = configv1.IBMCloudPlatformType
case hyperv1.NonePlatform:
	infra.Status.PlatformStatus = &configv1.PlatformStatus{}
Please just set

infra.Status.PlatformStatus.Type = configv1.PlatformType(hcp.Spec.Platform.Type)

for every case; no need to discriminate by platform for that. Then, for case hyperv1.AWSPlatform:, keep setting the infra.Status.PlatformStatus.AWS specifics.
Ack thanks, I see that's already the default so I'll remove setting the Type
we can drop ibm/none specifics and just do:
infra.Status.PlatformStatus = &configv1.PlatformStatus{}
infra.Status.Platform = configv1.PlatformType(hcp.Spec.Platform.Type)
switch hcp.Spec.Platform.Type {
case hyperv1.AWSPlatform:
infra.Spec.PlatformSpec.AWS = &configv1.AWSPlatformSpec{}
infra.Status.PlatformStatus.Type = configv1.AWSPlatformType
infra.Status.PlatformStatus.AWS = &configv1.AWSPlatformStatus{
Region: hcp.Spec.Platform.AWS.Region,
}
tags := []configv1.AWSResourceTag{}
for _, tag := range hcp.Spec.Platform.AWS.ResourceTags {
tags = append(tags, configv1.AWSResourceTag{
Key: tag.Key,
Value: tag.Value,
})
}
infra.Status.PlatformStatus.AWS.ResourceTags = tags
I'll do that in a follow up PR.
This was missed in openshift#719, which breaks the None platform support added in openshift#630
Since openshift#728 and openshift#719, the None platform support added via openshift#630 is broken, since there is no infraCR for the None platform
|
Ok I addressed the feedback, thanks for the reviews! Leaving as WIP since this still doesn't result in a fully working None HostedCluster in my environment - feel free to remove the hold if you want to land this part of the fix and we can work on the remainder as a follow-up. |
|
/lgtm |
|
/approve
Thanks @hardys! Feel free to unhold/continue in this PR and proceed as you feel most comfortable. Also, can you please reword the third commit message to be more specific? It reads a bit ambiguous atm. |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: enxebre, hardys. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. |
|
@hardys can the wip label be removed? |
|
/hold cancel |
Yes, although as mentioned this doesn't result in a fully working None platform deployment yet (in my test environment at least), so there may be follow-up additional fixes (help welcome finding those, I've not had time to fully debug yet) |
Have it. That should fix it:

diff --git a/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go b/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
index 631056f7..15e46524 100644
--- a/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
+++ b/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
@@ -502,17 +502,22 @@ func (r *HostedControlPlaneReconciler) update(ctx context.Context, hostedControl
}
}
- // If the cluster is marked paused, don't do any reconciliation work at all.
- if cluster, err := util.GetOwnerCluster(ctx, r.Client, hostedControlPlane.ObjectMeta); err != nil {
- return fmt.Errorf("failed to get owner cluster: %w", err)
- } else {
- if cluster == nil {
- r.Log.Info("Cluster Controller has not yet set OwnerRef")
- return nil
- }
- if annotations.IsPaused(cluster, hostedControlPlane) {
- r.Log.Info("HostedControlPlane or linked Cluster is marked as paused. Won't reconcile")
- return nil
+ switch hostedControlPlane.Spec.Platform.Type {
+ case hyperv1.NonePlatform:
+ // No CAPI cluster exists, checking for its existence is hopeless.
+ default:
+ // If the cluster is marked paused, don't do any reconciliation work at all.
+ if cluster, err := util.GetOwnerCluster(ctx, r.Client, hostedControlPlane.ObjectMeta); err != nil {
+ return fmt.Errorf("failed to get owner cluster: %w", err)
+ } else {
+ if cluster == nil {
+ r.Log.Info("Cluster Controller has not yet set OwnerRef")
+ return nil
+ }
+ if annotations.IsPaused(cluster, hostedControlPlane) {
+ r.Log.Info("HostedControlPlane or linked Cluster is marked as paused. Won't reconcile")
+ return nil
+ }
}
}
The control plane comes up again. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
5 similar comments
|
@hardys: all tests passed! Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Since #719, the None platform is broken; this aims to restore the functionality added in #630.
WIP pending further testing.