
Conversation

@hardys (Contributor) commented Dec 8, 2021

Since #719, the None platform is broken; this aims to restore the functionality added in #630.

WIP pending further testing

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 8, 2021
@openshift-ci openshift-ci bot requested review from enxebre and sjenning December 8, 2021 11:58
```go
		Namespace: hcp.Namespace,
		Name:      hcp.Name,
	},
}
```
Contributor commented:
@enxebre @csrwng I wonder if this implies another Platform interface method to enable platform implementations to build the CAPI Cluster like we do with the CAPI Infrastructure (ReconcileCAPIInfraCR())?

https://github.com/openshift/hypershift/blob/main/hypershift-operator/controllers/hostedcluster/internal/platform/platform.go#L25
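
For illustration, such a method might mirror the existing signature; a hypothetical sketch (the method name and parameters below are assumptions, not code from the repository):

```go
// Hypothetical companion to ReconcileCAPIInfraCR on the Platform interface,
// returning the desired CAPI Cluster resource; name and signature are
// illustrative only.
ReconcileCAPIClusterCR(ctx context.Context, c client.Client, createOrUpdate upsert.CreateOrUpdateFN,
	hcluster *hyperv1.HostedCluster, infraCR client.Object) (client.Object, error)
```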

Contributor (Author) commented:
Another fix would be to make ReconcileCAPIInfraCR return an object instead of nil, so the InfrastructureRef can be populated. I'm not sure that's a valid approach, though: in the None case we're not creating any Infrastructure, so we'd end up returning (and presumably not persisting?) a generic Cluster object.

Contributor commented:
Related prior discussion along these lines: #719 (comment)

If some platforms don't even imply a CAPI Infra or Cluster resource at all, that to me seems like further justification for a contract more like:

```go
type Platform interface {
	DesiredCAPIInfrastructure(...) (client.Object, error)
	DesiredCAPICluster(capiInfrastructure client.Object, ...) (client.Object, error)
}
```

The calling side (the hostedcluster controller) would handle persistence (if the implementation returns anything to persist), and an implementation may return nil for either if it doesn't require those resources.
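
For concreteness, a minimal sketch of that calling side, assuming the reconciler's existing client and createOrUpdate helper (all names here are illustrative, and the mutation callbacks are elided):

```go
// Sketch: the hostedcluster controller owns persistence; the platform only
// returns desired objects, and nil means the platform has no such resource.
infraCR, err := p.DesiredCAPIInfrastructure(hcluster)
if err != nil {
	return fmt.Errorf("failed to compute desired CAPI infrastructure: %w", err)
}
if infraCR != nil {
	if _, err := createOrUpdate(ctx, r.Client, infraCR, func() error { return nil }); err != nil {
		return fmt.Errorf("failed to reconcile CAPI infrastructure: %w", err)
	}
}
capiCluster, err := p.DesiredCAPICluster(infraCR)
if err != nil {
	return fmt.Errorf("failed to compute desired CAPI cluster: %w", err)
}
if capiCluster != nil {
	if _, err := createOrUpdate(ctx, r.Client, capiCluster, func() error { return nil }); err != nil {
		return fmt.Errorf("failed to reconcile CAPI cluster: %w", err)
	}
}
```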

@enxebre (Member) commented Dec 9, 2021
If there's no infraCR it means there's no CAPI support, so please just return early at the top of the func:

```go
if infraCR == nil {
	return nil
}
```

Even better, only run this chunk:

```go
// Reconcile the CAPI Cluster resource
capiCluster := controlplaneoperator.CAPICluster(controlPlaneNamespace.Name, hcluster.Spec.InfraID)
_, err = createOrUpdate(ctx, r.Client, capiCluster, func() error {
	return reconcileCAPICluster(capiCluster, hcluster, hcp, infraCR)
})
```

inside an `if infraCR != nil` guard; no need to create the Cluster CR at all.
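
Concretely, the guarded form of that chunk might look like this (a sketch; the error wrapping is assumed):

```go
// Only CAPI-backed platforms get a CAPI Cluster resource; for platforms like
// None, infraCR is nil and this block is skipped entirely.
if infraCR != nil {
	capiCluster := controlplaneoperator.CAPICluster(controlPlaneNamespace.Name, hcluster.Spec.InfraID)
	if _, err := createOrUpdate(ctx, r.Client, capiCluster, func() error {
		return reconcileCAPICluster(capiCluster, hcluster, hcp, infraCR)
	}); err != nil {
		return fmt.Errorf("failed to reconcile CAPI cluster: %w", err)
	}
}
```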

Contributor commented:
> If there's no infraCR it means there's no CAPI support, so please just return early at the top of the func... Even better, only run this chunk inside an `if infraCR != nil` guard; no need to create the Cluster CR at all.

This makes me wonder about the Platform.ReconcileCAPIInfraCR interface method. Here are its docs:

```go
// ReconcileCAPIInfraCR is called during HostedCluster reconciliation prior to reconciling the CAPI Cluster CR.
// Implementations should use the given input and client to create and update the desired state of the
// platform infrastructure CAPI CR, which will then be referenced by the CAPI Cluster CR.
// TODO (alberto): Pass createOrUpdate construct instead of client.
ReconcileCAPIInfraCR(ctx context.Context, c client.Client, createOrUpdate upsert.CreateOrUpdateFN,
```

It sounds like we're saying that if the platform doesn't support CAPI, then ReconcileCAPIInfraCR should return nil. If that's the case, we should document it (and document the follow-on implications: does it mean this controller won't call CAPIProviderDeploymentSpec?). It seems like there should be a way to make this clearer. For example, what if CAPI support were split out into a separate interface, like:

```go
type CAPIPlatform interface {
	ReconcileCAPIInfraCR()
	CAPIProviderDeploymentSpec()
}
```

and have the controller call those methods at the appropriate time only if the platform implements CAPI support. Then ReconcileCAPIInfraCR() returning nil would be considered an error.
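
A sketch of how the controller could gate on that narrower interface, using a type assertion (names are illustrative, and the call uses the current full signature rather than the shorthand above):

```go
// Sketch: only platforms that opt in to CAPI support satisfy CAPIPlatform,
// and the controller checks for it before calling any CAPI-related methods.
if capiPlatform, ok := p.(CAPIPlatform); ok {
	infraCR, err := capiPlatform.ReconcileCAPIInfraCR(ctx, r.Client, createOrUpdate, hcluster, controlPlaneNamespace)
	if err != nil {
		return err
	}
	if infraCR == nil {
		// Under this contract, nil from a CAPI-capable platform is a bug.
		return fmt.Errorf("platform %T returned no CAPI infrastructure CR", p)
	}
}
```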

Thoughts?

Member commented:
Yeah, that all makes total sense to me as a consolidation step. This is intentionally loose at the moment so we can identify unknowns, platform peculiarities, and better boundaries while introducing #760 and gathering empirical feedback.
I'd suggest we iterate to tighten the contract and go more rigid with something like what you're suggesting after merging this PR and #760.

```go
	infra.Status.PlatformStatus = &configv1.PlatformStatus{}
	infra.Status.PlatformStatus.Type = configv1.IBMCloudPlatformType
case hyperv1.NonePlatform:
	infra.Status.PlatformStatus = &configv1.PlatformStatus{}
```
@enxebre (Member) commented Dec 9, 2021
Please just set `infra.Status.PlatformStatus.Type = configv1.PlatformType(hcp.Spec.Platform.Type)` for every case; there's no need to discriminate by platform for that. Then, for `case hyperv1.AWSPlatform:`, keep setting the `infra.Status.PlatformStatus.AWS` specifics.

Contributor (Author) commented:
Ack, thanks. I see that's already the default, so I'll remove setting the Type.

Member commented:
We can drop the IBM/None specifics and just do:

```go
infra.Status.PlatformStatus = &configv1.PlatformStatus{}
infra.Status.Platform = configv1.PlatformType(hcp.Spec.Platform.Type)

switch hcp.Spec.Platform.Type {
case hyperv1.AWSPlatform:
	infra.Spec.PlatformSpec.AWS = &configv1.AWSPlatformSpec{}
	infra.Status.PlatformStatus.Type = configv1.AWSPlatformType
	infra.Status.PlatformStatus.AWS = &configv1.AWSPlatformStatus{
		Region: hcp.Spec.Platform.AWS.Region,
	}
	tags := []configv1.AWSResourceTag{}
	for _, tag := range hcp.Spec.Platform.AWS.ResourceTags {
		tags = append(tags, configv1.AWSResourceTag{
			Key:   tag.Key,
			Value: tag.Value,
		})
	}
	infra.Status.PlatformStatus.AWS.ResourceTags = tags
}
```

I'll do that in a follow-up PR.

Steven Hardy added 3 commits December 9, 2021 17:30
This was missed in openshift#719, which breaks the None platform support added in openshift#630.
Since openshift#728 and openshift#719, the None platform support added via openshift#630 is broken, because there is no infraCR for the None platform.
@hardys (Contributor, Author) commented Dec 9, 2021

OK, I addressed the feedback; thanks for the reviews!

Leaving as WIP since this still doesn't result in a fully working None HostedCluster in my environment - feel free to remove the hold if you want to land this part of the fix and we can work on the remainder as a follow-up.

@enxebre (Member) commented Dec 10, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 10, 2021
@nirarg nirarg mentioned this pull request Dec 13, 2021
@enxebre (Member) commented Dec 13, 2021

/approve

> Leaving as WIP since this still doesn't result in a fully working None HostedCluster in my environment - feel free to remove the hold if you want to land this part of the fix and we can work on the remainder as a follow-up.

Thanks @hardys! Feel free to unhold/continue in this PR and proceed as you feel most comfortable.

Actually, can you please rephrase the third commit message to be more specific? It reads a bit ambiguously at the moment.

openshift-ci bot commented Dec 13, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre, hardys

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 13, 2021
@rmohr (Contributor) commented Dec 13, 2021

@hardys can the wip label be removed?

@enxebre enxebre changed the title [WIP] Fix issues with None platform Fix issues with None platform Dec 13, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 13, 2021
@enxebre (Member) commented Dec 13, 2021

/hold cancel

@hardys (Contributor, Author) commented Dec 13, 2021

> @hardys can the wip label be removed?

Yes, although as mentioned this doesn't result in a fully working None platform deployment yet (in my test environment at least), so there may be additional follow-up fixes (help finding those is welcome; I've not had time to fully debug yet).

@rmohr (Contributor) commented Dec 13, 2021

> Yes, although as mentioned this doesn't result in a fully working None platform deployment yet (in my test environment at least), so there may be additional follow-up fixes (help finding those is welcome; I've not had time to fully debug yet).

Got it. This should fix it:

```diff
diff --git a/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go b/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
index 631056f7..15e46524 100644
--- a/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
+++ b/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
@@ -502,17 +502,22 @@ func (r *HostedControlPlaneReconciler) update(ctx context.Context, hostedControl
                }
        }

-       // If the cluster is marked paused, don't do any reconciliation work at all.
-       if cluster, err := util.GetOwnerCluster(ctx, r.Client, hostedControlPlane.ObjectMeta); err != nil {
-               return fmt.Errorf("failed to get owner cluster: %w", err)
-       } else {
-               if cluster == nil {
-                       r.Log.Info("Cluster Controller has not yet set OwnerRef")
-                       return nil
-               }
-               if annotations.IsPaused(cluster, hostedControlPlane) {
-                       r.Log.Info("HostedControlPlane or linked Cluster is marked as paused. Won't reconcile")
-                       return nil
+       switch hostedControlPlane.Spec.Platform.Type {
+       case hyperv1.NonePlatform:
+               // No CAPI cluster exists, checking for its existence is hopeless.
+       default:
+               // If the cluster is marked paused, don't do any reconciliation work at all.
+               if cluster, err := util.GetOwnerCluster(ctx, r.Client, hostedControlPlane.ObjectMeta); err != nil {
+                       return fmt.Errorf("failed to get owner cluster: %w", err)
+               } else {
+                       if cluster == nil {
+                               r.Log.Info("Cluster Controller has not yet set OwnerRef")
+                               return nil
+                       }
+                       if annotations.IsPaused(cluster, hostedControlPlane) {
+                               r.Log.Info("HostedControlPlane or linked Cluster is marked as paused. Won't reconcile")
+                               return nil
+                       }
                }
        }
```
The control plane comes up again.

@openshift-bot commented:
/retest-required

Please review the full test history for this PR and help us cut down flakes.

5 similar comments

openshift-ci bot commented Dec 14, 2021

@hardys: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
