Skip to content

Conversation

@dgoodwin
Copy link
Contributor

A number of invasive changes required here not limited to:

  • Reconcile func now requires a context
  • Broad use of client.Object rather than runtime.Object
  • New isUnstructured boolean arg for RESTClientForGVK
  • Events now require Object not an ObjectKey (namespace/name)
  • Changes to EnqueueRequestsFromMapFunc

The highest risk areas here are:

  1. dnszone and dnsendpoint controllers. We have a GenericEvent being emitted used to trigger reconciles which switched from an object key to full object. Plumbing this through was tricky and wreaked havoc on test assertions.
  2. Use of context instead of stop functions. (not an area I'm super familiar with)

x-ref: https://issues.redhat.com/browse/HIVE-1452

/assign @abhinavdahiya @joelddiaz

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 26, 2021
@dgoodwin
Copy link
Contributor Author

Tests are passing and verify is clear, I am doing manual testing of a manageDNS cluster at the moment and it has made it past DNS and launched the install, so looking good so far.

@dgoodwin
Copy link
Contributor Author

/test e2e-azure
/test e2e-gcp

apis/go.mod Outdated
k8s.io/api v0.20.0
k8s.io/apimachinery v0.20.0
sigs.k8s.io/controller-runtime v0.6.2
sigs.k8s.io/controller-runtime v0.7.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we bump this to 0.7.3 as the PR title suggests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. or maybe rename the PR title as we are bumping to 0.7.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup my mistake, wanted to stick with 0.7.0 to match agent controllers, but this is going to be unsustainable if we have to stay in lockstep. Maybe we can get their controllers into Hive.

Comment on lines 18 to 27
handler.EnqueueRequestsFromMapFunc(
func(a client.Object) []reconcile.Request {
cpKey := clusterPoolKey(a.(*hivev1.ClusterDeployment))
if cpKey == nil {
return nil
}
return []reconcile.Request{{NamespacedName: *cpKey}}
},
),
r,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgoodwin
Copy link
Contributor Author

Updated per review. Thanks!

@dgoodwin
Copy link
Contributor Author

/test e2e-azure
/test e2e-gcp

Copy link
Contributor

@joelddiaz joelddiaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2021
@dgoodwin
Copy link
Contributor Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2021
@dgoodwin
Copy link
Contributor Author

dgoodwin commented Mar 31, 2021

I didn't see changes to cover these changes in https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.7.0 @dgoodwin

Change default webhook port to 9443 (https://github.com/kubernetes-sigs/controller-runtime/pull/1076)

Impact: update your deployment configuration to use port 9443, or manually
configure the webhook port in manager Options.

I think we have webhooks and making sure the correct port was configured.

Looks like this is ok, we specify --secure-port 9443 in our Deployment and the Service forwards 443 to 9443. I can see the admission hook getting called if I try to make a bad change.

Remove logs from internal controller (https://github.com/kubernetes-sigs/controller-runtime/pull/1096)

Impact: automatic logs for "succesful reconcile" no longer exist
(it tended to duplicate manual logs that folks had in place).

dropping the logs means that we won't get the logs from errors that aren't explicity logged. we tried to capture those here #1263 but i think moving to this version will again drop those error logs.

Reading the discussion in there, I don't think there's any action for us to take. We don't want to log at V(5), we should be logging our own errors, and it sounds like that is the direction controller-runtime is going. Feels like we'll have to handle any situations where we missed logging as they arise.

EDIT: actually may have been restored: kubernetes-sigs/controller-runtime#1245

pkg/webhook/admission: upgrade v1beta1 admission types to v1 (https://github.com/kubernetes-sigs/controller-runtime/pull/1284)

Impact: construct v1 Go types instead of v1beta1. Both versions are
supported on the wire.

Does this affect Hive running on 3.11 cluster as maybe that version is not supported there??

@abhinavdahiya
Copy link
Contributor

abhinavdahiya commented Mar 31, 2021

I didn't see changes to cover these changes in https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.7.0 @dgoodwin

Change default webhook port to 9443 (https://github.com/kubernetes-sigs/controller-runtime/pull/1076)

Impact: update your deployment configuration to use port 9443, or manually
configure the webhook port in manager Options.

I think we have webhooks and making sure the correct port was configured.

Looks like this is ok, we specify --secure-port 9443 in our Deployment and the Service forwards 443 to 9443. I can see the admission hook getting called if I try to make a bad change.

Remove logs from internal controller (https://github.com/kubernetes-sigs/controller-runtime/pull/1096)

Impact: automatic logs for "succesful reconcile" no longer exist
(it tended to duplicate manual logs that folks had in place).

dropping the logs means that we won't get the logs from errors that aren't explicity logged. we tried to capture those here #1263 but i think moving to this version will again drop those error logs.

Reading the discussion in there, I don't think there's any action for us to take. We don't want to log at V(5), we should be logging our own errors,

If controller-runtime return from Reconcile will not log the errors, we should log any error returned by Reconcile... that does not happen today. And the alternative that you have proposed of logging before returning error when we see a bug/requirement is not sustainable.

logging the error returned by Reconcile is important info that should not be left to each return. If the controller-runtime cannot be our backstop, we should enforce all Reconcile functions look like

func (r *Reconciler) Reconcile(_, _) (reconcile.Result, error) {
    // setup logger

    r, err := r.reconcile(_, _, logger)
    if err != nil {
        logger.WithError(err).Error("failed to reconcile")
    }
    return r, err
}

and it sounds like that is the direction controller-runtime is going. Feels like we'll have to handle any situations where we missed logging as they arise.

EDIT: actually may have been restored: kubernetes-sigs/controller-runtime#1245

pkg/webhook/admission: upgrade v1beta1 admission types to v1 (https://github.com/kubernetes-sigs/controller-runtime/pull/1284)

Impact: construct v1 Go types instead of v1beta1. Both versions are
supported on the wire.

Does this affect Hive running on 3.11 cluster as maybe that version is not supported there??

@dgoodwin
Copy link
Contributor Author

I have confirmed that we still get reconciler errors logged by controller-runtime by modifying clusterdeployment_controller to just immediately return an error:

time="2021-03-31T17:00:40.379Z" level=error msg="Reconciler error" _name=clusterdeployment-controller error="new error!!!!!!!!" name=dgoodwin-gcp namespace=hive

Which looks like Alberto's commit restored what we were counting on.

Per slack I'm going to remove the hold, let this roll to stage/int so we can test it on one of the v3 clusters and make sure admission webhooks are working properly.

@dgoodwin
Copy link
Contributor Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2021
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@suhanime
Copy link
Contributor

/hold for rebase

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2021
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2021
@dgoodwin
Copy link
Contributor Author

dgoodwin commented Apr 1, 2021

/hold cancel

Updated with a merge.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 1, 2021
Copy link
Contributor

@joelddiaz joelddiaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, joelddiaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 7a0d639 into openshift:master Apr 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants