WRKLDS-728: controller/apiservice: allow to disable/delete an API service resources #1534
|
@ingvagabund: This pull request references WRKLDS-728 which is a valid jira issue. |
| // GetAPIServicesToMangeFunc provides a full list of managed APIService items. | ||
| // The list needs to always contain all the managed APIServices so only managed | ||
| // objects are removed without removing user-created/unmanaged objects. | ||
| type GetAPIServicesToMangeFunc func() ([]ManagedAPIService, error) |
There was a problem hiding this comment.
why this function couldn't return services that are enabled :) ?
There was a problem hiding this comment.
also this is not a backward-compatible change which will break users of this function
There was a problem hiding this comment.
how will you know if an api service is not enabled ?
There was a problem hiding this comment.
> why this function couldn't return services that are enabled :) ?
The controller needs a list of APIServices that are disabled, so they can be deleted if present.
> how will you know if an api service is not enabled ?
Configurable by anyone who consumes the APIService controller. Atm, the OA operator constructs the list based on enabled capabilities.
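A minimal sketch of how a consumer might build such a list from enabled capabilities. The ManagedAPIService type is reduced to a name plus the Enabled flag, and buildManagedAPIServices, the capability names, and the capability-to-APIService mapping are illustrative assumptions, not the operator's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// ManagedAPIService is a simplified stand-in for the struct proposed in this
// PR; the APIService object is reduced to its name (assumption for brevity).
type ManagedAPIService struct {
	Name    string
	Enabled bool
}

// buildManagedAPIServices is a hypothetical helper. It returns every managed
// APIService, marked enabled or disabled according to its owning capability.
func buildManagedAPIServices(owned map[string][]string, enabledCaps map[string]bool) []ManagedAPIService {
	var out []ManagedAPIService
	for capability, names := range owned {
		for _, name := range names {
			out = append(out, ManagedAPIService{Name: name, Enabled: enabledCaps[capability]})
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	owned := map[string][]string{
		"Build":            {"v1.build.openshift.io"},
		"DeploymentConfig": {"v1.apps.openshift.io"},
	}
	for _, s := range buildManagedAPIServices(owned, map[string]bool{"Build": true}) {
		fmt.Printf("%s enabled=%t\n", s.Name, s.Enabled)
	}
}
```

The point of returning the full set is that a disabled entry is still listed, so the controller knows it may delete it.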
|
also the title of this pr is a bit misleading for me, are we actually going to delete something? |
| curr := coordinate{namespace: apiService.Spec.Service.Namespace, name: apiService.Spec.Service.Name} | ||
| for _, managedAPIService := range apiServices { | ||
| if !managedAPIService.Enabled { | ||
| continue |
There was a problem hiding this comment.
since we are skipping disabled services I would simply return only enabled services in GetAPIServicesToMangeFunc
There was a problem hiding this comment.
Both newEndpointPrecondition and checkDiscoveryForByAPIServices can probably be given only the list of enabled services.
There was a problem hiding this comment.
yes, I would prefer that :)
| for _, managedApiService := range apiServices { | ||
| // Remove disabled APIService | ||
| if !managedApiService.Enabled { | ||
| if err := c.apiregistrationv1Client.APIServices().Delete(ctx, managedApiService.APIService.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) { |
There was a problem hiding this comment.
ah, sorry, i was reading top-to-bottom, i think this is the place that will remove an api service
| // GetAPIServicesToMangeFunc provides a full list of managed APIService items. | ||
| // The list needs to always contain all the managed APIServices so only managed | ||
| // objects are removed without removing user-created/unmanaged objects. | ||
| type GetAPIServicesToMangeFunc func() ([]ManagedAPIService, error) |
There was a problem hiding this comment.
have you considered adding a new function which will return services to be removed ?
There was a problem hiding this comment.
or changing the signature of this method to return (enabled []*apiregistrationv1.APIService, disabled []*apiregistrationv1.APIService, error)
There was a problem hiding this comment.
That would be an alternative. In any case, still a breaking change.
| type ManagedAPIService struct { | ||
| // Enabled signals whether an APIService is enabled. | ||
| // Any disabled APIService is removed from a cluster. | ||
| Enabled bool |
There was a problem hiding this comment.
will this field be used by all clients or only for some?
There was a problem hiding this comment.
Once such a struct is created, it's intended to be used only by the controller.
There was a problem hiding this comment.
What I wanted to ask was if all operands will be required to provide this information.
In general my preference would be to check how many operators consume this function.
If the number is low then I would change the signature of the method to (enabled []*apiregistrationv1.APIService, disabled []*apiregistrationv1.APIService, error).
There was a problem hiding this comment.
Based on https://github.com/search?q=org%3Aopenshift%20NewAPIServiceController&type=code NewAPIServiceController is used only once.
Based on https://github.com/search?q=org%3Aopenshift%20WithAPIServiceController&type=code WithAPIServiceController is used only by openshift-apiserver and cluster authentication operators.
There was a problem hiding this comment.
> What I wanted to ask was if all operands will be required to provide this information.
Yeah. They have to. https://github.com/openshift/cluster-authentication-operator/blob/1b2c022034879e09e324bbdc50b84b78ce8de4ee/pkg/operator/starter.go#L793-L798 builds the list explicitly as well.
There was a problem hiding this comment.
An alternative would be to provide a filter function that, if not provided, will include every api service. On the other hand, such a filter will get invoked over and over again instead of constructing the list of enabled/disabled services only once at the beginning. The list of enabled/disabled api services is expected to be static or to change very rarely.
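A rough sketch of the filter alternative, to make the trade-off concrete. APIServiceFilter and enabledNames are hypothetical names; the counter only illustrates that the filter runs on every sync rather than once at startup:

```go
package main

import "fmt"

// APIServiceFilter is a hypothetical predicate; returning true keeps the
// APIService enabled.
type APIServiceFilter func(name string) bool

// enabledNames applies the filter on every call. A nil filter includes
// every API service.
func enabledNames(all []string, filter APIServiceFilter) []string {
	var out []string
	for _, name := range all {
		if filter == nil || filter(name) {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	all := []string{"v1.apps.openshift.io", "v1.build.openshift.io"}
	calls := 0
	noBuilds := func(name string) bool {
		calls++
		return name != "v1.build.openshift.io"
	}
	fmt.Println(enabledNames(all, nil))      // no filter: everything is included
	fmt.Println(enabledNames(all, noBuilds)) // first sync
	fmt.Println(enabledNames(all, noBuilds)) // second sync: the filter runs again
	fmt.Println("filter invocations:", calls)
}
```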
There was a problem hiding this comment.
OK, since there are only two operators using this controller I would:
- Change the signature of the method to GetAPIServicesToMangeFunc func() (enabled []*apiregistrationv1.APIService, disabled []*apiregistrationv1.APIService, error)
- Rename precondition to preconditionForEnabledAPIServices
- Split the sync method into two flows, one for reconciling enabled services and the second for reconciling disabled services.
Does it make sense ?
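A sketch of what the two-list signature could look like in compilable form. APIService here is a local stand-in for apiregistrationv1.APIService, and staticLists is a hypothetical constructor. Go requires result parameters to be either all named or all unnamed, so the error result carries a name as well:

```go
package main

import "fmt"

// APIService is a stand-in for apiregistrationv1.APIService (assumption).
type APIService struct{ Name string }

// GetAPIServicesToMangeFunc sketches the two-list signature discussed above.
type GetAPIServicesToMangeFunc func() (enabled []*APIService, disabled []*APIService, err error)

// staticLists is a hypothetical constructor for the case where both lists
// are computed once at startup and never change.
func staticLists(enabled, disabled []string) GetAPIServicesToMangeFunc {
	toObjs := func(names []string) []*APIService {
		out := make([]*APIService, 0, len(names))
		for _, n := range names {
			out = append(out, &APIService{Name: n})
		}
		return out
	}
	e, d := toObjs(enabled), toObjs(disabled)
	return func() ([]*APIService, []*APIService, error) { return e, d, nil }
}

func main() {
	get := staticLists([]string{"v1.apps.openshift.io"}, []string{"v1.build.openshift.io"})
	enabled, disabled, err := get()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(enabled), "enabled,", len(disabled), "disabled")
}
```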
There was a problem hiding this comment.
Done. It looks better now.
There was a problem hiding this comment.
Thanks, I have added a few more comments/questions, PTAL.
| } | ||
|
|
||
| err = c.syncAPIServices(ctx, apiServices, syncCtx.Recorder()) | ||
| err = c.syncAPIServices(ctx, enabledApiServices, disabledApiServices, syncCtx.Recorder()) |
There was a problem hiding this comment.
How about adding a separate method for reconciling disabledApiServices and calling it before preconditionForEnabledAPIServices (line 101)?
Having two separate methods would allow us to reconcile enabledApiServices and disabledApiServices independently. I think we always want to reconcile both and aggregate the errors. What do you think ?
|
|
||
| for _, apiService := range apiServices { | ||
| for _, apiService := range disabledApiServices { | ||
| if err := c.apiregistrationv1Client.APIServices().Delete(ctx, apiService.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) { |
There was a problem hiding this comment.
I think we should get an APIService from the cache first as that would allow us to put less pressure on the server.
We should issue a live request only when we know the resource must be deleted.
What do you think?
| type GetAPIServicesToMangeFunc func() ([]*apiregistrationv1.APIService, error) | ||
| // GetAPIServicesToMangeFunc provides a two list of managed APIService items. | ||
| // Both lists need to always contain all the managed APIServices so only managed | ||
| // objects are removed without removing user-created/unmanaged objects. |
There was a problem hiding this comment.
what do you mean by without removing user-created/unmanaged objects ?
There was a problem hiding this comment.
Comment updated. The controller needs to avoid deleting APIServices that are not part of the OpenShift API. I.e. the controller needs to explicitly know which APIServices are disabled instead of deleting all not enabled APIServices.
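To illustrate why the disabled list has to be explicit, here is a small sketch. namesToDelete is a hypothetical helper, and the non-OpenShift APIService name stands for anything user-created in the cluster:

```go
package main

import "fmt"

// namesToDelete returns only the APIServices that are both present in the
// cluster and explicitly listed as disabled. Anything else, including
// user-created objects, is left untouched.
func namesToDelete(present, disabled []string) []string {
	inCluster := make(map[string]bool, len(present))
	for _, name := range present {
		inCluster[name] = true
	}
	var out []string
	for _, name := range disabled {
		if inCluster[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	present := []string{
		"v1.apps.openshift.io",    // enabled, managed
		"v1.build.openshift.io",   // disabled, managed
		"v1beta1.metrics.k8s.io",  // not managed by this controller
	}
	disabled := []string{"v1.build.openshift.io"}
	fmt.Println(namesToDelete(present, disabled))
}
```

Deleting "everything that is not enabled" instead would also remove the unmanaged entry, which is the case the updated comment guards against.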
|
/retest-required |
| operatorClient v1helpers.OperatorClient | ||
| kubeClient kubernetes.Interface | ||
| apiregistrationv1Client apiregistrationv1client.ApiregistrationV1Interface | ||
| apiservicelister apiregistrationv1lister.APIServiceLister |
There was a problem hiding this comment.
isn't that strange that this controller didn't use apiservicelister ?
There was a problem hiding this comment.
Probably not taken into account when #667 was reviewed.
| for _, apiService := range apiServices { | ||
| if err := c.apiregistrationv1Client.APIServices().Delete(ctx, apiService.Name, metav1.DeleteOptions{}); err != nil { | ||
| for _, apiService := range append(enabledApiServices, disabledApiServices...) { | ||
| if _, err := c.apiservicelister.Get(apiService.Name); err == nil { |
There was a problem hiding this comment.
how about simply calling syncDisabledAPIServices with combined enabledApiServices, disabledApiServices ?
| var availableConditionMessages []string | ||
|
|
||
| for _, apiService := range apiServices { | ||
| if _, err := c.apiservicelister.Get(apiService.Name); err == nil { |
There was a problem hiding this comment.
should we also consider a case in which a service has been already marked for deletion but hasn't been deleted?
(I think that would require checking the DeletionTimestamp field)
There was a problem hiding this comment.
That should be idempotent. Deleting an object that's in the process of deletion should be a no-op. While Get should still return the object without an error.
There was a problem hiding this comment.
It is idempotent but we will hammer the API unnecessarily. Does it make sense?
There was a problem hiding this comment.
I agree this will reduce the toll. On the other hand I wonder if checking the DeletionTimestamp is something we already do in other controllers or whether this is a novel improvement. If the latter, I prefer to refactor the code more and implement generic methods like DeleteOnlyIfPresent. Probably automatically generated for all our APIs. So we don't re-invent the check all over the place. Nevertheless, as part of another PR. What do you think?
There was a problem hiding this comment.
I don't think it is a novel approach. For example, it is being used in:
I can try to find more examples if you want.
There was a problem hiding this comment.
I was wondering if we have an example in our operators. Nevertheless, checking for the DeletionTimestamp is easy to implement. Will do.
|
|
||
| disErr := c.syncDisabledAPIServices(ctx, disabledApiServices) | ||
|
|
||
| ready, err := c.preconditionForEnabledAPIServices(enabledApiServices) |
There was a problem hiding this comment.
what if both syncDisabledAPIServices and preconditionForEnabledAPIServices return some error?
do we want to report errors from syncDisabledAPIServices ?
i think the errors will be ignored (not reported) as it is right now.
There was a problem hiding this comment.
If syncDisabledAPIServices returns an error, there are two cases that can happen:
- preconditionForEnabledAPIServices returns err != nil or !ready, in which case the APIServicesAvailable condition gets reported as false. Making error returned by syncDisabledAPIServices kinda obsolete since the APIServicesAvailable is already false.
- preconditionForEnabledAPIServices returns err == nil and ready, in which case the error returned by syncDisabledAPIServices gets checked. Producing APIServicesAvailable condition set to false. Just with a different error string.
There was a problem hiding this comment.
Does setting APIServicesAvailable when syncDisabledAPIServices returns some errors make sense ?
If all services were available and we failed to remove one service would you expect to find the error in APIServicesAvailable ? Maybe we should report it as a separate condition? (Especially that we already made distinction between enabled/disabled on the API level)
There was a problem hiding this comment.
The current implementation does not make the reported condition worse. Right now it will get reported with Reason: Error. Whether that happens before or after preconditionForEnabledAPIServices makes no difference. Failing to remove or create an APIService object falls under the same category of error. If we decide to distinguish between errors related to removing and creating an APIService, we should probably introduce a new condition for both Failed to create and Failed to check discovery as well (implemented under syncEnabledAPIServices).
There was a problem hiding this comment.
I wonder whether there's a value in reporting non-nil error for return c.syncDisabledAPIServices(ctx, append(enabledApiServices, disabledApiServices...)) through a condition under:
switch operatorConfigSpec.ManagementState {
...
case operatorsv1.Removed:
    enabledApiServices, disabledApiServices, err := c.getAPIServicesToManageFn()
    if err != nil {
        return err
    }
    return c.syncDisabledAPIServices(ctx, append(enabledApiServices, disabledApiServices...))
There was a problem hiding this comment.
I'm not sure if we want to go Degraded. I think that:
    Type: "APIServicesDeleted",
    Status: operatorv1.ConditionFalse,
    Reason: "DisabledAPIServicesPresent",
    Message: err.Error(),
should be fine.
There was a problem hiding this comment.
Also, We could split the APIServiceController.sync method into two flows for simplicity. I think I would do:
func (c *APIServiceController) sync(ctx context.Context, syncCtx factory.SyncContext) error {
    operatorConfigSpec, _, _, err := c.operatorClient.GetOperatorState()
    if err != nil {
        return err
    }
    switch operatorConfigSpec.ManagementState {
    case operatorsv1.Managed:
        // leave the switch and reconcile below
    case operatorsv1.Unmanaged:
        return nil
    case operatorsv1.Removed:
        enabledApiServices, disabledApiServices, err := c.getAPIServicesToManageFn()
        if err != nil {
            return err
        }
        return c.syncDisabledAPIServices(ctx, append(enabledApiServices, disabledApiServices...))
    default:
        syncCtx.Recorder().Warningf("ManagementStateUnknown", "Unrecognized operator management state %q", operatorConfigSpec.ManagementState)
        return nil
    }
    // always run both flows and collect their errors
    var errs []error
    // or if we rename syncDisabledAPIServices to something else then
    // we could call it syncDisabledAPIServices (which would also set the new status)
    if err := c.reconcileDisabledServices(); err != nil {
        errs = append(errs, err)
    }
    if err := c.syncEnabledServices(); err != nil {
        errs = append(errs, err)
    }
    // errors is the aggregate-errors package already used by this controller
    return errors.NewAggregate(errs)
}
There was a problem hiding this comment.
Would syncEnabledServices include calling the preconditionForEnabledAPIServices as well?
There was a problem hiding this comment.
as an optimisation step we could collect conditions from the individual sync* methods and update the config just once :)
|
/label tide/merge-method-squash |
|
/unlabel tide/merge-method-squash |
|
/remove-label tide/merge-method-squash |
|
/shrug |
| for _, apiService := range apiServices { | ||
| if apiServiceObj, err := c.apiservicelister.Get(apiService.Name); err == nil { | ||
| if apiServiceObj.DeletionTimestamp != nil { | ||
| continue |
There was a problem hiding this comment.
Technically objects can get stuck in deletion (finalisers) for a long time.
Do we want to handle this case somehow (log, set something on the condition) ?
There was a problem hiding this comment.
syncEnabledAPIServices returns an error when the corresponding api service Available condition is not ready. Updated the code to return error when a corresponding service is not deleted yet.
There was a problem hiding this comment.
update: we decided to log instead of returning the error
| Message: err.Error(), | ||
| })); updateErr != nil { | ||
| return errors.NewAggregate([]error{err, updateErr}) | ||
| }), v1helpers.UpdateConditionFn(conditionAPIServicesDegraded)); updateErr != nil { |
There was a problem hiding this comment.
I don't want to complicate the code but it looks like it might be easy to forget which conditions should be updated. I like the way it was done in the workload controller. Maybe we could do something similar here?
I think it boils down to:
- define conditions with default values
- register an update function for conditions in the defer statement
- when you need to end processing, just set the condition and return (it will be updated in the defer statement).
There was a problem hiding this comment.
Done. I did not find any use for updateGenerationFn though.
| Status: operatorv1.ConditionTrue, | ||
| } | ||
|
|
||
| if syncEnabledAPIServicesErr != nil { |
There was a problem hiding this comment.
I think we should check if for any error or always for simplicity, wdyt?
| // the next process will perform the checks immediately after the startup | ||
| select { | ||
| case <-ctx.Done(): | ||
| nerr := fmt.Errorf("the operator is shutting down, skipping updating availability of the aggreaged APIs, err = %v", syncEnabledAPIServicesErr) |
There was a problem hiding this comment.
just adjust comment so that it is more generic
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ingvagabund, p0lyn0mial |
|
@ingvagabund: all tests passed! |
|
/hold |
|
just one more thing, it looks like we are going to always set the new condition, even if the |
it seems that setting the new condition conditionally would make the code more complicated. Besides, on clusters with this feature disabled it will always be set to
/hold cancel |
SSIA
Needed by openshift/cluster-openshift-apiserver-operator#532