Skip to content

Conversation

@Miciah
Copy link
Contributor

@Miciah Miciah commented Sep 27, 2018

Update the status conditions of ClusterIngress objects to indicate synchronization status and availability.

Watch Services in cluster-ingress-operator's namespace in order to reflect their status and any associated load-balancer ingresses to the corresponding ClusterIngress objects.

  • cmd/cluster-ingress-operator/main.go: Watch services and endpoints in the operator's namespace.
  • manifests/00-cluster-role.yaml: Allow operator to get services and endpoints.
  • pkg/apis/ingress/v1alpha1/status.go: Add new functions for setting status on ClusterIngress objects: setStatusCondition, SetStatusSyncCondition, SetStatusAvailableCondition, clusterIngressStatusEqual, UpdateStatus.
  • pkg/apis/ingress/v1alpha1/types.go: Add fields to ClusterIngressStatus for operator and load balancer status.
  • pkg/apis/ingress/v1alpha1/status_test.go: Add tests.
  • pkg/stub/handler.go: Use SetStatusSyncCondition, SetStatusAvailableCondition, and UpdateStatus in syncIngressUpdate.
    Add new functions: getClusterIngress, syncServiceUpdate, syncEndpointsUpdate, and syncServiceOrEndpointsUpdate.
    Use these new functions in Handle to handle Service and Endpoints events and to update the corresponding ClusterIngress status according to the Service and Endpoints status.

This PR depends on #47.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 27, 2018
@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 27, 2018
@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 27, 2018
@Miciah Miciah force-pushed the update-ClusterIngress-status branch from d1dd7ad to 55e50d0 Compare September 27, 2018 18:01
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 27, 2018
@Miciah Miciah force-pushed the update-ClusterIngress-status branch from 55e50d0 to b782675 Compare September 28, 2018 06:14
@Miciah
Copy link
Contributor Author

Miciah commented Sep 28, 2018

Rebased.

ci.SetStatusCondition(condition)
}

// clusterIngressStatusEqual returns true iff the status conditions (ignoring
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo "iff"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's short for "if and only if". I see it used a lot in included packages, but I can expand it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}

err := sdk.Update(ci)
for tries := 3; tries > 0; tries-- {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the loop needed?
the err variable is updated outside the loop or am i missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! err := sdk.Update(ci) needs to be inside the loop. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted the loop. Now UpdateStatus returns an error value, and callers return any error from UpdateStatus to the handler so that the event gets reprocessed.

@Miciah Miciah force-pushed the update-ClusterIngress-status branch 3 times, most recently from 693970d to f7c0d9f Compare October 2, 2018 14:49
@Miciah
Copy link
Contributor Author

Miciah commented Oct 2, 2018

Latest push deletes the retry logic in UpdateStatus; instead, UpdateStatus returns an error value, and its callers return any error from UpdateStatus to the handler so that the operator will retry processing the event.

@Miciah Miciah changed the title [WIP] Update ClusterIngress status and OperatorStatus Update ClusterIngress status and OperatorStatus Oct 2, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2018
@Miciah Miciah force-pushed the update-ClusterIngress-status branch 3 times, most recently from 30ce1e0 to 5c07898 Compare October 2, 2018 15:03
@imcsk8
Copy link
Contributor

imcsk8 commented Oct 2, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 2, 2018
@ironcladlou
Copy link
Contributor

I plan to review this Friday, I need the branch to be as stable as possible through Thursday. Thanks!

Copy link

@pravisankar pravisankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments, rest of the changes lgtm.

if numConditions != len(oldCi.Status.Conditions) {
return false
}
for i := 0; i < numConditions; i++ {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on various circumstances, we could end up with:
Old ci conditions in order: { A, B, C }
New ci conditions in order: { B, C, A }
I think we should return true even if they are in out of order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

if numIngresses != len(oldCi.Status.LoadBalancer.Ingress) {
return false
}
for i := 0; i < numIngresses; i++ {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, check should be agnostic to ingress order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

operatorName, err := k8sutil.GetOperatorName()
if err != nil {
logrus.Fatalf("Failed to get operator name: %v", err)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k8sutil.GetWatchNamespace(), k8sutil.GetOperatorName() can be initially cached in Handler struct and reused in every status update (Nice to do but i'm fine with this for now)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the fields to Handler. Since this means CreateOrUpdateOperatorStatus needs the Handler, and since it didn't really need the ClusterIngress, I changed its receiver to h *Handler, moved it to handler.go, and changed it to createOrUpdateOperatorStatus.


return nil
case *corev1.Service:
isDeleted := o.GetDeletionTimestamp() != nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be a case where event.Deleted = true and o.GetDeletionTimestamp() == nil?
or is it better to check isDeleted := (event.Deleted || (o.GetDeletionTimestamp() != nil))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that event.Deleted will go away that and o.GetDeletionTimestamp() will be the source of truth.

}

numIngresses := len(ci.Status.LoadBalancer.Ingress)
if numIngresses != len(oldCi.Status.LoadBalancer.Ingress) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perform load balancer checks only when ci.Spec.HighAvailability.Type == ingressv1alpha1.CloudClusterIngressHA ?
When ci.Spec.HighAvailability.Type != CloudClusterIngressHA , need to clear old Status.LoadBalancer field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you point out in another comment, the logic in syncServiceUpdate needs to be fixed for ci.Spec.HighAvailability.Type != CloudClusterIngressHA, but once that is fixed, I think the equality check should still always compare .Status.LoadBalancer.Ingress. If we ever support changing from CloudClusterIngressHA to some other value, we want the equality check to return false so that we update with the status with the empty list of ingresses. Am I missing anything?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. Once we fix syncServiceUpdate, the equality check need to be performed to ensure there is no change in cluster ingress type.

return nil
}

if len(service.Status.LoadBalancer.Ingress) == 0 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check valid only in case of ci.Spec.HighAvailability.Type == ingressv1alpha1.CloudClusterIngressHA

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Thanks!

}

ci.SetStatusAvailableCondition(true, "")
ci.Status.LoadBalancer.Ingress = service.Status.LoadBalancer.Ingress

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set this field only when ci.Spec.HighAvailability.Type == ingressv1alpha1.CloudClusterIngressHA ?

@Miciah
Copy link
Contributor Author

Miciah commented Oct 5, 2018

Latest push makes the following changes per Ravi's comments:

• Move ClusterIngressStatusEqual from status.go to handler.go, rename it to clusterIngressStatusEqual, and change its receiver from ci *ClusterIngress to h *Handler
• Add operatorNamespace and operatorName to Handler and initialize them in NewHandler.
• Use h.operatorNamespace and h.operatorName in clusterIngressStatusEqual.
• Change the equality check in clusterIngressStatusEqual to be agnostic to the ordering of conditions and ingresses.
• Change syncServiceUpdate to set the ClusterIngress's "Available" condition to "True" as soon as endpoints are ready without checking ingresses if the ClusterIngress's type is not CloudClusterIngressHA

@Miciah Miciah force-pushed the update-ClusterIngress-status branch from 5c07898 to 30f4f50 Compare October 5, 2018 04:13
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2018
// objects are equal.
func clusterIngressStatusEqual(oldCi, ci *ClusterIngress) bool {
numConditions := len(ci.Status.Conditions)
if numConditions != len(oldCi.Status.Conditions) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: numConditions var is no longer needed after this check, we could do len(ci.Status.Conditions) != len(oldCi.Status.Conditions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}
}

numIngresses := len(ci.Status.LoadBalancer.Ingress)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}

numIngresses := len(ci.Status.LoadBalancer.Ingress)
if numIngresses != len(oldCi.Status.LoadBalancer.Ingress) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. Once we fix syncServiceUpdate, the equality check need to be performed to ensure there is no change in cluster ingress type.


ciName := service.Name[7:]

ciNamespace, err := k8sutil.GetWatchNamespace()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use h.operatorNamespace instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

@Miciah Miciah force-pushed the update-ClusterIngress-status branch from 30f4f50 to f5a0c27 Compare October 5, 2018 20:40
@Miciah
Copy link
Contributor Author

Miciah commented Oct 5, 2018

Latest push makes the following changes per Ravi's comments:

• Delete numConditions and numIngresses in clusterIngressStatusEqual.
• Use h.operatorNamespace in syncServiceUpdate.

I made the following additional changes:

• Added some unit tests in status_test.go.
• Renamed createOrUpdateOperatorStatus to syncOperatorStatus.

After some further consideration, I think we should discuss during the next planning call how the operator should handle the "Available" condition when ci.Spec.HighAvailability.Type is not CloudClusterIngressHA.

Copy link

@pravisankar pravisankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2018
@Miciah Miciah force-pushed the update-ClusterIngress-status branch from f5a0c27 to 0c4d519 Compare October 5, 2018 21:15
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2018
Copy link

@pravisankar pravisankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: imcsk8, Miciah, pravisankar
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: ironcladlou

If they are not already assigned, you can assign the PR to them by writing /assign @ironcladlou in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 11, 2018
* Gopkg.lock:
* Gopkg.toml:
* vendor/github.com/openshift/cluster-version-operator: Import package
in order to have the necessary API types for updating OperatorStatus.
Create (if necessary) and update an OperatorStatus object to indicate
the operator's status.

* manifests/00-cluster-role.yaml: Allow the operator to create, get, and update
operatorstatuses.
* pkg/stub/handler.go: Add new fields to Handler: operatorNamespace and
operatorName, and initialize them in NewHandler.
Add new functions: syncOperatorStatus and syncOperatorStatusDegraded.
Use syncOperatorStatus in Handle to update OperatorStatus.
Use syncOperatorStatusDegraded in syncIngressUpdate for errors that
require intervention (such as RBAC problems).
Update the status conditions of ClusterIngress objects to indicate
synchronization status and availability.

Watch Services in cluster-ingress-operator's namespace in order to
reflect their status and any associated load-balancer ingresses to the
corresponding ClusterIngress objects.

* cmd/cluster-ingress-operator/main.go: Watch services and endpoints in the
operator's namespace.
* manifests/00-cluster-role.yaml: Allow operator to get services and
endpoints.
* pkg/apis/ingress/v1alpha1/status.go: Add new functions for setting
status on ClusterIngress objects: setStatusCondition,
SetStatusSyncCondition, SetStatusAvailableCondition,
clusterIngressStatusEqual, UpdateStatus.
* pkg/apis/ingress/v1alpha1/types.go: Add fields to ClusterIngressStatus
for operator and load balancer status.
* pkg/apis/ingress/v1alpha1/status_test.go: Add tests.
* pkg/stub/handler.go: Use SetStatusSyncCondition,
SetStatusAvailableCondition, and UpdateStatus in syncIngressUpdate.
Add new functions: getClusterIngress, syncServiceUpdate,
syncEndpointsUpdate, and syncServiceOrEndpointsUpdate.
Use these new functions in Handle to handle Service and Endpoints events
and to update the corresponding ClusterIngress status according to the
Service and Endpoints status.
@Miciah Miciah force-pushed the update-ClusterIngress-status branch from 0c4d519 to 00c747f Compare October 12, 2018 22:24
@openshift-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 12, 2018
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 12, 2018
@Miciah
Copy link
Contributor Author

Miciah commented Oct 12, 2018

Latest push removes the OperatorStatus code; this PR now depends on #47 to import github.com/openshift/cluster-version-operator, which has some required types. This push also adds a watch on endpoints and event handling for the same in order to resolve a race condition that would leave the "Available" condition as false with the message "service has no endpoints" when a service's ingresses are ready before its endpoints.

@Miciah Miciah changed the title Update ClusterIngress status and OperatorStatus Publish ClusterIngress status Oct 12, 2018
@imcsk8
Copy link
Contributor

imcsk8 commented Oct 30, 2018

Looks ok, just rebase and squash.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2018
@ironcladlou
Copy link
Contributor

This is probably hopelessly out of date; going to close it and revisit later.

/close

@openshift-ci-robot
Copy link
Contributor

@ironcladlou: Closing this PR.

Details

In response to this:

This is probably hopelessly out of date; going to close it and revisit later.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants