Update the progressive policy rollout design #118

mprahl · 2024-04-20T18:11:51Z

Note that since there's so much change, it's best to treat this like a new document when reviewing the changes (e.g. look at the markdown file directly instead of the diff).

Part of the proposal applies to all of Open Cluster Management.

mprahl · 2024-04-20T18:12:19Z

/cc @dhaiducek @JustinKuli @yiraeChristineKim @gparvin

mprahl · 2024-04-20T18:13:50Z

/cc @imiller0 in case you are interested!

mprahl · 2024-04-26T18:00:25Z

/cc @serngawy @qiujian16 @jnpacker

yiraeChristineKim · 2024-04-26T18:59:37Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

-version of the policy, and GitOps is the recommended method for this.
+- **Rollbacks** - Reverting a policy does not necessarily rollback the change. It's up to the user to determine how to
+  rollback in the event of a rollout failure.
+- **Group Rollouts** - Rollouts of a group of policies is not a goal at this moment due to technical and user experience


Does "group of policies" mean a policy set?

Correct

yiraeChristineKim · 2024-04-26T19:19:42Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+
+If the user wants the approval to be conditional on a policy version, the user must set `spec.approvalsForVersion` to a
+value that matches the `policy.open-cluster-management.io/version` annotation on the root policy. By default, the
+approvals apply to all policy versions.


Apply to latest?

"All versions" is a little confusing to me.

You're right. I'll update that.

yiraeChristineKim · 2024-04-26T19:23:22Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+
+| Status          | Description                                                                                           |
+| --------------- | ----------------------------------------------------------------------------------------------------- |
+| `ToApply`       | The cluster is waiting for the new version of the policy but the old version is still applied.        |


Optional: to me, `Await` or `Ready` would be clearer names; `ToApply` is not clear to me.

This enum has been out there for a little bit, so I suspect there may be opposition to updating this particular one.

yiraeChristineKim · 2024-04-26T19:26:05Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+  namespace: <same as root policy>
+spec:
+  approvalsForVersion: "2.0" <-- optional if the user wants to tie the approval to a specific policy "version".
+  decisionGroups:


So this is the order of groups?

The order here doesn't actually matter. I prefer using a map in these cases, but most Kubernetes APIs use lists of objects with name keys, so this copies that. The order in the placement matters in terms of priority, but the groups are not dependencies of one another like in `progressivePerGroup`.
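
To illustrate the reply above, here is a minimal Python sketch. It assumes each `decisionGroups` entry carries a `name` key plus a hypothetical `approved` flag; the flag name, the helper, and its arguments are illustrative and not taken from the proposal. The sketch indexes the list by name, so the list order has no effect, while the group order from the placement decides which approved group is picked first.

```python
def next_group_to_roll_out(placement_order, decision_groups, succeeded):
    """Return the next approved group to roll out to, or None.

    placement_order: group names in placement priority order.
    decision_groups: list of {"name": ..., "approved": ...} entries
                     ("approved" is a hypothetical field name).
    succeeded: set of group names that already finished rolling out.
    """
    # Index the list by name; the list order is deliberately discarded.
    approvals = {g["name"]: g.get("approved", False) for g in decision_groups}

    # Placement order sets priority, but an unapproved group does not
    # block a later approved group (groups are not dependencies).
    for group in placement_order:
        if group not in succeeded and approvals.get(group, False):
            return group
    return None
```

With both `dev` and `prod` approved, the function returns `dev` first regardless of how the two entries are ordered in the list.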

yiraeChristineKim · 2024-04-26T19:27:30Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+    progressivePerGroup:
+      minSuccessTime: 5m
+        progressDeadline: 10m
+        maxFailures: 2%


Do we have something like fail-fast, where the rollout stops progressing as soon as a failure occurs?

The default is to stop rollouts on failures. These tuning points allow it to continue within customizable thresholds.
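
A rough Python sketch of the threshold check described in the reply above. It assumes `maxFailures` accepts either a count or a percentage string such as `2%`, and that a default of zero models "stop on failures". The function name is hypothetical, and the exact rounding the propagator applies to percentages is not stated in this thread, so the integer comparison below is an assumption. `minSuccessTime` and `progressDeadline` are not modeled.

```python
def rollout_may_continue(failed, total, max_failures="0"):
    """Return True if the failed-cluster count is within the threshold.

    max_failures: an integer count, or a string like "2%" (assumed format).
    """
    if isinstance(max_failures, str) and max_failures.endswith("%"):
        percent = int(max_failures[:-1])
        # Compare as integers to avoid picking a rounding rule.
        return failed * 100 <= percent * total
    return failed <= int(max_failures)
```

With the default, a single failure halts the rollout; with `maxFailures: 2%` and 100 clusters, up to two failures are tolerated.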

yiraeChristineKim · 2024-04-26T19:29:43Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+1. The updated policy applies to all clusters in the `stage` group. These clusters have the `rolloutStatus` of
+   `InProgress`.
+1. All the clusters in the `stage` group become compliant. Their `rolloutStatus` is set to `Succeeded`.
+1. The updated policy applies to all clusters in the `prod` group. These clusters have the `rolloutStatus` of


Does the individual status of a group only appear on `kind: Rollout`?

I'm not proposing statuses per group but only rollout statuses per cluster and a summary rollout status to reflect all clusters.

yiraeChristineKim · 2024-04-26T19:31:47Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+#### Updated Policy With manualPerGroup in Order
+
+1. The root policy is updated.
+1. All clusters have the `rolloutStatus` of `ToApply` and keep the previous policy version. The root policy


So if the user sets both dev and prod to true at the same time, the order would be dev -> prod?

Yes, that's correct.

mprahl · 2024-04-26T19:33:49Z

/hold for reviews

yiraeChristineKim · 2024-04-26T19:42:43Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+new cluster would receive the last successful policy version, if any, until the rollout progressed to the `stage` group.
+
+If the cluster is added to an earlier group than the group that is `InProgress`, the rollout switches back to that group
+and waits for the cluster to rollout before resuming progress at the point previous to the new cluster being added.


My understanding is that a new cluster is in progress as soon as it is added, and then the previously progressed cluster restarts its progress. Is this correct?

Not quite. The new cluster always gets a policy: either the last successful version or the latest version. Which one depends on whether the cluster is in a group that is being rolled out to or has already been rolled out to.

For example, suppose you have dev, stage, and prod groups and the rollout is on the stage group.

If the new cluster is a dev cluster, the rollout goes back to dev and applies the latest policy to the new cluster before going back to stage.

If the new cluster is a prod cluster, the new cluster gets the last successful policy until the rollout is on the prod group. Then the new cluster gets the latest policy.
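
The dev, stage, and prod example above can be sketched in Python as follows. This is only an illustration of the described behavior; the function name, its arguments, and the tuple it returns are hypothetical and not part of the proposal.

```python
def place_new_cluster(group_order, in_progress, cluster_group,
                      latest, last_successful):
    """Return (policy version for the new cluster, group the rollout works on).

    group_order: group names in rollout order, e.g. ["dev", "stage", "prod"].
    in_progress: the group the rollout is currently on.
    last_successful: may be None if no version has succeeded yet.
    """
    new_idx = group_order.index(cluster_group)
    cur_idx = group_order.index(in_progress)

    if new_idx <= cur_idx:
        # The group is being, or has already been, rolled out to: the new
        # cluster gets the latest version, and the rollout works on that
        # group until the cluster finishes, then resumes where it was.
        return latest, cluster_group

    # The rollout has not reached this group yet: keep the last successful
    # version (if any) and leave the rollout where it is.
    return last_successful, in_progress
```

For a rollout on `stage`, a new `dev` cluster yields the latest version with the rollout switching back to `dev`, while a new `prod` cluster yields the last successful version and the rollout stays on `stage`.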

yiraeChristineKim · 2024-04-26T19:47:36Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+In this situation, the first option is to update the policy spec in some way which would completly restart the rollout.
+
+The second option is to use the `spec.retryRollout.rolloutUID` field on the `Rollout` object. When set to a UID that
+matches `status.rolloutUID`, the Governance Policy Propagator will restart the rollout and create a new rollout UID.


Are we going to save snapshots of the policy and rollout?
I expected the second option to retry only the failed group.

Could you please rephrase the question?

yiraeChristineKim · 2024-04-26T19:50:42Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+  remediationAction: enforce
+  rolloutStrategy:
+    ignoreClusterRolloutStatus:
+      matchExpressions:


Why use this instead of a simple group name?

You might have prod clusters that are flaky but you still want to apply the policy to them without ignoring the status of all prod clusters.

yiraeChristineKim · 2024-04-26T20:07:05Z

enhancements/sig-policy/99-policy-placement-strategy/README.md


 ## Design Details

 ### Open Questions

-1. Should the per-cluster status on the root policy be grouped similar to how they're grouped in the
-   `PlacementDecisions`?
+1. Should the `spec.rolloutStrategy.ignoreClusterRolloutStatus` field be contributed to the rollout strategy API?


Does `ignoreClusterRolloutStatus` ignore cluster group failures? Are matching clusters excluded from the `maxFailures` calculation?

Essentially, the cluster gets the new policy version when its group is being rolled out to, but its status is ignored and not counted towards failure or success.
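
As a hedged illustration of the reply above, the Python sketch below tallies cluster results while skipping clusters selected by `matchExpressions`. The selector operators (`In`, `NotIn`, `Exists`, `DoesNotExist`) follow standard Kubernetes label selector semantics; the cluster dictionary shape, the `flaky` label in the usage note, and the helper names are assumptions made for the example. Treating a missing selector as "ignore nothing" is also an assumption.

```python
def matches(labels, match_expressions):
    """Evaluate Kubernetes-style matchExpressions against a label dict."""
    for expr in match_expressions:
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            # NotIn also matches objects that lack the key.
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


def tally(clusters, ignore_expressions=None):
    """Count rollout results, leaving ignored clusters out of both totals."""
    counts = {"succeeded": 0, "failed": 0, "ignored": 0}
    for cluster in clusters:
        if ignore_expressions and matches(cluster["labels"], ignore_expressions):
            # Still receives the policy, but counts as neither outcome.
            counts["ignored"] += 1
        elif cluster["compliant"]:
            counts["succeeded"] += 1
        else:
            counts["failed"] += 1
    return counts
```

For instance, selecting clusters labeled `flaky: "true"` keeps a known-unreliable prod cluster from counting toward `maxFailures` without ignoring the rest of the prod group.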

dhaiducek

Looks really good! Thanks for the updates! I have a couple questions/comments.

dhaiducek · 2024-04-30T19:35:16Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+
+| Status          | Description                                                                                           |
+| --------------- | ----------------------------------------------------------------------------------------------------- |
+| `ToApply`       | The cluster is waiting for the new version of the policy but the old version is still applied.        |


This enum has been out there for a little bit, so I suspect there may be opposition to updating this particular one.

dhaiducek · 2024-04-30T19:39:00Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

-version of the policy, and GitOps is the recommended method for this.
+- **Rollbacks** - Reverting a policy does not necessarily rollback the change. It's up to the user to determine how to
+  rollback in the event of a rollout failure.
+- **Group Rollouts** - Rollouts of a group of policies is not a goal at this moment due to technical and user experience


Does "group of policies" mean a policy set?

Correct

dhaiducek · 2024-04-30T19:39:18Z

enhancements/sig-policy/99-policy-placement-strategy/README.md

+    progressivePerGroup:
+      minSuccessTime: 5m
+        progressDeadline: 10m
+        maxFailures: 2%


Do we have something like fail-fast, where the rollout stops progressing as soon as a failure occurs?

The default is to stop rollouts on failures. These tuning points allow it to continue within customizable thresholds.

openshift-ci · 2024-05-01T19:26:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mprahl, yiraeChristineKim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~enhancements/sig-policy/OWNERS~~ [mprahl]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot requested review from deads2k and dhaiducek April 20, 2024 18:11

openshift-ci bot added the approved label Apr 20, 2024

openshift-ci bot requested review from gparvin, JustinKuli and yiraeChristineKim April 20, 2024 18:12

mprahl force-pushed the progressive-rollout branch 3 times, most recently from e8a7a60 to c1d8c13 Compare April 26, 2024 17:59

openshift-ci bot requested review from jnpacker, qiujian16 and serngawy April 26, 2024 18:00

yiraeChristineKim reviewed Apr 26, 2024

openshift-ci bot added the do-not-merge/hold label Apr 26, 2024

yiraeChristineKim reviewed Apr 26, 2024

dhaiducek reviewed Apr 30, 2024

mprahl force-pushed the progressive-rollout branch from c1d8c13 to 5bdaf3c Compare May 1, 2024 14:15

mprahl requested review from dhaiducek and yiraeChristineKim May 1, 2024 14:15

mprahl force-pushed the progressive-rollout branch from 5bdaf3c to 119b634 Compare May 1, 2024 14:16

Update the progressive policy rollout design

c5d65cc

Signed-off-by: mprahl <[email protected]>

mprahl force-pushed the progressive-rollout branch from 119b634 to c5d65cc Compare May 1, 2024 14:29

yiraeChristineKim approved these changes May 1, 2024

openshift-ci bot assigned yiraeChristineKim May 1, 2024

openshift-ci bot added the lgtm label May 1, 2024
