✨ Add rollout strategy support for KCP #4073

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:master from Nordix:kcp-rollout-strategy on Mar 9, 2021

Conversation

@jan-est
Contributor

@jan-est jan-est commented Jan 14, 2021

What this PR does / why we need it:
This PR implements proposal 3857 by adding rollout strategy support for KCP. It enables the use of the maxSurge and maxUnavailable fields during a KCP upgrade.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 14, 2021
@k8s-ci-robot
Contributor

Hi @jan-est. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jan 14, 2021
@jan-est jan-est force-pushed the kcp-rollout-strategy branch from 4fa622d to 915623b Compare January 14, 2021 08:32
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 14, 2021
@vincepri
Member

/ok-to-test
/milestone v0.4.0

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Jan 14, 2021
@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 14, 2021
@jan-est jan-est force-pushed the kcp-rollout-strategy branch from 915623b to 3c7d3f4 Compare January 14, 2021 17:59
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 14, 2021
@jan-est jan-est force-pushed the kcp-rollout-strategy branch from 3c7d3f4 to 2983b86 Compare January 14, 2021 18:57
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 14, 2021
@jan-est jan-est force-pushed the kcp-rollout-strategy branch 2 times, most recently from f0d3469 to 3a918f3 Compare January 15, 2021 09:10
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 15, 2021
@jan-est jan-est force-pushed the kcp-rollout-strategy branch 4 times, most recently from fb76378 to 3dae187 Compare January 22, 2021 12:00
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 22, 2021
@jan-est jan-est force-pushed the kcp-rollout-strategy branch from 3dae187 to 0728012 Compare January 22, 2021 12:46
@jan-est jan-est force-pushed the kcp-rollout-strategy branch 4 times, most recently from abf4836 to 620a288 Compare February 2, 2021 10:54
@vincepri
Member

vincepri commented Feb 3, 2021

@jan-est Can you squash commits?

@jan-est jan-est force-pushed the kcp-rollout-strategy branch 3 times, most recently from df2290e to 7183044 Compare February 4, 2021 07:23
@jan-est
Contributor Author

jan-est commented Feb 4, 2021

> @jan-est Can you squash commits?

@vincepri Done

Comment thread controlplane/kubeadm/api/v1alpha4/kubeadm_control_plane_webhook.go Outdated
Comment thread controlplane/kubeadm/api/v1alpha4/kubeadm_control_plane_webhook.go Outdated
Comment thread controlplane/kubeadm/api/v1alpha4/kubeadm_control_plane_webhook.go Outdated
Comment thread controlplane/kubeadm/api/v1alpha4/kubeadm_control_plane_webhook.go Outdated
Comment thread controlplane/kubeadm/api/v1alpha4/kubeadm_control_plane_webhook.go Outdated
Comment thread controlplane/kubeadm/api/v1alpha4/kubeadm_control_plane_webhook_test.go Outdated
Comment thread controlplane/kubeadm/controllers/scale_test.go
Comment thread controlplane/kubeadm/controllers/upgrade.go Outdated
Comment thread controlplane/kubeadm/controllers/upgrade.go Outdated
Comment on lines +110 to +111
// We can ignore MaxUnavailable because we are enforcing health checks before we get here.
maxNodes := *kcp.Spec.Replicas + int32(kcp.Spec.RolloutStrategy.RollingUpdate.MaxSurge.IntValue())
if status.Nodes < maxNodes {
Member

Is MaxUnavailable not used anywhere in the code?

Contributor Author

It is not used anywhere in kubeadmControlPlane controllers code.

Member

Is it used anywhere else?

Member

If not, we should probably remove it

Contributor Author

We are using MaxUnavailable in webhook code

Contributor

@CecileRobertMichon CecileRobertMichon Feb 9, 2021

I guess in this implementation we don't need both MaxSurge and MaxUnavailable, since one is the opposite of the other. The benefit of having MaxUnavailable is that it makes it explicit for the user that setting maxSurge to 0 means one control plane machine at a time will go down during an upgrade.

Makes sense to remove it and add it in the future if we actually need it to implement a more flexible rollout.

Contributor

If we do remove it we also need to update any docs/proposal that reference it

Member

If we do want to keep it for UX purposes, it should probably be a calculated field, under Status

Contributor Author

@jan-est jan-est Feb 10, 2021

I am +1 for removing it completely. I agree with @CecileRobertMichon that we can add it back in the future if necessary. Should I remove MaxUnavailable and update the proposal once this is approved and merged?

Member

We should possibly remove it in this same PR, so the changes go in together. +1 from me as well.
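The check under review in this thread can be sketched in isolation. The following is a minimal illustration, not the controller code: it assumes that the branch guarded by `status.Nodes < maxNodes` scales up and that the fall-through path scales down, which the excerpt does not show. The names `rolloutAction` and `nextRolloutAction` are hypothetical, and plain `int32` values stand in for the KCP spec and status fields.

```go
package main

import "fmt"

// rolloutAction is a hypothetical type naming the two outcomes of the check.
type rolloutAction string

const (
	scaleUp   rolloutAction = "scale up"
	scaleDown rolloutAction = "scale down"
)

// nextRolloutAction mirrors the reviewed condition: while the node count is
// below replicas+maxSurge there is room to add a machine; otherwise an
// outdated machine has to be removed first (assumed fall-through behavior).
func nextRolloutAction(nodes, replicas, maxSurge int32) rolloutAction {
	maxNodes := replicas + maxSurge
	if nodes < maxNodes {
		return scaleUp
	}
	return scaleDown
}

func main() {
	// maxSurge=1: a 3-replica control plane may grow to 4 before shrinking.
	fmt.Println(nextRolloutAction(3, 3, 1))
	fmt.Println(nextRolloutAction(4, 3, 1))
	// maxSurge=0: an old machine goes away before its replacement is created.
	fmt.Println(nextRolloutAction(3, 3, 0))
	fmt.Println(nextRolloutAction(2, 3, 0))
}
```

With maxSurge set to 0 the sketch never exceeds the desired replica count, which is the "one control plane machine down at a time" behavior described above and the reason a separate MaxUnavailable value adds no information here.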

if status.Nodes <= *kcp.Spec.Replicas {
// RolloutStrategy is currently defaulted and validated to be RollingUpdate
// We can ignore MaxUnavailable because we are enforcing health checks before we get here.
maxNodes := *kcp.Spec.Replicas + int32(kcp.Spec.RolloutStrategy.RollingUpdate.MaxSurge.IntValue())
Member

@detiber @sedefsavas @CecileRobertMichon @fabriziopandini Please double check these changes, given that you all have worked more or less on this codebase.

If we were to backport this to v1alpha3, would you foresee any issues?

@jan-est jan-est force-pushed the kcp-rollout-strategy branch from 7183044 to 00f3c8e Compare February 5, 2021 07:24
return err
}

dest.Spec.RolloutStrategy = restored.Spec.RolloutStrategy
Contributor

if we do backport to v1alpha3 wouldn't this be a problem?

Member

If we do backport, we need to remove this part of the code and say something like Required minimum v0.3.x version before upgrading is ...
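For readers unfamiliar with the restore step being discussed, here is a toy sketch of the general stash-and-restore pattern for a field that exists only in the newer API version. It is not the Cluster API conversion code: the types `oldSpec` and `newSpec`, the annotation key `stashKey`, and the functions `toOld` and `toNew` are all hypothetical. One reading of the concern is that once the older version also carries the field, a restore like this would overwrite whatever the older object says, so the line would have to go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, trimmed-down stand-ins for the two API versions.
type oldSpec struct {
	Replicas int32 // the older version has no rollout strategy field
}

type newSpec struct {
	Replicas        int32
	RolloutStrategy *string
}

// stashKey is a hypothetical annotation key used to carry lossy fields.
const stashKey = "example.io/conversion-data"

// toOld converts down and stashes the full newer spec in an annotation.
func toOld(src newSpec) (oldSpec, map[string]string, error) {
	raw, err := json.Marshal(src)
	if err != nil {
		return oldSpec{}, nil, err
	}
	return oldSpec{Replicas: src.Replicas}, map[string]string{stashKey: string(raw)}, nil
}

// toNew converts up and restores the field the older version cannot express.
func toNew(src oldSpec, annotations map[string]string) (newSpec, error) {
	dest := newSpec{Replicas: src.Replicas}
	raw, ok := annotations[stashKey]
	if !ok {
		return dest, nil // nothing stashed, nothing to restore
	}
	var restored newSpec
	if err := json.Unmarshal([]byte(raw), &restored); err != nil {
		return newSpec{}, err
	}
	// Same shape as the reviewed line: copy the stashed value onto dest.
	dest.RolloutStrategy = restored.RolloutStrategy
	return dest, nil
}

func main() {
	strategy := "RollingUpdate"
	down, annotations, err := toOld(newSpec{Replicas: 3, RolloutStrategy: &strategy})
	if err != nil {
		panic(err)
	}
	up, err := toNew(down, annotations)
	if err != nil {
		panic(err)
	}
	fmt.Println(up.Replicas, *up.RolloutStrategy)
}
```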

Comment thread controlplane/kubeadm/api/v1alpha4/kubeadm_control_plane_types.go Outdated
Comment thread controlplane/kubeadm/api/v1alpha4/kubeadm_control_plane_webhook.go Outdated
allErrs = append(allErrs, field.Invalid(field.NewPath("spec", "version"), in.Spec.Version, "must be a valid semantic version"))
}

if in.Spec.RolloutStrategy != nil {
Member

Should we error out in case in.Spec.RolloutStrategy == nil?

Contributor Author

Can in.Spec.RolloutStrategy == nil ever be true? I assumed that the defaulting on line 68 makes sure the value is never nil. Or have I misunderstood?

Member

Given that the defaulting function could change at some point, I personally prefer to have each func self-consistent (or at least to explicitly document the assumptions each function relies on, especially if those assumptions depend on something elsewhere in the codebase), but this is just a nit, feel free to ignore it

Contributor Author

Thanks for the clarification @fabriziopandini.

ios1 := intstr.FromInt(1)
ios0 := intstr.FromInt(0)

if *in.Spec.RolloutStrategy.RollingUpdate.MaxSurge == ios0 && *in.Spec.Replicas < int32(3) {
Member

Should we check for RollingUpdate != nil and error out in case this is not true?

Contributor Author

@fabriziopandini do you see any cases where RollingUpdate != nil is not true? I assumed that defaulting on line 68 makes sure that the value is never nil. Or have I misunderstood?
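The two nil-check suggestions in this review can be sketched as a validation function that does not rely on defaulting having run first. This is only an illustration under stated assumptions: the types below are hypothetical stand-ins for the KCP spec, `MaxSurge` is a plain `*int32` rather than an `intstr.IntOrString`, and returning an error for a nil strategy is one possible answer to the questions raised, not necessarily what was merged. The rule that maxSurge 0 needs at least 3 replicas is taken from the excerpt above.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the relevant parts of the KCP spec.
type rollingUpdate struct {
	MaxSurge *int32
}

type rolloutStrategy struct {
	RollingUpdate *rollingUpdate
}

type spec struct {
	Replicas        *int32
	RolloutStrategy *rolloutStrategy
}

// validateRolloutStrategy checks every pointer before dereferencing it, so it
// stays correct even if the defaulting logic changes or is skipped.
func validateRolloutStrategy(s spec) error {
	if s.Replicas == nil {
		return errors.New("spec.replicas must be set")
	}
	if s.RolloutStrategy == nil {
		return errors.New("spec.rolloutStrategy must be set")
	}
	if s.RolloutStrategy.RollingUpdate == nil {
		return errors.New("spec.rolloutStrategy.rollingUpdate must be set")
	}
	maxSurge := s.RolloutStrategy.RollingUpdate.MaxSurge
	if maxSurge == nil {
		return errors.New("spec.rolloutStrategy.rollingUpdate.maxSurge must be set")
	}
	// Rule from the reviewed excerpt: scaling in first needs >= 3 replicas.
	if *maxSurge == 0 && *s.Replicas < 3 {
		return errors.New("maxSurge 0 requires at least 3 replicas")
	}
	return nil
}

func main() {
	one, zero := int32(1), int32(0)
	ok := spec{
		Replicas:        &one,
		RolloutStrategy: &rolloutStrategy{RollingUpdate: &rollingUpdate{MaxSurge: &one}},
	}
	bad := spec{
		Replicas:        &one,
		RolloutStrategy: &rolloutStrategy{RollingUpdate: &rollingUpdate{MaxSurge: &zero}},
	}
	fmt.Println(validateRolloutStrategy(ok))
	fmt.Println(validateRolloutStrategy(bad))
	fmt.Println(validateRolloutStrategy(spec{Replicas: &one}))
}
```

The extra checks cost a few lines but make the function's assumptions visible at the point of use, which is the trade-off raised in the review.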

Comment thread controlplane/kubeadm/controllers/upgrade.go Outdated
Comment thread controlplane/kubeadm/controllers/upgrade.go
Comment thread controlplane/kubeadm/controllers/upgrade_test.go Outdated
Comment thread controlplane/kubeadm/controllers/upgrade_test.go Outdated
Comment thread exp/addons/controllers/clusterresourceset_controller_test.go Outdated
@vincepri
Member

vincepri commented Mar 3, 2021

LGTM

Let's squash commits before merging

@vincepri
Member

vincepri commented Mar 3, 2021

/test pull-cluster-api-test-main

@vincepri
Member

vincepri commented Mar 3, 2021

/retest

2 similar comments
@vincepri
Member

vincepri commented Mar 3, 2021

/retest

@jan-est
Contributor Author

jan-est commented Mar 4, 2021

/retest

@fabriziopandini
Member

/lgtm

@jan-est
Contributor Author

jan-est commented Mar 9, 2021

/retest

@fabriziopandini
Member

/lgtm

@vincepri
Member

vincepri commented Mar 9, 2021

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants