@enxebre
Member

@enxebre enxebre commented Jul 16, 2020

With #608 we removed the burden on users of setting the clusterID label on machines.
As elaborated in #608 (comment), the motivation is that this label is an implementation detail that users shouldn't have to care about.

However, because the machineSet uses these labels to determine ownership, the change introduced above can lead to edge cases where the machineSet and machine labels have different values. Machines would then be orphaned and the machineSet would create new instances to replace them. Bad.
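To make the ownership problem concrete, here is a minimal sketch of equality-based label matching. It is not the machineSet controller's actual code: `selectorMatches` and the label key value `example.io/cluster-id` are hypothetical stand-ins, and only the identifier `MachineClusterIDLabel` is taken from the code under review.

```go
package main

import "fmt"

// MachineClusterIDLabel is the identifier used in this PR's code.
// The value below is a hypothetical placeholder, not the real key.
const MachineClusterIDLabel = "example.io/cluster-id"

// selectorMatches reports whether labels contains every key/value pair
// required by selector, mimicking equality-based label selection.
func selectorMatches(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{MachineClusterIDLabel: "cluster-a"}
	owned := map[string]string{MachineClusterIDLabel: "cluster-a"}
	orphaned := map[string]string{MachineClusterIDLabel: "cluster-b"}

	fmt.Println("owned machine matches:", selectorMatches(selector, owned))
	fmt.Println("orphaned machine matches:", selectorMatches(selector, orphaned))
}
```

The second machine carries a different cluster ID value, so the selector no longer matches it. An owner that counts only matching machines would see itself as short of replicas and create a replacement.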

Therefore we now choose to relieve users of this burden by enforcing the label value via webhooks, while keeping the old behaviour in the backend to avoid any chance of breaking existing environments where bad input might have been set, as in https://bugzilla.redhat.com/show_bug.cgi?id=1857175.
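A rough sketch of what enforcing the label in a defaulting webhook could look like, modelled on the snippet reviewed further down in this conversation. The `machine` and `machineDefaulterHandler` types and the label key value are simplified, hypothetical stand-ins for the real API types.

```go
package main

import "fmt"

// MachineClusterIDLabel is the identifier used in this PR's code.
// The value below is a hypothetical placeholder, not the real key.
const MachineClusterIDLabel = "example.io/cluster-id"

// machine is a simplified stand-in for the real Machine type.
type machine struct {
	Labels map[string]string
}

// machineDefaulterHandler is a hypothetical handler that knows the cluster ID.
type machineDefaulterHandler struct {
	clusterID string
}

// defaultMachine sets the cluster ID label on creation, replacing any value
// the user may have supplied.
func (h machineDefaulterHandler) defaultMachine(m *machine) {
	if m.Labels == nil {
		m.Labels = make(map[string]string)
	}
	m.Labels[MachineClusterIDLabel] = h.clusterID
}

func main() {
	h := machineDefaulterHandler{clusterID: "cluster-a"}

	unlabelled := &machine{}
	h.defaultMachine(unlabelled)

	mislabelled := &machine{Labels: map[string]string{MachineClusterIDLabel: "typo"}}
	h.defaultMachine(mislabelled)

	fmt.Println(unlabelled.Labels[MachineClusterIDLabel])
	fmt.Println(mislabelled.Labels[MachineClusterIDLabel])
}
```

Both machines end up with the handler's cluster ID, whether the label was missing or set to something else.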

openshift/cluster-api-actuator-pkg#178
openshift/cluster-api-provider-gcp#106
openshift/cluster-api-provider-azure#153
openshift/cluster-api-provider-aws#341

@enxebre enxebre changed the title from "Bug 1857175" to "Bug 1857175: enforce clusterID label via webhook and preserve old behaviour in the backend" Jul 16, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jul 16, 2020
@openshift-ci-robot
Contributor

@enxebre: This pull request references Bugzilla bug 1857175, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1857175: enforce clusterID label via webhook and preserve old behaviour in the backend

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@enxebre
Member Author

enxebre commented Jul 16, 2020

/hold
to do some manual testing.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 16, 2020

Contributor

@JoelSpeed JoelSpeed left a comment

Should we update the validating webhooks so that the label can't be modified later as well? I expect we would want the clusterID label to be immutable?

Comment on lines 270 to 273
if m.GetLabels() == nil {
	m.Labels = make(map[string]string)
}
m.Labels[MachineClusterIDLabel] = h.clusterID
Contributor

Why use the interface accessor on one line, then the struct member on another? Seems odd to me

Suggested change
-if m.GetLabels() == nil {
-	m.Labels = make(map[string]string)
-}
-m.Labels[MachineClusterIDLabel] = h.clusterID
+if m.Labels == nil {
+	m.Labels = make(map[string]string)
+}
+m.Labels[MachineClusterIDLabel] = h.clusterID
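For readers wondering why the nil check is there at all, and why the two spellings are interchangeable here: a small sketch, assuming a simplified object whose `GetLabels` accessor just returns the `Labels` field. The `object` type, `writePanics`, `setClusterID` and the label key value are hypothetical.

```go
package main

import "fmt"

// MachineClusterIDLabel is the identifier used in this PR's code.
// The value below is a hypothetical placeholder, not the real key.
const MachineClusterIDLabel = "example.io/cluster-id"

// object is a simplified stand-in for a type that embeds object metadata.
type object struct {
	Labels map[string]string
}

// GetLabels is assumed to be a plain accessor over the Labels field.
func (o *object) GetLabels() map[string]string { return o.Labels }

// writePanics reports whether assigning into labels panics.
// In Go, assigning to an entry of a nil map panics at run time.
func writePanics(labels map[string]string) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	labels[MachineClusterIDLabel] = "cluster-a"
	return false
}

// setClusterID uses the struct field consistently, as the review suggests.
func setClusterID(o *object, clusterID string) {
	if o.Labels == nil {
		o.Labels = make(map[string]string)
	}
	o.Labels[MachineClusterIDLabel] = clusterID
}

func main() {
	o := &object{}
	fmt.Println(o.GetLabels() == nil, o.Labels == nil)
	fmt.Println("write to nil map panics:", writePanics(o.Labels))

	setClusterID(o, "cluster-a")
	fmt.Println(o.GetLabels()[MachineClusterIDLabel])
}
```

Under that assumption the accessor and the field always agree, so the suggestion is purely about consistency. The nil check itself is required because writing into a nil map panics.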

Member Author

ok, updated.

Comment on lines 131 to 134
if ms.GetLabels() == nil {
	ms.Labels = make(map[string]string)
}
ms.Labels[MachineClusterIDLabel] = h.clusterID
Contributor

Same as above

Suggested change
-if ms.GetLabels() == nil {
-	ms.Labels = make(map[string]string)
-}
-ms.Labels[MachineClusterIDLabel] = h.clusterID
+if ms.Labels == nil {
+	ms.Labels = make(map[string]string)
+}
+ms.Labels[MachineClusterIDLabel] = h.clusterID

@enxebre
Member Author

enxebre commented Jul 16, 2020

> Should we update the validating webhooks so that the label can't be modified later as well? I expect we would want the clusterID label to be immutable?

Yes, but this might have hard-to-predict implications for existing resources. Let's scope this PR to fixing the existing issue while improving the current experience for creation.
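For reference, the immutability check being deferred here could look roughly like the sketch below. This is explicitly not part of this PR: `validateClusterIDUnchanged` and the label key value are hypothetical, and a real validating webhook would work on the full old and new objects rather than bare label maps.

```go
package main

import "fmt"

// MachineClusterIDLabel is the identifier used in this PR's code.
// The value below is a hypothetical placeholder, not the real key.
const MachineClusterIDLabel = "example.io/cluster-id"

// validateClusterIDUnchanged rejects an update that changes or removes a
// cluster ID label that was already set on the old object.
func validateClusterIDUnchanged(oldLabels, newLabels map[string]string) error {
	oldValue, wasSet := oldLabels[MachineClusterIDLabel]
	if !wasSet {
		// Nothing to protect: the old object never had the label.
		return nil
	}
	if newValue, ok := newLabels[MachineClusterIDLabel]; !ok || newValue != oldValue {
		return fmt.Errorf("label %q is immutable, expected value %q", MachineClusterIDLabel, oldValue)
	}
	return nil
}

func main() {
	existing := map[string]string{MachineClusterIDLabel: "cluster-a"}
	unchanged := map[string]string{MachineClusterIDLabel: "cluster-a"}
	changed := map[string]string{MachineClusterIDLabel: "cluster-b"}

	fmt.Println(validateClusterIDUnchanged(existing, unchanged))
	fmt.Println(validateClusterIDUnchanged(existing, changed))
}
```

The sketch also hints at the concern raised above: an existing resource that already carries a bad value could no longer be corrected by editing the label.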

@JoelSpeed
Contributor

> Yes, but this might have hard-to-predict implications for existing resources. Let's scope this PR to fixing the existing issue while improving the current experience for creation.

Makes sense

/approve

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed


@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 16, 2020
@enxebre
Member Author

enxebre commented Jul 16, 2020

Tested this on a running cluster and realised I was defaulting the wrong labels for the machineSet. Fixed now.
I'll keep the hold while I put up a companion PR to e2e test the scenario described in the BZ.

…nd machineSet via webhook
@enxebre
Member Author

enxebre commented Jul 17, 2020

/retest

enxebre added a commit to enxebre/cluster-api-actuator-pkg that referenced this pull request Jul 17, 2020
…uster-api-cluster

With #608 we removed the burden on users of setting the clusterID label on machines.
As elaborated in #608 (comment), the motivation is that this label is an implementation detail that users shouldn't have to care about.

However, because the machineSet uses these labels to determine ownership, the change introduced above can lead to edge cases where the machineSet and machine labels have different values. Machines would then be orphaned and the machineSet would create new instances to replace them. Bad.

Therefore we now choose to relieve users of this burden by enforcing the label value via webhooks, while keeping the old behaviour in the backend to avoid any chance of breaking existing environments where bad input might have been set, as in https://bugzilla.redhat.com/show_bug.cgi?id=1857175. This should be fixed by openshift/machine-api-operator#644, and this PR validates that scenario.
enxebre added a commit to enxebre/cluster-api-actuator-pkg that referenced this pull request Jul 17, 2020
…hift.io/cluster-api-cluster
enxebre added a commit to enxebre/cluster-api-actuator-pkg that referenced this pull request Jul 17, 2020
…hift.io/cluster-api-cluster
@enxebre
Member Author

enxebre commented Jul 17, 2020

/hold cancel
Once this is merged I'll create the PR for https://github.com/openshift/cluster-api-actuator-pkg/compare/master...enxebre:bug-1857175-test?expand=1.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 17, 2020

@enxebre
Member Author

enxebre commented Jul 20, 2020

/test e2e-azure-operator


@Danil-Grigorev Danil-Grigorev left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 86cba08 into openshift:master Jul 20, 2020
@openshift-ci-robot
Contributor

@enxebre: All pull requests linked via external trackers have merged: openshift/machine-api-operator#644. Bugzilla bug 1857175 has been moved to the MODIFIED state.


enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 22, 2020
openshift#644 enabled webhooks to enforce the clusterID labels on machineSet/machine creation.
To avoid any hard-to-predict side effects, we want to honour any value that is already set on existing resources.
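One way to read "honour any existing value" is to default the label only when it is absent. A minimal sketch under that assumption follows; `defaultClusterIDIfAbsent` and the label key value are hypothetical and not taken from the referenced commit.

```go
package main

import "fmt"

// MachineClusterIDLabel is the identifier used in this PR's code.
// The value below is a hypothetical placeholder, not the real key.
const MachineClusterIDLabel = "example.io/cluster-id"

// defaultClusterIDIfAbsent sets the cluster ID label only when the key is
// missing, leaving any value that is already present untouched.
func defaultClusterIDIfAbsent(labels map[string]string, clusterID string) map[string]string {
	if labels == nil {
		labels = make(map[string]string)
	}
	if _, ok := labels[MachineClusterIDLabel]; !ok {
		labels[MachineClusterIDLabel] = clusterID
	}
	return labels
}

func main() {
	fresh := defaultClusterIDIfAbsent(nil, "cluster-a")
	existing := defaultClusterIDIfAbsent(
		map[string]string{MachineClusterIDLabel: "legacy-value"}, "cluster-a")

	fmt.Println(fresh[MachineClusterIDLabel])
	fmt.Println(existing[MachineClusterIDLabel])
}
```

The trade-off compared with unconditional enforcement is that a pre-existing bad value survives, which is the price of not disturbing resources that already depend on it.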
enxebre added a commit to enxebre/cluster-api-provider-gcp-1 that referenced this pull request Jul 22, 2020
enxebre added a commit to enxebre/cluster-api-provider-azure that referenced this pull request Jul 22, 2020
enxebre added a commit to enxebre/cluster-api-provider-azure that referenced this pull request Jul 22, 2020
This brings in openshift/machine-api-operator#644, restoring the previous backend behaviour in the machine controller (openshift/machine-api-operator@261c337).
enxebre added a commit to enxebre/cluster-api-provider-aws-2 that referenced this pull request Jul 22, 2020
This brings in openshift/machine-api-operator#644, restoring the previous backend behaviour in the machine controller (openshift/machine-api-operator@261c337).
enxebre added a commit to enxebre/cluster-api-provider-gcp-1 that referenced this pull request Jul 22, 2020
This brings in openshift/machine-api-operator#644, restoring the previous backend behaviour in the machine controller (openshift/machine-api-operator@261c337).
enxebre added a commit to enxebre/cluster-api-provider-azure that referenced this pull request Jul 22, 2020
@openshift-ci-robot
Contributor

@enxebre: Some pull requests linked via external trackers have merged: . The following pull requests linked via external trackers have not merged:


openshift-merge-robot added a commit to openshift/cluster-api-provider-gcp that referenced this pull request Jul 22, 2020
openshift-merge-robot added a commit to openshift/cluster-api-provider-aws that referenced this pull request Jul 22, 2020
openshift-merge-robot pushed a commit to openshift/cluster-api-provider-azure that referenced this pull request Jul 22, 2020
enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 27, 2020
This tries to fix the following scenario:
We set ms.Spec.Selector.MatchLabels[MachineClusterIDLabel] only if it's not present. When it's not present, we set it to the correct value. If there happens to be a bad label in `ms.Spec.Template.Labels`, this results in a mismatch between the selector and the template.

Follow-up for
openshift#608,
openshift#644 and
openshift#653.
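The scenario can be reproduced in miniature. The sketch below only demonstrates the mismatch and does not show the fix chosen in the commit; the `machineSet` struct is a flattened, hypothetical stand-in for `ms.Spec.Selector.MatchLabels` and `ms.Spec.Template.Labels`, and the label key value is a placeholder.

```go
package main

import "fmt"

// MachineClusterIDLabel is the identifier used in this PR's code.
// The value below is a hypothetical placeholder, not the real key.
const MachineClusterIDLabel = "example.io/cluster-id"

// machineSet flattens the two label maps discussed in the commit message.
type machineSet struct {
	SelectorMatchLabels map[string]string // stands in for ms.Spec.Selector.MatchLabels
	TemplateLabels      map[string]string // stands in for ms.Spec.Template.Labels
}

// defaultIfAbsent sets the cluster ID label only when the key is missing.
func defaultIfAbsent(labels map[string]string, clusterID string) map[string]string {
	if labels == nil {
		labels = make(map[string]string)
	}
	if _, ok := labels[MachineClusterIDLabel]; !ok {
		labels[MachineClusterIDLabel] = clusterID
	}
	return labels
}

// defaultMachineSet defaults the selector and the template independently.
func defaultMachineSet(ms *machineSet, clusterID string) {
	ms.SelectorMatchLabels = defaultIfAbsent(ms.SelectorMatchLabels, clusterID)
	ms.TemplateLabels = defaultIfAbsent(ms.TemplateLabels, clusterID)
}

// selectorMatchesTemplate reports whether machines created from the template
// would be selected by the machineSet's own selector.
func selectorMatchesTemplate(ms machineSet) bool {
	for key, want := range ms.SelectorMatchLabels {
		if got, ok := ms.TemplateLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// The selector has no cluster ID, but the template carries a bad one.
	ms := machineSet{
		TemplateLabels: map[string]string{MachineClusterIDLabel: "wrong-cluster"},
	}
	defaultMachineSet(&ms, "cluster-a")
	fmt.Println("selector matches template:", selectorMatchesTemplate(ms))
}
```

Because each map is defaulted on its own, the selector receives the correct value while the template keeps the bad one, so the machineSet would not select the machines it creates.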
enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 28, 2020
openshift#644 enabled webhooks to enforce the clusterID labels on machineSet/machine creation.
To avoid any hard-to-predict side effects, we want to honour any value that is already set on existing resources.
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

6 participants