Introduce bootstrap scaling strategies #449
/test e2e-metal-assisted
hexfusion left a comment:
Looking good; added a note around the topic of accountability.
pkg/etcdenvvar/envvarcontroller.go (outdated)
    case isManagedByAssistedInstaller:
        // When managed by assisted installer, tolerate unsafe conditions only up
        // until bootstrap is complete, and then enforce as in the supported case.
        if nodeCount < 3 && bootstrapComplete {
In the assisted installer flow there is a time gap between bootstrap completing and the 3rd master joining
(the bootstrap node pivots to become the 3rd master once bootstrap is completed).
Here "bootstrap" means the bootkube service running on the bootstrap node.
I agree that this decision has implications. Technically speaking, the install-config would represent 2 master replicas, in which case install complete should require 2 master nodes. Scaling up to 3 would then be a secondary process to the install. Understanding the point in time at which we are installComplete is important.
After install status is complete, the operator will go into a degraded state until the cluster has 3 or more master nodes.
So this allows a cluster to reach install complete with fewer than 3 master nodes, but does not tolerate fewer than 3 after that point. Ideally you would resolve the scaling before install-complete, but it is not required. If no 3rd master ever joined the cluster, it would remain degraded with a clear message.
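A minimal sketch of the behavior described in this thread, for illustration only. The function name, its signature, and the message text are hypothetical and are not taken from the operator's code; only the `nodeCount < 3 && bootstrapComplete` condition comes from the diff under review.

```go
package main

import "fmt"

// assistedInstallerQuorumStatus is a hypothetical helper (not the operator's
// real code). It reports whether the operator should be degraded when the
// cluster is managed by assisted installer.
func assistedInstallerQuorumStatus(nodeCount int, bootstrapComplete bool) (degraded bool, message string) {
	const requiredNodes = 3

	if nodeCount >= requiredNodes {
		// Enough master nodes: nothing to report.
		return false, ""
	}
	if !bootstrapComplete {
		// Fewer than 3 nodes is tolerated only until bootstrap completes.
		return false, ""
	}
	// Same condition as the reviewed diff: nodeCount < 3 && bootstrapComplete.
	// The message wording is an assumption.
	return true, fmt.Sprintf("%d of %d required master nodes are available", nodeCount, requiredNodes)
}
```

With 2 nodes the helper reports no problem before bootstrap completes and a degraded state with a message afterwards, which matches the "remain degraded with a clear message" outcome discussed above.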
/test e2e-metal-assisted
seems that masters are good: test-infra-cluster-assisted-installer-master-2:[{Type:MemoryPressure Status:False LastHeartbeatTime:2020-10-22 12:35:29 +0000 UTC LastTransitionTime:2020-10-22 12:34:19 +0000 UTC Reason:KubeletHasSufficientMemory Message:kubelet has sufficient memory available} {Type:DiskPressure Status:False LastHeartbeatTime:2020-10-22 12:35:29 +0000 UTC LastTransitionTime:2020-10-22 12:34:19 +0000 UTC Reason:KubeletHasNoDiskPressure Message:kubelet has no disk pressure} {Type:PIDPressure Status:False LastHeartbeatTime:2020-10-22 12:35:29 +0000 UTC LastTransitionTime:2020-10-22 12:34:19 +0000 UTC Reason:KubeletHasSufficientPID Message:kubelet has sufficient PID available}
but bootkube timed out
/test e2e-metal-assisted
b52cf85 to 408284a
46612a7 to 84876ba
/retest
e2e-metal-assisted failure is due to pending fixes in CI.
Fix has merged: openshift/cluster-baremetal-operator#81
/test e2e-metal-assisted
based on a basic review of
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hexfusion, ironcladlou
/hold cancel
/retest
/test e2e-agnostic
infra
infra....
/test e2e-agnostic
/retest Please review the full test history for this PR and help us cut down flakes.
Strange error case here; is this just prom down?
Going forward we will use a marker file written to the bootstrap node. This is implemented in the service in openshift/assisted-service#672 and in the etcd operator in openshift/cluster-etcd-operator#449. Fixes MGMT-2454
Before this patch, there were two implicit etcd cluster scaling strategies
applied in different contexts. This patch makes those strategies explicit
and adds a new strategy to support additional use cases.
The strategies are:
HAScalingStrategy (default) means the etcd cluster will only be scaled up when at least
3 nodes are available, so that HA is enforced at all times. This rule applies
during bootstrapping and in the steady state.
NonHAScalingStrategy means that during bootstrapping, the etcd cluster will
be allowed to scale when at least 2 members are available (which is not HA),
but after bootstrapping any further scaling will require 3 nodes in the same
way as HAScalingStrategy.
This strategy is selected by adding the
openshift.io/non-ha-bootstrap annotation to the openshift-etcd namespace.
UnsafeScalingStrategy means scaling will occur without regard to nodes and
any effect on quorum. Use of this strategy isn't officially tested or supported,
but is made available for ad-hoc use.
This strategy is selected by setting unsupportedConfigOverrides on the
operator config.
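The three strategies above can be sketched roughly as follows. This is an illustrative sketch, not the operator's implementation: the strategy names and the annotation key come from the description, while the `ScalingStrategy` type, the function names, the `unsafeOverride` flag (a stand-in for whatever is read from unsupportedConfigOverrides), and the precedence of the unsafe override over the annotation are all assumptions.

```go
package main

// ScalingStrategy names one of the strategies in the description.
// The type and its string values are assumptions for illustration.
type ScalingStrategy string

const (
	HAScalingStrategy     ScalingStrategy = "HAScalingStrategy"
	NonHAScalingStrategy  ScalingStrategy = "NonHAScalingStrategy"
	UnsafeScalingStrategy ScalingStrategy = "UnsafeScalingStrategy"
)

// Annotation key taken from the description.
const nonHABootstrapAnnotation = "openshift.io/non-ha-bootstrap"

// selectStrategy is a hypothetical helper. unsafeOverride stands in for the
// unsupportedConfigOverrides setting; nsAnnotations are the annotations on
// the openshift-etcd namespace. Checking the override first is an assumption.
func selectStrategy(unsafeOverride bool, nsAnnotations map[string]string) ScalingStrategy {
	if unsafeOverride {
		return UnsafeScalingStrategy
	}
	if _, ok := nsAnnotations[nonHABootstrapAnnotation]; ok {
		return NonHAScalingStrategy
	}
	return HAScalingStrategy
}

// minNodesToScale returns how many nodes must be available before the etcd
// cluster may be scaled under the given strategy (hypothetical helper).
func minNodesToScale(s ScalingStrategy, bootstrapComplete bool) int {
	switch s {
	case UnsafeScalingStrategy:
		// Scaling proceeds without regard to node count or quorum.
		return 0
	case NonHAScalingStrategy:
		if !bootstrapComplete {
			// Non-HA is tolerated during bootstrapping only.
			return 2
		}
		return 3
	default:
		// HAScalingStrategy: 3 nodes during bootstrap and in steady state.
		return 3
	}
}
```

Keeping selection and the node threshold in separate functions is only a presentation choice here; it makes the difference between the strategies visible in one switch statement.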
NonHAScalingStrategy is new and is intended to support use cases such as
assisted installer, which don't use a dedicated bootstrap node and must
tolerate non-HA etcd during bootstrapping only. Currently this strategy is
enabled when a marker file is found during manifest rendering. This
provides some measure of support without introducing new installer API.
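One way the marker-file check during rendering could look is sketched below. The description does not give the marker file's path or say how rendering records the result, so the `markerPath` parameter, the function name, the empty annotation value, and the idea that rendering sets the annotation on the namespace manifest are all assumptions.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
)

// Annotation key taken from the description.
const nonHABootstrapAnnotation = "openshift.io/non-ha-bootstrap"

// namespaceAnnotations is a hypothetical render-time helper. It returns the
// annotations to place on the rendered openshift-etcd namespace manifest,
// adding the non-HA bootstrap annotation when the marker exists.
// markerPath is a placeholder; the real marker location is not stated.
func namespaceAnnotations(markerPath string) (map[string]string, error) {
	annotations := map[string]string{}

	_, err := os.Stat(markerPath)
	switch {
	case err == nil:
		// Marker present: request NonHAScalingStrategy.
		// The empty value is an assumption; only the key is documented.
		annotations[nonHABootstrapAnnotation] = ""
	case errors.Is(err, fs.ErrNotExist):
		// No marker: leave annotations empty so the default strategy applies.
	default:
		// Any other error (for example, permissions) is surfaced to the caller.
		return nil, err
	}
	return annotations, nil
}
```

Treating only "does not exist" as the negative case avoids silently falling back to the default strategy when the marker cannot be read for some other reason.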