Watch scheduler CR and make masters schedulable accordingly #937

ravisantoshgudimetla · 2019-07-08T15:45:26Z

- What I did
Added logic to the node controller to watch for changes to the scheduler CR and update the master nodes to make them schedulable or unschedulable accordingly.

- How to verify it

~~I did not test it. I want to check if the approach seems reasonable before I proceed further. Please review the c82786d~~

The e2e test verifies whether the worker label has been added and the master taint has been removed on all masters when the scheduler CR's mastersSchedulable field is set or unset. That field was added to the Scheduler CRD in openshift/api#366.
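As a rough illustration of the node state described above, here is a minimal sketch. It is not the code from this PR: `Node` and `Taint` are simplified stand-ins for the Kubernetes core/v1 types, `setSchedulable` and `isSchedulable` are hypothetical helper names, and re-adding the taint with a `NoSchedule` effect when the field is unset is an assumption.

```go
package main

import "fmt"

// Taint and Node are simplified stand-ins for the Kubernetes core/v1 types.
type Taint struct {
	Key    string
	Effect string
}

type Node struct {
	Name   string
	Labels map[string]string
	Taints []Taint
}

const (
	workerLabel = "node-role.kubernetes.io/worker"
	masterTaint = "node-role.kubernetes.io/master"
)

// setSchedulable (hypothetical helper) adds the worker label and drops the
// master taint when schedulable is true. When it is false, it removes the
// label and re-adds the taint (assumed here to use the NoSchedule effect).
func setSchedulable(n *Node, schedulable bool) {
	if n.Labels == nil {
		n.Labels = map[string]string{}
	}
	// Keep every taint except the master taint, so it is never duplicated.
	kept := make([]Taint, 0, len(n.Taints)+1)
	for _, t := range n.Taints {
		if t.Key != masterTaint {
			kept = append(kept, t)
		}
	}
	if schedulable {
		n.Labels[workerLabel] = ""
	} else {
		delete(n.Labels, workerLabel)
		kept = append(kept, Taint{Key: masterTaint, Effect: "NoSchedule"})
	}
	n.Taints = kept
}

// isSchedulable (hypothetical helper) reports whether the node carries the
// worker label and no master taint, which is what the e2e test looks for.
func isSchedulable(n Node) bool {
	if _, ok := n.Labels[workerLabel]; !ok {
		return false
	}
	for _, t := range n.Taints {
		if t.Key == masterTaint {
			return false
		}
	}
	return true
}

func main() {
	master := Node{Name: "master-0", Taints: []Taint{{Key: masterTaint, Effect: "NoSchedule"}}}
	setSchedulable(&master, true)
	fmt.Println(isSchedulable(master)) // true
	setSchedulable(&master, false)
	fmt.Println(isSchedulable(master)) // false
}
```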

- Description for the changelog

cgwalters · 2019-07-08T17:32:00Z

pkg/controller/node/node_controller.go

+			node.Spec.Taints = newTaints
+		})
+		if err != nil {
+			glog.Errorf("error making node %v schedulable with error %v", err)


As a general rule, rather than logging and continuing, please return the error and let the higher level logic retry.

In this case it might be different: we cannot return error when it fails for a single node; perhaps we need to capture a slice of errors for individual nodes and return. I don't have a strong preference, but I see your point.

> we cannot return error when it fails for a single node

Why not?

> perhaps we need to capture a slice of errors for individual nodes and return.

Yes, that's a better pattern indeed, but I think "fast fail for single item" is still better than "log and continue".

Yeah, I have gone with collecting the errors in a slice and then returning them, because in the next iteration it could cause problems with other nodes. Let me know if you prefer it the other way.

ravisantoshgudimetla · 2019-07-10T02:27:48Z

manifests/machineconfigcontroller/clusterrole.yaml

@@ -14,5 +14,5 @@ rules:
  resources: ["configmaps", "secrets"]
  verbs: ["*"]
 - apiGroups: ["config.openshift.io"]
-  resources: ["images", "clusterversions", "featuregates"]
+  resources: ["images", "clusterversions", "featuregates", "schedulers"]


This can be restricted further to just list, instead of granting *.

runcom · 2019-07-10T09:34:13Z

@ravisantoshgudimetla this patch looks good - is there some testing I can perform manually?

runcom · 2019-07-10T14:42:05Z

--- FAIL: TestMastersSchedulable (1.15s)
    mco_test.go:52: Error while updating scheduler CR

ravisantoshgudimetla · 2019-07-10T19:30:03Z

/test e2e-aws-upgrade

ravisantoshgudimetla · 2019-07-10T19:34:57Z

/test e2e-aws-upgrade

ravisantoshgudimetla · 2019-07-11T00:25:35Z

/retest

ravisantoshgudimetla · 2019-07-11T03:05:52Z

/retest

fail [k8s.io/kubernetes/test/e2e/upgrades/apps/deployments.go:167]: Unexpected error:

fail [k8s.io/kubernetes/test/e2e/framework/framework.go:338]: Jul 11 01:26:53.041: Couldn't delete ns: "svcaccounts-4488": namespace svcaccounts-4488 was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed (&errors.errorString{s:"namespace svcaccounts-4488 was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed"})

ravisantoshgudimetla · 2019-07-11T03:58:44Z

Jul 11 01:55:14.869 E clusterversion/version changed Failing to True: UpdatePayloadFailed: Could not update deployment "openshift-machine-config-operator/etcd-quorum-guard" (359 of 401)

runcom · 2019-07-11T08:42:21Z

/test e2e-aws

verify needs some care

cgwalters · 2019-07-11T20:42:24Z

Overall this looks good, thanks for working on it! Can you please squash it into one commit, since this is really one logical change and the last commit now has intermixed fixups from review? And please write a commit message, something like:

Watch scheduler CR and make masters schedulable accordingly

By default, the MCO's kubelet configuration injects a taint to disable scheduling. This adds support in the MCO for watching the API recently added to allow the masters to be schedulable. openshift/api#366

Use cases here are for 3 node bare metal clusters (which have significant resources on the masters), as well as CodeReady Containers which wants to make a single node OpenShift as a VM.

Closes: #763

ravisantoshgudimetla · 2019-07-11T21:22:55Z

Thanks for the review @runcom @cgwalters @kikisdeliveryservice. I made the changes suggested. PTAL

ravisantoshgudimetla · 2019-07-11T23:49:41Z

/retest

kikisdeliveryservice · 2019-07-12T00:02:43Z

this lgtm

will let @cgwalters / @runcom give the final approval

runcom · 2019-07-12T10:14:04Z

needs a rebase :(

cgwalters · 2019-07-12T13:43:42Z

> needs a rebase :(

The bright side is it's an opportunity to use the 🏄‍♂️ emoji

ravisantoshgudimetla · 2019-07-12T15:57:52Z

Rebased 🙈

kikisdeliveryservice · 2019-07-12T16:14:53Z

ci didn't like that last commit :(

kikisdeliveryservice · 2019-07-12T16:17:18Z

uhoh hitting limits now:

./tmp/openshift-install-050178053/vpc/master-elb.tf line 1, in resource \"aws_lb\" \"api_internal\":"

kikisdeliveryservice · 2019-07-12T16:23:49Z

reported but ci has been hitting this for the last few hours.

ravisantoshgudimetla · 2019-07-12T20:55:06Z

/retest

cgwalters · 2019-07-12T20:56:30Z

/lgtm

By default, the MCO's kubelet configuration injects a taint to disable scheduling. This adds support in the MCO for watching the API recently added to allow the masters to be schedulable. openshift/api#366

Use cases here are for 3 node bare metal clusters (which have significant resources on the masters), as well as CodeReady Containers which wants to make a single node OpenShift as a VM.

Closes: openshift#763

cgwalters · 2019-07-12T21:58:27Z

/lgtm

openshift-ci-robot · 2019-07-12T22:01:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, ravisantoshgudimetla, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [cgwalters,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kikisdeliveryservice · 2019-07-12T23:43:48Z

Still seeing ci problems:

level=error
level=error msg="  on ../tmp/openshift-install-001008030/vpc/master-elb.tf line 19, in resource \"aws_lb\" \"api_external\":"
level=error msg="  19: resource \"aws_lb\" \"api_external\" {"

/retest

ravisantoshgudimetla · 2019-07-13T03:40:31Z

/retest

The purpose of this change was to make masters able to run workloads by default. This is needed to complete a successful deployment of a 3-node bare metal install. This particular approach was only short term, while better interfaces were developed to control this behavior.

The scheduler configuration resource now includes a "mastersSchedulable" boolean, enabled here: openshift#937

This installer PR made it the default behavior if no workers were defined at install time: openshift/installer#2004

With these changes in place, the custom kubelet config for the baremetal platform is no longer necessary.

openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 8, 2019

openshift-ci-robot requested review from LorbusChris and cgwalters July 8, 2019 15:45

ravisantoshgudimetla commented Jul 8, 2019 (pkg/controller/node/node_controller.go)

runcom reviewed Jul 8, 2019 (pkg/controller/node/node_controller.go)

cgwalters reviewed Jul 8, 2019

ravisantoshgudimetla force-pushed the add-scheduler-watcher branch from bc654d7 to 0304353 Compare July 9, 2019 18:09

openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 9, 2019

ravisantoshgudimetla force-pushed the add-scheduler-watcher branch 2 times, most recently from 545902f to d23ccf7 Compare July 9, 2019 18:53

openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 9, 2019

ravisantoshgudimetla force-pushed the add-scheduler-watcher branch from d23ccf7 to 96cd136 Compare July 9, 2019 18:57

ravisantoshgudimetla commented Jul 10, 2019

runcom reviewed Jul 10, 2019 (pkg/controller/node/node_controller.go)

runcom reviewed Jul 10, 2019 (pkg/controller/node/node_controller.go)

runcom reviewed Jul 10, 2019 (pkg/controller/node/node_controller.go)

runcom reviewed Jul 10, 2019 (test/e2e/mco_test.go)

runcom reviewed Jul 10, 2019 (pkg/controller/node/node_controller.go)

ravisantoshgudimetla force-pushed the add-scheduler-watcher branch 2 times, most recently from 364cabb to 2cc6a1e Compare July 10, 2019 21:47

ravisantoshgudimetla force-pushed the add-scheduler-watcher branch 2 times, most recently from 57995a9 to f6b7b5f Compare July 11, 2019 12:29

cgwalters reviewed Jul 11, 2019 (pkg/controller/node/node_controller.go)

ravisantoshgudimetla force-pushed the add-scheduler-watcher branch from f6b7b5f to b265bd4 Compare July 11, 2019 21:21

ravisantoshgudimetla force-pushed the add-scheduler-watcher branch from b265bd4 to 3c60a1a Compare July 12, 2019 15:36

openshift-ci-robot assigned cgwalters Jul 12, 2019

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2019

ravisantoshgudimetla force-pushed the add-scheduler-watcher branch from eb910e1 to 5144a15 Compare July 12, 2019 21:23

openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2019

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2019

openshift-merge-robot merged commit 8367a76 into openshift:master Jul 13, 2019

russellb mentioned this pull request Jul 19, 2019

baremetal: Drop custom kubelet configuration. #993

Closed

russellb mentioned this pull request Jul 31, 2019

mastersSchedulable: true not working when set at install time #1024

Closed
