Design doc of 'Schedule DS Pod by default scheduler'. #1714
Conversation
That should be alpha in 1.10, as TaintNodeByCondition is still alpha right now. Any comments?
/cc @kow3ns
@@ -0,0 +1,71 @@
#Schedule DaemonSet Pods by default scheduler, not DaemonSet controller
there should be a space after the #
* **Q**: Will this change/constrain update strategies, such as scheduling an updated pod to a node before the previous pod is gone?

**A**: nop, this will NOT change update strategies.
nop typo
Before the discussion of solutions/options, there’s some requirements/questions on DaemonSet:

* **Q**: DaemonSet controller can make pods even the network of node is unavailable, e.g. CNI network providers (Calico, Flannel), Will this impact bootstrapping, such as in the case that a DaemonSet is being used to provide the pod network?
even "if"
* **Q**: DaemonSet controller can make pods even the network of node is unavailable, e.g. CNI network providers (Calico, Flannel), Will this impact bootstrapping, such as in the case that a DaemonSet is being used to provide the pod network?

**A**: This will be handled in Support scheduling tolerating workloads on NotReady Nodes ([#45717](https://github.com/kubernetes/kubernetes/issues/45717)); after moving to check node’s taint, the DaemonSet pods will tolerant `NetworkUnavailable` taint.
should say 'tolerate' , not 'tolerant'
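For reference, once the `NetworkUnavailable` node condition is surfaced as a taint (TaintNodesByCondition), the DaemonSet pod spec would carry a matching toleration. A minimal sketch, assuming the `node.kubernetes.io/network-unavailable` taint key is what gets applied for that condition:

```yaml
# Sketch: a toleration that lets DaemonSet Pods schedule onto nodes whose
# network is not yet available (taint key assumed from the
# NetworkUnavailable node condition under TaintNodesByCondition).
tolerations:
- key: node.kubernetes.io/network-unavailable
  operator: Exists
  effect: NoSchedule
```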
Currently, pods of DaemonSet are created and scheduled by DaemonSet controller:

1. DS controller filter nodes by nodeSelector and scheduler’s predicates
2. For each nodes, create a Pod for it by setting spec.hostName directly; it’ll skip default scheduler
should be 'node'
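To illustrate the current behaviour described above: the controller pins each Pod to its target node itself, so the default scheduler never sees it. A minimal sketch of such a pre-bound Pod; the names and image are placeholders, and in the Pod API the field is `spec.nodeName` (the draft writes `spec.hostName`):

```yaml
# Sketch: a DaemonSet Pod as the controller creates it today, with the node
# fixed up front via spec.nodeName so default scheduling is bypassed.
apiVersion: v1
kind: Pod
metadata:
  generateName: example-daemon-     # hypothetical name
spec:
  nodeName: dest_hostname           # placeholder node name from the doc's example
  containers:
  - name: daemon
    image: example/daemon:latest    # hypothetical image
```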
- dest_hostname
```

##Reference
need space after ##
##Reference

* [DaemonsetController can't feel it when node has more resources, e.g. other Pod exits](https://github.com/kubernetes/kubernetes/issues/46935)
* [DaemonsetController can't feel it when node recovered from outofdisk state](https://github.com/kubernetes/kubernetes/issues/46935)
wrong link, this is the same link as above; should be kubernetes/kubernetes#45628
Thanks, @k82cn and sorry for my late review.
* Hard to debug why DaemonSet’s Pod is not created, e.g. not enough resources; it’s better to have a pending Pods with predicates’ event
* Hard to support preemption in different components, e.g. DS and default scheduler

After [discussions](https://docs.google.com/document/d/1v7hsusMaeImQrOagktQb40ePbK6Jxp1hzgFB9OZa_ew/edit#), we come to a agreement that making DaemonSet to just produce pods like every other controller, and let them be scheduled by the regular scheduler, than to be its own scheduler.
s/a agreement/an agreement/
I would rephrase the sentence "we come to an agreement..." to:
SIG scheduling approved changing DaemonSet controller to create DaemonSet Pods and set their node-affinity and let them be scheduled by default scheduler. After this change, DaemonSet controller will no longer schedule DaemonSet Pods directly.
* **Q**: DaemonSet controller can make pods even the network of node is unavailable, e.g. CNI network providers (Calico, Flannel), Will this impact bootstrapping, such as in the case that a DaemonSet is being used to provide the pod network?

**A**: This will be handled in Support scheduling tolerating workloads on NotReady Nodes ([#45717](https://github.com/kubernetes/kubernetes/issues/45717)); after moving to check node’s taint, the DaemonSet pods will tolerant `NetworkUnavailable` taint.
s/in Support/by supporting/
also tolerant
should be tolerate
* **Q**: DaemonSet controller can make pods even when the scheduler has not been started, which can help cluster bootstrap.

**A**: As the scheduling logic is moved to default scheduler, the kube-scheduler is required after this proposal.
s/the kube-scheduler is required after this proposal/the kube-scheduler must be started during cluster start-up/
This option is to leverage NodeAffinity feature to avoid introducing scheduler’s predicates in DS controller:

1. DS controller filter nodes by nodeSelector, but did NOT check against scheduler’s predicates (e.g. PodFitHostResources)
s/did not/does not/
This option is to leverage NodeAffinity feature to avoid introducing scheduler’s predicates in DS controller:

1. DS controller filter nodes by nodeSelector, but did NOT check against scheduler’s predicates (e.g. PodFitHostResources)
2. For each nodes, DS controller creates a Pod for it with following NodeAffinity
s/each nodes/each node/
s/with following/with the following/
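Step 2 of the quoted flow refers to node affinity that pins each DaemonSet Pod to its target node. A minimal sketch of what that affinity could look like, reusing the `dest_hostname` placeholder from the design doc and assuming the node is matched via the well-known `kubernetes.io/hostname` label (the doc may settle on a different key):

```yaml
# Sketch: required node affinity restricting the Pod to one specific node,
# so the default scheduler can only place it there.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - dest_hostname   # placeholder: the target node's hostname label value
```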
1. DS controller filter nodes by nodeSelector, but did NOT check against scheduler’s predicates (e.g. PodFitHostResources)
2. For each nodes, DS controller creates a Pod for it with following NodeAffinity
3. When sync Pods, DS controller will map nodes and pods by this NodeAffinity to check whether Pods are started for nodes
4. In scheduler, the Daemon pods will keep pending if predicates failed, e.g. PodFitHostResources; for critical daemons, DS controller will create Pods with critical pods annotation and leverage scheduler/kubelet’s logic to handle it; similar practice to [priority/preemption](https://github.com/kubernetes/features/issues/268)
As a part of enabling priority and preemption, we must ensure that all critical DaemonSet Pods get an appropriate critical priority. If they have critical priority, scheduler will ensure that they will be scheduled even when the cluster is under resource pressure. Scheduler preempts other Pods in such condition to schedule critical Pods.
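As a concrete illustration of that point: with priority/preemption, criticality would be expressed through a PriorityClass on the DaemonSet's pod template rather than the critical-pod annotation. A minimal sketch, assuming the built-in `system-node-critical` class:

```yaml
# Sketch: giving DaemonSet Pods a critical priority so the scheduler can
# preempt lower-priority Pods for them when the cluster is under pressure.
spec:
  template:
    spec:
      priorityClassName: system-node-critical
```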
1. DS controller filter nodes by nodeSelector, but does NOT check against scheduler’s predicates (e.g. PodFitHostResources)
2. For each node, DS controller creates a Pod for it with the following NodeAffinity
3. When sync Pods, DS controller will map nodes and pods by this NodeAffinity to check whether Pods are started for nodes
4. In scheduler, the Daemon pods will keep pending if predicates failed, e.g. PodFitHostResources; for critical daemons,
s/the Daemon pods will keep pending if predicates failed, e.g. PodFitHostResources; for critical daemons,/Daemon Pods will stay pending if scheduling predicates fail. To avoid this,/
done
Signed-off-by: Da K. Ma <[email protected]>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bsalamat, k82cn. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these OWNERS Files:
Approvers can indicate their approval by writing `/approve` in a comment.
Design doc of 'Schedule DS Pod by default scheduler'.
xref kubernetes/enhancements#548
/cc @bgrant0607 , @bsalamat , @kubernetes/sig-apps-feature-requests