Commit 78faaba

committed
updated test scenarios and links. Described what happens if external component sets NNN
1 parent 65a5008 commit 78faaba

File tree

1 file changed

+35
-3
lines changed
  • keps/sig-scheduling/5278-nominated-node-name-for-expectation

keps/sig-scheduling/5278-nominated-node-name-for-expectation/README.md

Lines changed: 35 additions & 3 deletions
@@ -290,6 +290,21 @@ If we look from consumption point of view - these are effectively the same. We w
290290
to expose the information, that as of now a given node is considered as a potential placement
291291
for a given pod. It may change, but for now that's what is being considered.
292292

293+
#### External components may set `NominatedNodeName`
294+
295+
Currently, the `NominatedNodeName` field is intended to be read-only for components other than kube-scheduler. However, nothing prevents other actors from overwriting it. We do not consider this a substantial risk to scheduling.
296+
297+
The scheduler interprets `NominatedNodeName` (NNN) as a suggested placement for a pod. If NNN is set at the beginning of a scheduling cycle (e.g. to `N1`), the scheduler starts the scheduling attempt by trying to place the pod on `N1`. This can go two ways:
298+
299+
A. The pod fits on `N1`. The pod is bound, and NNN is cleared in the api-server after binding. The only risk here is that `N1` might not be the optimal placement for the pod.
300+
301+
B. The pod does not fit on `N1` (or `N1` is invalid). The scheduler restarts the scheduling cycle, ignoring the NNN value. Filtering, scoring and the other phases run, and the standard scheduling procedure continues. If the pod is deemed unschedulable, the scheduler clears the NNN field before moving the pod to the unschedulable / backoff queue. The risk in this case is that the scheduler first spends time trying to fit the pod on `N1`, which is a small overhead compared to the entire scheduling cycle.
302+
303+
304+
If `NominatedNodeName` is overwritten later in the scheduling cycle, or while the pod is waiting in a scheduling queue, kube-scheduler's work is not affected.
305+
306+
Note that this logic is not newly introduced by this KEP; it has been present in kube-scheduler since v1.22 and [KEP-1923](https://github.com/kubernetes/enhancements/tree/94277fd2b7683836465e97f1f7b974ff11fa58b0/keps/sig-scheduling/1923-prefer-nominated-node).
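
The two outcomes described above can be illustrated with a small, self-contained Go sketch. It is not the kube-scheduler implementation: the `Pod`, `Node` and `Result` types, the `fits` check and the "most free CPU wins" scoring rule are all hypothetical simplifications chosen only to make the control flow visible.

```go
package main

import "fmt"

// Pod and Node are simplified, hypothetical stand-ins for the real API types.
type Pod struct {
	Name              string
	CPURequest        int
	NominatedNodeName string // NNN as read at the start of the scheduling cycle
}

type Node struct {
	Name    string
	FreeCPU int
}

// Result records which node was chosen and which path led to it.
type Result struct {
	Node string // empty when the pod is unschedulable
	Path string // "nominated", "full-cycle" or "unschedulable"
}

// fits is a placeholder for the Filter phase (assumption: CPU only).
func fits(p Pod, n Node) bool { return n.FreeCPU >= p.CPURequest }

// scheduleOne models one scheduling attempt for a pod.
func scheduleOne(p Pod, nodes []Node) Result {
	// Case A: evaluate the nominated node first; if the pod fits, use it
	// even though another node might score higher.
	if p.NominatedNodeName != "" {
		for _, n := range nodes {
			if n.Name == p.NominatedNodeName && fits(p, n) {
				return Result{Node: n.Name, Path: "nominated"}
			}
		}
	}
	// Case B: the nominated node is invalid or does not fit, so NNN is
	// ignored and the regular filter/score pass runs over all nodes.
	best := -1
	for i, n := range nodes {
		if fits(p, n) && (best == -1 || n.FreeCPU > nodes[best].FreeCPU) {
			best = i
		}
	}
	if best == -1 {
		// In the flow described above, NNN is cleared before requeueing.
		return Result{Path: "unschedulable"}
	}
	return Result{Node: nodes[best].Name, Path: "full-cycle"}
}

func main() {
	nodes := []Node{{Name: "N1", FreeCPU: 2}, {Name: "N2", FreeCPU: 8}}
	pods := []Pod{
		{Name: "small", CPURequest: 1, NominatedNodeName: "N1"},
		{Name: "medium", CPURequest: 4, NominatedNodeName: "N1"},
		{Name: "huge", CPURequest: 16, NominatedNodeName: "N1"},
		{Name: "stale", CPURequest: 1, NominatedNodeName: "deleted-node"},
	}
	for _, p := range pods {
		fmt.Printf("%s -> %+v\n", p.Name, scheduleOne(p, nodes))
	}
}
```

The `small` pod shows the risk in case A: it lands on `N1` although the toy scoring rule would have preferred `N2`. The `medium` and `stale` pods show case B, where the only cost is the initial check against the nominated node.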
307+
293308
#### Node nominations need to be considered together with reserving DRA resources
294309

295310
The semantics of node nomination are in fact resource reservation, either in scheduler memory or in external components (after the nomination got persisted to the api-server). Since pods consume both node resources and DRA resources, it's important to persist both at the same (or almost the same) point in time.
@@ -401,12 +416,19 @@ to implement this enhancement.
401416

402417
##### Integration tests
403418

419+
Tests already implemented:
420+
[test/integration/scheduler/nominated_node_name](https://github.com/kubernetes/kubernetes/tree/master/test/integration/scheduler/nominated_node_name) : [integration master](https://testgrid.k8s.io/sig-release-master-blocking#integration-master&include-filter-by-regex=nominated_node_name) , [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=Test_PutNominatedNodeNameInBindingCycle)
421+
422+
Covering scenarios:
423+
- scheduler sets NNN before PreBind and WaitOnPermit, and does not set NNN when PreBind and Permit phases are skipped for the pod
424+
425+
More tests are in progress: https://github.com/kubernetes/kubernetes/pull/133215
426+
404427
We're going to add these integration tests:
405428
- The scheduler prefers picking nodes based on NominatedNodeName on pods, if the nodes are available.
406429
- The scheduler ignores NominatedNodeName reservations on pods when it's scheduling higher priority pods.
407430
- The scheduler overwrites NominatedNodeName when it performs the preemption, or when it finds a spot on another node and proceeds to the binding cycle (assuming there's a PreBind plugin).
408-
- The scheduler puts NominatedNodeName at the beginning of binding cycles if Permit or PreBind plugin will do some work.
409-
- And, the scheduler (actually kube-apiserver, when receiving a binding request) clears NominatedNodeName when the pod is actually bound.
431+
- And, the scheduler (actually kube-apiserver, when receiving a binding request) clears NominatedNodeName when the pod is actually bound.
410432

411433
Also, with [scheduler-perf](https://github.com/kubernetes/kubernetes/tree/master/test/integration/scheduler_perf), we'll make sure the scheduling throughput for pods that go through Permit or PreBind doesn't regress too much.
412434
We need to accept a small regression to some extent since there'll be a new API call to set NominatedNodeName.
@@ -519,7 +541,17 @@ there'll be nothing behaving wrong in the scheduling flow, see [Version Skew Str
519541

520542
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
521543

522-
TODO: update the test scenario
544+
We will do the following manual test after implementing the feature:
545+
546+
1. upgrade
547+
2. request scheduling of a pod that will need a long preBinding phase (e.g. uses volumes)
548+
3. check that NNN gets set for that pod
549+
4. before binding completes, restart the scheduler with nominatedNodeNameForExpectationEnabled = false
550+
5. check that the pod gets scheduled and bound successfully to the same node
551+
6. request scheduling of another pod that is expected to have a long preBinding phase
552+
7. check that NNN does not get set in PreBind
553+
8. restart the scheduler with nominatedNodeNameForExpectationEnabled = true
554+
9. check that the pod gets scheduled and bound on any node
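
The expected observations in the steps above can be walked through with a toy Go model. This is only a sketch under stated assumptions: `Scheduler`, `Pod`, `startBinding`, `finishBinding` and `pickNode` are hypothetical names, `FeatureEnabled` stands in for the `nominatedNodeNameForExpectationEnabled` setting, and the model assumes the nominated node still fits the pod after the restart.

```go
package main

import "fmt"

// Scheduler is a toy model of kube-scheduler; FeatureEnabled stands in for
// the nominatedNodeNameForExpectationEnabled setting (hypothetical model).
type Scheduler struct{ FeatureEnabled bool }

// Pod holds only the fields relevant to the manual test.
type Pod struct {
	Name              string
	NeedsPreBind      bool   // e.g. the pod uses volumes
	NominatedNodeName string // persisted in the api-server
	BoundNode         string
}

// pickNode prefers a persisted NNN (existing behaviour regardless of the
// feature setting) and otherwise uses the fallback choice.
func pickNode(p *Pod, fallback string) string {
	if p.NominatedNodeName != "" {
		return p.NominatedNodeName
	}
	return fallback
}

// startBinding models the start of a binding cycle: NNN is written only
// when the feature is enabled and PreBind has work to do.
func (s Scheduler) startBinding(p *Pod, node string) {
	if s.FeatureEnabled && p.NeedsPreBind {
		p.NominatedNodeName = node
	}
}

// finishBinding models a completed bind: the pod is bound and NNN is cleared.
func (s Scheduler) finishBinding(p *Pod, node string) {
	p.BoundNode = node
	p.NominatedNodeName = ""
}

func main() {
	// Steps 1-3: feature enabled, NNN is set while PreBind is running.
	s := Scheduler{FeatureEnabled: true}
	p1 := &Pod{Name: "p1", NeedsPreBind: true}
	s.startBinding(p1, "N1")
	fmt.Println("step 3, p1 NNN:", p1.NominatedNodeName)

	// Steps 4-5: restart with the feature disabled; the persisted NNN still
	// steers the pod to the same node.
	s = Scheduler{FeatureEnabled: false}
	node := pickNode(p1, "N2")
	s.startBinding(p1, node)
	s.finishBinding(p1, node)
	fmt.Println("step 5, p1 bound to:", p1.BoundNode)

	// Steps 6-7: a new pod does not get NNN while the feature is disabled.
	p2 := &Pod{Name: "p2", NeedsPreBind: true}
	s.startBinding(p2, "N2")
	fmt.Printf("step 7, p2 NNN: %q\n", p2.NominatedNodeName)

	// Steps 8-9: restart with the feature enabled; the pod binds to any node.
	s = Scheduler{FeatureEnabled: true}
	node = pickNode(p2, "N2")
	s.startBinding(p2, node)
	s.finishBinding(p2, node)
	fmt.Println("step 9, p2 bound to:", p2.BoundNode)
}
```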
523555

524556
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
525557
