WIP: deploy with DaemonSet #288
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: pohly. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
DaemonSet has the advantage that "kubectl drain --ignore-daemonsets" keeps the driver pods running, which is required if normal pods on the node are to be evicted. When using a StatefulSet, it can happen that the driver pod gets evicted first and then the other pods cannot be evicted because their volumes can no longer be unpublished and unstaged. The downside is that we still have to ensure that the driver only runs on a single node. With DaemonSet, that is only possible by picking a node in advance and using that node in a node selector. This node selection tries to ensure that the pod can really run by checking various conditions (node ready, network ready, no taints), but this is less complete than the checks done by kube-scheduler (ignores load).
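For illustration, a minimal shell sketch of that kind of node selection, assuming jq is available and a driver DaemonSet named csi-hostpathplugin (both assumptions for this sketch, not necessarily what the PR's deploy script ends up doing):

```bash
# Sketch only: pick the first node that is Ready, has its network available,
# and carries no taints, then pin the driver DaemonSet to it via the
# built-in kubernetes.io/hostname label.
NODE=$(kubectl get nodes -o json | jq -r '
  .items[]
  | select((.spec.taints // []) | length == 0)
  | select(any(.status.conditions[]; .type == "Ready" and .status == "True"))
  | select(all(.status.conditions[]; .type != "NetworkUnavailable" or .status != "True"))
  | .metadata.name' | head -n 1)

# Restrict the DaemonSet to that single node; on all other nodes the
# DaemonSet controller simply does not create a pod.
kubectl patch daemonset csi-hostpathplugin --type merge -p \
  "{\"spec\":{\"template\":{\"spec\":{\"nodeSelector\":{\"kubernetes.io/hostname\":\"${NODE}\"}}}}}"
```

This is roughly what kube-scheduler would do for a regular pod, minus the load- and resource-aware parts mentioned above.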
It's also going to be more complex in other places: the pod name is no longer deterministic, therefore sanity testing will have to be configured differently. The code which updates the test driver config must be updated. I haven't tackled that yet...
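As a rough idea of what that test reconfiguration could look like: a sketch that discovers the driver pod by label instead of relying on a fixed name (the app=csi-hostpathplugin label is an assumption made up for this sketch):

```bash
# With a StatefulSet the pod name was predictable (e.g. csi-hostpathplugin-0).
# With a DaemonSet the pod name is generated, so tests would have to look the
# pod up, for example by label selector:
DRIVER_POD=$(kubectl get pods -l app=csi-hostpathplugin \
  -o jsonpath='{.items[0].metadata.name}')
echo "driver pod: ${DRIVER_POD}"
```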
@pohly: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@jsafrane do you think using a "proper" DaemonSet is worth the extra complexity, or can I close this PR? Further work would be needed if we want to do this.
I briefly looked at whether we can create a Pod that would look like a DaemonSet pod without all the shell overhead here. It's probably not possible. kubelet checks the Pod owner and even GETs the DaemonSet. On the other hand, there are other conditions under which a pod is skipped during drain. They all seem to be hard to achieve; perhaps our driver Pod could disguise itself as a mirror pod, with a special annotation?
No, that annotation has very specific handling, such a pod is deleted unless it's really a mirror pod. |
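To make the owner check concrete, here is a hedged sketch of inspecting the owner reference that gets checked during drain; DRIVER_POD is a placeholder for the actual driver pod name:

```bash
# Print the owner references of the driver pod. For a genuine DaemonSet pod
# this shows something like "DaemonSet/csi-hostpathplugin controller=true".
# The referenced DaemonSet is also fetched and verified, which is why a
# standalone pod cannot simply claim such an owner.
kubectl get pod "${DRIVER_POD}" -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}{"/"}{.name}{" controller="}{.controller}{"\n"}{end}'
```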
Just brainstorming: would it be better to run a "probe" pod, see where it's scheduled, and run the driver DaemonSet on the same node?
That would be simpler. It's a bit slower (needs to wait for one pod to start), but that shouldn't matter much.
The hostpath tests in k/k do that. But all Kubernetes-CSI Prow jobs deploy the hostpath driver once per cluster using deploy.sh, then run the k/k e2e.test with an external driver config.
That would massively increase the number of pods during parallel k/k tests. Resource consumption and overloading the cluster are already a problem.
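For reference, a rough sketch of the probe-pod idea brainstormed above (image, pod name, and DaemonSet name are illustrative; this is not what the PR implements):

```bash
# Let kube-scheduler pick a suitable node by scheduling a trivial pod,
# then pin the driver DaemonSet to that node and remove the probe again.
kubectl run csi-probe --image=registry.k8s.io/pause:3.9 --restart=Never
kubectl wait --for=condition=Ready pod/csi-probe --timeout=120s
NODE=$(kubectl get pod csi-probe -o jsonpath='{.spec.nodeName}')
kubectl delete pod csi-probe --wait=false

kubectl patch daemonset csi-hostpathplugin --type merge -p \
  "{\"spec\":{\"template\":{\"spec\":{\"nodeSelector\":{\"kubernetes.io/hostname\":\"${NODE}\"}}}}}"
```

The extra probe pod per test deployment is exactly the overhead that makes this unattractive for parallel e2e runs.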
@pohly: PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR.
In response to this: /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
DaemonSet has the advantage that "kubectl drain --ignore-daemonsets"
keeps the driver pods running, which is required if normal pods on the
node are to be evicted. When using a StatefulSet, it can happen that
the driver pod gets evicted first and then the other pods cannot be
evicted because their volumes can no longer be unpublished and
unstaged.
The downside is that we still have to ensure that the driver only runs
on a single node. With DaemonSet, that is only possible by picking a
node in advance and using that node in a node selector. This node
selection tries to ensure that the pod can really run by checking
various conditions (node ready, network ready, no taints), but this is
less complete than the checks done by kube-scheduler (ignores load).
Does this PR introduce a user-facing change?: