
WIP: deploy with DaemonSet#288

Closed
pohly wants to merge 1 commit into kubernetes-csi:master from pohly:daemonset

Conversation

@pohly
Contributor

@pohly pohly commented May 11, 2021

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

DaemonSet has the advantage that "kubectl drain --ignore-daemonsets"
keeps the driver pods running, which is required if normal pods on the
node are to be evicted. When using a StatefulSet, it can happen that
the driver pod gets evicted first and then the other pods cannot be
evicted because their volumes can no longer be unpublished and
unstaged.

The downside is that we still have to ensure that the driver only runs
on a single node. With a DaemonSet, that is only possible by picking a
node in advance and using it in a node selector. This node selection
tries to ensure that the pod can really run by checking various
conditions (node ready, network ready, no taints), but it is less
complete than the checks done by kube-scheduler (for example, it
ignores node load).
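The node-selection checks described above could be sketched roughly like this (a Python illustration only, not the actual shell deploy script; the field names follow the Kubernetes Node API, but the helper functions themselves are made up):

```python
# Rough sketch of the node-selection checks: node Ready, network ready,
# no taints. Plain dicts stand in for Kubernetes Node API objects.

def node_is_usable(node: dict) -> bool:
    """Return True if the node looks able to run the driver pod."""
    status = node.get("status", {})
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    # Node must report Ready=True.
    if conditions.get("Ready") != "True":
        return False
    # Node must not report NetworkUnavailable=True.
    if conditions.get("NetworkUnavailable") == "True":
        return False
    # Any taint could keep the DaemonSet pod from running.
    if node.get("spec", {}).get("taints"):
        return False
    return True

def pick_node(nodes: list):
    """Pick the name of the first usable node, or None."""
    for node in nodes:
        if node_is_usable(node):
            return node["metadata"]["name"]
    return None
```

As noted above, this is weaker than real scheduling: kube-scheduler would also account for resource pressure and load, which a static check like this cannot see.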

Does this PR introduce a user-facing change?:

The non-distributed deployments use a DaemonSet to mimic best practices for CSI drivers. The driver still runs on only a single node, which has to be chosen in advance by the deploy script based on various criteria (node ready, network ready, no taints).
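As a sketch, pinning the DaemonSet to one pre-selected node could look roughly like this (an illustrative fragment only; the names, labels, and image are placeholders, not the actual deployment files of this PR):

```yaml
# Illustrative fragment only; names, labels and image are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-hostpath-plugin
spec:
  selector:
    matchLabels:
      app: csi-hostpath-plugin
  template:
    metadata:
      labels:
        app: csi-hostpath-plugin
    spec:
      # Pin the DaemonSet to the single node chosen in advance by the
      # deploy script; without this it would run on every node.
      nodeSelector:
        kubernetes.io/hostname: chosen-node
      containers:
        - name: hostpath
          image: registry.k8s.io/sig-storage/hostpathplugin:v1.9.0
```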

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 11, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from msau42 and saad-ali May 11, 2021 08:14
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 11, 2021
@pohly pohly changed the title deploy with DaemonSet WIP: deploy with DaemonSet May 11, 2021
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 11, 2021
@pohly
Contributor Author

pohly commented May 11, 2021

It's also going to be more complex in other places: the pod name is no longer deterministic, so sanity testing will have to be configured differently, and the code which updates the test driver config must be updated.

I haven't tackled that yet...

@k8s-ci-robot
Contributor

@pohly: The following tests failed, say /retest to rerun all failed tests:

  • pull-kubernetes-csi-csi-driver-host-path-1-19-on-kubernetes-1-19 (commit 481fc1a, rerun with /test pull-kubernetes-csi-csi-driver-host-path-1-19-on-kubernetes-1-19)
  • pull-kubernetes-csi-csi-driver-host-path-1-18-on-kubernetes-1-18 (commit 481fc1a, rerun with /test pull-kubernetes-csi-csi-driver-host-path-1-18-on-kubernetes-1-18)
  • pull-kubernetes-csi-csi-driver-host-path-1-21-on-kubernetes-1-21 (commit 481fc1a, rerun with /test pull-kubernetes-csi-csi-driver-host-path-1-21-on-kubernetes-1-21)

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 9, 2021
@pohly
Contributor Author

pohly commented Aug 9, 2021

/remove-lifecycle stale

@jsafrane do you think using a "proper" DaemonSet is worth the extra complexity or can I close this PR?

Further work would be needed if we want to do this.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 9, 2021
@jsafrane
Contributor

I briefly looked at whether we can create a Pod that would look like a DaemonSet pod without all the shell overhead here. It's probably not possible: kubectl checks the Pod owner during drain and even GETs the DaemonSet:
https://github.com/kubernetes/kubectl/blob/4defba0cec1f594eb410c69bff05b51cddfba8ff/pkg/drain/filters.go#L174-L204

On the other hand, there are other conditions when a pod is skipped during drain:
https://github.com/kubernetes/kubectl/blob/4defba0cec1f594eb410c69bff05b51cddfba8ff/pkg/drain/filters.go#L154-L159

They all seem hard to achieve. Perhaps our driver Pod could disguise itself as a mirror pod, with a special annotation?
https://github.com/kubernetes/kubectl/blob/4defba0cec1f594eb410c69bff05b51cddfba8ff/pkg/drain/filters.go#L207

@jsafrane
Contributor

perhaps our driver Pod could disguise as mirror pod, with a special annotation?

No, that annotation has very specific handling: such a pod is deleted unless it really is a mirror pod.

@jsafrane
Contributor

Just brainstorming: would it be better to run a "probe" pod, see where it gets scheduled, and run the driver DaemonSet on the same node?
Alternatively, what is actually the goal of the deploy script? All tests run their own driver in Go code, without this script. Would it make sense to run the DaemonSet everywhere and use distributed provisioning here?

@pohly
Contributor Author

pohly commented Aug 13, 2021

Just brainstorming: would it be better to run a "probe" pod, see where it's scheduled and run the driver DaemonSet on the same node?

That would be simpler. It's a bit slower (needs to wait for one pod to start), but that shouldn't matter much.
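The probe-pod idea could be sketched like this (plain dicts stand in for Kubernetes API objects; the helper name is hypothetical, not part of the deploy script):

```python
# Hypothetical sketch of the "probe pod" idea: let kube-scheduler place a
# small pod, read where it landed, and pin the driver DaemonSet there.
# Plain dicts stand in for Kubernetes API objects.

def node_selector_from_probe(probe_pod: dict) -> dict:
    """Build a DaemonSet nodeSelector from a scheduled probe pod."""
    node_name = probe_pod.get("spec", {}).get("nodeName")
    if not node_name:
        raise ValueError("probe pod has not been scheduled yet")
    # kubernetes.io/hostname is the well-known label kubelet sets on nodes
    # (it usually matches the node name unless overridden).
    return {"kubernetes.io/hostname": node_name}
```

The appeal of this approach is that all scheduling decisions, including load, are delegated to kube-scheduler instead of being approximated by shell checks.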

Alternatively, what is actually goal of the deploy script? All tests run their own driver in go code, without this script.

The hostpath tests in k/k do that. But all Kubernetes-CSI Prow jobs deploy the hostpath driver once per cluster using deploy.sh, then run the k/k e2e.test with an external driver config.

Would it make sense to run the DaemonSet everywhere and use distributed provisioning here?

That would massively increase the number of pods during parallel k/k tests. Resource consumption and overloading the cluster are already a problem.

@k8s-ci-robot
Contributor

@pohly: PR needs rebase.



@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.




darshansreenivas added a commit to darshansreenivas/csi-driver-host-path that referenced this pull request Jan 11, 2026
b12e407c Merge pull request kubernetes-csi#289 from nixpanic/k8s-v1.34
bbe5e547 Use Kubernetes v1.34 and Kind v0.30 by default
4e9eb2c9 Merge pull request kubernetes-csi#288 from gnufied/add-gnufied-for-csi-approver
064e260d Add myself as csi approver
c852fa79 Merge pull request kubernetes-csi#287 from andyzhangx/patch-7
bce16c10 fix: upgrade to go1.24.11 to fix CVE-2025-61727
8d1258cc Merge pull request kubernetes-csi#286 from kubernetes-csi/dependabot/github_actions/actions/checkout-6
91e35981 Bump actions/checkout from 5 to 6
29413815 Merge pull request kubernetes-csi#285 from andyzhangx/patch-6
fa8b339e fix: upgrade to go1.24.9 to fix CVEs

git-subtree-dir: release-tools
git-subtree-split: b12e407cc9556acf6702ed8745d3f8a29c9169bb
