
WIP: deploy with DaemonSet#288

Closed
pohly wants to merge 1 commit into kubernetes-csi:master from pohly:daemonset

Conversation

@pohly
Contributor

@pohly pohly commented May 11, 2021

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

DaemonSet has the advantage that "kubectl drain --ignore-daemonsets"
keeps the driver pods running, which is required if normal pods on the
node are to be evicted. When using a StatefulSet, it can happen that
the driver pod gets evicted first and then the other pods cannot be
evicted because their volumes can no longer be unpublished and
unstaged.

The downside is that we still have to ensure that the driver only runs
on a single node. With a DaemonSet, that is only possible by picking a
node in advance and using it in a node selector. This node selection
tries to ensure that the pod can really run by checking various
conditions (node ready, network ready, no taints), but it is less
complete than the checks done by kube-scheduler (for example, it
ignores node load).
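The node-selection checks described above could be sketched roughly like this (a Python illustration only, not the actual shell deploy script; the field names follow the Kubernetes Node API, but the helper functions themselves are made up):

```python
# Rough sketch of the node-selection checks: node Ready, network ready,
# no taints. Plain dicts stand in for Kubernetes Node API objects.

def node_is_usable(node: dict) -> bool:
    """Return True if the node looks able to run the driver pod."""
    status = node.get("status", {})
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    # Node must report Ready=True.
    if conditions.get("Ready") != "True":
        return False
    # Node must not report NetworkUnavailable=True.
    if conditions.get("NetworkUnavailable") == "True":
        return False
    # Any taint could keep the DaemonSet pod from running.
    if node.get("spec", {}).get("taints"):
        return False
    return True

def pick_node(nodes: list):
    """Pick the name of the first usable node, or None."""
    for node in nodes:
        if node_is_usable(node):
            return node["metadata"]["name"]
    return None
```

As noted above, this is weaker than real scheduling: kube-scheduler would also account for resource pressure and load, which a static check like this cannot see.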

Does this PR introduce a user-facing change?:

The non-distributed deployments use a DaemonSet to mimic best practices for CSI drivers. The driver still runs on only a single node, which has to be chosen in advance by the deploy script based on various criteria (node ready, network ready, no taints).
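As a sketch, pinning the DaemonSet to one pre-selected node could look roughly like this (an illustrative fragment only; the names, labels, and image are placeholders, not the actual deployment files of this PR):

```yaml
# Illustrative fragment only; names, labels and image are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-hostpath-plugin
spec:
  selector:
    matchLabels:
      app: csi-hostpath-plugin
  template:
    metadata:
      labels:
        app: csi-hostpath-plugin
    spec:
      # Pin the DaemonSet to the single node chosen in advance by the
      # deploy script; without this it would run on every node.
      nodeSelector:
        kubernetes.io/hostname: chosen-node
      containers:
        - name: hostpath
          image: registry.k8s.io/sig-storage/hostpathplugin:v1.9.0
```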

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 11, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from msau42 and saad-ali May 11, 2021 08:14
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 11, 2021
@pohly pohly changed the title deploy with DaemonSet WIP: deploy with DaemonSet May 11, 2021
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 11, 2021
@pohly
Contributor Author

pohly commented May 11, 2021

It's also going to be more complex in other places: the pod name is no longer deterministic, so sanity testing will have to be configured differently, and the code which updates the test driver config must be updated.

I haven't tackled that yet...

@k8s-ci-robot
Contributor

@pohly: The following tests failed, say /retest to rerun all failed tests:

  • pull-kubernetes-csi-csi-driver-host-path-1-19-on-kubernetes-1-19 (commit 481fc1a, rerun with /test pull-kubernetes-csi-csi-driver-host-path-1-19-on-kubernetes-1-19)
  • pull-kubernetes-csi-csi-driver-host-path-1-18-on-kubernetes-1-18 (commit 481fc1a, rerun with /test pull-kubernetes-csi-csi-driver-host-path-1-18-on-kubernetes-1-18)
  • pull-kubernetes-csi-csi-driver-host-path-1-21-on-kubernetes-1-21 (commit 481fc1a, rerun with /test pull-kubernetes-csi-csi-driver-host-path-1-21-on-kubernetes-1-21)

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 9, 2021
@pohly
Contributor Author

pohly commented Aug 9, 2021

/remove-lifecycle stale

@jsafrane do you think using a "proper" DaemonSet is worth the extra complexity or can I close this PR?

Further work would be needed if we want to do this.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 9, 2021
@jsafrane
Contributor

I briefly looked at whether we can create a Pod that would look like a DaemonSet pod without all the shell overhead here. It's probably not possible: kubectl checks the Pod owner during drain and even GETs the DaemonSet:
https://github.com/kubernetes/kubectl/blob/4defba0cec1f594eb410c69bff05b51cddfba8ff/pkg/drain/filters.go#L174-L204

On the other hand, there are other conditions when a pod is skipped during drain:
https://github.com/kubernetes/kubectl/blob/4defba0cec1f594eb410c69bff05b51cddfba8ff/pkg/drain/filters.go#L154-L159

They all seem hard to achieve. Perhaps our driver Pod could disguise itself as a mirror pod, with a special annotation?
https://github.com/kubernetes/kubectl/blob/4defba0cec1f594eb410c69bff05b51cddfba8ff/pkg/drain/filters.go#L207

@jsafrane
Contributor

perhaps our driver Pod could disguise as mirror pod, with a special annotation?

No, that annotation has very specific handling: such a pod is deleted unless it really is a mirror pod.

@jsafrane
Contributor

Just brainstorming: would it be better to run a "probe" pod, see where it gets scheduled, and run the driver DaemonSet on the same node?
Alternatively, what is actually the goal of the deploy script? All tests run their own driver in Go code, without this script. Would it make sense to run the DaemonSet everywhere and use distributed provisioning here?

@pohly
Contributor Author

pohly commented Aug 13, 2021

Just brainstorming: would it be better to run a "probe" pod, see where it's scheduled and run the driver DaemonSet on the same node?

That would be simpler. It's a bit slower (needs to wait for one pod to start), but that shouldn't matter much.
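The probe-pod idea could be sketched like this (plain dicts stand in for Kubernetes API objects; the helper name is hypothetical, not part of the deploy script):

```python
# Hypothetical sketch of the "probe pod" idea: let kube-scheduler place a
# small pod, read where it landed, and pin the driver DaemonSet there.
# Plain dicts stand in for Kubernetes API objects.

def node_selector_from_probe(probe_pod: dict) -> dict:
    """Build a DaemonSet nodeSelector from a scheduled probe pod."""
    node_name = probe_pod.get("spec", {}).get("nodeName")
    if not node_name:
        raise ValueError("probe pod has not been scheduled yet")
    # kubernetes.io/hostname is the well-known label kubelet sets on nodes
    # (it usually matches the node name unless overridden).
    return {"kubernetes.io/hostname": node_name}
```

The appeal of this approach is that all scheduling decisions, including load, are delegated to kube-scheduler instead of being approximated by shell checks.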

Alternatively, what is actually goal of the deploy script? All tests run their own driver in go code, without this script.

The hostpath tests in k/k do that. But all Kubernetes-CSI Prow jobs deploy the hostpath driver once per cluster using deploy.sh, then run the k/k e2e.test with an external driver config.

Would it make sense to run the DaemonSet everywhere and use distributed provisioning here?

That would massively increase the number of pods during parallel k/k tests. Resource consumption and overloading the cluster are already a problem.

@k8s-ci-robot
Contributor

@pohly: PR needs rebase.



@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.




darshansreenivas added a commit to darshansreenivas/csi-driver-host-path that referenced this pull request Jan 11, 2026
b12e407c Merge pull request kubernetes-csi#289 from nixpanic/k8s-v1.34
bbe5e547 Use Kubernetes v1.34 and Kind v0.30 by default
4e9eb2c9 Merge pull request kubernetes-csi#288 from gnufied/add-gnufied-for-csi-approver
064e260d Add myself as csi approver
c852fa79 Merge pull request kubernetes-csi#287 from andyzhangx/patch-7
bce16c10 fix: upgrade to go1.24.11 to fix CVE-2025-61727
8d1258cc Merge pull request kubernetes-csi#286 from kubernetes-csi/dependabot/github_actions/actions/checkout-6
91e35981 Bump actions/checkout from 5 to 6
29413815 Merge pull request kubernetes-csi#285 from andyzhangx/patch-6
fa8b339e fix: upgrade to go1.24.9 to fix CVEs

git-subtree-dir: release-tools
git-subtree-split: b12e407cc9556acf6702ed8745d3f8a29c9169bb
