
Minimizing need to delete provisioned volume #65100

Closed
wants to merge 1 commit into from

Conversation

sbezverk
Contributor


@sbezverk sbezverk commented Jun 14, 2018

The current behaviour of the PV controller, when for some reason the PV object cannot be saved by the API server, is to delete the just-provisioned volume. This is not desirable behaviour, because provision/de-provision operations can be long-running for some storage backends. This PR changes that behaviour and proposes storing the successfully provisioned volume's information as an annotation on the PVC object. On a PV object save failure, the controller will re-attempt to bind the PVC to the PV, but it will not need to provision a volume again since that has already been done; it only needs to retrieve the PV definition from the PVC's annotation. On any error, the PV controller switches back to the old logic.

Signed-off-by: Serguei Bezverkhi [email protected]

NONE
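
In rough pseudo-Go, the recovery path proposed above could look like the sketch below. The annotation identifiers match the ones discussed in this PR, but the string values, function names, and JSON encoding are assumptions for illustration, not the exact code in the commit.

```go
package persistentvolume

import (
	"encoding/json"

	v1 "k8s.io/api/core/v1"
)

// Annotation string values are assumptions for this sketch.
const (
	annVolumeAlreadyProvisioned = "pv.kubernetes.io/volume-already-provisioned"
	annStoredVolumeData         = "pv.kubernetes.io/stored-volume-data"
)

// storeProvisionedVolume records a successfully provisioned PV on the PVC so
// that a later PV-save failure can be retried without re-provisioning.
func storeProvisionedVolume(claim *v1.PersistentVolumeClaim, volume *v1.PersistentVolume) error {
	data, err := json.Marshal(volume)
	if err != nil {
		return err
	}
	if claim.Annotations == nil {
		claim.Annotations = map[string]string{}
	}
	claim.Annotations[annVolumeAlreadyProvisioned] = "yes"
	claim.Annotations[annStoredVolumeData] = string(data)
	return nil
}

// recoverProvisionedVolume rebuilds the PV from the stored annotation instead
// of calling Provision() again; on any failure the controller would fall back
// to the old logic.
func recoverProvisionedVolume(claim *v1.PersistentVolumeClaim) (*v1.PersistentVolume, bool) {
	if claim.Annotations[annVolumeAlreadyProvisioned] != "yes" {
		return nil, false
	}
	var volume v1.PersistentVolume
	if err := json.Unmarshal([]byte(claim.Annotations[annStoredVolumeData]), &volume); err != nil {
		return nil, false
	}
	return &volume, true
}
```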

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 14, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sbezverk
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: saad-ali

Assign the PR to them by writing /assign @saad-ali in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sbezverk
Contributor Author

/sig storage
/assign @saad-ali

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 14, 2018
@k8s-ci-robot
Contributor

k8s-ci-robot commented Jun 14, 2018

@sbezverk: The following test failed, say /retest to rerun them all:

Test name: pull-kubernetes-bazel-test
Commit: 1fa644e
Rerun command: /test pull-kubernetes-bazel-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jsafrane
Member

There are several issues in this PR:

  1. A PV contains sensitive information, like the ID of the volume, that can't be edited by the regular users who can create PVCs. With this PR, a rogue user could craft a PVC whose annStoredVolumeData contains the ID of someone else's volume and steal data.

  2. If saving the PV fails due to API server throttling, it's very likely that patching the PVC will fail too.

If we have trouble with in-tree volumes (and I haven't noticed any), I would propose adding exponential backoff to writing the PV; right now it tries 5 times with a 10-second sleep in between.
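For reference, a minimal sketch of what wrapping the PV save in exponential backoff could look like, using wait.ExponentialBackoff from apimachinery; the createPV callback and the backoff parameters are assumptions for illustration, not the controller's actual values:

```go
package persistentvolume

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// savePVWithBackoff retries the PV save with exponential backoff instead of a
// fixed number of attempts with a constant sleep. createPV stands in for the
// actual API call and is an assumption for this sketch.
func savePVWithBackoff(volume *v1.PersistentVolume, createPV func(*v1.PersistentVolume) error) error {
	backoff := wait.Backoff{
		Duration: 2 * time.Second, // delay before the first retry
		Factor:   2.0,             // double the delay on each attempt
		Steps:    6,               // roughly a minute of waiting in total
	}
	return wait.ExponentialBackoff(backoff, func() (bool, error) {
		if err := createPV(volume); err != nil {
			// Treat every failure as retriable here; a real implementation
			// would stop early on permanent errors such as AlreadyExists.
			return false, nil
		}
		return true, nil
	})
}
```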

@sbezverk
Contributor Author

@jsafrane Would it be an acceptable solution if I find a way to encrypt annStoredVolumeData?

@@ -22,6 +22,8 @@ import (
"strings"
"time"

Contributor

There should be no empty line.

ctrl.eventRecorder.Event(claim, v1.EventTypeWarning, events.ProvisioningFailed, strerr)
return
volRecovered := false
if claim.ObjectMeta.Annotations[annVolumeAlreadyProvisioned] == "yes" {
Contributor

if annotation, ok := claim.ObjectMeta.Annotations[annVolumeAlreadyProvisioned]; ok && annotation == "yes" {

A little suggestion. :=)

@jsafrane
Member

Would it be an acceptable solution if I find a way to encrypt annStoredVolumeData?

It looks extremely ugly to me. You bring security somewhere it should not be. It opens a whole new can of worms, e.g. you need a way to prevent replay attacks with a different PVC.

IMO, the internal provisioners are quite fine as they are now. And the external ones have a wide variety of ways to fix themselves, starting with increasing the timeout or using CRDs in their own namespace.

Note that deleting unwanted PVs is quite complex. The current code prefers deleting volumes, to save space and to keep the volumes in the storage backend in sync with PVs. This PR may leave orphan volumes in the storage backend, with no PVs for them, if a user deletes the PVC between provisioning retries. Both approaches have their pros and cons.

@sbezverk
Contributor Author

@jsafrane Thanks for the comments. One question though: should the external provisioner be in sync with the logic of the in-tree pv-controller? If they behave differently (with the suggested changes applied only to the external provisioner), it might result in a different experience for the user. I am not sure whether it was a goal to provide a seamless user experience regardless of whether in-tree or out-of-tree controllers are used.

@liggitt
Member

liggitt commented Jun 15, 2018

@jsafrane Would it be an acceptable solution if I find a way to encrypt annStoredVolumeData?

I don't think putting data on the PVC is a good idea.

a couple more fundamental questions:

  1. shouldn't CreateVolume be idempotent? If the controller crashes after calling CreateVolume but before creating the PV, we shouldn't end up with orphaned volumes (see the sketch after this list).
  2. can the controller pre-create the PV in Pending state, then do the CSI provision call, then update the PV with capacity/attribute data from the results of the CreateVolume call and update the PV state?
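
For illustration of point 1, a driver-side idempotency check might look roughly like this, keyed on the CreateVolume request name. The types come from the CSI Go bindings (github.com/container-storage-interface/spec/lib/go/csi); the in-memory store and the volume-ID scheme are assumptions for this sketch, not any real driver's code.

```go
package driver

import (
	"context"
	"sync"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
)

// volumeStore is a stand-in for the driver's backend.
type volumeStore struct {
	mu      sync.Mutex
	volumes map[string]*csi.Volume // keyed by CreateVolume request name
}

func newVolumeStore() *volumeStore {
	return &volumeStore{volumes: map[string]*csi.Volume{}}
}

// CreateVolume is idempotent: repeating a request with the same name returns
// the existing volume instead of creating a second one, so a provisioner that
// retries after a failed PV save does not orphan storage.
func (s *volumeStore) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if existing, ok := s.volumes[req.GetName()]; ok {
		return &csi.CreateVolumeResponse{Volume: existing}, nil
	}

	vol := &csi.Volume{
		VolumeId:      "vol-" + req.GetName(), // illustrative ID scheme
		CapacityBytes: req.GetCapacityRange().GetRequiredBytes(),
	}
	s.volumes[req.GetName()] = vol
	return &csi.CreateVolumeResponse{Volume: vol}, nil
}
```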

@sbezverk
Contributor Author

@liggitt It should not (it really depends on the CSI driver implementation), but even for the hostpath driver we check whether the request is for an already existing volume, and if it is, no new volume gets created; in addition, the external provisioner will generate the same volume name in the CreateVolume request for the same PVC.
WRT point 2, I am building a PoC to see how it behaves in a large-scale environment.
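
The deterministic naming referred to here can be sketched roughly as below; the "pvc-" prefix and the derivation from the claim UID are assumptions for illustration, not the external-provisioner's exact code.

```go
package provisioner

import v1 "k8s.io/api/core/v1"

// volumeNameForClaim derives the CreateVolume request name from the claim's
// UID, so a retried provision for the same PVC repeats the same name and an
// idempotent driver returns the already-created volume.
func volumeNameForClaim(claim *v1.PersistentVolumeClaim) string {
	return "pvc-" + string(claim.UID)
}
```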

@jsafrane
Member

  1. can the controller pre-create the PV in Pending state, then do the CSI provision call, then update the PV with capacity/attribute data from the results of the CreateVolume call and update the PV state?

We did that in the first release of alpha dynamic provisioning and it proved to be unreliable and error-prone. For example, such a PV will get bound to a PVC, the scheduler will schedule pods that use this PVC, assuming it has complete topology labels (which it does not have), and the A/D controller will try to attach it or kubelet will try to mount it.

Sure, we could extend the PV controller not to bind to such PVs, but IMO it breaks the API.

@jsafrane
Member

Another problem we had: such incomplete PVs go through admission plugins. We have a plugin that fills in topology labels, and it got confused by PVs with no real volumes behind them. We can fix that easily; however, it again shows that this is an API change and users would need to change their admission handlers too.

@liggitt
Member

liggitt commented Jun 18, 2018

does requiring CreateVolume to be idempotent resolve the need to delete the created volume if writing the PV encounters an error and the provision needs to be requeued?

@msau42
Member

msau42 commented Jun 18, 2018

@liggitt makes a good point. If CreateVolume is idempotent then we shouldn't need to delete the volume if PV creation fails. I'm not sure if in-tree Provision() is idempotent, but at least CSI should be.

@jsafrane
Member

In-tree provisioner calls Delete() because there is no PV for the volume yet:

  1. User creates a PVC
  2. Driver creates a volume
  3. Provisioner fails to save the PV
  4. [Provisioner deletes the volume]
  5. User deletes the PVC.

After 5., no provisioning is called, because there is nothing to provision - Kubernetes does not need the PV at this time. If Kubernetes did not delete the volume at 4., the volume would never be deleted.
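
Paraphrasing the save-then-delete behaviour described in steps 1-5, it boils down to the sketch below. The helper callbacks are stand-ins for the real API call and volume plugin Delete(), and the retry count and interval mirror the values mentioned earlier in the thread (5 attempts, 10 seconds apart); this is not the exact code in pv_controller.go.

```go
package persistentvolume

import "time"

// saveOrCleanup saves the PV with a fixed number of retries; if every attempt
// fails, it deletes the freshly provisioned volume (step 4 above), because
// without a PV object nothing in the cluster tracks that volume.
func saveOrCleanup(savePV func() error, deleteVolume func() error) error {
	var lastErr error
	for i := 0; i < 5; i++ {
		if lastErr = savePV(); lastErr == nil {
			return nil // PV saved; provisioning is complete
		}
		time.Sleep(10 * time.Second)
	}
	// No PV object means the volume would otherwise leak in the backend,
	// so the controller deletes it before giving up.
	if err := deleteVolume(); err != nil {
		return err
	}
	return lastErr
}
```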

@liggitt
Member

liggitt commented Jun 19, 2018

In-tree provisioner calls Delete() because there is no PV for the volume yet:

  1. User creates a PVC
  2. Driver creates a volume
  3. Provisioner fails to save the PV
  4. [Provisioner deletes the volume]
  5. User deletes the PVC.

After 5., no provisioning is called, because there is nothing to provision - Kubernetes does not need the PV at this time. If Kubernetes did not delete the volume at 4., the volume would never be deleted.

I expected something like this:

  • when 3 fails, the provisioner should requeue the pvc to be processed
  • the workqueue would have sufficient information to locate the pvc, the pv, and the created volume
  • when syncing, if both the pv and pvc no longer exist, the provisioner would delete the created volume (tolerating a "not found" error)
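
Roughly, that requeue-and-clean-up flow could look like the sketch below. The pendingVolume type, the existence-check callbacks, and the not-found sentinel are assumptions for illustration, not code from any provisioner.

```go
package provisioner

import (
	"errors"

	"k8s.io/client-go/util/workqueue"
)

// errVolumeNotFound is the sentinel this sketch expects deleteVolume to
// return when the backend volume is already gone.
var errVolumeNotFound = errors.New("volume not found")

// pendingVolume carries enough information to locate the PVC, the PV, and the
// created volume on a later sync.
type pendingVolume struct {
	pvcNamespace, pvcName string
	pvName                string
	volumeID              string
}

// requeue puts the item back with rate limiting after the PV save fails.
func requeue(queue workqueue.RateLimitingInterface, item pendingVolume) {
	queue.AddRateLimited(item)
}

// sync implements the last bullet: if neither the PVC nor the PV exists any
// more, delete the backend volume, treating "not found" as success.
func sync(item pendingVolume,
	pvcExists, pvExists func() (bool, error),
	deleteVolume func(volumeID string) error) error {

	hasPVC, err := pvcExists()
	if err != nil {
		return err
	}
	hasPV, err := pvExists()
	if err != nil {
		return err
	}
	if hasPVC || hasPV {
		return nil // still tracked in the API; nothing to clean up here
	}
	if err := deleteVolume(item.volumeID); err != nil && !errors.Is(err, errVolumeNotFound) {
		return err
	}
	return nil
}
```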

@msau42
Member

msau42 commented Jun 19, 2018

To handle the case where both PVC and PV are missing, the provisioner would need to keep an in-memory cache of created volumes, which would be lost on restarts.

@liggitt
Member

liggitt commented Jun 19, 2018

To handle the case where both PVC and PV are missing, the provisioner would need to keep an in-memory cache of created volumes, which would be lost on restarts.

Then we're back to persisting local state prior to calling CreateVolume. The PV object seems the most coherent object to do that on.

@jsafrane
Member

Then we're back to persisting local state prior to calling CreateVolume. The PV object seems the most coherent object to do that on.

And we're back at API breakage. Until now, PVs were only fully provisioned volumes, ready for binding and scheduling. With PVs for not fully provisioned volumes, we would need to change at least the PV controller (not to bind the PVC until provisioning is complete), the scheduler (to wait for the PV to be fully provisioned and get topology labels in case someone force-binds the PVC), and kubelet (to do the same when a pod is scheduled directly, e.g. by a DaemonSet). There is an unknown number of external components that may need this change too, and IMO this counts as API breakage.

@sbezverk
Contributor Author

@jsafrane IMHO, forcing the storage backend to create/delete/create/delete volumes as a result of API-server-related issues is not right; each subsystem should deal with its own issues internally, minimizing exposure to other subsystems. As for changing the API, I am not sure why it is considered breaking; it makes the API subsystem more robust, and I think it should be explored.

@wongma7
Contributor

wongma7 commented Jun 21, 2018

Yeah, this is not worth breaking the PV API over; it is complicated enough, and adding another phase before Available is IMO out of the question.

I believe the motivation for this PR is ultimately kubernetes-csi/external-provisioner#68. This supposed issue of PVs failing to save and wasting storage backend create API calls is just a symptom of the true problem, kubernetes-csi/external-provisioner#68, which causes the API server throttling in the first place. Otherwise we have zero evidence that a PV failing to save 5 times in a row is a common enough occurrence that we need to change this code.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2018
@k8s-ci-robot
Contributor

@sbezverk: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vladimirvivien
Member

@wongma7 should this be closed?

@wongma7
Contributor

wongma7 commented Jul 24, 2018

@vladimirvivien yes

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 22, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 21, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

10 participants