Add support for VolumeClaimDeletePolicies for Elasticsearch clusters #4050
Conversation
@sebgl I think this is ready for review, with one exception: I am thinking about adding an e2e test following roughly this playbook:
There's one thing I don't understand well. One important use case for this feature, as I understood it, is to be able to do this combination:
- delete PVCs on scale down AND do not delete PVCs on cluster deletion
As in: if I scale my cluster down I don't care about old PVCs of old nodes (likely no data left in there anyway, since we migrate it away in the downscale process), but if I remove the Elasticsearch resource entirely I want to be able to restore it as it was later.
Can I achieve this with one of the 3 options?
```go
}

updated := func(pvc corev1.PersistentVolumeClaim) corev1.PersistentVolumeClaim {
	pvc.ResourceVersion = "1000" // fake client starts at 999
```
We have comparison.Equal(a, b) (pkg/controller/common/comparison) to compare while ignoring ResourceVersion. The content of that file should maybe be moved to pkg/utils/compare with more explicit function names 🤔
I forgot that we had that. It works on single RuntimeObjects though, so we need a version that takes slices.
should we add it there then, so we don't keep that special resource version logic isolated in this test only?
I think its usefulness would be somewhat limited by the fact that most (all?) k8s API objects implement runtime.Object with pointer receivers, so callers will typically have to take pointers of the elements of the slice. But I removed the custom ResourceVersion code and reused comparison.AssertEqual.
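For illustration, a slice-aware variant could look roughly like the sketch below; pvcSlicesEqual is a made-up name (not part of pkg/controller/common/comparison), it just blanks ResourceVersion before a deep comparison:

```go
package comparison

import (
	"reflect"

	corev1 "k8s.io/api/core/v1"
)

// pvcSlicesEqual is a hypothetical slice-aware counterpart to comparison.Equal:
// it compares two lists of PVCs element by element while ignoring ResourceVersion.
func pvcSlicesEqual(a, b []corev1.PersistentVolumeClaim) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		x, y := a[i].DeepCopy(), b[i].DeepCopy()
		x.ResourceVersion = "" // ignore ResourceVersion, as comparison.Equal does
		y.ResourceVersion = ""
		if !reflect.DeepEqual(x, y) {
			return false
		}
	}
	return true
}
```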
No, that's not possible with this implementation; the closest you get is using one of the existing options. I tried to keep the behaviour aligned with the k8s KEP, but if we think the use case you mentioned is strong enough we could add another option.
I left a comment on kubernetes/enhancements#2440 to advocate for the "preserve on deletion, remove on downscale" use case.
I think our naming is different enough from the KEP to allow us to deviate in behaviour as well. It seems that for Elasticsearch there is indeed little benefit in keeping the volume around on scale down. If anything, this might cause trouble when rejoining if > 500 index deletions have happened while the node was offline (thanks to @DaveCTurner for pointing this out). The indices will be considered dangling (index metadata on the old node is enough for that, so master-eligible nodes are affected by this). We could drop the option that keeps volumes around on scale down, though this will make it a little bit harder to integrate with the KEP feature once it becomes available.
```diff
@@ -460,6 +460,7 @@ e2e-local: LOCAL_E2E_CTX := /tmp/e2e-local.json
 e2e-local:
 	@go run test/e2e/cmd/main.go run \
 		--test-run-name=e2e \
+		--operator-image=$(OPERATOR_IMAGE) \
```
I think the API is a bit confusing. Using RetainOnClusterDeletion instead of RemoveOnScaleDown somewhat suggests we don't remove on scale down, but IIUC we do? It's strange to offer RemoveOnScaleDown while the other options are actually a superset of it (rather than something different).
I'm wondering if we should move further away from the StatefulSet KEP, and propose something a bit simpler that fits our needs:
```yaml
spec:
  # retainVolumeClaims specifies whether ECK should retain PersistentVolumeClaims when the
  # Elasticsearch resource is removed, allowing to recreate a cluster from existing volumes.
  # If false, ECK deletes PersistentVolumeClaims upon Elasticsearch resource deletion.
  # Defaults to false.
  retainVolumeClaimsOnDeletion: false
```
I think the only argument against it is that a boolean flag is closed to extension, whereas the enum-style attribute is open to extension by simply adding values. Now we could say we simply do not see another use case that would require us to add another policy encoding different behaviour, ever.
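For concreteness, the boolean variant would roughly amount to a spec field like the sketch below (using the proposed name from the comment above; this is not the API that was ultimately chosen):

```go
package v1

// Sketch of the boolean alternative discussed above; RetainVolumeClaimsOnDeletion
// is the proposed field name, not an existing ECK field.
type ElasticsearchSpec struct {
	// RetainVolumeClaimsOnDeletion specifies whether ECK should retain
	// PersistentVolumeClaims when the Elasticsearch resource is removed,
	// allowing a cluster to be recreated from existing volumes. Defaults to false.
	RetainVolumeClaimsOnDeletion bool `json:"retainVolumeClaimsOnDeletion,omitempty"`
}
```

The enum-style VolumeClaimDeletePolicy shown further down keeps the door open for additional policies by simply adding values.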
Seb's proposal to have "PVCs automatically deleted on StatefulSet scaledown, but not on StatefulSet deletion" led to the update of the KEP (which has already received 2 LGTMs).
I find these names rather clear and they cover all the use cases.
```go
type VolumeClaimDeletePolicy string

const (
	// DeleteOnScaledownAndClusterDeletionPolicy remove PersistentVolumeClaims when the corresponding Elasticsearch node is removed.
```
Suggested change:

```diff
-// DeleteOnScaledownAndClusterDeletionPolicy remove PersistentVolumeClaims when the corresponding Elasticsearch node is removed.
+// DeleteOnScaledownAndClusterDeletionPolicy removes PersistentVolumeClaims when the corresponding Elasticsearch node is removed.
```
I did some manual testing (including backward compatibility) and it all seems to work as expected 👍
I left a few nits; also, let's not forget about user-facing docs on the website. Other than that, LGTM :)
should we add it there then, so we don't keep that special resource version logic isolated in this test only?
I raised #4287 as a reminder.
First draft of a possible approach inspired by kubernetes/enhancements#1915.

Adds a new volumeClaimDeletePolicy to the Elasticsearch spec at the cluster level (not per NodeSet). Possible values are DeleteOnScaledownAndClusterDeletion (default) and DeleteOnScaledownOnly.

DeleteOnScaledownAndClusterDeletion relies on an owner reference pointing to the Elasticsearch resource to garbage collect PVCs once the Elasticsearch cluster has been deleted (existing behaviour). It also runs additional garbage collection on each reconciliation to remove PVCs that are no longer in use, either because the whole node set has been removed or because individual nodes have been scaled down (existing behaviour).

DeleteOnScaledownOnly means the PVCs are kept around after the cluster has been deleted. This is implemented by removing the owner reference; removal of PVCs on scale down happens as before.

Switching from one strategy to the other is allowed and is implemented by avoiding the StatefulSet templating mechanism. This is mainly because the PVC templates in StatefulSets are considered immutable, and changing PVC ownership through them would require the StatefulSets to be recreated. Instead, the operator edits the PVCs after they have been created by the StatefulSet controller.
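As a rough illustration of the owner-reference mechanism described above, here is a minimal sketch; the package, function, and constant names are made up for this example and do not reflect ECK's actual code:

```go
package pvcpolicy

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Policy values as described in this PR (illustrative constants).
const (
	DeleteOnScaledownAndClusterDeletion = "DeleteOnScaledownAndClusterDeletion"
	DeleteOnScaledownOnly               = "DeleteOnScaledownOnly"
)

// applyVolumeClaimDeletePolicy adds or drops the owner reference tying a PVC to
// its Elasticsearch resource. With the reference present, Kubernetes garbage
// collection deletes the PVC together with the Elasticsearch resource; without
// it, the PVC survives cluster deletion (DeleteOnScaledownOnly).
func applyVolumeClaimDeletePolicy(pvc *corev1.PersistentVolumeClaim, esOwnerRef metav1.OwnerReference, policy string) {
	// Remove any existing owner reference pointing to the Elasticsearch resource.
	kept := pvc.OwnerReferences[:0]
	for _, ref := range pvc.OwnerReferences {
		if ref.UID != esOwnerRef.UID {
			kept = append(kept, ref)
		}
	}
	pvc.OwnerReferences = kept

	// Re-add it only if the PVC should be garbage collected with the cluster.
	if policy == DeleteOnScaledownAndClusterDeletion {
		pvc.OwnerReferences = append(pvc.OwnerReferences, esOwnerRef)
	}
}
```

Because the volumeClaimTemplates of a StatefulSet are immutable, something like this has to run against the live PVC objects after the StatefulSet controller has created them, which matches the approach described above.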
Why not a per NodeSet setting? I initially started with a setting per NodeSet, but it becomes tricky when users remove a whole node set as we then have no trace of the chosen policy anymore. We would need to add the policy choice to the PVC as an annotation or persist the policy choice in some other place so that we can prevent garbage collection if so desired by the user.
Fixes #2328