
Add support for VolumeClaimDeletePolicies for Elasticsearch clusters #4050

Merged · 23 commits · Mar 2, 2021

Conversation

@pebrc (Collaborator) commented Dec 16, 2020

First draft of a possible approach inspired by kubernetes/enhancements#1915

Adds a new volumeClaimDeletePolicy to the Elasticsearch spec at the cluster level (not per NodeSet):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es
spec:
  version: 7.10.1
  volumeClaimDeletePolicy: DeleteOnScaledownAndClusterDeletion
  nodeSets:
  - name: default
    count: 2

Possible values are DeleteOnScaledownAndClusterDeletion (default) and DeleteOnScaledownOnly.

DeleteOnScaledownAndClusterDeletion relies on an owner reference pointing to the Elasticsearch resource to garbage collect PVCs once the Elasticsearch cluster has been deleted (existing behaviour).

On each reconciliation it also runs additional garbage collection to remove PVCs that are no longer in use, either because the whole node set has been removed or because individual nodes have been scaled down (existing behaviour).

DeleteOnScaledownOnly means the PVCs are kept after the cluster has been deleted. This is implemented by removing the owner reference. Removal of PVCs on scale down happens as before.

Switching from one strategy to the other is allowed and is implemented by bypassing the StatefulSet templating mechanism. This is mainly because the PVC template in a StatefulSet is considered immutable, so changing PVC ownership through it would require the StatefulSets to be recreated. Instead, the operator edits the PVCs after they have been created by the StatefulSet controller.
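
To make this concrete, here is a minimal sketch of what editing PVC ownership could look like with controller-runtime. This is illustrative only, not the actual ECK implementation: the function and helper names (reconcilePVCOwnership, setOwnerRef, removeOwnerRef) and the package are made up, and the policy constants are simply restated from the spec above.

package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// VolumeClaimDeletePolicy mirrors the enum introduced in this PR.
type VolumeClaimDeletePolicy string

const (
	DeleteOnScaledownAndClusterDeletion VolumeClaimDeletePolicy = "DeleteOnScaledownAndClusterDeletion"
	DeleteOnScaledownOnly               VolumeClaimDeletePolicy = "DeleteOnScaledownOnly"
)

// reconcilePVCOwnership edits existing PVCs in place, adding or removing the owner reference
// that points to the Elasticsearch resource, instead of recreating StatefulSets with a
// different (immutable) volumeClaimTemplate.
func reconcilePVCOwnership(
	ctx context.Context,
	c client.Client,
	esOwner metav1.OwnerReference, // owner reference pointing to the Elasticsearch resource
	policy VolumeClaimDeletePolicy,
	pvcs []corev1.PersistentVolumeClaim,
) error {
	for i := range pvcs {
		pvc := pvcs[i]
		switch policy {
		case DeleteOnScaledownAndClusterDeletion:
			// PVC is garbage collected together with the Elasticsearch resource
			setOwnerRef(&pvc, esOwner)
		case DeleteOnScaledownOnly:
			// PVC survives cluster deletion
			removeOwnerRef(&pvc, esOwner.UID)
		}
		if err := c.Update(ctx, &pvc); err != nil {
			return err
		}
	}
	return nil
}

func setOwnerRef(pvc *corev1.PersistentVolumeClaim, ref metav1.OwnerReference) {
	for _, existing := range pvc.OwnerReferences {
		if existing.UID == ref.UID {
			return // already owned by the Elasticsearch resource
		}
	}
	pvc.OwnerReferences = append(pvc.OwnerReferences, ref)
}

func removeOwnerRef(pvc *corev1.PersistentVolumeClaim, uid types.UID) {
	kept := pvc.OwnerReferences[:0]
	for _, existing := range pvc.OwnerReferences {
		if existing.UID != uid {
			kept = append(kept, existing)
		}
	}
	pvc.OwnerReferences = kept
}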

Why not a per-NodeSet setting? I initially started with a setting per NodeSet, but it becomes tricky when users remove a whole node set, because we then have no trace of the chosen policy anymore. We would need to add the policy choice to the PVC as an annotation, or persist it in some other place, so that we can prevent garbage collection if the user so desires.

Fixes #2328

@botelastic botelastic bot added the triage label Dec 16, 2020
@pebrc pebrc added the >enhancement Enhancement of existing functionality label Dec 17, 2020
@botelastic botelastic bot removed the triage label Dec 17, 2020
@pebrc pebrc marked this pull request as ready for review February 17, 2021 16:36
@pebrc (Collaborator, Author) commented Feb 24, 2021

@sebgl I think this is ready for review, with one exception: I am thinking about adding an e2e test following roughly this playbook (a sketch of the final verification step follows the list):

  1. create an ES cluster with the retain policy, wait for it to become healthy etc. as per our standard test steps
  2. add some data
  3. delete the cluster
  4. recreate the cluster
  5. check that the data is still present
  6. change the policy to Remove*
  7. delete the cluster
  8. verify the PVCs are deleted (again a standard e2e test step we already have)
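
For the last step, a minimal client-go sketch of what the PVC check could look like. The function name and parameters are illustrative; the label selector assumes the elasticsearch.k8s.elastic.co/cluster-name label that ECK sets on resources belonging to a cluster.

package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// verifyNoPVCsLeft lists the PersistentVolumeClaims labelled with the cluster name
// and fails if any are still present after the cluster has been deleted.
func verifyNoPVCsLeft(ctx context.Context, clientset kubernetes.Interface, namespace, esName string) error {
	pvcs, err := clientset.CoreV1().PersistentVolumeClaims(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: "elasticsearch.k8s.elastic.co/cluster-name=" + esName,
	})
	if err != nil {
		return err
	}
	if n := len(pvcs.Items); n > 0 {
		return fmt.Errorf("expected 0 PVCs for cluster %s, found %d", esName, n)
	}
	return nil
}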

@sebgl (Contributor) left a comment:

There's one thing I don't understand well. One important use case for this feature, as I understood it, is to be able to do this combination:

  • delete pvcs on scale down AND do not delete pvcs on cluster deletion

As in: if I scale my cluster down I don't care about old pvcs of old nodes (likely no data left in there anyway since we migrate it away in the downscale process), but if I remove the Elasticsearch resource entirely I want to be able to restore it as it was later.

Can I achieve this with one of the 3 options?

config/crds/all-crds.yaml (resolved review thread)
pkg/apis/elasticsearch/v1/elasticsearch_types.go (resolved review thread)
}

updated := func(pvc corev1.PersistentVolumeClaim) corev1.PersistentVolumeClaim {
	pvc.ResourceVersion = "1000" // fake client starts at 999
Contributor:

We have comparison.Equal(a, b) (pkg/controller/common/comparison) to compare while ignoring ResourceVersion. The content of that file should maybe be moved to pkg/utils/compare with more explicit function names 🤔

@pebrc (Collaborator, Author):

I forgot that we had that. It works on single RuntimeObjects though so we need a version that takes slices.

Contributor:

should we add it there then, so we don't keep that special resource version logic isolated in this test only?

@pebrc (Collaborator, Author):

I think its usefulness would be somewhat limited by the fact that most (all?) k8s API objects implement runtime.Object with pointer receivers, so callers will typically have to take pointers to the elements of the slice.

But I removed the custom ResourceVersion code and reused comparison.AssertEqual.
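
For illustration, a sketch of what a slice-aware comparison ignoring ResourceVersion could look like; the function name is made up and this is not the content of pkg/controller/common/comparison, just the general idea discussed above.

package sketch

import (
	"reflect"

	corev1 "k8s.io/api/core/v1"
)

// pvcsEqualIgnoringResourceVersion compares two PVC slices element by element while
// ignoring ResourceVersion, which the fake client bumps on every write.
func pvcsEqualIgnoringResourceVersion(a, b []corev1.PersistentVolumeClaim) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		x, y := a[i], b[i] // copies, so the originals are not mutated
		x.ResourceVersion = ""
		y.ResourceVersion = ""
		if !reflect.DeepEqual(x, y) {
			return false
		}
	}
	return true
}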

pkg/controller/elasticsearch/nodespec/statefulset.go (resolved review thread)
@pebrc (Collaborator, Author) commented Feb 25, 2021

delete pvcs on scale down AND do not delete pvcs on cluster deletion

No, that's not possible with this implementation. The closest you get is using Retain, which means that the more or less empty PVCs from nodes you scaled down are retained, and if and when the cluster is recreated and scaled up again to its previous size they would rejoin.

I tried to keep the behaviour aligned with the k8s KEP, but if we think the use case you mentioned is strong enough we could add another option.

@sebgl (Contributor) commented Feb 25, 2021

I left a comment on kubernetes/enhancements#2440 to advocate for the "preserve on deletion, remove on downscale" use case.

@pebrc (Collaborator, Author) commented Feb 25, 2021

I think our naming is different enough from the KEP's to allow us to deviate in behaviour as well. It seems that for Elasticsearch there is indeed little benefit in keeping the volume around on scale down. If anything, this might cause trouble when the node rejoins if more than 500 index deletions have happened while it was offline (thanks to @DaveCTurner for pointing this out). The indices will be considered dangling (index metadata on the old node is enough for that, so master-eligible nodes are affected too).

We could drop the RemoveOnScaleDown option and rename Retain to RetainOnClusterDeletion and change its behaviour to only affect cluster deletion and not scale downs.

This will make it a little bit harder to integrate with the KEP feature once it becomes available. RetainOnClusterDeletion would require us to keep our GC approach in place, because in the KEP apps.DeleteOnStatefulSetScaledown implies apps.DeleteOnStatefulSetDeletion and sets an owner reference on the StatefulSet. For our RemoveOnClusterDeletion we could use the KEP feature. However, as we want to remain backwards compatible with existing versions of k8s, we will have to keep the existing mechanism in place for the foreseeable future anyway.

@pebrc pebrc requested a review from sebgl March 1, 2021 14:59
@@ -460,6 +460,7 @@ e2e-local: LOCAL_E2E_CTX := /tmp/e2e-local.json
e2e-local:
@go run test/e2e/cmd/main.go run \
--test-run-name=e2e \
--operator-image=$(OPERATOR_IMAGE) \
@sebgl (Contributor) left a comment:

I think the API is a bit confusing.
Using RetainOnClusterDeletion instead of RemoveOnScaleDown somewhat suggests we don't remove on scale down, but IIUC we do?
It's strange to offer RemoveOnScaleDown while the other options are actually supersets of it (rather than something different).

I'm wondering if we should move away further from the StatefulSet KEP, and propose something a bit simpler that fits our needs:

spec:
    # retainVolumeClaimsOnDeletion specifies whether ECK should retain PersistentVolumeClaims when the Elasticsearch resource is removed, allowing a cluster to be recreated from existing volumes. If false, ECK deletes PersistentVolumeClaims upon Elasticsearch resource deletion. Defaults to false.
    retainVolumeClaimsOnDeletion: false
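
For comparison, the boolean alternative expressed as a Go API field might look roughly like the following; this is only an illustration of the proposal above, not something that was merged.

package sketch

// ElasticsearchSpecFragment shows only the field relevant to this proposal.
type ElasticsearchSpecFragment struct {
	// RetainVolumeClaimsOnDeletion specifies whether ECK should retain PersistentVolumeClaims
	// when the Elasticsearch resource is removed, allowing a cluster to be recreated from
	// existing volumes. Defaults to false.
	RetainVolumeClaimsOnDeletion bool `json:"retainVolumeClaimsOnDeletion,omitempty"`
}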

@pebrc (Collaborator, Author) commented Mar 2, 2021

I'm wondering if we should move away further from the StatefulSet KEP, and propose something a bit simpler that fits our needs:

I think the only argument against it is that a boolean flag is closed to extension, whereas the enum-style attribute is open to extension by simply adding new values. Now we could say we simply do not see another use case that would ever require us to add another policy encoding different behaviour.

@thbkrkr (Contributor) commented Mar 2, 2021

Seb's proposal to have "PVCs automatically deleted on StatefulSet scaledown, but not on StatefulSet deletion" led to an update of the KEP (which has already received 2 LGTMs).
Possible values for PersistentVolumeClaimDeletePolicy are now:

  • Retain
  • DeleteOnStatefulSetDeletionOnly
  • DeleteOnScaledownOnly
  • DeleteOnScaledownAndStatefulSetDeletion

I find these names rather clear, and they cover all the use cases.
What do you think about following this to facilitate the integration with the KEP once it becomes available?

@pebrc (Collaborator, Author) commented Mar 2, 2021

@thbkrkr yes, I talked to @sebgl out of band and we agreed on following the KEP naming. Specifically we want to support two values in the initial implementation:

  • DeleteOnScaledownOnly
  • DeleteOnScaledownAndClusterDeletion

I will update this PR accordingly.

@thbkrkr thbkrkr added the v1.5.0 label Mar 2, 2021
@pebrc pebrc requested a review from sebgl March 2, 2021 15:21
type VolumeClaimDeletePolicy string

const (
// DeleteOnScaledownAndClusterDeletionPolicy remove PersistentVolumeClaims when the corresponding Elasticsearch node is removed.
Contributor:

Suggested change:
- // DeleteOnScaledownAndClusterDeletionPolicy remove PersistentVolumeClaims when the corresponding Elasticsearch node is removed.
+ // DeleteOnScaledownAndClusterDeletionPolicy removes PersistentVolumeClaims when the corresponding Elasticsearch node is removed.

@sebgl (Contributor) left a comment:

I did some manual testing (including backward compatibility) and it all seems to work as expected 👍

I left a few nits, also let's not forget about user-facing docs on the website. Other than that, LGTM :)

Resolved review threads on: pkg/controller/elasticsearch/driver/pvc_owner_test.go, pkg/controller/elasticsearch/nodespec/statefulset.go, pkg/controller/elasticsearch/driver/nodes.go
@pebrc (Collaborator, Author) commented Mar 2, 2021

I left a few nits, also let's not forget about user-facing docs on the website.

I raised #4287 as a reminder.

Labels: >enhancement (Enhancement of existing functionality), v1.5.0

Successfully merging this pull request may close these issues: Do not remove PVC when removing Elasticsearch resource with Operator

3 participants