GPII-3421 - Allow couchdb updates #179
Conversation
CouchDB downtime is undesirable, of course, so I'd like to hear more about this problem.
Can we do anything to eliminate or reduce the downtime (e.g. change the deployment/update strategy, or something)? It's one thing to take downtime to upgrade to a new version of CouchDB; it's another to take downtime to give CouchDB more CPU/memory.
```diff
@@ -4,7 +4,7 @@ metadata:
   name: {{ template "couchdb.fullname" . }}
   labels:
     app: {{ template "couchdb.name" . }}
     chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
```
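For context, one plausible shape of the change (the replacement line below is an assumption, since the hunk above only shows the original) is simply to drop the version suffix so the label value stays stable across chart upgrades:

```yaml
# Sketch only: label the StatefulSet with the chart name alone,
# so bumping the chart version no longer changes this value.
chart: {{ .Chart.Name }}
```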
I think I understand why you are changing this. Are there any consequences of this change (e.g. users of this chart are no longer able to deploy multiple CouchDBs to a given cluster, or something)?
Possibly related question: can this change be contributed back to upstream? If so, please do so before closing this ticket :). If not, I'd like to discuss the situation further.
I'm not aware of any negative consequences of this; Helm does not rely on this info. AFAIK the only result is that you'll actually be allowed to update the DB (as opposed to thinking it's been updated while Helm/TF silently ignores that the real state is not the desired state, which I think is much more dangerous behavior - think about the security update we did to CouchDB recently).
@mrtyler thanks for the review. This PR only addresses whether you can update the DB at all. I'm looking into what can be done about minimizing/eliminating the downtime, as well as about Helm/TF ignoring the fact that the actual state is not consistent with the desired state (mentioned in https://issues.gpii.net/browse/GPII-3421), and will eventually open separate PRs for those two issues.
LGTM. OK to merge this, but I don't think the ticket is complete until we've at least talked about the CouchDB downtime issue.
@mrtyler cheers, agreed. I'm not closing the ticket until I either resolve those two other issues with another PR or concede that I've failed miserably.
The upgrade problem seems to be a common issue in the charts repository: helm/charts#7726
Opened upstream PR - helm/charts#8527 |
This fixes CouchDB updates by removing the `Version` part from the StatefulSet label, as Kubernetes doesn't allow updates to any StatefulSet fields other than 'replicas', 'template', and 'updateStrategy' (https://issues.gpii.net/browse/GPII-3421). It also adds `force_update` to the CouchDB release; this will cause the StatefulSet to be recreated, but without it we wouldn't be able to update at all, as the chart is already deployed with the `Version` in the label. While this causes CouchDB downtime, the underlying volumes and therefore the data are preserved. In future we should be careful about updates that modify anything but the fields mentioned above.
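To make the Kubernetes restriction concrete, here is a minimal, hypothetical StatefulSet sketch (invented names and values, not this chart's actual manifest) annotated with which fields may change on an in-place update; anything else, notably the selector, is rejected, which is why a label value that embeds the chart version breaks upgrades:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: couchdb-example            # hypothetical name
spec:
  replicas: 3                      # mutable on update
  updateStrategy:                  # mutable on update
    type: RollingUpdate
  template:                        # mutable on update (image, resources, pod labels, ...)
    metadata:
      labels:
        app: couchdb
    spec:
      containers:
        - name: couchdb
          image: couchdb:2.3.1
  selector:                        # immutable, like every spec field not listed above;
    matchLabels:                   # a value here that embeds the chart version would
      app: couchdb                 # block every chart version bump
```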
From the tests on my cluster:
PVC and associated PVs before the update:
After the update:
-> The volumes were not affected by re-creating the StatefulSet (this is how it should work :)).