This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

Harden elasticsearch chart for Kube 1.5 #1062

Merged on Jul 5, 2017 (4 commits)

icereval
Collaborator

@icereval icereval commented May 11, 2017

Further extends upon PR #890

  • Added configmap to explicitly provide cluster configurations and scripts

  • Replaced the deprecated ES_HEAP_SIZE with ES_JAVA_OPTS to prepare for ES v5 support

  • Removed alpha storage class operators

  • Removed the catastrophic liveness probe that checked the entire cluster's health

  • Readiness probe now inspects local node health

  • Added termination grace period (defaults to 60m) to allow pre-stop-script.sh time to gracefully migrate shards

  • Added init container to configure vm.max_map_count

  • Updated elasticsearch.yaml:

    • Added PROCESSOR configuration to prevent large cluster garbage collection issues leading to node eviction
    • Added configurable gateway defaults to help avoid a split brain, requiring two masters online and in consensus before recovery can continue
  • Updated pre-stop-script.sh:

    • Check v1beta1 statefulset endpoint
    • Evaluate .spec.replicas for statefulset desired size
    • Clear _cluster/settings IP exclusion prior to shutdown to avoid a possible (random) IP match scenario on expansion of the cluster
  • Data nodes now use default storage class if one is not specified

  • Apply Helm best practices to prep for stable
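
Several of the pod-level changes listed above can be pictured in one manifest fragment. The following is only a minimal sketch, not the chart's actual template: the container names, image, heap values, probe timing, and script path are all assumptions chosen for illustration. It also uses the `initContainers` field syntax available from Kubernetes 1.6 onward, whereas earlier releases expressed init containers through a beta annotation.

```yaml
# Hypothetical pod spec fragment; names, image, and values are illustrative
# assumptions, not copied from the chart.
spec:
  terminationGracePeriodSeconds: 900      # gives the pre-stop hook time to migrate shards
  initContainers:                         # field syntax from Kubernetes 1.6+
    - name: sysctl
      image: busybox
      command: ["sysctl", "-w", "vm.max_map_count=262144"]
      securityContext:
        privileged: true                  # required to change a kernel setting on the node
  containers:
    - name: elasticsearch
      image: example/elasticsearch:latest # placeholder image
      env:
        - name: ES_JAVA_OPTS              # used instead of ES_HEAP_SIZE
          value: "-Xms512m -Xmx512m"
      readinessProbe:
        httpGet:
          path: /_cluster/health?local=true   # answered from the local node's own state
          port: 9200
        initialDelaySeconds: 5
      lifecycle:
        preStop:
          exec:
            command: ["/bin/bash", "/pre-stop-script.sh"]  # hypothetical mount path
```

The point of the design is that a slow or unhealthy cluster no longer causes every pod to be restarted by a liveness probe; each pod only reports whether it is itself ready to serve.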

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 11, 2017
@k8s-ci-robot
Contributor

Hi @icereval. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 11, 2017
DataStorageClass: "anything"
DataStorageClassVersion: "alpha"
# DataStorageClass: "ssd"
DataTerminationGracePeriodSeconds: 900
Member

I would prefer not to move the chart to stable as long as config values start with an uppercase letter. In this case, I would also switch to nested values first. I think this makes sense here, e.g.:

client:
  replicas: 2
  port: 2379
  heapSize: "128m"
master:
  replicas: 2
  heapSize: "128m"
data:
  replicas: 3
  storage: "30Gi"

See https://github.com/kubernetes/helm/blob/master/docs/chart_best_practices/values.md

Generally, I think a move to stable should be a separate PR so it is explicit.

Regarding resources, it's becoming increasingly popular to specify resources like so (without specifying defaults): https://github.com/kubernetes/charts/blob/master/stable/nginx-ingress/templates/controller-deployment.yaml#L93
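
As a rough illustration of that pattern, the fragment below leaves `resources` empty in the values file and renders whatever the user supplies with `toYaml`. The `client.resources` key and the template file name are assumptions made for this sketch, following the nesting proposed above; the indentation passed to `indent` must match the nesting depth of the real template.

```yaml
# values.yaml (hypothetical): no defaults, the user opts in explicitly
client:
  resources: {}
  # resources:
  #   limits:
  #     cpu: "1"
  #     memory: 1Gi

# templates/client-deployment.yaml (hypothetical fragment of a container spec)
        resources:
{{ toYaml .Values.client.resources | indent 10 }}
```

Leaving the default empty avoids baking in limits that only fit one cluster size, while still giving users a single, documented place to set them.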

Collaborator Author

@icereval icereval May 14, 2017

I completely agree; these are all very good points. Would you mind noting this on the base PR #890? If we can get this merged, even while the chart is still in incubator status, one of us can follow up with a new PR containing the final changes needed to make it stable.

@icereval icereval force-pushed the feature/elasticsearch branch 2 times, most recently from fd939ad to 918f888 Compare May 13, 2017 01:56
@unguiculus unguiculus self-assigned this May 24, 2017
@unguiculus unguiculus requested a review from simonswine May 24, 2017 18:52
@icereval icereval force-pushed the feature/elasticsearch branch from 918f888 to 38dc3f0 Compare June 1, 2017 21:52
@viglesiasce
Contributor

@simonswine @unguiculus any update on this review? Were the latest changes enough to merge?

@unguiculus
Member

As suggested in #890, a move to stable should only happen after applying best practices, e.g. regarding values. See https://github.com/kubernetes/helm/blob/master/docs/chart_best_practices/values.md.

mkrakowitzer and others added 2 commits June 19, 2017 06:40
* Add environment variable KUBERNETES_MASTER, resolves issue documented
here:
fabric8io/fabric8#6229 (comment)
* Rename PetSet to StatefulSet, rename template file
* Add initialDelay and increase timeouts to all liveness and readiness
checks. This was the only way I could get it to deploy reliably in my
environment.
* Update to a newer image version
* Added configmap to explicitly provide cluster configurations and scripts

* Replace deprecated `ES_HEAP_SIZE` with `ES_JAVA_OPTS` to position for ES v5 support

* Removed alpha storage class operators

* Removed catastrophic liveness probe checking entire cluster's health

* Readiness probe now inspects local node health

* Added termination grace period (defaults to 15m) to allow pre-stop-script.sh time to gracefully migrate shards

* Added init container to configure `vm.max_map_count`

* Updated elasticsearch.yaml:
  * Added `PROCESSOR` configuration to prevent large cluster garbage collection issues leading to node eviction
  * Added configurable gateway defaults to help avoid a split brain, requiring two masters online and in consensus before recovery can continue

* Updated pre-stop-script.sh:
  * Check `v1beta1` `statefulset` endpoint
  * Evaluate `.spec.replicas` for statefulset desired size
  * Clear `_cluster/settings` IP exclusion prior to shutdown to avoid a possible (random) IP match scenario on expansion of the cluster

* Data nodes now use default storage class if one is not specified
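
For readers unfamiliar with the Elasticsearch settings these notes refer to, the fragment below is a minimal sketch of what such a configuration could look like. The setting names are standard Elasticsearch 2.x/5.x settings, but the values, and the choice to show literals rather than the chart's configurable placeholders, are assumptions for illustration only.

```yaml
# Hypothetical elasticsearch.yml fragment; values are illustrative assumptions.
processors: 1                            # caps the CPU count used to size thread pools
discovery.zen.minimum_master_nodes: 2    # quorum needed to elect a master
gateway.expected_master_nodes: 2         # start recovery once this many masters have joined
gateway.recover_after_master_nodes: 2    # never recover with fewer masters than this
gateway.recover_after_time: 5m           # otherwise wait this long before recovering
```

With three master-eligible nodes, a value of two for these settings means a single isolated master cannot form its own cluster or begin recovery alone.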
@icereval icereval force-pushed the feature/elasticsearch branch from 38dc3f0 to b62ccae Compare June 19, 2017 10:59
@icereval
Collaborator Author

@unguiculus & @prydonius, rebase complete and a first pass at helm best practices applied to the incubator chart.

@icereval icereval force-pushed the feature/elasticsearch branch from c1e43a6 to c425ab8 Compare June 19, 2017 13:50
@unguiculus
Member

Excellent. This goes in the right direction. Now, please have a look at app labels, which should be {{ template "name" . }}. Note that you will then have to add the release label to the selector in services. nginx-ingress is an excellent example:

https://github.com/kubernetes/charts/blob/master/stable/nginx-ingress/templates/controller-deployment.yaml#L6
https://github.com/kubernetes/charts/blob/master/stable/nginx-ingress/templates/controller-service.yaml#L56-L58
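
A minimal sketch of the labelling convention described here, written as a Service template. The `fullname` helper, the port, and the overall layout are assumptions for illustration and are not taken from either chart.

```yaml
# Hypothetical Service template fragment illustrating the label convention.
apiVersion: v1
kind: Service
metadata:
  name: {{ template "fullname" . }}        # assumed helper defined in _helpers.tpl
  labels:
    app: {{ template "name" . }}
    release: {{ .Release.Name | quote }}
spec:
  selector:
    app: {{ template "name" . }}           # same for every release of this chart
    release: {{ .Release.Name | quote }}   # narrows the match to this release
  ports:
    - port: 9200
```

Because the app label no longer contains the release name, the release label in the selector is what keeps two installations of the chart in one namespace from selecting each other's pods.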

@unguiculus
Member

@icereval Are you working on an update?

@icereval
Collaborator Author

icereval commented Jul 3, 2017

@unguiculus, yep, I'll have my updates ready soon

@icereval icereval force-pushed the feature/elasticsearch branch from c425ab8 to e1ed701 Compare July 3, 2017 03:35
@icereval
Collaborator Author

icereval commented Jul 3, 2017

@unguiculus, best practice changes pushed up

@unguiculus
Member

Would you mind adding a NOTES.txt? Otherwise it looks nice but I have yet to review more thoroughly. I installed it on GKE and everything came up nicely.

@unguiculus
Member

@k8s-bot ok to test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 4, 2017
@icereval
Collaborator Author

icereval commented Jul 4, 2017

Added NOTES.txt and client.serviceType based on review of the concourse stable chart.

@icereval icereval force-pushed the feature/elasticsearch branch from ea779d1 to 5b6139f Compare July 4, 2017 20:49
@unguiculus unguiculus added UX reviewed lgtm Indicates that a PR is ready to be merged. labels Jul 5, 2017
@unguiculus unguiculus merged commit 09892a3 into helm:master Jul 5, 2017
mikesplain pushed a commit to barklyprotects/charts that referenced this pull request Jul 6, 2017
* Update elasticsearch chart to work with Kube 1.5

* Add environment variable KUBERNETES_MASTER, resolves issue documented
here:
fabric8io/fabric8#6229 (comment)
* Rename PetSet to StatefulSet, rename template file
* Add initialDelay and increase timeouts to all liveness and readiness
checks. This was the only way I could get it to deploy reliably in my
environment.
* Update to a newer image version

* Harden aspects of the elasticsearch chart

* Added configmap to explicitly provide cluster configurations and scripts

* Replace deprecated `ES_HEAP_SIZE` with `ES_JAVA_OPTS` to position for ES v5 support

* Removed alpha storage class operators

* Removed catastrophic liveness probe checking entire cluster's health

* Readiness probe now inspects local node health

* Added termination grace period (defaults to 15m) to allow pre-stop-script.sh time to gracefully migrate shards

* Added init container to configure `vm.max_map_count`

* Updated elasticsearch.yaml:
  * Added `PROCESSOR` configuration to prevent large cluster garbage collection issues leading to node eviction
  * Added configurable gateway defaults to help avoid a split brain, requiring two masters online and in consensus before recovery can continue

* Updated pre-stop-script.sh:
  * Check `v1beta1` `statefulset` endpoint
  * Evaluate `.spec.replicas` for statefulset desired size
  * Clear `_cluster/settings` IP exclusion prior to shutdown to avoid a possible (random) IP match scenario on expansion of the cluster

* Data nodes now use default storage class if one is not specified

* Apply best practices

* Add Notes for client service types, and warnings
yanns pushed a commit to yanns/charts that referenced this pull request Jul 28, 2017
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. code reviewed lgtm Indicates that a PR is ready to be merged. UX reviewed