Conversation
Completely untested, but it can at least pass make plan now.
@xiang90 and @Quentin-M, please take over and close.
@philips Could you check the CPU usage, as well as the etcd logs (and optionally disk throughput; what instance type are you using)?
@Quentin-M It went away after a while and never reproduced.
As per discussion with @xiang90, the current status of this is:
After this, we'll work on backup, disaster recovery (etcd itself and self-hosted), and inter-etcd + external authenticated/encrypted communication.
The I/O issue is also tracked here: coreos/etcd-operator#936
What is the status of this now, @xiang90 @Quentin-M?
Ping @xiang90 @Quentin-M for a status update. This is targeted to be done in the next few weeks; are there any big blockers?
Still blocked on the items above. A design document for kubernetes-retired/bootkube#168 has been written and is now being implemented.
kubernetes-retired/bootkube#168 is fixed. Can we move this forward accordingly? This currently blocks us from testing self-hosted etcd e2e on a Tectonic cluster. (We currently mock the self-hosted cluster for testing.)
@@ -26,6 +26,7 @@ spec:
     - --root-ca-file=/etc/kubernetes/secrets/ca.crt
     - --service-account-private-key-file=/etc/kubernetes/secrets/service-account.key
     - --leader-elect=true
+    - --enable-garbage-collector=false
This is fixed in 1.6; we can get rid of it.
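For reference, a minimal sketch of what the controller-manager args could look like once the project is on Kubernetes 1.6, where garbage collection is enabled by default and the workaround flag is simply dropped. The container name, image, and command wrapper below are illustrative assumptions, not the actual manifest from this PR; only the three flags come from the diff above.

```yaml
# Illustrative kube-controller-manager container excerpt on Kubernetes 1.6+:
# the garbage collector is on by default, so --enable-garbage-collector=false
# is no longer passed. Image tag and command are placeholders.
containers:
  - name: kube-controller-manager
    image: quay.io/coreos/hyperkube:v1.6.x   # placeholder tag
    command:
      - ./hyperkube
      - controller-manager
      - --root-ca-file=/etc/kubernetes/secrets/ca.crt
      - --service-account-private-key-file=/etc/kubernetes/secrets/service-account.key
      - --leader-elect=true
```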
Rebased and implemented conditional disabling of locksmithd (until we have CLUO). We still need to bump the whole project to Kubernetes 1.6 (for GC) and Bootkube 0.4.0 (kubernetes-retired/bootkube#168) before it can be (re-)tested and used.
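For context, a rough sketch of the kind of node configuration that "conditionally disabling locksmithd" implies: stop and mask the unit so the node does not auto-reboot for updates until CLUO takes over reboot coordination. The actual change lives in the installer templates; the cloud-config mechanism and the condition itself are assumptions here, not taken from the PR.

```yaml
#cloud-config
# Hypothetical Container Linux cloud-config fragment: stop and mask locksmithd
# so nodes do not reboot for OS updates until the Container Linux Update
# Operator (CLUO) manages reboots instead.
coreos:
  units:
    - name: locksmithd.service
      command: stop
      mask: true
```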
Now that the flannel issue regarding taints is fixed, this is blocked on kubernetes-retired/bootkube#452.
Force-pushed from 4154e05 to 2fd4aa1.
Currently this PR fails in the bootkube pivot. Trying to determine what could be causing it. Will ping @aaronlevy and @yifan-gu for a hand tomorrow.
@squat We had a lot of pivot issues in the past too. Make sure you use beefed-up machines with SSD drives when doing your tests; this could significantly reduce the failure rate. We noticed that disk contention while all the images are being pulled affects the bootstrap etcd node very badly, making several requests time out, including lease renewals.
FWIW, the lease renewals shouldn't be an issue any longer (the bootstrap control plane is the only thing that would lose the lease, and now that it is static manifests and not compiled in, losing it doesn't kill bootkube).
Thanks for the clarification. I didn't make it clear that timeouts on lease renewal won't break bootkube now, thanks to these guys' work! I just intended to convey that, generally speaking, calls can time out easily if SSD drives are not used.
Force-pushed from fcda268 to 999da98.
Successfully booting clusters with self-hosted etcd!!! Also, scaling from 1 to 3 pods is working great 😄
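For anyone following along, scaling is driven declaratively through the etcd operator: bumping the size on the cluster resource is enough, and the operator grows the cluster member by member. A hedged sketch below; the apiVersion/kind, object name, and etcd version are illustrative and may not match the exact TPR/CRD revision this PR uses.

```yaml
# Illustrative etcd-operator cluster object for the self-hosted etcd cluster.
# Changing spec.size from 1 to 3 tells the operator to grow the cluster.
apiVersion: etcd.database.coreos.com/v1beta2   # assumed; older deployments used a TPR
kind: EtcdCluster
metadata:
  name: kube-etcd
  namespace: kube-system
spec:
  size: 3          # desired member count; the operator reconciles toward it
  version: "3.1.8" # example etcd version
```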
Force-pushed from 871ca58 to d782374.
Clusters are booting, and the frontend now properly hides the etcd page when you choose to deploy the operators :)
Force-pushed from 6d90edc to a3dcf33.
Force-pushed from cd2f7ec to 13d1ba9.
- Successfully deployed on eu-west-1
- Scaled up to 3
- Survived full cluster reboot
👍 👏 👌
Still waiting for the boot etcd to be deleted... 😗