This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

*: add self-hosted etcd experiment #135

Merged
merged 5 commits into from
Apr 27, 2017

philips
Contributor

@philips philips commented Mar 25, 2017

No description provided.

@philips
Contributor Author

philips commented Mar 27, 2017

Completely untested, but it can at least pass make plan now.

@philips
Contributor Author

philips commented Mar 27, 2017

@xiang90 and @Quentin-M please take over and close.

@philips
Contributor Author

philips commented Mar 29, 2017

[Image: screen shot 2017-03-30 at 1 22 02 am]

I am seeing high API server latency with this configuration. Using kube-prometheus to measure/test.

@philips
Contributor Author

philips commented Mar 29, 2017

@Quentin-M
Contributor

Quentin-M commented Mar 29, 2017

@philips Could you check the CPU usage, as well as the etcd logs (and optionally disk throughput - what instance type are you using)?

@philips
Contributor Author

philips commented Apr 3, 2017

@Quentin-M It went away after a while and never reproduced.

@Quentin-M
Contributor

As per discussion with @xiang90, the current status of this is:

After this, we'll work on backup, disaster recovery (etcd itself and self-hosted) and inter-etcd+external authenticated/encrypted communication.

@xiang90

xiang90 commented Apr 4, 2017

the io issue is also tracked here: coreos/etcd-operator#936

@philips
Contributor Author

philips commented Apr 10, 2017

What is the status of this now @xiang90 @Quentin-M ?

@philips
Contributor Author

philips commented Apr 12, 2017

Ping @xiang90 @Quentin-M for a status update. This is targeted to be done in the next few weeks; are there any big blockers?

@Quentin-M
Contributor

Still blocked on the items above. A design document for kubernetes-retired/bootkube#168 has been written, and is now being implemented.

@xiang90

xiang90 commented Apr 15, 2017

@Quentin-M

kubernetes-retired/bootkube#168 is fixed.

Can we move this forward accordingly? This currently blocks us from testing self-hosted etcd e2e on a Tectonic cluster. (We currently mock the self-hosted cluster for testing.)

/cc @yifan-gu @hasbro17

@@ -26,6 +26,7 @@ spec:
- --root-ca-file=/etc/kubernetes/secrets/ca.crt
- --service-account-private-key-file=/etc/kubernetes/secrets/service-account.key
- --leader-elect=true
- --enable-garbage-collector=false

this is fixed on 1.6. we can get rid of it.

@Quentin-M
Contributor

Quentin-M commented Apr 17, 2017

Rebased and implemented conditional disabling of locksmithd (until we have CLUO). We still need to bump the whole project to Kubernetes 1.6 (GC) and Bootkube 0.4.0 (kubernetes-retired/bootkube#168) before it can be (re-)tested and used.

@Quentin-M changed the title from "WIP modules/bootkube: add self-hosted etcd experiment" to "[Wait for bump | test] modules/bootkube: add self-hosted etcd experiment" on Apr 17, 2017
@Quentin-M
Contributor

There is Kubernetes 1.6.1 and Bootkube 0.4.0: #246.
Note to self: I will also need to add kenc / bootstrap etcd's manifest.

@Quentin-M
Contributor

Now that the flannel issue regarding taints is fixed, this is blocked on kubernetes-retired/bootkube#452.

@squat
Contributor

squat commented Apr 26, 2017

Currently this PR fails in the bootkube pivot. Trying to determine what could be causing it. Will ping @aaronlevy and @yifan-gu for a hand tomorrow.

cc @Quentin-M @sym3tri

@Quentin-M
Contributor

@squat We had a lot of pivot issues in the past too. Make sure you use beefed-up machines with SSD drives when doing your tests; this could significantly reduce the failure rate. We noticed that disk contention while all the images are being pulled affects the bootstrap etcd node very badly, making several requests time out, including lease renewals.

@aaronlevy
Contributor

fwiw the lease renewals shouldn't be an issue any longer (the bootstrap control-plane is the only thing that would lose the lease, but now that it is static manifests and not compiled in, it doesn't kill bootkube)

@Quentin-M
Contributor

Quentin-M commented Apr 26, 2017 via email

@squat force-pushed the self-hosted branch 6 times, most recently from fcda268 to 999da98 (April 27, 2017 00:02)
@squat
Contributor

squat commented Apr 27, 2017

Successfully booting clusters with self-hosted etcd!!! Also, scaling from 1 to 3 pods is working great 😄

@squat changed the title from "[Wait for bump | test] modules/bootkube: add self-hosted etcd experiment" to "*: add self-hosted etcd experiment" on Apr 27, 2017
@squat force-pushed the self-hosted branch 2 times, most recently from 871ca58 to d782374 (April 27, 2017 01:25)
@squat
Contributor

squat commented Apr 27, 2017

Clusters are booting and the frontend now properly hides the etcd page when you choose to deploy the operators :)

@squat force-pushed the self-hosted branch 3 times, most recently from 6d90edc to a3dcf33 (April 27, 2017 01:51)
@squat force-pushed the self-hosted branch 2 times, most recently from cd2f7ec to 13d1ba9 (April 27, 2017 02:36)
Contributor

@Quentin-M Quentin-M left a comment

  • Successfully deployed on eu-west-1
  • Scaled up to 3
  • Survived full cluster reboot

👍 👏 👌

Still waiting for boot etcd to be deleted... 😗

@squat squat merged commit bb8f2df into coreos:master Apr 27, 2017
7 participants