Conversation
Completely untested, but it can at least pass make plan now.
@xiang90 and @Quentin-M, please take over and close.
@philips Could you check the CPU usage, as well as the etcd logs (and optionally disk throughput; what instance type are you using)?
@Quentin-M It went away after a while and never reproduced.
As per discussion with @xiang90, the current status of this is:
After this, we'll work on backup, disaster recovery (etcd itself and self-hosted), and inter-etcd + external authenticated/encrypted communication.
The I/O issue is also tracked here: coreos/etcd-operator#936
What is the status of this now, @xiang90 @Quentin-M?
Ping @xiang90 @Quentin-M for a status update. This is targeted to be done in the next few weeks; are there any big blockers?
Still blocked on the items above. A design document for kubernetes-retired/bootkube#168 has been written and is now being implemented.
kubernetes-retired/bootkube#168 is fixed. Can we move this forward accordingly? This currently blocks us from testing self-hosted etcd e2e on a Tectonic cluster. (We currently mock the self-hosted cluster for testing.)
@@ -26,6 +26,7 @@ spec:
     - --root-ca-file=/etc/kubernetes/secrets/ca.crt
     - --service-account-private-key-file=/etc/kubernetes/secrets/service-account.key
     - --leader-elect=true
+    - --enable-garbage-collector=false
This is fixed in 1.6; we can get rid of it.
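For reference, a minimal sketch of what the controller-manager args could look like once the project is on Kubernetes 1.6, where garbage collection is enabled by default and the workaround flag is simply dropped. The container name, image, and command wrapper below are illustrative assumptions, not the actual manifest from this PR; only the three flags come from the diff above.

```yaml
# Illustrative kube-controller-manager container excerpt on Kubernetes 1.6+:
# the garbage collector is on by default, so --enable-garbage-collector=false
# is no longer passed. Image tag and command are placeholders.
containers:
  - name: kube-controller-manager
    image: quay.io/coreos/hyperkube:v1.6.x   # placeholder tag
    command:
      - ./hyperkube
      - controller-manager
      - --root-ca-file=/etc/kubernetes/secrets/ca.crt
      - --service-account-private-key-file=/etc/kubernetes/secrets/service-account.key
      - --leader-elect=true
```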
Rebased and implemented conditional disabling of locksmithd (until we have CLUO). We still need to bump the whole project to Kubernetes 1.6 (for GC) and Bootkube 0.4.0 (kubernetes-retired/bootkube#168) before it can be (re-)tested and used.
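For context, a rough sketch of the kind of node configuration that "conditionally disabling locksmithd" implies: stop and mask the unit so the node does not auto-reboot for updates until CLUO takes over reboot coordination. The actual change lives in the installer templates; the cloud-config mechanism and the condition itself are assumptions here, not taken from the PR.

```yaml
#cloud-config
# Hypothetical Container Linux cloud-config fragment: stop and mask locksmithd
# so nodes do not reboot for OS updates until the Container Linux Update
# Operator (CLUO) manages reboots instead.
coreos:
  units:
    - name: locksmithd.service
      command: stop
      mask: true
```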
Now that the flannel issue regarding taints is fixed, this is blocked on kubernetes-retired/bootkube#452.
Force-pushed from 4154e05 to 2fd4aa1.
Currently this PR fails in the bootkube pivot. Trying to determine what could be causing it. Will ping @aaronlevy and @yifan-gu for a hand tomorrow.
@squat We had a lot of pivot issues in the past too. Make sure you use beefed-up machines with SSD drives when doing your tests; this could significantly reduce the failure rate. We noticed that disk contention while all the images are being pulled affects the bootstrap etcd node very badly, making several requests time out, including lease renewals.
FWIW, the lease renewals shouldn't be an issue any longer (the bootstrap control plane is the only thing that would lose the lease, and now that it is static manifests and not compiled in, losing it doesn't kill bootkube).
Thanks for the clarification. I didn't make it clear that timeouts on lease renewal won't break bootkube now, thanks to these guys' work! I just intended to convey that, generally speaking, calls can time out easily if SSD drives are not used.
Force-pushed from fcda268 to 999da98.
Successfully booting clusters with self-hosted etcd!!! Also, scaling from 1 to 3 pods is working great 😄
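For anyone following along, scaling is driven declaratively through the etcd operator: bumping the size on the cluster resource is enough, and the operator grows the cluster member by member. A hedged sketch below; the apiVersion/kind, object name, and etcd version are illustrative and may not match the exact TPR/CRD revision this PR uses.

```yaml
# Illustrative etcd-operator cluster object for the self-hosted etcd cluster.
# Changing spec.size from 1 to 3 tells the operator to grow the cluster.
apiVersion: etcd.database.coreos.com/v1beta2   # assumed; older deployments used a TPR
kind: EtcdCluster
metadata:
  name: kube-etcd
  namespace: kube-system
spec:
  size: 3          # desired member count; the operator reconciles toward it
  version: "3.1.8" # example etcd version
```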
Force-pushed from 871ca58 to d782374.
Clusters are booting, and the frontend now properly hides the etcd page when you choose to deploy the operators :)
Force-pushed from 6d90edc to a3dcf33.
Force-pushed from cd2f7ec to 13d1ba9.
- Successfully deployed on eu-west-1
- Scaled up to 3
- Survived full cluster reboot
👍 👏 👌
Still waiting for the boot etcd to be deleted... 😗