This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Update ROADMAP.md #675

Merged · 1 commit merged into master from roadmap-updates on Jul 6, 2017

Conversation

mumoshu
Contributor

mumoshu commented May 24, 2017

@c-knowles @danielfm @redbaron @camilb Recently, I've implemented the revised CA support in #629 according to our roadmap.
With the revised CA support being the last concrete objective achievable in the short term, I believe it's time to discuss the plan for upcoming releases 😃
Any comments, requests, thoughts on the roadmap?

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on May 24, 2017

## v0.9.8

* Experimental support for kube-aws plugins

@mumoshu (author) commented:

As per #509.
I'm looking forward to @c-knowles's design proposal for this 🙇


## v0.9.9

* RBAC enabled by default

@mumoshu (author) commented:

As per #655.
My plan is to first ensure all the features work with RBAC in v0.9.8.
Enabling it by default in v0.9.9 would then be just a matter of flipping the default.
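
For context, enabling RBAC by default mostly means every component and add-on we ship needs an explicit (Cluster)RoleBinding before the flip. A minimal sketch of the kind of manifest involved, using the rbac.authorization.k8s.io/v1beta1 API current at the time (names here are illustrative, not a specific kube-aws manifest):

```yaml
# Illustrative only: binds an add-on's ServiceAccount to an existing built-in
# ClusterRole so it keeps working once RBAC becomes the default.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: example-addon
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-dns
subjects:
  - kind: ServiceAccount
    name: kube-dns
    namespace: kube-system
```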


## v0.9.10

* kubeadm support to simplify k8s components configuration

@mumoshu (author) commented:

As per #654.
However, I'm not yet sure what clear and concrete benefits we'd gain by doing so, at least at the time of writing.
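
For reference, kubeadm at this point is driven by a small config file, so "kubeadm support" would roughly mean kube-aws rendering something like the sketch below instead of wiring every component flag by hand. The field names are from the alpha API and are assumptions rather than a tested configuration:

```yaml
# Sketch only: a minimal kubeadm master configuration (alpha API of the era).
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.7.0
networking:
  podSubnet: 10.2.0.0/16        # illustrative CIDRs
  serviceSubnet: 10.3.0.0/24
apiServerCertSANs:
  - kubeapi.example.com          # illustrative SAN
```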

@codecov-io

codecov-io commented May 24, 2017

Codecov Report

Merging #675 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #675   +/-   ##
=======================================
  Coverage   37.41%   37.41%           
=======================================
  Files          52       52           
  Lines        3170     3170           
=======================================
  Hits         1186     1186           
  Misses       1807     1807           
  Partials      177      177

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9e20f4e...9c716d1. Read the comment docs.

@redbaron
Contributor

I am working on porting things to Ignition

@camilb
Contributor

camilb commented May 24, 2017

I'm currently testing the integration of these add-ons:

Working on some improvements:

  • provide RBAC permissions for kube2iam, nginx ingress and external-dns (a rough sketch for external-dns follows at the end of this comment)
  • simplify and improve security for the dex integration

Future plans:

  • an option to deploy an in-cluster monitoring solution using Prometheus Operator, with basic alerts and dashboards configured by default. I'm thinking of integrating some of the features I implemented in prometheus-kubernetes
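
As a rough sketch of the external-dns piece mentioned above (the rule list is an approximation of the read-only access it needs, not a final manifest):

```yaml
# Sketch: read-only access for external-dns to discover Services and Ingresses.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: external-dns
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["extensions"]
    resources: ["ingresses"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: external-dns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
  - kind: ServiceAccount
    name: external-dns
    namespace: kube-system
```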

@redbaron
Contributor

My current plan in the order of pain points for us:

  • Ignition
  • Move Flannel and Calico to the Kubernetes API backend instead of etcd
  • Self-hosted Flannel & Calico via DaemonSet (rough sketch after this list)
  • Safe cluster rolls (coordinate ASG shutdown events and node drain calls)
  • Self-hosted etcd via bootkube & etcd-operator
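
As a rough sketch of how the networking items fit together: flannel running as a DaemonSet and using the Kubernetes API as its backend (--kube-subnet-mgr) instead of talking to etcd directly. The image tag and flags are illustrative, not an exact manifest:

```yaml
# Sketch: self-hosted flannel as a DaemonSet with the Kubernetes API backend.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        app: flannel
    spec:
      hostNetwork: true
      serviceAccountName: flannel
      containers:
        - name: kube-flannel
          image: quay.io/coreos/flannel:v0.8.0   # illustrative tag
          command: ["/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr"]
          securityContext:
            privileged: true
```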

@danielfm
Contributor

@redbaron

Safe cluster rolls (coordinate ASG shutdown events and node drain calls)

I'm currently testing this in #674.

@redbaron
Contributor

@danielfm that improves the node drainer behaviour, which is good, but it doesn't prevent, say, two nodes going down at once and causing ZK running in kube to lose quorum.
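
For what it's worth, that quorum concern is roughly what a PodDisruptionBudget plus coordinated drains is meant to express; a minimal sketch for a 3-node ZK ensemble (labels and names are illustrative):

```yaml
# Sketch: with drains coordinated, this keeps at least 2 of 3 ZooKeeper pods
# running, so a rolling node replacement cannot take quorum down.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zookeeper
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper
```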

@danielfm
Contributor

@redbaron could you elaborate on this in #674 please?

If there's anything I can do to accommodate this in that PR, please let me know.

@cknowles
Contributor

I'm mainly concentrating on plugins and will publish a design soon. I'm also looking at general stability: several times when I updated kube-aws something broke, and not because I did anything different/special or changed any of my setup. In that sense, we should double down on tests and think a little more about the goals of kube-aws and what should not be included. I know @redbaron has mentioned this same thing many times ;)

When there is a range of options for implementing some aspect with no clear winner, what do we do? We'd need to pick one, support multiple options, or provide some generic method to plug them in. How do we decide? I'm interested in why we'd pick a certain tool over another. Some of the above seem like natural choices, mostly the CoreOS projects or other Kubernetes org projects. A counter-example is the ingress controller choice - there are several, and we're using the Traefik Helm chart, which also does the job of kube-lego.

@danielfm
Contributor

danielfm commented May 24, 2017

@c-knowles I was also thinking about what the "right" scope for kube-aws should be.

Is it only supposed to create a working cluster - with the user using other means to bootstrap the services running on top of it, such as ingress controllers, monitoring, etc. - or is it supposed to be a "batteries included" kind of tool?

I think the first approach is the best one for my use case, but I wonder if there's any objective way to determine which approach is the right one in general.

@camilb
Contributor

camilb commented May 24, 2017

@c-knowles I agree we should freeze new features at some point and focus on stability. For features where we have multiple options, we should, if possible, provide them as Helm plugins - like nginx ingress + kube-lego versus Traefik - so the user can choose whichever fits their needs better. For other tools there is a clearer direction: for example, "external-dns" is a project that aims to replace all the other tools for automatic DNS configuration and can be directly integrated, deployed using Helm, or shipped as an add-on.

The features I'm looking at at the moment can be deployed separately. Maybe we can create a directory for add-ons and charts where we can keep multiple examples of self-managed services that are easy to deploy after the cluster is up.

The "cluster.yaml" has already become quite complex, sometimes there are options that don't work together by default (like RBAC + kube2iam, for example), and it will be even harder to test and maintain all the possible setups in the future.

@cknowles
Contributor

@danielfm The things I've built have assumed the first, partly due to what kube-aws was like when I started using it. I agree; if there is a more objective way to assess that, I'm happy to participate. The metrics I can think of are community support/involvement, ease of maintenance (such as being able to test properly), and clear separation of concerns.

@mumoshu listed a bunch of the goals of kube-aws; maybe we should list non-goals as well? If we stick to a CoreOS-type ethos then it will be sensible, secure, HA defaults, with most things modular and usable separately. As @camilb says, we already have options that don't work together, and the cluster.yaml winds up at around 1500 lines by default when you start using the tool, without doing any modifications!

@redbaron
Contributor

For me, kube-aws ends where a working, stable and manageable Kubernetes API starts. nginx-ingress, monitoring and anything else that runs on top of it already has Helm, where the community collaborates on best practices.

@mumoshu
Contributor Author

mumoshu commented May 25, 2017

Is it only supposed to create a working cluster - with the user using other means to bootstrap the services running on top of it, such as ingress controllers, monitoring, etc. - or is it supposed to be a "batteries included" kind of tool?

It could be the latter, but certainly what gets included in kube-aws should be limited.
kube-aws is all about running production k8s clusters - what we need for a production k8s cluster is high scalability/availability/security/flexibility/customizability/maintainability.

Especially while incubating, I'm rather happy to include many optional "integrations" or "supports" that help easily achieve any of the kube-aws goals above. For example, installing dex on a k8s cluster is not yet just a matter of running helm install as of today, and it also contributes to security, which is one of kube-aws' goals. That's why it made sense for me to include the dex integration in kube-aws.

As for integrations or supports that don't contribute to any of the kube-aws goals, I'd prefer they not be included in kube-aws for now, but rather after kube-aws plugins #509 has landed. Plugins contributing to kube-aws goals may reside in builtinplugins, and others may reside in something like contrib/plugins in this repo.

We should investigate and settle on a set of possible/necessary extension points for kube-aws in order to implement kube-aws plugins #509. Hardening all the extension points on the kube-aws side would let us add more integrations as kube-aws plugins without breaking a cluster.

So -

For me, kube-aws ends where a working, stable and manageable Kubernetes API starts. nginx-ingress, monitoring and anything else that runs on top of it already has Helm, where the community collaborates on best practices.

I agree 👍

options that don't work together

If the options are ones in kube-aws core and builtin plugins, the kube-aws maintainers (oh, me) would be responsible for ensuring they work (or are at least validated and forbidden when they can't work together).
If the options come from kube-aws plugins, I believe the plugin maintainers would be responsible.

By the way, I believe we'd better improve our CI so that we can keep ensuring that a lot of options keep working.
However, we don't yet have the budget for e.g. a dedicated AWS account that would allow us to run many E2E tests in CI.

the cluster.yaml winds up at around 1500 lines by default when you start using the tool, without doing any modifications!

For this problem, I believe a dedicated kube-aws documentation site #534, with a bunch of example cluster.yamls and a detailed explanation for each setting key, would make the everything-in-the-fat-cluster.yaml unnecessary. If we had such a doc, the default cluster.yaml could be just several lines. However, I'm sure the doc would take much more time to maintain than the current fat cluster.yaml. I myself haven't had much time to maintain a dedicated doc site, so I have gone along with the fat cluster.yaml until now, even though I'm not happy with it.
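
To illustrate the "just several lines" point, a minimal default cluster.yaml could plausibly shrink to something like the sketch below, with everything else moved to the documentation site. This is only an illustration - the exact key set is an assumption, not the final shape:

```yaml
# Indicative sketch only: a minimal cluster.yaml once optional settings move to the docs.
clusterName: my-cluster                      # illustrative values throughout
externalDNSName: kubeapi.example.com
hostedZoneId: Z1EXAMPLE
keyName: my-key-pair
region: us-west-2
availabilityZone: us-west-2a
kmsKeyArn: "arn:aws:kms:us-west-2:111111111111:key/xxxxxxxx"
```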


Please poke me if anything is still unclear!
Thanks for your support 👍

@mumoshu
Contributor Author

mumoshu commented May 25, 2017

@redbaron How are you planning to migrate to Ignition?
I have been thinking of a hybrid (Ignition calls cloud-config, with both executed) as the first stage, then gradually moving systemd unit definitions from cloud-config to Ignition so that each change won't be too big and we can keep up the fire-and-motion.
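
To make the hybrid idea a bit more concrete, the first stage could be expressed as a Container Linux Config (transpiled to Ignition with ct) whose only job is to declare a systemd unit that replays the existing cloud-config; units would then be migrated into the systemd section one by one. This is only a sketch - the unit name and the userdata path are assumptions:

```yaml
# Sketch of stage 1 of the hybrid: Ignition (via a Container Linux Config)
# installs a unit that re-applies the legacy cloud-config, so existing
# behaviour is preserved while units are migrated incrementally.
systemd:
  units:
    - name: cloudinit-compat.service        # hypothetical unit name
      enabled: true
      contents: |
        [Unit]
        Description=Apply legacy cloud-config during the Ignition migration
        [Service]
        Type=oneshot
        ExecStart=/usr/bin/coreos-cloudinit --from-file=/var/lib/coreos/user_data
        [Install]
        WantedBy=multi-user.target
```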

@everpeace
Contributor

everpeace commented May 25, 2017

This is just something I'm thinking about, and not short term, but I'm wondering about supporting Elastic GPUs, which are still in preview.

Recently, I implemented NVIDIA driver installation support for GPU instances (#645) because I think the ease of spinning up a deep-learning-ready k8s cluster is important. Elastic GPU support would let users spawn more flexible node pools (e.g. medium instances with large GPUs attached).

This feature might not be worth including in the builtinplugins that @mumoshu mentioned.
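
For reference, once the driver is installed, scheduling onto those GPU nodes comes down to the (still alpha at the time) resource request sketched below; Elastic GPUs are attached at the instance level, so they would presumably need a different mechanism - treat this purely as an illustration:

```yaml
# Illustration: requesting a node-attached NVIDIA GPU via the alpha resource
# name used around Kubernetes 1.6/1.7. Image and names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:8.0-runtime
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1
```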

@cknowles
Contributor

@mumoshu ah yes, I should have added documentation to my list! I'm working on that, but progress is slow as I'm concentrating on the plugins design. I'll be back onto it afterwards.

Anyone interested in istio?

@mumoshu
Contributor Author

mumoshu commented May 26, 2017

@c-knowles I'm interested in istio and also linkerd - I'm not yet sure in which ways they differ.

@cknowles
Contributor

@mumoshu best places I found with comparisons between istio and linkerd:
https://news.ycombinator.com/item?id=14410533
https://lyft.github.io/envoy/docs/intro/comparison.html

From what I've read so far, istio is definitely our preference.

@mumoshu
Contributor Author

mumoshu commented Jul 6, 2017

Thank you very much for the great discussion, everyone!
I'm merging this anyway, but please let me know if you'd like to include your items in specific releases, so that we can coordinate our work better.

I'm roughly estimating that kube-aws will be released every 1-2 months (RCs not included).

@mumoshu
Contributor Author

mumoshu commented Jul 6, 2017

Playing with k8s-bot...

@mumoshu
Contributor Author

mumoshu commented Jul 6, 2017

/lgtm

@k8s-ci-robot
Contributor

@mumoshu: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mumoshu mumoshu merged commit fed2684 into master Jul 6, 2017
@mumoshu mumoshu deleted the roadmap-updates branch July 6, 2017 14:07
@mumoshu mumoshu mentioned this pull request Jul 6, 2017
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018