
Conversation

@kwoodson
Contributor

@kwoodson kwoodson commented Oct 23, 2017

The purpose of this pull request is to change the order of installation to the following:

  • Provision masters
  • Install masters
  • Provision node groups (infra/compute)
  • Join nodes to cluster (approval process)
  • Call hosted playbooks on entire cluster

This model of install is a bit more robust than the previous one of bringing up nodes after hosted has been installed. It allows us to have all nodes available when the services are being configured rather than after the fact.
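
As a rough sketch, the new ordering reads like the following top-level playbook (the include names here are illustrative, not the exact files in this PR):

    # Hypothetical top-level ordering; playbook names are illustrative.
    - include: provision.yml          # provision masters
    - include: install_masters.yml    # install the control plane
    - include: provision_nodes.yml    # provision infra/compute node groups
    - include: accept_nodes.yml       # approve CSRs so nodes join the cluster
    - include: hosted.yml             # hosted components against the full cluster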

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 23, 2017
@kwoodson kwoodson self-assigned this Oct 23, 2017
@kwoodson
Contributor Author

@mtnbikenc, @michaelgugino

We don't want to call openshift-cluster/config.yml because it will perform the entire installation from top to bottom. I went into the config.yml playbook and basically used the internals of that playbook inside of this one. This is good in that we are breaking the pieces up and being modular, but bad in that I've duplicated code.

Do you have any suggestions on how we can achieve the same behavior without the side effects of code duplication?

@kwoodson kwoodson mentioned this pull request Oct 23, 2017

- include: ../../common/openshift-master/additional_config.yml

- include: ../../common/openshift-node/config.yml
Contributor

Is this line intended?

Contributor Author

@michaelgugino,

The masters run as nodes. This is for those hosts and not for the bootstrapped groups. We can probably consider doing all of them as bootstrapped.

Contributor

Oh yeah, I always forget masters are also nodes.

Contributor

Yeah, in GCP everything is bootstrapped, which caused #6011 to fail when we enabled those by default


- name: run the config
  include: ../../common/openshift-cluster/config.yml
- name: Verify Requirements
Contributor

If we're going to use this in multiple places, we should probably make it into its own playbook and include it here and in common/../config.yml

Member

The health checker play has already been converted to its own play (openshift-checks/install.yml), but that change is not captured here.

@michaelgugino
Contributor

I went into the config.yml playbook and basically used the internals of that playbook inside of this one. This is good in that we are breaking the pieces up and being modular, but bad in that I've duplicated code.

I think we're as de-duplicated as we can get, save the one item I mentioned above.

Long term, I would like to see this ordering applied as the default; then we can just place a boolean on the node config portion and pass it in via the provisioning play.

@kwoodson
Contributor Author

kwoodson commented Nov 5, 2017

/retest

@mtnbikenc
Member

@kwoodson The Lego'ing done here seems to highlight that we've not quite broken things down into discrete components. Everything you have broken out into hosted.yml will need to be broken out into its own component(s) when we refactor to openshift-cluster during playbook consolidation. (Those plays don't really belong there.) Could we possibly add provisioning hooks into the installer flow so we can run the desired provisioning plays when it makes sense? I hesitate on this because I know playbook includes are not dynamic and I don't want to add more skipped tasks.

I'd like to see a unified flow for the installer which covers both provisioning and installing.
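
On the skipped-tasks point: because playbook includes are static, a when condition on the include is inherited by every task inside it, so a disabled hook still prints one skipped line per task. A minimal illustration (the hook file and variable names are hypothetical):

    # Static include: parsed unconditionally; the when below is copied onto
    # every task in provision_hook.yml, so disabling the hook produces one
    # "skipped" line per task rather than skipping the file outright.
    - include: provision_hook.yml
      when: run_provision_hooks | default(false) | bool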

Member

@mtnbikenc mtnbikenc left a comment


At a minimum, the health check should be switched to use openshift-checks/install.yml

@kwoodson
Contributor Author

@mtnbikenc @michaelgugino @sdodson @abutcher @smarterclayton

I think the overall goal, which can also be accomplished during our playbook refactor, should follow this strategy (also filed as an issue: #6012):

  • Produce a small core openshift-cluster (control plane)
  • Install only the components required for a functioning cluster (etcd, masters, certificates)
  • Install nodes (bootstrap or bare metal)
  • Install any desired add-ons

This architecture will enable:

  • a smaller, more definitive core code base
  • add-ons (aka hosted) become standalone, making them easier to test
  • openshift-node setup moves into the product (bootstrap)
  • increased modularity for additional roles/features

@smarterclayton
Contributor

smarterclayton commented Nov 10, 2017

I would like to avoid hooks - they invert the layering we want to have.

The role we call to install the control plane is going to have minimal responsibilities - the vast majority of what is in hosted should not be there. So getting that layering straight I think is a good discussion to have:

  1. given a set of machines, set up etcd, certs, and masters (and how masters are installed is going to change pretty drastically soon) until you have a working API that responds to API calls
  2. given a control plane, configure or approve/join any nodes that are available
  3. given a control plane with nodes, start installing addons, which includes things like:
    1. storage plugins
    2. networking
    3. router
    4. registry
    5. other openshift apis

I would expect an openshift-control-plane role that did 1, but not an openshift-cluster role that did 1 and 3
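
To make step 2 concrete, the approve/join flow amounts to approving pending CSRs from a master. A minimal sketch using plain oc commands (this is illustrative, not the repo's actual approval role):

    - name: Approve node CSRs so provisioned nodes join the cluster (sketch)
      hosts: masters[0]
      tasks:
      - name: List certificate signing requests (a real version would filter for Pending)
        command: oc get csr -o name
        register: csr_list
        changed_when: false

      - name: Approve each CSR so the node can register with the API
        command: oc adm certificate approve {{ item | basename }}
        with_items: "{{ csr_list.stdout_lines }}"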

@michaelgugino
Contributor

I would expect an openshift-control-plane role that did 1, but not an openshift-cluster role that did 1 and 3

@smarterclayton
I would expect number 1 to be 3 roles: a role for etcd, a role for certs, and a role for masters. Possibly the cert role can be combined with one or the other.

I would expect our high-level play layout to be something along the lines of:

  1. Prerequisites
  2. Setup infra (etcd, storage hosts, whatever that openshift needs to run)
  3. Setup Masters
  4. Setup Nodes
  5. Setup Hosted
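
Laid out as a unified entry-point playbook, that ordering might read (file names illustrative):

    # Hypothetical unified entry point reflecting the layout above.
    - include: prerequisites.yml
    - include: openshift-etcd/config.yml      # infra openshift needs to run
    - include: openshift-master/config.yml    # masters
    - include: openshift-node/config.yml      # nodes
    - include: openshift-hosted/config.yml    # hosted components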

@mtnbikenc
Member

This is a good discussion and I'm on board with the general direction.

@kwoodson kwoodson force-pushed the cluster_install_order branch 2 times, most recently from 51f6819 to 32af1ab Compare November 12, 2017 18:24
@kwoodson
Contributor Author

@mtnbikenc, I have updated the code with your recommendation to include openshift-checks/install.yml.

Please review again as I believe this is ready to go.


- name: run the config
  include: ../../common/openshift-cluster/config.yml
- name: run the std_include
Member

This name: should be openshift-checks.

Member

@mtnbikenc mtnbikenc left a comment


Small nit on the task name.

@mtnbikenc mtnbikenc self-assigned this Nov 13, 2017
@smarterclayton
Contributor

I would expect number 1 to be 3 roles: a role for etcd, a role for certs, and a role for masters. Possibly the cert role can be combined with one or the other.

Maybe. I'm not interested in "generic etcd" - our requirements for how masters are laid out are also going to change - we're going to stop using system units, we're going to use static pods, and so there will be some coupling of dependencies. As long as we're cautious in our factorings (i.e., separate etcd is not a goal, but reusable etcd within a larger framework may be), we should be ok.

The control plane is probably going to look like:

  1. assume the host is a node that supports static run-once pods (a detail of how openshift node is configured)
  2. lay down a set of core configuration for each sub component (etcd, kube-apiserver, controller managers, service signer)
  3. lay down a set of static pod definitions for each sub component (etcd, kube-apiserver, controller managers, service signer)
  4. start and enable the node run-once process

If subcomponent factorings for the etcd role make sense, that's fine, but I don't want to over-factor those roles until we have the next step laid out for config. We must get to static pod configs in the 3.9 timeframe, so I want us all on the same page about what that is before we move on.
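
For a concrete picture of the static pod direction, each subcomponent would ship as a manifest the node runs directly. A minimal etcd sketch, with hypothetical path, image, and flags:

    # /etc/origin/node/pods/etcd.yaml -- hypothetical static pod manifest
    apiVersion: v1
    kind: Pod
    metadata:
      name: etcd
      namespace: kube-system
    spec:
      hostNetwork: true
      containers:
      - name: etcd
        image: registry.example.com/etcd:3.2   # illustrative image reference
        command:
        - etcd
        - --data-dir=/var/lib/etcd
        volumeMounts:
        - name: data
          mountPath: /var/lib/etcd
      volumes:
      - name: data
        hostPath:
          path: /var/lib/etcd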

@kwoodson kwoodson force-pushed the cluster_install_order branch from 32af1ab to 3d14183 Compare November 13, 2017 22:47
@kwoodson
Contributor Author

@mtnbikenc, thanks for finding that. I updated the name.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 14, 2017
Member

@mtnbikenc mtnbikenc left a comment


/lgtm

@mtnbikenc
Member

/test install

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-ci-robot

openshift-ci-robot commented Nov 15, 2017

@kwoodson: The following test failed, say /retest to rerun them all:

Test name                     Commit   Details  Rerun command
ci/openshift-jenkins/upgrade  3d14183  link     /test upgrade

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot
Contributor

Automatic merge from submit-queue.

@openshift-merge-robot openshift-merge-robot merged commit d4b6e2c into openshift:master Nov 15, 2017
@kwoodson kwoodson deleted the cluster_install_order branch March 5, 2018 15:41
