
Conversation

@kwoodson
Contributor

@kwoodson kwoodson commented Oct 23, 2017

The purpose of this pull request is to change the order of installation to the following:

  • Provision masters
  • Install masters
  • Provision node groups (infra/compute)
  • Join nodes to cluster (approval process)
  • Call hosted playbooks on entire cluster

This model of install is a bit more robust than the previous one of bringing up nodes after hosted has been installed. It allows us to have all nodes available when the services are being configured rather than after the fact.
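
As a rough sketch, the new ordering reads like the following top-level playbook (the include names here are illustrative, not the exact files in this PR):

    # Hypothetical top-level ordering; playbook names are illustrative.
    - include: provision.yml          # provision masters
    - include: install_masters.yml    # install the control plane
    - include: provision_nodes.yml    # provision infra/compute node groups
    - include: accept_nodes.yml       # approve CSRs so nodes join the cluster
    - include: hosted.yml             # hosted components against the full cluster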

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 23, 2017
@kwoodson kwoodson self-assigned this Oct 23, 2017
@kwoodson
Contributor Author

@mtnbikenc, @michaelgugino

We don't want to call openshift-cluster/config.yml because it will perform the entire installation from top to bottom. I went into the config.yml playbook and basically used the internals of that playbook inside of this one. This is good in that we are breaking the pieces up and being modular, but bad in that I've duplicated code.

Do you have any suggestions on how we can achieve the same behavior without the side effects of code duplication?

@kwoodson kwoodson mentioned this pull request Oct 23, 2017

- include: ../../common/openshift-master/additional_config.yml

- include: ../../common/openshift-node/config.yml
Contributor

Is this line intended?

Contributor Author

@michaelgugino,

The masters run as nodes. This is for those hosts and not for the bootstrapped groups. We can probably consider doing all of them as bootstrapped.

Contributor

Oh yeah, I always forget masters are also nodes.

Contributor

Yeah, in GCP everything is bootstrapped, which caused #6011 to fail when we enabled those by default


- name: run the config
  include: ../../common/openshift-cluster/config.yml
- name: Verify Requirements
Contributor

If we're going to use this in multiple places, we should probably make it into its own playbook and include it here and in common/../config.yml

Member

The health checker play has already been converted to its own play (openshift-checks/install.yml), but that change is not captured here.

@michaelgugino
Contributor

I went into the config.yml playbook and basically used the internals of that playbook inside of this one. This is good in that we are breaking the pieces up and being modular, but bad in that I've duplicated code.

I think we're as de-duplicated as we can get, save the one item I mentioned above.

Long term, I would like to see this ordering applied as the default; then we can just place a boolean on the node config portion and pass it in via the provisioning play.

@kwoodson
Contributor Author

kwoodson commented Nov 5, 2017

/retest

@mtnbikenc
Member

@kwoodson The Lego'ing done here seems to highlight that we've not quite broken things down into discrete components. Everything you have broken out into hosted.yml will need to be broken out into its own component(s) when we refactor to openshift-cluster during playbook consolidation. (Those plays don't really belong there.) Could we possibly add provisioning hooks into the installer flow so we can run the desired provisioning plays when it makes sense? I hesitate on this because I know playbook includes are not dynamic and I don't want to add more skipped tasks.

I'd like to see a unified flow for the installer which covers both provisioning and installing.
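
On the skipped-tasks point: because playbook includes are static, a when condition on the include is inherited by every task inside it, so a disabled hook still prints one skipped line per task. A minimal illustration (the hook file and variable names are hypothetical):

    # Static include: parsed unconditionally; the when below is copied onto
    # every task in provision_hook.yml, so disabling the hook produces one
    # "skipped" line per task rather than skipping the file outright.
    - include: provision_hook.yml
      when: run_provision_hooks | default(false) | bool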

Member

@mtnbikenc mtnbikenc left a comment


At a minimum, the health check should be switched to use openshift-checks/install.yml

@kwoodson
Contributor Author

@mtnbikenc @michaelgugino @sdodson @abutcher @smarterclayton

I think the overall goal, which can also be accomplished during our playbook refactor, should follow this strategy (also filed as an issue: #6012):

  • Produce a small core openshift-cluster (control plane)
  • Install only the components required for a functioning cluster (etcd, masters, certificates)
  • Install nodes (bootstrap or bare metal)
  • Install any desired add-ons

This architecture will enable:

  • a smaller, more definitive core code base
  • add-ons (aka hosted) become standalone, making them easier to test
  • openshift-node setup moves into the product (bootstrap)
  • increased modularity for additional roles/features

@smarterclayton
Contributor

smarterclayton commented Nov 10, 2017

I would like to avoid hooks - they invert the layering we want to have.

The role we call to install the control plane is going to have minimal responsibilities - the vast majority of what is in hosted should not be there. So getting that layering straight I think is a good discussion to have:

  1. given a set of machines, set up etcd, certs, and masters (and how masters are installed is going to change pretty drastically soon) until you have a working API that responds to API calls
  2. given a control plane, configure or approve/join any nodes that are available
  3. given a control plane with nodes, start installing addons, which includes things like:
    1. storage plugins
    2. networking
    3. router
    4. registry
    5. other openshift apis

I would expect an openshift-control-plane role that did 1, but not an openshift-cluster role that did 1 and 3
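
To make step 2 concrete, the approve/join flow amounts to approving pending CSRs from a master. A minimal sketch using plain oc commands (this is illustrative, not the repo's actual approval role):

    - name: Approve node CSRs so provisioned nodes join the cluster (sketch)
      hosts: masters[0]
      tasks:
      - name: List certificate signing requests (a real version would filter for Pending)
        command: oc get csr -o name
        register: csr_list
        changed_when: false

      - name: Approve each CSR so the node can register with the API
        command: oc adm certificate approve {{ item | basename }}
        with_items: "{{ csr_list.stdout_lines }}"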

@michaelgugino
Contributor

I would expect an openshift-control-plane role that did 1, but not an openshift-cluster role that did 1 and 3

@smarterclayton
I would expect number 1 to be 3 roles: a role for etcd, a role for certs, and a role for masters. Possibly the cert role can be combined with one or the other.

I would expect our high-level play layout to be something along the lines of:

  1. Prerequisites
  2. Setup infra (etcd, storage hosts, whatever that openshift needs to run)
  3. Setup Masters
  4. Setup Nodes
  5. Setup Hosted
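
Laid out as a unified entry-point playbook, that ordering might read (file names illustrative):

    # Hypothetical unified entry point reflecting the layout above.
    - include: prerequisites.yml
    - include: openshift-etcd/config.yml      # infra openshift needs to run
    - include: openshift-master/config.yml    # masters
    - include: openshift-node/config.yml      # nodes
    - include: openshift-hosted/config.yml    # hosted components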

@mtnbikenc
Member

This is a good discussion and I'm on board with the general direction.

@kwoodson kwoodson force-pushed the cluster_install_order branch 2 times, most recently from 51f6819 to 32af1ab Compare November 12, 2017 18:24
@kwoodson
Contributor Author

@mtnbikenc, I have updated the code with your recommendation to include openshift-checks/install.yml.

Please review again as I believe this is ready to go.


- name: run the config
  include: ../../common/openshift-cluster/config.yml
- name: run the std_include
Member

This name: should be openshift-checks.

Member

@mtnbikenc mtnbikenc left a comment


Small nit on the task name.

@mtnbikenc mtnbikenc self-assigned this Nov 13, 2017
@smarterclayton
Contributor

I would expect number 1 to be 3 roles: a role for etcd, a role for certs, and a role for masters. Possibly the cert role can be combined with one or the other.

Maybe. I'm not interested in "generic etcd" - our requirements for how masters are laid out are also going to change - we're going to stop using system units, we're going to use static pods, and so there will be some coupling of dependencies. As long as we're cautious in our factorings (i.e., separate etcd is not a goal, but reusable etcd within a larger framework may be), we should be ok.

The control plane is probably going to look like:

  1. assume the host is a node that supports static run-once pods (a detail of how openshift node is configured)
  2. lay down a set of core configuration for each sub component (etcd, kube-apiserver, controller managers, service signer)
  3. lay down a set of static pod definitions for each sub component (etcd, kube-apiserver, controller managers, service signer)
  4. start and enable the node run-once process

If subcomponent factorings for the etcd role make sense, that's fine, but I don't want to over-factor those roles until we have the next step laid out for config. We must get to static pod configs in the 3.9 timeframe, so I want us all on the same page about what that is before we move on.
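
For a concrete picture of the static pod direction, each subcomponent would ship as a manifest the node runs directly. A minimal etcd sketch, with hypothetical path, image, and flags:

    # /etc/origin/node/pods/etcd.yaml -- hypothetical static pod manifest
    apiVersion: v1
    kind: Pod
    metadata:
      name: etcd
      namespace: kube-system
    spec:
      hostNetwork: true
      containers:
      - name: etcd
        image: registry.example.com/etcd:3.2   # illustrative image reference
        command:
        - etcd
        - --data-dir=/var/lib/etcd
        volumeMounts:
        - name: data
          mountPath: /var/lib/etcd
      volumes:
      - name: data
        hostPath:
          path: /var/lib/etcd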

@kwoodson kwoodson force-pushed the cluster_install_order branch from 32af1ab to 3d14183 Compare November 13, 2017 22:47
@kwoodson
Contributor Author

@mtnbikenc, thanks for finding that. I updated the name.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 14, 2017
Member

@mtnbikenc mtnbikenc left a comment


/lgtm

@mtnbikenc
Member

/test install

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-ci-robot

openshift-ci-robot commented Nov 15, 2017

@kwoodson: The following test failed, say /retest to rerun them all:

Test name                     Commit   Details  Rerun command
ci/openshift-jenkins/upgrade  3d14183  link     /test upgrade

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot
Contributor

Automatic merge from submit-queue.

@openshift-merge-robot openshift-merge-robot merged commit d4b6e2c into openshift:master Nov 15, 2017
@kwoodson kwoodson deleted the cluster_install_order branch March 5, 2018 15:41
