
Conversation

@tomassedovic
Contributor

This adds OpenStack provider support under playbooks/openstack and attempts to follow a file layout similar to that of the AWS code that's already there.

The playbooks and roles come from the openshift-ansible-contrib repository, specifically:

https://github.com/openshift/openshift-ansible-contrib/tree/master/playbooks/provisioning/openstack

Current status

We can do a functional multi-master OpenShift deployment on OpenStack Newton (or OSP 10) or newer. The playbooks here create the necessary OpenStack resources (VMs, networking, security groups, etc.).

The resulting cluster should more or less correspond to this reference architecture:
https://access.redhat.com/documentation/en-us/reference_architectures/2017/html-single/deploying_and_managing_red_hat_openshift_container_platform_3.6_on_red_hat_openstack_platform_10/

We don't do much in terms of using OpenStack-native resources yet, but that's something we want to look at soon (load balancing, DNS, bare-metal deployment, authentication, networking, etc.).

Deployment workflow overview:

  1. We do all configuration up front
  2. Then the provision playbook creates a Heat stack in the background (think of Heat as CloudFormation for OpenStack) -- this creates the VMs, networking, etc.
  3. We configure the DNS (either by running the dns-server role on a VM from step 2 or by sending dynamic DNS updates to a pre-existing server)
  4. We configure all the VMs to use the DNS and install any dependencies, etc. that are necessary for openshift installation
  5. Finally, we run the byo playbooks to install OpenShift.
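As a sketch, the steps above could be wired together in one top-level playbook. The file names here are illustrative, not necessarily the exact ones in this PR, and the `import_playbook` keyword assumes Ansible 2.4 (older versions used a bare `include`):

```yaml
---
# Steps 1-2: configuration is read up front from the inventory; the
# provision playbook creates the Heat stack (VMs, networking, security groups).
- import_playbook: provision.yml

# Steps 3-4: point the nodes at the DNS and install the prerequisites.
- import_playbook: prepare-nodes.yml

# Step 5: run the regular byo installation against the provisioned hosts.
- import_playbook: install.yml
```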

History

The code came from the casl-ansible project. It subsequently moved to openshift-ansible-contrib.

And now we'd like to move and maintain it here and build it into a Red Hat supported deployment mechanism for OpenStack.

Some differences from the AWS provider

  1. We don't have playbooks for building the golden images yet
  2. Related: we create the entire cluster all at once (as opposed to the two-step "deploy master cluster, then add nodes which register themselves" approach AWS takes)
  3. OpenStack doesn't have universal support for DNS yet, so we either populate an external DNS server via RFC 2136 dynamic DNS updates, or create one ourselves and install BIND on it (for proof-of-concept deployments)
  4. Our playbooks use group_vars for configuration; AWS passes -e @provisioning_vars.yml to ansible-playbook instead. I quite like the group_vars approach, but we can switch to extra vars if you prefer. Both providers work with either approach.
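To illustrate the difference between the two configuration styles (the variable names below are made up for the example):

```yaml
# inventory/group_vars/all.yml -- the group_vars approach: these values are
# picked up automatically when a playbook runs against this inventory.
openstack_default_image_name: centos7
openstack_num_masters: 3
openstack_num_nodes: 2

# The AWS approach passes an equivalent file explicitly on the command line:
#   ansible-playbook -i inventory -e @provisioning_vars.yml provision.yml
# Either mechanism feeds the same variables to the plays.
```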

Additional questions and notes

  1. We use Ansible Galaxy to optionally bring in the dns-server role from: https://github.com/redhat-cop/infra-ansible. What's openshift-ansible's stance towards using Galaxy modules?
  2. We have a travis-based end-to-end CI running on every pull request. Can we integrate with your CI? Travis is not really suited for this. We can provide the necessary OpenStack resources via RDO Cloud.
  3. Since we've experienced breakages introduced by openshift-ansible in the past, would it make sense to run the CI even on non-openstack changes? At least as a periodic job.
  4. The @redhat-cop/casl people (who are using these roles in openshift-ansible-contrib) would like us to preserve the full history of the files we're bringing in. If you're fine with that, I'll create another pull request that does that. It would look something like this: Add an Openstack provider openshift-ansible-contrib#397
  5. I'm sure some of the code here could use some clean up. I'd prefer if we merged what we have and did that in subsequent PRs, but it's your show.
  6. Ditto for the roles this brings in -- some of them should probably be renamed -- should we do that here or can it be done later?

/cc @sdodson @kwoodson @redhat-cop/casl @mperezco @tzumainn @celebdor @bogdando @Tlacenka

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 18, 2017
@tomassedovic
Contributor Author

(sorry for the size of the pull request -- we can split it into smaller commits, but most of it is necessary for the full deployment)

@michaelgugino
Contributor

I appreciate you submitting this PR. I definitely want to see openstack supported as a target.

Overall, this patch set needs a fair amount of work. A few issues I want to highlight:

  1. The plays contained in this patch set duplicate a lot of the logic that our existing roles and plays implement. This PR needs to better utilize what we have in tree, rather than re-implementing these pieces. An example: Docker registries.
  2. Variable namespacing. The roles that are being added in this PR don't use great namespacing for their variables.
  3. Too much in plays, not enough in roles for the openstack pieces. I would like to see a role similar to openshift_aws. That role breaks down the provisioning into logical components. Those components are then called using "include_role" and "tasks_from".
  4. Continued from 3: because things are being done in plays and not roles, an unnecessary amount of 'set_fact' is being performed. These default variable values can easily be set and overridden with role defaults.
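The openshift_aws pattern being described might look roughly like this; the play layout and file names are illustrative:

```yaml
---
# A thin play: the logic lives in the role, broken into logical components
# that are pulled in via include_role + tasks_from.
- name: Provision the OpenStack cluster
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create the Heat stack
      include_role:
        name: openshift_openstack
        tasks_from: provision.yml

# Defaults belong in roles/openshift_openstack/defaults/main.yml, e.g.:
#   openshift_openstack_num_masters: 1
# so inventory can override them without any set_fact tasks in the plays.
```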

Please see the latest version of our aws provisioning README: https://github.com/openshift/openshift-ansible/blob/master/playbooks/aws/README.md#high-level-overview

That section 'high-level-overview' breaks down all the steps to deploy a cluster onto AWS. We are reusing as much of the existing roles and plays as we can during the installation process. In fact, if you're not baking a golden node image, you can probably reuse almost all of it.

An example breakdown might be:

  1. setup openstack-specific prerequisites; flavor, network, ssh keypair, etc as you see fit.
  2. provision instances on openstack.
  3. run the installer as normal on our newly created instances.

We use Ansible Galaxy to optionally bring in the dns-server role from: https://github.com/redhat-cop/infra-ansible. What's openshift-ansible's stance towards using Galaxy modules?

We vendor in roles into roles/ as necessary.

Our steps use the group_vars for configuration

I prefer to use group_vars as well, and the aws provisioning supports them, although that is perhaps not as obvious. I recently added a note to the provisioning docs to this effect.

OpenStack doesn't have a universal support for DNS yet

We'll need to discuss this more in depth, I think, during our architecture call. If we're going to ship and support a role that deploys BIND, it had better be in really, really good shape. I haven't reviewed this portion of the PR.

Can we integrate with your CI? Travis is not really suited for this. We can provide the necessary OpenStack resources via RDO Cloud.

This would definitely be my preference. I would love to see this supported in our CI.

@bogdando
Contributor

@tomassedovic please preserve the git history

@tomassedovic
Contributor Author

Thanks @michaelgugino!

Yeah, what I've done so far is just move all the code over and make the changes necessary to get a successful deployment.

I'm sure there are a lot of roles we can reuse. I have looked at the AWS provider, but I haven't spent much time looking at the roles yet. I'll make sure to do that and see how much we can share.

I'm a bit confused about your comments regarding the high-level overview and breakdown -- the process here is exactly as you described. Are you asking for docs changes to reflect that better?

I'll make sure to attend the architecture call, but regarding the DNS. When it comes to OpenStack, you have two options:

  1. Bring your own DNS. If it supports dynamic updates, we'll add the entries, if not, set them up yourself
  2. We can create and configure a DNS for evaluation purposes for you

The DNS I mentioned is in the latter category and is maintained by the CASL people here:

https://github.com/redhat-cop/infra-ansible/tree/master/roles/dns-server

I'm not sure we necessarily want to ship and support that (or a similar) role for production use, but would it be possible to have it there for the evaluation purposes?

Right now, you can get from zero to openshift with a single ansible-playbook run. I'd still love to keep the push-button option for folks who want to try it out.
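For the bring-your-own-DNS option, the RFC 2136 dynamic updates can be done with Ansible's nsupdate module. A minimal sketch; the server address, TSIG key name, zone, and secret variable are all placeholders:

```yaml
- name: Add an A record for this node via an RFC 2136 dynamic update
  nsupdate:
    server: 192.0.2.10                     # pre-existing DNS server (placeholder)
    key_name: update-key                   # TSIG key name -- placeholder
    key_secret: "{{ dns_update_secret }}"  # assumed to come from the inventory
    zone: example.com
    record: "{{ inventory_hostname }}"
    type: A
    value: "{{ ansible_default_ipv4.address }}"
  delegate_to: localhost
```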

@oybed any comments on the DNS role and situation?

@bogdando
Contributor

@michaelgugino please note that the purpose of the merge is to streamline the development process for the OpenStack provider, include it in the CI pipelines, and simplify (and enable) packaging, which is basically "productization baby steps". Please keep in mind that a major rework of the playbooks and roles is on the provider team's roadmap. The concerns you've raised are valid and correct, thank you for reviewing! We'll be addressing those eventually, once we have a streamlined development process in place. I hope that works.

@michaelgugino
Contributor

I'm a bit confused about your comments regarding the high-level overview and breakdown -- the process here is exactly as you described. Are you asking for docs changes to reflect that better?

@tomassedovic
If you take a look at the plays outlined in those sections (the install.yml part specifically, I believe), you'll see how we're using some existing roles to install the cluster.

please note that the purpose of the merge is to streamline the development process for the OpenStack provider, include it in the CI pipelines, and simplify (and enable) packaging, which is basically "productization baby steps". Please keep in mind that a major rework of the playbooks and roles is on the provider team's roadmap. The concerns you've raised are valid and correct, thank you for reviewing! We'll be addressing those eventually, once we have a streamlined development process in place. I hope that works.

@bogdando
I think if that is the case, there are alternatives to merging this as-is into master. Perhaps we could set up a feature branch that can be contributed to and have PRs filed against, and when it's in good shape, we can open a PR to merge into master.

@sdodson Any suggestions here?

All the tasks that were previously in playbooks are now under
`roles/openshift_openstack`.

The `openshift-cluster` directory now only contains playbooks that
include tasks from that role. This makes the structure much closer to
that of the AWS provider.
@tomassedovic
Contributor Author

@michaelgugino the new commit makes some of the changes you've requested.

  1. Created the openshift_openstack role and moved tasks that were previously in playbooks there
  2. Added the provision_install.yml playbook that includes provision.yml and install.yml -- these playbooks use include_role and tasks_from to get the actual tasks from the new role
  3. Removed the DNS -- we'll need to figure it out, but that's a sizeable discussion on its own, so I've made it so the playbooks require a DNS that's set up for now and we'll address this in a later PR

I'll take a look at the variable naming/namespacing and re-using more code from the existing roles next.

If you have time, though, please take a look and let me know whether this is closer to what you would expect.

Thanks!

@tomassedovic
Contributor Author

@michaelgugino regarding variable namespacing, do you mean that any OpenStack-specific variable should be prefixed with `openshift_openstack_`? It's a bit of a mouthful, but I'm not sure how to make it shorter. OSP is Red Hat-specific; would `openshift_os_*` work?

Contributor

@michaelgugino left a comment

Overall, this looks to be in much better shape. I appreciate all the changes you made. Just a few more items I would like to discuss, but this patch set is definitely moving in the right direction.

@@ -0,0 +1,90 @@
---
Contributor

We already have a docker role. Please remove this play and modify openstack provisioning to consume that role.

Contributor Author

All the playbooks in custom-actions were just examples of custom actions people might want to do. I'll remove them for now and we'll see whether it makes sense to add them back using the right roles later.

---
- hosts: cluster_hosts
become: true
vars:
Contributor

This will override inventory/group vars. You should not define empty vars like this; use `| default([])` where the var is consumed.
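A sketch of the suggested pattern; the task and variable names are made up for the example:

```yaml
# Bad: an empty play-level var silently shadows inventory/group_vars:
#   vars:
#     extra_packages: []
# Good: apply a default at the point of consumption instead:
- name: Install any extra packages requested in the inventory
  package:
    name: "{{ item }}"
    state: present
  with_items: "{{ extra_packages | default([]) }}"  # empty unless set in inventory
```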

@@ -0,0 +1,13 @@
---
- hosts: cluster_hosts
Contributor

This does not conform to our group names.

@@ -0,0 +1,13 @@
---
- hosts: cluster_hosts
Contributor

group name.

@@ -0,0 +1,12 @@
---
- hosts: cluster_hosts
Contributor

group name.

@@ -0,0 +1,9 @@
---
Contributor

This looks good.

@@ -0,0 +1,70 @@
---
# Get the needed information about the current deployment
- hosts: masters[0]
Contributor

This is not conforming to the group names we use elsewhere.

Contributor Author

For what it's worth, the install.yml playbook for AWS does use the masters group:

What should we use instead? oo_first_master?

Contributor

I didn't realize that other play used the masters group. We'll probably need to change that.

Yes, oo_first_master would be the correct one for the first host of the masters group.

@@ -0,0 +1,49 @@
---
stack_name: "{{ env_id }}.{{ public_dns_domain }}"
Contributor

Some of these variables are not namespaced well. Also, do we need to put these in vars/main.yml? Could they live in defaults/main.yml? That would allow them to be overridden by inventory.

Some of these variables probably don't need to be defined at all; we can just use the "openstack_x" variables in their place.
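The distinction matters because of Ansible's precedence rules: a role's vars/main.yml beats inventory, while defaults/main.yml loses to it. A sketch with illustrative values:

```yaml
# roles/openshift_openstack/defaults/main.yml
# These can be overridden from inventory or group_vars; had they stayed in
# vars/main.yml, inventory values would be silently ignored.
openshift_openstack_stack_name: "{{ env_id }}.{{ public_dns_domain }}"
openshift_openstack_num_masters: 1
openshift_openstack_num_nodes: 1
```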

Contributor

+1, good point on using defaults instead of vars. Ansible has tricky variable precedence, so let's follow this recommendation.

Contributor

This is my last 'big item' for cleanup.

@@ -0,0 +1,9 @@
# Role openstack-stack
Contributor

Let's merge this role with openshift-openstack and use include_role.

Contributor Author

Oops, we already do that, I just forgot to remove the old role.

@@ -0,0 +1,25 @@
---
Contributor

What is the primary purpose of this role? Perhaps this part needs to stay in contrib?

Contributor Author

Ah, good point. The contrib playbooks let you use either a dynamic inventory (as in this PR) or this role to generate a static inventory. The latter isn't used in this PR, so I can drop it for now and we can deal with it in another change.

@tomassedovic
Contributor Author

Thanks! I can handle most of those comments by removing some more stuff :-). The playbooks in custom-actions serve more of an example/documentation purpose and are not directly used by anything else in here.

I'll take a look at the group names and vars next. Could you please point me to the list of groups I can use? I grepped through the source, but both OSEv3 and masters show up, so I'm not sure what the preferred groups are or where they're supposed to come from.

@michaelgugino
Contributor

@tomassedovic the group names are created in playbooks/common/openshift-cluster/evaluate_groups.yml, and the group names that play uses are created in playbooks/byo/openshift-cluster/initialize_groups.yml

I know it's confusing, I'm not a huge fan of it, but that's what we're doing currently and I'd like to stay consistent where possible.

@tomassedovic
Contributor Author

@michaelgugino thanks, this is quite useful! I certainly don't want to go against the grain here.

I'll update the PR.

The `openstack-stack` role is now under `openshift_openstack` and the
`openstack-create-cinder-registry` one will be added there, later.
They're duplicating a lot of functionality that's already in
openshift-ansible and they're not actually used from the provisioning
playbooks.

We'll revisit them later.
They're not necessary for the initial PR so let's add them properly
later.
@tomassedovic
Contributor Author

@michaelgugino that should be most of your comments except for the vars stuff. I'll take a look at that on Monday.

Also, I need to re-test it to make sure it all still works.

Contributor

@bogdando left a comment

This commit removes more roles than its message describes; please clarify the static_inventory role removal.

# and configure their DNS if they have to.

- name: Prepare the Nodes in the cluster for installation
hosts: cluster_hosts
Contributor

This name change needs to be covered in the docs and in the static and dynamic inventories, please.

@bogdando
Contributor

@tomassedovic The changes you submit on top of the dropped history will result in a confusing git blame experience for developers. Could you please first copy the history as-is, then put your changes on top?

@tomassedovic
Contributor Author

I still haven't heard from the openshift-ansible folks whether they would accept that going in.

@michaelgugino @sdodson can we add the git history here? It would bring in a bunch of commits, but it would keep the commits, git blame, etc. The PR would look something like this:

openshift/openshift-ansible-contrib#397

Contributor

@bogdando left a comment

I noticed that a rhel_subscribe role already exists. Could you please remove the subscription role and update the code to use rhel_subscribe instead?

@tomassedovic
Contributor Author

Yep, that's on my list.

The contents of roles/openshift_openstack/vars/main.yml were moved to
the defaults/main.yml file instead.

There are now duplication warnings we need to address, but the
deployment does still work.
Most of the vars in `roles/openshift_openstack/defaults/main.yml` are
now prefixed with `openstack_`.
@tomassedovic
Contributor Author

@michaelgugino this latest set of changes handles the defaults/vars split, as well as any fixes I had to do to make this work again.

I've removed most of the vars duplication (num_etcd vs. openstack_num_etcd) and added the openstack_ prefix, so it should be better namespaced now. There are still a few stragglers -- let me know if you want me to fix all the names here or do that later.

For everyone else's benefit: the openshift-ansible team has OK'd preserving the git history, so like I said, once this is good and done, I'll clone this PR and add the history there.

@tomassedovic
Contributor Author

@jeremyeder @mbruzek folks on the openshift-ansible architecture meeting suggested you take a look at this since you're doing a lot of openstack-based deployments.

The code here is basically just the playbooks from openshift-ansible-contrib plus changes that make it more consistent with the AWS provider.

@openshift-ci-robot

openshift-ci-robot commented Nov 1, 2017

@tomassedovic: The following test failed, say /retest to rerun them all:

Test name: ci/openshift-jenkins/tox | Commit: 9e07520 | Rerun command: /test tox

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Contributor

@michaelgugino left a comment

I would like to see the variable namespacing cleaned up; that will prevent issues in the future. I didn't check every variable name against the rest of openshift-ansible, as there are quite a few, but best practice is to include the role name as a prefix on each variable name.

I tend to apply this best practice even to registered variables.
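Applied to a registered variable, that best practice might look like this; the command and names are illustrative:

```yaml
- name: Check whether the Heat stack already exists
  command: openstack stack show "{{ openshift_openstack_stack_name }}"
  register: openshift_openstack_stack_show   # role-prefixed, even for register
  failed_when: false
  changed_when: false
```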

@@ -0,0 +1,147 @@
---
env_id: "openshift"
Contributor

Need to namespace this variable.

@@ -0,0 +1,147 @@
---
env_id: "openshift"
public_dns_domain: "example.com"
Contributor

namespace.

@@ -0,0 +1,95 @@
---
Contributor

Need lots of namespacing here.

- block:
- name: create the docker-storage config file
template:
src: "{{ role_path }}/templates/docker-storage-setup-overlayfs.j2"
Contributor

don't need '{{ role_path }}/templates/' here.
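Because the template sits in the role's own templates/ directory, the template module finds it by bare name. A sketch; the dest path is illustrative:

```yaml
- name: create the docker-storage config file
  template:
    src: docker-storage-setup-overlayfs.j2   # resolves to <role>/templates/ automatically
    dest: /etc/sysconfig/docker-storage-setup
```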

- block:
- name: create the docker-storage-setup config file
template:
src: "{{ role_path }}/templates/docker-storage-setup-dm.j2"
Contributor

don't need '{{ role_path }}/templates/' here.

- block:
- name: create the docker-storage-setup config file for CentOS
template:
src: "{{ role_path }}/templates/docker-storage-setup-dm.j2"
Contributor

don't need '{{ role_path }}/templates/' here.

- "{{ openstack_etcd_image }}"
- "{{ openstack_dns_image }}"
loop_control:
loop_var: image
Contributor

This needs a better name. Actually, we probably don't need this variable at all; using `{{ item }}` is self-explanatory to the end user/developer that the task is looping.
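The suggestion, sketched out; the check command itself is illustrative:

```yaml
- name: Check that the required images are available
  command: openstack image show "{{ item }}"   # default loop variable reads naturally
  with_items:
    - "{{ openstack_etcd_image }}"
    - "{{ openstack_dns_image }}"
  changed_when: false
```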

loop_var: image

# Check that custom flavors are available
- include: custom_flavor_check.yaml
Contributor

Same comment as the image task above.


- include: node-network.yml

- name: "Verify SELinux is enforcing"
Contributor

nit: I would move this to the top of this file, or somewhere much earlier in the play run.
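A fail-fast version of that check, suitable for the start of the play. A sketch using Ansible's gathered SELinux facts:

```yaml
- name: Verify SELinux is enforcing
  assert:
    that:
      - ansible_selinux.status == "enabled"
      - ansible_selinux.mode == "enforcing"
    msg: SELinux must be enabled and enforcing before installing OpenShift
```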

Because the templates are present in a role, the `template` module is
able to look them up directly, without having to use `{{ role_path
}}/templates`.
The `openstack_*_network_name` vars are strings, not booleans, so their absence shouldn't really be marked by `False`.
This makes sure that all the variables used in the `openshift_openstack`
role are prefixed with `openshift_openstack_` as is the convention.
@tomassedovic
Contributor Author

That should address the latest batch of comments. I've namespaced everything and tested the deployment still works (it does!).

Contributor

@michaelgugino left a comment

/lgtm

@michaelgugino
Contributor

@tomassedovic Thank you for taking the time to address all of my comments. I realize it was a non-trivial amount of work and I appreciate the level of effort you provided.

This looks to be in pretty good shape now, I think it can be merged.
@sdodson would you like to look this over again or have someone else review it?

@tomassedovic Are you going to push a new PR with all the commit history?

@tomassedovic
Contributor Author

@michaelgugino yep, that's the plan. I'll create a new PR tomorrow that has all the git history.

Contributor

@bogdando left a comment

Just thinking of the CASL folks who use the provisioning playbooks as a dependency: let's make sure we keep the old naming working but add deprecation warnings. This can be done in follow-up patches, I suppose. Until then, the CASL team could continue using the "old" location of the modules.

@tomassedovic
Contributor Author

Right, so when I said "tomorrow" I apparently meant "at some point within the next 10 days or so". I'm too busy with travel right now and I'm not sure when I'll be able to get to this.

@michaelgugino
Contributor

Just thinking of the CASL folks who use the provisioning playbooks as a dependency: let's make sure we keep the old naming working but add deprecation warnings. This can be done in follow-up patches, I suppose. Until then, the CASL team could continue using the "old" location of the modules.

I don't want to bring in a bunch of deprecated variables. CASL should be the project where these deprecated variables are managed.

@oybed
Member

oybed commented Nov 5, 2017

@michaelgugino @bogdando Agreed on the deprecated variables. The CASL project will monitor this PR and adapt to it. Also, I have some additional follow-up changes that I'd like to submit to make some of the Heat templating more generic.

@tomassedovic I see some references to the DNS aspect still. Will that be addressed in a follow-up clean-up PR?

@mbruzek
Contributor

mbruzek commented Nov 6, 2017

I will have to make several changes to variable names when this code lands, but it looks very similar to what I am using now. I looked over the changes and did not see any problems or issues at this time.
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 6, 2017
@tomassedovic
Contributor Author

@oybed yeah, that's the plan. At this point, I'd rather merge it and clean it up afterwards. We'll have to address DNS in follow-up patches anyway.

@tomassedovic
Contributor Author

Closing this PR in favour of #6039 which preserves the git history from openshift-ansible-contrib.
