
Conversation

Contributor

@celebdor celebdor commented Nov 30, 2017

This PR adds the necessary network resources for Kuryr in the Heat
stack, including putting an L4 load balancer in front of the API.

It then automatically uses the resources to generate the OSEv3 variables
needed for the already existing kuryr role.

Change-Id: I6fc4c6bc9835217334db1289987daf358dcf287b
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 30, 2017
@celebdor celebdor changed the title from "[wip]Kuryr var generation in OSt dynamic inventory" to "Kuryr var generation in OSt dynamic inventory" Dec 3, 2017
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 3, 2017
return None

data = {}
for output in stack['outputs']:
Contributor

Could we instead use metadata properties, like we do above, and omit the stack-outputs parsing and the hardcoded stack names as well?

Contributor

We're working on not having to hardcode the stack name (the idea is to add a tag to the stack, which we'll filter for here).

But this is exactly what Heat outputs are for. Why shouldn't we use them? Also, where would we attach this metadata other than the stack? Some of the data here (the network and SG IDs) is not known before the stack is created.

Contributor Author

This is what Heat outputs are for.
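For context, consuming Heat outputs is a one-liner: the stack's `outputs` attribute is a list of `{output_key, output_value}` dicts that flattens naturally into a map. A minimal sketch (the helper name and the output key are hypothetical, not the exact code in this PR):

```python
def parse_stack_outputs(stack):
    """Flatten a Heat stack's outputs list into a plain dict."""
    return {o['output_key']: o['output_value'] for o in stack['outputs']}

# e.g. a stack exposing the pod subnet ID as an output:
fake_stack = {'outputs': [{'output_key': 'pod_subnet',
                           'output_value': 'subnet-1234'}]}
print(parse_stack_outputs(fake_stack))
```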

# TODO: Filter the cluster stack with tags once it is supported in shade
hardcoded_cluster_name = 'openshift.example.com'
stack = cloud_client.get_stack(hardcoded_cluster_name)
if stack is None or stack['stack_status'] != 'CREATE_COMPLETE':
Contributor

I'm not sure we want a dynamic inventory to be stack-state-aware. Perhaps let's just note instead that it should be used after the stack is created?

Contributor Author

IIUC how the dynamic inventory works, every time Ansible needs inventory data the script gets run. I am not sure there is a way to prevent the dynamic inventory from running before the stacking.

Contributor

Toni is correct: the inventory script is also run before any tasks are executed (when no stack exists yet), so the check needs to happen. But Toni: we could also just ignore the stack state and only set the vars if the right outputs exist.

If you want to keep checking the stack status (which is fine by me), it must also include UPDATE_COMPLETE though.
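A sketch of the readiness check covering both states mentioned here (hypothetical helper, not the exact code in the PR):

```python
# Both completed states count as usable; anything else (including no stack
# at all, which is normal before provisioning) means "skip the Kuryr vars".
READY_STATES = ('CREATE_COMPLETE', 'UPDATE_COMPLETE')

def stack_is_ready(stack):
    return stack is not None and stack.get('stack_status') in READY_STATES
```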

Contributor
@bogdando bogdando Dec 4, 2017

Folks, could we please not introduce a finite-state machine for Heat stacks into a dynamic inventory? Let's keep this simple, please.

cloud_client.auth['user_domain_id'])
settings['kuryr_openstack_project_id'] = cloud_client.current_project_id
settings['kuryr_openstack_project_domain_name'] = (
cloud_client.auth['project_domain_id'])
Contributor

It seems we already have inventory vars for this. Let's use them as the source of creds instead.

Contributor Author

The Kuryr role is pre-existing and does not currently depend on the openshift_openstack playbook. That's why we adapt those vars here.

Contributor

Do you mean this inventory.py has to maintain compatibility with external use cases? Please put a FIXME in then; I hope we'll have that kuryr role merged into this repo eventually, so this can be simplified to use the existing inventory vars instead.

Contributor Author

I'll put in the FIXME.
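The adaptation being discussed amounts to mapping the cloud client's auth data onto the pre-existing kuryr role's variable names. A hedged sketch (the `build_kuryr_settings` helper is hypothetical; the variable names follow the snippet above):

```python
def build_kuryr_settings(auth, project_id):
    """Adapt a clouds.yaml-style auth dict to the kuryr role's vars."""
    return {
        'kuryr_openstack_username': auth['username'],
        'kuryr_openstack_password': auth['password'],
        'kuryr_openstack_user_domain_name': auth['user_domain_id'],
        'kuryr_openstack_project_id': project_id,
        'kuryr_openstack_project_domain_name': auth['project_domain_id'],
    }
```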

#os_sdn_network_plugin_name: cni
#openshift_node_proxy_mode: userspace
#openshift_hosted_manage_registry: false
#openshift_hosted_manage_router: false
Contributor Author

We should also add here:
#use_trunk_ports: true

Contributor

We should also mention that this requires the Trunk ports extension (which I didn't have enabled). This guide describes how to detect and validate that it's enabled:

https://docs.openstack.org/neutron/pike/admin/config-trunking.html

I wonder if we can check for this in our prerequisites (as a separate PR).
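Such a prerequisite check could boil down to looking for the `trunk` alias in Neutron's extension listing (`GET /v2.0/extensions` returns a list of extension records with an `alias` field). A hypothetical sketch of the pure-logic part, once the response is in hand:

```python
def has_trunk_extension(extensions):
    """Check a Neutron extensions listing for the 'trunk' alias."""
    return any(ext.get('alias') == 'trunk' for ext in extensions)

# e.g. against a fake listing:
print(has_trunk_extension([{'alias': 'trunk', 'name': 'Trunk Extension'}]))
```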

Contributor Author

Done. I'll update the patch

#
## You should set the following if you want to use Kuryr/Neutron as your SDN
#openshift_use_kuryr: True
#openshift_use_openshift_sdn: False
Contributor Author

use_trunk_ports: True

Contributor Author

Done

Contributor
@tomassedovic tomassedovic left a comment

Looks good! I've managed to create a Heat stack, but the full deployment hasn't finished for me yet.

That may be an unrelated issue, will check later. In the meantime, I've left a few comments inline.


# # If your VM images name the ethernet device something other than 'eth0',
# # override this
#kuryr_cni_link_interface: eth0
Contributor

The Kuryr role doesn't actually have a default for this, so it must always be set. I'd prefer adding the default, but if not, the description here (and in the README) must change.

Contributor Author

Better to set the default. Done.

settings['kuryr_openstack_username'] = cloud_client.auth['username']
settings['kuryr_openstack_password'] = cloud_client.auth['password']
settings['kuryr_openstack_user_domain_name'] = (
cloud_client.auth['user_domain_id'])
Contributor

Do we require Keystone V3? If so, we should mention that in the readme.

Also, this must fall back to user_domain_name if user_domain_id isn't there. And same below with the project_domain_id.
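The fallback being asked for is a small one: prefer the `*_domain_id` key but accept `*_domain_name` when it's absent (a Keystone v2 auth dict has neither). A hypothetical helper sketching it:

```python
def auth_domain(auth, prefix):
    """Return the user/project domain from an auth dict, id first,
    falling back to name; None for Keystone v2 auth."""
    return (auth.get(prefix + '_domain_id')
            or auth.get(prefix + '_domain_name'))
```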

Contributor

That's why I proposed taking the data from inventory vars and omitting the complexity of v2 vs v3 Keystone. Let's please follow up on this item if I'm just not getting the case right.

Contributor Author

I'll document it and put a FIXME in for allowing v2.


@tomassedovic
Contributor

Here's the issue I'm hitting in task [openshift_service_catalog : wait for api server to be ready]:

fatal: [master-0.openshift.example.com]: FAILED! => {"attempts": 120, "changed": false, "cmd": ["curl", "-k", "https://apiserver.kube-service-catalog.svc/healthz"], "delta": "0:00:10.523017", "end": "2017-12-04 13:07:43.308304", "failed": true, "msg": "non-zero return code", "rc": 6, "start": "2017-12-04 13:07:32.785287", "stderr": "curl: (6) Could not resolve host: apiserver.kube-service-catalog.svc; Name or service not known", "stdout": "", "stdout_lines": []}

(curl progress-meter output trimmed from stderr for readability)

@tomassedovic
Contributor

Okay, so I'm seeing the service catalog issue above even on master without Kuryr.

@bogdando
Contributor

bogdando commented Dec 4, 2017

@tomassedovic perhaps we want this #6086 (comment)

@tomassedovic
Contributor

@bogdando good find! That worked for me on master, I'll try it with this PR. For anyone else wanting to try it, put these lines in your inventory/group_vars/OSEv3.yml:

openshift_enable_service_catalog: false
template_service_broker_install: false

(but as the original commenter pointed out, that means we're not deploying OpenShift 3.7 fully)

@tomassedovic
Contributor

@celebdor okay, with the fixes above, the deployment finished successfully.

So, how do I test this? It's not running the router and registry pods. You mentioned using oadm to create them. Can you paste the exact command to run? Going forward, can we have them created by default as with the default SDN?

How can I verify that it's all working?

Contributor
@bogdando bogdando left a comment

Please document everything needed to test this and to have e2e passing.

@tomassedovic
Contributor

@bogdando if I understand it correctly, the OpenShift router is not implemented in Kuryr yet, so the traditional end-to-end test we've been doing -- where we spin up a pod, create a route, and verify its connectivity -- won't work with this patch.

But we should document what is testable and how to do it.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 7, 2017
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 21, 2017
# Copyright 2016 Red Hat, Inc. and/or its affiliates
#
# Ansible is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
Contributor
@bogdando bogdando Jan 2, 2018

Is it OK to add the GNU/GPL header here (for this repo)?

Contributor
@bogdando bogdando left a comment

LGTM, hoping to see how-to-test instructions, please.

luis5tb added a commit to luis5tb/openshift-ansible that referenced this pull request Jan 10, 2018
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 12, 2018
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2018
@tomassedovic
Contributor

/ok-to-test

@openshift-ci-robot openshift-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 13, 2018
Contributor
@tomassedovic tomassedovic left a comment

I haven't tested this yet (will do shortly) but it looks solid. Needs a few small changes though.

def _get_kuryr_vars(cloud_client):
"""Returns a dictionary of Kuryr variables resulting from Heat stacking"""
# TODO: Filter the cluster stack with tags once it is supported in shade
cluster_name = os.getenv('OPENSHIFT_CLUSTER', 'openshift.example.com')
Contributor

This default should match the one in roles/openshift_openstack/defaults/main.yml, otherwise we get failures when OPENSHIFT_CLUSTER is unset.

I would also prefer the default to be something like openshift-cluster or just openshift. If the deployer wants to tie it to the cluster domain, let them be explicit about it.

Contributor Author

Done. Went with 'openshift-cluster'
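The agreed behaviour, sketched as a small helper (the function name is hypothetical; the default must stay in sync with roles/openshift_openstack/defaults/main.yml):

```python
import os

def cluster_stack_name(environ=os.environ):
    """Stack name from OPENSHIFT_CLUSTER, defaulting to the role default."""
    return environ.get('OPENSHIFT_CLUSTER', 'openshift-cluster')
```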

import shade
HAS_SHADE = True
except ImportError:
HAS_SHADE = False
Contributor

Tiny nit: I'd rather see shade = None here and test for that instead of introducing another global. But it's not a big deal.

Contributor Author

I know it is a bit odd. I did it to match the upstream ansible/ansible code in hopes of submitting it there at some point:
https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/cloud/openstack/os_keypair.py#L91-L95

All the modules there do it like this (at least the ones I checked)

Contributor

Ah, fair enough. Thanks!
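For reference, the two idioms under discussion can even coexist: the `HAS_SHADE` flag matches the upstream ansible/ansible modules this PR mirrors, while the reviewer's alternative reuses the module object itself as the flag. Setting both in the except branch keeps either test available:

```python
try:
    import shade
    HAS_SHADE = True
except ImportError:
    shade = None       # reviewer's suggestion: test `shade is None` later
    HAS_SHADE = False  # upstream-ansible style: test `not HAS_SHADE` later
```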


- name: set stack name
set_fact:
openshift_openstack_stack_name: "{{ lookup('env', 'OPENSHIFT_CLUSTER') | ternary (lookup('env', 'OPENSHIFT_CLUSTER'), omit) }}"
Contributor

set_fact is kind of hard to track, and we generally don't allow env lookups in roles, as that can be confusing to the role's users.

Please set the default in roles/openshift_openstack/defaults/main.yml to the same value as in the inventory (e.g. openshift-cluster) and put the env lookup into the import_role call in playbooks/openstack/openshift-cluster/provision.yml.

That way the standalone users of this role will have full control over the value from Ansible.

Contributor Author

Done


{% if not openshift_openstack_provider_network_name %}
{% if openshift_use_kuryr|default(false)|bool %}
api_lb:
Contributor

Just checking: is the API load balancer actually necessary or is it possible to run without it?

If the latter, I'd like to make it optional, but we can deal with that later (I'll update my LB patch on top of this).

Contributor Author

It is necessary for the pod workloads that talk to the openshift API.

@tomassedovic tomassedovic self-assigned this Feb 13, 2018
#os_sdn_network_plugin_name: cni
#openshift_node_proxy_mode: userspace
#openshift_hosted_manage_registry: false
#openshift_hosted_manage_router: false
Contributor

The all.yml sample file doesn't require the router to be disabled, but this one does. Which is it? (We should be consistent.)

Contributor Author

Since we are not handling the routers' networking (they use host networking), they should be fine. The registry still needs to wait.

Contributor Author

Done

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 28, 2018
@openshift-merge-robot openshift-merge-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 28, 2018
@tomassedovic
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 28, 2018
@openshift-merge-robot openshift-merge-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 28, 2018
celebdor and others added 12 commits March 2, 2018 10:39
The controller only needs the segmentation driver config, which changed
its name to default_driver.

The cni part only cares about the link device.

Change-Id: I757425c47bba976b0e5f526d98d7b396e861ec48
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
This patch adds the necessary network resources for Kuryr in the Heat
stack, including putting an L4 load balancer in front of the API.

It then automatically uses the resources to generate the OSEv3 variables
needed for the already existing kuryr role.

Change-Id: I6fc4c6bc9835217334db1289987daf358dcf287b
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
In order to fail gracefully we must detect when Neutron is not
configured in the way Kuryr needs.

Change-Id: I2c07ddc864b8622b958f18b4808f9d96f3906532
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
It is important to allow users to point to their own image builds, as
well as to make life easier for developers who are working on some
feature and want to point to their own registry.

Change-Id: I4e2d69af2adf106f38167e9f55aa8e519cb8d0e4
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
Lacking this entry made the sanity check tasks fail when deploying
openshift on openstack with kuryr.

Change-Id: Ia139d0474f0de9584c814b6dc0e2bc8f4ea1eb84
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
This option was missing and was preventing the ConfigMap rendering from
working.

Change-Id: I4d8b1440f3fe2d20b9637a1fa8e4628d72694f70
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
Upstream kuryr uses kube-system. However, in OpenShift it comes with a
default nodeSelector as if it were a namespace for apps. It seems that
metrics and other openshift system Deployments and DaemonSets are
expected to use 'openshift-infra' instead.

Change-Id: Iebe09c0016320ccde0664545cefbba954d1389e0
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
The legacy neutron-lbaasv2 implementation based on haproxy is not aware
of security groups at VIP creation and sets the default SG. The default
SG may not provide access from the cluster ingress, so this patch makes
sure that the right SG is used.

This was not an issue with OpenStack's hybrid firewall because the
haproxy instances running in namespaces were bypassing the security
groups. The problem popped up when newer firewall implementations in
Neutron fixed that hole and exposed the wrong SG setting.

Finally, it is not possible to set the SG directly in Heat, and that is
a wontfix bug there.

Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
Change-Id: Ia5100609655fef2ad7906a2f9df63aa8e8d2f33f
Change-Id: Ic48bbdc69e1d905f2bee0c1de914307e526e6356
Change-Id: Ie90428e79d1ca97719300c64226d12c5a762244f
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
The previous patches in the branch left Kuryr working but broke things
when neither the LB nor Kuryr was in the deployment.

Change-Id: I3a40d5934b901ba50e098c8c3bcef033cb1c931f
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
Change-Id: Ic41853de7957eb7f603cc307565207241a232fec
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
@tomassedovic
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 2, 2018
@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@tomassedovic
Contributor

/retest

1 similar comment
@tomassedovic
Contributor

/retest

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-merge-robot
Contributor

Automatic merge from submit-queue.

@openshift-merge-robot openshift-merge-robot merged commit 74ef4d9 into openshift:master Mar 3, 2018
@openshift-ci-robot

@celebdor: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/system-containers f5c83ef link /test system-containers
ci/openshift-jenkins/containerized f5c83ef link /test containerized
ci/openshift-jenkins/extended_conformance_install_crio f5c83ef link /test crio
ci/openshift-jenkins/logging 72a1515 link /test logging

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
