Kuryr var generation in OSt dynamic inventory #6319
Conversation
```python
    return None

data = {}
for output in stack['outputs']:
```
Could we instead use metadata properties, like we do above, and omit the stack outputs parsing, and the hardcoded stack names as well?
We're working on not having to hardcode the stack name (the idea is adding a tag to the stack which we'll filter for here).
But this is exactly what Heat outputs are for? Why shouldn't we use them? Also, where should we attach this metadata other than the stack? Some of the data here (the network and SG ids) are not known before the stack is created.
This is what heat outputs are for
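For reference, the stack-outputs approach under discussion boils down to flattening Heat's list of outputs into a key/value dict. A minimal sketch follows; the output keys shown are hypothetical examples, not the names the actual Heat template defines:

```python
# Minimal sketch of flattening Heat stack outputs into a settings dict.
# The output keys below ('pod_subnet', 'pod_access_sg_id') are illustrative
# placeholders, not the real template's output names.
def parse_stack_outputs(stack):
    """Map each Heat output's output_key to its output_value."""
    data = {}
    for output in stack.get('outputs', []):
        data[output['output_key']] = output['output_value']
    return data

example_stack = {'outputs': [
    {'output_key': 'pod_subnet', 'output_value': 'subnet-uuid'},
    {'output_key': 'pod_access_sg_id', 'output_value': 'sg-uuid'},
]}
print(parse_stack_outputs(example_stack))
# {'pod_subnet': 'subnet-uuid', 'pod_access_sg_id': 'sg-uuid'}
```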
```python
# TODO: Filter the cluster stack with tags once it is supported in shade
hardcoded_cluster_name = 'openshift.example.com'
stack = cloud_client.get_stack(hardcoded_cluster_name)
if stack is None or stack['stack_status'] != 'CREATE_COMPLETE':
```
I'm not sure we want a dynamic inventory to be stack-state-aware. Let's just note instead that one should use it after the stack is created, perhaps?
IIUC how the dynamic inventory works, every time Ansible needs inventory data, the script gets run. I am not sure there is a way to prevent the dynamic inventory from running before the stack is created.
Toni is correct: the inventory script is also run before any tasks are executed (when no stack exists yet), so the check needs to happen. But Toni: we could also just ignore the stack state and only set the vars if the right outputs exist.
If you want to keep checking the stack status (which is fine by me), it must also include UPDATE_COMPLETE though.
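If the status check stays, it need not grow into a state machine; a whitelist of "complete" states covers both cases. A sketch, with hypothetical helper names:

```python
# Hypothetical guard: treat the stack as usable in either "complete" state,
# so the check also passes after a stack update, not only after creation.
READY_STATES = frozenset({'CREATE_COMPLETE', 'UPDATE_COMPLETE'})

def stack_is_ready(stack):
    return stack is not None and stack.get('stack_status') in READY_STATES

print(stack_is_ready({'stack_status': 'UPDATE_COMPLETE'}))     # True
print(stack_is_ready(None))                                    # False
print(stack_is_ready({'stack_status': 'CREATE_IN_PROGRESS'}))  # False
```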
Folks, could we please not introduce a finite-state machine for Heat stacks into the dynamic inventory? Let's keep this simple, please.
```python
    cloud_client.auth['user_domain_id'])
settings['kuryr_openstack_project_id'] = cloud_client.current_project_id
settings['kuryr_openstack_project_domain_name'] = (
    cloud_client.auth['project_domain_id'])
```
it seems we already have inventory vars for this. Let's use them as a source of creds instead.
The Kuryr role is pre-existing and does not currently depend on the openshift_openstack playbook. That's why we adapt those vars here.
Do you mean this inventory.py has to maintain compatibility with external use cases? Please put a FIXME then; I hope we'll have that kuryr role merged into this repo eventually, so this can be simplified to use the existing inventory vars instead.
I'll put the fixme
playbooks/openstack/README.md
Outdated
```yaml
#os_sdn_network_plugin_name: cni
#openshift_node_proxy_mode: userspace
#openshift_hosted_manage_registry: false
#openshift_hosted_manage_router: false
```
We should also add here:
#use_trunk_ports: true
We should also mention that this requires the Trunk ports extension (which I didn't have enabled). This guide describes how to detect and validate that it's enabled:
https://docs.openstack.org/neutron/pike/admin/config-trunking.html
I wonder if we can check for this in our prerequisites (as a separate PR).
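Such a prerequisite check could inspect the extension aliases Neutron reports (the trunk extension's alias is `trunk`, per the trunking guide linked above). The sketch below operates on the parsed body of Neutron's `GET /v2.0/extensions` response; how that body is fetched (shade, raw REST, etc.) is left open, and the sample payload is illustrative:

```python
# Sketch of detecting the Neutron trunk extension from the parsed body of
# Neutron's GET /v2.0/extensions response. The sample payload below is an
# illustrative stand-in for a real Neutron reply.
def has_trunk_extension(extensions_body):
    aliases = {ext['alias'] for ext in extensions_body.get('extensions', [])}
    return 'trunk' in aliases

sample = {'extensions': [
    {'alias': 'trunk', 'name': 'Trunk Extension'},
    {'alias': 'security-group', 'name': 'security-group'},
]}
print(has_trunk_extension(sample))  # True
```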
Done. I'll update the patch
| # | ||
| ## You should set the following if you want to use Kuryr/Neutron as your SDN | ||
| #openshift_use_kuryr: True | ||
| #openshift_use_openshift_sdn: False |
use_trunk_ports: True
Done
tomassedovic left a comment
Looks good! I've managed to create a Heat stack, but the full deployment didn't finish for me yet.
That may be an unrelated issue, will check later. In the meantime, I've left a few comments inline.
```yaml
# # If you VM images will name the ethernet device different than 'eth0',
# # override this
#kuryr_cni_link_interface: eth0
```
The Kuryr role doesn't actually have a default for this, so it must always be set. I'd prefer adding the default, but if not, the description here (and in the README) must change.
Better set the default. Done
```python
settings['kuryr_openstack_username'] = cloud_client.auth['username']
settings['kuryr_openstack_password'] = cloud_client.auth['password']
settings['kuryr_openstack_user_domain_name'] = (
    cloud_client.auth['user_domain_id'])
```
Do we require Keystone V3? If so, we should mention that in the readme.
Also, this must fall back to user_domain_name if user_domain_id isn't there. And same below with the project_domain_id.
That's why I proposed taking the data from inventory vars and omitting the complexity of Keystone v2 vs v3. Let's please follow up on this item if I'm not getting the case right.
I'll document and put a FIXME for allowing v2
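The fallback Tomas asked for amounts to preferring `*_domain_id` and falling back to `*_domain_name`. A sketch, assuming the auth dict shape that shade/os-client-config expose; the helper name is hypothetical:

```python
# Hypothetical helper: prefer the *_domain_id key, fall back to
# *_domain_name, so either style of Keystone v3 credentials works.
def auth_domain(auth, prefix):
    return (auth.get('%s_domain_id' % prefix)
            or auth.get('%s_domain_name' % prefix))

print(auth_domain({'user_domain_name': 'Default'}, 'user'))     # Default
print(auth_domain({'project_domain_id': 'abc123'}, 'project'))  # abc123
```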
Here's the issue I'm hitting in task

Okay, so I'm seeing the service catalog issue above even on master without Kuryr.

@tomassedovic perhaps we want this #6086 (comment)

@bogdando good find! That worked for me on master, I'll try it with this PR. For anyone else wanting to try it, try putting these lines in your inventory/group_vars/OSEv3.yml: (but as the original commenter pointed out, that means we're not deploying OpenShift 3.7 fully)

@celebdor okay, with the fixes above, the deployment finished successfully. So, how do I test this? It's not running the router and registry pods. You mentioned using How can I verify that it's all working?
bogdando left a comment
Please document everything needed to test this and have e2e passing
@bogdando if I understand it correctly, the openshift router is not implemented in kuryr yet, so the traditional end-to-end test we've been doing, where we spin up a pod, create a route, and verify its connectivity, won't work with this patch. But we should document what is testable and how to do it.
```python
# Copyright 2016 Red Hat, Inc. and/or its affiliates
#
# Ansible is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
```
is it ok to add GNU / GPL here (for this repo)?
bogdando left a comment
LGTM, hoping to see how-to-test instructions please
/ok-to-test
tomassedovic left a comment
I haven't tested this yet (will do shortly) but it looks solid. Needs a few small changes though.
playbooks/openstack/inventory.py
Outdated
```python
def _get_kuryr_vars(cloud_client):
    """Returns a dictionary of Kuryr variables resulting of heat stacking"""
    # TODO: Filter the cluster stack with tags once it is supported in shade
    cluster_name = os.getenv('OPENSHIFT_CLUSTER', 'openshift.example.com')
```
This default should match the one in roles/openshift_openstack/defaults/main.yml, otherwise we get failures when OPENSHIFT_CLUSTER is unset.
I would also prefer the default to be something like openshift-cluster or just openshift. If the deployer wants to tie it to the cluster domain, let them be explicit about it.
Done. Went with 'openshift-cluster'
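The resulting lookup is then just an environment read whose default has to stay in sync with roles/openshift_openstack/defaults/main.yml; a sketch:

```python
import os

# The fallback value must match roles/openshift_openstack/defaults/main.yml,
# otherwise runs with OPENSHIFT_CLUSTER unset look up the wrong stack.
# 'openshift-cluster' is the default this PR settled on.
def get_cluster_name():
    return os.getenv('OPENSHIFT_CLUSTER', 'openshift-cluster')
```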
```python
    import shade
    HAS_SHADE = True
except ImportError:
    HAS_SHADE = False
```
Tiny nit: I'd rather see shade = None here and test for that instead of introducing another global. But it's not a big deal.
I know it is a bit odd. I did it to match the upstream ansible/ansible code in hopes of submitting it there at some point:
https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/cloud/openstack/os_keypair.py#L91-L95
All the modules there do it like this (at least the ones I checked)
Ah, fair enough. Thanks!
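For comparison, the two styles discussed are equivalent guards; a sketch showing both side by side (the `require_shade` helper is hypothetical):

```python
# Upstream-ansible style: a HAS_SHADE flag set at import time. The
# alternative Tomas suggested binds the module name to None instead;
# this sketch does both so either check works.
try:
    import shade  # noqa: F401
    HAS_SHADE = True
except ImportError:
    shade = None
    HAS_SHADE = False

def require_shade():
    # Equivalent check in the other style: `if shade is None:`
    if not HAS_SHADE:
        raise SystemExit('shade is required for this inventory')
```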
```yaml
- name: set stack name
  set_fact:
    openshift_openstack_stack_name: "{{ lookup('env', 'OPENSHIFT_CLUSTER') | ternary (lookup('env', 'OPENSHIFT_CLUSTER'), omit) }}"
```
set_fact is kind of hard to track, and we don't generally allow env lookups in roles as that can be confusing to the role's users.
Please set the default in roles/openshift_openstack/defaults/main.yml to the same value as in the inventory (e.g. openshift-cluster) and put the env lookup to the import_role call in playbooks/openstack/openshift-cluster/provision.yml.
That way the standalone users of this role will have full control over the value from Ansible.
Done
```yaml
{% if not openshift_openstack_provider_network_name %}
{% if openshift_use_kuryr|default(false)|bool %}
  api_lb:
```
Just checking: is the API load balancer actually necessary or is it possible to run without it?
If the latter, I'd like to make it optional, but we can deal with that later (I'll update my LB patch on top of this).
It is necessary for the pod workloads that talk to the openshift API.
playbooks/openstack/README.md
Outdated
```yaml
#os_sdn_network_plugin_name: cni
#openshift_node_proxy_mode: userspace
#openshift_hosted_manage_registry: false
#openshift_hosted_manage_router: false
```
The all.yml sample file doesn't require the router to be disabled, but this does. Which is it? (we should be consistent)
Since we are not networking the routers (they use host networking), they should be fine. The registry still needs to wait.
Done
/lgtm
The controller only needs the segmentation driver config, which changed its name to default_driver. The cni part only cares about the link device.
Change-Id: I757425c47bba976b0e5f526d98d7b396e861ec48
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>

This patch adds the necessary network resources for Kuryr in the heat stack, including putting an L4 load balancer in front of the API. It then automatically uses the resources to generate the OSEv3 variables needed for the already existing kuryr role.
Change-Id: I6fc4c6bc9835217334db1289987daf358dcf287b
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>

In order to fail gracefully, we must detect when Neutron is not configured in the way Kuryr needs.
Change-Id: I2c07ddc864b8622b958f18b4808f9d96f3906532
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>

It is important to allow users to point to their own image builds, as well as making life easier for developers who are working on some feature and want to point to their own registry.
Change-Id: I4e2d69af2adf106f38167e9f55aa8e519cb8d0e4
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>

Lacking this entry made the sanity check tasks fail when deploying openshift on openstack with kuryr.
Change-Id: Ia139d0474f0de9584c814b6dc0e2bc8f4ea1eb84
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>

This option was missing and was preventing the ConfigMap rendering from working.
Change-Id: I4d8b1440f3fe2d20b9637a1fa8e4628d72694f70
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>

Upstream kuryr uses kube-system. However, in OpenShift it comes with a default nodeSelector as if it were a namespace for apps. It seems that metrics and other openshift system Deployments and DaemonSets are expected to use 'openshift-infra' instead.
Change-Id: Iebe09c0016320ccde0664545cefbba954d1389e0
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>

The legacy neutron-lbaasv2 implementation based on haproxy is not aware of security groups at VIP creation and sets the default SG. The default SG may not provide access from the cluster ingress, so we make sure in this patch that the right SG is used. This was not an issue with OpenStack's hybrid firewall because the haproxy instances running in namespaces were bypassing the security groups. The problem popped up when newer firewall implementations in Neutron fixed that hole and exposed the wrong SG setting. Finally, it is not possible to set the SG directly in Heat, and that is a wontfix bug there.
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
Change-Id: Ia5100609655fef2ad7906a2f9df63aa8e8d2f33f

Change-Id: Ic48bbdc69e1d905f2bee0c1de914307e526e6356

Change-Id: Ie90428e79d1ca97719300c64226d12c5a762244f
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>

The previous patches in the branch left kuryr working but made things break when no LB nor kuryr were in the deployment.
Change-Id: I3a40d5934b901ba50e098c8c3bcef033cb1c931f
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>

Change-Id: Ic41853de7957eb7f603cc307565207241a232fec
Signed-off-by: Antoni Segura Puimedon <antonisp@celebdor.com>
/lgtm
/test all [submit-queue is verifying that this PR is safe to merge]
/retest

/retest
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue.
@celebdor: The following tests failed. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
This PR adds the necessary network resources for Kuryr in the heat stack, including putting an L4 load balancer in front of the API. It then automatically uses the resources to generate the OSEv3 variables needed for the already existing kuryr role.

Change-Id: I6fc4c6bc9835217334db1289987daf358dcf287b
Signed-off-by: Antoni Segura Puimedon antonisp@celebdor.com