OpenStack: Document Admin Requirements, Post Deployment Steps, and Networking Arch #2148
Conversation
tomassedovic
left a comment
Thanks! This looks great, but I've got a couple of suggestions regarding the node count.
Happy to merge once those are addressed.
docs/user/openstack/README.md
Outdated
Please also mention that the default is 3 so people aren't confused. Something like: Default 3, Minimum 2.
/label platform/openstack
/lgtm /retest
/lgtm
/retest Please review the full test history for this PR and help us cut down flakes.
f068568 to d56377b
tomassedovic
left a comment
Sweet! More content -> more change requests I'm afraid :-).
There's one pretty important factual mistake in the bootstrap -> master VIP transition. And then a small note about the SG rule quota.
There's also a ton of other stuff we should probably document better such as:
- sample install-config.yaml
- where to download the RHCOS image from
- more? Documentation is never done lol
But I'm more than happy to merge this once you've resolved the issues above and anything else can be done in later PRs. This is already a huge improvement.
This section is (IIRC) not entirely correct. Initially, bootstrap does have a higher priority than the masters'. However, once a master passes the health check, its priority increases and the bootstrap -> master VIP switch happens then.
We do not rely on the bootstrap teardown for the API VIP transition.
This is the initial master priority: https://github.com/openshift/machine-config-operator/blob/db0210f8bb30117bcabd9fa74d8e557d09f787a2/templates/master/00-master/openstack/files/openstack-keepalived-keepalived.yaml#L27
This is the weight based on the health check: https://github.com/openshift/machine-config-operator/blob/db0210f8bb30117bcabd9fa74d8e557d09f787a2/templates/master/00-master/openstack/files/openstack-keepalived-keepalived.yaml#L9
The weight gets added to the priority, turning it from 40 to 90.
Bootstrap's priority is 50 and there are no other checks that would increase it: https://github.com/openshift/machine-config-operator/blob/db0210f8bb30117bcabd9fa74d8e557d09f787a2/manifests/openstack/keepalived.conf.tmpl#L11
See: https://www.keepalived.org/manpage.html
A positive weight means that an OK status will add to
the priority of all VRRP instances which monitor it. On the opposite, a
negative weight will be subtracted from the initial priority in case of
insufficient processes.
/cc @celebdor or @bcrochet to make sure I'm not making this up.
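(Side note for anyone following along: one quick way to watch the transition described above is to check which node currently holds the API VIP. This is only a sketch; the address below is a placeholder for your actual API VIP.)

```
# Run on the bootstrap node and each master; only the node currently
# holding the API VIP will print a matching line. 192.0.2.5 is a
# placeholder for the real API VIP address.
ip -4 addr show | grep 192.0.2.5
```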
damn. If I had seen this comment I'd have avoided a few above. Thanks Tomáš
docs/user/openstack/README.md
Outdated
My deployment had 30 master rules and 18 worker rules for a total of 48. It's possible that more will be added in the future. I'd set the minimum to say 60 and mention that ~100+ would be recommended for future-proofing.
Note that Kuryr will have much higher requirements on the networking resources (but that can be addressed later by the Kuryr folks).
docs/user/openstack/README.md
Outdated
@iamemilio one more thing: this `openstack quota set --secgroups 100 --secgroup-rules 1000 <project>` command here is now inconsistent with the quotas you've added above.
Could you please either delete this line or fix the values and incorporate it into your changes?
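(If it helps, a consistent version of that command might look roughly like the sketch below; the numbers are illustrative placeholders and should be made to match whatever the quota table above ends up recommending.)

```
# Illustrative values only -- keep these in sync with the documented quota.
openstack quota set --secgroups 10 --secgroup-rules 100 <project>
```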
it also resolves the names of the nodes.
- ## Virtual IP's
+ ## Virtual IPs
- Ingress, which handles requests to services managed by OpenShift, DNS, which handles internal dns requests, and API, which handles requests to the openshift API. Our VIP addresses are chosen and validated from the nodes subnet in the openshift
+ Ingress, which handles requests to routes managed by OpenShift; DNS, which handles internal DNS requests; and API, which handles requests to the OpenShift API. Our VIP addresses are chosen and validated from the nodes subnet in the OpenShift
- nodes are still coming up. The bootstrap node will run a coredns instance, as well as
+ nodes are still coming up. The bootstrap node will run a CoreDNS instance, as well as
- keepalived. While the bootstrap node is up, it will have priority running the API and DNS
+ Keepalived. While the bootstrap node is up, it will have priority running the API and DNS
The bootstrap node does not have priority for the API VIP, it will yield to the masters as soon as any of those succeeds in VIP health checks.
Not the API VIP
damn. If I had seen this comment I'd have avoided a few above. Thanks Tomáš
- To ensure the api is reachable through the API VIP, keepalived periodically attempts to reach the api through the API VIP. It will do the same
+ To ensure the API is reachable through the API VIP, Keepalived periodically attempts to reach the API through the API VIP. It will do the same
The check is not through the API VIP, but using localhost. It is the HAProxy monitor that checks via the API VIP to see if the API LB is ready.
why does it do this?
Because on each node, nothing listens to the API VIP until it is configured by Keepalived. So Keepalived itself, to be able to tell whether it can configure the VIP, needs to use either the non-virtual IP of the node or localhost.
The note about HAProxy means that HAProxy will only put an iptables rule redirecting the API traffic to itself when it is hosting the VIP.
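(To illustrate the kind of local check being described, here is a rough sketch; the actual check script in the machine-config templates may differ in its exact form.)

```
# Probe the API server on the node itself instead of going through the VIP.
# 6443 is the standard Kubernetes API port; /readyz is its readiness endpoint.
curl -k -s -o /dev/null -w '%{http_code}\n' https://localhost:6443/readyz
```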
fix the names of the services ;-)
I think it is worth mentioning that it is the OCP router HAProxy and not the infra HAProxy. Also, it's worth explaining that since the OCP routers do not run on every worker node, this is a good way for the same config to make only those nodes where an OCP router gets scheduled eligible.
Do you mind explaining this further?
Sure. We do not know a priori which worker nodes will get an ocp router pod scheduled. So we run keepalived for ingress on all nodes. Only those that actually run the ocp router pod get max score, so those are the ones that will get the ingress VIP.
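(For context, a quick way to see which worker nodes actually ended up with router pods, assuming the default openshift-ingress namespace:)

```
# The ingress VIP will land on one of the nodes shown here.
oc -n openshift-ingress get pods -o wide
```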
also, just to clarify, by ocp router pod, are you talking about the haproxy that we run in static pods, or something like multus?
It looks way better. Thanks for the edits. I answered your questions I think.
docs/user/openstack/README.md
Outdated
I've just realised the OpenStack router uses up one floating IP as well. So we need 1 for the router, another for the bootstrap, and a third for the API. So: minimum 3, recommended 4+? I'd actually recommend one for each node, so let's make it an even 10?
The one for the OpenStack router is added at the end of deployment when the bootstrap FIP is supposed to be deleted already.
It's correct that we need 3 FIPs, but only 2 at the same time. Minimal requirement should say 2 FIPs.
docs/user/openstack/README.md
Outdated
We've got 1 bootstrap, 3 masters and 2 workers minimum. That's six. Could you increase this and the recommended value by one please?
LOL whoops
Users could configure more compute, etc. It seems like you should either emphasize that these minimums are for the installer defaults, or break them down by install-config tunables.
@wking I tried to clarify things in my latest commit. Please let me know if my changes have resolved this issue.
@iamemilio a couple of notes on the FIP and instance counts (sorry, I may have given you wrong numbers earlier). I'm happy to merge this (just remove the WIP) and address that in a subsequent PR, or feel free to update this one and I'll lgtm then. Thanks for doing this! It looks really good!
racedo
left a comment
In the "RHCOS Image" section, we should add a link to where the images are found. I think today this would be here:
https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/
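(Once a link is added, it might also be worth showing how the image typically gets into Glance. A rough sketch, with the file and image names as placeholders; the exact file name on the mirror varies by release.)

```
# Download the RHCOS OpenStack image from the mirror (placeholder file name).
curl -L -o rhcos.qcow2 \
  https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/<rhcos-openstack-image>.qcow2
# Upload it to Glance so the installer can reference it.
openstack image create --disk-format qcow2 --container-format bare \
  --file rhcos.qcow2 rhcos
```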
/retest
Covers: OSINFRA 519, 522, 139, 639
/approve I am happy to merge this as is and address changes in later PRs. @mandre has expressed an interest so I'll hold off from
mandre
left a comment
There are also a few typos here and there and some missing capitalization. I'm fine with merging this patch now and iterating to improve them in a follow-up patch.
Nice work on improving the docs! Thanks Emilio.
## OpenStack Requirements
## Openstack Credentials

There are two ways to pass your credentials to the installer, with a clouds.yaml file or with environment variables. You can also use a combination of the two, but be aware that clouds.yaml file has precident over the environment variables you set.
s/has precident/takes precedence/
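(For reference, a minimal sketch of the two options; the cloud name and credential values are placeholders.)

```
# Option 1: select an entry from clouds.yaml.
export OS_CLOUD=mycloud

# Option 2: pass credentials through environment variables.
export OS_AUTH_URL=https://keystone.example.com:5000/v3
export OS_USERNAME=myuser
export OS_PASSWORD=mypassword
export OS_PROJECT_NAME=myproject
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
```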
#### Master Nodes

The default deployment stands up 3 master nodes, which is the minimum amount required for a cluster. For each master node you stand up, you will need 1 instance, and 1 port available in your quota. They should be assigned a flavor with at least 16 Gb RAM, 4 VCPu, and 25 Gb Disk. It is theoretically possible to run with a smaller flavor, but be aware that if it takes too long to stand up services, or certian essential services crash, the installer could time out, leading to a failed install.
Should we also add the number of ports to the minimum requirements?
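(As an aside, on clouds that don't already have a suitable flavor, one could be created roughly like this; the flavor name is a placeholder.)

```
# 16 GB RAM, 4 vCPUs, 25 GB disk -- the minimum described above.
openstack flavor create --ram 16384 --vcpus 4 --disk 25 ocp-master
```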
Note the actual IP address. We will use `10.19.115.117` throughout this
document.

Next, add the `api.<cluster name>.<cluster domain>` and `*.apps.<cluster
*.apps.<clustername>.<cluster domain> should point to another Floating IP, mapped to the Ingress VIP port.
Yeah, following the steps, the *.apps bit here should be removed I think.
I will clarify this
OR add A record in `/etc/hosts`:

```
<ingress FIP> console-openshift-console.apps.example.shiftstack.com
```
Perhaps we should add all the different addresses the cluster uses by default, if I'm not mistaken:
```
<ingress FIP> console-openshift-console.apps.example.shiftstack.com
<ingress FIP> integrated-oauth-server-openshift-authentication.apps.example.shiftstack.com
<ingress FIP> oauth-openshift.apps.example.shiftstack.com
<ingress FIP> prometheus-k8s-openshift-monitoring.apps.example.shiftstack.com
<ingress FIP> grafana-openshift-monitoring.apps.example.shiftstack.com
```
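(A quick way to sanity-check the entries once they are in place; the host names follow the example cluster domain used above.)

```
# Each of these should resolve to the ingress floating IP.
getent hosts console-openshift-console.apps.example.shiftstack.com
getent hosts oauth-openshift.apps.example.shiftstack.com
```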
In order to run the latest version of the installer in OpenStack, at a bare minimum you need the following quota to run a *default* cluster. While it is possible to run the cluster with fewer resources than this, it is not recommended. Certian edge cases, such as deploying [without FIPs](#without-floating-ips), or deploying with an [external loadbalancer](#using-an-external-load-balancer) are documented below, and are not included in the scope of this recomendation.

* OpenStack Quota
* Floating IPs: 3
This is really 2 floating IPs.
no, it's three. One floating IP is taken up by the router itself.
Or rather, one IP from the floating range is taken up by the router even though it doesn't show up in openstack floating ip list. But your range needs at least 3 IPs.
Hmm ok, but does it count as part of the FIP quota?
Hmm, that's a good question. I'm not sure, but since it comes from the same range, having enough quota (backed by FIP count) should always work here.
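(For what it's worth, checking what the project actually has is straightforward; the project name is a placeholder.)

```
# Current quotas for the project, including the floating IP limit.
openstack quota show <project>
# Floating IPs currently allocated from the external network.
openstack floating ip list
```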
Aaah, we're not owners in
/lgtm We need @wking or someone to approve this (or merge #2194). @iamemilio please note the latest comments and feel free to open a new PR addressing them.
@iamemilio: you cannot LGTM your own PR.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: iamemilio, tomassedovic.
/retest Please review the full test history for this PR and help us cut down flakes.
@iamemilio: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
This adds a new directory for documents about the design of the OpenStack platform and sets up `openstack-approvers` as its owners. For example content see the existing Bare Metal folder: https://github.com/openshift/installer/tree/master/docs/design/baremetal Or this pull request: openshift#2148