
Conversation

@iamemilio

No description provided.

@openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 2, 2019
@tomassedovic (Contributor) left a comment:

Thanks! This looks great, but I've got a couple of suggestions regarding the node count.

Happy to merge once those are addressed.

Contributor:

Please also mention that the default is 3 so people aren't confused. Something like: Default 3, Minimum 2.
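
Assuming this thread concerns the worker count, the knob being documented looks roughly like this in install-config.yaml (a minimal sketch; only the relevant fields are shown):

```
compute:
- name: worker
  platform: {}
  replicas: 3   # default 3; minimum 2
```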

@tomassedovic
Contributor

/label platform/openstack

@tomassedovic
Contributor

/lgtm
/approve

/retest

@openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 5, 2019
@openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Aug 5, 2019
@tomassedovic
Contributor

/lgtm

@openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 5, 2019
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar /retest comments from @openshift-bot followed.

@openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 5, 2019
@iamemilio changed the title from "Openstack: Document Auth Env Vars and Min Hardware Rec" to "[WIP] Openstack: Document Auth Env Vars and Min Hardware Rec" Aug 5, 2019
@openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 5, 2019
@iamemilio force-pushed the docs branch 2 times, most recently from f068568 to d56377b on August 6, 2019 19:43
@tomassedovic (Contributor) left a comment:

Sweet! More content -> more change requests I'm afraid :-).

There's one pretty important factual mistake in the bootstrap -> master VIP transition. And then a small note about the SG rule quota.

There's also a ton of other stuff we should probably document better such as:

  • sample install-config.yaml
  • where to download the RHCOS image from
  • more? Documentation is never done lol

But I'm more than happy to merge this once you've resolved the issues above and anything else can be done in later PRs. This is already a huge improvement.

Contributor:

This section is (IIRC) not entirely correct. Initially, bootstrap does have a higher priority than the masters'. However, once a master passes the health check, its priority increases and the bootstrap -> master VIP switch happens then.

We do not rely on the bootstrap teardown for the API VIP transition.

This is the initial master priority: https://github.com/openshift/machine-config-operator/blob/db0210f8bb30117bcabd9fa74d8e557d09f787a2/templates/master/00-master/openstack/files/openstack-keepalived-keepalived.yaml#L27

This is the weight based on the health check: https://github.com/openshift/machine-config-operator/blob/db0210f8bb30117bcabd9fa74d8e557d09f787a2/templates/master/00-master/openstack/files/openstack-keepalived-keepalived.yaml#L9

The weight gets added to the priority, turning it from 40 to 90.

Bootstrap's priority is 50 and there are no other checks that would increase it: https://github.com/openshift/machine-config-operator/blob/db0210f8bb30117bcabd9fa74d8e557d09f787a2/manifests/openstack/keepalived.conf.tmpl#L11

See: https://www.keepalived.org/manpage.html

A positive weight means that an OK status will add to
the priority of all VRRP instances which monitor it. On the opposite, a
negative weight will be subtracted from the initial priority in case of
insufficient processes.

/cc @celebdor or @bcrochet to make sure I'm not making this up.
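
To make the priority/weight interplay concrete, here is a minimal Keepalived sketch along the lines of the templates above (the interface, VIP, and check script are placeholders, not the exact template contents):

```
vrrp_script chk_api {
    script "/usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz"   # placeholder health check
    interval 3
    weight 50             # an OK status adds 50 to the priority: 40 + 50 = 90
}

vrrp_instance api_vip {
    interface ens3        # placeholder interface
    virtual_router_id 51
    priority 40           # initial master priority; bootstrap uses a flat 50 with no checks
    virtual_ipaddress {
        192.0.2.5         # the API VIP (placeholder)
    }
    track_script {
        chk_api
    }
}
```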

Contributor:

Damn. If I had seen this comment I'd have avoided a few above. Thanks Tomáš.

Contributor:

My deployment had 30 master rules and 18 worker rules for a total of 48. It's possible that more will be added in the future. I'd set the minimum to, say, 60 and mention that ~100+ would be recommended for future-proofing.

Note that Kuryr will have much higher requirements on the networking resources (but that can be addressed later by the Kuryr folks).

Contributor:

@iamemilio one more thing: this `openstack quota set --secgroups 100 --secgroup-rules 1000 <project>` command here is now inconsistent with the quotas you've added above.

Could you please either delete this line or fix the values and incorporate it into your changes?
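
A corrected command consistent with the numbers discussed in this review might read (the values are illustrative and should match whatever the final quota table says):

```
openstack quota set --secgroups 3 --secgroup-rules 100 <project>
```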

Contributor:

It also resolves the names of the nodes.

Contributor:

Suggested change:
- ## Virtual IP's
+ ## Virtual IPs

Contributor:

Suggested change:
- Ingress, which handles requests to services managed by OpenShift, DNS, which handles internal dns requests, and API, which handles requests to the openshift API. Our VIP addresses are chosen and validated from the nodes subnet in the openshift
+ Ingress, which handles requests to routes managed by OpenShift; DNS, which handles internal DNS requests; and API, which handles requests to the OpenShift API. Our VIP addresses are chosen and validated from the nodes subnet in the OpenShift

Contributor:

Suggested change:
- nodes are still coming up. The bootstrap node will run a coredns instance, as well as
+ nodes are still coming up. The bootstrap node will run a CoreDNS instance, as well as

Contributor:

Suggested change:
- keepalived. While the bootstrap node is up, it will have priority running the API and DNS
+ Keepalived. While the bootstrap node is up, it will have priority running the API and DNS

The bootstrap node does not have priority for the API VIP; it will yield to the masters as soon as any of them succeeds in the VIP health checks.

Contributor:

Not the API VIP

Contributor:

Damn. If I had seen this comment I'd have avoided a few above. Thanks Tomáš.

Contributor:

Suggested change:
- To ensure the api is reachable through the API VIP, keepalived periodically attempts to reach the api through the API VIP. It will do the same
+ To ensure the API is reachable through the API VIP, Keepalived periodically attempts to reach the API through the API VIP. It will do the same

Contributor:

The check is not through the API VIP, but using localhost. It is the HAProxy monitor that checks via the API VIP to see if the API LB is ready.

Author:

Why does it do this?

Contributor:

Because on each node, nothing listens on the API VIP until Keepalived configures it. So Keepalived itself, to be able to tell whether it can configure the VIP, needs to use either the node's non-virtual IP or localhost.

The note about HAProxy means that HAProxy will only put an iptables rule redirecting the API traffic to itself when it is hosting the VIP.
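
In other words, the two probes look roughly like this (a sketch; the port and path are assumptions, not the exact scripts used):

```
# Keepalived on each node checks the local API server, not the VIP:
curl -o /dev/null -kLfs https://localhost:6443/readyz

# The HAProxy monitor checks the load-balanced API through the API VIP:
curl -o /dev/null -kLfs https://<API VIP>:6443/readyz
```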

Contributor:

fix the names of the services ;-)

Contributor:

I think it is worth mentioning that this is the OCP router HAProxy and not the infra HAProxy. Also, since the OCP routers do not run on every worker node, it's worth explaining that this is a good way to keep the same config everywhere while making only the nodes where an OCP router gets scheduled eligible.

Author:

Do you mind explaining this further?

Contributor:

Sure. We do not know a priori which worker nodes will get an OCP router pod scheduled, so we run Keepalived for Ingress on all nodes. Only those that actually run the OCP router pod get the max score, so those are the ones that will get the Ingress VIP.
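
For anyone verifying this on a running cluster, the nodes currently hosting router pods (and therefore eligible for the Ingress VIP) can be listed with:

```
oc -n openshift-ingress get pods -o wide
```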

Author:

Also, just to clarify: by OCP router pod, are you talking about the HAProxy that we run in static pods, or something like Multus?

@celebdor
Contributor

celebdor commented Aug 8, 2019

It looks way better. Thanks for the edits. I answered your questions, I think.

Contributor:

I've just realised the OpenStack router uses up one floating IP as well. So we need one for the router, another for the bootstrap, and a third for the API. So: minimum 3, recommended 4+? I'd actually recommend one for each node, so let's make it an even 10?

Member:

The one for the OpenStack router is added at the end of deployment when the bootstrap FIP is supposed to be deleted already.

It's correct that we need 3 FIPs, but only 2 at the same time. Minimal requirement should say 2 FIPs.
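
Whichever minimum the doc settles on, the project quota itself can be raised with the standard client, e.g.:

```
openstack quota set --floating-ips 3 <project>
```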

Contributor:

We've got 1 bootstrap, 3 masters, and 2 workers minimum. That's six. Could you increase this and the recommended value by one, please?

Author:

LOL whoops

Member:

Users could configure more compute, etc. It seems like you should either emphasize that these minimums are for the installer defaults, or break them down by install-config tunables.

Author:

@wking I tried to clarify things in my latest commit. Please let me know if my changes have resolved this issue.

@tomassedovic
Contributor

@iamemilio a couple of notes on the FIP and instance counts (sorry, I may have given you wrong numbers earlier).

I'm happy to merge this (just remove the WIP) and address that in a subsequent PR or feel free to update this one and I'll lgtm then.

Thanks for doing this! It looks really good!

@racedo left a comment:

In the "RHCOS Image" section, we should add a link to where the images are found. I think today this would be here:
https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/
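
For example, fetching an image from that mirror and uploading it to Glance might look like this (the exact filename changes per release, so treat it as a placeholder):

```
curl -O https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/rhcos-openstack.x86_64.qcow2.gz
gunzip rhcos-openstack.x86_64.qcow2.gz
openstack image create --container-format bare --disk-format qcow2 \
    --file rhcos-openstack.x86_64.qcow2 rhcos
```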

@iamemilio
Author

/retest

@iamemilio
Author

iamemilio commented Aug 8, 2019

Covers: OSINFRA 519, 522, 139, 639
Future Documentation changes will be handled in separate pull requests

@iamemilio changed the title from "[WIP] Openstack: Document Auth Env Vars and Min Hardware Rec" to "OpenStack: Doccument Admin Requirements, Post Deployment Steps, and Networking Arch." Aug 8, 2019
@openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 8, 2019
@iamemilio changed the title from "OpenStack: Doccument Admin Requirements, Post Deployment Steps, and Networking Arch." to "OpenStack: Doccument Admin Requirements, Post Deployment Steps, and Networking Arch" Aug 8, 2019
@tomassedovic
Contributor

/approve

I am happy to merge this as is and address changes in later PRs.

@mandre has expressed an interest so I'll hold off from /lgtm until he's had a look too.

@mandre (Member) left a comment:

There are also a few typos here and there and some missing capitalization. I'm fine with merging this patch now and iterating to improve them in a follow-up patch.

Nice work on improving the docs! Thanks Emilio.

## OpenStack Requirements
## Openstack Credentials

There are two ways to pass your credentials to the installer, with a clouds.yaml file or with environment variables. You can also use a combination of the two, but be aware that clouds.yaml file has precident over the environment variables you set.
Member:

s/has precident/takes precedence/
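
For readers following along, the clouds.yaml route looks roughly like this (the cloud name, endpoint, and credentials are placeholders):

```
# ~/.config/openstack/clouds.yaml (also searched for in the CWD and /etc/openstack/)
clouds:
  shiftstack:
    auth:
      auth_url: https://openstack.example.com:13000/v3
      username: user
      password: secret
      project_name: shiftstack
      user_domain_name: Default
      project_domain_name: Default

# then select the cloud with: export OS_CLOUD=shiftstack
```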


#### Master Nodes

The default deployment stands up 3 master nodes, which is the minimum amount required for a cluster. For each master node you stand up, you will need 1 instance, and 1 port available in your quota. They should be assigned a flavor with at least 16 Gb RAM, 4 VCPu, and 25 Gb Disk. It is theoretically possible to run with a smaller flavor, but be aware that if it takes too long to stand up services, or certian essential services crash, the installer could time out, leading to a failed install.
Member:

Should we also add the number of ports to the minimum requirements?
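
As a companion to these numbers, a flavor meeting the stated minimum could be created like so (the flavor name is a placeholder):

```
openstack flavor create --ram 16384 --vcpus 4 --disk 25 ocp.master
```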

Note the actual IP address. We will use `10.19.115.117` throughout this
document.

Next, add the `api.<cluster name>.<cluster domain>` and `*.apps.<cluster
Member:

`*.apps.<cluster name>.<cluster domain>` should point to another floating IP, mapped to the Ingress VIP port.

Contributor:

Yeah, following the steps, the `*.apps` bit here should be removed, I think.

@iamemilio (Author), Aug 9, 2019:

I will clarify this
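
Concretely, mapping a second floating IP to the Ingress VIP port might look like this (names and IDs are placeholders):

```
# find the port that holds the Ingress VIP, then attach a new FIP to it
openstack port list --fixed-ip ip-address=<ingress VIP>
openstack floating ip create --port <ingress VIP port ID> <external network>
```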

OR add A record in `/etc/hosts`:

```
<ingress FIP> console-openshift-console.apps.example.shiftstack.com
```
Member:

Perhaps we should add all the different addresses the cluster uses by default, if I'm not mistaken:

<ingress FIP> console-openshift-console.apps.example.shiftstack.com
<ingress FIP> integrated-oauth-server-openshift-authentication.apps.example.shiftstack.com
<ingress FIP> oauth-openshift.apps.example.shiftstack.com
<ingress FIP> prometheus-k8s-openshift-monitoring.apps.example.shiftstack.com
<ingress FIP> grafana-openshift-monitoring.apps.example.shiftstack.com

In order to run the latest version of the installer in OpenStack, at a bare minimum you need the following quota to run a *default* cluster. While it is possible to run the cluster with fewer resources than this, it is not recommended. Certian edge cases, such as deploying [without FIPs](#without-floating-ips), or deploying with an [external loadbalancer](#using-an-external-load-balancer) are documented below, and are not included in the scope of this recomendation.

* OpenStack Quota
* Floating IPs: 3
Member:

This is really 2 floating IPs.

Contributor:

No, it's three. One floating IP is taken up by the router itself.

Contributor:

Or rather, one IP from the floating range is taken up by the router even though it doesn't show up in `openstack floating ip list`. But your range needs at least 3 IPs.

Member:

Hmm ok, but does it count as part of the FIP quota?

Contributor:

Hmm, that's a good question. I'm not sure, but since it comes from the same range, having enough quota (backed by FIP count) should always work here.

@tomassedovic
Contributor

Aaah, we're not owners in `docs/design/openstack`, which is why we can't merge it ourselves.

@tomassedovic
Contributor

/lgtm

We need @wking or someone to approve this (or merge #2194).

@iamemilio please note the latest comments and feel free to open a new PR addressing them.

@openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 9, 2019
@openshift-ci-robot
Contributor

@iamemilio: you cannot LGTM your own PR.


In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iamemilio, tomassedovic

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 9, 2019
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot merged commit 0eeef05 into openshift:master Aug 9, 2019
@openshift-ci-robot
Contributor

@iamemilio: The following test failed, say /retest to rerun them all:

| Test name | Commit | Rerun command |
| --- | --- | --- |
| ci/prow/e2e-aws-scaleup-rhel7 | 3f9b50c | /test e2e-aws-scaleup-rhel7 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

jhixson74 pushed a commit to jhixson74/installer that referenced this pull request Dec 6, 2019
This adds a new directory for documents about the design of the
OpenStack platform and sets up `openstack-approvers` as its owners.

For example content see the existing Bare Metal folder:

https://github.com/openshift/installer/tree/master/docs/design/baremetal

Or this pull request:

openshift#2148