Conversation

@jcpowermac (Contributor) opened this pull request.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 11, 2019
@jcpowermac jcpowermac force-pushed the vmw_m3_ops_networking branch 2 times, most recently from 6245021 to 66ab0f5 Compare December 11, 2019 17:30
@russellb (Contributor):

The proposal section was slightly modified from: https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md

What were the modifications? Can you add a link to this to the doc itself, as well?

@jcpowermac (Contributor, author):

The proposal section was slightly modified from: https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md

What were the modifications? Can you add a link to this to the doc itself, as well?

Sure, I can add a link to the original document. I modified this text since it has been moved to the MCO:

 The `keepalived`
instance here is managed by systemd and a script is used to generate
the `keepalived` configuration before launching the service using
`podman`. 

Added repository links to pod and service configurations.

@jcpowermac jcpowermac force-pushed the vmw_m3_ops_networking branch 3 times, most recently from 4b41706 to 1aae954 Compare December 12, 2019 18:34
### Goals

Install an IPI OpenShift cluster on various on-premise non-cloud platforms that
provides internal DNS and load balancing that is minimally required for OpenShift
@abhinavdahiya (Contributor), Dec 12, 2019:

Can we add a section that briefly explains what these minimum requirements are?

## Proposal

The basis and significant portions of this proposal were taken from existing
[documentation](https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md).

Comment on lines 69 to 71
In both cases, the installation process expects these ports to be
reachable on the bootstrap instance at first and then later on the
newly-deployed control plane machines.
Contributor:

That's technically not correct; the expectation is that these are load-balanced to any one of them as long as it's healthy.

There is no "first bootstrap and then control plane"; it's all healthy endpoints.

@jcpowermac (Contributor, author):

@abhinavdahiya I removed this section from the original doc:

On other platforms (for example, see the AWS UPI instructions), an external load-balancer is required to be configured in advance in order to provide this access.

For bootstrap, I am unable to find haproxy in use. From my review, the expectation is that the VIP moves to a control plane node once the bootstrap node is destroyed.

cc: @russellb


In cluster network infrastructure, a VIP (Virtual IP) is used to provide
failover of the API server across the control plane machines
(including the bootstrap instance). This "API VIP" is provided by the user
Contributor:

(including the bootstrap instance)

This should probably be reworded along the lines of: the bootstrap host is part of this backend during bootstrapping.


In cluster network infrastructure, a VIP (Virtual IP) is used to provide
failover of the API server across the control plane machines
(including the bootstrap instance). This "API VIP" is provided by the user
Contributor:

as an install-config.yaml [parameter]

I think we should just mention that it's provided by the user to the installer, because it can be prompted for via the terminal prompts and not just install-config.

(including the bootstrap instance). This "API VIP" is provided by the user
as an `install-config.yaml` [parameter](https://github.com/openshift/installer/blob/master/pkg/types/baremetal/platform.go#L57-L63)
and the installation process configures `keepalived` to manage this VIP.

Contributor:

I would like to see some restrictions on the API VIP included. For example: is it globally addressable, or only on the private network? Is it part of the cluster's private network or some other private network outside the cluster, etc.?

Member:

It seems that, since we only have one API VIP, whether or not it's RFC1918 matters less than the requirement that the VIP is accessible both by the cluster and by external clients.

failover of the API server across the control plane machines
(including the bootstrap instance). This "API VIP" is provided by the user
as an `install-config.yaml` [parameter](https://github.com/openshift/installer/blob/master/pkg/types/baremetal/platform.go#L57-L63)
and the installation process configures `keepalived` to manage this VIP.
Contributor:

The installation process doesn't configure it; rather, the cluster configures and maintains it. So I think we should move away from treating it as install-time-only configuration.

as an `install-config.yaml` [parameter](https://github.com/openshift/installer/blob/master/pkg/types/baremetal/platform.go#L57-L63)
and the installation process configures `keepalived` to manage this VIP.

The API VIP first resides on the bootstrap instance.
@abhinavdahiya (Contributor), Dec 12, 2019:

This probably needs to be separated out into sections covering the lifecycle of the API VIP during bootstrapping and afterwards in the cluster, and the keepalived setup.

@abhinavdahiya (Contributor), Dec 12, 2019:

keepalived setup

It will be useful to describe:

* the init container's input and role
* the actual keepalived inputs and configured behavior, briefly.

Contributor:

Also, there seems to be a difference in keepalived pod setup/configuration between bootstrap and cluster nodes; see https://github.com/openshift/machine-config-operator/blob/fd8c53bfb8d97a5a7442a8810a0f2397f20b495d/templates/common/baremetal/files/baremetal-keepalived.yaml

So I think it is important to cover that.

are rendered by the Machine Config Operator.

The VIP will move to one of the control plane nodes, but only after the
bootstrap process has completed and the bootstrap instance is stopped. This happens
Contributor:

the bootstrap instance is stopped

This is technically incorrect; the user shouldn't have to shut down the bootstrap host to move the API VIP to the control plane. We should be able to communicate with the control plane as soon as it's up, right?

This seems like a bug?

@jcpowermac (Contributor, author), Dec 13, 2019:

@abhinavdahiya the bootstrap VRRP instance priority is set to 50 and the control plane's to 40. The only way for the VIP to move to the control plane is for the bootstrap instance of keepalived (or the entire machine) to be stopped. The VIP would then be under the control of a control plane node, and haproxy would then load balance between the other control plane nodes.
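
For illustration, a minimal keepalived sketch of the priority split described above. This is not the MCO-rendered template; the interface name, virtual router id, and VIP address are placeholders:

```
# Bootstrap node: higher priority, so it holds the API VIP while it is running.
vrrp_instance API_VIP {
    state MASTER
    interface ens192            # placeholder interface name
    virtual_router_id 51        # placeholder; must match on all peers
    priority 50                 # bootstrap wins the VRRP election while alive
    virtual_ipaddress {
        192.168.111.5           # placeholder API VIP from install-config.yaml
    }
}

# Control plane nodes: lower base priority, so one of them takes over the VIP
# only once the bootstrap keepalived (or the whole bootstrap machine) stops.
vrrp_instance API_VIP {
    state BACKUP
    interface ens192
    virtual_router_id 51
    priority 40
    virtual_ipaddress {
        192.168.111.5
    }
}
```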

These [instances](https://github.com/openshift/machine-config-operator/blob/master/templates/master/00-master/baremetal/files/baremetal-haproxy.yaml)
of `haproxy` are [configured](https://github.com/openshift/machine-config-operator/blob/master/templates/master/00-master/baremetal/files/baremetal-haproxy-haproxy.yaml)
to load balance the API traffic across all of the control plane nodes.

Contributor:

I would like to include information on how the backends are discovered for haproxy, and what the health-checking setup for the backends is.

Externally resolvable DNS records are required for:

* `api.$cluster_name.$base-domain` -
* `*.apps.$cluster_name.$base_domain` -
Contributor:

Internal services depend on this too.

In cluster networking infrastructure, the goal is to automate as much of the
DNS requirements internal to the cluster as possible, leaving only a
small amount of public DNS configuration to be implemented by the user
before starting the installation process.
Contributor:

leaving only a
small amount of public DNS configuration to be implemented by the user
before starting the installation process.

So the customer should be able to use the kubeconfig provided without any pre/post setup.

The only one that is kinda acceptable is the *.apps.$cluster_domain from outside the cluster.

@jcpowermac (Contributor, author):

@abhinavdahiya I would think that the A records for `api` and `*.apps` would need to be created before starting the install.

Contributor:

Can you add this to the limitations?


#### api-int hostname resolution

The CoreDNS server performing our internal DNS resolution includes
Contributor:

The CoreDNS server

We haven't described any CoreDNS server before this. I think we need a section on the CoreDNS setup/inputs/behavior for internal DNS.


The `mdns-publisher` is the component that runs on each host to make itself
discoverable by other hosts in the cluster. Control plane hosts currently
advertise both `etcd-NN` and `master-NN` names, and the worker nodes advertise
Contributor:

I would like to know how the publisher gets the index of the machine relative to the other nodes in the cluster.

Comment on lines 173 to 180
One of the virtual IP addresses required by in cluster networking infrastructure is used
for our self-hosted internal DNS - the “DNS VIP”. The location of the DNS VIP
is [managed](https://github.com/openshift/machine-config-operator/blob/master/manifests/baremetal/keepalived.conf.tmpl#L22)
by `keepalived`, similar to the management of the API VIP.

The control plane nodes are configured to use the DNS VIP as their primary DNS
server. The DNS VIP resides on the bootstrap host until the control plane
nodes come up, and then it will move to one of the control plane nodes.
Contributor:

This doesn't belong in this section.

Comment on lines 203 to 288
### Ingress High Availability

There is a third VIP used by in cluster networking infrastructure, and that is for Ingress.
The Ingress VIP will always reside on a node running an Ingress controller.
This ensures that we provide high availability for ingress by default.

The mechanism used to determine which nodes are running an ingress controller is
[configured](https://github.com/openshift/machine-config-operator/blob/master/templates/worker/00-worker/baremetal/files/baremetal-keepalived-keepalived.yaml)
so that `keepalived` will try to reach the local `haproxy` stats port
using `curl`.
Contributor:

This only provides the DNS and routing for the default IngressController. What if the user configures multiple controllers for sharding, etc.?

How would the user configure a VIP setup for a different one? Can the user even do that? We should capture the limitations.

cc @russellb
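
For illustration, a minimal keepalived sketch of the curl-based ingress check described in the quoted text. This is not the actual worker template; the stats/health port, interface, priority values, and VIP are placeholders:

```
# Pass only while the local ingress haproxy answers on its stats/health port,
# i.e. an ingress controller instance is actually running on this node.
vrrp_script chk_ingress {
    script "/usr/bin/curl -o /dev/null -kLs http://localhost:1936/healthz"   # placeholder port/path
    interval 2
    weight 10       # nodes running an ingress controller outrank the rest
}

vrrp_instance INGRESS_VIP {
    state BACKUP
    interface ens192            # placeholder
    virtual_router_id 52        # placeholder
    priority 20                 # placeholder base priority
    virtual_ipaddress {
        192.168.111.4           # placeholder Ingress VIP
    }
    track_script {
        chk_ingress
    }
}
```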


- https://github.com/openshift/installer/pull/1873
- https://github.com/openshift/machine-config-operator/pull/795
- https://github.com/openshift/api/pull/348
Contributor:

It's strange that the only user that needs the VIPs is the machine-config-operator, but the values are stored in global configuration. For future platforms we should look at fixing this.

- https://github.com/openshift/api/pull/348

### Risks and Mitigations

Contributor:

Can we include that none of this setup has been verified to be resilient or performant, especially given the number of watches against the apiserver in large clusters?

@jcpowermac jcpowermac force-pushed the vmw_m3_ops_networking branch from 1918fbe to 99e3942 Compare December 16, 2019 15:25
Comment on lines 78 to 93
The bootstrap keepalived VRRP instances have a higher priority (`priority 50`) than the control plane's (`priority 40`).
Bootstrap will maintain its role as master until some intervention, which in our case is destroying the bootstrap node.
Contributor:

I would like to make sure we mention this in the Limitations section.

Comment on lines 88 to 108
The control plane keepalived configuration uses service checks to either add or remove points from the instance's effective priority.
The default `priority 40` together with these service checks determines which control plane node is the master VRRP instance.
Contributor:

The control plane keepalived configuration uses service checks

Can you provide details on the configured health checks (protocol, address, etc.)?
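
As a hedged illustration of how such a weighted service check can be expressed in keepalived (the script, port, and weight values below are placeholder assumptions, not taken from the actual MCO template):

```
# Health check: pass only while the local kube-apiserver answers on :6443.
vrrp_script chk_apiserver {
    script "/usr/bin/curl -o /dev/null -kLs https://localhost:6443/readyz"   # placeholder check
    interval 2      # run every 2 seconds
    weight 5        # add 5 to the base priority while the check passes
    fall 3          # require 3 consecutive failures to mark the check down
    rise 2          # require 2 consecutive successes to mark it up again
}

vrrp_instance API_VIP {
    state BACKUP
    interface ens192            # placeholder
    virtual_router_id 51        # placeholder
    priority 40                 # base control plane priority
    virtual_ipaddress {
        192.168.111.5           # placeholder API VIP
    }
    track_script {
        chk_apiserver           # a healthy node runs at effective priority 45
    }
}
```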

The APIVIP is provided by the user
via the `install-config.yaml` [parameter](https://github.com/openshift/installer/blob/master/pkg/types/baremetal/platform.go#L57-L63)
or `openshift-installer` terminal prompts. The machine config operator does the initial
render of the pod spec and configuration template. The initContainer does the final
Contributor:

The machine config operator does the initial
render of the pod spec and configuration template. The initContainer does the final
templating of the configuration with baremetal-runtimecfg.

Let's keep a separate paragraph for baremetal-runtimecfg. Also, can we provide details about what inputs it takes and what it provides as an API?

@jcpowermac jcpowermac force-pushed the vmw_m3_ops_networking branch from 99e3942 to 3358e4b Compare December 18, 2019 21:32
@jcpowermac jcpowermac changed the title [wip] DNS and LB services in cluster for non-cloud IPI DNS and LB services in cluster for non-cloud IPI Dec 18, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2019
@jcpowermac jcpowermac force-pushed the vmw_m3_ops_networking branch from 3358e4b to 3ce8f16 Compare January 10, 2020 14:52
@yboaron yboaron force-pushed the vmw_m3_ops_networking branch from 3ce8f16 to 46b36e6 Compare June 15, 2020 06:39
@yboaron yboaron force-pushed the vmw_m3_ops_networking branch from 46b36e6 to c978f7e Compare June 15, 2020 13:29
The minimal requirements include:
* Internal DNS:
  - hostname resolution for master and worker nodes.
  - `api-int` hostname resolution.
Contributor:

Isn't it missing wildcard apps subdomain resolution?

@abhinavdahiya (Contributor):

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 15, 2020
@openshift-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, jcpowermac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2020
@openshift-merge-robot openshift-merge-robot merged commit 8987ab6 into openshift:master Jun 15, 2020