DNS and LB services in cluster for non-cloud IPI #148
Conversation
Force-pushed from 6245021 to 66ab0f5.
What were the modifications? Can you add a link to this to the doc itself, as well?

Sure, I can add a link to the original document. I modified this text since it has been moved to the MCO, and added repository links to the pod and service configurations.
Force-pushed from 4b41706 to 1aae954.
> ### Goals
>
> Install an IPI OpenShift cluster on various on-premise non-cloud platforms that
> provides internal DNS and load balancing that is minimally required for OpenShift
Can we add a section that briefly describes what these minimum requirements are?
> ## Proposal
>
> The basis and significant portions for this proposal was taken from existing
> [documentation](https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md).
can you move all the links to the end of the document like https://github.com/openshift/enhancements/blame/master/enhancements/installer/aws-customer-provided-subnets.md#L214-L217
> In both cases, the installation process expects these ports to be
> reachable on the bootstrap instance at first and then later on the
> newly-deployed control plane machines.
That's technically not correct; the expectation is that these are load-balanced to any one of them as long as it's healthy. There is no "first bootstrap, then control plane"; they are all healthy endpoints.
@abhinavdahiya I removed this section from the original doc:

> On other platforms (for example, see the AWS UPI instructions) an external
> load-balancer is required to be configured in advance in order to provide
> this access.

For bootstrap I am unable to find haproxy in use. From my review, the expectation is that the VIP moves to the control plane node once the bootstrap node is destroyed.

cc: @russellb
> In cluster network infrastructure, a VIP (Virtual IP) is used to provide
> failover of the API server across the control plane machines
> (including the bootstrap instance). This "API VIP" is provided by the user
> (including the bootstrap instance)

This should probably be reworded along the lines of: the bootstrap host is part of this backend during bootstrapping.
> as an
> `install-config.yaml` [parameter]

I think we should just mention that it's provided by the user to the installer, because it can also be prompted for in the terminal if needed, not just set via install-config.
> (including the bootstrap instance). This "API VIP" is provided by the user
> as an `install-config.yaml` [parameter](https://github.com/openshift/installer/blob/master/pkg/types/baremetal/platform.go#L57-L63)
> and the installation process configures `keepalived` to manage this VIP.
Would like to see some restrictions on the API VIP included. For example, is it globally addressable, or only reachable on the private network? Is it part of the cluster's private network or some other private network outside the cluster, etc.?
It seems that since we only have one API VIP, whether or not it's RFC1918 matters less than the requirement that the VIP be accessible both by the cluster and by external clients.
> failover of the API server across the control plane machines
> (including the bootstrap instance). This "API VIP" is provided by the user
> as an `install-config.yaml` [parameter](https://github.com/openshift/installer/blob/master/pkg/types/baremetal/platform.go#L57-L63)
> and the installation process configures `keepalived` to manage this VIP.
The installation process doesn't configure it; rather, the cluster configures and maintains it, so I think we should move away from describing this as install-time-only configuration.
> as an `install-config.yaml` [parameter](https://github.com/openshift/installer/blob/master/pkg/types/baremetal/platform.go#L57-L63)
> and the installation process configures `keepalived` to manage this VIP.
>
> The API VIP first resides on the bootstrap instance.
This probably needs to be separated out into sections covering the lifecycle of the API VIP during bootstrapping and afterwards in the cluster, plus the keepalived setup.
> keepalived setup

It will be useful to describe:
- the init container's input and role
- the actual keepalived inputs and configured behavior, briefly
Also, there seems to be a difference in the keepalived pod setup/configuration between bootstrap and cluster nodes; see https://github.com/openshift/machine-config-operator/blob/fd8c53bfb8d97a5a7442a8810a0f2397f20b495d/templates/common/baremetal/files/baremetal-keepalived.yaml. So I think it is important to cover that.
> are rendered by the Machine Config Operator.
>
> The VIP will move to one of the control plane nodes, but only after the
> bootstrap process has completed and the bootstrap instance is stopped. This happens
> the bootstrap instance is stopped

This is technically incorrect; users shouldn't have to shut down the bootstrap host to move the API VIP to the control plane. We should be able to communicate with the control plane as soon as it's up...? This seems like a bug...?
cc @russellb
@abhinavdahiya the bootstrap vrrp_instance priority is set to 50 and the control plane's to 40. The only way for the VIP to move to the control plane is for the bootstrap instance of keepalived to be stopped, or the entire machine. The VIP would then be under the control of a control plane node, and haproxy would then load balance between the other control plane nodes.
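For illustration, a minimal keepalived sketch of the priority split described above; the interface name, router id, and addresses are placeholders, not the rendered template values:

```
vrrp_instance API_VIP {
    state BACKUP
    interface ens3            # placeholder interface
    virtual_router_id 51      # placeholder id
    priority 50               # bootstrap host; control plane hosts render 40
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24         # the user-provided API VIP
    }
}
```

With all members healthy, the highest priority wins the VRRP election, which is why the VIP stays on the bootstrap host until its keepalived instance (or the whole machine) goes away.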
> These [instances](https://github.com/openshift/machine-config-operator/blob/master/templates/master/00-master/baremetal/files/baremetal-haproxy.yaml)
> of `haproxy` are [configured](https://github.com/openshift/machine-config-operator/blob/master/templates/master/00-master/baremetal/files/baremetal-haproxy-haproxy.yaml)
> to load balance the API traffic across all of the control plane nodes.
Would like to include information on how the backends are discovered for haproxy and what the health-checking setup for the backends is.
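As a hedged sketch of what that load balancing amounts to — the local port, hostnames, and health-check settings here are assumptions, not the linked template's contents:

```
defaults
    mode tcp                 # the API is balanced as TCP passthrough
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend api
    bind :7443               # assumed local port on the VIP holder
    default_backend masters

backend masters
    balance roundrobin
    option tcp-check         # mark a backend down when its TCP check fails
    server master-0 master-0.example.com:6443 check
    server master-1 master-1.example.com:6443 check
    server master-2 master-2.example.com:6443 check
```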
> Externally resolvable DNS records are required for:
>
> * `api.$cluster_name.$base_domain`
> * `*.apps.$cluster_name.$base_domain`
Internal services depend on this too.
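For concreteness, the externally resolvable records quoted above amount to a zone fragment like the following (cluster name, base domain, and addresses are invented):

```
; api resolves to the API VIP, the apps wildcard to the Ingress VIP
api.ostest.example.com.     IN A 192.0.2.10
*.apps.ostest.example.com.  IN A 192.0.2.11
```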
> In cluster networking infrastructure, the goal is to automate as much of the
> DNS requirements internal to the cluster as possible, leaving only a
> small amount of public DNS configuration to be implemented by the user
> before starting the installation process.
> leaving only a
> small amount of public DNS configuration to be implemented by the user
> before starting the installation process.

So the customer should be able to use the provided kubeconfig without any pre/post setup. The only one that is somewhat acceptable is `*.apps.$cluster_domain` from outside the cluster.
@abhinavdahiya I would think that the A records for `api` and `*.apps` would need to be created before starting the install.
Can we add this to limitations?
> #### api-int hostname resolution
>
> The CoreDNS server performing our internal DNS resolution includes
> The CoreDNS server

We haven't described any CoreDNS server before this point. I think we need a section on the CoreDNS setup/inputs/behavior for internal DNS.
> The `mdns-publisher` is the component that runs on each host to make itself
> discoverable by other hosts in the cluster. Control plane hosts currently
> advertise both `etcd-NN` and `master-NN` names, and the worker nodes advertise
Would like to know how the publisher gets the index of the machine with respect to the other nodes in the cluster.
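For orientation, once those advertisements are in place the internal resolver should answer queries such as the following — the names, index, and DNS VIP address are illustrative:

```
# query the internal DNS via the DNS VIP (192.0.2.12 here) for advertised names
dig +short master-1.ostest.example.com @192.0.2.12
dig +short etcd-1.ostest.example.com @192.0.2.12
```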
> One of the virtual IP addresses required by in cluster networking infrastructure is used
> for our self-hosted internal DNS - the "DNS VIP". The location of the DNS VIP
> is [managed](https://github.com/openshift/machine-config-operator/blob/master/manifests/baremetal/keepalived.conf.tmpl#L22)
> by `keepalived`, similar to the management of the API VIP.
>
> The control plane nodes are configured to use the DNS VIP as their primary DNS
> server. The DNS VIP resides on the bootstrap host until the control plane
> nodes come up, and then it will move to one of the control plane nodes.
This doesn't belong in this section.
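For reference, the quoted behavior means each control plane node's resolver configuration effectively looks like this sketch, with 192.0.2.12 standing in for the DNS VIP:

```
# /etc/resolv.conf on a control plane node (sketch)
search ostest.example.com
nameserver 192.0.2.12    # the DNS VIP
```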
> ### Ingress High Availability
>
> There is a third VIP used by in cluster networking infrastructure, and that is for Ingress.
> The Ingress VIP will always reside on a node running an Ingress controller.
> This ensures that we provide high availability for ingress by default.
>
> The [configuration](https://github.com/openshift/machine-config-operator/blob/master/templates/worker/00-worker/baremetal/files/baremetal-keepalived-keepalived.yaml)
> of this mechanism used to determine which nodes are running an ingress controller
> is that `keepalived` will try to reach the local `haproxy` stats port
> using `curl`.
This only provides the DNS and routing for the default ingress controller. What about when the user configures multiple controllers for sharding, etc.? How would the user configure a VIP setup for a different one; can the user even do that? We should capture the limitations.

cc @russellb
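A hedged sketch of the check mechanism the quoted text describes — the stats port (1936), interface, priorities, and weight are assumptions rather than the template's actual values:

```
vrrp_script chk_ingress {
    # succeed only when the local haproxy/router answers on its stats port
    script "/usr/bin/curl -f -o /dev/null http://localhost:1936/healthz"
    interval 1
    weight 50                # boosts priority on nodes running an ingress controller
}

vrrp_instance INGRESS_VIP {
    state BACKUP
    interface ens3           # placeholder
    virtual_router_id 52     # placeholder
    priority 20
    virtual_ipaddress {
        192.0.2.11/24        # the Ingress VIP
    }
    track_script {
        chk_ingress
    }
}
```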
> - https://github.com/openshift/installer/pull/1873
> - https://github.com/openshift/machine-config-operator/pull/795
> - https://github.com/openshift/api/pull/348
It's strange that the only user that needs the VIPs is the machine-config-operator, yet the values are stored in global configuration... for future platforms we should look at fixing this.
> - https://github.com/openshift/api/pull/348
>
> ### Risks and Mitigations
Can we include that none of this setup has been verified to be resilient or performant, especially the number of watches to the apiserver in large clusters?
Force-pushed from 1918fbe to 99e3942.
> The bootstrap keepalived VRRP instances have a higher weight using a `priority 50` than the control plane's `priority 40`.
> Bootstrap will maintain its role as master until some intervention, which in our case is destroying the bootstrap node.
I would like to make sure we mention this in the Limitations section.
> The control plane keepalived configuration uses service checks to either add or remove points to the instance weight.
> Using the default `priority 40`, the service checks will determine which control plane node is the master VRRP instance.
> The control plane keepalived configuration uses service checks

Can you provide details on the configured health checks: protocol, address, etc.?
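To make the question concrete, a hypothetical example of such a service check — the probe URL, interval, and weight are guesses, and the actual template's checks may differ in protocol and address:

```
vrrp_script chk_api {
    # hypothetical probe: healthy when the local apiserver answers its readyz endpoint
    script "/usr/bin/curl -fk -o /dev/null https://localhost:6443/readyz"
    interval 2
    weight 10                # added to the base priority of 40 while the check passes
}

vrrp_instance API_VIP {
    state BACKUP
    interface ens3           # placeholder
    virtual_router_id 51     # placeholder
    priority 40              # control plane base priority
    virtual_ipaddress {
        192.0.2.10/24        # the API VIP
    }
    track_script {
        chk_api
    }
}
```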
> The APIVIP is provided by the user
> via the `install-config.yaml` [parameter](https://github.com/openshift/installer/blob/master/pkg/types/baremetal/platform.go#L57-L63)
> or `openshift-installer` terminal prompts. The machine config operator does the initial
> render of the pod spec and configuration template. The initContainer does the final
> The machine config operator does the initial
> render of the pod spec and configuration template. The initContainer does the final
> templating of the configuration with `baremetal-runtimecfg`.

Let's keep a separate paragraph for `baremetal-runtimecfg`. Also, can we provide details around what inputs it takes/provides as an API?
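A simplified sketch of the shape being discussed — an MCO-rendered static pod whose init container finishes templating the keepalived configuration. The image references, `runtimecfg` invocation, and paths are assumptions for illustration, not the actual template:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: keepalived
spec:
  hostNetwork: true
  initContainers:
  - name: render-config
    image: quay.io/example/baremetal-runtimecfg   # hypothetical image reference
    # hypothetical invocation: fill the keepalived template with runtime values
    command: ["runtimecfg", "render", "/config/keepalived.conf.tmpl",
              "--out-dir", "/etc/keepalived"]
    volumeMounts:
    - {name: conf-dir, mountPath: /etc/keepalived}
    - {name: template-dir, mountPath: /config}
  containers:
  - name: keepalived
    image: quay.io/example/keepalived             # hypothetical image reference
    command: ["/usr/sbin/keepalived", "-f", "/etc/keepalived/keepalived.conf", "-n", "-D"]
    volumeMounts:
    - {name: conf-dir, mountPath: /etc/keepalived}
  volumes:
  - name: conf-dir
    emptyDir: {}
  - name: template-dir
    hostPath: {path: /etc/kubernetes/keepalived}  # hypothetical path to the template
```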
Force-pushed from 99e3942 through 3358e4b, 3ce8f16, and 46b36e6 to c978f7e.
> The minimal requirements include:
> * Internal DNS:
>   - hostname resolution for master and worker nodes.
>   - `api-int` hostname resolution.
Isn't it missing wildcard apps subdomain resolution?
/lgtm

[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abhinavdahiya, jcpowermac.
The proposal section was slightly modified from: https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md