Skip to content

Conversation

@tomassedovic
Copy link
Contributor

another investigative PR for the bizzare CI stuff we're seeing for #1959

This adds opens all SGs (maybe we've missed some?) as well as adds SSH keys so we can log in to the bootstrap node and investigate what's wrong.

trown and others added 9 commits August 5, 2019 16:19
For platforms without a DNS as a Service available, there is a
chicken/egg scenario. This is because we cannot setup our own DNS
solution on the nodes before they are booted via ignition.

Instead, we get the ignition config via a virtual IP that is
conifgured and maintained by keepalived. The setup of keepalived
will be done via machine-config-operator, so it is not part of
this patch.

This patch only sets up the plumbing so that we can merge the
keepalived work while keeping CI green. Then we can put up a patch
that will switch to actually using this functionality.

This patch also adds an interface for other platforms that need this
functionality to work from. Specifically, these PRs can be rebased
on this work:
#1873
#1948
The experimental OpenStack backend used to create an extra server
running DNS and load balancer services that the cluster needed.
OpenStack does not always come with DNSaaS or LBaaS so we had to provide
the functionality the OpenShift cluster depends on (e.g. the etcd SRV
records, the api-int records & load balancing, etc.).

This approach is undesirable for two reasons: first, it adds an extra
node that the other IPI platforms do not need. Second, this node is a
single point of failure.

The Baremetal platform has faced the same issues and they have solved
them with a few virtual IP addresses managed by keepalived in
combination with coredns static pod running on every node using the mDNS
protocol to update records as new nodes are added or removed and a
similar static pod haproxy to load balance the control plane internally.

The VIPs are defined here in the installer and they use the
PlatformStatus field to be passed to the necessary
machine-config-operator fields:

openshift/api#374

The Bare Metal IPI Networking Infrastructure document is applicable here as
well:

https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md

There is also a great opportunity to share some of the configuration
files and scripts here.

This change needs several other pull requests:

Keepalived plus the coredns & haproxy static pods in the MCO:
openshift/machine-config-operator#740

Passing the API and DNS VIPs through the installer:
#1998

Co-authored-by: Emilio Garcia <[email protected]>
Co-authored-by: John Trowbridge <[email protected]>
Co-authored-by: Martin Andre <[email protected]>
Co-authored-by: Tomas Sedovic <[email protected]>

Massive thanks to the Bare Metal and oVirt people!
This adds validations that verify the OpenStack VIPs have the expected
values. These are currently not expected to be configured by the user
so they're just compared to the expected values.

Since the MachineCIDR is not part of the Platform struct, we need to
pass it to OpenStack's ValidatePlatform.
Previously the service VM would proxy all traffic to the masters and
workers.

With this new architecture the worker nodes are addressed via an
ingress VIP local to the cluster, and need to be reachable from the
outside in order to serve the *.apps. We're attaching a floating IP to
the port, that maps to the ingress VIP shared among the workers.
The `APIVIP`, `DNSVIP` and `IngressVIP` values not supposed to be
user-configurable. They're set by the installer and setting them to
a value other than the expected one will likely result in an error.

Therefore, this removes them from `InstallConfig.Platform` in favour
of free functions that return the expected values.
We'd removed the VIPs from the install-config, but not the test data
itself.
This should let us log into the bootstrap node to see what's going on.

This is a workaround until we've got must-gather implemented.
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tomassedovic
To complete the pull request process, please assign smarterclayton
You can assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 8, 2019
@tomassedovic
Copy link
Contributor Author

/label platform/openstack

@tomassedovic
Copy link
Contributor Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 8, 2019
@tomassedovic
Copy link
Contributor Author

/close

@openshift-ci-robot
Copy link
Contributor

@tomassedovic: Closed this PR.

Details

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Contributor

@tomassedovic: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-openstack f1c4756 link /test e2e-openstack
ci/prow/e2e-aws-scaleup-rhel7 f1c4756 link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. platform/openstack size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants