[WIP] Host-level DNS/connectivity pre-flight checks #5883
bogdando wants to merge 1 commit into openshift:master from bogdando:dns_preflight
Conversation
|
Can one of the admins verify this patch?
|
@e-minguez @tomassedovic @sdodson ^^ PTAL
|
The check and tag support auto-discovery and are listed alongside the other available checks, although they are still managed externally with an Ansible playbook for now.
shell: getent ahostsv4 {{ item }}
register: lookuphost
changed_when: false
failed_when: false
this doesn't work yet, wip
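For reference, a working variant of this task might look roughly like the following. This is only a sketch: looping over groups['all'] and the exact fail-task wiring are assumptions on my part, while openshift_override_resolve_check comes from the commit message.

# Hypothetical sketch: resolve every inventory host name from this node,
# then report the failures in a separate task instead of aborting on the first.
- name: Check that cluster host names resolve
  shell: getent ahostsv4 {{ item }}
  register: lookuphost
  changed_when: false
  failed_when: false
  with_items: "{{ groups['all'] }}"

# Only fail when lookups actually failed and the override variable is unset.
- name: Fail if any host name could not be resolved
  fail:
    msg: "DNS lookup failed for {{ item.item }}"
  when:
    - not (openshift_override_resolve_check | default(false) | bool)
    - (item.rc | default(1)) != 0
  with_items: "{{ lookuphost.results }}"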
|
example commands:
* Add a common pre-flight check to test cluster nodes' DNS resolution and a smoke connectivity test with ICMP ping.
* Add openshift_override_resolve_check and openshift_override_icmp_check (default to False) to allow overriding those checks.
* Add an auto-discovery stub for the added connectivity check and tag to be listed alongside the *.py scripted checks and tags.
* Trigger those checks from the common checks pre-install playbook. Included as a standard ansible task for now.
* Also trigger the checks from the common checks adhoc playbook, if openshift_checks has the '@connectivity' tag. Included as a standard ansible task for now.
* Allow the connectivity checks to be directly invoked as a playbook as well.

TODO: implement the check playbook invocation in connectivity.py to replace the include directives.

Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
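For context, a rough sketch of the adhoc wiring described above could look like this. The group name, the included file name and the exact openshift_checks test are illustrative assumptions, not taken from this PR; only the '@connectivity' tag itself comes from the commit message.

- name: Host-level DNS/connectivity pre-flight checks
  # 'oo_all_hosts' and 'connectivity.yml' are hypothetical placeholders
  hosts: oo_all_hosts
  tasks:
    - include: connectivity.yml
      when: "'@connectivity' in (openshift_checks | default(''))"

Run directly or from the common checks playbooks, such a play would only kick in when the requested checks include the '@connectivity' tag.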
|
/ok-to-test
|
^^
|
@bogdando sorry for the delay, still catching up after vacation. It seems like this would be a useful check for host inter-connectivity. Could it be brought in line with how the rest of the checks work? As opposed to the way Ansible runs and quits on the first task failure, we want to run all the checks and report back a summary of possibly-multiple failures. That's why they're all in python modules, not that we don't love Ansible tasks. If the logic were in a Python check module like the others, it could take part in that summary.

I'm also trying to think of the different scenarios where this might give a false positive. In a cloud environment (OpenStack, EC2, ...) the hosts might not be able to reach each other on their external IPs and yet can reach each other fine internally. The hostnames in the inventory file might be aliased in ssh config and not actually intended to resolve at all, while each host can define internal and external names for use with OpenShift. My feeling is we should have the hosts try to reach each other on the internal name if set. It's not necessary for the check to be accurate in every single conceivable scenario (false positives can be overridden), but it should only flag things that are pretty likely to be a problem.
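To illustrate the "reach each other on the internal name if set" idea, a task along these lines could do the smoke test. This is a sketch only; the group selection and ping timeout are assumptions, and openshift_override_icmp_check is reused from the commit message.

# Hypothetical sketch: ping every other cluster host on its internal name,
# falling back to the inventory name, and summarize the failures afterwards.
- name: Smoke-test connectivity to the other cluster hosts
  command: ping -c 1 -W 2 {{ hostvars[item].openshift_hostname | default(item) }}
  register: pinghost
  changed_when: false
  failed_when: false
  with_items: "{{ groups['all'] | difference([inventory_hostname]) }}"

- name: Report hosts that could not be reached
  fail:
    msg: "Could not ping {{ item.item }}"
  when:
    - not (openshift_override_icmp_check | default(false) | bool)
    - (item.rc | default(1)) != 0
  with_items: "{{ pinghost.results }}"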
|
@sosiouxme thanks for the input!

> That's why they're all in python modules, not that we don't love Ansible tasks

Yeah, I suspected that. That becomes a tough task, but I'll do my best :)

Right, could you point me to a 'soft-fail' mode for a check, like an issued warning, if one exists?

Wrt the inventory hostnames possibly not being intended to resolve at all: I suppose that would be a major issue, especially for --net=host pods, which rely on host name resolution AFAICT. But I can make this a soft-fail or overridable, no problem.
|
On Fri, Nov 10, 2017 at 10:07 AM, Bogdan Dobrelya ***@***.***> wrote:
> That's why they're all in python modules, not that we don't love Ansible tasks
>
> Yeah, I suspected that. That becomes a tough task, but I'll do my best :)
If building the python module is too intimidating it's something we could conceivably add to our team backlog.

BTW is there any particular motivation for this check, e.g. have you seen resolution/connectivity as a common problem in trial installs? What are the common symptoms?
> Right, could you point me to a 'soft-fail' mode for a check, like an issued warning, if one exists?
Ansible doesn't really have the concept of a warning. Your task either fails and execution stops, or it creates some output that gets buried under an avalanche of later task output, so it is useless unless someone is specifically looking for it.

That's the current state, at least. While I don't see Ansible changing that, I suspect at some point we'll create a callback plugin where checks (and perhaps other tasks) can register a warning that doesn't halt execution but does appear in the summary at the end of the run.
> Also, I'm quite new to openshift-ansible and not too sure where to look for variables representing internal names for nodes, for *all* places and cases :) Like, when I deploy my test env, I can see the master-api would fail if the public hostname can't be resolved, even if I changed the hostnames to internal FQDNs (I have a related WIP patch: openshift/openshift-ansible-contrib#845). So I'd appreciate some guidance on those internal vs external configuration nuances. Thanks!
Well, here's an example host entry for an AWS host:

ec2-34-229-60-235.compute-1.amazonaws.com openshift_ip=172.18.1.43 openshift_public_ip=34.229.60.235 openshift_hostname=ip-172-18-1-43.ec2.internal openshift_public_hostname=ec2-34-229-60-235.compute-1.amazonaws.com openshift_schedulable=True openshift_node_labels="{'region': 'infra', 'zone': 'east'}"

I would recommend consulting openshift_hostname, falling back to openshift_public_hostname, falling back to the inventory hostname.
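Expressed as Jinja, that precedence might look like the snippet below; check_target is a made-up variable name, and only the fallback order comes from the recommendation above.

# Hypothetical: the name a check would test for host 'item'
check_target: >-
  {{ hostvars[item].openshift_hostname
     | default(hostvars[item].openshift_public_hostname)
     | default(item) }}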
> I suppose that would be a major issue, especially for --net=host pods, which rely on host name resolution AFAICT. But I can make this a soft-fail or overridable, no problem.
The hosts defined in the inventory don't need to have any relationship to the hostnames that actually get used in the cluster; in the example above that first field could be "foobar", and as long as my ssh config specified how to reach "foobar" (key, user, host), Ansible could reach it and the actual names and IPs used in the cluster would come from the openshift_ parameters on the host.
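To make the "foobar" case concrete, the pairing could look like this; the IP and internal/public names reuse the example entry above, while the user and key file are made-up placeholders.

# static inventory: the alias only matters to Ansible/ssh
foobar openshift_hostname=ip-172-18-1-43.ec2.internal openshift_public_hostname=ec2-34-229-60-235.compute-1.amazonaws.com

# ~/.ssh/config: tells ssh (and therefore Ansible) how to reach "foobar"
Host foobar
    HostName 34.229.60.235
    User ec2-user
    IdentityFile ~/.ssh/example-key.pem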
|
@sosiouxme thank you for the example!

The motivation is to help those poor souls who have to deploy a DIY DNS solution.

Right, indeed. IIUC, that's only the case for static inventory and static SSH config? What if we want to only use a dynamic inventory?
|
> Right, indeed. IIUC, that's only the case for static inventory and static SSH config? What if we want to only use a dynamic inventory?
The initial hostname is what Ansible uses to reach the host with ssh. The other parameters are all optional, for being more specific about what OpenShift is going to use. If the inventory (dynamic or static) doesn't specify them, the playbooks just default to using the same hostname as Ansible does. Just saying, you have to look at the parameters if they exist. There might also be some logic in use that already normalizes all this into a fact, to avoid the need to look up multiple things...

I don't think anything internal refers to external hostnames (if they're different). Master certificates ought to be valid for internal or external names. Generally the only things that need to resolve externally are the API and any domains, which ought to be going through a LB...
|
@bogdando PR needs rebase
|
So the use case is not relevant anymore; we do not want to deploy/verify DIY DNS setups and the like.