Skip to content

[WIP] Hostlevel DNS/connectivity pre-flight checks#5883

Closed
bogdando wants to merge 1 commit intoopenshift:masterfrom
bogdando:dns_preflight
Closed

[WIP] Hostlevel DNS/connectivity pre-flight checks#5883
bogdando wants to merge 1 commit intoopenshift:masterfrom
bogdando:dns_preflight

Conversation

@bogdando
Copy link
Contributor

  • Add a common pre-flight check to test cluster nodes DNS resolve and
    a smoke connectivity test with ICMP ping.
  • Add openshift_override_resolve_check, openshift_override_icmp_check
    (default to False) to allow overriding those checks.
  • Add an auto-discovery stub for the added connectivity check and
    tag to be listed alongside the *.py scripted checks and tags.
  • Trigger those checks from the common checks pre-install playbook.
    Included as a standard ansible task for now.
  • Also trigger the checks from the common checks adhock playbook, if
    openshift_checks has the '@connectivity' tag.
    Included as a standard ansible task for now.
  • Allow the connectivity checks to be directly invoked as a playbook
    as well.

TODO implement the check playbook invocation in the connectivity.py
to replace the include directives.

Signed-off-by: Bogdan Dobrelya bdobreli@redhat.com

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 25, 2017
@bogdando bogdando changed the title Hostlevel DNS/connectivity pre-flight checks [WIP] Hostlevel DNS/connectivity pre-flight checks Oct 25, 2017
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Oct 25, 2017
@rh-atomic-bot
Copy link

Can one of the admins verify this patch?
I understand the following commands:

  • bot, add author to whitelist
  • bot, test pull request
  • bot, test pull request once

@bogdando
Copy link
Contributor Author

bogdando commented Oct 25, 2017

@e-minguez @tomassedovic @sdodson ^^ PTAL

@bogdando
Copy link
Contributor Author

The check and tag supports autodiscovery and listed alongside the other checks available, although is managed externally with an ansible playbook yet.

shell: getent ahostsv4 {{ item }}
register: lookuphost
changed_when: false
failed_when: false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't work yet, wip

@bogdando
Copy link
Contributor Author

example commands:

ansible-playbook -v openshift-ansible/playbooks/byo/openshift-checks/adhoc.yml -e openshift_disable_check=None -e openshift_checks=@connectivity,@pre-flight,disk_availability

ansible-playbook -v openshift-ansible/playbooks/byo/openshift-checks/pre-install.yml

ansible-playbook -v openshift-ansible/playbooks/byo/openshift-checks/connectivity.yml

@bogdando bogdando changed the title [WIP] Hostlevel DNS/connectivity pre-flight checks Hostlevel DNS/connectivity pre-flight checks Oct 25, 2017
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 25, 2017
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 28, 2017
* Add a common pre-flight check to test cluster nodes DNS resolve and
  a smoke connectivity test with ICMP ping.
* Add openshift_override_resolve_check, openshift_override_icmp_check
  (default to False) to allow overriding those checks.
* Add an auto-discovery stub for the added connectivity check and
  tag to be listed alongside the *.py scripted checks and tags.
* Trigger those checks from the common checks pre-install playbook.
  Included as a standard ansible task for now.
* Also trigger the checks from the common checks adhock playbook, if
  openshift_checks has the '@connectivity' tag.
  Included as a standard ansible task for now.
* Allow the connectivity checks to be directly invoked as a playbook
  as well.

TODO implement the check playbook invocation in the connectivity.py
to replace the include directives.

Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 30, 2017
@michaelgugino michaelgugino self-assigned this Nov 6, 2017
@michaelgugino
Copy link
Contributor

/ok-to-test

@openshift-ci-robot openshift-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 6, 2017
@bogdando
Copy link
Contributor Author

bogdando commented Nov 8, 2017

^^
3.7 branch is forked hopefully we can "unfreeze" with code reviews now? :)

@sosiouxme
Copy link
Member

@bogdando sorry for the delay, still catching up after vacation.

Seems like it would be a useful check to check host inter-connectivity. Could it be brought in line with how the rest of the checks work? As opposed to the way Ansible runs and quits on the first task failure, we want to run all the checks and report back a summary of possibly-multiple failures. That's why they're all in python modules, not that we don't love Ansible tasks. If the logic were in roles/openshift_health_checker/openshift_checks/connectivity.py then most of the rest of this would be unnecessary - the check would automatically run under the pre-install playbook or from adhoc on request, and we would not need separate variables to disable.

I'm also trying to think of the different scenarios where this might give a false positive. In a cloud environment (openstack, EC2, ...) the hosts might not be able to reach each other on their external IPs and yet can reach each other fine internally. The hostnames in the inventory file might be aliased in ssh config and not actually intended to resolve at all, while each host can define internal and external names for use with openshift. My feeling would be we should be having the hosts try to reach each other on the internal name if set. It's not necessary to have the check be accurate in every single conceivable scenario (false positives can be overridden), but it should only flag things that are pretty likely to be a problem.

@bogdando bogdando changed the title Hostlevel DNS/connectivity pre-flight checks [WIP] Hostlevel DNS/connectivity pre-flight checks Nov 10, 2017
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 10, 2017
@bogdando
Copy link
Contributor Author

bogdando commented Nov 10, 2017

@sosiouxme thanks for inputs!

That's why they're all in python modules, not that we don't love Ansible tasks

Yeah, I suspected that. That becomes a tough task, but I'll do my best :)

we would not need separate variables to disable.
the hosts might not be able to reach each other on their external IPs and yet can reach each other fine
internally. The hostnames in the inventory file might be aliased in ssh config and not actually intended
to resolve at all, while each host can define internal and external names for use with openshift. My
feeling would be we should be having the hosts try to reach each other on the internal name if set. It's
not necessary to have the check be accurate in every single conceivable scenario (false positives can
be overridden), but it should only flag things that are pretty likely to be a problem

right, could you point me if there is a 'soft-fail' mode for a check, like an issued warning?
Also, I'm quite a new to openshift-ansible and not too sure where to look for variables representing internal names for nodes, for all places and cases :) Like, when I deploy my test env, I can see the master-api would fail, if the public hostname can't be resolved, even though I've changed the hostnames to internal FQDNs and expect only those to be contacted anywhere (I have a related WIP patch. And certificates issued for public FQDNs might be a tricky place to touch as well... So I'd appreciate some guidence on that internal vs external configuration nuances. Thanks!

wrt

The hostnames in the inventory file might be aliased in ssh config and not actually intended
to resolve at all

I suppose that would be a major issue, especially for --net=host pods, which rely on the hosts names resolution AFAICT. But I can do this a soft-fail or overridable, no problems.

@sosiouxme
Copy link
Member

sosiouxme commented Nov 10, 2017 via email

@bogdando
Copy link
Contributor Author

@sosiouxme thank you for an example!

BTW is there any particular motivation for this check, e.g. have you seen
resolution/connectivity as a common problem in trial installs? What are the
common symptoms?

The motivation is to help those poor souls who have to deploy a DIY DNS solution
alongside a cloud-hosted openshift cluster, like we're used to do in the shift-on-stack team :)
Unless we have a better solution, like DNSaaS supported natively to provide a smooth UX for OpenShift users on OpenStack.

The hosts defined in the inventory don't need to have any relationship to
the hostnames that actually get used in the cluster; in the example above
that first field could be "foobar" and as long as my ssh config specified
how to reach "foobar" (key, user, host), ansible could reach it and the
actual names and IPs used in the cluster would come from the openshift_
parameters on the host.

right, indeed. IIUC, that's only the case for static inventory and static SSH config?
What if we want to only use a dynamic inventory?

@sosiouxme
Copy link
Member

sosiouxme commented Nov 10, 2017 via email

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 18, 2017
@openshift-merge-robot
Copy link
Contributor

@bogdando PR needs rebase

@bogdando
Copy link
Contributor Author

So the use case is not relevant anymore, we do not want DIY DNS setups to deploy/verify et al

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants