
Add playbook to run adhoc health checks or list existing checks #4570

Merged
merged 9 commits on Aug 28, 2017

Conversation

rhcarvalho
Contributor

This is useful while developing/testing health checks.

One step towards making checks more discoverable and easier to run.

args = self._task.args
requested_checks = normalize(args.get('checks', []))

if not requested_checks:
Contributor

maybe I'm missing something, but wouldn't this cause other playbooks like health to fail here if they don't have a checks argument passed to them as well?

Contributor Author

Yes, they would fail if either checks is not passed or is passed and is empty.

Did I overlook some genuine case when we want to call the action plugin without arguments?

Note that it fails because that's the easiest way to cause text to go to stdout/stderr (without messing with the verbosity level).
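The failure-as-output trick described here can be sketched roughly as follows. This is a simplified illustration, not the shipped plugin code; function and variable names are assumed.

```python
# Simplified illustration of the behavior discussed above -- not the
# shipped plugin code. Failing the task with a message is the easiest
# way to get text onto the user's terminal at default verbosity.
def run_action(requested_checks, known_checks):
    if not requested_checks:
        return {
            'failed': True,
            'msg': 'No checks requested. Known checks: '
                   + ', '.join(sorted(known_checks)),
        }
    return {'failed': False, 'checks': sorted(requested_checks)}
```

When invoked without a `checks` argument, the result dict marks the task failed purely so that the listing in `msg` is shown to the user.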

Contributor

Note that it fails because that's the easiest way to cause text to go to stdout/stderr (without messing with the verbosity level)

That makes sense.

Did I overlook some genuine case when we want to call the action plugin without arguments?

No, I had misunderstood this block, but looking at it again with context, it makes sense.

pre_tasks:
- name: Print list of known health checks
action: openshift_health_check
when: checks is undefined or not checks
Contributor Author

To print a list of known checks:

$ ansible-playbook -i <inventory file> playbooks/byo/openshift-checks/adhoc.yml

But that's not documented (yet). The problem with the current output is that it repeats for as many hosts as you have in the OSEv3 group.

I would like to de-duplicate error messages by grouping together hosts that had the same error.

Member

De-duplication will probably be important for clusters with 10s of hosts. Might need to put some thought into how "same" the messages need to be to be folded together.

@sosiouxme
Member

You'll want to adjust this to eval groups like #4495 once that merges.

@rhcarvalho
Contributor Author

@sosiouxme thanks for pointing it out!

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 26, 2017
Member

@sosiouxme sosiouxme left a comment

Couple things, otherwise looks good.

roles:
- openshift_health_checker
vars:
- r_openshift_health_checker_playbook_context: adhoc
Member

nit: if running the adhoc playbook they expect to run checks, so adhoc should be added to the list that gets the less-info summary.

Contributor Author

Done



try:
known_checks = self.load_known_checks()
args = self._task.args
requested_checks = normalize(args.get('checks', []))
Member

While we're at it... shouldn't this be named r_openshift_health_checker_names or some such? The adhoc playbook could still pass it in from the value in checks or whatever we want the user to specify.

@rhcarvalho rhcarvalho force-pushed the adhoc-check-runner-misc branch 2 times, most recently from d2ccce6 to 44813ad Compare July 18, 2017 12:30
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 18, 2017
@rhcarvalho rhcarvalho force-pushed the adhoc-check-runner-misc branch 3 times, most recently from bfc4186 to aebc18e Compare July 21, 2017 09:44
@@ -27,6 +27,9 @@ callback plugin summarizes execution errors at the end of a playbook run.
3. Certificate expiry playbooks ([certificate_expiry](certificate_expiry)) -
check that certificates in use are valid and not expiring soon.

4. Adhoc playbook ([adhoc.yml](adhoc.yml)) - useful for running adhoc checks.
See the next section for an usage example.
Member

s/an/a/

("y" sound at beginning of word apparently doesn't feel like a vowel... English is weird)

Contributor Author

Fixed, thanks!

@@ -0,0 +1,16 @@
---
- name: OpenShift health checks
hosts: OSEv3
Member

This playbook needs to follow the pattern of the other check playbooks, invoking a parallel playbooks/common/openshift-checks/adhoc.yml playbook and evaluating groups along the way. openshift_version chokes otherwise. Also at some point we'll want to switch from using BYO group names to the common group names in our is_active methods.

Contributor Author

I see. I have two questions about how that is working now, maybe you could help:

  1. Why do we include: ../openshift-cluster/initialize_groups.yml in byo/openshift-checks and then include: ../openshift-cluster/evaluate_groups.yml in common/openshift-checks?

  2. Why do we have tags: always in include: ../openshift-cluster/evaluate_groups.yml, but not in the evaluate_groups include?

Contributor Author

I'm noticing those includes make the playbook run considerably slower, though I didn't measure time explicitly 😢

For the case when there are no checks to be run (and we print a list of available checks), the difference is 6s without the includes vs. 8s with them.

Member

How are you getting it to run without those includes at all? I get:

     Play:     OpenShift health checks
     Task:     openshift_version : Set rpm version to configure if openshift_pkg_version specified
     Message:  The conditional check 'inventory_hostname in groups['oo_masters_to_config'] or inventory_hostname in groups['oo_nodes_to_config']' failed. The error was: error while evaluating conditional (inventory_hostname in groups['oo_masters_to_config'] or inventory_hostname in groups['oo_nodes_to_config']): Unable to look up a name or access an attribute in template string ({% if inventory_hostname in groups['oo_masters_to_config'] or inventory_hostname in groups['oo_nodes_to_config'] %} True {% else %} False {% endif %}).
               Make sure your variable name does not contain invalid characters like '-': argument of type 'StrictUndefined' is not iterable
               
               The error appears to have been in '/home/lmeyer/go/src/github.com/openshift/openshift-ansible/roles/openshift_version/tasks/set_version_rpm.yml': line 2, column 3, but may
               be elsewhere in the file depending on the exact syntax problem.

Contributor Author

Yup, I get the same. It worked against an older version of master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a patch to logging.py to return {}, which is not even a real check:

$ time ansible-playbook -i hosts playbooks/byo/openshift-checks/adhoc.yml -e checks=logging -vvv
...
PLAY RECAP ***************************************************************************************************************************************
localhost                  : ok=8    changed=0    unreachable=0    failed=0   
master1                    : ok=36   changed=2    unreachable=0    failed=0   
master2                    : ok=36   changed=2    unreachable=0    failed=0   
node1                      : ok=36   changed=2    unreachable=0    failed=0   
node2                      : ok=36   changed=2    unreachable=0    failed=0   


real    1m42.949s
user    0m36.358s
sys     0m30.438s

So that's more than a minute and a half to run the simplest of the checks 😢

But anyway, do you understand why the includes are in different files and why the tag always is only in one of them?

Member

Why do we include: ../openshift-cluster/initialize_groups.yml in
byo/openshift-checks and then include: ../openshift-cluster/evaluate_
groups.yml in common/openshift-checks?

The first is for translating BYO group names into g_ names. It's a pretty
simple translation in BYO, more complex for other starting points. Then
these names are fed into the second which further defines oo_first_master
and such that are to actually be used in the roles.

Why do we have tags: always in include: ../openshift-cluster/evaluate_
groups.yml, but not in the evaluate_groups include?

This is probably cruft. Basically nothing actually works if you specify
tags (maybe some very narrow dev use cases).

Member

All the other roles that run before openshift_health_checker are a problem :( and not just because it's slow. I wish we could separate them out into roles that gather info and set variables versus those which perform an action. Unfortunately there are a few actions (installs) necessary even to gather info. But it could be that those aren't necessary for our use case, and we could further divide the info-gatherers into fast and slow and just stick to info that can be gathered quickly for our checks. Maybe. I don't really look forward to that refactoring but running the checks is pretty heavy like this.

Contributor Author

FWIW, the original comment has been addressed and it is now in two separate playbooks, following the same pattern as others.

@@ -0,0 +1 @@
../../../roles
Member

I think it is not necessary when invoking the playbooks from common.

Contributor Author

Yes, common already has a symlink. I think back when I started this I was exploring how to have fewer layers to get something started, but we should definitely follow a single pattern.

Contributor Author

Symlink removed.

- name: Run health checks
action: openshift_health_check
args:
checks: '{{ checks | default([]) }}'
Member

Debating whether the checks inventory variable should be qualified. Inputs for most other things are expected to be openshift_<whatever> usually, so maybe openshift_checks? It seems kind of pedantic since this is specific to the playbook. But it's consistent.

Contributor Author

Yes, I agree. I think we went with checks for making it easier to type.

Still, I can't say I'm happy with such a long command:

$ ansible-playbook -i hosts playbooks/byo/openshift-checks/adhoc.yml -e checks=fluentd

In this case oc diagnostics (which could have a short alias like oc diag) is much easier on the eyes and takes fewer keystrokes. Surely keystrokes matter less for scripting.

If you think openshift_checks is more appropriate, we can plan a migration to that.

Member

What we pass in as an arg to the action plugin is irrelevant, as that's not an inventory variable. But we don't have a checks inventory variable before this PR so there's no migration necessary to change that.

You think that command line is long until you type the commands for running via the container... :)

I think the user should have openshift_checks for consistency with openshift_disable_checks...

args:
  checks: '{{ openshift_checks | default([]) }}'

Contributor Author

Made it openshift_checks, thanks for pointing it out.

There is still room for debate when we compare with disable, because the latter is in singular form: openshift_disable_check...

@rhcarvalho rhcarvalho force-pushed the adhoc-check-runner-misc branch 4 times, most recently from 10aa64c to 3abbc6f Compare July 27, 2017 13:01
@rhcarvalho rhcarvalho force-pushed the adhoc-check-runner-misc branch 3 times, most recently from b2a7fcf to f8a64cb Compare August 1, 2017 15:08
@rhcarvalho rhcarvalho force-pushed the adhoc-check-runner-misc branch 2 times, most recently from 0aa5ee6 to 90f2782 Compare August 3, 2017 15:18
@rhcarvalho
Contributor Author

@sosiouxme this is ready for another pass of review.

Member

@sosiouxme sosiouxme left a comment

Long PR... not sure I got it all. Anyway, some questions, some suggestions.

pre_tasks:
- name: List known health checks
action: openshift_health_check
when: openshift_checks is undefined or not openshift_checks
Member

👍 nice that this runs almost instantly

Contributor Author

This is good for the two cases in the condition, but doesn't cover other cases such as when there is a typo in a check name.

I was thinking we could have a "preflight check" for the openshift_health_check plugin itself, implemented in the plugin perhaps with a flag like dry-run: true. But that doesn't have to be part of this PR.

Member

pre_tasks always run before roles. Did you mean to use post_tasks here? The order of execution for play stages is:

- hosts: foo
  vars:
  pre_tasks:
  roles:
  tasks:
  post_tasks:

Putting these items in a different order in the yaml file does not change execution order. Same with vars:, they are defined at the beginning of the play.

result['failed'] = True
result['msg'] = list_known_checks(known_checks)
return result

resolved_checks = resolve_checks(requested_checks, known_checks.values())
Member

Below if they specify an unknown name/tag they get "Make sure there is no typo in the playbook and no files are missing." Surely with the introduction of this playbook it is much more likely that the user actually specified the wrong name/tag.

Contributor Author

Yes, the observation is correct. Do you suggest some change to the message?

Member

Well in that case it's probably not a problem with the code we shipped, but rather the user's input... They can do something about it without looking through our Ansible code. So something like "Make sure you have specified only valid check tags or names. Valid tags and names are as follows..."

Contributor Author

Right. We could list the existing checks.

I was thinking we may also detect that etcd is a tag name and should be spelled @etcd -- I made that mistake ;-)

Contributor Author

Update: now the error message when the requested check is unknown includes the list of known checks and tags.
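A rough sketch of what such a resolution step could look like (names and data shapes assumed here; this is not the shipped `resolve_checks`): `@tag` entries expand to every check carrying that tag, and unknown entries fail with the known names and tags listed, so a typo like `etcd` instead of `@etcd` is easy to spot.

```python
def resolve_checks(requested, known_checks):
    """known_checks maps check name -> list of tags on that check."""
    names = set(known_checks)
    tags = {tag for check_tags in known_checks.values() for tag in check_tags}

    # Collect entries that match neither a check name nor an @tag.
    unknown = {entry for entry in requested
               if not entry.startswith('@') and entry not in names}
    unknown |= {entry for entry in requested
                if entry.startswith('@') and entry[1:] not in tags}
    if unknown:
        raise ValueError(
            'Unknown checks: {}. Known check names: {}. Known tags: {}.'
            ' (Tags must be prefixed with @, e.g. @etcd.)'.format(
                ', '.join(sorted(unknown)),
                ', '.join(sorted(names)),
                ', '.join(sorted(tags))))

    # Expand @tags into the names of all checks carrying that tag.
    resolved = {entry for entry in requested if not entry.startswith('@')}
    for entry in requested:
        if entry.startswith('@'):
            resolved |= {name for name, check_tags in known_checks.items()
                         if entry[1:] in check_tags}
    return resolved
```

For example, requesting `['@preflight']` resolves to every check tagged `preflight`, while requesting `['etcd']` when `etcd` only exists as a tag raises with the full listing.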


try:
known_checks = self.load_known_checks(tmp, task_vars)
args = self._task.args
requested_checks = normalize(args.get('checks', []))

if not requested_checks:
result['failed'] = True
Member

I'm not thrilled about having it show up as a failure but I can't think of anything better for now. Possibly the callback plugin could display differently so it doesn't look like a failure.

Contributor Author

I'm also not happy with it being a failure, but that's the most expedient way to get it working without adding yet more features.

An alternative would be to have either a separate action plugin or a different operation mode of the same plugin specific to list the checks, the latter being very similar to what we have now.

The difficulty of printing arbitrary text remains. Ansible will print its operational information, and showing anything else would require either a flag or a callback plugin.

try:
return check.run()
except Exception as exc:
return dict(failed=True, msg=str(exc))
Member

In the case of an OpenShiftCheckException we probably only want to show the text. But for any other exception, it's probably a bug in the check and I would want to get a stack trace. If not in the user-facing message, maybe as another field in the check return dict so it shows up in -v? We could also give a little preamble explaining that it's an unexpected error while running the check before dumping out "KeyError: blah blah blah" that's not meaningful to the user.

Contributor Author

I like your idea, thank you. Will make it give out more context on arbitrary errors.

For background on how it became this way: I had two except clauses, one was the original except OpenShiftCheckException and the other was the general except Exception; I dropped the first after making both handle the exception in the same way...

Contributor Author

Update: I made it include a traceback on arbitrary exceptions using the exception key as described in http://docs.ansible.com/ansible/latest/common_return_values.html#exception
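The shape of that change could look something like this (a self-contained sketch with assumed names, following the Ansible `exception` return-value convention linked above; the real code lives in the check runner):

```python
import traceback

class OpenShiftCheckException(Exception):
    """Raised by checks for expected, user-facing failures."""

def run_check(check):
    try:
        return check.run()
    except OpenShiftCheckException as exc:
        # Expected failure: show only the message text.
        return dict(failed=True, msg=str(exc))
    except Exception as exc:
        # Likely a bug in the check: keep a short message for the user
        # and stash the full stack trace under 'exception', which
        # Ansible displays when run with -v.
        return dict(
            failed=True,
            msg='Unexpected error while running the check: {}'.format(exc),
            exception=traceback.format_exc(),
        )
```

This keeps "KeyError: blah" out of the default output while still making the traceback available for bug reports.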

# TODO: we could include a description of each check by taking it from a
# check class attribute (e.g., __doc__) when building the message below.
msg = (
'This playbook is meant to run health checks, but no checks were '
Member

could these have \n so it's not so widescreen...

Contributor Author

Hmm, I'm relying on the terminal doing the line wrapping...
With manual line breaks it looks weird when the text is re-flowed to fit the screen width.

Manually inserting new lines in strings is rather brittle, sort of like font tags in HTML?

Member

I know what you mean, it looks bad if your terminal is thinner than the arbitrary point where we've placed newlines so it wraps anyway. I guess the real issue with that is on the check outputs since they're shown as indented blocks - tried to keep those short enough so that most people won't see them wrap. If we wanted to do that right we'd need to know the width of the terminal and have a real formatter on the output...

Just in general though I think for prose it makes sense to keep lines pretty short, like in a book. I may have a terminal that's 200 columns wide to look at logs and code and JSON, but it doesn't mean I want to read text at that width. I suppose ideally we'd have a formatter that broke on whitespace at the shorter of the terminal width or a comfortable-to-read width.

Well, I don't really want to start a typographical debate, just saying personally I would wrap it for legibility and consistency... it stuck out at me.

Contributor Author

I agree with you, though the difficulty is missing the tools to do it right as you say. We write using some display object; it might mean writing to stdout, but it could be anything, even a file that is not a terminal.

failures = [failure_to_dict(failure) for failure in failures]
failures = deduplicate_failures(failures)

summary = [u'', u'', u'Failure summary:', u'']
Member

is there a reason for prefixing everything with u?

Contributor Author

Yes, though my memory is a bit vague on the details; looking it up.

Ansible's stringc function operates on unicode strings (Python 2). If you give it a str with non-ASCII content, it triggers an implicit decode using the default encoding (normally ascii unless customized at the system level) and can raise decoding exceptions. We may ourselves have nothing outside the ASCII range, but keep in mind that on systems with different locales a system message may bubble up and break our code.

In Python 3, strings are always unicode and the u prefix is simply ignored (it was re-added to the language in Python 3.3, via PEP 414, to ease compatibility with Python 2 code).

There might be similar considerations to using the display object, though I did not revisit it now.

Generally speaking, it is good practice to work internally with unicode strings, decoding from a known charset when reading and encoding when writing.
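A quick runnable illustration of the points above (under Python 3, where the u prefix is a no-op):

```python
# In Python 3 the u prefix changes nothing: strings are already unicode.
assert u'Failure summary:' == 'Failure summary:'

# The practice described above: decode bytes from a known charset at the
# input boundary, work with text internally, encode again when writing out.
raw = b'\xc3\xa9chec'            # e.g. a localized system message, as bytes
text = raw.decode('utf-8')       # decode once, at the boundary
assert text == u'\u00e9chec'     # internal representation is text ("échec")
assert text.encode('utf-8') == raw
```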

name = cls.name
if name in known_checks:
other_cls = known_checks[name].__class__
msg = "non-unique check name '{}' in: '{}' and '{}'".format(
Member

So this looks better than before but... with a personal bias toward brevity... how about:

            raise OpenShiftCheckException(
              "duplicate check name '{}' in: '{}' and '{}'"
              "".format(name, full_class_name(cls), full_class_name(other_cls))
            )

("non-unique" feels like a double negative and takes a second to process. formatters line up somewhat with contents...)

Contributor Author

Done.
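For context, the guard under discussion can be exercised as a self-contained sketch (class and helper names mirror the snippet but are otherwise assumed):

```python
class OpenShiftCheckException(Exception):
    pass

def full_class_name(cls):
    return '{}.{}'.format(cls.__module__, cls.__name__)

def load_known_checks(check_classes):
    """Instantiate checks, refusing to let one name shadow another."""
    known_checks = {}
    for cls in check_classes:
        name = cls.name
        if name in known_checks:
            other_cls = known_checks[name].__class__
            raise OpenShiftCheckException(
                "duplicate check name '{}' in: '{}' and '{}'"
                "".format(name, full_class_name(cls), full_class_name(other_cls))
            )
        known_checks[name] = cls()
    return known_checks
```

Two check classes sharing a `name` attribute raise instead of silently overwriting each other in the registry.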

hosts are groupped together in a single entry. The relative order of
failures is preserved.
"""
groups = defaultdict(list)
Member

What you have below is clever but hard for me at least to follow, and it seems like you could do it in one pass through the loop without altering the original failure results:

groups = {}
result = []
for failure in failures:
    # group the failures that are identical except for host
    group_key = tuple(sorted((key, value) for key, value in failure.items() if key != 'host'))
    if group_key not in groups:
        copy = failure.copy()
        copy['host'] = []
        groups[group_key] = copy
        result.append(copy)
    groups[group_key]['host'].append(failure['host'])

One other thought... why isn't the group key just the msg? What else do we expect to care about?

Contributor Author

could do it in one pass through the loop without altering the original failure results

Could be done in one pass. Originally, I had intended to have the two stages in separate functions and I wanted to give a name to each part instead of doing it all together. Naming was hard, and I could only think of "deduplicate_failures", though the multi-stage algorithm stayed.

The first part is concerned with grouping, the second part is concerned with reducing items from a group to a single item, keeping the original order.

Note that failure.copy() does a shallow copy. While the property of not mutating the input is appreciated, it seemed unnecessary here because we're mutating and overwriting failures anyway. Obviously, we could revisit the whole thing.

why isn't the group key just the msg? What else do we expect to care about?

If the playbook / task / etc is different, we shall not mix the failures. Similarly if the message of two failures is "One or more checks failed" -- we can't group hosts together unless the check failures are the same, or we'd be throwing away information about check results.
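The two-stage shape described here, as a runnable sketch (field names are assumed; the real helper is `deduplicate_failures` in the `zz_failure_summary` callback plugin):

```python
def deduplicate_failures(failures):
    # Stage 1: bucket failures that are identical except for 'host'.
    # Plain dicts preserve insertion order on Python 3.7+.
    buckets = {}
    for failure in failures:
        key = tuple(sorted(
            (k, v) for k, v in failure.items() if k != 'host'))
        buckets.setdefault(key, []).append(failure)
    # Stage 2: reduce each bucket to a single entry whose 'host' is the
    # list of affected hosts, preserving the original relative order.
    return [dict(bucket[0], host=[f['host'] for f in bucket])
            for bucket in buckets.values()]
```

Failures that differ in any field other than host (play, task, msg, per-check results) land in separate buckets, so no information about distinct check outcomes is thrown away.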

"""Group together similar failures from different hosts.

Returns a new list of failures such that similar failures from different
hosts are groupped together in a single entry. The relative order of
Member

"grouped"

also "similar" is a bit vague when it really means "identical except for host" (for now) so I'd s/similar/identical/

Contributor Author

Thanks, will be updated in the next push.

@@ -0,0 +1,63 @@
from zz_failure_summary import deduplicate_failures
Member

👍 it's a start, thanks!

@rhcarvalho rhcarvalho force-pushed the adhoc-check-runner-misc branch from 90f2782 to 4824662 Compare August 8, 2017 13:07
@rhcarvalho
Contributor Author

@sosiouxme could you please review the documentation changes? Wondering if that is clear enough. Being inside of a role is not the most visible location, but it is good to at least have a place to point people to. The docs here would be the base for something more in the official docs.

@rhcarvalho rhcarvalho force-pushed the adhoc-check-runner-misc branch 2 times, most recently from a5f69ef to 48af29f Compare August 21, 2017 16:14
@rhcarvalho
Contributor Author

@sosiouxme @juanvallejo if you have some time, could you please try this out and give feedback?

I'm using it like this:

List available checks

$ ansible-playbook -i hosts playbooks/byo/openshift-checks/adhoc.yml

This is notably unintuitive, and feels equivalent to calling a command without the required arguments and getting a usage error message, though it does serve the purpose of listing checks and tags.

It also made me realize an inconsistency with pre-install and preflight tags.

Run a single check

$ ansible-playbook -i hosts playbooks/byo/openshift-checks/adhoc.yml -e openshift_checks=disk_availability

Run multiple checks

by tag

$ ansible-playbook -i hosts playbooks/byo/openshift-checks/adhoc.yml -e openshift_checks=@preflight

by name, comma-separated

$ ansible-playbook -i hosts playbooks/byo/openshift-checks/adhoc.yml -e openshift_checks=docker_storage,disk_availability

by name, YAML

$ ansible-playbook -i hosts playbooks/byo/openshift-checks/adhoc.yml -e '{"openshift_checks": ["docker_storage", "disk_availability"]}'

Except for this last one, these use cases are all described in openshift-checks/README.md. I omit the last one for it is not very convenient for command line usage.
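All of these input styles can funnel through one small helper. Here is a plausible sketch of the `normalize` function seen in the diff (the real implementation may differ):

```python
def normalize(checks):
    """Accept a comma-separated string (-e openshift_checks=a,b) or a
    YAML/JSON list and return a clean list of entry names."""
    if isinstance(checks, str):          # e.g. '-e openshift_checks=a,b'
        checks = checks.split(',')
    return [name.strip() for name in checks if name.strip()]
```

This is why `-e openshift_checks=docker_storage,disk_availability` and the explicit YAML list form behave the same inside the plugin.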

# first and that takes time. To speed up listing checks, we use a separate play
# that runs before the include to save time and improve the UX.
- name: OpenShift health checks
hosts: OSEv3
Contributor Author

@sosiouxme, in trying to change from OSEv3 to oo_all_hosts (an empty group before initialization), I realized we might as well run this play only on localhost. The downside is that it will not play nice with host-specific variables.
WDYT?

Member

I think yes, run it against localhost so we don't have to init groups at this point nor put OSEv3 in a common playbook. The only inconsistency would be if they didn't define openshift_checks globally but only on individual hosts, and I think that's not a very important use case.

Member

Agreed on not using OSEv3 in common playbooks.

Member

@sosiouxme sosiouxme left a comment

LGTM aside from last comment and unit test failures

This is useful on its own, and also aids in developing/testing new
checks that are not part of any playbook.

Since the intent when running this playbook is to execute checks, opt
for a less verbose explanation on the error summary.
This is a simple mechanism to learn what health checks are available.

Note that we defer task_vars verification, so that we can compute
requested_checks and resolved_checks earlier, allowing us to list checks
even if openshift_facts has not run.
This prevents an exception in one check from interfering with other
checks. Skips checks that raise an exception in their is_active method.

Whenever capturing a broad exception in the `is_active` or `run`
methods, include traceback information that can be useful in bug
reports.
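The exception isolation described above can be sketched as follows — hypothetical names (`resolve_checks`, `check.name`), not the module's exact API:

```python
import traceback


def resolve_checks(known_checks, task_vars):
    """Filter checks down to the active ones, isolating failures.

    A check whose is_active() raises is skipped, and its traceback is
    recorded so it can surface in bug reports instead of aborting the
    whole run.
    """
    active, errors = [], {}
    for check in known_checks:
        try:
            if check.is_active(task_vars):
                active.append(check)
        except Exception:
            # keep going; one broken check must not block the others
            errors[check.name] = traceback.format_exc()
    return active, errors
```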
The intent is to deduplicate similar errors that happened in many hosts,
making the summary more concise.
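That deduplication amounts to grouping failures by identical message, so an error common to many hosts is reported once. A minimal sketch, with an assumed `(host, message)` data shape:

```python
def summarize_failures(failures):
    """Group per-host failures by identical message.

    failures: iterable of (host, message) pairs. Returns a list of
    (message, sorted hosts) so that an error shared by many hosts is
    printed once with all affected hosts listed together.
    """
    by_message = {}
    for host, message in failures:
        by_message.setdefault(message, set()).add(host)
    return [(msg, sorted(hosts)) for msg, hosts in sorted(by_message.items())]
```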
This serves two purposes:

- Gracefully omit the summary if there was an error computing it, no
confusion to the regular end user.

- Provide a stacktrace of the error when running verbose, giving
developers or users reporting bugs a better insight of what went wrong,
as opposed to Ansible's opaque handling of errors in callback plugins.
And beautify the code a bit.
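The fail-safe summary described above might look like this — a sketch with assumed names, not the callback plugin's real internals:

```python
import traceback


def print_summary(display, failures, verbosity=0):
    """Print the error summary; never let a summary bug break the run.

    On error, the summary is silently omitted for regular users, while
    verbose runs get the traceback for debugging, instead of Ansible's
    opaque handling of errors in callback plugins.
    """
    try:
        for msg, hosts in failures:
            display.display("%s: %s" % (", ".join(hosts), msg))
    except Exception:
        if verbosity:
            display.display(traceback.format_exc())
```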
@rhcarvalho rhcarvalho force-pushed the adhoc-check-runner-misc branch from ba3a4e7 to 51645ea Compare August 24, 2017 13:00
- openshift_health_checker
vars:
- r_openshift_health_checker_playbook_context: adhoc
pre_tasks:
Contributor Author

cc @mtnbikenc

At this point in time this could be any of pre_tasks, tasks or post_tasks and the observable end result would be the same.

pre_tasks run before any tasks defined in the role (there are none) and seemed to best express what we want: to run the conditional as soon as possible.

Member

pre_tasks also runs before any role dependencies from meta, which is also desirable here (though I suspect we can now remove openshift_facts from dependencies anyway).

@rhcarvalho
Contributor Author

aos-ci-test

# individual hosts while not defined for localhost, we do not support that
# usage. Running this play only in localhost speeds up execution.
hosts: localhost
connection: local
Contributor Author

@sosiouxme FYI, I made the listing of checks run on localhost only.

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 51645ea (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 51645ea (logs)

@rhcarvalho
Contributor Author

[merge]

@rhcarvalho
Contributor Author

Flake openshift/origin#16005

[merge]

@openshift-bot

Evaluated for openshift ansible merge up to 51645ea

@openshift-bot

continuous-integration/openshift-jenkins/merge Running (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/930/) (Base Commit: 60b91e4) (PR Branch Commit: 51645ea)

@sdodson
Member

sdodson commented Aug 28, 2017

flake openshift/origin#11873

@sdodson sdodson merged commit ca5ebcb into openshift:master Aug 28, 2017
@rhcarvalho rhcarvalho deleted the adhoc-check-runner-misc branch August 28, 2017 22:09