individual diagnostic parameters #16589

sosiouxme · 2017-09-27T21:45:29Z

This addresses #14640 by giving individual diagnostics the ability to receive parameters without defining any more flags (need this anyway for the app-create diagnostic I'm working on). It also deprecates the existing flags for NetworkCheck as well as the config envvar for EtcdWriteVolume in favor of the new parameter scheme.

$ oc adm diagnostics -h
This utility helps troubleshoot and diagnose known problems. It runs diagnostics using a client and/or the state of a
running master / node host. 

...

Diagnostics may be individually run by passing diagnostic name(s) as arguments. 

  oc adm diagnostics DiagnosticName
  
Diagnostic parameters may be given values similarly: 

  oc adm diagnostics DiagnosticName.parameter=value
  
...

Options:
      --cluster-context='': Client context to use for cluster administrator
      --config='': Path to the config file to use for CLI requests.
      --context='': The name of the kubeconfig context to use
  -l, --diaglevel=1: Level of diagnostic output: 4: Error, 3: Warn, 2: Notice, 1: Info, 0: Debug
      --host=false: If true, look for systemd and journald units even without master/node config
      --images='openshift/origin-${component}:${version}': Image template for DiagnosticPod to use in creating a pod
      --latest-images=false: If true, when expanding the image template, use latest version, not release version
      --list-params=false: List parameters for specified diagnostic(s) and exit
      --loglevel=0: Set the level of log output (0-10)
      --logspec='': Set per module logging with file|pattern=LEVEL,...
      --master-config='': Path to master config file (implies --host)
      --node-config='': Path to node config file (implies --host)
      --prevent-modification=false: If true, may be set to prevent diagnostics making any changes via the API

$ oc adm diagnostics  NetworkCheck --list-params
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/home/lmeyer/.kube/config'

Info:  Valid parameters for NetworkCheck are:
         logDir: Path to store diagnostic results in case of errors (default /tmp/openshift/)
         podImage: Image to use for diagnostic pod (default openshift/origin:v3.7.0-alpha.1)
         podPort: Serving port on the diagnostic test pod (default 8080)
         podProtocol: Protocol used to connect to diagnostic test pod (default TCP)
         testPodImage: Image to use for diagnostic test pod (default openshift/origin-deployer:v3.7.0-alpha.1)

$ oc adm diagnostics  EtcdWriteVolume EtcdWriteVolume.duration=10s
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'

[Note] Running diagnostic: EtcdWriteVolume
       Description: Check the volume of writes against etcd and classify them by operation and key for 10s
       
Info:  Measured 0.7 writes/sec
       /                                                                                      7 100.0%
       /v3:PUT                                                                                7 100.0%

sosiouxme · 2017-09-27T21:48:51Z

Seems odd that I'm in pkg/diagnostics/OWNERS but not pkg/oc/admin/diagnostics/OWNERS

soltysh · 2017-09-28T10:59:38Z

/unassign

sosiouxme · 2017-10-02T13:49:29Z

@smarterclayton just wanted your feedback on the UI for this:

$ oc adm diagnostics EtcdWriteVolume EtcdWriteVolume.duration=10s

A bit awkward to have to put the diagnostic name in twice but I wanted to be able to distinguish between "run this diagnostic with a parameter set" and "run all the diagnostics, with a parameter set on one."
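For illustration only, the distinction described above (a bare name selects a diagnostic, while `Name.parameter=value` only sets a parameter) could be parsed roughly as follows. This is a minimal sketch, not the PR's code; `parseArgs` and its return shape are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseArgs is a hypothetical helper. It splits arguments into diagnostic
// names to run and per-diagnostic parameters written as
// "DiagnosticName.parameter=value".
func parseArgs(args []string) (names []string, params map[string]map[string]string, err error) {
	params = map[string]map[string]string{}
	for _, arg := range args {
		key, value, isParam := strings.Cut(arg, "=")
		if !isParam {
			// A bare name selects a diagnostic to run.
			names = append(names, arg)
			continue
		}
		diag, param, ok := strings.Cut(key, ".")
		if !ok || diag == "" || param == "" {
			return nil, nil, fmt.Errorf("invalid parameter %q: expected Name.parameter=value", arg)
		}
		if params[diag] == nil {
			params[diag] = map[string]string{}
		}
		params[diag][param] = value
	}
	return names, params, nil
}

func main() {
	names, params, err := parseArgs([]string{"EtcdWriteVolume", "EtcdWriteVolume.duration=10s"})
	if err != nil {
		panic(err)
	}
	fmt.Println(names, params)
}
```

With an empty `names` list the caller would run every diagnostic and still apply `params`, which is the "run all, with a parameter set on one" case.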

sosiouxme · 2017-10-02T14:01:51Z

@deads2k it's a big PR, but maybe a summary will be enough to tell me if I'm doing it all wrong. Initialization for diagnostics that take non-flag parameters is a two-step process. As before, the diagnostic struct is created with the flag contents, clients, and such. Then non-flag parameters are set via the diagnostic's SetParameters() method, which has an opportunity to complete the diagnostic struct or report an error. Then it runs as before. This seemed like the best solution: generic parameter-handling logic, with each diagnostic implementing its own parameters. Thoughts?

deads2k · 2017-10-02T14:11:09Z

/assign fabianofranz

@openshift/cli-review If I'm reading this correctly, this is establishing a separate way to pass parameters to subcommands.

juanvallejo · 2017-10-02T15:23:33Z

@fabianofranz I think the ui proposed by this PR is the most approachable (from a user standpoint) for now, considering the use-case the command is trying to solve. Would still like to wait for your feedback / thoughts on this

sosiouxme · 2017-10-02T15:32:43Z

oc adm diagnostics may be unique in that we want to have parameters for each of a list of arguments. (They're not strictly subcommands as you generally run more than one at a time.)

sosiouxme · 2017-10-05T19:19:26Z

@fabianofranz bump. Planning to demo this tomorrow, would be nice if I had some feedback on the direction.

sosiouxme · 2017-10-11T18:13:04Z

@fabianofranz poke again.

pravisankar · 2017-10-12T20:42:52Z

Specifying the diagnostic name for every param is a bit ugly (it seems like a lot of duplication), and passing many params is painful. I think passing diagnostic checks along with their tunable options on the CLI is not scalable.

Another approach could be:

Support oadm diagnostics --get-available-checks
This returns all available diagnostic checks along with their tunable options in json or yaml format, something like:

CommonOptions:
  logLevel: 0
...
DiagnosticChecks:
  - NetworkCheck:
      logDir: "/tmp/openshift"
      diagnosticPodImage: "openshift/origin:v3.7.0-alpha.1"
      testPodImage: "openshift/origin-deployer:v3.7.0-alpha.1"
      testPodPort: 8080
...
  - MasterConfigCheck:
      masterConfig: ''
...

This is like giving the user a basic template to pick and choose from if they don't want to run all diagnostic checks or want to specify optional parameters.

Support oadm diagnostics --run-checks=<template-file>
This runs selected diagnostic checks with corresponding params present in the given template file.
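As a sketch of what consuming such a template might look like, the snippet below reads the JSON form of the structure proposed above using only the standard library. The type, the function names, and the file layout are hypothetical; neither `--get-available-checks` nor `--run-checks` exists in the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkTemplate mirrors the proposed layout: common options plus a list of
// single-key entries, one per diagnostic check. Hypothetical type.
type checkTemplate struct {
	CommonOptions    map[string]interface{}              `json:"CommonOptions"`
	DiagnosticChecks []map[string]map[string]interface{} `json:"DiagnosticChecks"`
}

// loadTemplate parses the JSON form of the template.
func loadTemplate(data []byte) (*checkTemplate, error) {
	var tmpl checkTemplate
	if err := json.Unmarshal(data, &tmpl); err != nil {
		return nil, err
	}
	return &tmpl, nil
}

// selectedChecks returns the check names in the order they were listed.
func selectedChecks(tmpl *checkTemplate) []string {
	var names []string
	for _, entry := range tmpl.DiagnosticChecks {
		for name := range entry {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	data := []byte(`{
	  "CommonOptions": {"logLevel": 0},
	  "DiagnosticChecks": [
	    {"NetworkCheck": {"logDir": "/tmp/openshift", "testPodPort": 8080}}
	  ]
	}`)
	tmpl, err := loadTemplate(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(selectedChecks(tmpl))
}
```

Only checks present in the file would run, so deleting an entry from the generated template is how a user would skip that check.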

sosiouxme · 2017-10-12T21:07:23Z

Thanks, @pravisankar, that's a pretty good idea.

Passing a lot of params on the command line would definitely be painful. On the other hand, for the more likely use case of modifying just one or two parameters (certainly the goal is to be useful with no parameters), having to know about and generate and edit this config file is also relatively painful.

So, would like to hear what others prefer.

pravisankar · 2017-10-13T16:41:47Z

@sosiouxme
Fair. If we only need to change a couple of params, then a template file is also painful/overkill.

Trying one more approach:
Diagnostic checks can grow in the future (we already have many), and providing options in a single command won't scale. Why not treat each diagnostic check as a subcommand?
Basically we could support two patterns:

oadm diagnostics [optional-common-flags]
Runs all diagnostic checks with default params (assuming this will be used more frequently than with custom options)
oadm diagnostics <single-diagnostic-check> [optional-diagnostic-flags]
Runs the selected diagnostic check with the given options.
Note: Passing multiple diagnostic checks in the command line is not allowed in this case.

Allowing only these two patterns will remove the ambiguity about which diagnostic a param belongs to. Users may need to run multiple diag commands when they want custom options, but I think that's a good compromise considering the simplicity and scalability of this pattern.

sosiouxme · 2017-10-13T20:17:11Z

I could see it ending up that way. It would certainly be more consistent with how everything else works. I'm a bit reluctant though to give up the ability to run all the diagnostics at once with just a tweak or two, and without having to keep up with the expanding list of names. My dilemma is I want it to be trivial for the casual user and also completely customizable for production use.

I was thinking of combining your config file idea with this CLI implementation. So when the user gets to where they have four or five parameters to specify and that's getting tedious, they can transition to generating and using a config file.

fabianofranz · 2017-10-26T19:24:28Z

> Seems odd that I'm in pkg/diagnostics/OWNERS but not pkg/oc/admin/diagnostics/OWNERS

@sosiouxme open a separate PR adding yourself as a reviewer and I'll approve.

fabianofranz · 2017-10-26T19:54:13Z

At first sight I'd say we need to convert every diagnostic name into a subcommand and make every possible parameter in a diagnostic a proper flag of that subcommand, as suggested by @pravisankar. You can still run the top-level cmd to run all diagnostics, and subcommands can inherit generic flags from the top-level one. Although very verbose, you can always have a one-liner that runs all of them at once, like oc adm diagnostics diag1 && oc adm diagnostics diag2 && ....

Although a possible solution, the concern I have about having a config file is that it's not the representation of any existing resource, right? For example, it's not the representation of a given resource in JSON or YAML format, the same way used to interact with the API, or accepted by oc create, etc. It would be a new file format specifically and only targeted at this command.

But I totally understand the "run all diagnostics tweaking just a couple" use case. Let me think a bit if I can find another solution.

fabianofranz · 2017-10-26T19:56:26Z

@sosiouxme How many diagnostic names and parameter names do we have? Do we expect them to change often? I ask because, on the other hand, turning them into subcommands and flags automatically makes them "tied" to the CLI and subject to the deprecation policy when changes are needed.

pravisankar · 2017-10-26T21:40:28Z

@fabianofranz
Currently we have 18 diagnostic names and 17 param names (common and diag specific). There is scope for adding more diags but I think mostly that will be additive. I doubt these old diags will change often but I will leave that to @sosiouxme

sosiouxme · 2017-11-10T22:10:42Z

@fabianofranz FWIW I was planning on moving / removing a number of them and deprecating them here. But after that, yes, will be mostly additive, with the occasional retiring of any that become irrelevant.

sosiouxme · 2017-11-16T14:35:06Z

@fabianofranz any further thoughts? Need to move forward in some direction for the next check I'm writing :)

fabianofranz · 2017-11-29T18:58:26Z

Sorry about the delay, I researched a few other CLI patterns but didn't find anything that addresses this use case nicely enough.

I'd suggest that we move on with turning diagnostics into subcommands and parameters into flags. It fits better with what users already know and will allow us to grow consistently. Add a glog.V to print a CLI line that runs all diagnostics invoking each one individually, so that users can copy and tweak if needed.
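A sketch of the "print a CLI line users can copy and tweak" idea. The helper name is hypothetical, and the line is written to stderr with the standard library rather than through glog so the snippet stays self-contained.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// runAllCommand builds a shell one-liner that invokes each diagnostic as its
// own subcommand. Hypothetical helper, not part of the PR.
func runAllCommand(names []string) string {
	cmds := make([]string, 0, len(names))
	for _, name := range names {
		cmds = append(cmds, "oc adm diagnostics "+name)
	}
	return strings.Join(cmds, " && ")
}

func main() {
	// Placeholder names, as in the thread; real diagnostic names would be
	// supplied by whatever registry the command keeps.
	fmt.Fprintln(os.Stderr, "Equivalent individual invocations:")
	fmt.Fprintln(os.Stderr, "  "+runAllCommand([]string{"diag1", "diag2"}))
}
```

Note that joining with `&&` stops at the first diagnostic that exits non-zero; a `;` separator would keep going, which may be closer to how the base command behaves.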

One idea came to mind though: have an all subcommand that registers every flag from every other subcommand. Users could invoke it with oadm diagnostics all and tweak just a couple of parameters by using the flags. However, we'd need to deprecate running all diagnostics with a bare oadm diagnostics and make it do something else, like print help or a more specific message. That might be a traumatic change for existing users.

sosiouxme · 2017-11-29T21:35:19Z

@fabianofranz thanks, I think that makes sense.

Here's a thought for how to manage the "all" cases.

oadm diagnostics could continue to run all diagnostics, just with default parameters. Existing diagnostic-specific flags could be deprecated and hidden until they can be removed. Its output could also point the way to more sophisticated usage.

oadm diagnostics all could be as you say, a bucket for every single possible diagnostic-specific flag, probably "scoped" by diagnostic name so e.g. --diag1-foobar and --diag2-foobar. That way you can run all with just a tweak or two.

openshift-ci-robot · 2017-11-30T21:11:49Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sosiouxme
We suggest the following additional approver: fabianofranz

Assign the PR to them by writing /assign @fabianofranz in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

sosiouxme · 2017-11-30T21:14:15Z

@fabianofranz @juanvallejo One further thought. Currently you could do:

$ oc adm diagnostics Foo Bar Baz

... to run multiple named diagnostics and no others. Making them subcommands will break that - do we have any obligation to address that somehow?

openshift-ci-robot · 2017-12-14T00:58:52Z

@sosiouxme: The following tests failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/openshift-jenkins/extended_conformance_crio	`c7cf6c8`	link	`/test crio`
ci/openshift-jenkins/verify	`4050666`	link	`/test verify`
ci/openshift-jenkins/cmd	`4050666`	link	`/test cmd`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

sosiouxme · 2017-12-14T02:29:26Z

Starting fresh with #17773

Automatic merge from submit-queue.

diagnostics: individual parameters

Updated version of #16589 based on feedback.

This addresses #14640 by making individual diagnostics into subcommands that can have their own flags. Existing top-level flags for `NetworkCheck` are removed and the config envvar for `EtcdWriteVolume` is deprecated in favor of a flag. All individual flags are available underneath the `all` subcommand. This required rather more refactoring as the flags had to be known in order to define the command, not just at runtime.

Usages are given below:

```
$ oc adm diagnostics --help
This utility helps troubleshoot and diagnose known problems for an OpenShift cluster and/or local host. The base command runs a standard set of diagnostics:

  oc adm diagnostics

[...]

An individual diagnostic may be run as a subcommand which may have flags for specifying options specific to that diagnostic.

Finally, the "all" subcommand runs all available diagnostics (including heavyweight ones skipped in the standard set) and provides all individual diagnostic flags.

Usage:
  oc adm diagnostics [options]

Available Commands:
  aggregatedlogging Check aggregated logging integration for proper configuration
  all               Diagnose common cluster problems
  [...]
  unitstatus        Check status for related systemd units

Options:
      --cluster-context='': Client context to use for cluster administrator
      --config='': Path to the config file to use for CLI requests.
      --context='': The name of the kubeconfig context to use
  -l, --diaglevel=1: Level of diagnostic output: 4: Error, 3: Warn, 2: Notice, 1: Info, 0: Debug
      --host=false: If true, look for systemd and journald units even without master/node config
      --loglevel=0: Set the level of log output (0-10)
      --logspec='': Set per module logging with file|pattern=LEVEL,...
      --master-config='': Path to master config file (implies --host)
      --node-config='': Path to node config file (implies --host)
      --prevent-modification=false: If true, may be set to prevent diagnostics making any changes via the API
```

(Note `all` is now intermingled with the individual subcommands.)

```
$ oc adm diagnostics all --help
This utility helps troubleshoot and diagnose known problems for an OpenShift cluster and/or local host. This subcommand exists to run all available diagnostics:

  oc adm diagnostics all

Available diagnostics vary based on client config and local OpenShift host config. All flags from the base command work similarly here, but all possible flags for individual diagnostics are also available.

Usage:
  oc adm diagnostics all [options]

Options:
      --cluster-context='': Client context to use for cluster administrator
      --config='': Path to the config file to use for CLI requests.
      --context='': The name of the kubeconfig context to use
  -l, --diaglevel=1: Level of diagnostic output: 4: Error, 3: Warn, 2: Notice, 1: Info, 0: Debug
      --diagnosticpod-images='openshift/origin-${component}:${version}': Image template to use in creating a pod
      --diagnosticpod-latest-images=false: If true, when expanding the image template, use latest version, not release version
      --etcdwritevolume-duration='1m': How long to perform the write test
      --host=false: If true, look for systemd and journald units even without master/node config
      --loglevel=0: Set the level of log output (0-10)
      --logspec='': Set per module logging with file|pattern=LEVEL,...
      --master-config='': Path to master config file (implies --host)
      --networkcheck-logdir='/tmp/openshift/': Path to store diagnostic results in case of errors
      --networkcheck-pod-image='openshift/origin:v3.9.0-alpha.0': Image to use for diagnostic pod
      --networkcheck-test-pod-image='openshift/origin-deployer:v3.9.0-alpha.0': Image to use for diagnostic test pod
      --networkcheck-test-pod-port=8080: Serving port on the diagnostic test pod
      --networkcheck-test-pod-protocol='TCP': Protocol used to connect to diagnostic test pod
      --node-config='': Path to node config file (implies --host)
      --prevent-modification=false: If true, may be set to prevent diagnostics making any changes via the API
```

```
$ oc adm diagnostics EtcdWriteVolume --help
Runs the EtcdWriteVolume diagnostic.

Check the volume of writes against etcd over a time period and classify them by operation and key

Aliases:
etcdwritevolume, EtcdWriteVolume

Usage:
  oc adm diagnostics etcdwritevolume [options]

Options:
  -l, --diaglevel=1: Level of diagnostic output: 4: Error, 3: Warn, 2: Notice, 1: Info, 0: Debug
      --duration='1m': How long to perform the write test
      --host=false: If true, look for systemd and journald units even without master/node config
      --loglevel=0: Set the level of log output (0-10)
      --logspec='': Set per module logging with file|pattern=LEVEL,...
      --master-config='': Path to master config file (implies --host)
      --node-config='': Path to node config file (implies --host)
```

```
$ oc adm diagnostics NetworkCheck --help
Runs the NetworkCheck diagnostic.

Create a pod on all schedulable nodes and run network diagnostics from the application standpoint

Aliases:
networkcheck, NetworkCheck

Usage:
  oc adm diagnostics networkcheck [options]

Options:
      --cluster-context='': Client context to use for cluster administrator
      --config='': Path to the config file to use for CLI requests.
      --context='': The name of the kubeconfig context to use
  -l, --diaglevel=1: Level of diagnostic output: 4: Error, 3: Warn, 2: Notice, 1: Info, 0: Debug
      --logdir='/tmp/openshift/': Path to store diagnostic results in case of errors
      --loglevel=0: Set the level of log output (0-10)
      --logspec='': Set per module logging with file|pattern=LEVEL,...
      --pod-image='openshift/origin:v3.9.0-alpha.0': Image to use for diagnostic pod
      --prevent-modification=false: If true, may be set to prevent diagnostics making any changes via the API
      --test-pod-image='openshift/origin-deployer:v3.9.0-alpha.0': Image to use for diagnostic test pod
      --test-pod-port=8080: Serving port on the diagnostic test pod
      --test-pod-protocol='TCP': Protocol used to connect to diagnostic test pod
```

sosiouxme requested review from smarterclayton, pravisankar, Miciah and deads2k September 27, 2017 21:45

openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 27, 2017

openshift-merge-robot assigned rajatchopra and soltysh Sep 27, 2017

openshift-ci-robot unassigned soltysh Sep 28, 2017

openshift-ci-robot assigned fabianofranz Oct 2, 2017

sosiouxme mentioned this pull request Oct 3, 2017

AppCreate diagnostic #16658

Merged

10 tasks

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 22, 2017

openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 24, 2017

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2017

sosiouxme force-pushed the 20170927-diagnostic-fixes-and-parameters branch from 820a237 to 256a8c6 Compare November 24, 2017 22:47

openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 24, 2017

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2017

sosiouxme force-pushed the 20170927-diagnostic-fixes-and-parameters branch from 256a8c6 to c7cf6c8 Compare November 30, 2017 21:11

openshift deleted a comment from openshift-merge-robot Nov 30, 2017

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 30, 2017

sosiouxme force-pushed the 20170927-diagnostic-fixes-and-parameters branch from c7cf6c8 to c92f5df Compare December 13, 2017 20:02

openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 13, 2017

diagnostics: correctly use cluster-context if specified

e7bbaf9

If --cluster-context is specified and the context is present, use it as the cluster-admin. The logic was backward and this gave an error before.

sosiouxme force-pushed the 20170927-diagnostic-fixes-and-parameters branch from c92f5df to d869499 Compare December 13, 2017 20:10

sosiouxme added 3 commits December 13, 2017 19:16

diagnostics README: correct package and command

2e4c2ab

diagnostics: enable per-diagnostic parameters

228bc4b

Adds the ability to specify parameters for individual diagnostics on the command line (without proliferating flags). Addresses openshift#14640

diagnostics: in-pod command now openshift-diagnostics

4050666

sosiouxme force-pushed the 20170927-diagnostic-fixes-and-parameters branch from d869499 to 4050666 Compare December 14, 2017 00:17

openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 14, 2017

sosiouxme mentioned this pull request Dec 14, 2017

diagnostics: individual parameters #17773

Merged

sosiouxme closed this Dec 14, 2017
