Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

individual diagnostic parameters #16589

Conversation

sosiouxme
Copy link
Member

This addresses #14640 by giving individual diagnostics the ability to receive parameters without defining any more flags (need this anyway for the app-create diagnostic I'm working on). It also deprecates the existing flags for NetworkCheck as well as the config envvar for EtcdWriteVolume in favor of the new parameter scheme.

$ oc adm diagnostics -h
This utility helps troubleshoot and diagnose known problems. It runs diagnostics using a client and/or the state of a
running master / node host. 

...

Diagnostics may be individually run by passing diagnostic name(s) as arguments. 

  oc adm diagnostics DiagnosticName
  
Diagnostic parameters may be given values similarly: 

  oc adm diagnostics DiagnosticName.parameter=value
  
...

Options:
      --cluster-context='': Client context to use for cluster administrator
      --config='': Path to the config file to use for CLI requests.
      --context='': The name of the kubeconfig context to use
  -l, --diaglevel=1: Level of diagnostic output: 4: Error, 3: Warn, 2: Notice, 1: Info, 0: Debug
      --host=false: If true, look for systemd and journald units even without master/node config
      --images='openshift/origin-${component}:${version}': Image template for DiagnosticPod to use in creating a pod
      --latest-images=false: If true, when expanding the image template, use latest version, not release version
      --list-params=false: List parameters for specified diagnostic(s) and exit
      --loglevel=0: Set the level of log output (0-10)
      --logspec='': Set per module logging with file|pattern=LEVEL,...
      --master-config='': Path to master config file (implies --host)
      --node-config='': Path to node config file (implies --host)
      --prevent-modification=false: If true, may be set to prevent diagnostics making any changes via the API

$ oc adm diagnostics  NetworkCheck --list-params
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/home/lmeyer/.kube/config'

Info:  Valid parameters for NetworkCheck are:
         logDir: Path to store diagnostic results in case of errors (default /tmp/openshift/)
         podImage: Image to use for diagnostic pod (default openshift/origin:v3.7.0-alpha.1)
         podPort: Serving port on the diagnostic test pod (default 8080)
         podProtocol: Protocol used to connect to diagnostic test pod (default TCP)
         testPodImage: Image to use for diagnostic test pod (default openshift/origin-deployer:v3.7.0-alpha.1)

$ oc adm diagnostics  EtcdWriteVolume EtcdWriteVolume.duration=10s
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'

[Note] Running diagnostic: EtcdWriteVolume
       Description: Check the volume of writes against etcd and classify them by operation and key for 10s
       
Info:  Measured 0.7 writes/sec
       /                                                                                      7 100.0%
       /v3:PUT                                                                                7 100.0%

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 27, 2017
@sosiouxme
Copy link
Member Author

Seems odd that I'm in pkg/diagnostics/OWNERS but not pkg/oc/admin/diagnostics/OWNERS

@soltysh
Copy link
Contributor

soltysh commented Sep 28, 2017

/unassign

@sosiouxme
Copy link
Member Author

@smarterclayton just wanted your feedback on the UI for this:

$ oc adm diagnostics EtcdWriteVolume EtcdWriteVolume.duration=10s

A bit awkward to have to put the diagnostic name in twice but I wanted to be able to distinguish between "run this diagnostic with a parameter set" and "run all the diagnostics, with a parameter set on one."

@sosiouxme
Copy link
Member Author

sosiouxme commented Oct 2, 2017

@deads2k it's a big PR, but maybe a summary will be enough to tell me if I'm doing it all wrong. Diagnostic initialization for diagnostics that take non-flag parameters have a two-step process. As before, the diagnostic struct is created with the flag contents and clients and such. Then non-flag parameters are set via the diagnostic's SetParameters() method which has an opportunity to complete the diagnostic struct or report an error. Then it runs as before. This seemed like the best solution for generic parameter logic allowing individual implementation of the actual parameters. Thoughts?

@deads2k
Copy link
Contributor

deads2k commented Oct 2, 2017

/assign fabianofranz

@openshift/cli-review If I'm reading this correctly, this is establishing a separate way to pass parameters to subcommands.

@juanvallejo
Copy link
Contributor

@fabianofranz I think the ui proposed by this PR is the most approachable (from a user standpoint) for now, considering the use-case the command is trying to solve. Would still like to wait for your feedback / thoughts on this

@sosiouxme
Copy link
Member Author

oc adm diagnostics may be unique in that we want to have parameters for each of a list of arguments. (They're not strictly subcommands as you generally run more than one at a time.)

@sosiouxme sosiouxme mentioned this pull request Oct 3, 2017
10 tasks
@sosiouxme
Copy link
Member Author

@fabianofranz bump. Planning to demo this tomorrow, would be nice if I had some feedback on the direction.

@sosiouxme
Copy link
Member Author

@fabianofranz poke again.

@pravisankar
Copy link

Specifying diagnostics name for every param is bit ugly (seems a lot of duplication) and passing many params is painful. I think passing diagnostic checks along with their tunable options on CLI is not scalable.

Another approach could be:

  • Support oadm diagnostics --get-available-checks
    This returns all available diagnostic checks along with their tunable options in json or yaml format, something like:
CommonOptions:
  logLevel: 0
...
DiagnosticChecks:
  - NetworkCheck:
      logDir: "/tmp/openshift"
      diagnosticPodImage: "openshift/origin:v3.7.0-alpha.1"
      testPodImage: "openshift/origin-deployer:v3.7.0-alpha.1"
      testPodPort: 8080
...
  - MastConfigCheck:
      masterConfig: ''
...

This is like giving the user a basic template for pick and choose if they don't want to run all diagnostic checks or want to specify optional parameters.

  • Support oadm diagnostics --run-checks=<template-file>
    This runs selected diagnostic checks with corresponding params present in the given template file.

@sosiouxme
Copy link
Member Author

Thanks, @pravisankar, that's a pretty good idea.

Passing a lot of params on the command line would definitely be painful. On the other hand, for the more likely use case of modifying just one or two parameters (certainly the goal is to be useful with no parameters), having to know about and generate and edit this config file is also relatively painful.

So, would like to hear what others prefer.

@pravisankar
Copy link

@sosiouxme
Fair, If we need to change couple of params then template file is also painful/overkill.

Trying one more approach:
Diagnostic checks can grow in the future (already we have many) and providing options in a single command won't scale. Why not we treat each diagnostic check as a subcommand?
Basically we could support two patterns:

  • oadm diagnostics [optional-common-flags]
    Runs all diagnostic checks with default params (assuming this will be used more frequently than with custom options)

  • oadm diagnostics <single-diagnostic-check> [optional-diagnostic-flags]
    Runs the selected diagnostic check with the given options.
    Note: Passing multiple diagnostic checks in the command line is not allowed in this case.

Allowing only these 2 patterns will remove the ambiguity for the params. User may need to run multiple diag commands in case of custom options but I think that's a good compromise considering the simplicity and scalability of this pattern.

@sosiouxme
Copy link
Member Author

I could see it ending up that way. It would certainly be more consistent with how everything else works. I'm a bit reluctant though to give up the ability to run all the diagnostics at once with just a tweak or two, and without having to keep up with the expanding list of names. My dilemma is I want it to be trivial for the casual user and also completely customizable for production use.

I was thinking of combining your config file idea with this CLI implementation. So when the user gets to where they have four or five parameters to specify and that's getting tedious, they can transition to generating and using a config file.

@fabianofranz
Copy link
Member

Seems odd that I'm in pkg/diagnostics/OWNERS but not pkg/oc/admin/diagnostics/OWNERS

@sosiouxme open a separate PR adding yourself as a reviewer and I'll approve.

@fabianofranz
Copy link
Member

At first sight I'd say we need to convert every diagnostic name into a subcommand and make every possible parameter in a diagnostic a proper flag of that subcommand, like suggested by @pravisankar. You can still run the top-level cmd to run all diagnostics and subcommands can inherit generic flags from the top-level one, and although very verbose, you can always have a one-liner that runs all of them once, like oc adm diagnostics diag1 && oc adm diagnostics diag2 && ....

Although a possible solution, the concern I have about having a config file is that it's not the representation of any existing resource, right? For example, it's not the representation of a given resource in JSON or YAML format, the same way used to interact with the API, or accepted by oc create, etc. It would be a new file format specifically and only targeted at this command.

But I totally understand the "run all diagnostics tweaking just a couple" use case. Let me think a bit if I can find another solution.

@fabianofranz
Copy link
Member

@sosiouxme How many diagnostic names and parameter names we have? Do we expect them to change often? I ask because, on the other hand, turning them into subcommands and flags automatically make them "tied" to the CLI and subject to the deprecation policy when changes are needed.

@pravisankar
Copy link

@fabianofranz
Currently we have 18 diagnostic names and 17 param names (common and diag specific). There is scope for adding more diags but I think mostly that will be additive. I doubt these old diags will change often but I will leave that to @sosiouxme

@sosiouxme
Copy link
Member Author

@fabianofranz FWIW I was planning on moving / removing a number of them and deprecating them here. But after that, yes, will be mostly additive, with the occasional retiring of any that become irrelevant.

@sosiouxme
Copy link
Member Author

@fabianofranz any further thoughts? Need to move forward in some direction for the next check I'm writing :)

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 22, 2017
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 24, 2017
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2017
@sosiouxme sosiouxme force-pushed the 20170927-diagnostic-fixes-and-parameters branch from 820a237 to 256a8c6 Compare November 24, 2017 22:47
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 24, 2017
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2017
@fabianofranz
Copy link
Member

Sorry about the delay, I researched a few other CLI patterns but didn't find anything that addresses this use case nicely enough.

I'd suggest that we move on with turning diagnostics into subcommands and parameters into flags. It fits better with what users already know and will allow us to grow consistently. Add a glog.V to print a CLI line that runs all diagnostics invoking each one individually, so that users can copy and tweak if needed.

One idea came to mind though: have a all subcommand that registers every flag from every other subcommand. Users could invoke it with oadm diagnostics all and tweak just a couple parameters by using the flags. However we'd need to deprecate running all diagnostics with pure oadm diagnostics and make it do something else, like print help or a more specific message. Might be a traumatic change for existing users.

@sosiouxme
Copy link
Member Author

@fabianofranz thanks, I think that makes sense.

Here's a thought for how to manage the "all" cases.

oadm diagnostics could continue to run all diagnostics, just with default parameters. Existing diagnostic-specific flags could be deprecated and hidden until they can be removed. Its output could also point the way to more sophisticated usage.

oadm diagnostics all could be as you say, a bucket for every single possible diagnostic-specific flag, probably "scoped" by diagnostic name so e.g. --diag1-foobar and --diag2-foobar. That way you can run all with just a tweak or two.

@sosiouxme sosiouxme force-pushed the 20170927-diagnostic-fixes-and-parameters branch from 256a8c6 to c7cf6c8 Compare November 30, 2017 21:11
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sosiouxme
We suggest the following additional approver: fabianofranz

Assign the PR to them by writing /assign @fabianofranz in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 30, 2017
@sosiouxme
Copy link
Member Author

@fabianofranz @juanvallejo One further thought. Currently you could do:

$ oc adm diagnostics Foo Bar Baz

... to run multiple named diagnostics and no others. Making them subcommands will break that - do we have any obligation to address that somehow?

@sosiouxme sosiouxme force-pushed the 20170927-diagnostic-fixes-and-parameters branch from c7cf6c8 to c92f5df Compare December 13, 2017 20:02
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 13, 2017
If --cluster-context is specified and the context is present, use it as
the cluster-admin. The logic was backward and this gave an error before.
@sosiouxme sosiouxme force-pushed the 20170927-diagnostic-fixes-and-parameters branch from c92f5df to d869499 Compare December 13, 2017 20:10
Adds the ability to specify parameters for individual diagnostics on the
command line (without proliferating flags).

Addresses openshift#14640
@sosiouxme sosiouxme force-pushed the 20170927-diagnostic-fixes-and-parameters branch from d869499 to 4050666 Compare December 14, 2017 00:17
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 14, 2017
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 14, 2017

@sosiouxme: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/extended_conformance_crio c7cf6c8 link /test crio
ci/openshift-jenkins/verify 4050666 link /test verify
ci/openshift-jenkins/cmd 4050666 link /test cmd

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sosiouxme
Copy link
Member Author

Starting fresh with #17773

@sosiouxme sosiouxme closed this Dec 14, 2017
openshift-merge-robot added a commit that referenced this pull request Jan 23, 2018
Automatic merge from submit-queue.

diagnostics: individual parameters

Updated version of #16589 based on feedback.

This addresses #14640 by making individual diagnostics into subcommands that can have their own flags. Existing top-level flags for `NetworkCheck` are removed and the config envvar for `EtcdWriteVolume` is deprecated in favor of a flag. All individual flags are available underneath the `all` subcommand.

This required rather more refactoring as the flags had to be known in order to define the command, not just at runtime. Usages are given below:

```
$ oc adm diagnostics --help
This utility helps troubleshoot and diagnose known problems for an OpenShift cluster and/or local host. The base command
runs a standard set of diagnostics: 

  oc adm diagnostics
  
[...]

An individual diagnostic may be run as a subcommand which may have flags for specifying options specific to that
diagnostic. 

Finally, the "all" subcommand runs all available diagnostics (including heavyweight ones skipped in the standard set)
and provides all individual diagnostic flags.

Usage:
  oc adm diagnostics [options]

Available Commands:
  aggregatedlogging          Check aggregated logging integration for proper configuration
  all                        Diagnose common cluster problems
[...]
  unitstatus                 Check status for related systemd units

Options:
      --cluster-context='': Client context to use for cluster administrator
      --config='': Path to the config file to use for CLI requests.
      --context='': The name of the kubeconfig context to use
  -l, --diaglevel=1: Level of diagnostic output: 4: Error, 3: Warn, 2: Notice, 1: Info, 0: Debug
      --host=false: If true, look for systemd and journald units even without master/node config
      --loglevel=0: Set the level of log output (0-10)
      --logspec='': Set per module logging with file|pattern=LEVEL,...
      --master-config='': Path to master config file (implies --host)
      --node-config='': Path to node config file (implies --host)
      --prevent-modification=false: If true, may be set to prevent diagnostics making any changes via the API
```

(Note `all` is now intermingled with the individual subcommands.)

```
$ oc adm diagnostics all --help
This utility helps troubleshoot and diagnose known problems for an OpenShift cluster and/or local host. This subcommand
exists to run all available diagnostics: 

  oc adm diagnostics all
  
Available diagnostics vary based on client config and local OpenShift host config. All flags from the base command work
similarly here, but all possible flags for individual diagnostics are also available.

Usage:
  oc adm diagnostics all [options]

Options:
      --cluster-context='': Client context to use for cluster administrator
      --config='': Path to the config file to use for CLI requests.
      --context='': The name of the kubeconfig context to use
  -l, --diaglevel=1: Level of diagnostic output: 4: Error, 3: Warn, 2: Notice, 1: Info, 0: Debug
      --diagnosticpod-images='openshift/origin-${component}:${version}': Image template to use in creating a pod
      --diagnosticpod-latest-images=false: If true, when expanding the image template, use latest version, not release
version
      --etcdwritevolume-duration='1m': How long to perform the write test
      --host=false: If true, look for systemd and journald units even without master/node config
      --loglevel=0: Set the level of log output (0-10)
      --logspec='': Set per module logging with file|pattern=LEVEL,...
      --master-config='': Path to master config file (implies --host)
      --networkcheck-logdir='/tmp/openshift/': Path to store diagnostic results in case of errors
      --networkcheck-pod-image='openshift/origin:v3.9.0-alpha.0': Image to use for diagnostic pod
      --networkcheck-test-pod-image='openshift/origin-deployer:v3.9.0-alpha.0': Image to use for diagnostic test pod
      --networkcheck-test-pod-port=8080: Serving port on the diagnostic test pod
      --networkcheck-test-pod-protocol='TCP': Protocol used to connect to diagnostic test pod
      --node-config='': Path to node config file (implies --host)
      --prevent-modification=false: If true, may be set to prevent diagnostics making any changes via the API
```

```
$ oc adm diagnostics EtcdWriteVolume --help
Runs the EtcdWriteVolume diagnostic. 

Check the volume of writes against etcd over a time period and classify them by operation and key

Aliases:
etcdwritevolume, EtcdWriteVolume

Usage:
  oc adm diagnostics etcdwritevolume [options]

Options:
  -l, --diaglevel=1: Level of diagnostic output: 4: Error, 3: Warn, 2: Notice, 1: Info, 0: Debug
      --duration='1m': How long to perform the write test
      --host=false: If true, look for systemd and journald units even without master/node config
      --loglevel=0: Set the level of log output (0-10)
      --logspec='': Set per module logging with file|pattern=LEVEL,...
      --master-config='': Path to master config file (implies --host)
      --node-config='': Path to node config file (implies --host)
```

```
$ oc adm diagnostics NetworkCheck --help
Runs the NetworkCheck diagnostic. 

Create a pod on all schedulable nodes and run network diagnostics from the application standpoint

Aliases:
networkcheck, NetworkCheck

Usage:
  oc adm diagnostics networkcheck [options]

Options:
      --cluster-context='': Client context to use for cluster administrator
      --config='': Path to the config file to use for CLI requests.
      --context='': The name of the kubeconfig context to use
  -l, --diaglevel=1: Level of diagnostic output: 4: Error, 3: Warn, 2: Notice, 1: Info, 0: Debug
      --logdir='/tmp/openshift/': Path to store diagnostic results in case of errors
      --loglevel=0: Set the level of log output (0-10)
      --logspec='': Set per module logging with file|pattern=LEVEL,...
      --pod-image='openshift/origin:v3.9.0-alpha.0': Image to use for diagnostic pod
      --prevent-modification=false: If true, may be set to prevent diagnostics making any changes via the API
      --test-pod-image='openshift/origin-deployer:v3.9.0-alpha.0': Image to use for diagnostic test pod
      --test-pod-port=8080: Serving port on the diagnostic test pod
      --test-pod-protocol='TCP': Protocol used to connect to diagnostic test pod
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants