

@jsquyres
Member

This is the v5.0.x PR corresponding to the main PR #10763

Convert the MCA parameter "opal_mca_base_component_show_load_errors"
to be a flexible mechanism to specify when (and when not) to emit
warnings about errors when trying to load DSO components.

1. Convert the existing MCA parameter
   opal_mca_base_component_show_load_errors from a boolean to a
   string.  It will still accept all prior valid boolean values, but
   it will also accept a comma-delimited list of "framework[/component]"
   tokens.  If the MCA base encounters an error when loading a DSO,
   opal_mca_base_component_show_load_errors is checked to see if a
   warning should be emitted.

   - If the value is boolean true or the string "all", then emit a
     warning
   - If the value is boolean false or the string "none", then do not
     emit a warning
   - If the value is a comma-delimited list of tokens: emit a warning
     about any dynamic component that fails to open and matches a
     token in the list.  "Match" is defined as:
     - If a token in the list is only a framework name, then any
       component in that framework will match.
     - If a token in the list specifies both a framework name and a
       component name (in the form ``framework/component``), then only
       the specified component in the specified framework will match.
   - The value can also be a "^" character followed by a
     comma-delimited list of "framework[/component]" values: This is
     similar to the comma-delimited list of tokens, except it will
     only emit warnings about dynamic components that fail to load and
     do *not* match a token in the list.

   *NOTE*: The equivalence of "all" with boolean true values and of
           "none" with boolean false values is intended only as a
           backward compatibility mechanism, since prior to this
           commit, opal_mca_base_component_show_load_errors was a
           boolean value.  It is not intended as a general mechanism
           that should be copied to all other include/exclude-type MCA
           params.

2. Remove the configure option --enable-show-load-errors-by-default
   and replace it with --with-show-load-errors[=value].  The value
   specified will become the default value of the
   opal_mca_base_component_show_load_errors MCA variable (which
   defaults to "all").

   The change in the configure option's name is intentional.  The MCA
   parameter previously accepted only boolean values; the new
   --with-show-load-errors name reflects that it now accepts more
   than just boolean values.

The rationale for this commit is to allow packagers more granular
control over whether to warn about component DSO load failures or not.

The canonical example of where this is useful is accelerator
libraries: since accelerators are expensive, they may only be
available on a subset of nodes in a given HPC environment.
Consequently, the accelerators' support libraries may only be
installed on the nodes that actually have accelerators physically
present, so accelerator DSO components fail to load on every other
node.  In such an environment, an administrator or packager may wish
to configure Open MPI:

1. With accelerator components built as DSOs.
2. Without warnings about accelerator DSO component load failures.

For example:

```
./configure --enable-mca-dso=accelerator ...
make install
mpirun --mca opal_mca_base_component_show_load_errors '^accelerator' ...
```

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit 20bbf27)
@jsquyres
Member Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@awlauria merged commit a0f66b2 into open-mpi:v5.0.x on Sep 23, 2022
@jsquyres deleted the pr/v5.0.x/show-load-errors----or-not branch on September 23, 2022 at 15:58