39 changes: 19 additions & 20 deletions config/opal_configure_options.m4
@@ -10,7 +10,7 @@ dnl Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
 dnl University of Stuttgart. All rights reserved.
 dnl Copyright (c) 2004-2005 The Regents of the University of California.
 dnl All rights reserved.
-dnl Copyright (c) 2006-2020 Cisco Systems, Inc. All rights reserved
+dnl Copyright (c) 2006-2022 Cisco Systems, Inc. All rights reserved
 dnl Copyright (c) 2007 Sun Microsystems, Inc. All rights reserved.
 dnl Copyright (c) 2009 IBM Corporation. All rights reserved.
 dnl Copyright (c) 2009 Los Alamos National Security, LLC. All rights
@@ -327,25 +327,24 @@ fi
 #
 
 AC_MSG_CHECKING([for default value of mca_base_component_show_load_errors])
-AC_ARG_ENABLE([show-load-errors-by-default],
-[AS_HELP_STRING([--enable-show-load-errors-by-default],
-[Set the default value for the MCA parameter
-mca_base_component_show_load_errors (but can be
-overridden at run time by the usual
-MCA-variable-setting mechanism). This MCA variable
-controls whether warnings are displayed when an MCA
-component fails to load at run time due to an error.
-(default: enabled, meaning that
-mca_base_component_show_load_errors is enabled
-by default])])
-if test "$enable_show_load_errors_by_default" = "no" ; then
-OPAL_SHOW_LOAD_ERRORS_DEFAULT=0
-AC_MSG_RESULT([disabled by default])
-else
-OPAL_SHOW_LOAD_ERRORS_DEFAULT=1
-AC_MSG_RESULT([enabled by default])
-fi
-AC_DEFINE_UNQUOTED(OPAL_SHOW_LOAD_ERRORS_DEFAULT, $OPAL_SHOW_LOAD_ERRORS_DEFAULT,
+AC_ARG_WITH([show-load-errors],
+[AS_HELP_STRING([--with-show-load-errors],
+[Set the default value for the MCA
+parameter
+mca_base_component_show_load_errors (but
+can be overridden at run time by the usual
+MCA-variable-setting mechanism).
+(default: "all")])])
+
+AS_IF([test -z "$with_show_load_errors" -o "$with_show_load_errors" = "yes"],
+[with_show_load_errors=all
+AC_MSG_RESULT([enabled for all])],
+[AS_IF([test "$with_show_load_errors" = "no"],
+[with_show_load_errors=none
+AC_MSG_RESULT([disabled for all])],
+[AC_MSG_RESULT([$with_show_load_errors])])])
+
+AC_DEFINE_UNQUOTED(OPAL_SHOW_LOAD_ERRORS_DEFAULT, ["$with_show_load_errors"],
 [Default value for mca_base_component_show_load_errors MCA variable])


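The nested AS_IF in the new configure logic normalizes the --with-show-load-errors argument before it is baked into OPAL_SHOW_LOAD_ERRORS_DEFAULT: no option at all, or a bare --with-show-load-errors (which Autoconf reports as "yes"), becomes "all"; --without-show-load-errors ("no") becomes "none"; anything else is passed through verbatim. The same mapping is restated below in C purely as an illustration. configure itself runs this logic in shell, and the function name normalize_show_load_errors_arg is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the configure-time mapping:
 *   unset / empty / "yes"  -> "all"
 *   "no"                   -> "none"
 *   anything else          -> passed through unchanged */
const char *normalize_show_load_errors_arg(const char *arg)
{
    if (arg == NULL || arg[0] == '\0' || strcmp(arg, "yes") == 0) {
        return "all";
    }
    if (strcmp(arg, "no") == 0) {
        return "none";
    }
    return arg; /* e.g. "accelerator,btl/uct" or "^btl/uct" */
}
```

Passing values through unchanged means configure does not validate framework or component names; any syntax checking is left to run time.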
63 changes: 63 additions & 0 deletions docs/running-apps/tuning.rst
@@ -445,3 +445,66 @@ presented here so that they can easily be found via internet searches:
   .. important:: You can only use the "include" *or* the "exclude"
      parameter |mdash| they are mutually exclusive from each
      other.
+
+* ``opal_mca_base_component_show_load_errors``: By default, Open MPI
+  emits a warning message if it fails to open a DSO component at run
+  time. This typically happens when a shared library that the DSO
+  requires is not available.
+
+  .. admonition:: Rationale
+     :class: tip
+
+     In prior versions of Open MPI, components defaulted to building
+     as DSOs (vs. being included in their parent libraries, such as
+     ``libmpi.so``). On misconfigured systems, network acceleration
+     libraries were sometimes missing, so HPC-class networking
+     components failed to open at run time. Open MPI would then
+     typically fall back to TCP as a network transport, which usually
+     led to poor performance of end-user applications.
+
+     Having Open MPI warn about such failures to load was useful
+     because it alerted users to the misconfiguration.
+
+  .. note:: By default, Open MPI |ompi_ver| does not build its
+     components as DSOs; it includes all of them in its base
+     libraries (e.g., on Linux, ``libmpi.so`` contains every
+     component that was built with Open MPI, so no component needs
+     to be opened dynamically).
+
+     This MCA parameter *only* affects what happens when a component
+     DSO fails to open.
+
+  This MCA parameter can take four general values:
+
+  #. ``all``, ``yes``, or a boolean "true" value (e.g., ``1``):
+     Open MPI will emit a warning about every component DSO that
+     fails to load.
+
+  #. ``none``, ``no``, or a boolean "false" value (e.g., ``0``):
+     Open MPI will never emit warnings about component DSOs that fail
+     to load.
+
+  #. A comma-delimited list of frameworks and/or components: Open MPI
+     will emit a warning about any dynamic component that fails to
+     open and matches a token in the list. "Match" is defined as:
+
+     * If a token in the list is only a framework name, then any
+       component in that framework will match.
+     * If a token in the list specifies both a framework name and a
+       component name (in the form ``framework/component``), then
+       only the specified component in the specified framework will
+       match.
+
+     For example, if the value of this MCA parameter is
+     ``accelerator,btl/uct``, then Open MPI will warn if any component
+     in the accelerator framework or the UCT BTL fails to load at run
+     time.
+
+  #. A ``^`` character followed by a comma-delimited list of
+     ``framework[/component]`` values: this is similar to the
+     comma-delimited list of tokens, except that Open MPI will only
+     emit warnings about dynamic components that fail to load and do
+     *not* match a token in the list.
+
+     For example, if the value of this MCA parameter is
+     ``^accelerator,btl/uct``, then Open MPI will only warn about the
+     failure to load DSOs that are neither in the accelerator
+     framework nor the UCT BTL.
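The matching rules documented above can be condensed into a single predicate. The following is a minimal sketch, not the code added in this PR: the names show_load_error_for and token_matches are hypothetical, the exact set of accepted "always"/"never" spellings is an assumption, and whitespace around tokens is not trimmed.

```c
#include <stdbool.h>
#include <string.h>

/* Does one token (tok[0..len), not NUL-terminated) select this component?
 * "framework" matches every component in that framework;
 * "framework/component" matches exactly one component. */
static bool token_matches(const char *tok, size_t len,
                          const char *framework, const char *component)
{
    const char *slash = memchr(tok, '/', len);
    size_t fw_len = (slash == NULL) ? len : (size_t)(slash - tok);

    if (strlen(framework) != fw_len || memcmp(tok, framework, fw_len) != 0) {
        return false;
    }
    if (slash == NULL) {
        return true; /* framework-only token */
    }
    size_t comp_len = len - fw_len - 1;
    return strlen(component) == comp_len &&
           memcmp(slash + 1, component, comp_len) == 0;
}

/* Hypothetical predicate: should a failure to load framework/component
 * produce a warning, given the value of the MCA parameter? */
bool show_load_error_for(const char *value, const char *framework,
                         const char *component)
{
    /* Assumed spellings of the "always" and "never" values. */
    static const char *const always[] = {"all", "yes", "true", "1"};
    static const char *const never[] = {"none", "no", "false", "0"};
    for (size_t i = 0; i < sizeof(always) / sizeof(always[0]); ++i) {
        if (strcmp(value, always[i]) == 0) {
            return true;
        }
        if (strcmp(value, never[i]) == 0) {
            return false;
        }
    }

    bool negate = (value[0] == '^'); /* "^list": warn if NOT listed */
    const char *p = negate ? value + 1 : value;
    bool matched = false;
    for (;;) {
        size_t len = strcspn(p, ","); /* length of the current token */
        if (len > 0 && token_matches(p, len, framework, component)) {
            matched = true;
            break;
        }
        if (p[len] == '\0') {
            break; /* that was the last token */
        }
        p += len + 1; /* step over the comma */
    }
    return negate ? !matched : matched;
}
```

With the value ^accelerator,btl/uct, for instance, this predicate reports a failure to load btl/tcp but stays silent for btl/uct and for any accelerator component, matching the last example in the list.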
8 changes: 6 additions & 2 deletions opal/mca/base/base.h
@@ -10,7 +10,7 @@
  * University of Stuttgart. All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
- * Copyright (c) 2009 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2009-2022 Cisco Systems, Inc. All rights reserved
  * Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2015 Research Organization for Information Science
@@ -69,7 +69,7 @@ OPAL_DECLSPEC OBJ_CLASS_DECLARATION(mca_base_component_priority_list_item_t);
  * Public variables
  */
 OPAL_DECLSPEC extern char *mca_base_component_path;
-OPAL_DECLSPEC extern bool mca_base_component_show_load_errors;
+OPAL_DECLSPEC extern char *mca_base_component_show_load_errors;
 OPAL_DECLSPEC extern bool mca_base_component_track_load_errors;
 OPAL_DECLSPEC extern bool mca_base_component_disable_dlopen;
 OPAL_DECLSPEC extern char *mca_base_system_default_path;
@@ -214,6 +214,10 @@ OPAL_DECLSPEC int mca_base_framework_components_register(struct mca_base_framewo
 mca_base_register_flag_t flags);
 
 /* mca_base_components_open.c */
+OPAL_DECLSPEC int mca_base_show_load_errors_init(void);
+OPAL_DECLSPEC int mca_base_show_load_errors_finalize(void);
+OPAL_DECLSPEC bool mca_base_show_load_errors(const char *framework_name,
+ const char *component_name);
 OPAL_DECLSPEC int mca_base_framework_components_open(struct mca_base_framework_t *framework,
 mca_base_open_flag_t flags);
 
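The header now declares an init/query/finalize trio for show_load_errors, and mca_base_close() calls the finalize step. A plausible reason for this shape is to parse the MCA string once so that each per-component query is a cheap lookup. The sketch below illustrates that idea under stated assumptions: all sketch_* names are hypothetical, the value is passed to init as an argument rather than read from the MCA variable, and the "^" and boolean forms are omitted for brevity.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical module state, valid between init and finalize. */
static char **show_tokens = NULL; /* "framework" or "framework/component" */
static size_t show_num_tokens = 0;
static bool show_all = false;

/* Release everything init allocated; safe to call more than once. */
void sketch_show_load_errors_finalize(void)
{
    for (size_t i = 0; i < show_num_tokens; ++i) {
        free(show_tokens[i]);
    }
    free(show_tokens);
    show_tokens = NULL;
    show_num_tokens = 0;
    show_all = false;
}

/* Parse the value once: "all", "none", or "tok,tok,...". */
int sketch_show_load_errors_init(const char *value)
{
    sketch_show_load_errors_finalize(); /* drop any previous state */
    if (strcmp(value, "all") == 0) {
        show_all = true;
        return 0;
    }
    if (strcmp(value, "none") == 0) {
        return 0; /* empty token list: never warn */
    }

    size_t cap = 1; /* number of commas + 1 bounds the token count */
    for (const char *q = value; *q != '\0'; ++q) {
        cap += (*q == ',');
    }
    show_tokens = calloc(cap, sizeof(*show_tokens));
    if (show_tokens == NULL) {
        return -1;
    }

    const char *p = value;
    for (;;) {
        size_t len = strcspn(p, ","); /* length of the current token */
        if (len > 0) {                /* skip empty tokens such as ",," */
            char *tok = malloc(len + 1);
            if (tok == NULL) {
                sketch_show_load_errors_finalize();
                return -1;
            }
            memcpy(tok, p, len);
            tok[len] = '\0';
            show_tokens[show_num_tokens++] = tok;
        }
        if (p[len] == '\0') {
            break;
        }
        p += len + 1; /* step over the comma */
    }
    return 0;
}

/* Per-component query: a linear scan of the pre-parsed tokens. */
bool sketch_show_load_errors(const char *framework, const char *component)
{
    if (show_all) {
        return true;
    }
    size_t fw_len = strlen(framework);
    for (size_t i = 0; i < show_num_tokens; ++i) {
        const char *tok = show_tokens[i];
        if (strncmp(tok, framework, fw_len) != 0) {
            continue;
        }
        if (tok[fw_len] == '\0') {
            return true; /* "framework" token */
        }
        if (tok[fw_len] == '/' && strcmp(tok + fw_len + 1, component) == 0) {
            return true; /* "framework/component" token */
        }
    }
    return false;
}
```

Making finalize idempotent lets init reuse it both to reset state and to clean up after a partial allocation failure.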
22 changes: 21 additions & 1 deletion opal/mca/base/help-mca-base.txt
@@ -10,7 +10,7 @@
 # University of Stuttgart. All rights reserved.
 # Copyright (c) 2004-2005 The Regents of the University of California.
 # All rights reserved.
-# Copyright (c) 2008-2014 Cisco Systems, Inc. All rights reserved.
+# Copyright (c) 2008-2022 Cisco Systems, Inc. All rights reserved
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -59,3 +59,23 @@ all components *except* a and b", while "c,d" specifies the inclusive
 behavior and means "use *only* components c and d."
 
 You cannot mix inclusive and exclusive behavior.
+#
+[internal error during init]
+An internal error has occurred during the startup of Open MPI. This
+is highly unusual and shouldn't happen. Open MPI will now abort your
+job.
+
+The following message may provide additional insight into the error:
+
+Failure at: %s (%s:%d)
+Error: %d (%s)
+#
+[show_load_errors: too many /]
+The opal_mca_base_component_show_load_errors MCA variable cannot
+contain a token that has more than one "/" character in it.
+
+The opal_mca_base_component_show_load_errors MCA variable can only
+contain the values: all, none, or a comma-delimited list of tokens in
+the form of "framework" or "framework/component".
+
+Erroneous value: %s
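The new "show_load_errors: too many /" help entry implies a syntax check on each token. A hedged sketch of such a check follows; the function name is hypothetical, and skipping a leading "^" is an assumption based on the "^" syntax described in tuning.rst.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical validity check: every comma-delimited token may contain
 * at most one '/' (i.e. "framework" or "framework/component"). */
bool show_load_errors_value_is_valid(const char *value)
{
    const char *p = (value[0] == '^') ? value + 1 : value;
    int slashes = 0; /* '/' count within the current token */
    for (; *p != '\0'; ++p) {
        if (*p == ',') {
            slashes = 0; /* a new token starts after each comma */
        } else if (*p == '/' && ++slashes > 1) {
            return false; /* second '/' in the same token */
        }
    }
    return true;
}
```

A caller that gets false back would then presumably print this help entry with the offending value substituted for the %s.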
5 changes: 4 additions & 1 deletion opal/mca/base/mca_base_close.c
@@ -10,7 +10,7 @@
  * University of Stuttgart. All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
- * Copyright (c) 2009 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2009-2022 Cisco Systems, Inc. All rights reserved
  * Copyright (c) 2015 Los Alamos National Security, LLC. All rights
  * reserved.
  * $COPYRIGHT$
@@ -61,6 +61,9 @@ void mca_base_close(void)
 /* Shut down the dynamic component finder */
 mca_base_component_find_finalize();
 
+/* Shut down the show_load_errors processing */
+mca_base_show_load_errors_finalize();
+
 /* Close opal output stream 0 */
 opal_output_close(0);
 }
5 changes: 3 additions & 2 deletions opal/mca/base/mca_base_component_repository.c
@@ -10,7 +10,7 @@
  * University of Stuttgart. All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
- * Copyright (c) 2008-2015 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2008-2022 Cisco Systems, Inc. All rights reserved
  * Copyright (c) 2015 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2015 Research Organization for Information Science
@@ -372,7 +372,8 @@ int mca_base_component_repository_open(mca_base_framework_t *framework,
 "%s MCA component \"%s\" at path %s",
 ri->ri_type, ri->ri_name, ri->ri_path);
 
-vl = mca_base_component_show_load_errors ? MCA_BASE_VERBOSE_ERROR : MCA_BASE_VERBOSE_INFO;
+vl = mca_base_show_load_errors(ri->ri_type,
+ ri->ri_name) ? MCA_BASE_VERBOSE_ERROR : MCA_BASE_VERBOSE_INFO;
 
 /* Ensure that this component is not already loaded (should only happen
 if it was statically loaded). It's an error if it's already