diff --git a/docs/faq/general-tuning.rst b/docs/faq/general-tuning.rst index 783d2dc5f37..23787eac1b2 100644 --- a/docs/faq/general-tuning.rst +++ b/docs/faq/general-tuning.rst @@ -5,136 +5,6 @@ General Tuning ///////////////////////////////////////////////////////////////////////// -What is the Modular Component Architecture (MCA)? -------------------------------------------------- - -The Modular Component Architecture (MCA) is the backbone for much of -Open MPI's functionality. It is a series of *projects*, *frameworks*, -*components*, and *modules* that are assembled at run-time to create -an MPI implementation. - -* **Projects:** An Open MPI project is essentially the highest - abstraction layer division of code. - - .. note:: The word "project" is unfortunately overloaded. It can be - used to mean the code/resources/people in the greater Open - MPI community associated with the development of a - particular software package, but it can also be used to - mean a section of code within the Open MPI code base. - - For the purposes of this documentation, "project" means - the latter: a section of code within the Open MPI code - base. - -* **Frameworks:** An MCA framework manages zero or more components at - run-time and is targeted at a specific task (e.g., providing MPI - collective operation functionality). Each MCA framework supports a - single component type, but may support multiple versions of that - type. The framework uses the services from the MCA base - functionality to find and/or load components. - -* **Components:** An MCA component is an implementation of a - framework's interface. It is a standalone collection of code that - can be bundled into a plugin that can be inserted into the Open MPI - code base, either at run-time and/or compile-time. - -* **Modules:** An MCA module is an instance of a component (in the C++ - sense of the word "instance"; an MCA component is analogous to a C++ - class). 
For example, if a node running an Open MPI application has - multiple ethernet NICs, the Open MPI application will contain one - TCP MPI point-to-point *component*, but two TCP point-to-point - *modules*. - -///////////////////////////////////////////////////////////////////////// - -What are MCA parameters? ------------------------- - -MCA parameters are the basic unit of run-time tuning for Open -MPI. They are simple "key = value" pairs that are used extensively -throughout the code base. The general rules of thumb that the -developers use are: - -#. Instead of using a constant for an important value, make it an MCA - parameter. -#. If a task can be implemented in multiple, user-discernible ways, - implement as many as possible and make choosing between them be an MCA - parameter. - -For example, an easy MCA parameter to describe is the boundary between -short and long messages in TCP wire-line transmissions. "Short" -messages are sent eagerly whereas "long" messages use a rendezvous -protocol. The decision point between these two protocols is the -overall size of the message (in bytes). By making this value an MCA -parameter, it can be changed at run-time by the user or system -administrator to use a sensible value for a particular environment or -set of hardware (e.g., a value suitable for 1Gpbs Ethernet is probably -not suitable for 100 Gigabit Ethernet, and may require even a third -different value for 40 Gigabit Ethernet). - -Note that MCA parameters may be set in several different ways -(described in another FAQ entry). This allows, for example, system -administrators to fine-tune the Open MPI installation for their -hardware / environment such that normal users can simply use the -default values. - -More specifically, HPC environments |mdash| and the applications that run -on them |mdash| tend to be unique. 
Providing extensive run-time tuning -capabilities through MCA parameters allows the customization of Open -MPI to each system's / user's / application's particular needs. - -///////////////////////////////////////////////////////////////////////// - -What projects are included in the Open MPI code base? ------------------------------------------------------ - -The following *projects* exist in Open MPI |ompi_ver|: - -* **Open Porability Access Layer (OPAL):** Low-level, operating - system and architecture portability code. -* **Open MPI (OMPI):** The MPI API and supporting infrastructure. -* **OpenSHMEM (OSHMEM):** The OpenSHMEM API and supporting - infrastructure. - -.. note:: Prior versions of Open MPI also included an Open MPI - Runtime Envionrment (ORTE) project. ORTE essentially - evolved into the standalone `PMIx Runtime Reference - Environment (PRRTE) `_, - and is now considered a 3rd-party dependency of Open MPI - -- not one of its included projects. - -///////////////////////////////////////////////////////////////////////// - -What frameworks are in Open MPI? --------------------------------- - -Each project has its own frameworks. - -.. error:: TODO This question may be moot due to :doc:`this list - already in the higher-level doc `. - - -///////////////////////////////////////////////////////////////////////// - -How do I know what components are in my Open MPI installation? --------------------------------------------------------------- - -The ``ompi_info`` command, in addition to providing a wealth of -configuration information about your Open MPI installation, will list -all components (and the frameworks that they belong to) that are -available. These include system-provided components as well as -user-provided components. - -Please note that starting with Open MPI v1.8, ``ompi_info`` categorizes its -parameter parameters in so-called levels, as defined by the MPI_T -interface. 
You will need to specify ``--level 9`` (or -``--all``) to show *all* MCA parameters. -`See this Cisco Blog entry -`_ -for further information. - -///////////////////////////////////////////////////////////////////////// - .. _faq-general-tuning-install-components: How do I install my own components into an Open MPI installation? @@ -171,353 +41,7 @@ automatically send components to remote nodes when MPI jobs are run. ///////////////////////////////////////////////////////////////////////// -How do I know what MCA parameters are available? ------------------------------------------------- - -The ``ompi_info`` command can list the parameters for a given -component, all the parameters for a specific framework, or all -parameters. Most parameters contain a description of the parameter; -all will show the parameter's current value. - -For example, the following shows all the MCA parameters for all -components that ``ompi_info`` finds: - -.. code-block:: sh - - # Starting with Open MPI v1.7, you must use "--level 9" to see - # all the MCA parameters (the default is "--level 1"): - shell$ ompi_info --param all all --level 9 - - # Before Open MPI v1.7, the "--level" command line options - # did not exist; do not use it. - shell$ ompi_info --param all all - -This example shows all the MCA parameters for all BTL components that -``ompi_info`` finds: - -.. code-block:: sh - - # All remaining examples assume Open MPI v1.7 or later (i.e., - # they assume the use of the "--level" command line option) - shell$ ompi_info --param btl all --level 9 - -This example shows all the MCA parameters for the TCP BTL component: - -.. code-block:: sh - - shell$ ompi_info --param btl tcp --level 9 - -///////////////////////////////////////////////////////////////////////// - -.. _faq-general-tuning-setting-mca-params: - -How do I set the value of MCA parameters? 
------------------------------------------ - -There are multiple ways to set MCA parameters, each of which are -listed below, and are resolved in the following priority order: - -#. **Command line:** The highest-precedence method is setting MCA - parameters on the command line. For example: - - .. code-block:: sh - - shell$ mpirun --mca mpi_show_handle_leaks 1 -n 4 a.out - - This sets the MCA parameter ``mpi_show_handle_leaks`` to the value - of 1 before running ``a.out`` with four processes. In general, the - format used on the command line is ``--mca ``. - - Note that when setting multi-word values, you need to use quotes to - ensure that the shell and Open MPI understand that they are a - single value. For example: - - .. code-block:: sh - - shell$ mpirun --mca param "value with multiple words" ... - -#. **Environment variable:** Next, environment variables are searched. - Any environment variable named ``OMPI_MCA_`` will be - used. For example, the following has the same effect as the - previous example (for sh-flavored shells): - - .. code-block:: sh - - shell$ OMPI_MCA_mpi_show_handle_leaks=1 - shell$ export OMPI_MCA_mpi_show_handle_leaks - shell$ mpirun -n 4 a.out - - Note that setting environment variables to values with multiple words - requires quoting, such as: - - .. code-block:: sh - - shell$ OMPI_MCA_param="value with multiple words" - -#. **Tuning MCA parameter files:** Simple text files can be used to - set MCA parameter values for a specific application. :ref:`See this FAQ - entry for more details `. - -#. **Aggregate MCA parameter files:** Simple text files can be used to - set MCA parameter values for a specific application. :ref:`See this FAQ - entry for more details `. - - .. warning:: The use of AMCA param files is deprecated. - -#. **Files:** Finally, simple text files can be used to set MCA - parameter values. Parameters are set one per line (comments are - permitted). For example: - - .. 
code-block:: ini - - # This is a comment - # Set the same MCA parameter as in previous examples - mpi_show_handle_leaks = 1 - - Note that quotes are *not* necessary for setting multi-word values - in MCA parameter files. Indeed, if you use quotes in the MCA - parameter file, they will be used as part of the value itself. For - example: - - .. code-block:: ini - - # The following two values are different: - param1 = value with multiple words - param2 = "value with multiple words" - - By default, two files are searched (in order): - - #. ``$HOME/.openmpi/mca-params.conf``: The user-supplied set of - values takes the highest precedence. - #. ``$prefix/etc/openmpi-mca-params.conf``: The system-supplied set - of values has a lower precedence. - - More specifically, the MCA parameter ``mca_param_files`` specifies - a colon-delimited path of files to search for MCA parameters. - Files to the left have lower precedence; files to the right are - higher precedence. - - .. note:: Keep in mind that, just like components, these parameter - files are *only* relevant where they are "visible" - (:ref:`see this FAQ entry - `). Specifically, - Open MPI does not read all the values from these files - during startup and then send them to all nodes in the job - |mdash| the files are read on each node during each - process' startup. This is intended behavior: it allows - for per-node customization, which is especially relevant - in heterogeneous environments. - -///////////////////////////////////////////////////////////////////////// - -.. _faq-general-tuning-amca-param-files: - -What are Aggregate MCA (AMCA) parameter files? ----------------------------------------------- - -.. error:: TODO This entire entry needs to be checked for correctness. - Are AMCA files actually deprecated? - -.. warning:: The use of AMCA param files is still available in Open - MPI |ompi_ver|, but is deprecated, and may disappear - in a future version of Open MPI. 
- -Aggregate MCA (AMCA) parameter files contain MCA parameter key/value -pairs similar to the ``$HOME/.openmpi/mca-params.conf`` file described -in :ref:`this FAQ entry `. - -The motivation behind AMCA parameter sets came from the realization -that certain applications require a large number of MCA parameters are -to run well and/or execute as the user expects. Since these MCA -parameters are application-specific (or even application-run-specific) -they should not be set in a global manner, but only pulled in as -determined by the user. - -MCA parameters set in AMCA parameter files will override any MCA -parameters supplied in global parameter files (e.g., -``$HOME/.openmpi/mca-params.conf``), but not command line or -environment parameters. - -AMCA parameter files are typically supplied on the command line via -the ``--am`` option. - -For example, consider an AMCA parameter file called ``foo.conf`` -placed in the same directory as the application ``a.out``. A user -will typically run the application as: - -.. code-block:: sh - - shell$ mpirun -n 2 a.out - -To use the ``foo.conf`` AMCA parameter file, this command line -changes to: - -.. code-block:: sh - - shell$ mpirun -n 2 --am foo.conf a.out - -If the user wants to override a parameter set in ``foo.conf`` they -can add it to the command line: - -.. code-block:: sh - - shell$ mpirun -n 2 --am foo.conf --mca btl tcp,self a.out - -AMCA parameter files can be coupled if more than one file is to be -used. If we have another AMCA parameter file called ``bar.conf`` -that we want to use, we add it to the command line as follows: - -.. code-block:: sh - - shell$ mpirun -n 2 --am foo.conf:bar.conf a.out - -AMCA parameter files are loaded in priority order. This means that -``foo.conf`` AMCA file has priority over the ``bar.conf`` file. So -if the ``bar.conf`` file sets the MCA parameter -``mpi_leave_pinned=0`` and the ``foo.conf`` file sets this MCA -parameter to ``mpi_leave_pinned=1`` then the latter will be used. 
- -The location of AMCA parameter files are resolved in a similar way as -the shell: - -#. If no path operator is provided (i.e., ``foo.conf``), then - Open MPI will search the ``$sysconfdir/amca-param-sets`` directory, - then the current working directory. -#. If a relative path is specified, then only that path will be - searched (e.g., ``./foo.conf``, ``baz/foo.conf``). -#. If an absolute path is specified, then only that path will be - searched (e.g., ``/bip/boop/foo.conf``). - -Although the typical use case for AMCA parameter files is to be -specified on the command line, they can also be set as MCA parameters -in the environment. The MCA parameter ``mca_base_param_file_prefix`` -contains a ``:``-delimited list of AMCA parameter files exactly as -they would be passed to the ``--am`` command line option. The MCA -parameter ``mca_base_param_file_path`` specifies the path to search -for AMCA files with relative paths. By default this is -``$sysconfdir/amca-param-sets/:$CWD``. - -///////////////////////////////////////////////////////////////////////// - -.. _faq-general-tuning-tune-param-files: - -How do I set application specific environment variables in global parameter files? ----------------------------------------------------------------------------------- - -.. error:: TODO This entire entry needs to be checked for correctness. - -The ``mpirun`` ``--tune`` CLI options allows users to specify both MCA -parameters and environment variables from within a single file. - -MCA parameters set in tuned parameter files will override any MCA -parameters supplied in global parameter files (e.g., -``$HOME/.openmpi/mca-params.conf``), but not command line or -environment parameters. - -Tuned parameter files are typically supplied on the command line via -the ``--tune`` option. - -For example, consider an tuned parameter file called ``foo.conf`` -placed in the same directory as the application ``a.out``. A user -will typically run the application as: - -.. 
code-block:: sh - - shell$ mpirun -n 2 a.out - -To use the ``foo.conf`` tuned parameter file, this command line -changes to: - -.. code-block:: sh - - shell$ mpirun -n 2 --tune foo.conf a.out - -Tuned parameter files can be coupled if more than one file is to be -used. If we have another tuuned parameter file called ``bar.conf`` -that we want to use, we add it to the command line as follows: - -.. code-block:: sh - - shell$ mpirun -n 2 --tune foo.conf,bar.conf a.out - - -The contents of tuned files consist of one or more lines, each of -which contain zero or more `-x` and `--mca` options. Comments are not -allowed. For example, the following tuned file: - -.. code-block:: - - -x envvar1=value1 -mca param1 value1 -x envvar2 - -mca param2 value2 - -x envvar3 - -is equivalent to: - -.. code-block:: sh - - shell$ mpirun \ - -x envvar1=value1 -mca param1 value1 -x envvar2 \ - -mca param2 value2 - -x envvar3 \ - ...rest of mpirun command line... - -Although the typical use case for tuned parameter files is to be -specified on the command line, they can also be set as MCA parameters -in the environment. The MCA parameter ``mca_base_envvar_file_prefix`` -contains a ``,``-delimited list of tuned parameter files exactly as -they would be passed to the ``--tune`` command line option. The MCA -parameter ``mca_base_envvar_file_path`` specifies the path to search -for tune files with relative paths. - -.. error:: TODO Check that these MCA var names ^^ are correct. - -///////////////////////////////////////////////////////////////////////// - -How do I select which components are used? ------------------------------------------- - -Each MCA framework has a top-level MCA parameter that helps guide -which components are selected to be used at run-time. Specifically, -there is an MCA parameter of the same name as each MCA framework that -can be used to *include* or *exclude* components from a given run. 
- -For example, the ``btl`` MCA parameter is used to control which BTL -components are used (e.g., MPI point-to-point communications; -:doc:`see the MCA frameworks listing ` for a full -listing). It can take as a value a comma-separated list of components -with the optional prefix ``^``. For example: - -.. code-block:: sh - - # Tell Open MPI to exclude the tcp and uct BTL components - # and implicitly include all the rest - shell$ mpirun --mca btl ^tcp,uct ... - - # Tell Open MPI to include *only* the components listed here and - # implicitly ignore all the rest (i.e., the loopback, shared memory, - # etc.) MPI point-to-point components): - shell$ mpirun --mca btl self,sm,usnic ... - -Note that ``^`` can *only* be the prefix of the entire value because -the inclusive and exclusive behavior are mutually exclusive. -Specifically, since the exclusive behavior means "use all components -*except* these", it does not make sense to mix it with the inclusive -behavior of not specifying it (i.e., "use all of these components"). -Hence, something like this: - -.. code-block:: sh - - shell$ mpirun --mca btl self,sm,usnic,^tcp ... - -does not make sense because it says both "use only the ``self``, ``sm``, -and ``usnic`` components" and "use all components except ``tcp``" and -will result in an error. - -Just as with all MCA parameters, the ``btl`` parameter (and all -framework parameters) :ref:`can be set in multiple ways -`. - -///////////////////////////////////////////////////////////////////////// +.. _faq-tuning-using-paffinity-label: What is processor affinity? Does Open MPI support it? ------------------------------------------------------ @@ -638,7 +162,8 @@ the ``mpi_warn_on_fork`` MCA parameter. For example: shell$ mpirun --mca mpi_warn_on_fork 0 ... 
Of course, systems that ``dlopen("libmpi.so", ...)`` may not use Open -MPI's ``mpirun``, and therefore may need to use :ref:`a different +MPI's ``mpirun``, and therefore may need to use (JMS: this ref no +longer exists -- it moved to running-apps/tuning.rst) a different mechanism to set MCA parameters `. diff --git a/docs/faq/index.rst b/docs/faq/index.rst index d2109311b68..c75c0ff38b7 100644 --- a/docs/faq/index.rst +++ b/docs/faq/index.rst @@ -27,5 +27,4 @@ that they are worth categorizing in an official way. ompio macos - tuning general-tuning diff --git a/docs/faq/large-clusters.rst b/docs/faq/large-clusters.rst index 80db5a0930a..f501dea574f 100644 --- a/docs/faq/large-clusters.rst +++ b/docs/faq/large-clusters.rst @@ -154,7 +154,7 @@ as soon as possible. This parameter can be included in the default MCA parameter file, placed in the user's environment, or added to the ``mpirun`` command -line. See :ref:`this FAQ entry ` +line. See :ref:`this FAQ entry ` for more details on how to set MCA parameters. ///////////////////////////////////////////////////////////////////////// diff --git a/docs/faq/ompio.rst b/docs/faq/ompio.rst index 3633a707c7c..47500f86177 100644 --- a/docs/faq/ompio.rst +++ b/docs/faq/ompio.rst @@ -235,7 +235,7 @@ perspective. value). The value of ``io_ompio_bytes_per_agg`` could be set by system administrators in the system-wide Open MPI configuration file, or by users individually. See :ref:`this - FAQ item ` on setting + FAQ item ` on setting MCA parameters for details. For more exhaustive tuning of I/O parameters, we recommend the diff --git a/docs/faq/running-mpi-apps.rst b/docs/faq/running-mpi-apps.rst index 28d9057a614..6777e4c9ec2 100644 --- a/docs/faq/running-mpi-apps.rst +++ b/docs/faq/running-mpi-apps.rst @@ -743,8 +743,8 @@ Several notable options are: :ref:`this FAQ entry for more details `). * ``-n``: Indicate the number of processes to start. 
-* ``--mca``: Set MCA parameters (see the :doc:`Run-Time Tuning FAQ - category ` for more details). +* ``--mca``: Set MCA parameters (see :ref:`how to set MCA params + ` for more details). * ``--wdir DIRECTORY``: Set the working directory of the started applications. If not supplied, the current working directory is assumed (or ``$HOME``, if the current working directory does not @@ -1320,7 +1320,7 @@ Yes. The MCA parameter ``mpi_yield_when_idle`` controls whether an MPI process runs in Aggressive or Degraded performance mode. Setting it to 0 forces Aggressive mode; setting it to 1 forces Degraded mode (see -:ref:`this FAQ entry ` to see how +:ref:`this FAQ entry ` to see how to set MCA parameters). Note that this value *only* affects the behavior of MPI processes when diff --git a/docs/faq/sysadmin.rst b/docs/faq/sysadmin.rst index f6963c6b769..c63aef6586f 100644 --- a/docs/faq/sysadmin.rst +++ b/docs/faq/sysadmin.rst @@ -131,7 +131,9 @@ network at a system level, such that when users invoke ``mpirun`` or ``mpiexec`` to launch their jobs, they will automatically only be using the network meant for MPI jobs. -:doc:`See the run-time tuning FAQ category ` for information on how to set global MCA parameters. +:ref:`See how to set MCA params +` for information on how to +set global MCA parameters. ///////////////////////////////////////////////////////////////////////// @@ -161,8 +163,10 @@ I have power users who will want to override my global MCA parameters; is this p Absolutely. -:doc:`See the run-time tuning FAQ category ` for information how to set MCA parameters, both at the -system level and on a per-user (or per-MPI-job) basis. +:ref:`See how to set MCA params +` for information on how to set +MCA parameters, both at the system level and on a per-user (or +per-MPI-job) basis. 
///////////////////////////////////////////////////////////////////////// diff --git a/docs/faq/tuning.rst b/docs/faq/tuning.rst deleted file mode 100644 index af482effc59..00000000000 --- a/docs/faq/tuning.rst +++ /dev/null @@ -1,13 +0,0 @@ -Run-Time Tuning -=============== - -Placeholder; haven't converted over the "run time tuning" FAQ entry to -restructured text yet. - -.. _faq-tuning-setting-mca-params-label: - -setting mca params - -.. _faq-tuning-using-paffinity-label: - -blah using paffinity diff --git a/docs/man-openmpi/man1/ompi_info.1.rst b/docs/man-openmpi/man1/ompi_info.1.rst index a5188b9ccc0..21313d7bc3b 100644 --- a/docs/man-openmpi/man1/ompi_info.1.rst +++ b/docs/man-openmpi/man1/ompi_info.1.rst @@ -163,65 +163,65 @@ more MCA parameters, use the ``--level`` command line option. EXAMPLES -------- +Show the default output of options and listing of installed +components in a human-readable / prettyprint format: + .. code-block:: ompi_info -Show the default output of options and listing of installed -components in a human-readable / prettyprint format. +Show the default output of options and listing of installed components +in a machine-parsable format: .. code-block:: ompi_info --parsable -Show the default output of options and listing of installed components -in a machine-parsable format. +Show the level 1 MCA parameters of the "tcp" BTL component in a +human-readable / prettyprint format: .. code-block:: ompi_info --param btl tcp -Show the level 1 MCA parameters of the "tcp" BTL component in a -human-readable / prettyprint format. +Show the level 1 through level 6 MCA parameters of the "tcp" BTL +component in a human-readable / prettyprint format: .. code-block:: ompi_info --param btl tcp --level 6 -Show the level 1 through level 6 MCA parameters of the "tcp" BTL -component in a human-readable / prettyprint format. +Show the level 1 MCA parameters of the "tcp" BTL component in a +machine-parsable format: .. 
code-block:: ompi_info --param btl tcp --parsable -Show the level 1 MCA parameters of the "tcp" BTL component in a -machine-parsable format. +Show the level 1 through level 3 MCA parameters of string type in a +human-readable / prettyprint format: .. code-block:: ompi_info --type string --pretty-print --level 3 -Show the level 3 MCA parameters of string type in a human-readable / -prettyprint format. +Show the "bindir" that Open MPI was configured with: .. code-block:: ompi_info --path bindir -Show the "bindir" that Open MPI was configured with. +Show the Open MPI version numbers in a prettyprint format: .. code-block:: ompi_info --version -Show the version of Open MPI version numbers in a prettyprint format. +Show *all* information about the Open MPI installation, including all +components that can be found, all the MCA parameters that they support +(i.e., levels 1 through 9), versions of Open MPI and the components, +etc.: .. code-block:: ompi_info --all - -Show *all* information about the Open MPI installation, including all -components that can be found, all the MCA parameters that they support -(i.e., levels 1 through 9), versions of Open MPI and the components, -etc. diff --git a/docs/running-apps/index.rst b/docs/running-apps/index.rst index dca8473b54b..e9343a3cf4d 100644 --- a/docs/running-apps/index.rst +++ b/docs/running-apps/index.rst @@ -19,6 +19,7 @@ but they can generally be broken down into two categories: quickstart pmix-and-prrte + tuning localhost ssh diff --git a/docs/running-apps/pmix-and-prrte.rst b/docs/running-apps/pmix-and-prrte.rst index 31329815001..3d1ce923a05 100644 --- a/docs/running-apps/pmix-and-prrte.rst +++ b/docs/running-apps/pmix-and-prrte.rst @@ -1,3 +1,5 @@ +.. 
_label-running-role-of-pmix-and-prte: + The role of PMIx and PRRTE ========================== diff --git a/docs/running-apps/tuning.rst b/docs/running-apps/tuning.rst new file mode 100644 index 00000000000..88d2a782f90 --- /dev/null +++ b/docs/running-apps/tuning.rst @@ -0,0 +1,447 @@ +.. _label-run-time-tuning: + +Run-time tuning +=============== + +Open MPI is a highly customizable system; it can be configured via +configuration files, command line parameters, and environment +variables. + +Open MPI's configuration system works mainly through +the Modular Component Architecture (MCA). + +.. note:: :ref:`The PMIx and PRRTE software packages + ` also use the MCA for + their configuration, composition, and run-time tuning. + +///////////////////////////////////////////////////////////////////////// + +The Modular Component Architecture (MCA) +---------------------------------------- + +The Modular Component Architecture (MCA) is the backbone for much of +Open MPI's functionality. It is a series of *projects*, *frameworks*, +*components*, and *modules* that are assembled at run-time to create +an MPI implementation. + +MCA *parameters* (also known as MCA *variables*) are used to customize +Open MPI's behavior at run-time. + +Each of these entities is described below. + +Projects +^^^^^^^^ + +A *project* is essentially the highest abstraction layer division in +the Open MPI code base. + +.. note:: The word "project" is unfortunately overloaded. It can be + used to mean the code/resources/people in the greater Open + MPI community associated with the development of a + particular software package, but it can also be used to mean + a major, top-level section of code within the Open MPI code + base. + + For the purposes of this documentation, "project" means the + latter: a major, top-level section of code within the Open + MPI code base. 
+ +The following *projects* exist in Open MPI |ompi_ver|: + +* **Open Portability Access Layer (OPAL):** Low-level, operating + system and architecture portability code. +* **Open MPI (OMPI):** The MPI API and supporting infrastructure. +* **OpenSHMEM (OSHMEM):** The OpenSHMEM API and supporting + infrastructure. + +.. note:: Prior versions of Open MPI also included an Open MPI + Runtime Environment (ORTE) project. ORTE essentially + evolved into the standalone `PMIx Runtime Reference + Environment (PRRTE) `_ + and is now considered a 3rd-party dependency of Open MPI + |mdash| not one of its included projects. + + See :ref:`the role of PMIx and PRRTE + ` for more information. + +Frameworks +^^^^^^^^^^ + +An MCA framework manages zero or more components at run-time and is +targeted at a specific task (e.g., providing MPI collective operation +functionality). Although each MCA framework supports only a single +type of component, it may support multiple components of that type. + +Some of the more common frameworks that users may want or need to +customize include the following: + +* ``btl``: Byte Transport Layer; these components are exclusively used + as the underlying transports for the ``ob1`` PML component. +* ``coll``: MPI collective algorithms +* ``io``: MPI I/O +* ``mtl``: MPI Matching Transport Layer (MTL); these components are + exclusively used as the underlying transports for the ``cm`` PML + component. +* ``pml``: Point-to-point Messaging Layer (PML). These components are + used to implement MPI point-to-point messaging functionality. + +There are many frameworks within Open MPI; the exact set varies +between different versions of Open MPI. You can use the +:ref:`ompi_info(1) ` command to see the full list of +frameworks that are included in Open MPI |ompi_ver|. + +Components +^^^^^^^^^^ + +An MCA component is an implementation of a framework's formal +interface. 
It is a standalone collection of code that can be bundled +into a plugin that can be inserted into the Open MPI code base, either +at run-time and/or compile-time. + +.. note:: Good synonyms for Open MPI's "component" concept are + "plugin", or "add-on". + +The exact set of components varies between different versions of Open +MPI. Open MPI's code base includes support for many components, but +not all of them may be present or available on your system. You can +use the :ref:`ompi_info(1) ` command to see what +components are included in Open MPI |ompi_ver| on your system. + +Modules +^^^^^^^ + +An MCA module is an instance of a component (in the C++ sense of the +word "instance"; an MCA component is analogous to a C++ class). For +example, if a node running an Open MPI application has two Ethernet +NICs, the Open MPI application will contain one TCP MPI point-to-point +*component*, but two TCP point-to-point *modules*. + +Parameters (variables) +^^^^^^^^^^^^^^^^^^^^^^ + +MCA *parameters* (sometimes called MCA *variables*) are the basic unit +of run-time tuning for Open MPI. They are simple "key = value" pairs +that are used extensively throughout Open MPI. The general rules of +thumb that the developers use are: + +#. Instead of using a constant for an important value, make it an MCA + parameter. +#. If a task can be implemented in multiple, user-discernible ways, + implement as many as possible, and use an MCA parameter to + choose between them at run-time. + +For example, an easy MCA parameter to describe is the boundary between +short and long messages in TCP wire-line transmissions. "Short" +messages are sent eagerly whereas "long" messages use a rendezvous +protocol. The decision point between these two protocols is the +overall size of the message (in bytes). 
Making this value an MCA +parameter lets the user or system administrator change it at run-time +to a sensible value for a particular environment or set of hardware +(e.g., a value suitable for 1 Gigabit Ethernet is probably +not suitable for 100 Gigabit Ethernet, and 25 Gigabit Ethernet may +require yet a third value). + +///////////////////////////////////////////////////////////////////////// + +.. _label-running-setting-mca-param-values: + +Setting MCA parameter values +---------------------------- + +MCA parameters may be set in several different ways. + +.. admonition:: Rationale + :class: tip + + Having multiple methods to set MCA parameters allows, for example, + system administrators to fine-tune the Open MPI installation for + their hardware / environment such that normal users can simply use + the default values (that were set by the system administrators). + + HPC environments |mdash| and the applications that run on them + |mdash| tend to be unique. Providing extensive run-time tuning + capabilities through MCA parameters allows the customization of + Open MPI to each system's / user's / application's particular + needs. + +The following are the different methods to set MCA parameters, listed +from highest to lowest precedence: + +#. Command line parameters +#. Environment variables +#. Tuning MCA parameter files +#. Configuration files + +Command line parameters +^^^^^^^^^^^^^^^^^^^^^^^ + +The highest-precedence method is setting MCA parameters on the command +line. For example: + +.. code-block:: sh + + shell$ mpirun --mca mpi_show_handle_leaks 1 -np 4 a.out + +This sets the MCA parameter ``mpi_show_handle_leaks`` to the value of +1 before running ``a.out`` with four processes. In general, the +format used on the command line is ``--mca ``. + +.. note:: When setting a value that includes spaces, you need to use + quotes to ensure that the shell understands that the + multiple tokens are a single value. For example: + + ..
code-block:: sh + + shell$ mpirun --mca param "value with multiple words" ... + +Environment variables +^^^^^^^^^^^^^^^^^^^^^ + +Next, environment variables are searched. Any environment variable +named ``OMPI_MCA_`` will be used. For example, the +following has the same effect as the previous example (for sh-flavored +shells): + +.. code-block:: sh + + shell$ export OMPI_MCA_mpi_show_handle_leaks=1 + shell$ mpirun -np 4 a.out + +.. note:: Just like with command line values, setting environment + variables to values with multiple words requires shell + quoting, such as: + + .. code-block:: sh + + shell$ export OMPI_MCA_param="value with multiple words" + +Tuning MCA parameter files +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. error:: TODO This entire entry needs to be checked for correctness. + +Simple text files can be used to set MCA parameter values for a +specific application. + +The ``mpirun --tune`` CLI option allows users to specify both MCA +parameters and environment variables from within a single file. + +MCA parameters set in tuned parameter files will override any MCA +parameters supplied in global parameter files (e.g., +``$HOME/.openmpi/mca-params.conf``), but not those set on the command +line or in the environment. + +Consider a tuned parameter file named ``foo.conf`` that is placed in +the same directory as the application ``a.out``. A user will typically +run the application as: + +.. code-block:: sh + + shell$ mpirun -np 2 a.out + +To use the ``foo.conf`` tuned parameter file, this command line +changes to: + +.. code-block:: sh + + shell$ mpirun -np 2 --tune foo.conf a.out + +Tuned parameter files can be combined if more than one file is to be +used. If there is another tuned parameter file called ``bar.conf``, it +can be added to the command line as follows: + +.. code-block:: sh + + shell$ mpirun -np 2 --tune foo.conf,bar.conf a.out + +The contents of tuned files consist of one or more lines, each of +which contains zero or more ``-x`` and ``--mca`` options.
Comments are not +allowed. For example, the following tuned file: + +.. code-block:: + + -x envvar1=value1 -mca param1 value1 -x envvar2 + -mca param2 value2 + -x envvar3 + +is equivalent to: + +.. code-block:: sh + + shell$ mpirun \ + -x envvar1=value1 -mca param1 value1 -x envvar2 \ + -mca param2 value2 \ + -x envvar3 \ + ...rest of mpirun command line... + +Although the typical use case for tuned parameter files is to be +specified on the command line, they can also be set as MCA parameters +in the environment. The MCA parameter ``mca_base_envvar_file_prefix`` +contains a comma-delimited list of tuned parameter files exactly as +they would be passed to the ``--tune`` command line option. The MCA +parameter ``mca_base_envvar_file_path`` specifies the path to search +for tuned files with relative paths. + +.. error:: TODO Check that these MCA var names ^^ are correct. + +Configuration files +^^^^^^^^^^^^^^^^^^^ + +Finally, simple configuration text files can be used to set MCA +parameter values. Parameters are set one per line (comments are +permitted). For example: + +.. code-block:: ini + + # This is a comment + # Set the same MCA parameter as in previous examples + mpi_show_handle_leaks = 1 + +Note that quotes are *not* necessary for setting multi-word values +in MCA parameter files. Indeed, if you use quotes in the MCA +parameter file, they will be used as part of the value itself. For +example: + +.. code-block:: ini + + # The following two values are different: + param1 = value with multiple words + param2 = "value with multiple words" + +By default, two files are searched (in order): + +#. ``$HOME/.openmpi/mca-params.conf``: The user-supplied set of + values takes the highest precedence. +#. ``$prefix/etc/openmpi-mca-params.conf``: The system-supplied set + of values has a lower precedence. + +More specifically, the MCA parameter ``mca_param_files`` specifies a +colon-delimited path of files to search for MCA parameters.
Files to +the left have lower precedence; files to the right have higher +precedence. + +.. note:: Keep in mind that, just like components, these parameter + files are *only* relevant where they are "visible" + (:ref:`see this FAQ entry + `). Specifically, + Open MPI does not read all the values from these files + during startup and then send them to all nodes in the job. + Instead, the files are read on each node during each + process' startup. + + *This is intended behavior:* it allows for per-node + customization, which is especially relevant in heterogeneous + environments. + +///////////////////////////////////////////////////////////////////////// + +.. _label-running-selecting-framework-components: + +Selecting which Open MPI components are used at run time +-------------------------------------------------------- + +Each MCA framework has a top-level MCA parameter that helps guide +which components are selected to be used at run-time. Specifically, +every framework has an MCA parameter of the same name that can be used +to *include* or *exclude* components from a given run. + +For example, the ``btl`` MCA parameter is used to control which BTL +components are used. It takes a comma-delimited list of component +names, and the list may optionally be prefixed with ``^``. For example: + +.. note:: The Byte Transfer Layer (BTL) framework is used as the + underlying network transports with the ``ob1`` Point-to-point + Messaging Layer (PML) component. + +.. code-block:: sh + + # Tell Open MPI to include *only* the BTL components listed here and + # implicitly ignore all the rest: + shell$ mpirun --mca btl self,sm,usnic ... + + # Tell Open MPI to exclude the tcp and uct BTL components + # and implicitly include all the rest + shell$ mpirun --mca btl ^tcp,uct ... + +Note that ``^`` can *only* be the prefix of the *entire* +comma-delimited list because the inclusive and exclusive behaviors are +mutually exclusive.
Specifically, since the exclusive behavior means +"use all components *except* these", it does not make sense to mix it +with the inclusive behavior that applies when ``^`` is omitted (i.e., +"use all of these components"). Hence, something like this: + +.. code-block:: sh + + shell$ mpirun --mca btl self,sm,usnic,^tcp ... + +does not make sense |mdash| and will cause an error |mdash| because it +says "use only the ``self``, ``sm``, and ``usnic`` components" but +also "use all components except ``tcp``". These two statements +clearly contradict each other. + +///////////////////////////////////////////////////////////////////////// + +Common MCA parameters +--------------------- + +Open MPI has a *large* number of MCA parameters available. Users can +use the :ref:`ompi_info(1) ` command to see *all* +available MCA parameters. + +The vast majority of these MCA parameters, however, are not useful to +most users. Indeed, there are only a handful of MCA parameters that +are commonly used by end users. :ref:`As described in the +ompi_info(1) man page `, MCA parameters are +grouped into nine levels, corresponding to the MPI standard's tool +support verbosity levels. In general: + +* Levels 1-3 are intended for the end user. + + * These parameters are generally used to affect whether an Open MPI + job will be able to run correctly. + + .. tip:: Parameters in levels 1-3 are probably applicable to + most end users. + +* Levels 4-6 are intended for the application tuner. + + * These parameters are generally used to tune the performance of an + Open MPI job. + +* Levels 7-9 are intended for the MPI implementer. + + * These parameters are esoteric and really only intended for those + who work deep within the implementation of the Open MPI code base + itself.
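+
+As a rough illustration of these levels, the ``--level`` option of
+``ompi_info`` limits the output to parameters at or below the given
+level. The sketch below assumes that the ``tcp`` BTL component is
+present in your installation; the output varies between systems and is
+omitted here.
+
+.. code-block:: sh
+
+   # Show the TCP BTL parameters at levels 1-3 (end user)
+   shell$ ompi_info --param btl tcp --level 3
+
+   # Show the TCP BTL parameters at all nine levels
+   shell$ ompi_info --param btl tcp --level 9
+
+   # Show the parameters of every framework and component, all levels
+   shell$ ompi_info --param all all --level 9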
+ +Although the full list of MCA parameters can be found in the output of +``ompi_info(1)``, the following list of commonly-used parameters is +presented here so that they can easily be found via internet searches: + +* Individual framework names are used as MCA parameters to + :ref:`select which components will be used + `. For example, the + ``btl`` MCA parameter is used to select which components will be + used from the ``btl`` framework. The ``coll`` MCA parameter is used + to select which ``coll`` components are used. And so on. + +* Individual framework names with the ``_base_verbose`` suffix + appended (e.g., ``btl_base_verbose``, ``coll_base_verbose``, etc.) + can be used to set the general verbosity level of all the components + in that framework. + + * This can be helpful when troubleshooting why certain components + are or are not being selected at run time. + +* Many network-related components support "include" and "exclude" + types of parameters (e.g., ``btl_tcp_if_include`` and + ``btl_tcp_if_exclude``). The "include" parameters specify an + explicit set of network interfaces to use; the "exclude" parameters + specify an explicit set of network interfaces to ignore. Check the + output from :ref:`ompi_info(1)'s ` full list to see + if the network-related component you are using has "include" and + "exclude" network interface parameters. + + .. important:: You can only use the "include" *or* the "exclude" + parameter |mdash| they are mutually exclusive.
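+
+A minimal sketch combining two of the parameters above: it raises the
+verbosity of the ``btl`` framework and restricts the TCP BTL to a
+single network interface. The interface name ``eth0`` and the
+verbosity value ``100`` are assumptions chosen for illustration;
+substitute values that match your system.
+
+.. code-block:: sh
+
+   # eth0 and 100 are placeholder values, not recommendations
+   shell$ mpirun --mca btl_base_verbose 100 \
+       --mca btl_tcp_if_include eth0 \
+       -np 4 a.out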