diff --git a/docs/developers/frameworks.rst b/docs/developers/frameworks.rst index 76fd118335e..ddbbc05acbc 100644 --- a/docs/developers/frameworks.rst +++ b/docs/developers/frameworks.rst @@ -73,146 +73,70 @@ to send bytes across different types underlying networks. The ``tcp`` ``btl``, for example, sends messages across TCP-based networks; the ``ucx`` ``pml`` sends messages across InfiniBand-based networks. +MCA parameter notes +------------------- + Each component typically has some tunable parameters that can be -changed at run-time. Use the ``ompi_info`` command to check a component -to see what its tunable parameters are. For example: +changed at run-time. Use the :ref:`ompi_info(1) ` +command to check a component to see what its tunable parameters are. +For example: .. code-block:: sh shell$ ompi_info --param btl tcp shows some of the parameters (and default values) for the ``tcp`` ``btl`` -component (use ``--level`` to show *all* the parameters; see below). - -Note that ``ompi_info`` only shows a small number a component's MCA -parameters by default. Each MCA parameter has a "level" value from 1 -to 9, corresponding to the MPI-3 MPI_T tool interface levels. In Open -MPI, we have interpreted these nine levels as three groups of three: - -#. End user / basic -#. End user / detailed -#. End user / all -#. Application tuner / basic -#. Application tuner / detailed -#. Application tuner / all -#. MPI/OpenSHMEM developer / basic -#. MPI/OpenSHMEM developer / detailed -#. MPI/OpenSHMEM developer / all - -Here's how the three sub-groups are defined: - -#. End user: Generally, these are parameters that are required for - correctness, meaning that someone may need to set these just to - get their MPI/OpenSHMEM application to run correctly. -#. Application tuner: Generally, these are parameters that can be - used to tweak MPI application performance. -#. 
MPI/OpenSHMEM developer: Parameters that either don't fit in the - other two, or are specifically intended for debugging / - development of Open MPI itself. - -Each sub-group is broken down into three classifications: - -#. Basic: For parameters that everyone in this category will want to - see. -#. Detailed: Parameters that are useful, but you probably won't need - to change them often. -#. All: All other parameters -- probably including some fairly - esoteric parameters. - -To see *all* available parameters for a given component, specify that -ompi_info should use level 9: - -.. code-block:: sh - - shell$ ompi_info --param btl tcp --level 9 - -.. error:: TODO The following content seems redundant with the FAQ. - Additionally, information about how to set MCA params should be - prominently documented somewhere that is easy for users to find -- - not buried here in the developer's section. - -These values can be overridden at run-time in several ways. At -run-time, the following locations are examined (in order) for new -values of parameters: - -#. ``PREFIX/etc/openmpi-mca-params.conf``: - This file is intended to set any system-wide default MCA parameter - values -- it will apply, by default, to all users who use this Open - MPI installation. The default file that is installed contains many - comments explaining its format. - -#. ``$HOME/.openmpi/mca-params.conf``: - If this file exists, it should be in the same format as - ``PREFIX/etc/openmpi-mca-params.conf``. It is intended to provide - per-user default parameter values. - -#. environment variables of the form ``OMPI_MCA_`` set equal to a - ``VALUE``: - - Where ```` is the name of the parameter. For example, set the - variable named ``OMPI_MCA_btl_tcp_frag_size`` to the value 65536 - (Bourne-style shells): - - .. code-block:: sh - - shell$ OMPI_MCA_btl_tcp_frag_size=65536 - shell$ export OMPI_MCA_btl_tcp_frag_size - - .. error:: TODO Do we need content here about PMIx and PRTE env vars? - -#. 
the ``mpirun``/``oshrun`` command line: ``--mca NAME VALUE`` - - Where ```` is the name of the parameter. For example: - - .. code-block:: sh - - shell$ mpirun --mca btl_tcp_frag_size 65536 -n 2 hello_world_mpi - - .. error:: TODO Do we need content here about PMIx and PRTE MCA vars - and corresponding command line switches? - -These locations are checked in order. For example, a parameter value -passed on the ``mpirun`` command line will override an environment -variable; an environment variable will override the system-wide -defaults. - -Each component typically activates itself when relevant. For example, -the usNIC component will detect that usNIC devices are present and -will automatically be used for MPI communications. The Slurm -component will automatically detect when running inside a Slurm job -and activate itself. And so on. - -Components can be manually activated or deactivated if necessary, of -course. The most common components that are manually activated, -deactivated, or tuned are the ``btl`` components -- components that are -used for MPI point-to-point communications on many types common -networks. - -For example, to *only* activate the ``tcp`` and ``self`` (process loopback) -components are used for MPI communications, specify them in a -comma-delimited list to the ``btl`` MCA parameter: - -.. code-block:: sh - - shell$ mpirun --mca btl tcp,self hello_world_mpi - -To add shared memory support, add ``sm`` into the command-delimited list -(list order does not matter): - -.. code-block:: sh - - shell$ mpirun --mca btl tcp,sm,self hello_world_mpi - -.. note:: There used to be a ``vader`` ``btl`` component for shared - memory support; it was renamed to ``sm`` in Open MPI v5.0.0, - but the alias ``vader`` still works as well. - -To specifically deactivate a specific component, the comma-delimited -list can be prepended with a ``^`` to negate it: - -.. 
code-block:: sh - - shell$ mpirun --mca btl ^tcp hello_mpi_world - -The above command will use any other ``btl`` component other than the -``tcp`` component. +component (use ``--all`` or ``--level 9`` to show *all* the parameters). + +Note that ``ompi_info`` (without ``--all`` or a specified level) only +shows a small number of a component's MCA parameters by default. Each +MCA parameter has a "level" value from 1 to 9, corresponding to the +MPI-3 MPI_T tool interface levels. :ref:`See the LEVELS section in +the ompi_info(1) man page ` for an explanation +of the levels and how they correspond to Open MPI's code. + +Here are some rules of thumb to keep in mind when using Open MPI's levels: + +* Levels 1-3: + + * These levels should contain only a few MCA parameters. + * Generally, only put MCA parameters in these levels that matter to + users who just need to *run* Open MPI applications (and don't + know/care anything about MPI). Examples (these are not + comprehensive): + + * Selection of which network interfaces to use. + * Selection of which MCA components to use. + * Selective disabling of warning messages (e.g., show warning + message XYZ unless a specific MCA parameter is set, which + disables showing that warning message). + * Enabling additional stderr logging verbosity. This allows a + user to run with this logging enabled, and then use that output + to get technical assistance. + +* Levels 4-6: + + * These levels should contain any other MCA parameters that are + useful to expose to end users. + * There is an expectation that "power users" will utilize these MCA + parameters |mdash| e.g., those who are trying to tune the system + and extract more performance. + * Here are some examples of MCA parameters suitable for these levels + (these are not comprehensive): + + * When you could have hard-coded a constant size of a resource + (e.g., a resource pool size or buffer length), make it an MCA + parameter instead. 
+ * When there are multiple different algorithms available for a + particular operation, code them all up and provide an MCA + parameter to let the user select between them. + +* Levels 7-9: + + * Put any other MCA parameters here. + * It's ok for these MCA parameters to be esoteric and only relevant + to deep magic / the internals of Open MPI. + * There is little expectation of users using these MCA parameters. + +See :ref:`this section ` for +details on how to set MCA parameters at run time. diff --git a/docs/developers/index.rst b/docs/developers/index.rst index 82e68c26b1d..83de74048b9 100644 --- a/docs/developers/index.rst +++ b/docs/developers/index.rst @@ -17,7 +17,7 @@ probably don't need to read this section. autogen building-open-mpi terminology - source-code-tree-layout + source-code frameworks gnu-autotools sphinx diff --git a/docs/developers/source-code-tree-layout.rst b/docs/developers/source-code-tree-layout.rst deleted file mode 100644 index bf8a10212eb..00000000000 --- a/docs/developers/source-code-tree-layout.rst +++ /dev/null @@ -1,88 +0,0 @@ -Source code tree layout -======================= - -There are a few notable top-level directories in the source -tree: - -* The main sub-projects: - - * ``oshmem``: Top-level OpenSHMEM code base - * ``ompi``: The Open MPI code base - * ``opal``: The OPAL code base - -* ``config``: M4 scripts supporting the top-level ``configure`` script - ``mpi.h`` -* ``etc``: Some miscellaneous text files -* ``docs``: Source code for Open MPI documentation -* ``examples``: Trivial MPI / OpenSHMEM example programs -* ``3rd-party``: Included copies of required core libraries (either - via Git submodules in Git clones or via binary tarballs). - - .. note:: While it may be considered unusual, we include binary - tarballs (instead of Git submodules) for 3rd party projects that - are: - - #. Needed by Open MPI for correct operation, and - #. Not universally included in OS distributions, and - #. Rarely updated. 
- -Each of the three main source directories (``oshmem``, ``ompi``, and -``opal``) generate at least a top-level library named ``liboshmem``, -``libmpi``, and ``libopen-pal``, respectively. They can be built as -either static or shared libraries. Executables are also produced in -subdirectories of some of the trees. - -Each of the sub-project source directories have similar (but not -identical) directory structures under them: - -* ``class``: C++-like "classes" (using the OPAL class system) - specific to this project -* ``include``: Top-level include files specific to this project -* ``mca``: MCA frameworks and components specific to this project -* ``runtime``: Startup and shutdown of this project at runtime -* ``tools``: Executables specific to this project (currently none in - OPAL) -* ``util``: Random utility code - -There are other top-level directories in each of the sub-projects, -each having to do with specific logic and code for that project. For -example, the MPI API implementations can be found under -``ompi/mpi/LANGUAGE``, where ``LANGUAGE`` is ``c`` or ``fortran``. - -The layout of the ``mca`` trees are strictly defined. They are of the -form: - -.. code-block:: text - - PROJECT/mca/FRAMEWORK/COMPONENT - -To be explicit: it is forbidden to have a directory under the ``mca`` -trees that does not meet this template (with the exception of ``base`` -directories, explained below). Hence, only framework and component -code can be in the ``mca`` trees. - -That is, framework and component names must be valid directory names -(and C variables; more on that later). For example, the TCP BTL -component is located in ``opal/mca/btl/tcp/``. - -The name ``base`` is reserved; there cannot be a framework or component -named ``base``. Directories named ``base`` are reserved for the -implementation of the MCA and frameworks. Here are a few examples (as -of the |ompi_series| source tree): - -.. 
code-block:: sh - - # Main implementation of the MCA - opal/mca/base - - # Implementation of the btl framework - opal/mca/btl/base - - # Implementation of the sysv framework - oshmem/mcs/sshmem/sysv - - # Implementation of the pml framework - ompi/mca/pml/base - -Under these mandated directories, frameworks and/or components may have -arbitrary directory structures, however. diff --git a/docs/developers/source-code.rst b/docs/developers/source-code.rst new file mode 100644 index 00000000000..e8260910f49 --- /dev/null +++ b/docs/developers/source-code.rst @@ -0,0 +1,292 @@ +Source code +=========== + +Code style +---------- + +We intentionally do not have too many code conventions in the Open MPI +code base. + +All languages +^^^^^^^^^^^^^ + +* 4 space tabs. No more, no less. +* **NEVER** use actual tab characters; always use spaces. Both emacs + and vim have secret mojo that can automatically use spaces when you + hit the ```` key. This makes the code look the same in every + browser, regardless of individual tab display settings. + +C / C++ +^^^^^^^ + +* When comparing constants for equality or inequality, always put the + constant on the left. This is defensive programming: if you have a + typo in the test and miss a ``!`` or ``=``, you'll get a compiler error. + For example: + + .. code-block:: c + + /* Do this */ + if (NULL == foo) { ... } + + /* Because if you have a typo (i.e., = instead of ==), this will + be a compile error rather than a subtle bug */ + if (NULL = foo) { ... } + +* More defensive programming: *always* include blocks in curly braces + ``{ }``, even if they are only one line long. For example: + + .. code-block:: c + + /* Do this */ + if (whatever) { + return OMPI_SUCCESS; + } + + /* Not this */ + if (whatever) + return OMPI_SUCCESS; + +* Starting with Open MPI 1.7, Open MPI requires a C99-compliant + compiler. + + * C++-style comments are now allowed (and preferred). + * C99-style mixing of declarations and code is now allowed (and preferred). 
+ +* **ALWAYS** include ``_config.h`` as your first #include file, + where ```` is one of ``ompi``, ``oshmem``, or ``opal`` -- the + level that you're writing in. There are very, very few cases where + this is not true (e.g., some bizarre Windows scenarios). But in + 99.9999% of cases, this file should be included **first** so that it + can affect system-level #include files if necessary. +* Filenames and symbols must follow the **prefix rule** (see `e-mail + thread <http://www.open-mpi.org/community/lists/devel/2009/07/6389.php>`_): + + * Filenames must be prefixed with ``_``. + * Public symbols must be prefixed in components with + ``__``, where ```` is one + of ``mca``, ``ompi``, ``oshmem``, or ``opal``. Note that ``mca`` + used to be the most common, but it has fallen out of favor + compared to the other ```` prefixes. When in doubt about + whether a symbol is public, be safe and add the prefix. + * Non-public symbols must be declared ``static`` or otherwise made to + not appear in the global scope. + +* **ALWAYS** #define macros, even for logical values. + + * The GNU Way is to ``#define`` a macro when it is "true" and to + ``#undef`` it when it is "false". + * In Open MPI, we **always** ``#define`` a logical macro to be + either 0 or 1 -- we never ``#undef`` it. + * The reason for this is defensive programming: if you are only + checking if a preprocessor macro is defined (via ``#ifdef`` or + ``#if defined(FOO)``), you will get no warning when compiling if + you accidentally misspell the macro name. However, if you use the + logic test ``#if FOO`` with an undefined macro (e.g., because you + misspelled it), you'll get a compiler warning or error. + Misspelled macro names can be tremendously difficult to find when + they are buried in thousands of lines of code, so we will take all + the help from the preprocessor/compiler that we can get! + + .. 
code-block:: c + + /* Gnu Way - you will get no warning from the compiler if you + misspell "FOO"; the test will simply be false */ + #ifdef FOO + ... + #else + ... + #endif + + /* Open MPI Way - you will get a warning from the compiler if you + misspell "FOO"; the result of the test is a different value + than whether you spelled the macro name right or not */ + #if FOO + ... + #else + ... + #endif + +Fortran +^^^^^^^ + +We do not have specific coding style guidelines for Fortran. Please +read some of the existing Fortran code in the source code tree and try +to use a similar style. + +Shell scripting +^^^^^^^^^^^^^^^ + +Please read some of the existing shell code in the source code tree +and try to use a similar style. + +* Always enclose evaluated shell variables in quotes to ensure that + multi-token values are handled properly. + + .. code-block:: sh + + # This is bad + if test $foo = bar; then + + # This is good + if test "$foo" = "bar"; then + + * The one exception to this is that when doing an assignment to a + shell variable from another shell variable, it is not necessary to + use quotes on the right hand side: + + .. code-block:: sh + + # This is harmless, but unnecessary + foo="$bar" + + # This is actually sufficient, even for multi-token values of $bar + foo=$bar + +* Do not use the ``==`` operator for ``test`` |mdash| this is a GNU + extension and can cause portability problems on BSD systems. + Instead, use the single ``=`` operator. + + .. code-block:: sh + + # This is bad + if test "$foo" == "bar"; then + + # This is good + if test "$foo" = "bar"; then + +* Do not use the ``-a`` or ``-o`` operators for ``test`` |mdash| this + has caused portability problems with ``test(1)`` on BSD systems. + Instead, use the ``&&`` or ``||`` shell operators. + + .. 
code-block:: sh + + # This is bad + if test "$foo" = "bar" -a "$baz" = "yow"; then + + # This is good + if test "$foo" = "bar" && test "$baz" = "yow"; then + +m4 +^^^ + +We do not have specific coding style guidelines for m4 (the language +used to create the ``configure`` script). Please read some of the +existing m4 code in the source code tree and try to use a similar +style. + +Tree layout +----------- + +There are a few notable top-level directories in the source +tree: + +* The main sub-projects: + + * ``oshmem``: Top-level OpenSHMEM code base + * ``ompi``: The Open MPI code base + * ``opal``: The OPAL code base + +* ``config``: M4 scripts supporting the top-level ``configure`` script + ``mpi.h`` +* ``etc``: Some miscellaneous text files +* ``docs``: Source code for Open MPI documentation +* ``examples``: Trivial MPI / OpenSHMEM example programs +* ``3rd-party``: Included copies of required core libraries (either + via Git submodules in Git clones or via binary tarballs). + + .. note:: While it may be considered unusual, we include binary + tarballs (instead of Git submodules) for 3rd party projects that + are: + + #. Needed by Open MPI for correct operation, and + #. Not universally included in OS distributions, and + #. Rarely updated. + +Each of the three main source directories (``oshmem``, ``ompi``, and +``opal``) generate at least a top-level library named ``liboshmem``, +``libmpi``, and ``libopen-pal``, respectively. They can be built as +either static or shared libraries. Executables are also produced in +subdirectories of some of the trees. 
+ +Each of the sub-project source directories has a similar (but not +identical) directory structure under it: + +* ``class``: C++-like "classes" (using the OPAL class system) + specific to this project +* ``include``: Top-level include files specific to this project +* ``mca``: MCA frameworks and components specific to this project +* ``runtime``: Startup and shutdown of this project at runtime +* ``tools``: Executables specific to this project +* ``util``: Random utility code + +There are other top-level directories in each of the sub-projects, +each having to do with specific logic and code for that project. For +example, the MPI API implementations can be found under +``ompi/mpi/LANGUAGE``, where ``LANGUAGE`` is ``c``, ``fortran``, or +``java``. + +The layout of the ``mca`` trees is strictly defined. It is of the +form: + +.. code-block:: text + + PROJECT/mca/FRAMEWORK/COMPONENT + +To be explicit: it is forbidden to have a directory under the ``mca`` +trees that does not meet this template (with the exception of ``base`` +directories, explained below). Hence, only framework and component +code can be in the ``mca`` trees. + +That is, framework and component names must be valid directory names +(and C variables; more on that later). For example, the TCP BTL +component is located in ``opal/mca/btl/tcp/``. + +The name ``base`` is reserved; there cannot be a framework or component +named ``base``. Directories named ``base`` are reserved for the +implementation of the MCA and frameworks. Here are a few examples (as +of the |ompi_series| source tree): + +.. code-block:: sh + + # Main implementation of the MCA + opal/mca/base + + # Implementation of the btl framework + opal/mca/btl/base + + # Implementation of the sshmem framework + oshmem/mca/sshmem/base + + # Implementation of the pml framework + ompi/mca/pml/base + +Under these mandated directories, frameworks and/or components may have +arbitrary directory structures, however. 
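
As a rough illustration of the ``PROJECT/mca/FRAMEWORK/COMPONENT``
template and the reserved ``base`` name, the sketch below classifies a
path string. Note that ``classify_mca_path`` is a hypothetical helper
written only for this example; it is not part of the Open MPI source
tree, and it inspects only the path text (it does not check that the
directory actually exists).

.. code-block:: sh

    # Hypothetical helper (not part of Open MPI): classify a path
    # against the PROJECT/mca/FRAMEWORK/COMPONENT template.  The body
    # runs in a subshell, so the IFS and "set -f" changes do not leak
    # into the caller.
    classify_mca_path() (
        # Strip one trailing slash so "opal/mca/btl/tcp/" also works
        path=${1%/}

        # Split the path on "/" into $1, $2, ...; $path is left
        # unquoted on purpose here, with globbing disabled
        set -f
        IFS=/
        set -- $path

        if test "$#" -lt 3 || test "$2" != "mca"; then
            echo "not under PROJECT/mca"
            exit 1
        fi

        if test "$3" = "base"; then
            echo "MCA base ($1)"
        elif test "$#" -eq 3; then
            echo "framework $3 ($1)"
        elif test "$4" = "base"; then
            echo "base of framework $3 ($1)"
        else
            echo "component $4 of framework $3 ($1)"
        fi
    )

    classify_mca_path opal/mca/btl/tcp/   # component tcp of framework btl (opal)
    classify_mca_path ompi/mca/pml/base   # base of framework pml (ompi)
    classify_mca_path opal/mca/base       # MCA base (opal)

Anything deeper than the component level is reported as belonging to
that component, which matches the rule that the structure below the
mandated directories is free-form.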
+ +Symbol Visibility +----------------- + +The ``*_DECLSPEC`` macros provide a method to annotate symbols to indicate +their intended visibility when compiling dynamically shared object files +(e.g., ``libmpi.so``). The macros are defined on a per project basis: + +* Open MPI: ``OMPI_DECLSPEC`` +* Open PAL: ``OPAL_DECLSPEC`` +* OpenSHMEM: ``OSHMEM_DECLSPEC`` + +The macros expand to the appropriate compiler and platform flags for marking +whether a symbol should be explicitly made public in the target project's +library namespace. +The ``*_DECLSPEC`` attributes are used to declare that a symbol is to be +visible outside of that library/DSO's scope. For example, ``OMPI_DECLSPEC`` +is used to control what symbols are visible in the ``libmpi.so`` scope. + +.. note:: This is entirely related to dynamic library compilation and does not + apply to static compilation. + +.. note:: The macros were originally introduced when Open MPI supported + Windows (circa Open MPI v1.0.0) and are motivated by the Windows + `__declspec `_. + While support for Windows has been dropped from Open MPI, the symbol + visibility macros remain. diff --git a/docs/developers/terminology.rst b/docs/developers/terminology.rst index 7dae0053df0..e5b2a0a5d10 100644 --- a/docs/developers/terminology.rst +++ b/docs/developers/terminology.rst @@ -1,22 +1,32 @@ Open MPI terminology ==================== -Open MPI is a large project containing many different -sub-systems and a relatively large code base. Let's first cover some -fundamental terminology in order to make the rest of the discussion -easier. +Open MPI is a large project containing many different sub-systems and +a relatively large code base. Let's first cover some fundamental +terminology in order to make the rest of the discussion easier. 
-Open MPI has multiple main sections of code: +Modular Component Architecture (MCA) +------------------------------------ -* *OSHMEM:* The OpenSHMEM API and supporting logic -* *OMPI:* The MPI API and supporting logic -* *OPAL:* The Open Portable Access Layer (utility and "glue" code) +:doc:`See this section ` for a discussion of the +Modular Component Architecture (MCA). Seriously. Go read it now. +From reading that section, you should understand the following terms +before continuing reading these docs: -There are strict abstraction barriers in the code between these -sections. That is, they are compiled into separate libraries: -``liboshmem``, ``libmpi``, ``libopen-pal`` with a strict dependency order: -OSHMEM depends on OMPI, OMPI depends on OPAL. For example, MPI -executables are linked with: +* Project +* Framework +* Component +* Module +* Parameters (variables) + +Notes on projects +----------------- + +Projects are strict abstraction barriers in the code. That is, they +are compiled into separate libraries: ``liboshmem``, ``libmpi``, +``libopen-pal`` with a strict dependency order: OSHMEM depends on +OMPI, OMPI depends on OPAL. For example, MPI executables are linked +with: .. code-block:: sh @@ -31,9 +41,8 @@ overall link step. Strictly speaking, these are not "layers" in the classic software engineering sense (even though it is convenient to refer to them as such). They are listed above in dependency order, but that does not -mean that, for example, the OMPI code must go through the -OPAL code in order to reach the operating system or a network -interface. +mean that, for example, the OMPI code must go through the OPAL code in +order to reach the operating system or a network interface. As such, this code organization more reflects abstractions and software engineering, not a strict hierarchy of functions that must be @@ -43,6 +52,23 @@ OPAL). 
Indeed, many top-level MPI API functions are quite performance sensitive; it would not make sense to force them to traverse an arbitrarily deep call stack just to move some bytes across a network. +Frameworks, components, and modules can be dynamic or static. That is, +they can be available as plugins or they may be compiled statically +into libraries (e.g., ``libmpi``). + +In Open MPI |ompi_ver|, ``configure`` defaults to: + +* Building projects as dynamic libraries +* Linking all components into their parent project libraries + (vs. compiling them as independent DSOs) + +Although these defaults can be modified by :doc:`command line +arguments to configure +`. + +Required 3rd party libraries +---------------------------- + Note that Open MPI also uses some third-party libraries for core functionality: @@ -53,46 +79,3 @@ functionality: These are discussed in detail in the :ref:`required support libraries section `. - -Here's a list of terms that are frequently used in discussions about -the Open MPI code base: - -* *MCA:* The Modular Component Architecture (MCA) is the foundation - upon which the entire Open MPI project is built. It provides all the - component architecture services that the rest of the system uses. - Although it is the fundamental heart of the system, its - implementation is actually quite small and lightweight |mdash| it is - nothing like CORBA, COM, JINI, or many other well-known component - architectures. It was designed for HPC |mdash| meaning that it is small, - fast, and reasonably efficient |mdash| and therefore offers few services - other than finding, loading, and unloading components. - -* *Framework:* An MCA *framework* is a construct that is created for a - single, targeted purpose. It provides a public interface that is - used by external code, but it also has its own internal services. - :ref:`See the list of Open MPI frameworks in this version of Open - MPI `. 
An MCA framework uses the MCA's services - to find and load *components* at run-time |mdash| implementations of - the framework's interface. An easy example framework to discuss is - the MPI framework named ``btl``, or the Byte Transfer Layer. It is - used to send and receive data on different kinds of networks. - Hence, Open MPI has ``btl`` components for shared memory, - OpenFabrics interfaces, various protocols over Ethernet, etc. - -* *Component:* An MCA *component* is an implementation of a - framework's interface. Another common word for component is - "plugin". It is a standalone collection of code that can be bundled - into a unit that can be inserted into the Open MPI code base, either - at run-time and/or compile-time. - -* *Module:* An MCA *module* is an instance of a component (in the C++ - sense of the word "instance"; an MCA component is analogous to a C++ - class, and an MCA module is analogous to a C++ object). For example, - if a node running an Open MPI application has two Ethernet NICs, the - Open MPI application will contain one TCP ``btl`` component, but two - TCP ``btl`` modules. This difference between components and modules - is important because modules have private state; components do not. - -Frameworks, components, and modules can be dynamic or static. That is, -they can be available as plugins or they may be compiled statically -into libraries (e.g., ``libmpi``). diff --git a/docs/faq/general-tuning.rst b/docs/faq/general-tuning.rst index 783d2dc5f37..23787eac1b2 100644 --- a/docs/faq/general-tuning.rst +++ b/docs/faq/general-tuning.rst @@ -5,136 +5,6 @@ General Tuning ///////////////////////////////////////////////////////////////////////// -What is the Modular Component Architecture (MCA)? -------------------------------------------------- - -The Modular Component Architecture (MCA) is the backbone for much of -Open MPI's functionality. 
It is a series of *projects*, *frameworks*, -*components*, and *modules* that are assembled at run-time to create -an MPI implementation. - -* **Projects:** An Open MPI project is essentially the highest - abstraction layer division of code. - - .. note:: The word "project" is unfortunately overloaded. It can be - used to mean the code/resources/people in the greater Open - MPI community associated with the development of a - particular software package, but it can also be used to - mean a section of code within the Open MPI code base. - - For the purposes of this documentation, "project" means - the latter: a section of code within the Open MPI code - base. - -* **Frameworks:** An MCA framework manages zero or more components at - run-time and is targeted at a specific task (e.g., providing MPI - collective operation functionality). Each MCA framework supports a - single component type, but may support multiple versions of that - type. The framework uses the services from the MCA base - functionality to find and/or load components. - -* **Components:** An MCA component is an implementation of a - framework's interface. It is a standalone collection of code that - can be bundled into a plugin that can be inserted into the Open MPI - code base, either at run-time and/or compile-time. - -* **Modules:** An MCA module is an instance of a component (in the C++ - sense of the word "instance"; an MCA component is analogous to a C++ - class). For example, if a node running an Open MPI application has - multiple ethernet NICs, the Open MPI application will contain one - TCP MPI point-to-point *component*, but two TCP point-to-point - *modules*. - -///////////////////////////////////////////////////////////////////////// - -What are MCA parameters? ------------------------- - -MCA parameters are the basic unit of run-time tuning for Open -MPI. They are simple "key = value" pairs that are used extensively -throughout the code base. 
The general rules of thumb that the -developers use are: - -#. Instead of using a constant for an important value, make it an MCA - parameter. -#. If a task can be implemented in multiple, user-discernible ways, - implement as many as possible and make choosing between them be an MCA - parameter. - -For example, an easy MCA parameter to describe is the boundary between -short and long messages in TCP wire-line transmissions. "Short" -messages are sent eagerly whereas "long" messages use a rendezvous -protocol. The decision point between these two protocols is the -overall size of the message (in bytes). By making this value an MCA -parameter, it can be changed at run-time by the user or system -administrator to use a sensible value for a particular environment or -set of hardware (e.g., a value suitable for 1Gpbs Ethernet is probably -not suitable for 100 Gigabit Ethernet, and may require even a third -different value for 40 Gigabit Ethernet). - -Note that MCA parameters may be set in several different ways -(described in another FAQ entry). This allows, for example, system -administrators to fine-tune the Open MPI installation for their -hardware / environment such that normal users can simply use the -default values. - -More specifically, HPC environments |mdash| and the applications that run -on them |mdash| tend to be unique. Providing extensive run-time tuning -capabilities through MCA parameters allows the customization of Open -MPI to each system's / user's / application's particular needs. - -///////////////////////////////////////////////////////////////////////// - -What projects are included in the Open MPI code base? ------------------------------------------------------ - -The following *projects* exist in Open MPI |ompi_ver|: - -* **Open Porability Access Layer (OPAL):** Low-level, operating - system and architecture portability code. -* **Open MPI (OMPI):** The MPI API and supporting infrastructure. 
-* **OpenSHMEM (OSHMEM):** The OpenSHMEM API and supporting - infrastructure. - -.. note:: Prior versions of Open MPI also included an Open MPI - Runtime Envionrment (ORTE) project. ORTE essentially - evolved into the standalone `PMIx Runtime Reference - Environment (PRRTE) `_, - and is now considered a 3rd-party dependency of Open MPI - -- not one of its included projects. - -///////////////////////////////////////////////////////////////////////// - -What frameworks are in Open MPI? --------------------------------- - -Each project has its own frameworks. - -.. error:: TODO This question may be moot due to :doc:`this list - already in the higher-level doc `. - - -///////////////////////////////////////////////////////////////////////// - -How do I know what components are in my Open MPI installation? --------------------------------------------------------------- - -The ``ompi_info`` command, in addition to providing a wealth of -configuration information about your Open MPI installation, will list -all components (and the frameworks that they belong to) that are -available. These include system-provided components as well as -user-provided components. - -Please note that starting with Open MPI v1.8, ``ompi_info`` categorizes its -parameter parameters in so-called levels, as defined by the MPI_T -interface. You will need to specify ``--level 9`` (or -``--all``) to show *all* MCA parameters. -`See this Cisco Blog entry -`_ -for further information. - -///////////////////////////////////////////////////////////////////////// - .. _faq-general-tuning-install-components: How do I install my own components into an Open MPI installation? @@ -171,353 +41,7 @@ automatically send components to remote nodes when MPI jobs are run. ///////////////////////////////////////////////////////////////////////// -How do I know what MCA parameters are available? 
------------------------------------------------- - -The ``ompi_info`` command can list the parameters for a given -component, all the parameters for a specific framework, or all -parameters. Most parameters contain a description of the parameter; -all will show the parameter's current value. - -For example, the following shows all the MCA parameters for all -components that ``ompi_info`` finds: - -.. code-block:: sh - - # Starting with Open MPI v1.7, you must use "--level 9" to see - # all the MCA parameters (the default is "--level 1"): - shell$ ompi_info --param all all --level 9 - - # Before Open MPI v1.7, the "--level" command line options - # did not exist; do not use it. - shell$ ompi_info --param all all - -This example shows all the MCA parameters for all BTL components that -``ompi_info`` finds: - -.. code-block:: sh - - # All remaining examples assume Open MPI v1.7 or later (i.e., - # they assume the use of the "--level" command line option) - shell$ ompi_info --param btl all --level 9 - -This example shows all the MCA parameters for the TCP BTL component: - -.. code-block:: sh - - shell$ ompi_info --param btl tcp --level 9 - -///////////////////////////////////////////////////////////////////////// - -.. _faq-general-tuning-setting-mca-params: - -How do I set the value of MCA parameters? ------------------------------------------ - -There are multiple ways to set MCA parameters, each of which are -listed below, and are resolved in the following priority order: - -#. **Command line:** The highest-precedence method is setting MCA - parameters on the command line. For example: - - .. code-block:: sh - - shell$ mpirun --mca mpi_show_handle_leaks 1 -n 4 a.out - - This sets the MCA parameter ``mpi_show_handle_leaks`` to the value - of 1 before running ``a.out`` with four processes. In general, the - format used on the command line is ``--mca ``. 
- - Note that when setting multi-word values, you need to use quotes to - ensure that the shell and Open MPI understand that they are a - single value. For example: - - .. code-block:: sh - - shell$ mpirun --mca param "value with multiple words" ... - -#. **Environment variable:** Next, environment variables are searched. - Any environment variable named ``OMPI_MCA_`` will be - used. For example, the following has the same effect as the - previous example (for sh-flavored shells): - - .. code-block:: sh - - shell$ OMPI_MCA_mpi_show_handle_leaks=1 - shell$ export OMPI_MCA_mpi_show_handle_leaks - shell$ mpirun -n 4 a.out - - Note that setting environment variables to values with multiple words - requires quoting, such as: - - .. code-block:: sh - - shell$ OMPI_MCA_param="value with multiple words" - -#. **Tuning MCA parameter files:** Simple text files can be used to - set MCA parameter values for a specific application. :ref:`See this FAQ - entry for more details `. - -#. **Aggregate MCA parameter files:** Simple text files can be used to - set MCA parameter values for a specific application. :ref:`See this FAQ - entry for more details `. - - .. warning:: The use of AMCA param files is deprecated. - -#. **Files:** Finally, simple text files can be used to set MCA - parameter values. Parameters are set one per line (comments are - permitted). For example: - - .. code-block:: ini - - # This is a comment - # Set the same MCA parameter as in previous examples - mpi_show_handle_leaks = 1 - - Note that quotes are *not* necessary for setting multi-word values - in MCA parameter files. Indeed, if you use quotes in the MCA - parameter file, they will be used as part of the value itself. For - example: - - .. code-block:: ini - - # The following two values are different: - param1 = value with multiple words - param2 = "value with multiple words" - - By default, two files are searched (in order): - - #. 
``$HOME/.openmpi/mca-params.conf``: The user-supplied set of - values takes the highest precedence. - #. ``$prefix/etc/openmpi-mca-params.conf``: The system-supplied set - of values has a lower precedence. - - More specifically, the MCA parameter ``mca_param_files`` specifies - a colon-delimited path of files to search for MCA parameters. - Files to the left have lower precedence; files to the right are - higher precedence. - - .. note:: Keep in mind that, just like components, these parameter - files are *only* relevant where they are "visible" - (:ref:`see this FAQ entry - `). Specifically, - Open MPI does not read all the values from these files - during startup and then send them to all nodes in the job - |mdash| the files are read on each node during each - process' startup. This is intended behavior: it allows - for per-node customization, which is especially relevant - in heterogeneous environments. - -///////////////////////////////////////////////////////////////////////// - -.. _faq-general-tuning-amca-param-files: - -What are Aggregate MCA (AMCA) parameter files? ----------------------------------------------- - -.. error:: TODO This entire entry needs to be checked for correctness. - Are AMCA files actually deprecated? - -.. warning:: The use of AMCA param files is still available in Open - MPI |ompi_ver|, but is deprecated, and may disappear - in a future version of Open MPI. - -Aggregate MCA (AMCA) parameter files contain MCA parameter key/value -pairs similar to the ``$HOME/.openmpi/mca-params.conf`` file described -in :ref:`this FAQ entry `. - -The motivation behind AMCA parameter sets came from the realization -that certain applications require a large number of MCA parameters are -to run well and/or execute as the user expects. Since these MCA -parameters are application-specific (or even application-run-specific) -they should not be set in a global manner, but only pulled in as -determined by the user. 
- -MCA parameters set in AMCA parameter files will override any MCA -parameters supplied in global parameter files (e.g., -``$HOME/.openmpi/mca-params.conf``), but not command line or -environment parameters. - -AMCA parameter files are typically supplied on the command line via -the ``--am`` option. - -For example, consider an AMCA parameter file called ``foo.conf`` -placed in the same directory as the application ``a.out``. A user -will typically run the application as: - -.. code-block:: sh - - shell$ mpirun -n 2 a.out - -To use the ``foo.conf`` AMCA parameter file, this command line -changes to: - -.. code-block:: sh - - shell$ mpirun -n 2 --am foo.conf a.out - -If the user wants to override a parameter set in ``foo.conf`` they -can add it to the command line: - -.. code-block:: sh - - shell$ mpirun -n 2 --am foo.conf --mca btl tcp,self a.out - -AMCA parameter files can be coupled if more than one file is to be -used. If we have another AMCA parameter file called ``bar.conf`` -that we want to use, we add it to the command line as follows: - -.. code-block:: sh - - shell$ mpirun -n 2 --am foo.conf:bar.conf a.out - -AMCA parameter files are loaded in priority order. This means that -``foo.conf`` AMCA file has priority over the ``bar.conf`` file. So -if the ``bar.conf`` file sets the MCA parameter -``mpi_leave_pinned=0`` and the ``foo.conf`` file sets this MCA -parameter to ``mpi_leave_pinned=1`` then the latter will be used. - -The location of AMCA parameter files are resolved in a similar way as -the shell: - -#. If no path operator is provided (i.e., ``foo.conf``), then - Open MPI will search the ``$sysconfdir/amca-param-sets`` directory, - then the current working directory. -#. If a relative path is specified, then only that path will be - searched (e.g., ``./foo.conf``, ``baz/foo.conf``). -#. If an absolute path is specified, then only that path will be - searched (e.g., ``/bip/boop/foo.conf``). 
- -Although the typical use case for AMCA parameter files is to be -specified on the command line, they can also be set as MCA parameters -in the environment. The MCA parameter ``mca_base_param_file_prefix`` -contains a ``:``-delimited list of AMCA parameter files exactly as -they would be passed to the ``--am`` command line option. The MCA -parameter ``mca_base_param_file_path`` specifies the path to search -for AMCA files with relative paths. By default this is -``$sysconfdir/amca-param-sets/:$CWD``. - -///////////////////////////////////////////////////////////////////////// - -.. _faq-general-tuning-tune-param-files: - -How do I set application specific environment variables in global parameter files? ----------------------------------------------------------------------------------- - -.. error:: TODO This entire entry needs to be checked for correctness. - -The ``mpirun`` ``--tune`` CLI options allows users to specify both MCA -parameters and environment variables from within a single file. - -MCA parameters set in tuned parameter files will override any MCA -parameters supplied in global parameter files (e.g., -``$HOME/.openmpi/mca-params.conf``), but not command line or -environment parameters. - -Tuned parameter files are typically supplied on the command line via -the ``--tune`` option. - -For example, consider an tuned parameter file called ``foo.conf`` -placed in the same directory as the application ``a.out``. A user -will typically run the application as: - -.. code-block:: sh - - shell$ mpirun -n 2 a.out - -To use the ``foo.conf`` tuned parameter file, this command line -changes to: - -.. code-block:: sh - - shell$ mpirun -n 2 --tune foo.conf a.out - -Tuned parameter files can be coupled if more than one file is to be -used. If we have another tuuned parameter file called ``bar.conf`` -that we want to use, we add it to the command line as follows: - -.. 
code-block:: sh - - shell$ mpirun -n 2 --tune foo.conf,bar.conf a.out - - -The contents of tuned files consist of one or more lines, each of -which contain zero or more `-x` and `--mca` options. Comments are not -allowed. For example, the following tuned file: - -.. code-block:: - - -x envvar1=value1 -mca param1 value1 -x envvar2 - -mca param2 value2 - -x envvar3 - -is equivalent to: - -.. code-block:: sh - - shell$ mpirun \ - -x envvar1=value1 -mca param1 value1 -x envvar2 \ - -mca param2 value2 - -x envvar3 \ - ...rest of mpirun command line... - -Although the typical use case for tuned parameter files is to be -specified on the command line, they can also be set as MCA parameters -in the environment. The MCA parameter ``mca_base_envvar_file_prefix`` -contains a ``,``-delimited list of tuned parameter files exactly as -they would be passed to the ``--tune`` command line option. The MCA -parameter ``mca_base_envvar_file_path`` specifies the path to search -for tune files with relative paths. - -.. error:: TODO Check that these MCA var names ^^ are correct. - -///////////////////////////////////////////////////////////////////////// - -How do I select which components are used? ------------------------------------------- - -Each MCA framework has a top-level MCA parameter that helps guide -which components are selected to be used at run-time. Specifically, -there is an MCA parameter of the same name as each MCA framework that -can be used to *include* or *exclude* components from a given run. - -For example, the ``btl`` MCA parameter is used to control which BTL -components are used (e.g., MPI point-to-point communications; -:doc:`see the MCA frameworks listing ` for a full -listing). It can take as a value a comma-separated list of components -with the optional prefix ``^``. For example: - -.. code-block:: sh - - # Tell Open MPI to exclude the tcp and uct BTL components - # and implicitly include all the rest - shell$ mpirun --mca btl ^tcp,uct ... 
- - # Tell Open MPI to include *only* the components listed here and - # implicitly ignore all the rest (i.e., the loopback, shared memory, - # etc.) MPI point-to-point components): - shell$ mpirun --mca btl self,sm,usnic ... - -Note that ``^`` can *only* be the prefix of the entire value because -the inclusive and exclusive behavior are mutually exclusive. -Specifically, since the exclusive behavior means "use all components -*except* these", it does not make sense to mix it with the inclusive -behavior of not specifying it (i.e., "use all of these components"). -Hence, something like this: - -.. code-block:: sh - - shell$ mpirun --mca btl self,sm,usnic,^tcp ... - -does not make sense because it says both "use only the ``self``, ``sm``, -and ``usnic`` components" and "use all components except ``tcp``" and -will result in an error. - -Just as with all MCA parameters, the ``btl`` parameter (and all -framework parameters) :ref:`can be set in multiple ways -`. - -///////////////////////////////////////////////////////////////////////// +.. _faq-tuning-using-paffinity-label: What is processor affinity? Does Open MPI support it? ------------------------------------------------------ @@ -638,7 +162,8 @@ the ``mpi_warn_on_fork`` MCA parameter. For example: shell$ mpirun --mca mpi_warn_on_fork 0 ... Of course, systems that ``dlopen("libmpi.so", ...)`` may not use Open -MPI's ``mpirun``, and therefore may need to use :ref:`a different +MPI's ``mpirun``, and therefore may need to use (JMS: this ref no +longer exists -- it moved to running-apps/tuning.rst) a different mechanism to set MCA parameters `. diff --git a/docs/faq/index.rst b/docs/faq/index.rst index d2109311b68..c75c0ff38b7 100644 --- a/docs/faq/index.rst +++ b/docs/faq/index.rst @@ -27,5 +27,4 @@ that they are worth categorizing in an official way. 
ompio macos - tuning general-tuning diff --git a/docs/faq/large-clusters.rst b/docs/faq/large-clusters.rst index 80db5a0930a..f501dea574f 100644 --- a/docs/faq/large-clusters.rst +++ b/docs/faq/large-clusters.rst @@ -154,7 +154,7 @@ as soon as possible. This parameter can be included in the default MCA parameter file, placed in the user's environment, or added to the ``mpirun`` command -line. See :ref:`this FAQ entry ` +line. See :ref:`this FAQ entry ` for more details on how to set MCA parameters. ///////////////////////////////////////////////////////////////////////// diff --git a/docs/faq/ompio.rst b/docs/faq/ompio.rst index 3633a707c7c..47500f86177 100644 --- a/docs/faq/ompio.rst +++ b/docs/faq/ompio.rst @@ -235,7 +235,7 @@ perspective. value). The value of ``io_ompio_bytes_per_agg`` could be set by system administrators in the system-wide Open MPI configuration file, or by users individually. See :ref:`this - FAQ item ` on setting + FAQ item ` on setting MCA parameters for details. For more exhaustive tuning of I/O parameters, we recommend the diff --git a/docs/faq/running-mpi-apps.rst b/docs/faq/running-mpi-apps.rst index 28d9057a614..6777e4c9ec2 100644 --- a/docs/faq/running-mpi-apps.rst +++ b/docs/faq/running-mpi-apps.rst @@ -743,8 +743,8 @@ Several notable options are: :ref:`this FAQ entry for more details `). * ``-n``: Indicate the number of processes to start. -* ``--mca``: Set MCA parameters (see the :doc:`Run-Time Tuning FAQ - category ` for more details). +* ``--mca``: Set MCA parameters (see :ref:`how to set MCA params + ` for more details). * ``--wdir DIRECTORY``: Set the working directory of the started applications. If not supplied, the current working directory is assumed (or ``$HOME``, if the current working directory does not @@ -1320,7 +1320,7 @@ Yes. The MCA parameter ``mpi_yield_when_idle`` controls whether an MPI process runs in Aggressive or Degraded performance mode. 
Setting it to 0 forces Aggressive mode; setting it to 1 forces Degraded mode (see -:ref:`this FAQ entry ` to see how +:ref:`this FAQ entry ` to see how to set MCA parameters). Note that this value *only* affects the behavior of MPI processes when diff --git a/docs/faq/sysadmin.rst b/docs/faq/sysadmin.rst index f6963c6b769..c63aef6586f 100644 --- a/docs/faq/sysadmin.rst +++ b/docs/faq/sysadmin.rst @@ -131,7 +131,9 @@ network at a system level, such that when users invoke ``mpirun`` or ``mpiexec`` to launch their jobs, they will automatically only be using the network meant for MPI jobs. -:doc:`See the run-time tuning FAQ category ` for information on how to set global MCA parameters. +:ref:`See how to set MCA params +` for information on how to +set global MCA parameters. ///////////////////////////////////////////////////////////////////////// @@ -161,8 +163,10 @@ I have power users who will want to override my global MCA parameters; is this p Absolutely. -:doc:`See the run-time tuning FAQ category ` for information how to set MCA parameters, both at the -system level and on a per-user (or per-MPI-job) basis. +:ref:`See how to set MCA params +` for information how to set +MCA parameters, both at the system level and on a per-user (or +per-MPI-job) basis. ///////////////////////////////////////////////////////////////////////// diff --git a/docs/faq/tuning.rst b/docs/faq/tuning.rst deleted file mode 100644 index af482effc59..00000000000 --- a/docs/faq/tuning.rst +++ /dev/null @@ -1,13 +0,0 @@ -Run-Time Tuning -=============== - -Placeholder; haven't converted over the "run time tuning" FAQ entry to -restructured text yet. - -.. _faq-tuning-setting-mca-params-label: - -setting mca params - -.. 
_faq-tuning-using-paffinity-label: - -blah using paffinity diff --git a/docs/index.rst b/docs/index.rst index 31493562613..a2806da90fd 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -26,15 +26,11 @@ Documentation for Open can be found in the following locations: - Documentation location * - v5.0.0 and later - - Open MPI documentation has consolidated and moved to + - Web: https://docs.open-mpi.org/ - https://docs.open-mpi.org/. + Tarball: ``docs/_build/html/index.html`` - This particular documentation is for |ompi_ver|; use the - selector in the - - bottom-left of the navigation column to select - documentation for different version. + Installed: ``$prefix/share/doc/openmpi/html/index.html`` * - v4.1.x and earlier - See the `legacy Open MPI FAQ `_ diff --git a/docs/man-openmpi/man1/ompi_info.1.rst b/docs/man-openmpi/man1/ompi_info.1.rst index a5188b9ccc0..21313d7bc3b 100644 --- a/docs/man-openmpi/man1/ompi_info.1.rst +++ b/docs/man-openmpi/man1/ompi_info.1.rst @@ -163,65 +163,65 @@ more MCA parameters, use the ``--level`` command line option. EXAMPLES -------- +Show the default output of options and listing of installed +components in a human-readable / prettyprint format: + .. code-block:: ompi_info -Show the default output of options and listing of installed -components in a human-readable / prettyprint format. +Show the default output of options and listing of installed components +in a machine-parsable format: .. code-block:: ompi_info --parsable -Show the default output of options and listing of installed components -in a machine-parsable format. +Show the level 1 MCA parameters of the "tcp" BTL component in a +human-readable / prettyprint format: .. code-block:: ompi_info --param btl tcp -Show the level 1 MCA parameters of the "tcp" BTL component in a -human-readable / prettyprint format. +Show the level 1 through level 6 MCA parameters of the "tcp" BTL +component in a human-readable / prettyprint format: .. 
code-block:: ompi_info --param btl tcp --level 6 -Show the level 1 through level 6 MCA parameters of the "tcp" BTL -component in a human-readable / prettyprint format. +Show the level 1 MCA parameters of the "tcp" BTL component in a +machine-parsable format: .. code-block:: ompi_info --param btl tcp --parsable -Show the level 1 MCA parameters of the "tcp" BTL component in a -machine-parsable format. +Show the level 1 through level 3 MCA parameters of string type in a +human-readable / prettyprint format: .. code-block:: ompi_info --type string --pretty-print --level 3 -Show the level 3 MCA parameters of string type in a human-readable / -prettyprint format. +Show the "bindir" that Open MPI was configured with: .. code-block:: ompi_info --path bindir -Show the "bindir" that Open MPI was configured with. +Show the version of Open MPI version numbers in a prettyprint format: .. code-block:: ompi_info --version -Show the version of Open MPI version numbers in a prettyprint format. +Show *all* information about the Open MPI installation, including all +components that can be found, all the MCA parameters that they support +(i.e., levels 1 through 9), versions of Open MPI and the components, +etc.: .. code-block:: ompi_info --all - -Show *all* information about the Open MPI installation, including all -components that can be found, all the MCA parameters that they support -(i.e., levels 1 through 9), versions of Open MPI and the components, -etc. 
diff --git a/docs/running-apps/index.rst b/docs/running-apps/index.rst index dca8473b54b..e9343a3cf4d 100644 --- a/docs/running-apps/index.rst +++ b/docs/running-apps/index.rst @@ -19,6 +19,7 @@ but they can generally be broken down into two categories: quickstart pmix-and-prrte + tuning localhost ssh diff --git a/docs/running-apps/pmix-and-prrte.rst b/docs/running-apps/pmix-and-prrte.rst index 31329815001..3d1ce923a05 100644 --- a/docs/running-apps/pmix-and-prrte.rst +++ b/docs/running-apps/pmix-and-prrte.rst @@ -1,3 +1,5 @@ +.. _label-running-role-of-pmix-and-prte: + The role of PMIx and PRRTE ========================== diff --git a/docs/running-apps/tuning.rst b/docs/running-apps/tuning.rst new file mode 100644 index 00000000000..88d2a782f90 --- /dev/null +++ b/docs/running-apps/tuning.rst @@ -0,0 +1,447 @@ +.. _label-run-time-tuning: + +Run-time tuning +=============== + +Open MPI is a highly customizable system; it can be configured via +configuration files, command line parameters, and environment +variables. + +The main functionality of Open MPI's configuration system is through +the Modular Component Architecture (MCA). + +.. note:: :ref:`The PMIx and PRRTE software packages + ` also use the MCA for + their configuration, composition, and run-time tuning. + +///////////////////////////////////////////////////////////////////////// + +The Modular Component Architecture (MCA) +---------------------------------------- + +The Modular Component Architecture (MCA) is the backbone for much of +Open MPI's functionality. It is a series of *projects*, *frameworks*, +*components*, and *modules* that are assembled at run-time to create +an MPI implementation. + +MCA *parameters* (also known as MCA *variables*) are used to customize +Open MPI's behavior at run-time. + +Each of these entities is described below. + +Projects +^^^^^^^^ + +A *project* is essentially the highest abstraction layer division in +the Open MPI code base. + +..
note:: The word "project" is unfortunately overloaded. It can be + used to mean the code/resources/people in the greater Open + MPI community associated with the development of a + particular software package, but it can also be used to mean + a major, top-level section of code within the Open MPI code + base. + + For the purposes of this documentation, "project" means the + latter: a major, top-level section of code within the Open + MPI code base. + +The following *projects* exist in Open MPI |ompi_ver|: + +* **Open Portability Access Layer (OPAL):** Low-level, operating + system and architecture portability code. +* **Open MPI (OMPI):** The MPI API and supporting infrastructure. +* **OpenSHMEM (OSHMEM):** The OpenSHMEM API and supporting + infrastructure. + +.. note:: Prior versions of Open MPI also included an Open MPI + Runtime Environment (ORTE) project. ORTE essentially + evolved into the standalone `PMIx Runtime Reference + Environment (PRRTE) `_ + and is now considered a 3rd-party dependency of Open MPI + |mdash| not one of its included projects. + + See :ref:`the role of PMIx and PRRTE + ` for more information. + +Frameworks +^^^^^^^^^^ + +An MCA framework manages zero or more components at run-time and is +targeted at a specific task (e.g., providing MPI collective operation +functionality). Although each MCA framework supports only a single +type of component, it may support multiple components of that type. + +Some of the more common frameworks that users may want or need to +customize include the following: + +* ``btl``: Byte Transfer Layer (BTL); these components are exclusively used + as the underlying transports for the ``ob1`` PML component. +* ``coll``: MPI collective algorithms +* ``io``: MPI I/O +* ``mtl``: MPI Matching Transport Layer (MTL); these components are + exclusively used as the underlying transports for the ``cm`` PML + component. +* ``pml``: Point-to-point Messaging Layer (PML).
These components are + used to implement MPI point-to-point messaging functionality. + +There are many frameworks within Open MPI; the exact set varies +between different versions of Open MPI. You can use the +:ref:`ompi_info(1) ` command to see the full list of +frameworks that are included in Open MPI |ompi_ver|. + +Components +^^^^^^^^^^ + +An MCA component is an implementation of a framework's formal +interface. It is a standalone collection of code that can be bundled +into a plugin that can be inserted into the Open MPI code base, either +at run-time and/or compile-time. + +.. note:: Good synonyms for Open MPI's "component" concept are + "plugin" or "add-on". + +The exact set of components varies between different versions of Open +MPI. Open MPI's code base includes support for many components, but +not all of them may be present or available on your system. You can +use the :ref:`ompi_info(1) ` command to see what +components are included in Open MPI |ompi_ver| on your system. + +Modules +^^^^^^^ + +An MCA module is an instance of a component (in the C++ sense of the +word "instance"; an MCA component is analogous to a C++ class). For +example, if a node running an Open MPI application has two Ethernet +NICs, the Open MPI application will contain one TCP MPI point-to-point +*component*, but two TCP point-to-point *modules*. + +Parameters (variables) +^^^^^^^^^^^^^^^^^^^^^^ + +MCA *parameters* (sometimes called MCA *variables*) are the basic unit +of run-time tuning for Open MPI. They are simple "key = value" pairs +that are used extensively throughout Open MPI. The general rules of +thumb that the developers use are: + +#. Instead of using a constant for an important value, make it an MCA + parameter. +#. If a task can be implemented in multiple, user-discernible ways, + implement as many as possible, and use an MCA parameter to + choose between them at run-time.
+ +For example, an easy MCA parameter to describe is the boundary between +short and long messages in TCP wire-line transmissions. "Short" +messages are sent eagerly whereas "long" messages use a rendezvous +protocol. The decision point between these two protocols is the +overall size of the message (in bytes). By making this value an MCA +parameter, it can be changed at run-time by the user or system +administrator to use a sensible value for a particular environment or +set of hardware (e.g., a value suitable for 1 Gbps Ethernet is probably +not suitable for 100 Gigabit Ethernet, and may require even a third +different value for 25 Gigabit Ethernet). + +///////////////////////////////////////////////////////////////////////// + +.. _label-running-setting-mca-param-values: + +Setting MCA parameter values +---------------------------- + +MCA parameters may be set in several different ways. + +.. admonition:: Rationale + :class: tip + + Having multiple methods to set MCA parameters allows, for example, + system administrators to fine-tune the Open MPI installation for + their hardware / environment such that normal users can simply use + the default values (that were set by the system administrators). + + HPC environments |mdash| and the applications that run on them + |mdash| tend to be unique. Providing extensive run-time tuning + capabilities through MCA parameters allows the customization of + Open MPI to each system's / user's / application's particular + needs. + +The following are the different methods to set MCA parameters, listed +in priority order (highest priority first): + +#. Command line parameters +#. Environment variables +#. Tuning MCA parameter files +#. Configuration files + +Command line parameters +^^^^^^^^^^^^^^^^^^^^^^^ + +The highest-precedence method is setting MCA parameters on the command +line. For example: + +..
code-block:: sh + + shell$ mpirun --mca mpi_show_handle_leaks 1 -np 4 a.out + +This sets the MCA parameter ``mpi_show_handle_leaks`` to the value of +1 before running ``a.out`` with four processes. In general, the +format used on the command line is ``--mca ``. + +.. note:: When setting a value that includes spaces, you need to use + quotes to ensure that the shell understands that the + multiple tokens are a single value. For example: + + .. code-block:: sh + + shell$ mpirun --mca param "value with multiple words" ... + +Environment variables +^^^^^^^^^^^^^^^^^^^^^ + +Next, environment variables are searched. Any environment variable +named ``OMPI_MCA_`` will be used. For example, the +following has the same effect as the previous example (for sh-flavored +shells): + +.. code-block:: sh + + shell$ export OMPI_MCA_mpi_show_handle_leaks=1 + shell$ mpirun -np 4 a.out + +.. note:: Just like with command line values, setting environment + variables to values with multiple words requires shell + quoting, such as: + + .. code-block:: sh + + shell$ export OMPI_MCA_param="value with multiple words" + +Tuning MCA parameter files +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. error:: TODO This entire entry needs to be checked for correctness. + +Simple text files can be used to set MCA parameter values for a +specific application. + +The ``mpirun --tune`` CLI option allows users to specify both MCA +parameters and environment variables from within a single file. + +MCA parameters set in tuned parameter files will override any MCA +parameters supplied in global parameter files (e.g., +``$HOME/.openmpi/mca-params.conf``), but not command line or +environment parameters. + +Consider a tuned parameter file named ``foo.conf`` that is placed in +the same directory as the application ``a.out``. A user will typically +run the application as: + +.. code-block:: sh + + shell$ mpirun -np 2 a.out + +To use the ``foo.conf`` tuned parameter file, this command line +changes to: + +..
code-block:: sh
+
+   shell$ mpirun -np 2 --tune foo.conf a.out
+
+Tuned parameter files can be combined if more than one file is to be
+used. If there is another tuned parameter file called ``bar.conf``, it
+can be added to the command line as follows:
+
+.. code-block:: sh
+
+   shell$ mpirun -np 2 --tune foo.conf,bar.conf a.out
+
+The contents of tuned files consist of one or more lines, each of
+which contains zero or more ``-x`` and ``--mca`` options. Comments are
+not allowed. For example, the following tuned file:
+
+.. code-block::
+
+   -x envvar1=value1 -mca param1 value1 -x envvar2
+   -mca param2 value2
+   -x envvar3
+
+is equivalent to:
+
+.. code-block:: sh
+
+   shell$ mpirun \
+       -x envvar1=value1 -mca param1 value1 -x envvar2 \
+       -mca param2 value2 \
+       -x envvar3 \
+       ...rest of mpirun command line...
+
+Although tuned parameter files are typically specified on the command
+line, they can also be set via MCA parameters in the environment. The
+MCA parameter ``mca_base_envvar_file_prefix`` contains a
+comma-delimited list of tuned parameter files exactly as they would be
+passed to the ``--tune`` command line option. The MCA parameter
+``mca_base_envvar_file_path`` specifies the path to search for tuned
+files with relative paths.
+
+.. error:: TODO Check that these MCA var names ^^ are correct.
+
+Configuration files
+^^^^^^^^^^^^^^^^^^^
+
+Finally, simple configuration text files can be used to set MCA
+parameter values. Parameters are set one per line (comments are
+permitted). For example:
+
+.. code-block:: ini
+
+   # This is a comment
+   # Set the same MCA parameter as in previous examples
+   mpi_show_handle_leaks = 1
+
+Note that quotes are *not* necessary for setting multi-word values
+in MCA parameter files. Indeed, if you use quotes in the MCA
+parameter file, they will be used as part of the value itself. For
+example:
+
+.. 
code-block:: ini
+
+   # The following two values are different:
+   param1 = value with multiple words
+   param2 = "value with multiple words"
+
+By default, two files are searched (in order):
+
+#. ``$HOME/.openmpi/mca-params.conf``: The user-supplied set of
+   values takes the highest precedence.
+#. ``$prefix/etc/openmpi-mca-params.conf``: The system-supplied set
+   of values has a lower precedence.
+
+More specifically, the MCA parameter ``mca_param_files`` specifies a
+colon-delimited list of files to search for MCA parameters. Files to
+the left have lower precedence; files to the right have higher
+precedence.
+
+.. note:: Keep in mind that, just like components, these parameter
+          files are *only* relevant where they are "visible"
+          (:ref:`see this FAQ entry
+          `). Specifically,
+          Open MPI does not read all the values from these files
+          during startup and then send them to all nodes in the job.
+          Instead, the files are read on each node during each
+          process' startup.
+
+          *This is intended behavior:* it allows for per-node
+          customization, which is especially relevant in heterogeneous
+          environments.
+
+/////////////////////////////////////////////////////////////////////////
+
+.. _label-running-selecting-framework-components:
+
+Selecting which Open MPI components are used at run time
+--------------------------------------------------------
+
+Each MCA framework has a top-level MCA parameter that helps guide
+which components are selected to be used at run-time. Specifically,
+every framework has an MCA parameter of the same name that can be used
+to *include* or *exclude* components from a given run.
+
+For example, the ``btl`` MCA parameter is used to control which BTL
+components are used. It takes a comma-delimited list of component
+names, and may be optionally prefixed with ``^``. For example:
+
+.. note:: The Byte Transfer Layer (BTL) framework provides the
+          underlying network transports for the ``ob1`` Point-to-point
+          Messaging Layer (PML) component.
+
+.. code-block:: sh
+
+   # Tell Open MPI to include *only* the BTL components listed here and
+   # implicitly ignore all the rest:
+   shell$ mpirun --mca btl self,sm,usnic ...
+
+   # Tell Open MPI to exclude the tcp and uct BTL components
+   # and implicitly include all the rest
+   shell$ mpirun --mca btl ^tcp,uct ...
+
+Note that ``^`` can *only* be the prefix of the *entire*
+comma-delimited list because the inclusive and exclusive behaviors are
+mutually exclusive. Specifically, since the exclusive behavior means
+"use all components *except* these", it does not make sense to mix it
+with the inclusive behavior of not specifying it (i.e., "use *only*
+these components"). Hence, something like this:
+
+.. code-block:: sh
+
+   shell$ mpirun --mca btl self,sm,usnic,^tcp ...
+
+does not make sense |mdash| and will cause an error |mdash| because it
+says "use only the ``self``, ``sm``, and ``usnic`` components" but
+also "use all components except ``tcp``". These two statements
+clearly contradict each other.
+
+/////////////////////////////////////////////////////////////////////////
+
+Common MCA parameters
+---------------------
+
+Open MPI has a *large* number of MCA parameters available. Users can
+use the :ref:`ompi_info(1) ` command to see *all*
+available MCA parameters.
+
+The vast majority of these MCA parameters, however, are not useful to
+most users. Indeed, there are only a handful of MCA parameters that
+are commonly used by end users. :ref:`As described in the
+ompi_info(1) man page `, MCA parameters are
+grouped into nine levels, corresponding to the MPI standard's tool
+support verbosity levels. In general:
+
+* Levels 1-3 are intended for the end user.
+
+  * These parameters are generally used to affect whether an Open MPI
+    job will be able to run correctly.
+
+    .. tip:: Parameters in levels 1-3 are probably applicable to
+             most end users.
+
+* Levels 4-6 are intended for the application tuner.
+
+  * These parameters are generally used to tune the performance of an
+    Open MPI job.
+
+* Levels 7-9 are intended for the MPI implementer.
+
+  * These parameters are esoteric and really only intended for those
+    who work deep within the Open MPI code base itself.
+
+Although the full list of MCA parameters can be found in the output of
+``ompi_info(1)``, the following list of commonly-used parameters is
+presented here so that they can easily be found via internet searches:
+
+* Individual framework names are used as MCA parameters to
+  :ref:`select which components will be used
+  `. For example, the
+  ``btl`` MCA parameter is used to select which components will be
+  used from the ``btl`` framework. The ``coll`` MCA parameter is used
+  to select which ``coll`` components are used. And so on.
+
+* Individual framework names with the ``_base_verbose`` suffix
+  appended (e.g., ``btl_base_verbose``, ``coll_base_verbose``)
+  can be used to set the general verbosity level of all the components
+  in that framework.
+
+  * This can be helpful when troubleshooting why certain components
+    are or are not being selected at run time.
+
+* Many network-related components support "include" and "exclude"
+  types of parameters (e.g., ``btl_tcp_if_include`` and
+  ``btl_tcp_if_exclude``). The "include" parameters specify an
+  explicit set of network interfaces to use; the "exclude" parameters
+  specify an explicit set of network interfaces to ignore. Check the
+  output from :ref:`ompi_info(1)'s ` full list to see
+  if the network-related component you are using has "include" and
+  "exclude" network interface parameters.
+
+  .. important:: You can only use the "include" *or* the "exclude"
+                 parameter |mdash| they are mutually exclusive.
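
As a minimal sketch of how the commonly-used parameters listed above
might be combined, the following sh fragment sets them through
``OMPI_MCA_``-prefixed environment variables. The interface name
``eth0`` and the verbosity value ``100`` are illustrative assumptions,
not recommendations; substitute values appropriate for your system.

```sh
# Restrict the tcp BTL to a single interface.
# NOTE: "eth0" is a hypothetical interface name used for illustration.
export OMPI_MCA_btl_tcp_if_include=eth0

# "include" and "exclude" are mutually exclusive, so make sure the
# exclude variant is not also set in this environment.
unset OMPI_MCA_btl_tcp_if_exclude

# Raise the verbosity of the btl framework to help see which
# components are selected (100 is an illustrative value).
export OMPI_MCA_btl_base_verbose=100

# List the MCA-related variables that a later mpirun would inherit.
env | grep '^OMPI_MCA_' | sort
```

An ``mpirun`` launched afterwards from the same shell would see these
values, unless the same parameters are also given on the command line,
which takes precedence over the environment.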