Skip to content

Conversation

@hppritcha
Copy link
Member

@hppritcha hppritcha commented May 27, 2025

Implement the MPI-5 c ABI functionality.

Two external MPI libraries are now created: libmpi.so and libmpi_abi.so. Backend code that was originally in libmpi.la has been extracted into a libopen-mpi.la to be linked in to both libraries.

Almost all of the Open MPI C interface is now being generated from templates processed by the binding infrastructure.
The binding infrastructure introduced to support big count has been extended to generate code to support the MPI ABI specification. The bindings infrastructure is also used to generate the MPI ABI compliant mpi.h header file, as well as the converter functions to translate between OMPI and MPI ABI constants.

Before attempting to review this PR it would probably be good to become familiar with the MPI-5 ABI chapter and also to peruse the README ((ompi/mpi/README_ABI.md) included in this PR.

A few todos left:

  • add a bit of plumbing to assist with cleanup of resources used by the attributes callback function wrappers
  • write up docs to explain ABI support both for users and for developers
  • add man pages
  • add fortran entry points for MPI_Abi functions
  • open separate PR to deal with the libnbc release_arrays problem
  • Update python framework to use refactored libnbc release arrays code (PR nbc: rework the way to add arrays to clean up #13529)

todos for follow-up PRs:

  • optimize some converters, in particular MPI ABI datatype -> OMPI ABI datatype and reverse
  • integrate the self standing mpi.h/abi.h generator into the python framework
  • refactor/simplify python framework in particular c_type.py
  • (MPI Forum repos) fix up standard latex with annotations for not-in-abi properties, types of some constants, update pympistandard to address some issues found in that package.
  • PR to refactor how FORTRAN datatypes are handled in configury procedure. Changes to comply with section 20.4 of the MPI-5 standard concerning these datatypes is not included in this PR.
  • PR to actually use FORTRAN datatype info if provided by an application via MPI_Abi_set_fortran_info and MPI_Abi_set_fortran_booleans.

This PR supercedes #12033

@dalcinl
Copy link
Contributor

dalcinl commented Jul 11, 2025

Maybe you should somehow vendor the mpi.h header from https://github.com/mpi-forum/mpi-abi-stubs and use it as the baseline to extract values for handles and constants? That's what MPICH is doing.
Alternatively, the mpi.h header the mpi-abi-stubs repo could be downloaded on the fly by some Python script and the values in that header used to update stuff to be committed in the ompi repo.

In short, I think it would be in everyone's convenience to use https://github.com/mpi-forum/mpi-abi-stubs as the "source of truth" for ABI-related stuff, avoiding manual synchronization of handle/constant values.

@jsquyres
Copy link
Member

I wonder if we should make the bot not complain about unsigned commits on draft PRs. That would reduce some of the noise on PR's like this.

hppritcha added a commit to hppritcha/ompi that referenced this pull request Aug 25, 2025
Turns out that in commit 6bd36a7 we had a function that is not part of the MPI standard.
This showed while working on ABI support - which requires us to pay attention to the truth
rather than make stuff up.

This commit removes our made up MPI_Session_set_info method.
Turns out who ever was doing the fortran bindings knew this wasn't a method in the standard
so there's no need to change the fortran bindings.  Same thing applies to the man pages.

Related to open-mpi#13280

Signed-off-by: Howard Pritchard <[email protected]>
@dalcinl
Copy link
Contributor

dalcinl commented Aug 27, 2025

@hppritcha There is some issue with out-of-source builds, i.e

git clone ... ompi-main
...
mkdir -p ompi-BUILD/main
cd ompi-BUILD/main
../../ompi-main/configure ...
make install
Making install in mpi/c
make[2]: Entering directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c'
mkdir -p standard_abi
  GENERATE abi.h
  GENERATE standard_abi/mpi.h
Traceback (most recent call last):
  File "/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c/../../../../../ompi-main/ompi/mpi/bindings/c_header.py", line 262, in <module>
    with open(OUTPUT, 'tw') as header_out:
         ~~~~^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '../../../../../ompi-main/ompi/mpi/c/standard_abi/mpi.h'
make[2]: *** [Makefile:23335: standard_abi/mpi.h] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c'
make[1]: *** [Makefile:2785: install-recursive] Error 1
make[1]: Leaving directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi'
make: *** [Makefile:1526: install-recursive] Error 1

@hppritcha
Copy link
Member Author

interesting distcheck didn't check this.

@hppritcha
Copy link
Member Author

jenkins ci runs make distcheck

@dalcinl
Copy link
Contributor

dalcinl commented Aug 28, 2025

The ABI mpi.h header is missing some MPI_T_XXX types. Also, the Status f08/c converters are declared, and they should not.

I did the following manual edits to the installed mpi.h header:

diff -up ./mpi.h.orig ./mpi.h
--- ./mpi.h.orig	2025-08-28 18:55:49.842968779 +0300
+++ ./mpi.h	2025-08-28 19:03:07.192957305 +0300
@@ -490,6 +490,13 @@ enum {
 /* C preprocessor constants and Fortran parameters */
 /* $CATEGORY:C_PREPROCESSOR_CONSTANTS_FORTRAN_PARAMETERS$ */
 
+typedef struct MPI_T_enum_t* MPI_T_enum;
+typedef struct MPI_T_cvar_handle_t* MPI_T_cvar_handle;
+typedef struct MPI_T_pvar_handle_t* MPI_T_pvar_handle;
+typedef struct MPI_T_pvar_session_t* MPI_T_pvar_session;
+typedef struct MPI_T_event_registration_t* MPI_T_event_registration;
+typedef struct MPI_T_event_instance_t* MPI_T_event_instance;
+
 /* Handles used in the MPI tool information interface */
 #define MPI_T_ENUM_NULL                       ((MPI_T_enum) 0)
 #define MPI_T_CVAR_HANDLE_NULL                ((MPI_T_cvar_handle) 0)
@@ -558,20 +565,20 @@ enum {
 };
 
 /* Source event ordering guarantees in the MPI tool information interface */
-enum {
+typedef enum MPI_T_source_order {
     MPI_T_SOURCE_ORDERED                      = 1,
     MPI_T_SOURCE_UNORDERED                    = 2,
-};
+} MPI_T_source_order;
 
 /*
  * Callback safety requirement levels used in the MPI tool information interface
  */
-enum {
+typedef enum MPI_T_cb_safety {
     MPI_T_CB_REQUIRE_NONE                     = 0x00,
     MPI_T_CB_REQUIRE_MPI_RESTRICTED           = 0x03,
     MPI_T_CB_REQUIRE_THREAD_SAFE              = 0x0F,
     MPI_T_CB_REQUIRE_ASYNC_SIGNAL_SAFE        = 0x3F,
-};
+} MPI_T_cb_safety;
 
 
 /* Callback functions */
@@ -1107,13 +1114,13 @@ int MPI_Ssend_init(const void* buf, int
 int MPI_Ssend_init_c(const void* buf, MPI_Count count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request* request);
 int MPI_Start(MPI_Request* request);
 int MPI_Startall(int count, MPI_Request array_of_requests[]);
-/* int MPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
- */int MPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
-int MPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
-/* int MPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
- *//* int MPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
- *//* int MPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
- */int MPI_Status_get_error(const MPI_Status* status, int* err);
+// /* int MPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
+//  */int MPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
+// int MPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
+// /* int MPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
+//  *//* int MPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
+//  *//* int MPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
+//  */int MPI_Status_get_error(const MPI_Status* status, int* err);
 int MPI_Status_get_source(const MPI_Status* status, int* source);
 int MPI_Status_get_tag(const MPI_Status* status, int* tag);
 int MPI_Status_set_cancelled(MPI_Status* status, int flag);
@@ -1799,13 +1806,13 @@ int PMPI_Ssend_init(const void* buf, int
 int PMPI_Ssend_init_c(const void* buf, MPI_Count count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request* request);
 int PMPI_Start(MPI_Request* request);
 int PMPI_Startall(int count, MPI_Request array_of_requests[]);
-/* int PMPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
- */int PMPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
-int PMPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
-/* int PMPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
- *//* int PMPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
- *//* int PMPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
- */int PMPI_Status_get_error(const MPI_Status* status, int* err);
+// /* int PMPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
+//  */int PMPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
+// int PMPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
+// /* int PMPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
+//  *//* int PMPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
+//  *//* int PMPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
+//  */int PMPI_Status_get_error(const MPI_Status* status, int* err);
 int PMPI_Status_get_source(const MPI_Status* status, int* source);
 int PMPI_Status_get_tag(const MPI_Status* status, int* tag);
 int PMPI_Status_set_cancelled(MPI_Status* status, int flag);

I'm able to compile (using mpicc_abi) the pure-C demo/helloworld.c file from mpi4py sources using gcc-15.
Note however that the executable fails to link:

$ mpicc_abi helloworld.c
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpi_info_null'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpi_info_env'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_isendrecv'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_nthkey'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_memkind_copy_or_set'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_set'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_delete'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_sendrecv'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_bool'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_free'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_memkind_cb'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_memkind_process'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_dup'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpiinfo_init'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_nkeys'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_allocate'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpiinfo_init_env'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_valuelen'
collect2: error: ld returned 1 exit status

@dalcinl
Copy link
Contributor

dalcinl commented Sep 2, 2025

@hppritcha Actually, take a look at mpi-forum/mpi-abi-stubs#63

@dalcinl
Copy link
Contributor

dalcinl commented Sep 3, 2025

The installed mpi.h header is still broken, I had to fix it as per the patch in my previous comment.
Afterwards, my basic helloworld program compiles and links, but it fails to run:

$ mpicc_abi helloworld.c
$ mpiexec -n 1 ./a.out 
[optiplex:00000] *** An error occurred in MPI_Init_thread
[optiplex:00000] *** reported by process [3633774593,0]
[optiplex:00000] *** on a NULL communicator
[optiplex:00000] *** MPI_ERR_ARG: invalid argument of some other kind
[optiplex:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[optiplex:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun has exited due to process rank 0 with PID 0 on node optiplex calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------

@dalcinl
Copy link
Contributor

dalcinl commented Sep 3, 2025

@hppritcha The following definitions in the generated mpi.h file are wrong, they should be like ((MPI_Comm_copy_attr_function*) 0), that is, add the missing * to get a function POINTER type.
https://github.com/mpi-forum/mpi-abi-stubs/blob/main/mpi.h#L470

/* Predefined functions */
#define MPI_COMM_NULL_COPY_FN                 ((MPI_Comm_copy_attr_function) 0)
#define MPI_COMM_DUP_FN                       ((MPI_Comm_copy_attr_function) 1)
#define MPI_COMM_NULL_DELETE_FN               ((MPI_Comm_delete_attr_function) 0)
#define MPI_WIN_NULL_COPY_FN                  ((MPI_Win_copy_attr_function) 0)
#define MPI_WIN_DUP_FN                        ((MPI_Win_copy_attr_function) 1)
#define MPI_WIN_NULL_DELETE_FN                ((MPI_Win_delete_attr_function) 0)
#define MPI_TYPE_NULL_COPY_FN                 ((MPI_Type_copy_attr_function) 0)
#define MPI_TYPE_DUP_FN                       ((MPI_Type_copy_attr_function) 1)
#define MPI_TYPE_NULL_DELETE_FN               ((MPI_Type_delete_attr_function) 0)
#define MPI_CONVERSION_FN_NULL                ((MPI_Datarep_conversion_function) 0)
#define MPI_CONVERSION_FN_NULL_C              ((MPI_Datarep_conversion_function_c) 0)

/* Deprecated predefined functions */
#define MPI_NULL_COPY_FN                      ((MPI_Copy_function) 0)
#define MPI_DUP_FN                            ((MPI_Copy_function) 1)
#define MPI_NULL_DELETE_FN                    ((MPI_Delete_function) 0)

@dalcinl
Copy link
Contributor

dalcinl commented Sep 3, 2025

PS: You may have a similar problems in some MPI_T_XXX routines having arguments of type MPI_T_xxx_function, I believe you should add the * .


__opal_attribute_always_inline__ static inline int ompi_convert_abi_ts_level_intern_ts_level(int ts_level)
{
if (MPI_THREAD_SINGLE_ABI_INTERNAL == ts_level) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't a switch statement be better? It may ultimately be a matter of taste (optimizing compilers may generate similar binary code). Just a mild observation.

@dalcinl
Copy link
Contributor

dalcinl commented Sep 3, 2025

@hppritcha What's your plan for stuff introduced in MPI 4.1 and MPI 5.0? The generated mpi.h header says MPI_VERSION=5, however the MPI library libmpi_abi.so.0 seems to miss a bunch of stuff from MPI 4.1 and 5.0.

This is a list of what I managed to detect as missing from mpi4py's configuration machinery for missing MPI stuff.
Note there are a few that are quite old, like MPI_Aint_add/diff, MPI_Wtick, and MPI_Pcontrol.

MPI_Aint_add
MPI_Aint_diff
MPI_Type_get_value_index
MPI_Type_get_envelope
MPI_Type_get_contents
MPI_Type_get_envelope_c
MPI_Type_get_contents_c
MPI_Buffer_flush
MPI_Buffer_iflush
MPI_Comm_attach_buffer
MPI_Comm_detach_buffer
MPI_Comm_flush_buffer
MPI_Comm_iflush_buffer
MPI_Session_attach_buffer
MPI_Session_detach_buffer
MPI_Session_flush_buffer
MPI_Session_iflush_buffer
MPI_Comm_errhandler_fn
MPI_Comm_attach_buffer_c
MPI_Comm_detach_buffer_c
MPI_Session_attach_buffer_c
MPI_Session_detach_buffer_c
MPI_Remove_error_class
MPI_Remove_error_code
MPI_Remove_error_string
MPI_Abi_get_fortran_info
MPI_Get_hw_resource_info
MPI_Wtick
MPI_Pcontrol
MPI_Comm_toint
MPI_Errhandler_toint
MPI_File_toint
MPI_Group_toint
MPI_Info_toint
MPI_Message_toint
MPI_Op_toint
MPI_Request_toint
MPI_Session_toint
MPI_Type_toint
MPI_Win_toint
MPI_Comm_fromint
MPI_Errhandler_fromint
MPI_File_fromint
MPI_Group_fromint
MPI_Info_fromint
MPI_Message_fromint
MPI_Op_fromint
MPI_Request_fromint
MPI_Session_fromint
MPI_Type_fromint
MPI_Win_fromint

@dalcinl
Copy link
Contributor

dalcinl commented Sep 3, 2025

Current status regarding mpi4py:

  • I'm able to build mpi4py using export MPICC=mpicc_abi.
  • However, due to missing symbols, I'm not able to import the Python module.
  • I can switch via LD_LIBRARY_PATH to MPICH's libmpi_abi.so, but there are a few issues related to datatype handles and a few failing tests involving MPI callbacks (errhandlers and attributes). I need to investigate further.

@hppritcha
Copy link
Member Author

Thanks for checking this stuff out in its early state @dalcinl. some of the above functions have been implemented but are sitting in various states in PRs. some of these should be defined - like MPI_Wtick and the aint related functions so i'll see what's going on. As for the fromint/toint that seems like an excellent project for a AI LLM. I'll see what can be done there.

We still need to add some plumbing in the ompi internals for callbacks. that will come in as part of this PR.

The NERSC folks would really like ABI working for Doudna so to the extent there's a "plan" it would be nice to get this working sooner than later.

@dalcinl
Copy link
Contributor

dalcinl commented Sep 4, 2025

Minor nit: Do you really need/want to install the abi.h header? IMHO, it is an implementation detail.

@dalcinl
Copy link
Contributor

dalcinl commented Sep 4, 2025

some of these should be defined - like MPI_Wtick and the aint related functions so i'll see what's going on.

  • MPI_Wtick: In ompi/mpi/c/, looks like you have to rename wtick.c -> wtick.c.in and update prototipes as needed to allow the code generation layer do its magic.
  • MPI_Pcontrol: Same thing as above.
  • MPI_Aint_add/diff: The OMPI ABI implements them as macros. There are no corresponding *.c.in files. Not sure what's the proper fix. Anyway, maybe the OMPI libmpi.so should export the Aint_add/diff routines anyway?

Comment on lines +16 to +17
#define MPI_ABI_VERSION 1
#define MPI_ABI_SUBVERSION 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set these to -1 if !OMPI_ABI_SRC? Or is this file only used in ABI scenarios?

If this file is only used in ABI scenarios, do we define MPI_ABI_[SUB]VERSION to -1 in some other file that is used in non-ABI scenarios? (I didn't find one, but I could have missed it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these should not be defined in the non-ABI header.

This is what the MPI standard says, though the wording is not great IMO.

The ABI version macros MPI_ABI_VERSION and MPI_ABI_SUBVERSION are present in
the MPI header and modules so that applications can check for consistency between the
compilation environment and the properties of the implementation at runtime.

# define MPI_ABI_VERSION 1
# define MPI_ABI_SUBVERSION 0

The idea is that users can do conditional compilation

#if defined(MPI_ABI_VERSION)
// code that assumes being compiled under the MPI ABI
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But MPI-5.0 p844:23-24 says:

MPI_ABI_GET_VERSION produces the standard ABI version, if supported. Other-
wise, the values of the major and minor version are set to−1.

Shouldn't the same be true for the compile-time constants MPI_[SUB]VERSION?

I.e., your example conditional compilation should be:

#if MPI_ABI_VERSION > 0
// code that assumes being compiled under the MPI ABI
#endif

Copy link
Member

@jsquyres jsquyres Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe a more complete conditional compilation could be:

#if defined(MPI_ABI_VERSION) && MPI_ABI_VERSION > 0
// code that assumes being compiled under the MPI ABI
#endif

Or:

// If MPI_VERSION is greater than 5, then MPI_ABI_VERSION must exist
#if MPI_VERSION >= 5 && MPI_ABI_VERSION > 0
// code that assumes being compiled under the MPI ABI
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the same be true for the compile-time constants MPI_[SUB]VERSION?

Look, I think this was not the original intention. Nonetheless, you have a point, I'm not really opposed to your proposal. The only caveat is that the current MPI standard may not be absolutely clear about it, and most likely we need an errata entry to clarify this matter (whatever the final decision is).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeffhammond @hzhou @jsquyres Can we reach an agreement in advance about whether the MPI_ABI_VERSION/SUBVERSION macros should or should not go to the non-ABI mpi.h header?

IIRC, I originally advocated for these macros to NOT be defined in the non-ABI header, but there is API symmetry arguments to have them always defined (with {1,0} or {-1/-1} values for ABI and non-ABI respectively), so maybe we should consider adding them for the non-ABI header. The fix in MPICH and this PR is quite trivial and mostly inconsequential.

Once we reach an agreement between implementations, we can think of publishing an errata to the MPI standard document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abi.h.in is only used by the code in ompi/mpi/c and ompi/mpi/tool and is also used to generate ompi/mpi/c/standard_abi/mpi.h.

But i could see the use of defining them in the OMPI mpi.h.in as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay i don't think we should add this to the OMPI mpi.h till we discuss this in the forum.

See mpi-forum/mpi-issues#1098

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay discussion is happening. rather than fill this PR with mpi forum type comments please chime in on the forum issue cited above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i answered on mpi-forum/mpi-issues#1098 so i won't repost them here

@dalcinl
Copy link
Contributor

dalcinl commented Nov 27, 2025

Perhaps we should leave it as "we agree to disagree".

I would, except that you keep commenting and throwing fear and uncertainty on what we are trying to accomplish.

We feel we are not abusing the libtool version rules

Your claim is hard to disproof. You talk in circles, and never specifically state what exact rule are you following to update libtool's C/R/A tuple. At this point I have to suspect you do that to hide the fact that you are indeed breaking the rules.

we would say that you are failing to follow the libtool objective by not properly respecting an ABI's nature.

You keep speaking in abstract terms, not specifying clearly how and what libtool objective I'm not following, nor what ABI's nature mean.

I already stated many times: backward-compatible additions to the MPI API/ABI must bump both the current and age numbers of the c/r/a libtool tuple, and therefore shared library SONAME must not change. If this statement is incorrect, please indicate clearly and precisely how it is incorrect with proper references to published standards or documentation.

🤷‍♂️ Probably makes no difference as long as users know what was done and what they are getting into.

I cannot confirm that, because you have not yet stated clearly what's the alternative.

Again: what are you proposing exactly? Are you arguing that every time the MPI standard add new features (handle types/values, functions) without making backward incompatible changes (like removing functions) then the MPI shared library should bump the SONAME from libmpi_abi.so.{N} to libmpi_abi.so.{N+1} ?

@hppritcha
Copy link
Member Author

@dalcinl

Okay I used the mpi4py infrastructure

@hppritcha Success! The mpi4py build is ABI-compatible with MPICH's and OMPI's libmpi_abi.so.0: https://github.com/mpi4py/mpi4py-testing/actions/runs/19205824355/job/55001607207

As mpi4py tests almost every API from MPI 5.0, I think we are in very good shape. Nonetheless, I'm planning on testing a few extra things, mostly related to callbacks, to make sure everything is fine.

I confirmed your results. I can build MPI4PY using OMPI mpicc_abi, then switch LD_LIBRARY_PATH and PATH to use the MPICH ( at sha dcfab13774) and run both singleton and mpiexec variants of the test suite successfully.

@dalcinl
Copy link
Contributor

dalcinl commented Dec 9, 2025

@hppritcha Did I earn a Co-authored-by entry in the final commit? Or maybe an acknowledgement in the changelog? 😉

@hppritcha
Copy link
Member Author

yes if i can correctly do co-authoring on commits!

@hppritcha
Copy link
Member Author

there you go @dalcinl

jtronge and others added 3 commits December 9, 2025 14:02
Two external MPI libraries are now created: libmpi.la and libmpi_abi.la.
Backend code that was originally in libmpi.la has been extracted into
libopen-mpi.la to be linked into both libraries.

Parts of the Open MPI C interface are now being generated by a python
script (abi.py) from modified source files (named with *.in). This
script generates files for both the ompi ABI and the standard ABI from
the same source file, also including new bigcount interfaces.

To compile standard ABI code, there's a new mpicc_abi compiler wrapper.
The standard ABI does not yet include all functions or symbols, so more
complicated source files will not compile. ROMIO must be disabled for
the code to link, since it's relying on the external MPI interface.

Signed-off-by: Jake Tronge <[email protected]>
Implement lots of missing functionality in the original
api.py script.  The functionality is now part of the
bindings generation framework used for Big Count.

Number of todos still to do, in particular provide
support for wrapping user supplied callback functions.
Also, the sendrecv_replace, etc. code needs to be
refactored to work with the bindings framework.

to heed the --enable-abi-standard config option

Additional initial fixes to merging ABI support into big count
framework.

Co-authored-by: Jake Tronge <[email protected]>
Signed-off-by: Howard Pritchard <[email protected]>
This mod switches from using "synthetic" defined values and handles for
the ones specified in the MPI 5.1 standard.

The python infrastructure included in this PR for generating both a
"canonical" abi standard compatible MPI 5.1 mpi.h using two json files:

1) mpi-standard-abi.json
2) mpi-standard-apis.json

2 is generated as part of building the MPI standard.  We import that into our project
for use in generating both the mpi.h as well as interface definitions in the man pages.

1 is generated using a separate script that processes the tables in Appendix A of
standard.  Ideally this script will be merged into the MPI standard code base at some
point.  This script is currently at https://github.com/Joe-Downs/mpi-standard/tree/pr/handle-constant-tool/const-tool .
It is used to generate the portion of mpi.h where defined values and handles are specified.

The converter functions that had been generated as part of the build out of the abi variants
of the 'c' MPI interfaces are no define in persistent file.  The methods
defined in this file will be optimized to make use of the Huffman code characteristics of
the predefined values in a subsequent PR.

This commit also enables generation of the abi interfaces and header files by default.

Signed-off-by: Joseph Downs <[email protected]>
@hppritcha
Copy link
Member Author

hmm something on with aws ci.
bot:ompi:retest

hppritcha and others added 2 commits December 11, 2025 13:58
Many fixes and improvements contributed by dalcinl.

Add fortran support functions and some other functions
missing from initial work.

Return to generating converter functions as part of the build out of the abi variants.

Signed-off-by: Howard Pritchard <[email protected]>
Co-authored-by: Lisandro Dalcin <[email protected]>
irrespective of types of datatyps feed into non-blocking/persistent
alltoallw.

We'll replace this with something better in a separate PR.

Signed-off-by: Howard Pritchard <[email protected]>
@hppritcha hppritcha marked this pull request as ready for review December 11, 2025 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants