Fortran REAL16: improve detection and wiring across OMPI/OPAL #13612
bosilca merged 1 commit into open-mpi:main
Conversation
Force-pushed from 9d83217 to 846dfb9
devreal left a comment:
Minor nit-pick. Can we document in the code why we pick the order the way it is?
Force-pushed from 0148404 to 58cd5a4
Which order? During configure or in the header files?
@bosilca Getting an mpi4py floating-point exception in CI. Can you investigate?
This doesn't fix the issue reported in #13564.
Force-pushed from a58d6a0 to 151fd6d
Force-pushed from 9b94f02 to 28290d4
I can't reproduce the mpi4py issue on any of the machines I could test. @dalcinl, any way you can help me out?
@jsquyres @bosilca Looks like out-of-source builds are broken. EDIT: I tried with the main branch, and things are working OK.
@bosilca Here we go ...
I'm quite surprised. I reproduced the failure straight away (after fighting with out-of-source ompi build issues) on my Fedora 42 system. Your changes somehow broke pack/unpack external for `long double`. Here you have an MRE:

```python
from mpi4py import MPI
import numpy as np

c = "g"  # numpy typecode for C long double
n = 1
a = np.zeros(n, c)
dt = MPI.Datatype.fromcode(c)
print(dt.Get_name())
size = dt.Pack_external_size("external32", n)
packbuf = np.zeros(size + 1, "B")
position = dt.Pack_external("external32", a, packbuf, 0)
assert position == size
```

After running under valgrind, it seems like the datatype pack implementation is calling a NULL convertor function pointer (again, just guessing):

```
$ valgrind -q python test.py
hwloc x86 backend cannot work under Valgrind, disabling.
May be reenabled by dumping CPUIDs with hwloc-gather-cpuid
and reloading them under Valgrind with HWLOC_CPUID_PATH.
MPI_LONG_DOUBLE
==543221== Jump to the invalid address stated on the next line
==543221==    at 0x0: ???
==543221==    by 0x14948F6B: opal_pack_general (opal_datatype_pack.c:536)
==543221==    by 0x1493024F: opal_convertor_pack (opal_convertor.c:292)
==543221==    by 0x143BF6C5: ompi_datatype_pack_external (ompi_datatype_external.c:67)
==543221==    by 0x1443CBD7: PMPI_Pack_external (pack_external_generated.c:70)
==543221==    by 0x1401383D: PyMPI_Pack_external_c (in /home/dalcinl/Devel/mpi4py/src/mpi4py/MPI.cpython-314-x86_64-linux-gnu.so)
==543221==    by 0x140E44D1: __pyx_pf_6mpi4py_3MPI_8Datatype_72Pack_external (in /home/dalcinl/Devel/mpi4py/src/mpi4py/MPI.cpython-314-x86_64-linux-gnu.so)
==543221==    by 0x140E416E: __pyx_pw_6mpi4py_3MPI_8Datatype_73Pack_external (in /home/dalcinl/Devel/mpi4py/src/mpi4py/MPI.cpython-314-x86_64-linux-gnu.so)
==543221==    by 0x142BEEF4: __Pyx_CyFunction_Vectorcall_FASTCALL_KEYWORDS (in /home/dalcinl/Devel/mpi4py/src/mpi4py/MPI.cpython-314-x86_64-linux-gnu.so)
==543221==    by 0x49D7C36: _PyObject_VectorcallTstate (pycore_call.h:169)
==543221==    by 0x49D7C36: PyObject_Vectorcall (call.c:327)
==543221==    by 0x49ECDB3: _PyEval_EvalFrameDefault (generated_cases.c.h:1619)
==543221==    by 0x49E87C4: _PyEval_EvalFrame (pycore_ceval.h:121)
==543221==    by 0x49E87C4: _PyEval_Vector (ceval.c:2083)
==543221== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==543221==
[optiplex:543221] *** Process received signal ***
[optiplex:543221] Signal: Segmentation fault (11)
[optiplex:543221] Signal code: Invalid permissions (2)
[optiplex:543221] Failing at address: (nil)
[optiplex:543221] [ 0] /lib64/libc.so.6(+0x1a290) [0x4ece290]
[optiplex:543221] *** End of error message ***
Segmentation fault (core dumped) valgrind -q python test.py
```
Force-pushed from 17bd2fc to 361bee4
Thanks @dalcinl, with your reproducer I was able to identify (and fix) the issue.
Probe REAL*16 against _Float128 first, then __float128, and finally _Quad (Intel) to find a C type with a matching bit representation. Ensure OPAL's FLOAT12/FLOAT16 constructors are always available and map OMPI/MPI REAL16 based on architecture/language specifics.

Wire FLOAT128 types through the copy/pack/unpack paths and hook REAL16 into the base MPI_Op table. This enables software-only reductions for REAL16 as long as the Fortran type has a C equivalent.

When an OPAL type description is decided at build time (such as float12 and float16), create an OPAL-level #define with the selected size. This allows the rest of the code to simply use that size instead of trying to figure out the real size of the type at runtime.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
Force-pushed from 361bee4 to 941af5e
Just FYI: I ran this through an AI code reviewer. And while they can generally be the equivalent of a fast-food cashier trying to upsell you on items, this one had literally zero complaints :) I guess it did suggest adding unit tests, but we already do/did that, I assume?
From the MPI standard's perspective, this is an optional Fortran type with no C equivalent. If there are tests, they should be among the Fortran tests.
Fixes #13564