Fortran REAL16: improve detection and wiring across OMPI/OPAL #13612

Merged
bosilca merged 1 commit into open-mpi:main from bosilca:topic/better_support_for_fortran_real16
Feb 2, 2026
Conversation

@bosilca
Member

@bosilca bosilca commented Jan 6, 2026

Probe REAL*16 against _Float128 first, then __float128, and finally _Quad (Intel) to find a C type with matching bit representation. Ensure OPAL’s FLOAT12/FLOAT16 constructors are always available and map OMPI/MPI REAL16 based on architecture/language specifics.

Wire FLOAT128 types through copy/pack/unpack paths and hook REAL16 into the base MPI_Op table. This enables software-only reductions for REAL16 for as long as the Fortran type has a C equivalent.

Fixes #13564
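
The probing order described above can be illustrated with a small first-match search. This is only a sketch in Python, not the configure logic from the PR: the `CType` record, the `available` table, and `pick_c_equivalent` are hypothetical names, and comparing size plus precision stands in for whatever bit-representation check the real probe performs.

```python
from typing import Dict, NamedTuple, Optional


class CType(NamedTuple):
    name: str
    size_bytes: int
    mantissa_bits: int  # precision, counting the implicit leading bit


# Candidate order taken from the PR description.
PROBE_ORDER = ["_Float128", "__float128", "_Quad"]


def pick_c_equivalent(fortran_size: int,
                      fortran_mantissa_bits: int,
                      available: Dict[str, CType]) -> Optional[CType]:
    """Return the first candidate, in probe order, that matches the Fortran type."""
    for name in PROBE_ORDER:
        ctype = available.get(name)
        if ctype is None:
            continue  # this (hypothetical) compiler does not provide the type
        if (ctype.size_bytes, ctype.mantissa_bits) == (fortran_size, fortran_mantissa_bits):
            return ctype
    return None  # no C equivalent: REAL16 cannot be mapped to a C type


# Assumption for illustration: a compiler that only offers __float128
# as an IEEE binary128 type (16 bytes, 113 bits of precision).
available = {"__float128": CType("__float128", 16, 113)}

match = pick_c_equivalent(16, 113, available)
print(match.name if match else "no C equivalent")
```

When no candidate matches, the function returns `None`, which corresponds to the case where the Fortran type has no C equivalent and software reductions cannot be provided.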

@bosilca bosilca force-pushed the topic/better_support_for_fortran_real16 branch from 9d83217 to 846dfb9 Compare January 6, 2026 05:00
Contributor

@devreal devreal left a comment

Minor nit-pick. Can we document in the code why we pick the order the way it is?

@bosilca bosilca force-pushed the topic/better_support_for_fortran_real16 branch 2 times, most recently from 0148404 to 58cd5a4 Compare January 6, 2026 16:39
@bosilca
Member Author

bosilca commented Jan 6, 2026

> Minor nit-pick. Can we document in the code why we pick the order the way it is?

Which order? During configure or in the header files?

@bosilca bosilca self-assigned this Jan 6, 2026
@jsquyres
Member

@bosilca Getting an mpi4py floating point exception in CI. Can you investigate?

@hppritcha
Member

This doesn't fix the issue reported in #13564.

@bosilca bosilca force-pushed the topic/better_support_for_fortran_real16 branch 2 times, most recently from a58d6a0 to 151fd6d Compare January 21, 2026 23:31
@bosilca bosilca changed the title from "Better support for Fortran REAL16 types" to "Fortran REAL16: improve detection and wiring across OMPI/OPAL" Jan 22, 2026
@bosilca bosilca force-pushed the topic/better_support_for_fortran_real16 branch 5 times, most recently from 9b94f02 to 28290d4 Compare January 27, 2026 17:22
@bosilca
Member Author

bosilca commented Jan 27, 2026

I can't reproduce the mpi4py issue on any of the machines I could test. @dalcinl, is there any way you can help me out?

@dalcinl
Contributor

dalcinl commented Jan 28, 2026

@jsquyres @bosilca Looks like out-of-source builds are broken.

make[2]: Entering directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c'
  CC       attr_fn.lo
Traceback (most recent call last):
  File "/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c/../../../../../ompi-main/ompi/mpi/bindings/bindings.py", line 79, in <module>
    main()
    ~~~~^^
  File "/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c/../../../../../ompi-main/ompi/mpi/bindings/bindings.py", line 75, in main
    args.handler(args, OutputFile(f))
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c/../../../../../ompi-main/ompi/mpi/bindings/bindings.py", line 65, in <lambda>
    parser_gen.set_defaults(handler=lambda args, out: c.generate_source(args, out))
                                                      ~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/dalcinl/Devel/REPOS/ompi-BUILD/main/../../ompi-main/ompi/mpi/bindings/ompi_bindings/c.py", line 381, in generate_source
    template = SourceTemplate.load(args.source_file, type_constructor=Type.construct)
  File "/home/dalcinl/Devel/REPOS/ompi-BUILD/main/../../ompi-main/ompi/mpi/bindings/ompi_bindings/parser.py", line 97, in load
    with open(fname) as fp:
         ~~~~^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'abort.c.in'
make[2]: *** [Makefile:13618: mpi_bindings_generated.stamp] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c'
make[1]: *** [Makefile:2704: install-recursive] Error 1
make[1]: Leaving directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi'
make: *** [Makefile:1546: install-recursive] Error 1

EDIT: I tried with the main branch, and things are working OK.
@bosilca I noticed your recent VPATH fixes in main. Maybe you should rebase this PR?

@dalcinl
Contributor

dalcinl commented Jan 28, 2026

@bosilca Here we go ...

> I can't reproduce the mpi4py issue on any of the machines I could test.

I'm quite surprised. I reproduced the failure straight away (after fighting with out-of-source ompi build issues) on my Fedora 42 system.

Your changes somehow broke pack/unpack external for MPI_LONG_DOUBLE. As long double and __float128 could be confused on x86_64 due to both types having sizeof 16, maybe your changes uncovered a previously hidden issue. Just uninformed guessing ...
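
To make the size ambiguity concrete: on x86_64 Linux, `long double` is the 80-bit x87 extended format padded to 16 bytes, while `__float128` is IEEE binary128, so a `sizeof` comparison alone cannot tell them apart. The following is a minimal sketch using only the Python standard library; the layout dictionaries and helper names are illustrative and not taken from Open MPI.

```python
import ctypes

# Well-known field widths in bits: sign, exponent, stored significand.
X87_EXTENDED = {"sign": 1, "exponent": 15, "significand": 64}     # explicit integer bit
IEEE_BINARY128 = {"sign": 1, "exponent": 15, "significand": 112}  # implicit leading bit


def used_bits(layout):
    """Number of bits that actually carry information in a layout."""
    return sum(layout.values())


def same_representation(a, b):
    """Compare field widths rather than storage size."""
    return a == b


# Storage size of C `long double` in the ABI this interpreter was built for
# (16 on x86_64 Linux; other platforms may report 8 or 12).
long_double_bytes = ctypes.sizeof(ctypes.c_longdouble)

print("sizeof(long double) here:", long_double_bytes)
print("x87 extended uses", used_bits(X87_EXTENDED), "bits")
print("binary128 uses", used_bits(IEEE_BINARY128), "bits")
print("same representation:", same_representation(X87_EXTENDED, IEEE_BINARY128))
```

Both formats fit in a 16-byte slot, but only a check on the field layout (or on precision) distinguishes them, which is why a type lookup keyed on size alone could select the wrong entry.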

Here is an MRE:

from mpi4py import MPI
import numpy as np

c = "g"
n = 1

a = np.zeros(n, c)
dt = MPI.Datatype.fromcode(c)
print(dt.Get_name())

size = dt.Pack_external_size("external32", n)
packbuf = np.zeros(size + 1, "B")
position = dt.Pack_external("external32", a, packbuf, 0)
assert position == size

After running under valgrind, seems like the datatype pack implementation is calling a NULL convertor function pointer (again, just guessing):

$ valgrind -q python test.py
hwloc x86 backend cannot work under Valgrind, disabling.
May be reenabled by dumping CPUIDs with hwloc-gather-cpuid
and reloading them under Valgrind with HWLOC_CPUID_PATH.
MPI_LONG_DOUBLE
==543221== Jump to the invalid address stated on the next line
==543221==    at 0x0: ???
==543221==    by 0x14948F6B: opal_pack_general (opal_datatype_pack.c:536)
==543221==    by 0x1493024F: opal_convertor_pack (opal_convertor.c:292)
==543221==    by 0x143BF6C5: ompi_datatype_pack_external (ompi_datatype_external.c:67)
==543221==    by 0x1443CBD7: PMPI_Pack_external (pack_external_generated.c:70)
==543221==    by 0x1401383D: PyMPI_Pack_external_c (in /home/dalcinl/Devel/mpi4py/src/mpi4py/MPI.cpython-314-x86_64-linux-gnu.so)
==543221==    by 0x140E44D1: __pyx_pf_6mpi4py_3MPI_8Datatype_72Pack_external (in /home/dalcinl/Devel/mpi4py/src/mpi4py/MPI.cpython-314-x86_64-linux-gnu.so)
==543221==    by 0x140E416E: __pyx_pw_6mpi4py_3MPI_8Datatype_73Pack_external (in /home/dalcinl/Devel/mpi4py/src/mpi4py/MPI.cpython-314-x86_64-linux-gnu.so)
==543221==    by 0x142BEEF4: __Pyx_CyFunction_Vectorcall_FASTCALL_KEYWORDS (in /home/dalcinl/Devel/mpi4py/src/mpi4py/MPI.cpython-314-x86_64-linux-gnu.so)
==543221==    by 0x49D7C36: _PyObject_VectorcallTstate (pycore_call.h:169)
==543221==    by 0x49D7C36: PyObject_Vectorcall (call.c:327)
==543221==    by 0x49ECDB3: _PyEval_EvalFrameDefault (generated_cases.c.h:1619)
==543221==    by 0x49E87C4: _PyEval_EvalFrame (pycore_ceval.h:121)
==543221==    by 0x49E87C4: _PyEval_Vector (ceval.c:2083)
==543221==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==543221== 
[optiplex:543221] *** Process received signal ***
[optiplex:543221] Signal: Segmentation fault (11)
[optiplex:543221] Signal code: Invalid permissions (2)
[optiplex:543221] Failing at address: (nil)
[optiplex:543221] [ 0] /lib64/libc.so.6(+0x1a290) [0x4ece290]
[optiplex:543221] *** End of error message ***
Segmentation fault         (core dumped) valgrind -q python test.py
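
The trace above is consistent with a jump through an unset function pointer. As a rough illustration of that failure mode (not Open MPI's convertor code), the sketch below models a per-type table of pack functions in which one entry was never filled in; `pack_table`, `pack_value`, and `missing_entries` are hypothetical names. In C the unset entry would be a NULL pointer and calling it would crash at address 0x0; here the lookup is guarded instead.

```python
import struct
from typing import Callable, Dict, List, Optional

PackFn = Callable[[float], bytes]

# Hypothetical per-type table; None models an entry that was never registered.
pack_table: Dict[str, Optional[PackFn]] = {
    "float": lambda v: struct.pack(">f", v),   # big-endian IEEE binary32
    "double": lambda v: struct.pack(">d", v),  # big-endian IEEE binary64
    "long_double": None,
}


def pack_value(type_name: str, value: float) -> bytes:
    """Pack one value, failing loudly if no function is registered for the type."""
    fn = pack_table.get(type_name)
    if fn is None:
        raise NotImplementedError(f"no pack function registered for {type_name!r}")
    return fn(value)


def missing_entries(table: Dict[str, Optional[PackFn]]) -> List[str]:
    """List the types whose table entry is unset, e.g. for a start-up sanity check."""
    return sorted(name for name, fn in table.items() if fn is None)


print(pack_value("double", 1.0).hex())
print(missing_entries(pack_table))
```

A check like `missing_entries` run once at initialization would surface an incomplete table immediately rather than at the first pack call for the affected type.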

@bosilca bosilca force-pushed the topic/better_support_for_fortran_real16 branch 2 times, most recently from 17bd2fc to 361bee4 Compare January 28, 2026 20:34
@bosilca
Member Author

bosilca commented Jan 28, 2026

Thanks @dalcinl, with your reproducer I was able to identify (and fix) the issue.

Probe REAL*16 against _Float128 first, then __float128, and finally _Quad
(Intel) to find a C type with matching bit representation. Ensure OPAL’s
FLOAT12/FLOAT16 constructors are always available and map OMPI/MPI REAL16
based on architecture/language specifics.

Wire FLOAT128 types through copy/pack/unpack paths and hook REAL16 into the
base MPI_Op table. This enables software-only reductions for REAL16 for
as long as the Fortran type has a C equivalent.

When an OPAL type description is decided at build time (such as float12
and float16), create an OPAL-level #define with its selected size.
This allows the rest of the code to simply use this size instead of
trying to figure out the real size of the type.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
@bosilca bosilca force-pushed the topic/better_support_for_fortran_real16 branch from 361bee4 to 941af5e Compare January 28, 2026 21:35
@bosilca bosilca requested a review from devreal January 29, 2026 00:25
@janjust
Contributor

janjust commented Jan 31, 2026

just fyi - I ran this through an AI code reviewer. And while they can generally be the equivalent of a fast-food cashier trying to upsell you on items, this one had literally zero complaints :) I guess it did suggest to have unit tests, but we already do/did that I assume?

@bosilca
Member Author

bosilca commented Feb 2, 2026

From the MPI standard perspective this is an optional Fortran type with no C equivalent. If there are tests they should be among the Fortran tests.

@bosilca bosilca merged commit a8e0c9d into open-mpi:main Feb 2, 2026
17 checks passed
@bosilca bosilca deleted the topic/better_support_for_fortran_real16 branch February 2, 2026 15:01
Development

Successfully merging this pull request may close these issues.

quad precision (128 bit) reductions return bad results

6 participants