Big MPI---point-to-point considerations (MPI_Rank only) #97

tonyskjellum · 2018-06-14T14:17:51Z

Problem

For 64-bit clean functionality, convenience, and symmetry, the Big MPI principles being applied in Ticket #80 to collective operations should be applied to MPI more widely. In this case, we consider the idea that you might want more than 2^31 MPI ranks, hence needing a new data type, MPI_Rank.

Proposal

MPI needs to be 64-bit clean throughout.

Changes to the Text

MPI_Rank will replace int for ranks; support > 2^31 MPI processes in a communicator.

A separate ticket considers MPI_Count and miscellaneous concerns for point-to-point.

Impact on Implementations

No current API is impacted. New _X APIs for all point-to-point operations affected will be needed.

MPI implementations will have to be 64-bit clean inside since count*extent > 2^31 is already problematic for some implementations. New APIs will have to be added and the internals of MPI will have to be 64-bit capable for buffers and related issues.

Impact on Users

Users who opt in with the new API will be able to have communicators larger than 2^31. [MPI_Rank]

References

MPICH

/* The order of these elements must match that in mpif.h, mpi_f08_types.f90,
   and mpi_c_interface_types.f90 */
typedef struct MPI_Status {
    int count_lo;
    int count_hi_and_cancelled;
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} MPI_Status;

Open-MPI

struct ompi_status_public_t {
    /* These fields are publicly defined in the MPI specification.
       User applications may freely read from these fields. */
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    /* The following two fields are internal to the Open MPI
       implementation and should not be accessed by MPI applications.
       They are subject to change at any time.  These are not the
       droids you're looking for. */
    int _cancelled;
    size_t _ucount;
};
typedef struct ompi_status_public_t ompi_status_public_t;

jeffhammond · 2018-06-14T14:41:26Z

You will also need to revise the matching rules for how this works when users send using MPI_Rank and probe or recv using int. What is the semantic of this? Return an error code and expect the user to try again with the new function?

tonyskjellum · 2018-06-14T14:48:34Z

Jeff, we will consider that.

Process: Martin specifically asked that we consider the MPI_Rank option. We will specifically split the vote in a way to allow the forum to accept/reject this change for MPI-4. It was pointed out that endpoints and GPU-like devices and fine-grain accelerators could yield > 2^31 ranks in a communicator.

Technical: The problem with the heterogeneous use of the APIs will have to be fully considered like you're saying. Seems like, to allow this, protocols will carry an extra 32-bits of rank space. (Not my favorite answer, that's my straw answer. That means a tax on current performance.)

tonyskjellum · 2018-06-14T14:53:41Z

How about if I split this ticket now --- a) MPI_Count + miscellaneous; b) MPI_Rank ?

jdinan · 2018-06-14T14:59:16Z

There are several constants that can be passed through rank arguments -- e.g. MPI_ANY_SOURCE and MPI_PROC_NULL. We would potentially need to introduce _X versions of these constants or require implementations to define them in a way that is compatible with both int and MPI_Rank (in both C and Fortran language interfaces).

hjelmn · 2018-06-14T15:01:07Z

I maintain my view that we should just break backwards compatibility in MPI-4.0. Yes, this will require a period of time where MPI implementors have an MPI-3.x release and an MPI-4.x+ release but it would be worth it to avoid having _x and _with_info versions all over the place.

jeffhammond · 2018-06-14T16:56:33Z

@hjelmn Note that this will not only break ABI compatibility but user code that uses int for ranks. For example, MPI_Group{_range}_[in,ex]cl and MPI_Group_translate_ranks take vectors of int in C. Any user code that has arrays of ints to pass to these functions will likely segfault if MPI_Rank is more than 32b.

As out-of-bounds array accesses are undefined behavior in C, your proposal not only breaks applications in practice but also causes them to violate the base language in which they are written.

The only reasonable thing to do here is expect ILP64 support if more than 2Bi ranks are required.

jeffhammond · 2018-06-14T16:59:04Z

Any tickets proposing to add MPI_Rank or break ABI compatibility must be accompanied by a production-grade implementation that demonstrates successful execution on more than 2Bi ranks to know that the proposal is technically sound.

hjelmn · 2018-06-14T17:02:25Z

MPI does not define ABI compatibility so I am not concerned about that.

jeffhammond · 2018-06-14T18:19:48Z

Breaking user code is the bigger issue. Please describe how you’ll address that.

hjelmn · 2018-06-14T18:21:55Z

The way I see it, if we break API user will have to modify their code for MPI-4.0. All the changes will be simple to make but will take some work. Thats why I imagine that a high-quality implementation will provide an MPI-3.x layer during some transition period.

jeffhammond · 2018-06-14T18:35:08Z

So you want implementations to support two complete MPI APIs in parallel? You think maintaining passive target is a burden but the int and MPI_Rank APIs aren’t?

hjelmn · 2018-06-14T19:09:18Z

Not at all. My preference is that we will continue to support an older version for a little longer than usual and drop the MPI-3.x API in the new releases.

This happens all the time in the software world. MPI's API has issues. We should fix it the right way now and be done with it. None of this _foo nonsense.

jeffhammond · 2018-06-14T20:46:15Z

2^31+1 ranks is a stupid reason to break MPI. This is a hill on which I am prepared to die.

hjelmn · 2018-06-18T21:04:52Z

Not even remotely saying we break API for ranks. Just saying if we break it because of info, counts, etc might as well change ranks as well.

tonyskjellum · 2019-05-31T15:13:06Z

I am closing this ticket for now, it is highly controversial and it will distract from the rest of the Big MPI activities.

tonyskjellum added not ready wg-p2p Point-to-Point Working Group labels Jun 14, 2018

tonyskjellum assigned tonyskjellum and dholmes-epcc-ed-ac-uk Jun 14, 2018

This was referenced Jun 14, 2018

Big MPI--MPI I/O functionality #98

Closed

Big MPI--RMA functionality #99

Closed

tonyskjellum added the wg-large-counts Large Counts Working Group label Jun 14, 2018

tonyskjellum changed the title ~~Big MPI---point-to-point considerations~~ Big MPI---point-to-point considerations (MPI_Rank only) Jun 14, 2018

tonyskjellum mentioned this issue Jun 14, 2018

Big MPI---point-to-point functionality (MPI_Count + miscellaneous) #100

Closed

tonyskjellum mentioned this issue Sep 5, 2018

Big MPI---large-count and displacement support--collective chapter #80

Closed

tonyskjellum closed this as completed May 31, 2019

martinruefenacht mentioned this issue May 31, 2019

The Embiggenment ("BigCount") #137

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Big MPI---point-to-point considerations (MPI_Rank only) #97

Big MPI---point-to-point considerations (MPI_Rank only) #97

tonyskjellum commented Jun 14, 2018 •

edited

Loading

jeffhammond commented Jun 14, 2018

jeffhammond commented Jun 14, 2018

jeffhammond commented Jun 14, 2018

tonyskjellum commented Jun 14, 2018

tonyskjellum commented Jun 14, 2018

jdinan commented Jun 14, 2018

hjelmn commented Jun 14, 2018

jeffhammond commented Jun 14, 2018

jeffhammond commented Jun 14, 2018

hjelmn commented Jun 14, 2018

jeffhammond commented Jun 14, 2018 via email

hjelmn commented Jun 14, 2018

jeffhammond commented Jun 14, 2018 via email

hjelmn commented Jun 14, 2018

jeffhammond commented Jun 14, 2018 via email

hjelmn commented Jun 18, 2018

tonyskjellum commented May 31, 2019

Big MPI---point-to-point considerations (MPI_Rank only) #97

Big MPI---point-to-point considerations (MPI_Rank only) #97

Comments

tonyskjellum commented Jun 14, 2018 • edited Loading

Problem

Proposal

Changes to the Text

Impact on Implementations

Impact on Users

References

jeffhammond commented Jun 14, 2018

jeffhammond commented Jun 14, 2018

MPICH

Open-MPI

jeffhammond commented Jun 14, 2018

tonyskjellum commented Jun 14, 2018

tonyskjellum commented Jun 14, 2018

jdinan commented Jun 14, 2018

hjelmn commented Jun 14, 2018

jeffhammond commented Jun 14, 2018

jeffhammond commented Jun 14, 2018

hjelmn commented Jun 14, 2018

jeffhammond commented Jun 14, 2018 via email

hjelmn commented Jun 14, 2018

jeffhammond commented Jun 14, 2018 via email

hjelmn commented Jun 14, 2018

jeffhammond commented Jun 14, 2018 via email

hjelmn commented Jun 18, 2018

tonyskjellum commented May 31, 2019

tonyskjellum commented Jun 14, 2018 •

edited

Loading