Currently, using a derived datatype with an MPI reduction operation also requires the use of a user-defined MPI_Op. The function that implements an MPI_Op has the following C prototype:
void op_fcn(void *in, void *inout, int *count, MPI_Datatype *dtype)
Note that user-defined operations accept two buffers but only one count and datatype. Because of this, both buffers must have the layout described by that count and datatype.
Consider a reduction on a column of a large row-major array. We can easily do a reduce operation directly on the column using an MPI vector datatype. Because this is not a built-in datatype, we must also provide a user-defined op to the reduction operation. The user-defined op expects all data to have the same layout because it takes only one datatype/count. Thus, MPI must reconstruct the sender's entire array before invoking the user-defined op, resulting in severe space inefficiency for this operation.
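To make the space cost concrete, the following sketch compares the number of elements a column actually carries with the number of elements spanned by its strided layout. It does not call MPI. The helper names `packed_elems` and `vector_span_elems` are hypothetical, and the formula assumes a vector layout of `count` blocks of `blocklen` elements whose starts are `stride` elements apart (the parameters taken by `MPI_Type_vector`), with a positive stride.

```c
#include <assert.h>
#include <stddef.h>

/* Elements actually carried by a vector layout: count blocks of blocklen. */
size_t packed_elems(size_t count, size_t blocklen)
{
    return count * blocklen;
}

/* Elements spanned from the first to the last element of the same layout.
 * Assumption: stride is positive, measured in elements, and blocks do not
 * overlap (stride >= blocklen). */
size_t vector_span_elems(size_t count, size_t blocklen, size_t stride)
{
    assert(count > 0 && stride >= blocklen);
    return (count - 1) * stride + blocklen;
}
```

For one column of a 1000 x 1000 row-major array (count = 1000, blocklen = 1, stride = 1000), the packed column holds 1000 elements, while a buffer with the strided layout spans 999 * 1000 + 1 = 999001 elements.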
A test case is attached to the ticket that demonstrates the memory consumption issue.
Extended Scope
none.
History
none.
Proposed Solution
Define an MPI_Op that accepts one datatype for each buffer:
void op_fcn(void *in, int *count_in, MPI_Datatype *dtype_in, void *inout, int *count_inout, MPI_Datatype *dtype_inout)
This would allow MPI to pass one buffer in its packed form rather than recreating its layout at the source.
This op could be challenging for users to implement, so mechanisms to simplify the task should be investigated. One possibility is to define an op that takes two datatypes and one count. The MPI implementation would then have to transform one or both datatypes so that their individual units are congruent. This seems feasible for reductions, since all processes must pass the same datatype.
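As a rough illustration of what a two-layout op would do, the sketch below sums a packed incoming buffer into a strided column without expanding the packed data first. It is not MPI code: `layout_t` is a hypothetical, simplified stand-in for an MPI vector datatype, and `sum_op_two_layouts` is an illustrative name rather than part of the proposal. A real op would receive `MPI_Datatype` handles and would have to decode them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an MPI vector datatype: count blocks of
 * blocklen doubles, with block starts stride doubles apart. */
typedef struct {
    size_t count;
    size_t blocklen;
    size_t stride;
} layout_t;

/* Offset (in doubles) of the i-th logical element of a layout. */
static size_t elem_offset(layout_t l, size_t i)
{
    return (i / l.blocklen) * l.stride + (i % l.blocklen);
}

/* Sum reduction in which each buffer is described by its own layout.
 * Both layouts must describe the same number of elements. */
void sum_op_two_layouts(const double *in, layout_t lin,
                        double *inout, layout_t linout)
{
    assert(lin.blocklen > 0 && linout.blocklen > 0);
    size_t n = lin.count * lin.blocklen;
    assert(n == linout.count * linout.blocklen);
    for (size_t i = 0; i < n; i++)
        inout[elem_offset(linout, i)] += in[elem_offset(lin, i)];
}
```

With `lin = {3, 1, 1}` (packed) and `linout = {3, 1, 3}` (a column of a 3 x 3 row-major array), the incoming buffer needs only 3 elements, while the result is accumulated in place in the strided column.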
Impact on Implementations
Impact on Applications and Users
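Currently, reductions with derived datatypes are extremely inefficient, chiefly in memory use, because MPI must reconstruct the full layout before invoking the user-defined op. Fixing this issue would provide a significant performance enhancement.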
Alternative Solutions
Several alternative solutions are possible:
Users can pack data before calling MPI_Reduce to avoid this problem.
An MPI implementation could pack both the in and inout buffers and pass both packed buffers to the user-defined operation. When packed, both should share the same datatype and count. However, this approach still has significant space overhead.
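The first alternative, packing by hand before the reduction, might look like the sketch below for the column case. The helper names `pack_column` and `unpack_column` are hypothetical, and the code assumes a row-major array of `double`. The MPI call itself is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Copy column col of a row-major nrows x ncols array into a contiguous
 * buffer of nrows doubles. */
void pack_column(const double *array, size_t nrows, size_t ncols,
                 size_t col, double *packed)
{
    assert(col < ncols);
    for (size_t r = 0; r < nrows; r++)
        packed[r] = array[r * ncols + col];
}

/* Inverse of pack_column: scatter a contiguous buffer back into column col. */
void unpack_column(const double *packed, size_t nrows, size_t ncols,
                   size_t col, double *array)
{
    assert(col < ncols);
    for (size_t r = 0; r < nrows; r++)
        array[r * ncols + col] = packed[r];
}
```

The packed buffer can then be reduced with a built-in datatype and operation, for example `MPI_Reduce` with `MPI_DOUBLE` and `MPI_SUM`, and the root can call `unpack_column` on the result if it needs the data back in the array. The cost is an extra copy on each side and a temporary buffer of `nrows` elements.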
Originally by jdinan on 2012-06-06 13:44:07 -0500