A new binomial scatter using packed data on intermediary processes. #8383
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
|
@bwbarrett gcc5 test failed because of a full disk |
|
@bosilca is this supposed to always work on a heterogeneous cluster (mixing big and little endian, for example)? |
mkurnosov
left a comment
@bosilca thank you. I did not find any bugs. I tested with contiguous and non-contiguous datatypes, for all roots, and with MPI_IN_PLACE.
One proposal: memory consumption in non-root, non-leaf processes can be reduced, because scount * (size + 1) / 2 is only an upper bound. We can compute the subtree size and allocate a buffer of the corresponding size:
int vparent = (bmtree->tree_prev - root + size) % size;
int subtree_size = vrank - vparent;
if (size - vrank < subtree_size)
    subtree_size = size - vrank;
packed_size = scount * subtree_size;
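The subtree-size computation above can be tried outside of Open MPI with a small standalone sketch. It assumes an in-order binomial tree in which the virtual parent of a non-root process is its virtual rank with the lowest set bit cleared; that assumption replaces the `bmtree->tree_prev` lookup from the snippet. The helper names `bmtree_subtree_size` and `bmtree_upper_bound` are hypothetical and exist only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of Open MPI.
 * Returns the number of processes in the subtree rooted at virtual rank
 * `vrank` of an in-order binomial tree over `size` processes.
 * ASSUMPTION: the virtual parent of a non-root vrank is vrank with its
 * lowest set bit cleared, so the subtree covers the virtual ranks
 * [vrank, vrank + lowest_set_bit(vrank)), clipped to `size`.
 */
static int bmtree_subtree_size(int vrank, int size)
{
    assert(size > 0 && vrank >= 0 && vrank < size);

    if (vrank == 0) {
        return size;                      /* the root owns every block */
    }

    int vparent = vrank & (vrank - 1);    /* clear the lowest set bit */
    int subtree_size = vrank - vparent;   /* size of a complete subtree */
    if (size - vrank < subtree_size) {
        subtree_size = size - vrank;      /* clip when the tree is incomplete */
    }
    return subtree_size;
}

/* The upper bound, in blocks, quoted in the comment above. */
static int bmtree_upper_bound(int size)
{
    return (size + 1) / 2;
}
```

With 6 processes, virtual rank 4 has a subtree of 2 processes, so under these assumptions it would need room for 2 * scount packed elements, whereas the upper bound reserves room for 3 * scount.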
|
@ggouaillardet this version is almost heterogeneous. The only remaining issue is that all participants except the root will need to create a convertor for the root architecture (from the ompi_proc_t structure corresponding to the root) and use this convertor to unpack the local part of the message. This is true for non-leaves (where we already have an unpack operation), but also for the leaves, where we will need a temporary buffer and an unpack. |
|
@mkurnosov sure, it makes sense to reduce the temporary memory for non-leaves. If we are looking at optimizations, we could also avoid packing the local part on the root, and/or interleave pack and send on the root (which would also reduce the memory required for temporaries). |
|
bot:aws:retest |
|
bot:aws:retest |
|
I'm done with this. I was under the impression that @mkurnosov wanted to add some optimizations, but it's entirely up to him. |
|
I think that we should merge this PR, and if optimization comes in later, great. This fixes a real bug, so we should go ahead and merge it. |
jsquyres
left a comment
Review by proxy for @mkurnosov
This PR provides an alternative solution to #8283, where all temporaries are handled as packed data, resulting in less memory allocated for storing the temporary data.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>