Backport/datatype #8837
597f63a to f15f4e2
Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit e8ebe13)
Signed-off-by: Christoph Niethammer <[email protected]> (cherry picked from commit 9901325)
Signed-off-by: Austen Lauria <[email protected]> (cherry picked from commit ef28e8d)
This commit fixes support for heterogeneous environments, specifically for external32. The root cause was that, during the datatype optimization process, types that are contiguous in memory are collapsed together in order to decrease the number of conversion (or memcpy) function calls. The resulting type, however, does not have the same conversion rules as the types it replaced, leading to an incorrect (or absent) conversion in some cases. This patch flags the datatypes in which types have been collapsed during the optimization process, allowing the convertor to detect whether the optimized type can be used in heterogeneous setups. Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit 73d64cb)
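To illustrate why a collapsed type cannot reuse the conversion rules of the types it replaced, here is a minimal, self-contained sketch. It is not the Open MPI convertor code; `swap_elements` and `collapsed_conversion_matches` are hypothetical helpers written for this example. It assumes the conversion is a plain byte swap, and compares converting two 4-byte elements individually with converting the same 8 bytes as one collapsed unit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function): reverse the bytes of each
 * elem_size-byte element in place, as an endianness conversion would. */
static void swap_elements(unsigned char *buf, size_t count, size_t elem_size)
{
    assert(elem_size > 0);
    for (size_t i = 0; i < count; i++) {
        unsigned char *e = buf + i * elem_size;
        for (size_t lo = 0, hi = elem_size - 1; lo < hi; lo++, hi--) {
            unsigned char t = e[lo];
            e[lo] = e[hi];
            e[hi] = t;
        }
    }
}

/* Returns 1 if converting two 4-byte elements one by one yields the same
 * bytes as converting the same memory as a single 8-byte unit, else 0. */
static int collapsed_conversion_matches(void)
{
    unsigned char per_element[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    unsigned char collapsed[8];
    memcpy(collapsed, per_element, sizeof(collapsed));

    swap_elements(per_element, 2, 4); /* two elements of 4 bytes each */
    swap_elements(collapsed, 1, 8);   /* one "collapsed" 8-byte unit  */

    return memcmp(per_element, collapsed, sizeof(collapsed)) == 0;
}
```

In this sketch `collapsed_conversion_matches()` returns 0: the two layouts occupy the same memory, but the per-element conversion and the collapsed conversion produce different bytes. A plain memcpy of the collapsed unit would likewise skip the conversion entirely, which is the "absent" case the commit message mentions.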
When unpacking a partial predefined element, check the boundaries of the description vector type and adjust the memory pointer accordingly (to reflect not only the case where a single basic type was correctly unpacked, but also the case where an entire blocklen has been unpacked). Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit fb07960)
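The boundary handling described in this commit message can be sketched with a toy unpacker. This is an illustration under simplifying assumptions, not the Open MPI implementation: `toy_unpack_state`, `toy_unpack`, and `toy_unpack_demo` are hypothetical names, the layout is a simple vector of fixed-size elements (`blocklen` elements per block, blocks `stride` bytes apart), and no data conversion is performed. The point it shows is that when a packed stream arrives in fragments that split an element, the destination pointer has to be recomputed once the element completes, including the jump to the next block when a whole blocklen has been filled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical state for a toy vector layout (all names are illustrative). */
typedef struct {
    size_t elem_size;  /* size in bytes of the basic type               */
    size_t blocklen;   /* number of elements per block                  */
    size_t stride;     /* distance in bytes between block starts        */
    size_t elems_done; /* number of complete elements unpacked so far   */
    size_t partial;    /* bytes already received of the current element */
} toy_unpack_state;

/* Unpack len bytes from src into dst, resuming a partially received element
 * if there is one. The destination is derived from the element count, so
 * finishing the last element of a block moves on by stride, not elem_size. */
static void toy_unpack(toy_unpack_state *s, unsigned char *dst,
                       const unsigned char *src, size_t len)
{
    while (len > 0) {
        size_t block = s->elems_done / s->blocklen;
        size_t in_block = s->elems_done % s->blocklen;
        unsigned char *elem = dst + block * s->stride + in_block * s->elem_size;
        size_t want = s->elem_size - s->partial;
        size_t n = len < want ? len : want;

        memcpy(elem + s->partial, src, n);
        s->partial += n;
        src += n;
        len -= n;
        if (s->partial == s->elem_size) { /* element complete */
            s->partial = 0;
            s->elems_done++;
        }
    }
}

/* Feed 16 packed bytes in fragments of 3, 7 and 6 bytes into a layout of
 * two blocks of two 4-byte elements, 16 bytes apart. Returns 1 on success. */
static int toy_unpack_demo(void)
{
    unsigned char packed[16], user[32];
    for (int i = 0; i < 16; i++) packed[i] = (unsigned char)i;
    memset(user, 0xFF, sizeof(user)); /* 0xFF marks untouched gap bytes */

    toy_unpack_state s = { 4, 2, 16, 0, 0 };
    toy_unpack(&s, user, packed, 3);      /* ends in the middle of element 0 */
    toy_unpack(&s, user, packed + 3, 7);  /* crosses the blocklen boundary   */
    toy_unpack(&s, user, packed + 10, 6); /* completes elements 2 and 3      */

    for (int i = 0; i < 8; i++) {
        if (user[i] != i) return 0;           /* first block  */
        if (user[8 + i] != 0xFF) return 0;    /* gap          */
        if (user[16 + i] != 8 + i) return 0;  /* second block */
        if (user[24 + i] != 0xFF) return 0;   /* gap          */
    }
    return s.elems_done == 4 && s.partial == 0;
}
```

The second fragment is the interesting one: it completes element 0, fills element 1 (which ends the first block), and then has to continue at the start of the second block rather than at the next contiguous byte.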
f15f4e2 to 85d2dbc
Signed-off-by: George Bosilca <[email protected]> bot:notacherrypick
85d2dbc to 565d72e
@open-mpi/ucx Could you guys verify that this fixes this issue for you on v4.1.x ASAP?
Confirmed: issue is resolved.
@hoopoepg Could you do me a huge favor (since I don't have access to UCX/IB networks)? Could you try that reproducer on v4.0.0? I'd like to know how far back this issue goes.
Brings several datatype engine updates into the 4.1 branch. The most critical of these is the fix for the partial unpack bug from #8466.
Here are the commits from master that are covered by this PR:
e8ebe13
9901325
ef28e8d
73d64cb
fb07960
Note that this PR does not bring support for MPI_LONG and MPI_UNSIGNED_LONG in external32, because doing so would have required breaking the ABI (two new datatype #defines would have been added).
Unfortunately, I had to import 2 additional commits in order to be able to build and run on an M1: 4f2dde0 and 73aae14.
Fixes #8466.
One of these commits is intentionally not a cherry pick: bot:notacherrypick