[SPARK-40932][CORE] Fix issue messages for allGather are overridden #38410
What changes were proposed in this pull request?
The messages returned by `allGather` may be overridden by subsequent barrier API calls. For example, when `allGather("ABC")` is followed by a `barrier()` call, the returned `messages` may be `Array("", "")` instead of the expected `Array("ABC", "ABC")`.
The root cause is that, in local mode, the messages returned by `allGather` point to the original message array rather than to a copy. When a subsequent barrier API call changes that array, the result of the earlier `allGather` changes with it, so users cannot get the correct result.
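A minimal, self-contained Scala sketch of the aliasing described above. `AliasingCoordinator` and its `round` method are hypothetical stand-ins, not Spark's actual barrier coordinator. The sketch assumes a single message buffer that is reused across rounds and handed back by reference, which is what the description says happens in local mode. The first round plays the role of `allGather("ABC")` and the second the role of a following `barrier()`.

```scala
// Hypothetical toy model of a barrier coordinator; NOT Spark's real implementation.
final class AliasingCoordinator(numTasks: Int) {
  // One buffer reused across every barrier round.
  private val messages: Array[String] = Array.fill(numTasks)("")

  // Records every task's message for one round (tasks are simulated sequentially)
  // and returns the shared buffer itself.
  def round(perTask: Seq[String]): Array[String] = {
    perTask.zipWithIndex.foreach { case (msg, i) => messages(i) = msg }
    messages // BUG: a reference to the buffer that the next round overwrites
  }
}

object AliasingDemo {
  def main(args: Array[String]): Unit = {
    val coordinator = new AliasingCoordinator(2)
    val gathered = coordinator.round(Seq("ABC", "ABC")) // analogue of allGather("ABC")
    coordinator.round(Seq("", ""))                      // analogue of a following barrier()
    // The earlier result has been silently overwritten.
    println(gathered.map(s => "\"" + s + "\"").mkString("Array(", ", ", ")")) // prints Array("", "")
  }
}
```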
This PR fixes the issue by sending back a cloned copy of the messages.
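A minimal Scala sketch of the cloning fix. It assumes a hypothetical toy coordinator (`CloningCoordinator`, not Spark's actual code) that reuses one buffer across barrier rounds. The only change that matters is replying with `messages.clone()`, so each round hands callers their own snapshot. A shallow copy is sufficient here because `String`s are immutable.

```scala
// Hypothetical toy model of a barrier coordinator; NOT Spark's real implementation.
final class CloningCoordinator(numTasks: Int) {
  // One buffer reused across every barrier round.
  private val messages: Array[String] = Array.fill(numTasks)("")

  // Records every task's message for one round (tasks are simulated sequentially)
  // and returns a copy, so later rounds cannot mutate what callers already hold.
  def round(perTask: Seq[String]): Array[String] = {
    perTask.zipWithIndex.foreach { case (msg, i) => messages(i) = msg }
    messages.clone() // FIX: independent snapshot of this round's messages
  }
}

object CloningDemo {
  def main(args: Array[String]): Unit = {
    val coordinator = new CloningCoordinator(2)
    val gathered = coordinator.round(Seq("ABC", "ABC")) // analogue of allGather("ABC")
    coordinator.round(Seq("", ""))                      // analogue of a following barrier()
    println(gathered.mkString(", ")) // prints ABC, ABC
  }
}
```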
Why are the changes needed?
This bug may block external Spark ML libraries that rely heavily on the Spark barrier APIs for synchronization. If the barrier mechanism cannot guarantee that these APIs return correct results, those libraries cannot work correctly.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a unit test; it passes with this PR.