Skip to content

Conversation

@Ngone51
Copy link
Member

@Ngone51 Ngone51 commented Sep 15, 2025

What changes were proposed in this pull request?

This a followup for #47037. This PR wraps up the synchronize lock for invocation of OpenHashSet.add() in IndexShuffleBlockResolver.

Why are the changes needed?

OpenHashSet is not thread safe. We should enfoce synchronize lock when invokes the add function to ensure the thread-safety.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

N/A.

Was this patch authored or co-authored using generative AI tooling?

No.

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ngone51 We should create a new Jira ticket because SPARK-46957 has already been released in Spark 4.0.

@Ngone51
Copy link
Member Author

Ngone51 commented Sep 15, 2025

@LuciferYang Thanks for the advice. Created a new ticket SPARK-53581.

@Ngone51 Ngone51 changed the title [SPARK-46957][FOLLOW-UP][CORE] Fix potential thread-safety issue for mapTaskIds.add() [SPARK-53581][FOLLOW-UP][CORE] Fix potential thread-safety issue for mapTaskIds.add() Sep 15, 2025
@attilapiros
Copy link
Contributor

Now as the PR title changed should not the FOLLOW-UP need to be removed?

@Ngone51 Ngone51 changed the title [SPARK-53581][FOLLOW-UP][CORE] Fix potential thread-safety issue for mapTaskIds.add() [SPARK-53581][CORE] Fix potential thread-safety issue for mapTaskIds.add() Sep 16, 2025
@Ngone51
Copy link
Member Author

Ngone51 commented Sep 16, 2025

@attilapiros Thanks. Updated.

Copy link
Contributor

@attilapiros attilapiros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

dongjoon-hyun pushed a commit that referenced this pull request Sep 16, 2025
…add()

### What changes were proposed in this pull request?

This a followup for #47037. This PR wraps up the synchronize lock for invocation of `OpenHashSet.add()` in `IndexShuffleBlockResolver`.

### Why are the changes needed?

`OpenHashSet` is not thread safe. We should enfoce synchronize lock when invokes the add function to ensure the thread-safety.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52337 from Ngone51/fix-thread-safety.

Authored-by: Yi Wu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 78aba00)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Sep 16, 2025
…add()

### What changes were proposed in this pull request?

This a followup for #47037. This PR wraps up the synchronize lock for invocation of `OpenHashSet.add()` in `IndexShuffleBlockResolver`.

### Why are the changes needed?

`OpenHashSet` is not thread safe. We should enfoce synchronize lock when invokes the add function to ensure the thread-safety.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52337 from Ngone51/fix-thread-safety.

Authored-by: Yi Wu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 78aba00)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member

Thank you, @Ngone51 and all.

Merged to master/4.0/3.5 for

  • Apache Spark 4.1.0-preview2
  • Apache Spark 4.0.2
  • Apache Spark 3.5.7

cc @peter-toth as the release manager of Apache Spark 3.5.7, too.

@Ngone51
Copy link
Member Author

Ngone51 commented Sep 17, 2025

Thanks all!

@mridulm
Copy link
Contributor

mridulm commented Sep 17, 2025

I am wondering if SortShuffleManager.unregisterShuffle is also impacted by this ...

computeIfAbsent in IndexShuffleBlockResolver.onComplete -> remove in SortShuffleManager.unregisterShuffle -> mapTaskIds.add while iterating over mapTaskIds in unregisterShuffle

I am entirely sure if this combination is possible, but would be a good idea to make SortShuffleManager.unregisterShuffle lock mapTaskIds as well given we have identified the issue in this PR.

Thoughts @Ngone51 ?

@Ngone51
Copy link
Member Author

Ngone51 commented Sep 17, 2025

@mridulm Good catch! I'll make a separate PR to fix it later.

@LuciferYang
Copy link
Contributor

I'd like to expand on this issue. Is there a possibility of the reverse timing sequence for the operations mentioned by @mridulm? If SortShuffleManager.unregisterShuffle occurs first, and then the computeIfAbsent in IndexShuffleBlockResolver.onComplete happens afterward, could this lead to a minor memory leak?

@Ngone51
Copy link
Member Author

Ngone51 commented Sep 17, 2025

If SortShuffleManager.unregisterShuffle occurs first, and then the computeIfAbsent in IndexShuffleBlockResolver.onComplete happens afterward, could this lead to a minor memory leak?

@LuciferYang Possible, but I think that's fine. Because even if we have ensured the thread-safety already, this kind of issue could persist due to the race betwen shuffle removal and shuffle adding. The critial issue on this thread-safety problem is that the state of OpenHashSet could be corrupted, leading to incorrect data accessing result.

dongjoon-hyun pushed a commit that referenced this pull request Oct 21, 2025
…egisterShuffle

### What changes were proposed in this pull request?

This PR fixes the thread-safetye issue in `SortShuffleManager.unregisterShuffle` by enforcing synchronous lock on `mapTaskIds`'s iteration. Besides, this PR also addresses the [concern](#52337 (comment)) to enfore the type of `taskIdMapsForShuffle` as `ConcurrentHashMap` to ensure its thread-safety.

### Why are the changes needed?

Fix the potential thread-safety issue as pointed at #52337 (comment) and also the [concern](#52337 (comment)).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52386 from Ngone51/fix.

Authored-by: Yi Wu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…dd()

### What changes were proposed in this pull request?

This a followup for apache#47037. This PR
wraps up the synchronize lock for invocation of `OpenHashSet.add()` in
`IndexShuffleBlockResolver`.

### Why are the changes needed?

`OpenHashSet` is not thread safe. We should enfoce synchronize lock when
invokes the add function to ensure the thread-safety.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52337 from Ngone51/fix-thread-safety.

Authored-by: Yi Wu <[email protected]>

(cherry picked from commit 78aba00)

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Yi Wu <[email protected]>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…add()

### What changes were proposed in this pull request?

This a followup for apache#47037. This PR wraps up the synchronize lock for invocation of `OpenHashSet.add()` in `IndexShuffleBlockResolver`.

### Why are the changes needed?

`OpenHashSet` is not thread safe. We should enfoce synchronize lock when invokes the add function to ensure the thread-safety.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52337 from Ngone51/fix-thread-safety.

Authored-by: Yi Wu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit baae014)
Signed-off-by: Dongjoon Hyun <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…add()

### What changes were proposed in this pull request?

This a followup for apache#47037. This PR wraps up the synchronize lock for invocation of `OpenHashSet.add()` in `IndexShuffleBlockResolver`.

### Why are the changes needed?

`OpenHashSet` is not thread safe. We should enfoce synchronize lock when invokes the add function to ensure the thread-safety.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52337 from Ngone51/fix-thread-safety.

Authored-by: Yi Wu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…egisterShuffle

### What changes were proposed in this pull request?

This PR fixes the thread-safetye issue in `SortShuffleManager.unregisterShuffle` by enforcing synchronous lock on `mapTaskIds`'s iteration. Besides, this PR also addresses the [concern](apache#52337 (comment)) to enfore the type of `taskIdMapsForShuffle` as `ConcurrentHashMap` to ensure its thread-safety.

### Why are the changes needed?

Fix the potential thread-safety issue as pointed at apache#52337 (comment) and also the [concern](apache#52337 (comment)).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52386 from Ngone51/fix.

Authored-by: Yi Wu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants