
@zhzhan (Contributor) commented Feb 1, 2018

What changes were proposed in this pull request?

There is a race condition in TaskMemoryManager that may cause an OOM.

Memory released by a spill may be taken by another task, because there is a gap between releaseMemory and acquireMemory in the memory manager (e.g., UnifiedMemoryManager). If the current consumer is the only one that can spill, this leads to an OOM. It can happen with BytesToBytesMap, because it only spills the required number of bytes.
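
To make the gap concrete, here is a minimal deterministic sketch. `SharedPool` and `RaceDemo` are hypothetical names for a toy model, not Spark's `MemoryManager` API; the three steps replay one possible interleaving of two tasks by hand instead of using real threads.

```java
// Hypothetical, simplified model of a shared execution-memory pool.
// It is NOT Spark's MemoryManager API; it only illustrates that a
// release followed by an acquire is two separate steps.
class SharedPool {
    private long free;

    SharedPool(long capacity) { this.free = capacity; }

    /** Grants up to `bytes` from the free pool and returns the amount granted. */
    synchronized long acquire(long bytes) {
        long granted = Math.min(bytes, free);
        free -= granted;
        return granted;
    }

    /** Returns `bytes` to the free pool. */
    synchronized void release(long bytes) { free += bytes; }
}

class RaceDemo {
    /**
     * Replays one unlucky interleaving deterministically and returns how many
     * bytes task A obtains after spilling exactly what it needed.
     */
    static long unluckyInterleaving() {
        SharedPool pool = new SharedPool(100);
        pool.acquire(100);            // task A holds the whole pool
        long required = 40;           // task A now needs 40 more bytes

        // 1. Task A spills exactly `required` bytes and releases them.
        pool.release(required);

        // 2. The gap: task B acquires the freed bytes first.
        pool.acquire(required);

        // 3. Task A re-acquires and comes up empty, although it did spill.
        return pool.acquire(required);
    }
}
```

Each pool method is `synchronized`, so the pool's own bookkeeping stays consistent; what is missing is atomicity across the release-then-acquire pair, which is exactly where another task can slip in.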

Keep spilling the current consumer in a loop while it still has memory to release.
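
A minimal sketch of that idea, assuming a simplified single-threaded model: `RacyPool`, `SpillingConsumer`, and `SpillLoop` are hypothetical names, not Spark's `TaskMemoryManager` code, and the competing task's grab is scripted so the interleaving is deterministic. With `retryWhileReleasing` set to false, each consumer is spilled at most once and the request comes up short; with it set to true, a consumer that released something is tried again until it frees nothing more.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded model; none of these classes are Spark APIs.
class RacyPool {
    private long free;
    private long pendingSteal; // bytes another task will grab on the next release

    RacyPool(long free) { this.free = free; }

    void stealOnNextRelease(long bytes) { pendingSteal = bytes; }

    /** Grants up to `bytes` and returns the amount granted. */
    long acquire(long bytes) {
        long granted = Math.min(bytes, free);
        free -= granted;
        return granted;
    }

    /** Frees `bytes`; a scripted competing task may take some of them immediately. */
    void release(long bytes) {
        free += bytes;
        long stolen = Math.min(pendingSteal, free);
        free -= stolen;
        pendingSteal = 0;
    }
}

class SpillingConsumer {
    private final RacyPool pool;
    private long used;

    SpillingConsumer(RacyPool pool, long used) {
        this.pool = pool;
        this.used = used;
    }

    /** Releases at most `size` bytes, i.e. only what was asked for. */
    long spill(long size) {
        long released = Math.min(size, used);
        used -= released;
        pool.release(released);
        return released;
    }
}

class SpillLoop {
    /**
     * Tries to obtain `required` bytes, spilling consumers from the front of the queue.
     * retryWhileReleasing = true keeps a consumer in the queue as long as its last
     * spill released something; false spills each consumer at most once.
     */
    static long acquireWithSpill(RacyPool pool, Deque<SpillingConsumer> consumers,
                                 long required, boolean retryWhileReleasing) {
        long got = pool.acquire(required);
        while (got < required && !consumers.isEmpty()) {
            SpillingConsumer c = consumers.peekFirst();
            long released = c.spill(required - got);
            if (released > 0) {
                got += pool.acquire(required - got);
            }
            if (released == 0 || !retryWhileReleasing) {
                consumers.pollFirst(); // consumer is exhausted (or spill-once policy)
            }
        }
        return got;
    }

    /** Scripted race: nothing free, one consumer holding 100 bytes, 40 bytes required. */
    static long demo(boolean retryWhileReleasing) {
        RacyPool pool = new RacyPool(0);
        Deque<SpillingConsumer> consumers = new ArrayDeque<>();
        consumers.add(new SpillingConsumer(pool, 100));
        pool.stealOnNextRelease(40); // another task takes the first 40 bytes freed
        return acquireWithSpill(pool, consumers, 40, retryWhileReleasing);
    }
}
```

The retrying loop still terminates: every retry shrinks the consumer's `used`, and a consumer whose spill frees nothing is dropped from the queue.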

How was this patch tested?

The race is hard to reproduce, but the current logic appears to be what causes the issue.

@gatorsmile (Member)

cc @jiangxb1987 @cloud-fan

SparkQA commented Feb 2, 2018

Test build #86943 has finished for PR 20480 at commit df96f0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 2, 2018

Test build #86944 has finished for PR 20480 at commit afe40e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987 (Contributor) left a comment

LGTM. This should be logically correct, and I can't think of a better way to resolve it. cc @cloud-fan

@cloud-fan (Contributor)

thanks, merging to master!

@asfgit asfgit closed this in b3a0428 Feb 2, 2018
5 participants