[SPARK-30198][Core] BytesToBytesMap does not grow internal long array as expected #26828
```diff
@@ -741,7 +741,9 @@ public boolean append(Object kbase, long koff, int klen, Object vbase, long voff
     longArray.set(pos * 2 + 1, keyHashcode);
     isDefined = true;

-    if (numKeys >= growthThreshold && longArray.size() < MAX_CAPACITY) {
+    // We use two array entries per key, so the array size is twice the capacity.
+    // We should compare the current capacity of the array, instead of its size.
+    if (numKeys >= growthThreshold && longArray.size() / 2 < MAX_CAPACITY) {
```
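The off-by-2x fixed in the hunk above can be checked with a minimal, hypothetical sketch (the class name `GrowthCheckDemo` and the variable `arraySize` are illustrative; `MAX_CAPACITY` is assumed to be `1 << 29`, the constant used by `BytesToBytesMap`). The long array stores two entries per key, so its size is twice the map's capacity; comparing the raw size against `MAX_CAPACITY` therefore stops growth at half the intended capacity:

```java
// Hypothetical demo of the off-by-2x in the old growth check.
public class GrowthCheckDemo {
    // Assumed to match BytesToBytesMap.MAX_CAPACITY.
    static final int MAX_CAPACITY = 1 << 29;

    public static void main(String[] args) {
        // The array holds two long entries per key, so this size
        // corresponds to only MAX_CAPACITY / 2 keys.
        long arraySize = MAX_CAPACITY;

        boolean oldCheck = arraySize < MAX_CAPACITY;      // false: growth stops too early
        boolean newCheck = arraySize / 2 < MAX_CAPACITY;  // true: capacity still below the limit

        System.out.println(oldCheck + " " + newCheck);    // prints "false true"
    }
}
```

With the old check, the map silently stops growing once the array reaches `MAX_CAPACITY` entries, i.e. only `MAX_CAPACITY / 2` keys.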
Member: @viirya . Can we have explicit test cases for these boundary conditions?
Member (Author): The max capacity is a big number. Is it OK to have a unit test allocating such a big array?
Member: I guessed we can use
Member: Never mind. I don't want to block this PR (and you), since this looks urgent. I'll try that later myself.
Member (Author): OK, sounds good. I will also test to see if I can add it. Thanks for the suggestion!
Member (Author): Hmm, even mock with
Member: Oh, you already tested that. Got it. Thank you for spending time on that.
```diff
       try {
         growAndRehash();
```
Member (Author): Actually, I also think that we should set `canGrowArray` to false, like:

```java
if (numKeys >= growthThreshold && longArray.size() / 2 >= MAX_CAPACITY) {
  canGrowArray = false;
}
```

So once we reach the max capacity of the map, `canGrowArray` is set to false. We can fail the next append and let the map spill and fall back to sort-based aggregation in HashAggregate. Thus we can prevent a similar forever-loop when we reach max capacity.
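The effect of that proposed extra guard can be sketched as follows (a hypothetical demo of the suggestion, not code merged in this PR; all names besides `canGrowArray` and `MAX_CAPACITY` are illustrative):

```java
// Hypothetical demo: once capacity reaches MAX_CAPACITY, flip canGrowArray
// so the next append can fail fast and the caller can spill.
public class CanGrowDemo {
    static final long MAX_CAPACITY = 1L << 29;

    public static void main(String[] args) {
        boolean canGrowArray = true;
        long arraySize = 2 * MAX_CAPACITY;  // two entries per key: capacity == MAX_CAPACITY
        long numKeys = MAX_CAPACITY;        // assume the growth threshold has been reached
        long growthThreshold = numKeys;

        if (numKeys >= growthThreshold && arraySize / 2 >= MAX_CAPACITY) {
            canGrowArray = false;           // proposed: refuse further appends, force a spill
        }
        System.out.println(canGrowArray);   // prints "false"
    }
}
```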
Contributor: If we do this, we won't call
Member (Author): Oh, I did not mean to replace the current condition, but to add another check.
Member (Author): Let me think about it; if it makes sense, I will submit another PR for it.
Contributor: @viirya, I'm encountering the same problem that you describe here. When the map is close to

It looks like you didn't submit a PR for this. Is there a reason why not? If there's no problem with your suggested fix, I can submit a PR now.
Contributor: Thanks for the quick response! I saw that PR (#26914), but I don't think it solves the problem I'm encountering. That PR stops accepting new keys once we have reached
Member (Author): I think the problem I posted above is when we reach
Member (Author): In
Contributor: You're right, it's not the same problem; I was mistaken in saying so earlier.

Yes, but by this point the task has typically consumed all available memory, so the allocation of the new pointer array is likely to fail.
Contributor: I filed SPARK-32872 and submitted #29744 to fix this.
```diff
       } catch (SparkOutOfMemoryError oom) {
```
Can you remind me of some more details of `BytesToBytesMap`? What happens if we don't grow? I don't see a loop in this method, and I'm not sure how the job hangs.

Ah, sure. The client of `BytesToBytesMap`, like HashAggregate, will call `lookup` to find a `Location` to write a value. The returned location is then used to do the append (`Location.append`). Every time after we append a key/value, we check whether it is time to grow the internal array, and grow it if needed. `lookup` delegates looking up keys to `safeLookup`. Its control flow looks like:

So the job hangs in this loop because it cannot find any empty location, as the internal array is full.

We stop growing the internal array too early because of the wrong array-size check at:
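The hang in that probing loop can be illustrated with a small, hypothetical model (names like `defined` are stand-ins; the real `safeLookup` also compares the stored keys). When every slot is defined and none matches, the unbounded loop never exits; the demo bounds it only so that it terminates:

```java
import java.util.Arrays;

// Hypothetical model of linear probing over a completely full table.
public class ProbeDemo {
    public static void main(String[] args) {
        boolean[] defined = new boolean[4];
        Arrays.fill(defined, true);      // table full, and no key ever matches
        int mask = defined.length - 1;
        int pos = 0;
        int probes = 0;
        while (probes < 100) {           // the real loop has no such bound
            if (!defined[pos]) {
                break;                   // never taken: every slot is defined
            }
            pos = (pos + 1) & mask;      // wrap around and keep probing
            probes++;
        }
        System.out.println(probes);      // prints "100": no empty slot is ever found
    }
}
```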
Another point (#26828 (comment)) is that we may want to set `canGrowArray` to false once we are close to max capacity, so we can avoid the infinite loop again.

Shouldn't `lookup` throw OOM if no space can be found?
`lookup` just looks for an empty slot in the internal array for a new key. It does not allocate memory. The array is allocated/grown in the last `append`.

Once an empty slot (a `Location` object) is found, the client of `BytesToBytesMap` may call `append` on the Location; OOM could be thrown while appending the new key/value.
Let me confirm the problem: so `append` mistakenly thinks there is enough space and doesn't grow the array. This makes the client of `BytesToBytesMap` keep calling `lookup` and hang. Is my understanding correct?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appendmistakenly think it can not grow the array anymore so does not grow the array. It keeps append value until full. Then the client callinglookupcan not find an empty slot and gets stuck in infinite loop.There was a problem hiding this comment.
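Putting the two halves together, here is a toy, hypothetical model of the failure (`ToyMap`, `lookupBounded`, etc. are illustrative stand-ins for `BytesToBytesMap.lookup` and `Location.append`): the map fills up because it never grows, and a subsequent lookup for a new key can never find an empty slot:

```java
// Hypothetical end-to-end model: a map that cannot grow fills up,
// after which lookups for new keys probe without ever succeeding.
public class ClientLoopDemo {
    public static void main(String[] args) {
        ToyMap map = new ToyMap(4);          // capacity 4, growth disabled
        int appended = 0;
        for (int key = 0; key < 4; key++) {
            if (map.append(key)) appended++;
        }
        // With a full map, a lookup for an unseen key never finds an empty
        // slot; lookupBounded caps the probes so the demo terminates.
        System.out.println(appended + " " + map.lookupBounded(99));  // prints "4 -1"
    }
}

class ToyMap {
    private final int[] keys;
    private final boolean[] defined;

    ToyMap(int capacity) {
        keys = new int[capacity];
        defined = new boolean[capacity];
    }

    boolean append(int key) {
        for (int i = 0; i < keys.length; i++) {
            if (!defined[i]) { keys[i] = key; defined[i] = true; return true; }
        }
        return false;                        // full, and (like the bug) no growth
    }

    int lookupBounded(int key) {
        int mask = keys.length - 1, pos = key & mask, probes = 0;
        while (probes < 1000) {              // the real loop has no bound
            if (!defined[pos] || keys[pos] == key) return pos;
            pos = (pos + 1) & mask;
            probes++;
        }
        return -1;                           // would spin forever in the real code
    }
}
```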
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the
appenddoesn't grow the array while it should. This makesBytesToBytesMapmalformed (the array is not big enough to serve the data region) and causes problems.