Fix AllocationPoll::newRun fails when requests exceed largestSizeClass#4713
Fix AllocationPoll::newRun fails when requests exceed largestSizeClass#4713marin-ma wants to merge 5 commits intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox canceled.
|
| } | ||
| pool_->allocateNonContiguous( | ||
| std::max<int32_t>(kMinPages, numPages), allocation_, numPages); | ||
| std::max<int32_t>(kMinPages, numPages), allocation_); |
There was a problem hiding this comment.
pool_->allocateNonContiguous(
std::max<int32_t>(kMinPages, numPages), allocation_, std::min(numPages, pool_->largestSizeClass()));
| } | ||
|
|
||
| TEST_P(MemoryAllocatorTest, exceedLargestSizeClass) { | ||
| const size_t kExceedLargestSizeClass = instance_->largestSizeClass() + 1; |
There was a problem hiding this comment.
Can you test with different allocation size based on the class size and check the number of page runs in the resulting allocation?
AllocationPool::allocationAt(0)?
1, => expect one page run?
first size class + 1 => expect two page runs?
second size class + 1 => expect two page runs?
...
largestSizeClass() + 1 => expect two page runs?
You can structure the test with a while loop and each iteration test different size + expected page run? Thanks!
There was a problem hiding this comment.
Thanks for your comments! I tried to add these checks but I found in most cases allocation->numRuns() == 1. Is it as expected, or did I misunderstand the above logic? Please review the updated code.
There was a problem hiding this comment.
I think the number of page runs is most likely 1. I previous comment on the expected page run is incorrect. But you need to update allocation object as newRun() will insert a new run which update allocation_: AllocationPool::allocationAt(currentRunIndex()). Then it suppose all the newRun should have only one pageRun.
| const size_t kExceedLargestSizeClass = instance_->largestSizeClass() + 1; | ||
| AllocationPool pool(pool_.get()); | ||
| const auto* allocation = pool.allocationAt(0); | ||
| pool.newRun(1 * AllocationTraits::kPageSize); |
There was a problem hiding this comment.
nit: AllocationTraits::pageBytes(numPages)
| } | ||
|
|
||
| TEST_P(MemoryAllocatorTest, exceedLargestSizeClass) { | ||
| const size_t kExceedLargestSizeClass = instance_->largestSizeClass() + 1; |
There was a problem hiding this comment.
I think the number of page runs is most likely 1. I previous comment on the expected page run is incorrect. But you need to update allocation object as newRun() will insert a new run which update allocation_: AllocationPool::allocationAt(currentRunIndex()). Then it suppose all the newRun should have only one pageRun.
|
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@xiaoxmeng Can we merge this PR? |
|
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@xiaoxmeng Here's the stacktrace. It's triggered when calling |
|
@xiaoxmeng Do you have further comments? Can you provide the internal linter error for me to fix that? |
Summary: This function is used in Spark Runtime Filters: apache/spark#35789 https://docs.google.com/document/d/16IEuyLeQlubQkH8YuVuXWKo2-grVIoDJqQpHZrE7q04/edit#heading=h.4v65wq7vzy4q BloomFilter implementation in Velox is different from Spark, hence, serialized BloomFilter is different. Velox has memory limit for contiguous memory buffer, hence BloomFilter capacity is less than in Spark when numBits is large. See #4713 (comment) Spark allows for changing the defaults while Velox does not. See also #3342 Fixes #3694 Pull Request resolved: #4028 Reviewed By: Yuhta Differential Revision: D46352733 Pulled By: mbasmanova fbshipit-source-id: 1c8a0b489a736e627ba2c0869688fc0cf46279bb
|
@xiaoxmeng Sorry for not catching up this PR for long time. Do you still have further comments? We still need this fix. |
|
@xiaoxmeng @mbasmanova Any issue to follow up on the PR? It's a bug fix of a Gluten case |
|
@FelixYBW Would you please rebase? |
|
Close this one since there are a lot of changes on main branch and we no longer need this PR. |
When directly invoking AllocationPoll::newRun like code piece below:
velox/velox/common/memory/HashStringAllocator.cpp
Line 154 in 35ab8cc
If the value of param
neededexceed largestSizeClass, the program fails with exception thrownThis is caused by passing the
numPagesas theminSizeClassand it fails the check herevelox/velox/common/memory/MemoryAllocator.cpp
Lines 51 to 56 in 35ab8cc