
Fix AllocationPool::newRun failing when requests exceed largestSizeClass #4713

Closed
marin-ma wants to merge 5 commits into facebookincubator:main from marin-ma:fix-new-run


@marin-ma
Collaborator

When directly invoking AllocationPool::newRun, as in the code piece below:

If the needed parameter corresponds to more pages than largestSizeClass, the program fails with the following exception:

E0423 11:29:24.637616 43478 Exceptions.h:68] Line: /home/rong/github/oap-project/velox/velox/common/memory/MemoryAllocator.cpp:51, Function:allocationSize, Expression: minSizeClass <= sizeClassSizes_.back() (257 vs. 256) Requesting minimum size 257 larger than largest size class 256, Source: RUNTIME, ErrorCode: INVALID_STATE

This is caused by passing numPages as the minSizeClass argument, which then fails this check in MemoryAllocator::allocationSize:

VELOX_CHECK_LE(
    minSizeClass,
    sizeClassSizes_.back(),
    "Requesting minimum size {} larger than largest size class {}",
    minSizeClass,
    sizeClassSizes_.back());
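To make the failure concrete, here is a minimal, self-contained sketch of that guard. FakeAllocator and failsWhenPassedAsMinSizeClass are hypothetical stand-ins, not Velox APIs, and the power-of-two size-class table is an assumption; only the largest class (256 pages) is confirmed by the error message above.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the allocator's size-class table (in pages).
// Assumption: power-of-two classes; only the largest (256) is confirmed
// by the error message in this PR.
struct FakeAllocator {
  std::vector<uint64_t> sizeClassSizes{1, 2, 4, 8, 16, 32, 64, 128, 256};

  // Mirrors the VELOX_CHECK_LE guard: a minimum size class larger than the
  // largest available class cannot be satisfied, so it is rejected.
  void checkMinSizeClass(uint64_t minSizeClass) const {
    if (minSizeClass > sizeClassSizes.back()) {
      throw std::invalid_argument(
          "Requesting minimum size " + std::to_string(minSizeClass) +
          " larger than largest size class " +
          std::to_string(sizeClassSizes.back()));
    }
  }
};

// Returns true if passing numPages directly as minSizeClass trips the guard.
bool failsWhenPassedAsMinSizeClass(const FakeAllocator& a, uint64_t numPages) {
  try {
    a.checkMinSizeClass(numPages);
    return false;
  } catch (const std::invalid_argument&) {
    return true;
  }
}
```

With 256 pages the guard passes; with 257 it throws, matching the "257 vs. 256" log line.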

@netlify

netlify bot commented Apr 23, 2023

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 9fc2d56
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/64b90f15b013450008b3795c

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 23, 2023
@marin-ma
Collaborator Author

cc @zhejiangxiaomai

Contributor

@xiaoxmeng left a comment


@marin-ma thanks for the fix!

 }
 pool_->allocateNonContiguous(
-    std::max<int32_t>(kMinPages, numPages), allocation_, numPages);
+    std::max<int32_t>(kMinPages, numPages), allocation_);
Contributor


pool_->allocateNonContiguous(
    std::max<int32_t>(kMinPages, numPages),
    allocation_,
    std::min(numPages, pool_->largestSizeClass()));
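The suggestion keeps the minimum-size-class hint but clamps it to the largest class. Below is a hedged sketch of that clamping in isolation; clampMinSizeClass is a hypothetical helper, and MachinePageCount is assumed here to be an unsigned 64-bit page count. One thing worth checking in the real code: std::min deduces a single type from both arguments, so if numPages is an int32_t and largestSizeClass() returns an unsigned type, the call needs an explicit template argument (or a cast) to compile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumption: page counts are unsigned 64-bit, standing in for Velox's
// MachinePageCount.
using MachinePageCount = uint64_t;

// Hypothetical helper: clamp the requested page count so it can be used as
// a minimum size class without exceeding the largest class. numPages is
// assumed non-negative. The explicit template argument avoids a deduction
// failure when numPages is signed and largestSizeClass is unsigned.
MachinePageCount clampMinSizeClass(int32_t numPages,
                                   MachinePageCount largestSizeClass) {
  return std::min<MachinePageCount>(numPages, largestSizeClass);
}
```

Requests at or below the largest class pass through unchanged; larger ones are capped, so the allocationSize guard can no longer fire.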

}

TEST_P(MemoryAllocatorTest, exceedLargestSizeClass) {
  const size_t kExceedLargestSizeClass = instance_->largestSizeClass() + 1;
Contributor


Can you test with different allocation sizes based on the size classes and check the number of page runs in the resulting allocation?

AllocationPool::allocationAt(0)?
1 => expect one page run?
first size class + 1 => expect two page runs?
second size class + 1 => expect two page runs?
...
largestSizeClass() + 1 => expect two page runs?

You could structure the test as a while loop in which each iteration tests a different size and its expected number of page runs. Thanks!
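A rough sketch of the loop-driven structure suggested above. It only builds the list of request sizes (one page, then one page past each size class, ending at largestSizeClass() + 1); expected page-run counts are deliberately left out, since later in this thread they turn out to be incorrect. testSizesInPages is a hypothetical helper, and the power-of-two size-class list used in the usage is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: build the request sizes, in pages, that the review
// suggests exercising. The first entry is a single page; each following
// entry is one page past a size class, so the last one is
// largestSizeClass + 1.
std::vector<uint64_t> testSizesInPages(
    const std::vector<uint64_t>& sizeClasses) {
  std::vector<uint64_t> sizes{1};
  for (uint64_t sizeClass : sizeClasses) {
    sizes.push_back(sizeClass + 1);
  }
  return sizes;
}
```

In a test along these lines, each entry would be converted to bytes and passed to pool.newRun() inside the loop, with the per-size assertions made on the resulting allocation.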

Collaborator Author


Thanks for your comments! I tried to add these checks, but I found that in most cases allocation->numRuns() == 1. Is that expected, or did I misunderstand the logic above? Please review the updated code.

Contributor

@xiaoxmeng Apr 23, 2023


I think the number of page runs is most likely 1; my previous comment on the expected page runs was incorrect. However, you need to update the allocation object, because newRun() inserts a new run and updates allocation_, so fetch it with AllocationPool::allocationAt(currentRunIndex()). Each newRun() should then produce only one page run.
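The point about allocationAt(currentRunIndex()) can be illustrated with a toy model. ToyAllocationPool and ToyAllocation are hypothetical and only mimic the behaviour described in the comment: each newRun() appends a fresh allocation, so a pointer obtained earlier from allocationAt(0) keeps referring to the first run, not the newest one. The model also assumes, as the reviewer expects, that each newRun() is satisfied by a single page run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model of one allocation made of page runs.
struct ToyAllocation {
  std::vector<uint64_t> runPages;  // pages in each run
  std::size_t numRuns() const { return runPages.size(); }
};

// Hypothetical model of the pool: every newRun() appends a new allocation
// and makes it current. Assumption: one page run per newRun().
class ToyAllocationPool {
 public:
  void newRun(uint64_t numPages) {
    auto allocation = std::make_unique<ToyAllocation>();
    allocation->runPages.push_back(numPages);
    allocations_.push_back(std::move(allocation));
  }

  const ToyAllocation* allocationAt(std::size_t index) const {
    return allocations_.at(index).get();
  }

  // Index of the most recent run; only valid after at least one newRun().
  std::size_t currentRunIndex() const { return allocations_.size() - 1; }

 private:
  // unique_ptr keeps earlier allocation addresses stable as the vector grows.
  std::vector<std::unique_ptr<ToyAllocation>> allocations_;
};
```

In this model, checking the pointer taken from allocationAt(0) after a second newRun() would inspect the wrong run, which is why the test has to re-fetch the current allocation after every call.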

  const size_t kExceedLargestSizeClass = instance_->largestSizeClass() + 1;
  AllocationPool pool(pool_.get());
  const auto* allocation = pool.allocationAt(0);
  pool.newRun(1 * AllocationTraits::kPageSize);
Contributor


nit: AllocationTraits::pageBytes(numPages)
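The nit asks for a named helper instead of spelling out the page-to-byte multiplication. A sketch of what such a helper computes; ToyAllocationTraits is a hypothetical stand-in rather than the Velox definition, and the 4 KiB page size is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for AllocationTraits; the 4 KiB page size is an
// assumption for this sketch.
struct ToyAllocationTraits {
  static constexpr uint64_t kPageSize = 4096;

  // Number of bytes covered by numPages machine pages.
  static constexpr uint64_t pageBytes(uint64_t numPages) {
    return numPages * kPageSize;
  }
};

static_assert(ToyAllocationTraits::pageBytes(1) == 4096, "one page");
```

Using the helper makes the unit explicit at the call site, e.g. pool.newRun(pageBytes(numPages)) reads as "numPages pages" rather than a raw multiplication.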


@facebook-github-bot
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@marin-ma
Collaborator Author

@xiaoxmeng Can we merge this PR?

Contributor

@xiaoxmeng left a comment


@marin-ma thanks for the update!

@facebook-github-bot
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@xiaoxmeng
Contributor

@marin-ma thanks for the fix!

@marin-ma I chatted with @oerling about this, and this behavior is intentional by design. Can you provide the call stack that runs into this issue? Thanks!

@marin-ma
Collaborator Author

marin-ma commented Apr 25, 2023

@xiaoxmeng Here's the stack trace. It's triggered when calling facebook::velox::HashStringAllocator::allocate(int, bool) with a requested allocation size equal to largestSizeClass (256 pages). In this call stack, extra space is required to allocate a HashStringAllocator::Header, so the final request size is 257 pages.

 0# facebook::velox::memory::MemoryAllocator::allocationSize(unsigned long, unsigned long) const in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 1# 0x00007FEF029A7E9C in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 2# facebook::velox::memory::MemoryPoolImpl::allocateNonContiguous(unsigned long, facebook::velox::memory::Allocation&, unsigned long) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 3# facebook::velox::AllocationPool::newRunImpl(unsigned long) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 4# facebook::velox::HashStringAllocator::newSlab(int) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 5# facebook::velox::HashStringAllocator::allocate(int, bool) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 6# std::vector<unsigned long, facebook::velox::StlAllocator<unsigned long> >::_M_default_append(unsigned long) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 7# facebook::velox::BloomFilter<facebook::velox::StlAllocator<unsigned long> >::merge(char const*) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 8# facebook::velox::functions::sparksql::makeMightContain(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<facebook::velox::exec::VectorFunctionArg, std::allocator<facebook::velox::exec::VectorFunctionArg> > const&) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 9# std::_Function_handler<std::shared_ptr<facebook::velox::exec::VectorFunction> (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<facebook::velox::exec::VectorFunctionArg, std::allocator<facebook::velox::exec::VectorFunctionArg> > const&), std::shared_ptr<facebook::velox::exec::VectorFunction> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<facebook::velox::exec::VectorFunctionArg, std::allocator<facebook::velox::exec::VectorFunctionArg> > const&)>::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<facebook::velox::exec::VectorFunctionArg, std::allocator<facebook::velox::exec::VectorFunctionArg> > const&) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
10# facebook::velox::exec::getVectorFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::shared_ptr<facebook::velox::Type const>, std::allocator<std::shared_ptr<facebook::velox::Type const> > > const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > > const&) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
11# 0x00007FEF027AF7BD in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
12# 0x00007FEF027ADAB7 in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
13# facebook::velox::exec::compileExpressions(std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > > const&, facebook::velox::core::ExecCtx*, facebook::velox::exec::ExprSet*, bool) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
14# facebook::velox::exec::ExprSet::ExprSet(std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > > const&, facebook::velox::core::ExecCtx*, bool) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
15# 0x00007FEF00A74440 in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
16# facebook::velox::connector::hive::HiveDataSource::HiveDataSource(std::shared_ptr<facebook::velox::RowType const> const&, std::shared_ptr<facebook::velox::connector::ConnectorTableHandle> const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<facebook::velox::connector::ColumnHandle>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<facebook::velox::connector::ColumnHandle> > > > const&, facebook::velox::CachedFactory<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, facebook::velox::FileHandle, facebook::velox::FileHandleGenerator, facebook::velox::FileHandleSizer, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, facebook::velox::memory::MemoryPool*, facebook::velox::connector::ExpressionEvaluator*, facebook::velox::memory::MemoryAllocator*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, folly::Executor*) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
17# facebook::velox::connector::hive::HiveConnector::createDataSource(std::shared_ptr<facebook::velox::RowType const> const&, std::shared_ptr<facebook::velox::connector::ConnectorTableHandle> const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<facebook::velox::connector::ColumnHandle>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<facebook::velox::connector::ColumnHandle> > > > const&, facebook::velox::connector::ConnectorQueryCtx*) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
18# facebook::velox::exec::TableScan::getOutput() in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
19# facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
20# facebook::velox::exec::Driver::next(std::shared_ptr<facebook::velox::exec::BlockingState>&) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
21# facebook::velox::exec::Task::next(folly::SemiFuture<folly::Unit>*) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
22# gluten::WholeStageResultIterator::Next() in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
23# arrow::Result<std::shared_ptr<gluten::ColumnarBatch> > arrow::Iterator<std::shared_ptr<gluten::ColumnarBatch> >::Next<gluten::ResultIterator::Wrapper<gluten::WholeStageResultIteratorFirstStage> >(void*) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
24# gluten::ResultIterator::GetNext() in ./generic_benchmark
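The page arithmetic behind the 257-page request can be sketched as follows. It assumes 4 KiB pages; the thread only says that a HashStringAllocator::Header is added, so the 4-byte header in the usage is a placeholder, and pagesFor and pagesWithHeader are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4 KiB machine pages.
constexpr uint64_t kPageSize = 4096;

// Hypothetical helper: whole pages needed to hold bytes, rounding up to
// the next page boundary.
constexpr uint64_t pagesFor(uint64_t bytes) {
  return (bytes + kPageSize - 1) / kPageSize;
}

// Pages needed when a payload of payloadPages full pages shares a slab with
// a header of headerBytes bytes.
constexpr uint64_t pagesWithHeader(uint64_t payloadPages,
                                   uint64_t headerBytes) {
  return pagesFor(payloadPages * kPageSize + headerBytes);
}
```

With no header, 256 pages stay at 256; any non-zero header pushes a payload of exactly 256 pages to 257, one page past the largest size class.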

@marin-ma
Collaborator Author

@xiaoxmeng Do you have any further comments? Could you share the internal linter error so I can fix it?

facebook-github-bot pushed a commit that referenced this pull request Jun 7, 2023
Summary:
This function is used in Spark Runtime Filters: apache/spark#35789

https://docs.google.com/document/d/16IEuyLeQlubQkH8YuVuXWKo2-grVIoDJqQpHZrE7q04/edit#heading=h.4v65wq7vzy4q

BloomFilter implementation in Velox is different from Spark, hence, serialized BloomFilter is different.

Velox has a memory limit for contiguous memory buffers, hence BloomFilter capacity is smaller than in Spark when numBits is large. See #4713 (comment)

Spark allows for changing the defaults while Velox does not.

See also #3342

Fixes #3694

Pull Request resolved: #4028

Reviewed By: Yuhta

Differential Revision: D46352733

Pulled By: mbasmanova

fbshipit-source-id: 1c8a0b489a736e627ba2c0869688fc0cf46279bb
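As a rough illustration of the contiguous-buffer limit mentioned in the commit summary: if a filter's bit array must fit in a single run, it cannot exceed the largest size class. The sketch assumes 4 KiB pages and the 256-page largest class from the error message in this thread, and it ignores allocator header overhead, which lowers the real bound slightly; maxBitsInOneRun is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions: 4 KiB pages and a largest size class of 256 pages, as in
// the error message earlier in this thread.
constexpr uint64_t kPageSize = 4096;
constexpr uint64_t kLargestSizeClassPages = 256;

// Upper bound on bits that fit in one run, ignoring allocator headers.
constexpr uint64_t maxBitsInOneRun() {
  return kLargestSizeClassPages * kPageSize * 8;
}
```

Under these assumptions the bound is 8,388,608 bits (1 MiB), which is why a Velox BloomFilter with a large numBits can end up with less capacity than its Spark counterpart.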
@marin-ma
Collaborator Author

marin-ma commented Jul 3, 2023

@xiaoxmeng Sorry for not following up on this PR for a long time. Do you have any further comments? We still need this fix.

@FelixYBW

@xiaoxmeng @mbasmanova Is there anything left to follow up on for this PR? It's a bug fix for a Gluten case.

@mbasmanova
Contributor

@FelixYBW Would you please rebase?

@marin-ma closed this Aug 10, 2023
@marin-ma
Collaborator Author

Closing this one since there have been many changes on the main branch and we no longer need this PR.
