Fix AllocationPoll::newRun fails when requests exceed largestSizeClass by marin-ma · Pull Request #4713 · facebookincubator/velox

marin-ma · 2023-04-23T03:36:09Z

When AllocationPool::newRun is invoked directly, as in the code below:

velox/velox/common/memory/HashStringAllocator.cpp

Line 154 in 35ab8cc

pool_.newRun(needed);

If the value of the parameter needed exceeds largestSizeClass, the program fails with the following exception:

E0423 11:29:24.637616 43478 Exceptions.h:68] Line: /home/rong/github/oap-project/velox/velox/common/memory/MemoryAllocator.cpp:51, Function:allocationSize, Expression: minSizeClass <= sizeClassSizes_.back() (257 vs. 256) Requesting minimum size 257 larger than largest size class 256, Source: RUNTIME, ErrorCode: INVALID_STATE

This happens because numPages is passed as minSizeClass, which fails the check here:

velox/velox/common/memory/MemoryAllocator.cpp

Lines 51 to 56 in 35ab8cc

    
    VELOX_CHECK_LE(
        minSizeClass,
        sizeClassSizes_.back(),
        "Requesting minimum size {} larger than largest size class {}",
        minSizeClass,
        sizeClassSizes_.back());
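
The failure can be reproduced in isolation with a small model of this check. The sketch below is not Velox code: `sizeClassSizes` and `checkMinSizeClass` are hypothetical names, and the size-class list (powers of two up to 256 pages) is an assumption chosen to match the "(257 vs. 256)" in the log above.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the allocator's size classes, in pages.
// The largest class is 256 to match the error message in the log.
inline std::vector<uint64_t> sizeClassSizes() {
  return {1, 2, 4, 8, 16, 32, 64, 128, 256};
}

// Simplified model of the quoted VELOX_CHECK_LE: throws when the requested
// minimum size class is larger than the largest size class.
inline void checkMinSizeClass(uint64_t minSizeClass) {
  const std::vector<uint64_t> sizes = sizeClassSizes();
  assert(!sizes.empty());
  if (minSizeClass > sizes.back()) {
    throw std::runtime_error(
        "Requesting minimum size " + std::to_string(minSizeClass) +
        " larger than largest size class " + std::to_string(sizes.back()));
  }
}
```

With these assumptions, `checkMinSizeClass(256)` returns normally and `checkMinSizeClass(257)` throws, which mirrors the reported failure.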

netlify · 2023-04-23T03:36:14Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`9fc2d56`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/64b90f15b013450008b3795c

marin-ma · 2023-04-23T03:36:18Z

cc @zhejiangxiaomai

xiaoxmeng

@marin-ma thanks for the fix!

xiaoxmeng · 2023-04-23T05:25:36Z

velox/common/memory/AllocationPool.cpp

    }
    pool_->allocateNonContiguous(
-        std::max<int32_t>(kMinPages, numPages), allocation_, numPages);
+        std::max<int32_t>(kMinPages, numPages), allocation_);


pool_->allocateNonContiguous(
    std::max<int32_t>(kMinPages, numPages),
    allocation_,
    std::min(numPages, pool_->largestSizeClass()));
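
The effect of the suggested std::min clamp can be sketched on its own. This is an illustration under assumptions, not the Velox implementation: `NonContiguousRequest` and `makeRequest` are hypothetical, and the minimum page count and largest size class are passed in as plain numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical pair of arguments that would be handed to the allocator.
struct NonContiguousRequest {
  uint64_t numPages;      // total pages requested
  uint64_t minSizeClass;  // smallest acceptable run size, in pages
};

// Requests at least minPages, and caps minSizeClass at the largest size class
// so that the "minSizeClass <= largest size class" check cannot fail.
inline NonContiguousRequest makeRequest(
    uint64_t numPages, uint64_t minPages, uint64_t largestSizeClass) {
  assert(largestSizeClass > 0);
  return {std::max(minPages, numPages), std::min(numPages, largestSizeClass)};
}
```

For example, `makeRequest(257, 16, 256)` keeps the 257-page request but lowers minSizeClass to 256, while a small request such as `makeRequest(8, 16, 256)` is raised to 16 pages with minSizeClass 8.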

xiaoxmeng · 2023-04-23T05:30:07Z

velox/common/memory/tests/MemoryAllocatorTest.cpp

 }

+TEST_P(MemoryAllocatorTest, exceedLargestSizeClass) {
+  const size_t kExceedLargestSizeClass = instance_->largestSizeClass() + 1;


Can you test with different allocation sizes based on the size classes, and check the number of page runs in the resulting allocation?

For example, checking via AllocationPool::allocationAt(0):
    1 page => expect one page run?
    first size class + 1 => expect two page runs?
    second size class + 1 => expect two page runs?
    ...
    largestSizeClass() + 1 => expect two page runs?

You can structure the test as a while loop, with each iteration testing a different size and its expected number of page runs. Thanks!
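
One way to generate the request sizes for such a loop is sketched below. `allocationTestSizes` is a hypothetical helper, not part of Velox or of this PR; it only enumerates the sizes listed in the comment (1 page, then each size class plus one) and deliberately encodes no expected run counts.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the request sizes (in pages) to try: a single page, followed by
// "size class + 1" for every size class, ending with largest + 1.
inline std::vector<uint64_t> allocationTestSizes(
    const std::vector<uint64_t>& sizeClasses) {
  assert(!sizeClasses.empty());
  std::vector<uint64_t> sizes;
  sizes.push_back(1);
  for (const uint64_t sizeClass : sizeClasses) {
    sizes.push_back(sizeClass + 1);
  }
  return sizes;
}
```

A test could iterate over the returned sizes, call newRun for each one, and then inspect the resulting allocation.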

Thanks for your comments! I tried to add these checks, but I found that in most cases allocation->numRuns() == 1. Is that expected, or did I misunderstand the logic above? Please review the updated code.

xiaoxmeng · 2023-04-23T17:02:17Z

velox/common/memory/tests/MemoryAllocatorTest.cpp

+  const size_t kExceedLargestSizeClass = instance_->largestSizeClass() + 1;
+  AllocationPool pool(pool_.get());
+  const auto* allocation = pool.allocationAt(0);
+  pool.newRun(1 * AllocationTraits::kPageSize);


nit: AllocationTraits::pageBytes(numPages)

xiaoxmeng · 2023-04-23T17:05:30Z

velox/common/memory/tests/MemoryAllocatorTest.cpp

 }

+TEST_P(MemoryAllocatorTest, exceedLargestSizeClass) {
+  const size_t kExceedLargestSizeClass = instance_->largestSizeClass() + 1;


I think the number of page runs is most likely 1; my previous comment on the expected page runs was incorrect. However, you need to refresh the allocation pointer, because newRun() inserts a new run and updates allocation_: use AllocationPool::allocationAt(currentRunIndex()). With that, every newRun() call should produce an allocation with only one page run.

facebook-github-bot · 2023-04-23T17:08:05Z

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

marin-ma · 2023-04-24T06:14:22Z

@xiaoxmeng Can we merge this PR?

xiaoxmeng

@marin-ma thanks for the update!

facebook-github-bot · 2023-04-24T17:10:05Z

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

xiaoxmeng · 2023-04-24T18:16:39Z

@marin-ma thanks for the fix!

@marin-ma I chatted with @oerling about this. This behavior is intentional. Can you provide the call stack that runs into this issue? Thanks!

marin-ma · 2023-04-25T06:57:19Z

@xiaoxmeng Here's the stack trace. It's triggered when calling facebook::velox::HashStringAllocator::allocate(int, bool) with a requested allocation size equal to largestSizeClass (256 pages). In this call path, extra space is required for the HashStringAllocator::Header, so the final request size is 257 pages.

 0# facebook::velox::memory::MemoryAllocator::allocationSize(unsigned long, unsigned long) const in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 1# 0x00007FEF029A7E9C in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 2# facebook::velox::memory::MemoryPoolImpl::allocateNonContiguous(unsigned long, facebook::velox::memory::Allocation&, unsigned long) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 3# facebook::velox::AllocationPool::newRunImpl(unsigned long) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 4# facebook::velox::HashStringAllocator::newSlab(int) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 5# facebook::velox::HashStringAllocator::allocate(int, bool) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 6# std::vector<unsigned long, facebook::velox::StlAllocator<unsigned long> >::_M_default_append(unsigned long) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 7# facebook::velox::BloomFilter<facebook::velox::StlAllocator<unsigned long> >::merge(char const*) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 8# facebook::velox::functions::sparksql::makeMightContain(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<facebook::velox::exec::VectorFunctionArg, std::allocator<facebook::velox::exec::VectorFunctionArg> > const&) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
 9# std::_Function_handler<std::shared_ptr<facebook::velox::exec::VectorFunction> (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<facebook::velox::exec::VectorFunctionArg, std::allocator<facebook::velox::exec::VectorFunctionArg> > const&), std::shared_ptr<facebook::velox::exec::VectorFunction> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<facebook::velox::exec::VectorFunctionArg, std::allocator<facebook::velox::exec::VectorFunctionArg> > const&)>::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<facebook::velox::exec::VectorFunctionArg, std::allocator<facebook::velox::exec::VectorFunctionArg> > const&) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
10# facebook::velox::exec::getVectorFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::shared_ptr<facebook::velox::Type const>, std::allocator<std::shared_ptr<facebook::velox::Type const> > > const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > > const&) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
11# 0x00007FEF027AF7BD in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
12# 0x00007FEF027ADAB7 in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
13# facebook::velox::exec::compileExpressions(std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > > const&, facebook::velox::core::ExecCtx*, facebook::velox::exec::ExprSet*, bool) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
14# facebook::velox::exec::ExprSet::ExprSet(std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > > const&, facebook::velox::core::ExecCtx*, bool) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
15# 0x00007FEF00A74440 in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
16# facebook::velox::connector::hive::HiveDataSource::HiveDataSource(std::shared_ptr<facebook::velox::RowType const> const&, std::shared_ptr<facebook::velox::connector::ConnectorTableHandle> const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<facebook::velox::connector::ColumnHandle>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<facebook::velox::connector::ColumnHandle> > > > const&, facebook::velox::CachedFactory<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, facebook::velox::FileHandle, facebook::velox::FileHandleGenerator, facebook::velox::FileHandleSizer, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, facebook::velox::memory::MemoryPool*, facebook::velox::connector::ExpressionEvaluator*, facebook::velox::memory::MemoryAllocator*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, folly::Executor*) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
17# facebook::velox::connector::hive::HiveConnector::createDataSource(std::shared_ptr<facebook::velox::RowType const> const&, std::shared_ptr<facebook::velox::connector::ConnectorTableHandle> const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<facebook::velox::connector::ColumnHandle>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<facebook::velox::connector::ColumnHandle> > > > const&, facebook::velox::connector::ConnectorQueryCtx*) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
18# facebook::velox::exec::TableScan::getOutput() in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
19# facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
20# facebook::velox::exec::Driver::next(std::shared_ptr<facebook::velox::exec::BlockingState>&) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
21# facebook::velox::exec::Task::next(folly::SemiFuture<folly::Unit>*) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
22# gluten::WholeStageResultIterator::Next() in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
23# arrow::Result<std::shared_ptr<gluten::ColumnarBatch> > arrow::Iterator<std::shared_ptr<gluten::ColumnarBatch> >::Next<gluten::ResultIterator::Wrapper<gluten::WholeStageResultIteratorFirstStage> >(void*) in /home/sparkuser/github/oap-project/gluten/cpp/build/releases/libvelox.so
24# gluten::ResultIterator::GetNext() in ./generic_benchmark
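
The page arithmetic described in this comment can be sketched as follows. The numbers are assumptions for illustration only: a 4096-byte page and a 4-byte per-allocation overhead standing in for HashStringAllocator::Header. The thread itself only states the page counts (256 and 257).

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size in bytes (not stated in the thread).
constexpr uint64_t kPageSize = 4096;

// Hypothetical overhead standing in for HashStringAllocator::Header.
constexpr uint64_t kHeaderBytes = 4;

// Number of whole pages needed to hold `bytes`, rounding up.
inline uint64_t pagesNeeded(uint64_t bytes) {
  // Guard against overflow in the round-up addition below.
  assert(bytes <= UINT64_MAX - (kPageSize - 1));
  return (bytes + kPageSize - 1) / kPageSize;
}
```

Under these assumptions, a payload of exactly 256 pages fits in 256 pages, but adding even a few header bytes rounds the request up to 257 pages, one more than the largest size class.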

marin-ma · 2023-04-26T06:30:25Z

@xiaoxmeng Do you have further comments? Could you share the internal linter error so that I can fix it?

Summary: This function is used in Spark Runtime Filters: apache/spark#35789
https://docs.google.com/document/d/16IEuyLeQlubQkH8YuVuXWKo2-grVIoDJqQpHZrE7q04/edit#heading=h.4v65wq7vzy4q

BloomFilter implementation in Velox is different from Spark, hence, serialized BloomFilter is different.

Velox has memory limit for contiguous memory buffer, hence BloomFilter capacity is less than in Spark when numBits is large. See #4713 (comment)

Spark allows for changing the defaults while Velox does not.

See also #3342

Fixes #3694

Pull Request resolved: #4028

Reviewed By: Yuhta

Differential Revision: D46352733

Pulled By: mbasmanova

fbshipit-source-id: 1c8a0b489a736e627ba2c0869688fc0cf46279bb

marin-ma · 2023-07-03T01:53:51Z

@xiaoxmeng Sorry for not following up on this PR for a long time. Do you have any further comments? We still need this fix.

FelixYBW · 2023-07-20T06:52:21Z

@xiaoxmeng @mbasmanova Are there any open issues to follow up on in this PR? It's a bug fix for a Gluten case.

mbasmanova · 2023-07-20T10:35:07Z

@FelixYBW Would you please rebase?

marin-ma · 2023-08-10T03:25:18Z

Closing this one since there have been a lot of changes on the main branch and we no longer need this PR.

newRun should successfully allocate when exceeding largestSizeClass

5fc5436

facebook-github-bot added the CLA Signed label Apr 23, 2023

xiaoxmeng reviewed Apr 23, 2023

update UT

7f29cce

xiaoxmeng reviewed Apr 23, 2023

update UT

6202b40

xiaoxmeng approved these changes Apr 24, 2023

xiaoxmeng mentioned this pull request Apr 25, 2023

Add bloom_filter_agg Spark aggregate function #4028

Closed

Merge branch 'main' into fix-new-run

0c3cb1c

Merge branch 'main' into fix-new-run

9fc2d56

marin-ma closed this Aug 10, 2023
