-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-6396: [C++] Add overloads of Boolean kernels implementing Kleene logic #5771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. This looks neat, but a bunch of comments below.
|
By the way, what benchmark results do you get? |
|
Word-wise AND is slower than BitmapAnd (7.4GB/s vs 8.9GB/s) for zero offset, but much faster for non zero offset (4.5GB/s vs 23.5MB/s) |
Nifty. It's actually a bit surprising that BitmapAnd is faster for zero offset, perhaps the compiler is making better sense of a less abstract code? if any case, it may be worth using the new facility inside BitmapAnd. |
|
BitmapAnd has a byte-wise code path which kicks in for zero offset, |
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments, but it seems very useful to have the Bitmap interface. There are some places in parquet-cpp where we might prefer to use this instead of passing the bitmap components (buffer, length, offset) piecemeal
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more comments.
|
Locally I've tried this with 128 and 256 bit wide intrinsics but still can't recover the speed of the autovectorized BitmapAnd when offset == 0. This bears investigation in a follow up patch; word wise visit should be just as fast for sufficiently large words |
|
Test is failing for mingw32 only https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/28727695/job/gjlopg5jeu87n4b3#L2683 |
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't assume 64bit size for size_t on 32bit OS.
|
I can reproduce this on 32bit Linux: $ docker run -it --rm --cap-add=SYS_PTRACE --security-opt="seccomp=unconfined" --volume ${PWD}:/host i386/debian:buster
# apt install -y -V cmake ninja-build g++ gdb
# mkdir -p build
# cd build
# cmake /host/cpp -G Ninja -DCMAKE_INSTALL_PREFIX=/tmp/local -DCMAKE_BUILD_TYPE=debug -DARROW_EXTRA_ERROR_CONTEXT=on -DARROW_BUILD_TESTS=on
# ninja
# gdb --args debug/arrow-bit-util-test
GNU gdb (Debian 8.2.1-2+b1) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from build/debug/arrow-bit-util-test...done.
(gdb) r
Starting program: /root/build/debug/arrow-bit-util-test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
[New Thread 0xf51ffb40 (LWP 14094)]
Running main() from /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
[==========] Running 38 tests from 12 test cases.
[----------] Global test environment set-up.
[----------] 7 tests from BitUtilTests
[ RUN ] BitUtilTests.TestIsMultipleOf64
[ OK ] BitUtilTests.TestIsMultipleOf64 (0 ms)
[ RUN ] BitUtilTests.TestNextPower2
[ OK ] BitUtilTests.TestNextPower2 (0 ms)
[ RUN ] BitUtilTests.TestCountSetBits
[ OK ] BitUtilTests.TestCountSetBits (5 ms)
[ RUN ] BitUtilTests.TestSetBitsTo
[ OK ] BitUtilTests.TestSetBitsTo (0 ms)
[ RUN ] BitUtilTests.TestCopyBitmap
[ OK ] BitUtilTests.TestCopyBitmap (6 ms)
[ RUN ] BitUtilTests.TestCopyBitmapPreAllocated
[ OK ] BitUtilTests.TestCopyBitmapPreAllocated (56 ms)
[ RUN ] BitUtilTests.TestCopyAndInvertBitmapPreAllocated
[ OK ] BitUtilTests.TestCopyAndInvertBitmapPreAllocated (55 ms)
[----------] 7 tests from BitUtilTests (122 ms total)
[----------] 2 tests from BitmapReader
[ RUN ] BitmapReader.NormalOperation
[ OK ] BitmapReader.NormalOperation (0 ms)
[ RUN ] BitmapReader.DoesNotReadOutOfBounds
[ OK ] BitmapReader.DoesNotReadOutOfBounds (0 ms)
[----------] 2 tests from BitmapReader (0 ms total)
[----------] 2 tests from BitmapWriter
[ RUN ] BitmapWriter.NormalOperation
[ OK ] BitmapWriter.NormalOperation (0 ms)
[ RUN ] BitmapWriter.DoesNotWriteOutOfBounds
[ OK ] BitmapWriter.DoesNotWriteOutOfBounds (0 ms)
[----------] 2 tests from BitmapWriter (0 ms total)
[----------] 1 test from FirstTimeBitmapWriter
[ RUN ] FirstTimeBitmapWriter.NormalOperation
[ OK ] FirstTimeBitmapWriter.NormalOperation (0 ms)
[----------] 1 test from FirstTimeBitmapWriter (0 ms total)
[----------] 1 test from TestGenerateBits/0, where TypeParam = arrow::GenerateBitsFunctor
[ RUN ] TestGenerateBits/0.NormalOperation
[ OK ] TestGenerateBits/0.NormalOperation (3 ms)
[----------] 1 test from TestGenerateBits/0 (3 ms total)
[----------] 1 test from TestGenerateBits/1, where TypeParam = arrow::GenerateBitsUnrolledFunctor
[ RUN ] TestGenerateBits/1.NormalOperation
[ OK ] TestGenerateBits/1.NormalOperation (3 ms)
[----------] 1 test from TestGenerateBits/1 (3 ms total)
[----------] 1 test from TestVisitBits/0, where TypeParam = arrow::VisitBitsFunctor
[ RUN ] TestVisitBits/0.NormalOperation
[ OK ] TestVisitBits/0.NormalOperation (2 ms)
[----------] 1 test from TestVisitBits/0 (2 ms total)
[----------] 1 test from TestVisitBits/1, where TypeParam = arrow::VisitBitsUnrolledFunctor
[ RUN ] TestVisitBits/1.NormalOperation
[ OK ] TestVisitBits/1.NormalOperation (1 ms)
[----------] 1 test from TestVisitBits/1 (1 ms total)
[----------] 3 tests from BitmapOp
[ RUN ] BitmapOp.And
[ OK ] BitmapOp.And (6 ms)
[ RUN ] BitmapOp.Or
[ OK ] BitmapOp.Or (6 ms)
[ RUN ] BitmapOp.XorAligned
[ OK ] BitmapOp.XorAligned (6 ms)
[----------] 3 tests from BitmapOp (18 ms total)
[----------] 14 tests from BitUtil
[ RUN ] BitUtil.CeilDiv
[ OK ] BitUtil.CeilDiv (0 ms)
[ RUN ] BitUtil.RoundUp
[ OK ] BitUtil.RoundUp (0 ms)
[ RUN ] BitUtil.RoundDown
[ OK ] BitUtil.RoundDown (1 ms)
[ RUN ] BitUtil.CoveringBytes
[ OK ] BitUtil.CoveringBytes (0 ms)
[ RUN ] BitUtil.TrailingBits
[ OK ] BitUtil.TrailingBits (0 ms)
[ RUN ] BitUtil.ByteSwap
[ OK ] BitUtil.ByteSwap (0 ms)
[ RUN ] BitUtil.Log2
[ OK ] BitUtil.Log2 (0 ms)
[ RUN ] BitUtil.NumRequiredBits
[ OK ] BitUtil.NumRequiredBits (0 ms)
[ RUN ] BitUtil.CountLeadingZeros
[ OK ] BitUtil.CountLeadingZeros (0 ms)
[ RUN ] BitUtil.CountTrailingZeros
[ OK ] BitUtil.CountTrailingZeros (0 ms)
[ RUN ] BitUtil.RoundUpToPowerOf2
[ OK ] BitUtil.RoundUpToPowerOf2 (0 ms)
[ RUN ] BitUtil.RoundTripLittleEndianTest
[ OK ] BitUtil.RoundTripLittleEndianTest (0 ms)
[ RUN ] BitUtil.RoundTripBigEndianTest
[ OK ] BitUtil.RoundTripBigEndianTest (0 ms)
[ RUN ] BitUtil.BitsetStack
[ OK ] BitUtil.BitsetStack (0 ms)
[----------] 14 tests from BitUtil (1 ms total)
[----------] 1 test from BitStreamUtil
[ RUN ] BitStreamUtil.ZigZag
[ OK ] BitStreamUtil.ZigZag (0 ms)
[----------] 1 test from BitStreamUtil (0 ms total)
[----------] 4 tests from Bitmap
[ RUN ] Bitmap.ShiftingWordsOptimization
[ OK ] Bitmap.ShiftingWordsOptimization (2 ms)
[ RUN ] Bitmap.VisitWords
arrow-bit-util-test: /host/cpp/src/arrow/util/bit_util.h:931: arrow::internal::Bitmap::VisitWords(const arrow::internal::Bitmap (&)[N], Visitor&&) [with unsigned int N = 1; Visitor = arrow::internal::Copy(const arrow::internal::Bitmap&, std::shared_ptr<arrow::Buffer>)::<lambda(std::array<long long unsigned int, 1>)>; Word = long long unsigned int; int64_t = long long int]::<lambda(int64_t)>: Assertion `offsets[i] >= 0 && offsets[i] < kBitWidth' failed.
Thread 1 "arrow-bit-util-" received signal SIGABRT, Aborted.
0xf7fd3939 in __kernel_vsyscall ()
(gdb) bt full
#0 0xf7fd3939 in __kernel_vsyscall ()
No symbol table info available.
#1 0xf55ed382 in __libc_signal_restore_set (set=0xffffd3cc)
at ../sysdeps/unix/sysv/linux/internal-signals.h:84
resultvar = <optimized out>
#2 __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
set = {__val = {0, 0, 1, 4118532816, 0, 1448984284, 8, 4294956056,
1448816952, 1448984284, 0, 4160737280, 0, 1448984284,
4294956127, 64, 4294956480, 241, 1, 0, 1449002064, 14, 239, 0,
4116464556, 4118532816, 1448816972, 288, 1384223744,
4118383840, 0, 4294956288}}
pid = <optimized out>
tid = <optimized out>
ret = 0
#3 0xf55d72b6 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x4d5f434c,
sa_sigaction = 0x4d5f434c}, sa_mask = {__val = {1095979845,
793986375, 1384223744, 7302446, 4118384984, 4116603136,
4118380544, 4118383840, 4294956568, 4294956536, 4116860659,
4118383840, 4118016000, 4294956568, 4118380544, 465, 0, 0,
1449001600, 28, 463, 4294956544, 0, 3, 4096, 4117446388,
1384223744, 1448867232, 0, 4118383840, 4118380544,
4116365312}}, sa_flags = 0,
sa_restorer = 0xf5798ce0 <_IO_2_1_stderr_>}
sigs = {__val = {32, 0 <repeats 31 times>}}
#4 0xf55d71c1 in __assert_fail_base (
fmt=0xf5740250 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=0x565be3d4 "offsets[i] >= 0 && offsets[i] < kBitWidth",
file=0x565be3b0 "/host/cpp/src/arrow/util/bit_util.h", line=931,
function=0x565bf5a0 <arrow::internal::Bitmap::VisitWords<1u, arrow::internal::Copy(arrow::internal::Bitmap const&, std::shared_ptr<arrow::Buffer>)::{lambda(std::array<unsigned long long, 1u>)#1}, unsigned long long>(arrow::inte--Type <RET> for more, q to quit, c to continue without paging--
rnal::Bitmap const (&) [1u], arrow::internal::Copy(arrow::internal::Bitmap const&, std::shared_ptr<arrow::Buffer>)::{lambda(std::array<unsigned long long, 1u>)#1}&&)::{lambda(long long)#1}::operator()(long long) const::__PRETTY_FUNCTION__> "arrow::internal::Bitmap::VisitWords(const arrow::internal::Bitmap (&)[N], Visitor&&) [with unsigned int N = 1; Visitor = arrow::internal::Copy(const arrow::internal::Bitmap&, std::shared_ptr<arrow::Bu"...) at assert.c:92
str = 0x565e0280 ""
total = 4096
#5 0xf55e5339 in __GI___assert_fail (
assertion=0x565be3d4 "offsets[i] >= 0 && offsets[i] < kBitWidth",
file=0x565be3b0 "/host/cpp/src/arrow/util/bit_util.h", line=931,
function=0x565bf5a0 <arrow::internal::Bitmap::VisitWords<1u, arrow::internal::Copy(arrow::internal::Bitmap const&, std::shared_ptr<arrow::Buffer>)::{lambda(std::array<unsigned long long, 1u>)#1}, unsigned long long>(arrow::internal::Bitmap const (&) [1u], arrow::internal::Copy(arrow::internal::Bitmap const&, std::shared_ptr<arrow::Buffer>)::{lambda(std::array<unsigned long long, 1u>)#1}&&)::{lambda(long long)#1}::operator()(long long) const::__PRETTY_FUNCTION__> "arrow::internal::Bitmap::VisitWords(const arrow::internal::Bitmap (&)[N], Visitor&&) [with unsigned int N = 1; Visitor = arrow::internal::Copy(const arrow::internal::Bitmap&, std::shared_ptr<arrow::Bu"...)
at assert.c:101
No locals.
#6 0x5659d820 in arrow::internal::Bitmap::<lambda(int64_t)>::operator()(int64_t) const (this=0xffffd718, consumed_bits=64)
at /host/cpp/src/arrow/util/bit_util.h:931
i = 0
__PRETTY_FUNCTION__ = "arrow::internal::Bitmap::VisitWords(const arrow::internal::Bitmap (&)[N], Visitor&&) [with unsigned int N = 1; Visitor = arrow::internal::Copy(const arrow::internal::Bitmap&, std::shared_ptr<arrow::Bu"...
kBitWidth = <optimized out>
bit_length = @0xffffd730: 64
bitmaps = @0xffffd740: {
--Type <RET> for more, q to quit, c to continue without paging--
{<arrow::util::ToStringOstreamable<arrow::internal::Bitmap>> = {<No data fields>}, <arrow::util::EqualityComparable<arrow::internal::Bitmap>> = {<No data fields>}, buffer_ =
std::shared_ptr<arrow::Buffer> (use count 4, weak count 0) = {
get() = 0x565dff9c}, offset_ = 64, length_ = 0}}
offsets = @0xffffd738: {4294967296}
words = @0xffffd728: {{data_ = 0xf5b87000, size_ = 1}}
#7 0x5659db49 in arrow::internal::Bitmap::VisitWords<1, arrow::internal::Copy(const arrow::internal::Bitmap&, std::shared_ptr<arrow::Buffer>)::<lambda(std::array<long long unsigned int, 1>)> >(const arrow::internal::Bitmap (&)[1], arrow::internal::<lambda(std::array<long long unsigned int, 1>)> &&) (
bitmaps_arg=..., visitor=...) at /host/cpp/src/arrow/util/bit_util.h:946
leading_bits = 64
__PRETTY_FUNCTION__ = "static int64_t arrow::internal::Bitmap::VisitWords(const arrow::internal::Bitmap (&)[N], Visitor&&) [with unsigned int N = 1; Visitor = arrow::internal::Copy(const arrow::internal::Bitmap&, std::share"...
kBitWidth = 64
bitmaps = {
{<arrow::util::ToStringOstreamable<arrow::internal::Bitmap>> = {<No data fields>}, <arrow::util::EqualityComparable<arrow::internal::Bitmap>> = {<No data fields>},
buffer_ = std::shared_ptr<arrow::Buffer> (use count 4, weak count 0) = {get() = 0x565dff9c}, offset_ = 64, length_ = 0}}
offsets = {4294967296}
bit_length = 64
words = {{data_ = 0xf5b87000, size_ = 1}}
consume = {__bit_length = @0xffffd730, __bitmaps = @0xffffd740,
__offsets = @0xffffd738, __words = @0xffffd728}
visited_words = {_M_elems = {12469403627024359680}}
max_offset = 6222340146476203728
min_offset = -42550241011700
whole_word_count = 6223340113646960348
--Type <RET> for more, q to quit, c to continue without paging--
#8 0x5659b539 in arrow::internal::Copy (bitmap=...,
storage=std::shared_ptr<arrow::Buffer> (use count 2, weak count 0) = {...}) at /host/cpp/src/arrow/util/bit_util_test.cc:1165
i = 1
min_offset = 0
#9 0x5659bad8 in arrow::internal::Bitmap_VisitWords_Test::TestBody (
this=0x565dfe40) at /host/cpp/src/arrow/util/bit_util_test.cc:1187
actual = {<arrow::util::ToStringOstreamable<arrow::internal::Bitmap>> = {<No data fields>}, <arrow::util::EqualityComparable<arrow::internal::Bitmap>> = {<No data fields>},
buffer_ = std::shared_ptr<arrow::Buffer> (use count 2, weak count 0) = {get() = 0x565dbe9c}, offset_ = 0, length_ = 63}
num_bits = 64
__for_range = @0xffffd8d8: {_M_array = 0xffffd878, _M_len = 8}
__for_begin = 0xffffd890
__for_end = 0xffffd8b8
offset = 0
__for_range = @0xffffd8e8: {_M_array = 0x565befc0 <._177>,
_M_len = 5}
__for_begin = 0x565befc0 <._177>
__for_end = 0x565befd4
nbytes = 1024
buffer = std::shared_ptr<arrow::Buffer> (use count 4, weak count 0) = {get() = 0x565dff9c}
actual_buffer = std::shared_ptr<arrow::Buffer> (use count 2, weak count 0) = {get() = 0x565dbe9c}
kBitWidth = 64
#10 0xf5b42e2a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x565dfe40,
method=&virtual testing::Test::TestBody(),
location=0xf5b53d1f "the test body")
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2443
--Type <RET> for more, q to quit, c to continue without paging--
No locals.
#11 0xf5b3cb6d in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x565dfe40,
method=&virtual testing::Test::TestBody(),
location=0xf5b53d1f "the test body")
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2479
No locals.
#12 0xf5b18c37 in testing::Test::Run (this=0x565dfe40)
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2517
impl = 0x565dc610
#13 0xf5b195aa in testing::TestInfo::Run (this=0x565df640)
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2693
impl = 0x565dc610
repeater = 0x565dc4f0
start = 1573336395475
test = 0x565dfe40
#14 0xf5b19c77 in testing::TestCase::Run (this=0x565df570)
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2811
i = 1
impl = 0x565dc610
repeater = 0x565dc4f0
start = 1573336395473
#15 0xf5b25bf5 in testing::internal::UnitTestImpl::RunAllTests (
this=0x565dc610)
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5177
test_index = 11
start = 1573336395323
i = 0
--Type <RET> for more, q to quit, c to continue without paging--
gtest_is_initialized_before_run_all_tests = true
in_subprocess_for_death_test = false
should_shard = false
has_tests_to_run = true
failed = false
repeater = 0x565dc4f0
repeat = 1
forever = false
#16 0xf5b43e79 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x565dc610,
method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0xf5b25940 <testing::internal::UnitTestImpl::RunAllTests()>,
location=0xf5b54624 "auxiliary test code (environments or event listeners)")
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2443
No locals.
#17 0xf5b3d9ac in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x565dc610,
method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0xf5b25940 <testing::internal::UnitTestImpl::RunAllTests()>,
location=0xf5b54624 "auxiliary test code (environments or event listeners)")
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2479
No locals.
#18 0xf5b24581 in testing::UnitTest::Run (
this=0xf5b78480 <testing::UnitTest::GetInstance()::instance>)
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:4786
in_death_test_child_process = false
--Type <RET> for more, q to quit, c to continue without paging--
premature_exit_file = {premature_exit_filepath_ = ""}
#19 0xf5b7a2a1 in RUN_ALL_TESTS ()
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/include/gtest/gtest.h:2341
No locals.
#20 0xf5b7a1f8 in main (argc=1, argv=0xffffde04)
at /root/build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc:36
No locals.
#21 0xf55d8b41 in __libc_start_main (main=0xf5b7a1a9 <main(int, char**)>,
argc=1, argv=0xffffde04, init=0x565bb130 <__libc_csu_init>,
fini=0x565bb190 <__libc_csu_fini>, rtld_fini=0xf7fe4520 <_dl_fini>,
stack_end=0xffffddfc) at ../csu/libc-start.c:308
self = <optimized out>
result = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -176586752,
-176586752, 0, 1710948255, 559290762},
mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x1,
0x565866e0 <_start>}, data = {prev = 0x0, cleanup = 0x0,
canceltype = 1}}}
not_first_call = <optimized out>
#22 0x56586711 in _start ()
No symbol table info available. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, will merge when green
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, will merge when green
The utility classes I needed to write for this impinged on ARROW-4722