-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix](arrow-flight-sql) Fix Doris NULL column conversion to arrow batch #43929
Conversation
BE UT will be refactored soon. |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
TeamCity be ut coverage result: |
30ce839
to
7e4941e
Compare
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
TeamCity be ut coverage result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGBTM
…ch (#43929) ### What problem does this PR solve? Problem Summary: The representation of NULL columns in Doris is special, which is `DataTypeNull<DataTypeNumber::Uint8>`. `Uint8` uses `arrow::BooleanBuilder` when serializing into arrow batch, which does not match the expected `arrow::NullBuilder`. Fix: ``` *** Query id: fd32741526804c1e-bc016473fd8f3aa3 *** *** is nereids: 1 *** *** tablet id: 0 *** *** Aborted at 1731327262 (unix time) try "date -d @1731327262" if you are using GNU date *** *** Current BE git commitID: 653e315 *** *** SIGSEGV address not mapped to object (@0x100000024) received by PID 1442863 (TID 1443456 OR 0x7f8b8cdea700) from PID 36; stack trace: *** 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/common/signal_handler.h:421 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so 2# JVM_handle_linux_signal in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so 3# 0x00007F8CA1F38B50 in /lib64/libc.so.6 4# 0x000055FC45E5B2D3 in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be 5# arrow::BooleanBuilder::AppendValues(unsigned char const*, long, unsigned char const*) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be 6# doris::vectorized::DataTypeNumberSerDe<unsigned char>::write_column_to_arrow(doris::vectorized::IColumn const&, doris::vectorized::PODArray<unsigned char, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 15ul, 16ul> const*, arrow::ArrayBuilder*, int, int, cctz::time_zone const&) const at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/data_types/serde/data_type_number_serde.cpp:86 7# doris::FromBlockConverter::convert(std::shared_ptr<arrow::RecordBatch>*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/util/arrow/block_convertor.cpp:390 8# doris::convert_to_arrow_batch(doris::vectorized::Block const&, std::shared_ptr<arrow::Schema> const&, arrow::MemoryPool*, std::shared_ptr<arrow::RecordBatch>*, cctz::time_zone const&) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be 9# doris::vectorized::VArrowFlightResultWriter::write(doris::vectorized::Block&) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/varrow_flight_result_writer.cpp:76 10# doris::vectorized::VResultSink::send(doris::RuntimeState*, doris::vectorized::Block*, bool) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/vresult_sink.cpp:149 11# doris::PlanFragmentExecutor::open_vectorized_internal() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:341 12# doris::PlanFragmentExecutor::open() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:273 ```
…ch (#43929) ### What problem does this PR solve? Problem Summary: The representation of NULL columns in Doris is special, which is `DataTypeNull<DataTypeNumber::Uint8>`. `Uint8` uses `arrow::BooleanBuilder` when serializing into arrow batch, which does not match the expected `arrow::NullBuilder`. Fix: ``` *** Query id: fd32741526804c1e-bc016473fd8f3aa3 *** *** is nereids: 1 *** *** tablet id: 0 *** *** Aborted at 1731327262 (unix time) try "date -d @1731327262" if you are using GNU date *** *** Current BE git commitID: 653e315 *** *** SIGSEGV address not mapped to object (@0x100000024) received by PID 1442863 (TID 1443456 OR 0x7f8b8cdea700) from PID 36; stack trace: *** 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/common/signal_handler.h:421 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so 2# JVM_handle_linux_signal in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so 3# 0x00007F8CA1F38B50 in /lib64/libc.so.6 4# 0x000055FC45E5B2D3 in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be 5# arrow::BooleanBuilder::AppendValues(unsigned char const*, long, unsigned char const*) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be 6# doris::vectorized::DataTypeNumberSerDe<unsigned char>::write_column_to_arrow(doris::vectorized::IColumn const&, doris::vectorized::PODArray<unsigned char, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 15ul, 16ul> const*, arrow::ArrayBuilder*, int, int, cctz::time_zone const&) const at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/data_types/serde/data_type_number_serde.cpp:86 7# doris::FromBlockConverter::convert(std::shared_ptr<arrow::RecordBatch>*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/util/arrow/block_convertor.cpp:390 8# doris::convert_to_arrow_batch(doris::vectorized::Block const&, std::shared_ptr<arrow::Schema> const&, arrow::MemoryPool*, std::shared_ptr<arrow::RecordBatch>*, cctz::time_zone const&) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be 9# doris::vectorized::VArrowFlightResultWriter::write(doris::vectorized::Block&) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/varrow_flight_result_writer.cpp:76 10# doris::vectorized::VResultSink::send(doris::RuntimeState*, doris::vectorized::Block*, bool) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/vresult_sink.cpp:149 11# doris::PlanFragmentExecutor::open_vectorized_internal() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:341 12# doris::PlanFragmentExecutor::open() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:273 ```
…ch (apache#43929) ### What problem does this PR solve? Problem Summary: The representation of NULL columns in Doris is special, which is `DataTypeNull<DataTypeNumber::Uint8>`. `Uint8` uses `arrow::BooleanBuilder` when serializing into arrow batch, which does not match the expected `arrow::NullBuilder`. Fix: ``` *** Query id: fd32741526804c1e-bc016473fd8f3aa3 *** *** is nereids: 1 *** *** tablet id: 0 *** *** Aborted at 1731327262 (unix time) try "date -d @1731327262" if you are using GNU date *** *** Current BE git commitID: 653e315 *** *** SIGSEGV address not mapped to object (@0x100000024) received by PID 1442863 (TID 1443456 OR 0x7f8b8cdea700) from PID 36; stack trace: *** 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/common/signal_handler.h:421 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so 2# JVM_handle_linux_signal in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so 3# 0x00007F8CA1F38B50 in /lib64/libc.so.6 4# 0x000055FC45E5B2D3 in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be 5# arrow::BooleanBuilder::AppendValues(unsigned char const*, long, unsigned char const*) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be 6# doris::vectorized::DataTypeNumberSerDe<unsigned char>::write_column_to_arrow(doris::vectorized::IColumn const&, doris::vectorized::PODArray<unsigned char, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 15ul, 16ul> const*, arrow::ArrayBuilder*, int, int, cctz::time_zone const&) const at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/data_types/serde/data_type_number_serde.cpp:86 7# doris::FromBlockConverter::convert(std::shared_ptr<arrow::RecordBatch>*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/util/arrow/block_convertor.cpp:390 8# doris::convert_to_arrow_batch(doris::vectorized::Block const&, std::shared_ptr<arrow::Schema> const&, arrow::MemoryPool*, std::shared_ptr<arrow::RecordBatch>*, cctz::time_zone const&) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be 9# doris::vectorized::VArrowFlightResultWriter::write(doris::vectorized::Block&) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/varrow_flight_result_writer.cpp:76 10# doris::vectorized::VResultSink::send(doris::RuntimeState*, doris::vectorized::Block*, bool) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/vresult_sink.cpp:149 11# doris::PlanFragmentExecutor::open_vectorized_internal() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:341 12# doris::PlanFragmentExecutor::open() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:273 ```
### What problem does this PR solve? Problem Summary: add regression test for this pr: #43929 TODO: Support to export data of doris NULL type to orc file format.
### What problem does this PR solve? Problem Summary: add regression test for this pr: #43929 TODO: Support to export data of doris NULL type to orc file format.
### What problem does this PR solve? Problem Summary: add regression test for this pr: #43929 TODO: Support to export data of doris NULL type to orc file format.
…to arrow batch #43929 (#44231) Cherry-picked from #43929 Co-authored-by: Xinyi Zou <[email protected]>
* [pick](branch-2.1) pick apache#38215 (apache#43386) pick apache#38215 --------- Co-authored-by: Zou Xinyi <[email protected]> * pick apache#43929 * pick apache#43960
What problem does this PR solve?
Problem Summary:
The representation of NULL columns in Doris is special, which is
DataTypeNull<DataTypeNumber::Uint8>
.Uint8
usesarrow::BooleanBuilder
when serializing into arrow batch, which does not match the expectedarrow::NullBuilder
.Fix:
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)