[Rust] [Parquet] LargeListArray support and why I think the tests are still failing #6

carols10cents · 2020-10-16T21:18:31Z

Hi, I decided to open the PR over here rather than on the apache/arrow repo since I'm making changes to your branch... ~~I did rebase this on the upstream rust-parquet-arrow-writer branch though, so there's some noise in this PR's diff :-/~~

I added support for LargeListArrays in the ListArrayReader, and I added a test in array_reader and it's passing!

However, the roundtrip tests are still failing, and my current suspicion is that something isn't quite right with the test setup, because the null count isn't what I expected it to be. I've documented with comments and assertions (some of which pass and some of which fail) my current reasoning-- I have to go right now but I'll keep investigating later. If you notice something that's obvious to you about the test setup, please let me know!

…r OffsetSize

…-paste

carols10cents · 2020-10-16T21:22:38Z

Just saw that you rebased your branch so I rebased on your branch :)

nevi-me · 2020-10-17T17:06:37Z

Thanks Carol 🙏🏾

I'm merging this, then I'll look at the CI failures

This is a port of apache#6770 to the parquet-writer branch. We'll have more of a chance to test this reader, and ensure that we can roundtrip on list types. [Rust] [Parquet] LargeListArray support and why I think the tests are still failing (#6) * Support reading LargeListArrays by making ListArrayReader generic over OffsetSize * Update comment to match the actual values in this test; probably copy-paste * Document why I think the test setup isn't quite right disable list writer tests They're failing because of incorrect def/rep. Will be addressed separately

From a deadlocked run... ``` #0 0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0 #2 0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so #3 0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so #4 0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so #5 0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so #6 0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so #7 0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so #8 0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so #9 0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so apache#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so ``` The callback `ListObjectsV2Handler` is being called recursively and the mutex is non-reentrant thus deadlock. To fix it I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the #/status of in-flight requests. Closes apache#9842 from westonpace/bugfix/arrow-12040 Lead-authored-by: Weston Pace <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>

github-actions bot added lang-rust datafusion labels Oct 16, 2020

carols10cents mentioned this pull request Oct 16, 2020

ARROW-7842: [Rust] [Parquet] Arrow list reader apache/arrow#8449

Closed

carols10cents added 3 commits October 16, 2020 17:22

Support reading LargeListArrays by making ListArrayReader generic ove…

c9f9abf

…r OffsetSize

Update comment to match the actual values in this test; probably copy…

b7336ff

…-paste

Document why I think the test setup isn't quite right

37028e3

carols10cents force-pushed the ARROW-7842-cherry branch from ca9df67 to 37028e3 Compare October 16, 2020 21:22

carols10cents changed the title ~~LargeListArray support and why I think the tests are still failing~~ [Rust] [Parquet] LargeListArray support and why I think the tests are still failing Oct 16, 2020

nevi-me merged this pull request into nevi-me:ARROW-7842-cherry Oct 17, 2020

carols10cents deleted the ARROW-7842-cherry branch October 19, 2020 13:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Rust] [Parquet] LargeListArray support and why I think the tests are still failing #6

[Rust] [Parquet] LargeListArray support and why I think the tests are still failing #6

Uh oh!

carols10cents commented Oct 16, 2020 •

edited

Loading

Uh oh!

carols10cents commented Oct 16, 2020

Uh oh!

nevi-me commented Oct 17, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[Rust] [Parquet] LargeListArray support and why I think the tests are still failing #6

[Rust] [Parquet] LargeListArray support and why I think the tests are still failing #6

Uh oh!

Conversation

carols10cents commented Oct 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

carols10cents commented Oct 16, 2020

Uh oh!

nevi-me commented Oct 17, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

carols10cents commented Oct 16, 2020 •

edited

Loading