Skip to content

Conversation

@carols10cents
Copy link

@carols10cents carols10cents commented Oct 16, 2020

Hi, I decided to open the PR over here rather than on the apache/arrow repo since I'm making changes to your branch... I did rebase this on the upstream rust-parquet-arrow-writer branch though, so there's some noise in this PR's diff :-/

I added support for LargeListArrays in the ListArrayReader, and I added a test in array_reader and it's passing!

However, the roundtrip tests are still failing, and my current suspicion is that something isn't quite right with the test setup, because the null count isn't what I expected it to be. I've documented with comments and assertions (some of which pass and some of which fail) my current reasoning-- I have to go right now but I'll keep investigating later. If you notice something that's obvious to you about the test setup, please let me know!

@carols10cents
Copy link
Author

Just saw that you rebased your branch so I rebased on your branch :)

@carols10cents carols10cents changed the title LargeListArray support and why I think the tests are still failing [Rust] [Parquet] LargeListArray support and why I think the tests are still failing Oct 16, 2020
@nevi-me
Copy link
Owner

nevi-me commented Oct 17, 2020

Thanks Carol 🙏🏾

I'm merging this, then I'll look at the CI failures

@nevi-me nevi-me merged this pull request into nevi-me:ARROW-7842-cherry Oct 17, 2020
nevi-me added a commit that referenced this pull request Oct 17, 2020
This is a port of apache#6770 to the parquet-writer branch.
We'll have more of a chance to test this reader,
and ensure that we can roundtrip on list types.

[Rust] [Parquet] LargeListArray support and why I think the tests are still failing (#6)

* Support reading LargeListArrays by making ListArrayReader generic over OffsetSize

* Update comment to match the actual values in this test; probably copy-paste

* Document why I think the test setup isn't quite right

disable list writer tests

They're failing because of incorrect def/rep.
Will be addressed separately
@carols10cents carols10cents deleted the ARROW-7842-cherry branch October 19, 2020 13:00
nevi-me pushed a commit that referenced this pull request Jul 11, 2021
From a deadlocked run...

```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is being called recursively and the mutex is non-reentrant thus deadlock.

To fix it I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the #/status of in-flight requests.

Closes apache#9842 from westonpace/bugfix/arrow-12040

Lead-authored-by: Weston Pace <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants