Skip to content

Conversation

@carols10cents
Copy link
Member

No description provided.

@carols10cents carols10cents force-pushed the rust-flight-integration-tests branch 2 times, most recently from c98a22e to a8da854 Compare December 23, 2020 17:40
seddonm1 and others added 11 commits December 28, 2020 10:05
This PR adds the ability to load and parse the expected query answers included in the https://github.com/databricks/tpch-dbgen/tree/master/answers repository - which users have to clone anyway to generate the TPC-H data.

Currently DataFusion does not support Decimal types which all the numeric values in TPC-H are so there are the expected precision errors in the current results. These tests are still useful as they show some interesting results already such as non-deterministic query 5 results.

@andygrove

Closes apache#9015 from seddonm1/test-tpch

Authored-by: Mike Seddon <[email protected]>
Signed-off-by: Andy Grove <[email protected]>
This increases the default batch size 8x from `4096` to `32768` as it improves performance of quite some operations.
I just increased the size until performance didn't increase on my machine. Note that CSV reading also is faster on bigger batches on the bigger data sources.

This PR
```
Loading table 'part' into memory
Loaded table 'part' into memory in 125 ms
Loading table 'supplier' into memory
Loaded table 'supplier' into memory in 10 ms
Loading table 'partsupp' into memory
Loaded table 'partsupp' into memory in 381 ms
Loading table 'customer' into memory
Loaded table 'customer' into memory in 126 ms
Loading table 'orders' into memory
Loaded table 'orders' into memory in 961 ms
Loading table 'lineitem' into memory
Loaded table 'lineitem' into memory in 6382 ms
Loading table 'nation' into memory
Loaded table 'nation' into memory in 2 ms
Loading table 'region' into memory
Loaded table 'region' into memory in 2 ms
Query 12 iteration 0 took 220.2 ms
Query 12 iteration 1 took 223.2 ms
Query 12 iteration 2 took 222.4 ms
Query 12 iteration 3 took 222.2 ms
Query 12 iteration 4 took 221.8 ms
Query 12 iteration 5 took 222.0 ms
Query 12 iteration 6 took 223.1 ms
Query 12 iteration 7 took 223.7 ms
Query 12 iteration 8 took 222.5 ms
Query 12 iteration 9 took 222.9 ms
Query 12 avg time: 222.40 ms
```

Master
```
Loading table 'part' into memory
Loaded table 'part' into memory in 116 ms
Loading table 'supplier' into memory
Loaded table 'supplier' into memory in 7 ms
Loading table 'partsupp' into memory
Loaded table 'partsupp' into memory in 386 ms
Loading table 'customer' into memory
Loaded table 'customer' into memory in 115 ms
Loading table 'orders' into memory
Loaded table 'orders' into memory in 1048 ms
Loading table 'lineitem' into memory
Loaded table 'lineitem' into memory in 7673 ms
Loading table 'nation' into memory
Loaded table 'nation' into memory in 0 ms
Loading table 'region' into memory
Loaded table 'region' into memory in 0 ms
Query 12 iteration 0 took 596.1 ms
Query 12 iteration 1 took 602.0 ms
Query 12 iteration 2 took 608.1 ms
Query 12 iteration 3 took 607.9 ms
Query 12 iteration 4 took 613.5 ms
Query 12 iteration 5 took 615.3 ms
Query 12 iteration 6 took 611.6 ms
Query 12 iteration 7 took 609.8 ms
Query 12 iteration 8 took 615.7 ms
Query 12 iteration 9 took 616.9 ms
Query 12 avg time: 609.68 ms
```

Query 1 also improves a bit (but smaller improvement)

PR.
```
Query 1 iteration 0 took 653.0 ms
Query 1 iteration 1 took 653.4 ms
Query 1 iteration 2 took 652.3 ms
Query 1 iteration 3 took 658.9 ms
Query 1 iteration 4 took 655.1 ms
Query 1 iteration 5 took 662.0 ms
Query 1 iteration 6 took 659.7 ms
Query 1 iteration 7 took 662.7 ms
Query 1 iteration 8 took 669.0 ms
Query 1 iteration 9 took 665.7 ms
Query 1 avg time: 659.19 ms
```

Master:

```
Query 1 iteration 0 took 708.8 ms
Query 1 iteration 1 took 714.5 ms
Query 1 iteration 2 took 700.4 ms
Query 1 iteration 3 took 713.7 ms
Query 1 iteration 4 took 707.5 ms
Query 1 iteration 5 took 727.8 ms
Query 1 iteration 6 took 727.9 ms
Query 1 iteration 7 took 721.3 ms
Query 1 iteration 8 took 717.3 ms
Query 1 iteration 9 took 729.4 ms
Query 1 avg time: 716.85 ms
```

Closes apache#9021 from Dandandan/batch_size

Authored-by: Heres, Daniel <[email protected]>
Signed-off-by: Andy Grove <[email protected]>
… API

Adds `count_distinct` function.

Closes apache#9028 from Dandandan/count_distinct

Authored-by: Heres, Daniel <[email protected]>
Signed-off-by: Andy Grove <[email protected]>
This function returns, potentially, a few dictionaries, and will always
return one record batch. Return these as a tuple rather than putting
them together in one vec to clarify.
This doesn't seem necessary
Tracked in ARROW-10961. There's a bug in tonic that doesn't handle
headers and trailers correctly; it has been fixed but a new version of
tonic needs to be released and used to get the fix.
@carols10cents carols10cents force-pushed the rust-flight-integration-tests branch from 043ab91 to 5f479dd Compare December 28, 2020 19:04
@codecov-io
Copy link

codecov-io commented Dec 28, 2020

Codecov Report

Merging #4 (5f479dd) into master (e1b38cd) will decrease coverage by 1.00%.
The diff coverage is 3.49%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #4      +/-   ##
==========================================
- Coverage   82.88%   81.87%   -1.01%     
==========================================
  Files         201      211      +10     
  Lines       49757    50403     +646     
==========================================
+ Hits        41241    41269      +28     
- Misses       8516     9134     +618     
Impacted Files Coverage Δ
rust/arrow-flight/src/utils.rs 0.00% <0.00%> (ø)
rust/datafusion/examples/flight_server.rs 0.00% <0.00%> (ø)
rust/datafusion/src/execution/context.rs 89.19% <ø> (ø)
...ion-testing/src/bin/arrow-json-integration-test.rs 0.00% <ø> (ø)
...-testing/src/bin/flight-test-integration-client.rs 0.00% <0.00%> (ø)
...-testing/src/bin/flight-test-integration-server.rs 0.00% <0.00%> (ø)
...ng/src/flight_client_scenarios/auth_basic_proto.rs 0.00% <0.00%> (ø)
...ng/src/flight_client_scenarios/integration_test.rs 0.00% <0.00%> (ø)
...-testing/src/flight_client_scenarios/middleware.rs 0.00% <0.00%> (ø)
...integration-testing/src/flight_server_scenarios.rs 0.00% <0.00%> (ø)
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1b38cd...5f479dd. Read the comment docs.

@carols10cents carols10cents force-pushed the rust-flight-integration-tests branch from 5f479dd to 116611c Compare December 28, 2020 19:47
carols10cents pushed a commit that referenced this pull request Apr 15, 2021
From a deadlocked run...

```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is being called recursively and the mutex is non-reentrant thus deadlock.

To fix it I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the #/status of in-flight requests.

Closes apache#9842 from westonpace/bugfix/arrow-12040

Lead-authored-by: Weston Pace <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants