forked from apache/arrow
-
Notifications
You must be signed in to change notification settings - Fork 3
rust integration tests for jake to review #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
shepmaster
reviewed
Dec 22, 2020
rust/integration-testing/src/bin/flight-test-integration-client.rs
Outdated
Show resolved
Hide resolved
rust/integration-testing/src/bin/flight-test-integration-client.rs
Outdated
Show resolved
Hide resolved
rust/integration-testing/src/bin/flight-test-integration-server.rs
Outdated
Show resolved
Hide resolved
rust/integration-testing/src/bin/flight-test-integration-server.rs
Outdated
Show resolved
Hide resolved
c98a22e to
a8da854
Compare
This PR adds the ability to load and parse the expected query answers included in the https://github.com/databricks/tpch-dbgen/tree/master/answers repository - which users have to clone anyway to generate the TPC-H data. Currently DataFusion does not support Decimal types which all the numeric values in TPC-H are so there are the expected precision errors in the current results. These tests are still useful as they show some interesting results already such as non-deterministic query 5 results. @andygrove Closes apache#9015 from seddonm1/test-tpch Authored-by: Mike Seddon <[email protected]> Signed-off-by: Andy Grove <[email protected]>
This increases the default batch size 8x from `4096` to `32768` as it improves performance of quite some operations. I just increased the size until performance didn't increase on my machine. Note that CSV reading also is faster on bigger batches on the bigger data sources. This PR ``` Loading table 'part' into memory Loaded table 'part' into memory in 125 ms Loading table 'supplier' into memory Loaded table 'supplier' into memory in 10 ms Loading table 'partsupp' into memory Loaded table 'partsupp' into memory in 381 ms Loading table 'customer' into memory Loaded table 'customer' into memory in 126 ms Loading table 'orders' into memory Loaded table 'orders' into memory in 961 ms Loading table 'lineitem' into memory Loaded table 'lineitem' into memory in 6382 ms Loading table 'nation' into memory Loaded table 'nation' into memory in 2 ms Loading table 'region' into memory Loaded table 'region' into memory in 2 ms Query 12 iteration 0 took 220.2 ms Query 12 iteration 1 took 223.2 ms Query 12 iteration 2 took 222.4 ms Query 12 iteration 3 took 222.2 ms Query 12 iteration 4 took 221.8 ms Query 12 iteration 5 took 222.0 ms Query 12 iteration 6 took 223.1 ms Query 12 iteration 7 took 223.7 ms Query 12 iteration 8 took 222.5 ms Query 12 iteration 9 took 222.9 ms Query 12 avg time: 222.40 ms ``` Master ``` Loading table 'part' into memory Loaded table 'part' into memory in 116 ms Loading table 'supplier' into memory Loaded table 'supplier' into memory in 7 ms Loading table 'partsupp' into memory Loaded table 'partsupp' into memory in 386 ms Loading table 'customer' into memory Loaded table 'customer' into memory in 115 ms Loading table 'orders' into memory Loaded table 'orders' into memory in 1048 ms Loading table 'lineitem' into memory Loaded table 'lineitem' into memory in 7673 ms Loading table 'nation' into memory Loaded table 'nation' into memory in 0 ms Loading table 'region' into memory Loaded table 'region' into memory in 0 ms Query 12 iteration 0 took 596.1 ms Query 12 iteration 1 took 602.0 ms Query 12 iteration 2 took 608.1 ms Query 12 iteration 3 took 607.9 ms Query 12 iteration 4 took 613.5 ms Query 12 iteration 5 took 615.3 ms Query 12 iteration 6 took 611.6 ms Query 12 iteration 7 took 609.8 ms Query 12 iteration 8 took 615.7 ms Query 12 iteration 9 took 616.9 ms Query 12 avg time: 609.68 ms ``` Query 1 also improves a bit (but smaller improvement) PR. ``` Query 1 iteration 0 took 653.0 ms Query 1 iteration 1 took 653.4 ms Query 1 iteration 2 took 652.3 ms Query 1 iteration 3 took 658.9 ms Query 1 iteration 4 took 655.1 ms Query 1 iteration 5 took 662.0 ms Query 1 iteration 6 took 659.7 ms Query 1 iteration 7 took 662.7 ms Query 1 iteration 8 took 669.0 ms Query 1 iteration 9 took 665.7 ms Query 1 avg time: 659.19 ms ``` Master: ``` Query 1 iteration 0 took 708.8 ms Query 1 iteration 1 took 714.5 ms Query 1 iteration 2 took 700.4 ms Query 1 iteration 3 took 713.7 ms Query 1 iteration 4 took 707.5 ms Query 1 iteration 5 took 727.8 ms Query 1 iteration 6 took 727.9 ms Query 1 iteration 7 took 721.3 ms Query 1 iteration 8 took 717.3 ms Query 1 iteration 9 took 729.4 ms Query 1 avg time: 716.85 ms ``` Closes apache#9021 from Dandandan/batch_size Authored-by: Heres, Daniel <[email protected]> Signed-off-by: Andy Grove <[email protected]>
… API Adds `count_distinct` function. Closes apache#9028 from Dandandan/count_distinct Authored-by: Heres, Daniel <[email protected]> Signed-off-by: Andy Grove <[email protected]>
This function returns, potentially, a few dictionaries, and will always return one record batch. Return these as a tuple rather than putting them together in one vec to clarify.
This doesn't seem necessary
Tracked in ARROW-10961. There's a bug in tonic that doesn't handle headers and trailers correctly; it has been fixed but a new version of tonic needs to be released and used to get the fix.
043ab91 to
5f479dd
Compare
Codecov Report
@@ Coverage Diff @@
## master #4 +/- ##
==========================================
- Coverage 82.88% 81.87% -1.01%
==========================================
Files 201 211 +10
Lines 49757 50403 +646
==========================================
+ Hits 41241 41269 +28
- Misses 8516 9134 +618
Continue to review full report at Codecov.
|
5f479dd to
116611c
Compare
carols10cents
pushed a commit
that referenced
this pull request
Apr 15, 2021
From a deadlocked run... ``` #0 0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0 #2 0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so #3 0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so #4 0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so #5 0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so apache#6 0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so apache#7 0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so apache#8 0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so apache#9 0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so apache#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so ``` The callback `ListObjectsV2Handler` is being called recursively and the mutex is non-reentrant thus deadlock. To fix it I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the #/status of in-flight requests. Closes apache#9842 from westonpace/bugfix/arrow-12040 Lead-authored-by: Weston Pace <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.