Conversation

@projjal projjal commented Mar 23, 2021

@frank400 @jpedroantunes
creating this dummy PR so that I can add comments.


I think you should not add streams to this class. See this and this comment.


Done


Instead of adding redundant precision and scale properties to this templated class, it is better to separate out a Decimal class and add precision/scale only to it.


done

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename the pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:


I think it is better to use gandiva::DecimalScalar128 instead of arrow::Decimal128, since in Gandiva we are using gandiva::DecimalScalar128 as the literal.


Substituted


Again, it is better to implement hash for gandiva::DecimalScalar instead of arrow::Decimal128. Implementing hash for arrow::Decimal128 might conflict with any implementation added to the Arrow code later.


done

…expose options to python

This adds ReadOptions to CsvFileFormat and exposes ReadOptions, ConvertOptions, and CsvFragmentScanOptions to Python.

ReadOptions was added to CsvFileFormat as its options can affect the discovered schema. For the block size, which does not need to be global, a field was added to CsvFragmentScanOptions.

Closes apache#9725 from lidavidm/arrow-8631

Authored-by: David Li <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>

-> bool is_decimal = false;


done


Is it required?


I have already removed it. Done.

bkietz and others added 12 commits March 23, 2021 16:06
This patch adds basic building blocks for grouped aggregation:

- `Grouper` for producing integer arrays encoding group id from batches of keys
- `HashAggregateKernel` for consuming batches of arguments and group ids, updating internal sums/counts/...

For testing purposes, a one-shot grouped aggregation function is provided:
```c++
std::shared_ptr<arrow::Array> needs_sum = ...;
std::shared_ptr<arrow::Array> needs_min_max = ...;
std::shared_ptr<arrow::Array> key_0 = ...;
std::shared_ptr<arrow::Array> key_1 = ...;

ARROW_ASSIGN_OR_RAISE(arrow::Datum out,
  arrow::compute::internal::GroupBy({
    needs_sum,
    needs_min_max,
  }, {
    key_0,
    key_1,
  }, {
    {"sum", nullptr},  // first argument will be summed
    {"min_max", &min_max_options},  // second argument's extrema will be found
}));

// Unpack struct array result (a four-field array)
auto out_array = out.array_as<StructArray>();
std::shared_ptr<arrow::Array> sums = out_array->field(0);
std::shared_ptr<arrow::Array> mins_and_maxes = out_array->field(1);
std::shared_ptr<arrow::Array> group_key_0 = out_array->field(2);
std::shared_ptr<arrow::Array> group_key_1 = out_array->field(3);
```

Closes apache#9621 from bkietz/groupby1

Lead-authored-by: Benjamin Kietzman <[email protected]>
Co-authored-by: michalursa <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
The given input stream should be kept alive while the new GArrowCSVReader is
alive.

Closes apache#9777 from kou/glib-csv-reader-refer-input

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…le namespacing

This is an implementation of catalog and schema providers to support table namespacing (see the [design doc](https://docs.google.com/document/d/1_bCP_tjVRLJyOrMBOezSFNpF0hwPa1ZS_qMWv1uvtS4/edit?usp=sharing)).

I'm creating this draft PR as a supporting implementation for the proposal, to prove out that the work can be done whilst minimising API churn and still allowing for use cases that don't care at all about the notion of catalogs or schemas. In this new setup, the default namespace is `datafusion.public`, which is created automatically with the default execution context config and allows for table registration.

## Highlights
- Datasource map removed in execution context state, replaced with catalog map
- Execution context allows for registering new catalog providers
- Catalog providers can be queried for their constituent schema providers
- Schema providers can be queried for table providers, similarly to the old datasource map
- Includes basic implementations of `CatalogProvider` and `SchemaProvider` backed by hashmaps
- New `TableReference` enum maps to various ways of referring to a table in sql
  - Bare: `my_table`
  - Partial: `schema.my_table`
  - Full: `catalog.schema.my_table`
- Given a default catalog and schema, `TableReference` instances of any variant can be converted to a `ResolvedTableReference`, which always includes all three components

Closes apache#9762 from returnString/catalog

Lead-authored-by: Ruan Pearce-Authers <[email protected]>
Co-authored-by: Ruan Pearce-Authers <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
Closes apache#9757 from pachamaltese/ARROW-11912

Lead-authored-by: Mauricio Vargas <[email protected]>
Co-authored-by: Pachamaltese <[email protected]>
Signed-off-by: Neal Richardson <[email protected]>
[ARROW-12012](https://issues.apache.org/jira/browse/ARROW-12012)
An exception will be thrown when BinaryConsumer consumes a large amount of data.

Closes apache#9744 from zxf/fix/jdbc-binary-consumer

Authored-by: Felix Zhu <[email protected]>
Signed-off-by: liyafan82 <[email protected]>
…ag for python binding

Hi,

I am making a PR following the discussion in [ARROW-11497](https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11497)

This is my first PR to this project; please let me know if I'm missing something, and I will try to address all problems as best I can.

Cheers,
Truc

Closes apache#9489 from trucnguyenlam/provide-parquet-enable-compliant-nested-type-flag

Authored-by: Truc Lam Nguyen <[email protected]>
Signed-off-by: Micah Kornfield <[email protected]>
…rsion of Map

These items can all stand on their own and they are used by the async datasets conversion.

MergeMap - Given an AsyncGenerator<AsyncGenerator<T>>, returns an AsyncGenerator<T>.  This method flattens a generator of generators into a generator of items.  It may reorder the items.

ConcatMap - Same as MergeMap but it will only pull items from one inner subscription at a time.  This reduced parallelism allows items to be returned in-order.

Async-reentrant Map - In some cases the map function is slow.  Even if the source is not async-reentrant this map can still be async-reentrant by allowing multiple instances of the map function to run at once.  The resulting mapped generator is async reentrant but it will not pull reentrantly from the source.

Vector utilities - To make migrating from Iterator code to vector code easier, I added some map-style utilities.  These copy the vectors (where an iterator wouldn't), so some care should be taken, but they can still be useful.

Moved Future/AsyncGenerator into top level type_fwd.  This is needed for the RecordBatchGenerator alias in the same way Iterator is needed at the top level.

Added `IsEnd` to `IterationTraits`.  This allows non-comparable types to be iterated on.  It allows us to create an AsyncGenerator<AsyncGenerator<T>>, since AsyncGenerator is a std::function and we can use an empty instance as an end token even though std::function is not comparable.

Closes apache#9643 from westonpace/feature/arrow-11883

Authored-by: Weston Pace <[email protected]>
Signed-off-by: David Li <[email protected]>
A segfault would occur when a field is inferred as null in a first block and then as list in a second block.

Also re-enable `chunked_builder_test.cc`, which wasn't compiled.

Closes apache#9783 from pitrou/ARROW-12065-json-segfault

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Closes apache#9775 from jorgecarleitao/clippy_clean

Authored-by: Jorge C. Leitao <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
This adds a function `from_trusted_len_iter_bool` to speed up the creation of an array for booleans.

Benchmarks are a bit noisy, but this seems to be ~10-20% faster for comparison kernels. It also has a positive effect on DataFusion queries, as they contain quite a few (nested) comparisons in filters. For example, executing TPC-H query 6 in memory is ~7% faster.

```
Gnuplot not found, using plotters backend
eq Float32              time:   [54.204 us 54.284 us 54.364 us]
                        change: [-29.087% -28.838% -28.581%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild

eq scalar Float32       time:   [43.660 us 43.743 us 43.830 us]
                        change: [-30.819% -30.545% -30.269%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

neq Float32             time:   [68.726 us 68.893 us 69.048 us]
                        change: [-14.045% -13.772% -13.490%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

neq scalar Float32      time:   [46.251 us 46.322 us 46.395 us]
                        change: [-12.204% -11.952% -11.702%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild

lt Float32              time:   [50.264 us 50.438 us 50.613 us]
                        change: [-21.300% -20.964% -20.649%] (p = 0.00 < 0.05)
                        Performance has improved.

lt scalar Float32       time:   [48.847 us 48.929 us 49.013 us]
                        change: [-10.132% -9.9180% -9.6910%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

lt_eq Float32           time:   [46.105 us 46.198 us 46.282 us]
                        change: [-21.276% -20.966% -20.703%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) low severe
  13 (13.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

lt_eq scalar Float32    time:   [47.359 us 47.456 us 47.593 us]
                        change: [+0.2766% +0.5240% +0.7821%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe

gt Float32              time:   [57.313 us 57.363 us 57.412 us]
                        change: [-18.328% -18.177% -18.031%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild

gt scalar Float32       time:   [44.091 us 44.132 us 44.175 us]
                        change: [-9.4233% -9.2747% -9.1273%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) low mild
  3 (3.00%) high mild

gt_eq Float32           time:   [55.856 us 55.932 us 56.007 us]
                        change: [-7.4997% -7.2656% -7.0334%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

gt_eq scalar Float32    time:   [42.365 us 42.419 us 42.482 us]
                        change: [+0.5289% +0.7174% +0.9116%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
```

Closes apache#9759 from Dandandan/optimize_comparison

Authored-by: Heres, Daniel <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
This adds support for CTE syntax:

```sql
WITH
   name AS (SELECT ...)
   [, name2 AS (SELECT ...)]
SELECT ...
FROM ...
```

Before this PR, the CTE syntax was ignored.

This PR supports CTEs referencing a previous CTE within the same query (but not forward references).

Closes apache#9776 from Dandandan/cte_support

Authored-by: Heres, Daniel <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
This likely needs more testing, especially where I had to implement functionality in (Basic)Decimal256. Also, we may want to extend the scalar cast benchmarks to cover decimals. There's also potentially some redundancy to eliminate in the tests.

Closes apache#9751 from lidavidm/arrow-10606

Authored-by: David Li <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
ianmcook and others added 7 commits March 24, 2021 14:04
This fixes a `NOTE` from `R CMD check` caused by ARROW-11700

Closes apache#9793 from ianmcook/ARROW-12073

Authored-by: Ian Cook <[email protected]>
Signed-off-by: Neal Richardson <[email protected]>
This PR adds the *utf8_length* compute kernel to the string scalar functions to support calculating the string length (as number of characters) for UTF-8 encoded STRINGs and LARGE STRINGs. The implementation makes use of utf8proc (utf8proc_iterate) to perform the calculation.

Closes apache#9786 from edponce/ARROW-11693-Add-string-length-kernel

Authored-by: Eduardo Ponce <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Users should use Meson.

Closes apache#9787 from kou/glib-remove-autotools

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Closes apache#9788 from kou/glib-json-reader-refer

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
There was a logical conflict between apache@eebf64b, which removed the Arc in `ArrayData`, and apache@8dd6abb, which optimized the compute kernels.

FYI @Dandandan  and @nevi-me

Closes apache#9796 from alamb/alamb/fix-build

Authored-by: Andrew Lamb <[email protected]>
Signed-off-by: Neville Dipale <[email protected]>
…l Interator<Item=Expr> rather than &[Expr]

# NOTE:
Since this is a fairly major backwards-incompatible change (many call sites need to be updated, though mostly mechanically), I gathered some feedback on this approach in apache#9692 and this is the PR I propose for merge.

I'll leave this open for several days and also send a note to the mailing lists for additional comment

It is part of my overall plan to make the DataFusion optimizer more idiomatic and do much less copying [ARROW-11689](https://issues.apache.org/jira/browse/ARROW-11689)

# Rationale:
All call sites currently need an owned `Vec` (or equivalent) so they can pass in `&[Expr]`, and then DataFusion copies all the `Expr`s. Many times the original `Vec<Expr>` is discarded immediately after use (I'll point out where this happens in a few places below). Thus it would be better (more idiomatic, and often faster with less copying) to take something that can produce an iterator over Expr.

# Changes
1. Change `Dataframe` so it takes `Vec<Expr>` rather than `&[Expr]`
2. Change `LogicalPlanBuilder` so it takes `impl Iterator<Item=Expr>` rather than `&[Expr]`

I couldn't figure out how to allow the `Dataframe` API (which is a Trait) to take an `impl Iterator<Item=Expr>`

Closes apache#9703 from alamb/alamb/less_copy_in_plan_builder_final

Authored-by: Andrew Lamb <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
…ed code

Closes apache#9789 from jorisvandenbossche/ARROW-11983

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
@jvictorhuguenin jvictorhuguenin force-pushed the feature/add-decimal-in-expression branch from 1d7bdcd to 89b2172 on April 1, 2021 12:34
@github-actions github-actions bot added the flight label Apr 1, 2021
projjal pushed a commit that referenced this pull request Apr 7, 2021
From a deadlocked run...

```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is being called recursively, and the mutex is non-reentrant, hence the deadlock.

To fix it I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the number/status of in-flight requests.

Closes apache#9842 from westonpace/bugfix/arrow-12040

Lead-authored-by: Weston Pace <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
@projjal projjal closed this Apr 8, 2021
projjal pushed a commit that referenced this pull request Jun 17, 2021
Before change:

```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x522f09 in
    #1 0x7f28ae5826f4 in
    #2 0x7f28ae57fa5d in
    #3 0x7f28ae58cb0f in
    #4 0x7f28ae58bda0 in
    ...
```

After change:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x522f09 in posix_memalign (/build/cpp/debug/arrow-dataset-file-csv-test+0x522f09)
    #1 0x7f28ae5826f4 in arrow::(anonymous namespace)::SystemAllocator::AllocateAligned(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:213:24
    #2 0x7f28ae57fa5d in arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::SystemAllocator>::Allocate(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:405:5
    #3 0x7f28ae58cb0f in arrow::PoolBuffer::Reserve(long) /arrow/cpp/src/arrow/memory_pool.cc:717:9
    #4 0x7f28ae58bda0 in arrow::PoolBuffer::Resize(long, bool) /arrow/cpp/src/arrow/memory_pool.cc:741:7
    ...
```

Closes apache#10498 from westonpace/feature/ARROW-13027--c-fix-asan-stack-traces-in-ci

Authored-by: Weston Pace <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
projjal pushed a commit that referenced this pull request Dec 1, 2021
Error log of Valgrind failure:
```
[----------] 3 tests from TestArrowReadDeltaEncoding
[ RUN      ] TestArrowReadDeltaEncoding.DeltaBinaryPacked
[       OK ] TestArrowReadDeltaEncoding.DeltaBinaryPacked (812 ms)
[ RUN      ] TestArrowReadDeltaEncoding.DeltaByteArray
==12587== Conditional jump or move depends on uninitialised value(s)
==12587==    at 0x4F12C57: Advance (bit_stream_utils.h:426)
==12587==    by 0x4F12C57: parquet::(anonymous namespace)::DeltaBitPackDecoder<parquet::PhysicalType<(parquet::Type::type)1> >::GetInternal(int*, int) (encoding.cc:2216)
==12587==    by 0x4F13823: Decode (encoding.cc:2091)
==12587==    by 0x4F13823: parquet::(anonymous namespace)::DeltaByteArrayDecoder::SetData(int, unsigned char const*, int) (encoding.cc:2360)
==12587==    by 0x4E89EF5: parquet::(anonymous namespace)::ColumnReaderImplBase<parquet::PhysicalType<(parquet::Type::type)6> >::InitializeDataDecoder(parquet::DataPage const&, long) (column_reader.cc:797)
==12587==    by 0x4E9AE63: ReadNewPage (column_reader.cc:614)
==12587==    by 0x4E9AE63: HasNextInternal (column_reader.cc:576)
==12587==    by 0x4E9AE63: parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)6> >::ReadRecords(long) (column_reader.cc:1228)
==12587==    by 0x4DFB19F: parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long) (reader.cc:467)
==12587==    by 0x4DF513C: parquet::arrow::ColumnReaderImpl::NextBatch(long, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:108)
==12587==    by 0x4DFB74D: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn(int, std::vector<int, std::allocator<int> > const&, parquet::arrow::ColumnReader*, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:273)
==12587==    by 0x4E11FDA: operator() (reader.cc:1180)
==12587==    by 0x4E11FDA: arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > arrow::internal::OptionalParallelForAsync<parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::shared_ptr<arrow::ChunkedArray> >(bool, std::vector<std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::allocator<arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > > >, parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, arrow::internal::Executor*) (parallel.h:95)
==12587==    by 0x4E126A9: parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*) (reader.cc:1198)
==12587==    by 0x4E12F50: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroups(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:1160)
==12587==    by 0x4DFA2BC: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:198)
==12587==    by 0x4DFA392: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::shared_ptr<arrow::Table>*) (reader.cc:289)
==12587==    by 0x1DCE62: parquet::arrow::TestArrowReadDeltaEncoding::ReadTableFromParquetFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<arrow::Table>*) (arrow_reader_writer_test.cc:4174)
==12587==    by 0x2266D2: parquet::arrow::TestArrowReadDeltaEncoding_DeltaByteArray_Test::TestBody() (arrow_reader_writer_test.cc:4209)
==12587==    by 0x4AD2C9B: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2607)
==12587==    by 0x4AC9DD1: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2643)
==12587==    by 0x4AA4C02: testing::Test::Run() (gtest.cc:2682)
==12587==    by 0x4AA563A: testing::TestInfo::Run() (gtest.cc:2861)
==12587==    by 0x4AA600F: testing::TestSuite::Run() (gtest.cc:3015)
==12587==    by 0x4AB631B: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5855)
==12587==    by 0x4AD3CE7: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2607)
==12587==    by 0x4ACB063: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2643)
==12587==    by 0x4AB47B6: testing::UnitTest::Run() (gtest.cc:5438)
==12587==    by 0x4218918: RUN_ALL_TESTS() (gtest.h:2490)
==12587==    by 0x421895B: main (gtest_main.cc:52)
```

Closes apache#11725 from pitrou/ARROW-14704-parquet-valgrind

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
projjal pushed a commit that referenced this pull request Mar 29, 2022
TODOs:
Convert cheat sheet to PDF and hide slide #1.

Closes apache#12445 from pachadotdev/patch-4

Lead-authored-by: Stephanie Hazlitt <[email protected]>
Co-authored-by: Pachá <[email protected]>
Co-authored-by: Mauricio Vargas <[email protected]>
Co-authored-by: Pachá <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>