Add files to be reviewed #3
    std::string ToString() const;

     private:
    std::vector<std::string> keys_;
Why not use a single `std::vector<std::pair<std::string, std::string>>`? That would remove the size-equality checks between the two vectors and also ensure semantic correctness.
A rather good question :-)
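The single-vector layout suggested above could be sketched roughly as follows. `PairMetadata` and its `KeyValue` accessor are hypothetical names used only for illustration; this is not the class from the diff, just a minimal stand-in showing that one container makes a key without a value unrepresentable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the class under review; not the actual Arrow API.
class PairMetadata {
 public:
  void Append(const std::string& key, const std::string& value) {
    pairs_.emplace_back(key, value);
  }

  int64_t size() const { return static_cast<int64_t>(pairs_.size()); }

  // Returns key and value together, by const reference (no copy).
  // No bounds checking: the caller must pass 0 <= i < size().
  const std::pair<std::string, std::string>& KeyValue(int64_t i) const {
    return pairs_[static_cast<std::size_t>(i)];
  }

  const std::string& key(int64_t i) const { return KeyValue(i).first; }
  const std::string& value(int64_t i) const { return KeyValue(i).second; }

 private:
  // One container: there is no second vector whose size could drift.
  std::vector<std::pair<std::string, std::string>> pairs_;
};
```

With this layout there is nothing to keep in sync, so the size-equality checks disappear by construction.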
    ///
    /// \param pairs key-value mapping
    std::shared_ptr<KeyValueMetadata> ARROW_EXPORT
    key_value_metadata(const std::unordered_map<std::string, std::string>& pairs);
If we follow the pattern found in the existing codebase, this would be a static method in the class named `Make`. The function name doesn't respect the style guide.
It's also redundant with the explicit constructor (minus the `shared_ptr`). Is this for FFI?
FTR, I think it's just a convenience function, like those other functions in Arrow: https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L832
Arrow promotes using shared_ptr most everywhere, so it makes sense that we make helper factories available. The raw constructors are sometimes available for specialized use cases.
Not sure why the style discrepancy :-)
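For comparison, here is a minimal sketch of the two styles discussed in this thread: a static `Make` method and a lowercase free-function helper, both returning a `shared_ptr`. The class `Metadata` and the function `metadata` are hypothetical names chosen for illustration; they are not the declarations from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in class; names here are illustrative, not Arrow's API.
class Metadata {
 public:
  explicit Metadata(std::unordered_map<std::string, std::string> pairs)
      : pairs_(std::move(pairs)) {}

  // Static factory named Make, as the first comment suggests.
  static std::shared_ptr<Metadata> Make(
      std::unordered_map<std::string, std::string> pairs) {
    return std::make_shared<Metadata>(std::move(pairs));
  }

  std::size_t size() const { return pairs_.size(); }

 private:
  std::unordered_map<std::string, std::string> pairs_;
};

// Lowercase convenience helper in the style of the function under review.
// It only forwards to Make, so both spellings build the same object.
inline std::shared_ptr<Metadata> metadata(
    std::unordered_map<std::string, std::string> pairs) {
  return Metadata::Make(std::move(pairs));
}
```

Either spelling saves callers from writing `std::make_shared` themselves; the difference is purely one of naming convention.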
    out++;
    *out = ((*in) >> 26);
    ++in;
    *o
Needs at least documentation on the assumptions made by this function; without it, it is very easy to read invalid memory or write too far through misuse.
    namespace arrow {
    namespace internal {

    inline const uint32_t* unpack1_32(const uint32_t* in, uint32_t* out) {
This code is prone to off-by-one errors and is hardly reviewable. I'd say property testing with random inputs is almost mandatory. But this is vendored code, so trust upstream?
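A round-trip property test of the kind suggested above might look like the sketch below. It assumes a layout in which 32 values of `bits` bits each are packed LSB-first into `bits` 32-bit words; that layout is an assumption and may not match the vendored routines. `PackReference`, `UnpackReference` and `RoundTripHolds` are hypothetical helpers, and in a real test the unpacking side would be replaced by the function under test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Reference packer (assumed layout): 32 values of `bits` bits each, LSB first,
// into `bits` 32-bit words. Requires values.size() == 32 and 1 <= bits <= 32.
std::vector<uint32_t> PackReference(const std::vector<uint32_t>& values, int bits) {
  std::vector<uint32_t> packed(static_cast<std::size_t>(bits), 0);
  for (std::size_t i = 0; i < 32; ++i) {
    const uint64_t pos = static_cast<uint64_t>(i) * static_cast<uint64_t>(bits);
    const uint64_t word = pos / 32, shift = pos % 32;
    const uint64_t v = static_cast<uint64_t>(values[i]) << shift;
    packed[word] |= static_cast<uint32_t>(v);
    // The value straddles a word boundary: spill the high part.
    if (shift + static_cast<uint64_t>(bits) > 32) {
      packed[word + 1] |= static_cast<uint32_t>(v >> 32);
    }
  }
  return packed;
}

// Reference unpacker for the same layout; reads exactly packed.size() words.
std::vector<uint32_t> UnpackReference(const std::vector<uint32_t>& packed, int bits) {
  std::vector<uint32_t> out(32);
  const uint64_t mask = (uint64_t{1} << bits) - 1;
  for (std::size_t i = 0; i < 32; ++i) {
    const uint64_t pos = static_cast<uint64_t>(i) * static_cast<uint64_t>(bits);
    const uint64_t word = pos / 32, shift = pos % 32;
    uint64_t window = packed[word];
    if (word + 1 < packed.size()) {
      window |= static_cast<uint64_t>(packed[word + 1]) << 32;
    }
    out[i] = static_cast<uint32_t>((window >> shift) & mask);
  }
  return out;
}

// Property: unpack(pack(x)) == x for random x that fit in `bits` bits.
bool RoundTripHolds(int bits, int trials, uint32_t seed) {
  std::mt19937 rng(seed);
  const uint32_t max_value = static_cast<uint32_t>((uint64_t{1} << bits) - 1);
  std::uniform_int_distribution<uint32_t> dist(0, max_value);
  for (int t = 0; t < trials; ++t) {
    std::vector<uint32_t> values(32);
    for (auto& v : values) v = dist(rng);
    if (UnpackReference(PackReference(values, bits), bits) != values) return false;
  }
  return true;
}
```

Running `RoundTripHolds` for every width from 1 to 32 exercises each word-boundary case, which is where off-by-one mistakes tend to hide.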
    void Append(const std::string& key, const std::string& value);

    void reserve(int64_t n);
The method name does not respect the style guide; I expect this is intended to emulate std containers? Ditto for the other lowercase methods.
    void reserve(int64_t n);
    int64_t size() const;

    std::string key(int64_t i) const;
I'd add a method to return both in a single call, e.g. `KeyValue(i)`. I'd also expose a method that returns a const ref to avoid a copy.
    return keys;
    }

    static std::vector<std::string> UnorderedMapValues(
This would be a good candidate for a macro UNORDERED_MAP(m, first/second).
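One way the macro idea could be spelled is sketched below. `UNORDERED_MAP_COLLECT` is a hypothetical name, and the body of `UnorderedMapKeys` is a guess at what the truncated hunk contains; only `UnorderedMapValues` appears by name in the diff.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical macro: collects either the `first` (key) or `second` (value)
// member of every entry of map `m` into a std::vector<std::string>.
#define UNORDERED_MAP_COLLECT(m, member) \
  [&]() {                                \
    std::vector<std::string> out;        \
    out.reserve((m).size());             \
    for (const auto& entry : (m)) {      \
      out.push_back(entry.member);       \
    }                                    \
    return out;                          \
  }()

static std::vector<std::string> UnorderedMapKeys(
    const std::unordered_map<std::string, std::string>& map) {
  return UNORDERED_MAP_COLLECT(map, first);
}

static std::vector<std::string> UnorderedMapValues(
    const std::unordered_map<std::string, std::string>& map) {
  return UNORDERED_MAP_COLLECT(map, second);
}
```

Both helpers iterate the same unmodified map, so `keys[i]` and `values[i]` refer to the same entry.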
I am contributing to [Arrow 3731](https://issues.apache.org/jira/browse/ARROW-3731). This PR has the minimum functionality to read parquet files into an arrow::Table, which can then be converted to a tibble. Multiple parquet files can be read inside `lapply`, and then concatenated at the end.

Steps to compile
1) Build arrow and parquet c++ projects
2) In R run `devtools::load_all()`

What I could use help with: The biggest challenge for me is my lack of experience with pkg-config. The R library has a `configure` file which uses pkg-config to figure out what c++ libraries to link to. Currently, `configure` looks up the Arrow project and links to -larrow only. We need it to also link to -lparquet. I do not know how to modify pkg-config's metadata to let it know to link to both -larrow and -lparquet

Author: Jeffrey Wong <[email protected]>
Author: Romain Francois <[email protected]>
Author: jeffwong-nflx <[email protected]>

Closes apache#3230 from jeffwong-nflx/master and squashes the following commits:

c67fa3d <jeffwong-nflx> Merge pull request #3 from jeffwong-nflx/cleanup
1df3026 <Jeffrey Wong> don't hard code -larrow and -lparquet
8ccaa51 <Jeffrey Wong> cleanup
75ba5c9 <Jeffrey Wong> add contributor
56adad2 <jeffwong-nflx> Merge pull request #2 from romainfrancois/3731/parquet-2
7d6e64d <Romain Francois> read_parquet() only reading one parquet file, and gains a `as_tibble` argument
e936b44 <Romain Francois> need parquet on travis too
ff260c5 <Romain Francois> header was too commented, renamed to parquet.cpp
9e1897f <Romain Francois> styling etc ...
456c5d2 <Jeffrey Wong> read parquet files
22d89dd <Jeffrey Wong> hardcode -larrow and -lparquet
https://issues.apache.org/jira/browse/ARROW-3965

This creates an object which configures the BaseAllocator and Calendar used during the translation from a JDBC ResultSet to an Arrow vector.

Author: Mike Pigott <[email protected]>
Author: Michael Pigott <[email protected]>

Closes apache#3133 from mikepigott/jdbc-to-arrow-config and squashes the following commits:

be95426 <Mike Pigott> ARROW-3965: JDBC-To-Arrow Config Builder javadocs.
d6c64a7 <Mike Pigott> ARROW-3965: JdbcToArrowConfigBuilder
d7ca982 <Mike Pigott> Merge branch 'master' into jdbc-to-arrow-config
789c8c8 <Michael Pigott> Merge pull request #4 from apache/master
e5b19ee <Michael Pigott> Merge pull request #3 from apache/master
3b17c29 <Michael Pigott> Merge pull request #2 from apache/master
5b1b364 <Mike Pigott> Merge branch 'master' into jdbc-to-arrow-config
881c6c8 <Michael Pigott> Merge pull request #1 from apache/master
bb3165b <Mike Pigott> Updating the function calls to use the JdbcToArrowConfig versions.
68c91e7 <Mike Pigott> Modifying the jdbcToArrowSchema and jdbcToArrowVectors methods to receive JdbcToArrowConfig objects.
8d6cf00 <Mike Pigott> Documentation for public static VectorSchemaRoot sqlToArrow(Connection connection, String query, JdbcToArrowConfig config)
4f1260c <Mike Pigott> Adding documentation for public static VectorSchemaRoot sqlToArrow(ResultSet resultSet, JdbcToArrowConfig config)
df632e3 <Mike Pigott> Updating the SQL tests to include JdbcToArrowConfig versions.
b270044 <Mike Pigott> Updated validaton & documentation, and unit tests for the new JdbcToArrowConfig.
da77cbe <Mike Pigott> Creating a configuration class for the JDBC-to-Arrow converter.
https://issues.apache.org/jira/browse/ARROW-3923

Hello! I was reading through the JDBC source code and I noticed that a java.util.Calendar was required for creating an Arrow Schema and Arrow Vectors from a JDBC ResultSet, when none is required. This change makes the Calendar optional.

Unit Tests: The existing SureFire plugin configuration uses a UTC calendar for the database, which is the default Calendar in the existing code. Likewise, no changes to the unit tests are required to provide adequate coverage for the change.

Author: Michael Pigott <[email protected]>
Author: Mike Pigott <[email protected]>

Closes apache#3066 from mikepigott/jdbc-timestamp-no-calendar and squashes the following commits:

4d95da0 <Mike Pigott> ARROW-3923: Supporting a null Calendar in the config, and reverting the breaking change.
cd9a230 <Mike Pigott> Merge branch 'master' into jdbc-timestamp-no-calendar
509a1cc <Michael Pigott> Merge pull request #5 from apache/master
789c8c8 <Michael Pigott> Merge pull request #4 from apache/master
e5b19ee <Michael Pigott> Merge pull request #3 from apache/master
3b17c29 <Michael Pigott> Merge pull request #2 from apache/master
881c6c8 <Michael Pigott> Merge pull request #1 from apache/master
089cff4 <Mike Pigott> Format fixes
a58a4a5 <Mike Pigott> Fixing calendar usage.
e12832a <Mike Pigott> Allowing for timestamps without a time zone.
https://issues.apache.org/jira/browse/ARROW-3966

This change includes apache#3133, and supports a new configuration item called "Include Metadata." If true, metadata from the JDBC ResultSetMetaData object is pulled along to the Schema Field Metadata. For now, this includes:
* Catalog Name
* Table Name
* Column Name
* Column Type Name

Author: Mike Pigott <[email protected]>
Author: Michael Pigott <[email protected]>

Closes apache#3134 from mikepigott/jdbc-column-metadata and squashes the following commits:

02f2f34 <Mike Pigott> ARROW-3966: Picking up lost change to support null calendars.
7049c36 <Mike Pigott> Merge branch 'master' into jdbc-column-metadata
e9a9b2b <Michael Pigott> Merge pull request #6 from apache/master
65741a9 <Mike Pigott> ARROW-3966: Code review feedback
cc6cc88 <Mike Pigott> ARROW-3966: Using a 1:N loop instead of a 0:N-1 loop for fewer index offsets in code.
cfb2ba6 <Mike Pigott> ARROW-3966: Using a helper method for building a UTC calendar with root locale.
2928513 <Mike Pigott> ARROW-3966: Moving the metadata flag assignment into the builder.
69022c2 <Mike Pigott> ARROW-3966: Fixing merge.
4a6de86 <Mike Pigott> Merge branch 'master' into jdbc-column-metadata
509a1cc <Michael Pigott> Merge pull request #5 from apache/master
789c8c8 <Michael Pigott> Merge pull request #4 from apache/master
e5b19ee <Michael Pigott> Merge pull request #3 from apache/master
3b17c29 <Michael Pigott> Merge pull request #2 from apache/master
d847ebc <Mike Pigott> Fixing file location
1ceac9e <Mike Pigott> Merge branch 'master' into jdbc-column-metadata
881c6c8 <Michael Pigott> Merge pull request #1 from apache/master
03091a8 <Mike Pigott> Unit tests for including result set metadata.
72d64cc <Mike Pigott> Affirming the field metadata is empty when the configuration excludes field metadata.
7b4527c <Mike Pigott> Test for the include-metadata flag in the configuration.
7e9ce37 <Mike Pigott> Merge branch 'jdbc-to-arrow-config' into jdbc-column-metadata
bb3165b <Mike Pigott> Updating the function calls to use the JdbcToArrowConfig versions.
a6fb1be <Mike Pigott> Fixing function call
5bfd6a2 <Mike Pigott> Merge branch 'jdbc-to-arrow-config' into jdbc-column-metadata
68c91e7 <Mike Pigott> Modifying the jdbcToArrowSchema and jdbcToArrowVectors methods to receive JdbcToArrowConfig objects.
b5b0cb1 <Mike Pigott> Merge branch 'jdbc-to-arrow-config' into jdbc-column-metadata
8d6cf00 <Mike Pigott> Documentation for public static VectorSchemaRoot sqlToArrow(Connection connection, String query, JdbcToArrowConfig config)
4f1260c <Mike Pigott> Adding documentation for public static VectorSchemaRoot sqlToArrow(ResultSet resultSet, JdbcToArrowConfig config)
e34a9e7 <Mike Pigott> Fixing formatting.
fe097c8 <Mike Pigott> Merge branch 'jdbc-to-arrow-config' into jdbc-column-metadata
df632e3 <Mike Pigott> Updating the SQL tests to include JdbcToArrowConfig versions.
b270044 <Mike Pigott> Updated validaton & documentation, and unit tests for the new JdbcToArrowConfig.
da77cbe <Mike Pigott> Creating a configuration class for the JDBC-to-Arrow converter.
a78c770 <Mike Pigott> Updating Javadocs.
523387f <Mike Pigott> Updating the API to support an optional 'includeMetadata' field.
5af1b5b <Mike Pigott> Separating out the field-type creation from the field creation.
…igNum

- Use stride in `DataBufferBuilder.set` to ensure that Int64Builder is consistent with and without BigNum.
- Use `WideBufferBuilder` for `Int64` and `Uint64` builders, even when `BigInt` is not available. Moves check for `BigInt` availability into `WideBufferBuilder`. (Thanks to Paul Taylor)

Author: Brian Hulette <[email protected]>
Author: ptaylor <[email protected]>

Closes apache#4691 from TheNeuralBit/js-data-buffer-builder-stride and squashes the following commits:

7862612 <Brian Hulette> Add clarifying comment for int64 generation in builder tests
b08ac1b <Brian Hulette> Merge pull request #3 from trxcllnt/js/data-buffer-builder-64bit-stride
2e4ce7a <Brian Hulette> Use stride in DataBufferBuilder.set
4567d07 <ptaylor> use WideBufferBuilder for Int64 and Uint64 builders
dc9ed3a <ptaylor> update package-lock.json
This updates the language in `install_arrow()` to follow the README revision that will land in https://github.com/apache/arrow/pull/4948/files#diff-563b2cb2c8c2d51b2ff6b177e2d84286R33.

The [Jira ticket](https://issues.apache.org/jira/browse/ARROW-6142) requested three things; this is `#2` in the list. On `#1`, I defer to the C++ installation docs, which are already included in the install_arrow message, rather than duplicating content here. `#3` is out of scope.

Closes apache#5027 from nealrichardson/no-ppa and squashes the following commits:

80b142e <Neal Richardson> s/arrow/Arrow/
44c9659 <Neal Richardson> Tweak language again
36cfe28 <Neal Richardson> Further linux install revisions
79bd7e0 <Neal Richardson> One more PPurge
63f75bd <Neal Richardson> Revise install_arrow instructions for Linux

Authored-by: Neal Richardson <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…comments.

The reset method allows the data structures to be re-used so they don't have to be allocated over and over again.

Closes apache#6430 from richardartoul/ra/merge-upstream and squashes the following commits:

5a08281 <Richard Artoul> Add license to test file
d76be05 <Richard Artoul> Add test for data reset
d102b1f <Richard Artoul> Add tests
d3e6e67 <Richard Artoul> cleanup comments
c8525ae <Richard Artoul> Add Reset method to int array (#5)
489ca25 <Richard Artoul> Fix array.setData() to retain before release (#4)
88cd05f <Richard Artoul> Add reset method to Data (#3)
6d1b277 <Richard Artoul> Add Reset() method to String array (#2)
dca2303 <Richard Artoul> Add Reset method to buffer and cleanup comments (#1)

Lead-authored-by: Richard Artoul <[email protected]>
Co-authored-by: Richard Artoul <[email protected]>
Signed-off-by: Sebastien Binet <[email protected]>
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). apache#7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
Start 1: arrow-array-test
1/51 Test #1: arrow-array-test ..................... Passed 4.62 sec
Start 2: arrow-buffer-test
2/51 Test #2: arrow-buffer-test .................... Passed 0.14 sec
Start 3: arrow-extension-type-test
3/51 Test #3: arrow-extension-type-test ............ Passed 0.12 sec
Start 4: arrow-misc-test
4/51 Test #4: arrow-misc-test ...................... Passed 0.14 sec
Start 5: arrow-public-api-test
5/51 Test #5: arrow-public-api-test ................ Passed 0.12 sec
Start 6: arrow-scalar-test
6/51 Test #6: arrow-scalar-test .................... Passed 0.13 sec
Start 7: arrow-type-test
7/51 Test #7: arrow-type-test ...................... Passed 0.14 sec
Start 8: arrow-table-test
8/51 Test #8: arrow-table-test ..................... Passed 0.13 sec
Start 9: arrow-tensor-test
9/51 Test #9: arrow-tensor-test .................... Passed 0.13 sec
Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test ............. Passed 0.16 sec
Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test ....................... Passed 0.12 sec
Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ............... Passed 0.53 sec
Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ...................... Passed 1.45 sec
Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test .................. Passed 0.18 sec
Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ............... Passed 0.20 sec
Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test ............. Passed 3.48 sec
Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ................... Passed 0.74 sec
Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ................... Passed 0.12 sec
Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test ................. Passed 2.77 sec
Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed 5.65 sec
Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test ......... Passed 1.34 sec
Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ........... Passed 0.13 sec
Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ........... Passed 0.15 sec
Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test .............. Passed 0.22 sec
Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test .............. Passed 2.61 sec
Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test .............. Passed 0.81 sec
Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test ............. Passed 0.40 sec
Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ... Passed 3.33 sec
Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test .... Passed 1.51 sec
Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test ..... Passed 0.13 sec
Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ............... Passed 0.12 sec
Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test ......... Passed 14.70 sec
Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ........... Passed 7.96 sec
Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test .............. Passed 4.80 sec
Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............ Passed 8.23 sec
Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ........... Passed 0.25 sec
Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test ......... Passed 0.13 sec
Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test .......... Passed 0.21 sec
Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test .............. Passed 0.12 sec
Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............ Passed 0.16 sec
Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test ......... Passed 0.13 sec
Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ........... Passed 0.20 sec
Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................ Passed 1.62 sec
Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ...................... Passed 0.13 sec
Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ................... Passed 0.91 sec
Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............ Passed 5.77 sec
Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ........... Passed 0.16 sec
Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test .................. Passed 0.27 sec
Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test .......... Passed 0.13 sec
Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ...................... Passed 0.26 sec
Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ............... Passed 1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests = 27.38 sec (27 tests)
arrow_compute = 45.11 sec (14 tests)
arrow_dataset = 1.21 sec (7 tests)
arrow_ipc = 6.20 sec (3 tests)
unittest = 79.91 sec (51 tests)

Total Test time (real) = 79.99 sec

The following tests FAILED:
20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes apache#7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…lure on big-endian platforms

This PR gets an element data using an endianless API in Flatbuffer instead of getting a pointer. This can fix a failure of TestPlasmaSerialization.DeleteReply in plasma-serialization-tests.

Without this PR
```
1: [==========] Running 14 tests from 1 test case.
1: [----------] Global test environment set-up.
1: [----------] 14 tests from TestPlasmaSerialization
1: [ RUN ] TestPlasmaSerialization.CreateRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-kk8t88p9/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.CreateRequest (2 ms)
1: [ RUN ] TestPlasmaSerialization.CreateReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-97gspx5v/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.CreateReply (0 ms)
1: [ RUN ] TestPlasmaSerialization.SealRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-dkksx76p/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.SealRequest (1 ms)
1: [ RUN ] TestPlasmaSerialization.SealReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-oqbs9vm0/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.SealReply (0 ms)
1: [ RUN ] TestPlasmaSerialization.GetRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-d7q6h5q4/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.GetRequest (1 ms)
1: [ RUN ] TestPlasmaSerialization.GetReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-sxsncs72/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.GetReply (1 ms)
1: [ RUN ] TestPlasmaSerialization.ReleaseRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-njc3g3b5/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.ReleaseRequest (0 ms)
1: [ RUN ] TestPlasmaSerialization.ReleaseReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-917ybxmo/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.ReleaseReply (1 ms)
1: [ RUN ] TestPlasmaSerialization.DeleteRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-1kwauefv/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.DeleteRequest (0 ms)
1: [ RUN ] TestPlasmaSerialization.DeleteReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-4ftq28pq/fileXXXXXX'
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:271: Failure
1: Value of: error_vec[0] == PlasmaError::ObjectExists
1: Actual: false
1: Expected: true
1: [ FAILED ] TestPlasmaSerialization.DeleteReply (1 ms)
1: [ RUN ] TestPlasmaSerialization.EvictRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-vl97870w/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.EvictRequest (0 ms)
1: [ RUN ] TestPlasmaSerialization.EvictReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-3am9a6rv/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.EvictReply (1 ms)
1: [ RUN ] TestPlasmaSerialization.DataRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-plye5tmm/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.DataRequest (0 ms)
1: [ RUN ] TestPlasmaSerialization.DataReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-mbu6lqsq/fileXXXXXX'
1: [ OK ] TestPlasmaSerialization.DataReply (1 ms)
1: [----------] 14 tests from TestPlasmaSerialization (9 ms total)
1:
1: [----------] Global test environment tear-down
1: [==========] 14 tests from 1 test case ran. (9 ms total)
1: [ PASSED ] 13 tests.
1: [ FAILED ] 1 test, listed below:
1: [ FAILED ] TestPlasmaSerialization.DeleteReply
1:
1: 1 FAILED TEST
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma
1/3 Test #1: plasma-serialization-tests .......***Failed 0.27 sec
...
3/3 Test #3: plasma-external-store-tests ...... Passed 0.46 sec
```

With this PR
```
$ ctest
Test project /home/ishizaki/Arrow/arrow/cpp/src/plasma
Start 1: plasma-serialization-tests
1/3 Test #1: plasma-serialization-tests ....... Passed 0.26 sec
Start 2: plasma-client-tests
2/3 Test #2: plasma-client-tests .............. Passed 14.99 sec
Start 3: plasma-external-store-tests
3/3 Test #3: plasma-external-store-tests ...... Passed 0.49 sec

100% tests passed, 0 tests failed out of 3

Label Time Summary:
plasma-tests = 15.74 sec (3 tests)
unittest = 15.74 sec (3 tests)

Total Test time (real) = 15.74 sec
```

Closes apache#7148 from kiszk/ARROW-8759

Authored-by: Kazuaki Ishizaki <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
From a deadlocked run...

```
#0 0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2 0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3 0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4 0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5 0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#6 0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#7 0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#8 0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#9 0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is being called recursively and the mutex is non-reentrant thus deadlock. To fix it I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the #/status of in-flight requests.

Closes apache#9842 from westonpace/bugfix/arrow-12040

Lead-authored-by: Weston Pace <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Before change:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
#0 0x522f09 in
#1 0x7f28ae5826f4 in
#2 0x7f28ae57fa5d in
#3 0x7f28ae58cb0f in
#4 0x7f28ae58bda0 in
...
```
After change:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
#0 0x522f09 in posix_memalign (/build/cpp/debug/arrow-dataset-file-csv-test+0x522f09)
#1 0x7f28ae5826f4 in arrow::(anonymous namespace)::SystemAllocator::AllocateAligned(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:213:24
#2 0x7f28ae57fa5d in arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::SystemAllocator>::Allocate(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:405:5
#3 0x7f28ae58cb0f in arrow::PoolBuffer::Reserve(long) /arrow/cpp/src/arrow/memory_pool.cc:717:9
#4 0x7f28ae58bda0 in arrow::PoolBuffer::Resize(long, bool) /arrow/cpp/src/arrow/memory_pool.cc:741:7
...
```
Closes apache#10498 from westonpace/feature/ARROW-13027--c-fix-asan-stack-traces-in-ci
Authored-by: Weston Pace <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
There are two unrelated pieces of functionality here; I gathered them for the sake of reviewing.
The file `bpacking.h` is quite large, but it has a very repetitive structure.