From 9d4ccccdfebf4c23bafaf72c837c9aad7984f494 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Thu, 11 Jul 2024 10:57:09 +0200 Subject: [PATCH] MINOR: [Release] Update CHANGELOG.md for 17.0.0 --- CHANGELOG.md | 337 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 337 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6101f5d3cac25..3364204730dd9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,4 +1,341 @@ +# Apache Arrow 17.0.0 (2024-07-01 07:00:00+00:00) + +## Bug Fixes + +* [GH-15053](https://github.com/apache/arrow/issues/15053) - [C++] Add option to string 'center' kernel to control left/right alignment on odd number of padding (#41449) +* [GH-30866](https://github.com/apache/arrow/issues/30866) - [Java] fix SplitAndTransfer throws for (0,0) if vector empty (#41066) +* [GH-34484](https://github.com/apache/arrow/issues/34484) - [Substrait] add an option to disable augmented fields (#41583) +* [GH-37669](https://github.com/apache/arrow/issues/37669) - [C++][Python] Fix casting to extension type with fixed size list storage type (#42219) +* [GH-38553](https://github.com/apache/arrow/issues/38553) - [C++] Replace null_count with MayHaveNulls in ListArrayFromArray and MapArray (#41957) +* [GH-38575](https://github.com/apache/arrow/issues/38575) - [Python] Include metadata when creating pa.schema from PyCapsule (#41538) +* [GH-38770](https://github.com/apache/arrow/issues/38770) - [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971) +* [GH-39129](https://github.com/apache/arrow/issues/39129) - [Python] pa.array: add check for byte-swapped numpy arrays inside python objects (#41549) +* [GH-39489](https://github.com/apache/arrow/issues/39489) - [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType +* [GH-39645](https://github.com/apache/arrow/issues/39645) - [Python] Fix read_table for encrypted parquet (#39438) +* 
[GH-40270](https://github.com/apache/arrow/issues/40270) - [C++] Use LargeStringArray for casting when writing tables to CSV (#40271) +* [GH-40560](https://github.com/apache/arrow/issues/40560) - [Python] RunEndEncodedArray.from_arrays: bugfix for Array arguments (#40560) (#41093) +* [GH-40750](https://github.com/apache/arrow/issues/40750) - [C++][Python] Map child Array constructed from keys and items shouldn't have offset (#40871) +* [GH-40913](https://github.com/apache/arrow/issues/40913) - [C++] Fix compile warning with 'implicitly-defined constructor does not initialize' in encoding_benchmark (#41060) +* [GH-40997](https://github.com/apache/arrow/issues/40997) - [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 (#40998) +* [GH-41112](https://github.com/apache/arrow/issues/41112) - [C++] Clean up unused parameter warnings (#41111) +* [GH-41149](https://github.com/apache/arrow/issues/41149) - [C++][Acero] Fix asof join race (#41614) +* [GH-41164](https://github.com/apache/arrow/issues/41164) - [C#] Fix concatenation of sliced arrays (#41245) +* [GH-41190](https://github.com/apache/arrow/issues/41190) - [C++] support for single threaded joins (#41125) +* [GH-41192](https://github.com/apache/arrow/issues/41192) - [C++] Fix hashjoin benchmark failed at make utf8's random batches (#41195) +* [GH-41198](https://github.com/apache/arrow/issues/41198) - [C#] Fix concatenation of union arrays (#41226) +* [GH-41199](https://github.com/apache/arrow/issues/41199) - [C#] Fix accessing values of a sliced decimal array (#41200) +* [GH-41258](https://github.com/apache/arrow/issues/41258) - [C#][Integration] Fix comparison of sliced validity buffers with non-zero offsets (#41259) +* [GH-41263](https://github.com/apache/arrow/issues/41263) - [C#][Integration] Ensure offset is considered in all branches of the bitmap comparison (#41264) +* [GH-41282](https://github.com/apache/arrow/issues/41282) - [Dev] Always prompt next major version on 
merge script if it exists (#41305) +* [GH-41306](https://github.com/apache/arrow/issues/41306) - [C++] Check to avoid copying when NullBitmapBuffer is Null (#41452) +* [GH-41317](https://github.com/apache/arrow/issues/41317) - [C++] Fix crash on invalid Parquet file (#41366) +* [GH-41319](https://github.com/apache/arrow/issues/41319) - [Python] \`test\_numpy\_array\_protocol\` test failures with numpy 2.0.0rc1 +* [GH-41321](https://github.com/apache/arrow/issues/41321) - [C++][Parquet] More strict Parquet level checking (#41346) +* [GH-41329](https://github.com/apache/arrow/issues/41329) - [C++][Gandiva] Fix gandiva cache size env var (#41330) +* [GH-41340](https://github.com/apache/arrow/issues/41340) - [C++][CMake][Windows] Remove needless .dll suffix from link libraries (#41341) +* [GH-41343](https://github.com/apache/arrow/issues/41343) - [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345) +* [GH-41356](https://github.com/apache/arrow/issues/41356) - [Release][Docs] Update post release documentation task to remove the warnings banner for stable version (#41377) +* [GH-41367](https://github.com/apache/arrow/issues/41367) - [C++][maybe_unused] with Arrow macro (#41359) +* [GH-41371](https://github.com/apache/arrow/issues/41371) - [CI][Release] Use the latest Ruby on macOS (#41379) +* [GH-41390](https://github.com/apache/arrow/issues/41390) - [CI] Use setup-python GitHub action on csharp macOS job (#41392) +* [GH-41397](https://github.com/apache/arrow/issues/41397) - [C#] Downgrade macOS test runner to avoid infrastructure bug (#41934) +* [GH-41418](https://github.com/apache/arrow/issues/41418) - [C++][Large] ListView and Map nested types for scalar_if_else's kernel functions (#41419) +* [GH-41426](https://github.com/apache/arrow/issues/41426) - [R][CI] Install CRAN style openssl on gh runners. 
(#41629) +* [GH-41433](https://github.com/apache/arrow/issues/41433) - [C++][Gandiva] Fix ascii_utf8 function to return same result on x86 and Arm (#41434) +* [GH-41464](https://github.com/apache/arrow/issues/41464) - [Python] Fix StructArray.sort() for by=None (#41495) +* [GH-41467](https://github.com/apache/arrow/issues/41467) - [CI][Release] Don't push conda-verify-rc image (#41468) +* [GH-41470](https://github.com/apache/arrow/issues/41470) - [C++] Reuse deduplication logic for direct registration (#41466) +* [GH-41471](https://github.com/apache/arrow/issues/41471) - [Java] Fix performance uber-jar (#41473) +* [GH-41475](https://github.com/apache/arrow/issues/41475) - [Python] Build with Python 3.13 (#42034) +* [GH-41478](https://github.com/apache/arrow/issues/41478) - [C++] Clean up more redundant move warnings (#41487) +* [GH-41491](https://github.com/apache/arrow/issues/41491) - [Python] remove special methods related to buffers in python <2.6 (#41492) +* [GH-41502](https://github.com/apache/arrow/issues/41502) - [Python] Fix reading column index with decimal values (#41503) +* [GH-41529](https://github.com/apache/arrow/issues/41529) - [C++][Compute] Remove redundant logic for ArrayData as ExecResults in ExecScalarCaseWhen (#41380) +* [GH-41534](https://github.com/apache/arrow/issues/41534) - [Go] Fix mem leak importing 0 length C Array (#41535) +* [GH-41541](https://github.com/apache/arrow/issues/41541) - [Go][Parquet] More fixes for writer performance regression (#42003) +* [GH-41541](https://github.com/apache/arrow/issues/41541) - [Go][Parquet] Fix writer performance regression (#41638) +* [GH-41571](https://github.com/apache/arrow/issues/41571) - [Java] Revert GH-41307 (#41309) (#41628) +* [GH-41573](https://github.com/apache/arrow/issues/41573) - [Java] VectorSchemaRoot uses inefficient stream to copy fieldVectors (#41574) +* [GH-41581](https://github.com/apache/arrow/issues/41581) - [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582) +* 
[GH-41587](https://github.com/apache/arrow/issues/41587) - [Docs][Python] Remove duplicate contents (#41588) +* [GH-41602](https://github.com/apache/arrow/issues/41602) - [C#] Resolve build warnings (#41645) +* [GH-41617](https://github.com/apache/arrow/issues/41617) - [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622) +* [GH-41630](https://github.com/apache/arrow/issues/41630) - [Benchmarking] Fix out-of-source build in benchmarks (#41631) +* [GH-41648](https://github.com/apache/arrow/issues/41648) - [Java] Memory Leak about splitAndTransfer (#41898) +* [GH-41660](https://github.com/apache/arrow/issues/41660) - [CI][Java] Restore devtoolset related GANDIVA_CXX_FLAGS (#41661) +* [GH-41679](https://github.com/apache/arrow/issues/41679) - [Release][Packaging][deb] Update package name in 01-preparesh too (#41859) +* [GH-41684](https://github.com/apache/arrow/issues/41684) - [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757) +* [GH-41686](https://github.com/apache/arrow/issues/41686) - [Java] Nullability of struct child vectors not preserved in TransferPair (#41785) +* [GH-41688](https://github.com/apache/arrow/issues/41688) - [Dev] Include all relevant CMakeLists.txt files in cmake-format precommit hook (#41689) +* [GH-41697](https://github.com/apache/arrow/issues/41697) - [Go][Parquet] Release BufferWriter when BufferedPageWriter is closed (#41698) +* [GH-41699](https://github.com/apache/arrow/issues/41699) - [Python][Parquet] Implement to_dict method on SortingColumn (#41704) +* [GH-41711](https://github.com/apache/arrow/issues/41711) - [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712) +* [GH-41717](https://github.com/apache/arrow/issues/41717) - [Java][Vector] fix issue with ByteBuffer rewind in MessageSerializer (#41718) +* [GH-41720](https://github.com/apache/arrow/issues/41720) - [C++][Acero] Remove a useless parameter for QueryContext::Init called in hash_join_benchmark (#41716) +* 
[GH-41725](https://github.com/apache/arrow/issues/41725) - [Python] CMake: ignore Parquet encryption option if Parquet itself is not enabled (fix Java integration build) (#41776) +* [GH-41735](https://github.com/apache/arrow/issues/41735) - [CI][Archery] Update archery to be compatible with pygit2 1.15 API change (#41739) +* [GH-41738](https://github.com/apache/arrow/issues/41738) - [C++] Fix the issue that temp vector stack may be under sized (#41746) +* [GH-41741](https://github.com/apache/arrow/issues/41741) - [C++] Check that extension metadata key is present before attempting to delete it (#41763) +* [GH-41758](https://github.com/apache/arrow/issues/41758) - [Python] Disallow direct pa.RecordBatchReader() construction to avoid segfaults (#41773) +* [GH-41771](https://github.com/apache/arrow/issues/41771) - [C++] Iterator releases its resource immediately when it reads all values (#41824) +* [GH-41780](https://github.com/apache/arrow/issues/41780) - [C++][Flight][Benchmark] Ensure waiting server ready (#41793) +* [GH-41784](https://github.com/apache/arrow/issues/41784) - [Packaging][RPM] Use SO version for -libs package name (#41838) +* [GH-41787](https://github.com/apache/arrow/issues/41787) - Update fmpp-maven-plugin output directory (#41788) +* [GH-41791](https://github.com/apache/arrow/issues/41791) - [CI][Conda] Update azure.linux.yml task, replace CondaEnvironment@1 with Bash@3 (#41883) +* [GH-41813](https://github.com/apache/arrow/issues/41813) - [C++] Fix avx2 gather offset larger than 2GB in `CompareColumnsToRows` (#42188) +* [GH-41829](https://github.com/apache/arrow/issues/41829) - [R] Update relative URLs in README to absolute paths to prevent CRAN check failures (#41830) +* [GH-41836](https://github.com/apache/arrow/issues/41836) - [Java] Fix an undefined symbol error when ARROW_S3=OFF (#41837) +* [GH-41862](https://github.com/apache/arrow/issues/41862) - [C++][S3] Fix potential deadlock when closing output stream (#41876) +* 
[GH-41884](https://github.com/apache/arrow/issues/41884) - [Python] Fix RecordBatchReader.cast to support casting to equal schema for all types (#42098) +* [GH-41902](https://github.com/apache/arrow/issues/41902) - [Java] Variadic Buffer Counts Incorrect (#41930) +* [GH-41903](https://github.com/apache/arrow/issues/41903) - [CI][GLib] Use the latest Ruby to use OpenSSL 3 (#42001) +* [GH-41920](https://github.com/apache/arrow/issues/41920) - [CI][JS] Add missing build directory argument (#41921) +* [GH-41924](https://github.com/apache/arrow/issues/41924) - [Python] Fix tests when using NumPy 2.0 on Windows (#42099) +* [GH-41964](https://github.com/apache/arrow/issues/41964) - [CI][C++] Clear cache for mamba on AppVeyor (#41977) +* [GH-42005](https://github.com/apache/arrow/issues/42005) - [Java][Integration][CI] Fix ARROW_BUILD_ROOT Path to find pom.xml (#42008) +* [GH-42006](https://github.com/apache/arrow/issues/42006) - [CI][Python] Use pip install -e instead of setup.py build_ext --inplace for installing pyarrow on verification script (#42007) +* [GH-42015](https://github.com/apache/arrow/issues/42015) - [MATLAB] Executing `tfeather.m` test class causes MATLAB to crash on `windows-2022` after MSVC update from 14.39.33519 to 14.40.33807 (#42123) +* [GH-42017](https://github.com/apache/arrow/issues/42017) - [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022) +* [GH-42039](https://github.com/apache/arrow/issues/42039) - [Docs][Go] Fix broken link (#42040) +* [GH-42041](https://github.com/apache/arrow/issues/42041) - [Swift] Fix nullable type decoder issue (#42043) +* [GH-42065](https://github.com/apache/arrow/issues/42065) - [C++] Support list-views on list_slice (#42067) +* [GH-42104](https://github.com/apache/arrow/issues/42104) - [C++] Fix an OTel test failure and remove needless logs (#42122) +* [GH-42107](https://github.com/apache/arrow/issues/42107) - [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol (#42108) +* 
[GH-42116](https://github.com/apache/arrow/issues/42116) - [C++] Support list-view typed arrays in array_take and array_filter (#42117) +* [GH-42130](https://github.com/apache/arrow/issues/42130) - [GLib] Fix building gir files with MSVC (#42131) +* [GH-42136](https://github.com/apache/arrow/issues/42136) - [CI][Go][Java][JS] Use AMD64-based macOS explicitly (#42175) +* [GH-42139](https://github.com/apache/arrow/issues/42139) - [C++] Fix some potential uninitialized variable warnings (#42207) +* [GH-42140](https://github.com/apache/arrow/issues/42140) - [C++] Avoid invalid accesses in parquet-encoding-benchmark (#42141) +* [GH-42149](https://github.com/apache/arrow/issues/42149) - [C++] Use FetchContent for bundled ORC (#43011) +* [GH-42170](https://github.com/apache/arrow/issues/42170) - [Python][CI] Update expected output for numpy 2.0.0 (#42172) +* [GH-42197](https://github.com/apache/arrow/issues/42197) - [CI][Packaging][Java] Ensure updating "python@*" formulae on macOS (#42202) +* [GH-42198](https://github.com/apache/arrow/issues/42198) - [C++] Fix GetRecordBatchPayload crashes for device data (#42199) +* [GH-42208](https://github.com/apache/arrow/issues/42208) - [Java] Fix the Test in flight-sql-jdbc-driver Module (#42217) +* [GH-42213](https://github.com/apache/arrow/issues/42213) - [Swift] Use "--warnings-as-errors" only on CI (#42214) +* [GH-42220](https://github.com/apache/arrow/issues/42220) - [R] handle vctrs_rcrd extension type in metadata cleaning (#42226) +* [GH-42224](https://github.com/apache/arrow/issues/42224) - [Java] Fix Typo in TestAceroSubstraitConsumer Test Method (#42225) +* [GH-42232](https://github.com/apache/arrow/issues/42232) - [C++] Use non-stale c-ares download URL (#42250) +* [GH-42234](https://github.com/apache/arrow/issues/42234) - [CI][R] Disable libarrow binary use on valgrind tests (#42249) +* [GH-43048](https://github.com/apache/arrow/issues/43048) - [JAVA] Fix IndexOutOfBoundsException message by reporting index correctly 
(#43049) +* [GH-43058](https://github.com/apache/arrow/issues/43058) - [C#] Revert upgrade of Xunit from 2.8.0 to 2.8.1 (#43074) +* [GH-43059](https://github.com/apache/arrow/issues/43059) - [CI][Gandiva] Disable Python Gandiva tests on AlmaLinux 8 (#43093) +* [GH-43062](https://github.com/apache/arrow/issues/43062) - [Go] Use calloc instead of malloc (#43052) +* [GH-43070](https://github.com/apache/arrow/issues/43070) - [C++][Parquet] Check for valid ciphertext length to prevent segfault (#43071) +* [GH-43116](https://github.com/apache/arrow/issues/43116) - [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as large memory test (#43128) +* [GH-43119](https://github.com/apache/arrow/issues/43119) - [CI][Packaging] Update manylinux 2014 CentOS repos that have been deprecated (#43121) +* [GH-43122](https://github.com/apache/arrow/issues/43122) - [CI][Packaging][RPM][CentOS] Use vault.centos.org for SCL (#43127) +* [GH-43134](https://github.com/apache/arrow/issues/43134) - [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136) +* [GH-43158](https://github.com/apache/arrow/issues/43158) - [Packaging] Use bundled nlohmann/json on AlmaLinux 8/CentOS Stream 8 (#43159) +* [GH-43199](https://github.com/apache/arrow/issues/43199) - [CI][Packaging] dev/release/utils-create-release-tarball.sh should not include the release candidate number in the name of the tarball's top-level directory. 
(#43200) +* [GH-43204](https://github.com/apache/arrow/issues/43204) - [CI][Packaging] Apply vcpkg patch to fix Thrift version (#43208) + + +## New Features and Improvements + +* [GH-29537](https://github.com/apache/arrow/issues/29537) - [R] Support mutate/summarize with implicit join (#41350) +* [GH-33484](https://github.com/apache/arrow/issues/33484) - [C++][Compute] Implement `Grouper::Reset` (#41352) +* [GH-35804](https://github.com/apache/arrow/issues/35804) - [CI][Packaging][Conan] Synchronize upstream conan (#39729) +* [GH-35888](https://github.com/apache/arrow/issues/35888) - [Java] Add FlightStatusCode.RESOURCE_EXHAUSTED (#41508) +* [GH-37333](https://github.com/apache/arrow/issues/37333) - [Python] Replace pandas.util.testing.rands with vendored version (#42089) +* [GH-37720](https://github.com/apache/arrow/issues/37720) - [Go][FlightSQL] Add prepared statement handle to DoPut result (#40311) +* [GH-37728](https://github.com/apache/arrow/issues/37728) - [Java] Add methods to get an Iterable for a ValueVector (#41895) +* [GH-37929](https://github.com/apache/arrow/issues/37929) - [Python] begin moving static settings to pyproject.toml (#41041) +* [GH-37938](https://github.com/apache/arrow/issues/37938) - [Swift] Add initial C data interface implementation (#41342) +* [GH-38255](https://github.com/apache/arrow/issues/38255) - [Go][C++] Implement Flight SQL Bulk Ingestion (#38385) +* [GH-38325](https://github.com/apache/arrow/issues/38325) - [Python] Implement PyCapsule interface for Device data in PyArrow (#40717) +* [GH-38325](https://github.com/apache/arrow/issues/38325) - [Python] Expand the Arrow PyCapsule Interface with C Device Data support (#40708) +* [GH-38692](https://github.com/apache/arrow/issues/38692) - [C#] Implement ICollection on scalar arrays (#41539) +* [GH-39204](https://github.com/apache/arrow/issues/39204) - [Format][FlightRPC][Docs] Stabilize Flight SQL (#41657) +* [GH-39220](https://github.com/apache/arrow/issues/39220) - [Python] Let 
RecordBatch.filter accept a boolean expression in addition to mask array (#43043) +* [GH-39301](https://github.com/apache/arrow/issues/39301) - [Archery][CI][Integration] Add nanoarrow to archery + integration setup (#39302) +* [GH-39344](https://github.com/apache/arrow/issues/39344) - [C++][FS][Azure] Support azure cli auth (#41976) +* [GH-39345](https://github.com/apache/arrow/issues/39345) - [C++][FS][Azure] Add support for environment credential (#41715) +* [GH-39649](https://github.com/apache/arrow/issues/39649) - [Java][CI] Fix or suppress spurious errorprone warnings stage 2 (#39777) +* [GH-39722](https://github.com/apache/arrow/issues/39722) - [JS] Clean up packaging (#39723) +* [GH-39798](https://github.com/apache/arrow/issues/39798) - [C++] Optimize Take for fixed-size types including nested fixed-size lists (#41297) +* [GH-39858](https://github.com/apache/arrow/issues/39858) - [C++][Device] Add Copy/View slice functions to a CPU pointer (#41477) +* [GH-39898](https://github.com/apache/arrow/issues/39898) - [C++] Add support for OpenTelemetry logging (#39905) +* [GH-39990](https://github.com/apache/arrow/issues/39990) - [Docs][CI] Add sphinx-lint for docs linting (#40022) +* [GH-40078](https://github.com/apache/arrow/issues/40078) - [C++] Import/Export ArrowDeviceArrayStream (#40807) +* [GH-40339](https://github.com/apache/arrow/issues/40339) - [Java] StringView Initial Implementation (#40340) +* [GH-40342](https://github.com/apache/arrow/issues/40342) - [Python] Fix pickling of LocalFileSystem for cython 2 (#41459) +* [GH-40342](https://github.com/apache/arrow/issues/40342) - [C++] move LocalFileSystem to the registry (#40356) +* [GH-40361](https://github.com/apache/arrow/issues/40361) - [C++] Make flatbuffers serialization more deterministic (#40392) +* [GH-40384](https://github.com/apache/arrow/issues/40384) - [Python] Expand the C Device Interface bindings to support import on CUDA device (#40385) +* 
[GH-40494](https://github.com/apache/arrow/issues/40494) - [Go] add support for protobuf messages (#40496) +* [GH-40644](https://github.com/apache/arrow/issues/40644) - [Python] Allow passing a mapping of column names to `rename_columns` (#40645) +* [GH-40734](https://github.com/apache/arrow/issues/40734) - [Packaging][Debian] Drop support for Debian bullseye (#41394) +* [GH-40749](https://github.com/apache/arrow/issues/40749) - [Python][Packaging] Strip unnecessary symbols when building wheels (#42028) +* [GH-40819](https://github.com/apache/arrow/issues/40819) - [Java] Adding Spotless to Algorithm module (#41825) +* [GH-40820](https://github.com/apache/arrow/issues/40820) - [Java] Adding Spotless to Adapter module (#42048) +* [GH-40822](https://github.com/apache/arrow/issues/40822) - [Java] Adding Spotless to C module (#42059) +* [GH-40823](https://github.com/apache/arrow/issues/40823) - [Java] Adding Spotless to Compression module (#42060) +* [GH-40824](https://github.com/apache/arrow/issues/40824) - [Java] Adding Spotless to Dataset module (#42062) +* [GH-40825](https://github.com/apache/arrow/issues/40825) - [Java] Adding Spotless to Flight module (#42063) +* [GH-40826](https://github.com/apache/arrow/issues/40826) - [Java] Adding Spotless to Format module +* [GH-40827](https://github.com/apache/arrow/issues/40827) - [Java] Adding Spotless to Gandiva module (#42055) +* [GH-40828](https://github.com/apache/arrow/issues/40828) - [Java] Format arrow-maven-plugins modules (#42054) +* [GH-40829](https://github.com/apache/arrow/issues/40829) - [Java] Adding Spotless to Memory modules (#42056) +* [GH-40830](https://github.com/apache/arrow/issues/40830) - [Java] Adding Spotless to Performance module (#42057) +* [GH-40831](https://github.com/apache/arrow/issues/40831) - [Java] Adding Spotless to Tools module (#42058) +* [GH-40832](https://github.com/apache/arrow/issues/40832) - [Java] Adding Spotless to Vector module (#42061) +* 
[GH-40930](https://github.com/apache/arrow/issues/40930) - [Java] Implement a function to retrieve reference buffers in StringView (#41796) +* [GH-40932](https://github.com/apache/arrow/issues/40932) - [Java] Implement TransferPair functionality for StringView (#41861) +* [GH-40933](https://github.com/apache/arrow/issues/40933) - [Java] Enhance the copyFrom* functionality in StringView (#41752) +* [GH-40942](https://github.com/apache/arrow/issues/40942) - [Java] Implement C Data Interface for StringView (#41967) +* [GH-40943](https://github.com/apache/arrow/issues/40943) - [Java] Implement RangeEqualsVisitor for StringView (#41636) +* [GH-40944](https://github.com/apache/arrow/issues/40944) - [Java] Implement TypeEqualsVisitor for StringView (#41606) +* [GH-40968](https://github.com/apache/arrow/issues/40968) - [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970) +* [GH-41020](https://github.com/apache/arrow/issues/41020) - [C++] Introduce portable compiler assumptions (#41021) +* [GH-41035](https://github.com/apache/arrow/issues/41035) - [C++] Add a grouper benchmark for preventing performance regression (#41036) +* [GH-41055](https://github.com/apache/arrow/issues/41055) - [C++] Support flatten for combining nested list related types (#41092) +* [GH-41085](https://github.com/apache/arrow/issues/41085) - [CI][Java] Add Spark integration tests to "java" group in Crossbow tasks (#41086) +* [GH-41089](https://github.com/apache/arrow/issues/41089) - [C++] Clean up remaining tasks related to half float casts (#41084) +* [GH-41095](https://github.com/apache/arrow/issues/41095) - [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276) +* [GH-41102](https://github.com/apache/arrow/issues/41102) - [Packaging][Release] Create unique git tags for release candidates (e.g. 
apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM}) (#41131) +* [GH-41105](https://github.com/apache/arrow/issues/41105) - [Python][Docs] Update PyArrow installation docs for conda package split (#41135) +* [GH-41114](https://github.com/apache/arrow/issues/41114) - [C++] Add is_validity_defined_by_bitmap() predicate (#41115) +* [GH-41116](https://github.com/apache/arrow/issues/41116) - [C++] IO: enhance boundary checking in CompressedInputStream (#41117) +* [GH-41126](https://github.com/apache/arrow/issues/41126) - [Python] Basic bindings for Device and MemoryManager classes (#41685) +* [GH-41134](https://github.com/apache/arrow/issues/41134) - [GLib] Support building arrow-glib with MSVC (#41599) +* [GH-41159](https://github.com/apache/arrow/issues/41159) - [Go][Parquet] Improvement Parquet BitWriter WriteVlqInt Performance (#41160) +* [GH-41173](https://github.com/apache/arrow/issues/41173) - [Java] Add spotless configuration for Maven pom.xml files (#41174) +* [GH-41183](https://github.com/apache/arrow/issues/41183) - [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295) +* [GH-41186](https://github.com/apache/arrow/issues/41186) - [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187) +* [GH-41203](https://github.com/apache/arrow/issues/41203) - [Python][Packaging] Ensure to build with released numpy 2.0 (instead of RC) in the wheel building workflows (#42194) +* [GH-41240](https://github.com/apache/arrow/issues/41240) - [Release][Packaging] Use Debian bookworm for uploading binaries (#41241) +* [GH-41243](https://github.com/apache/arrow/issues/41243) - [Release][Packaging] Avoid needless download by "archery crossbow download-artifacts" (#41244) +* [GH-41256](https://github.com/apache/arrow/issues/41256) - [Format][Docs] Add a canonical extension type specification for JSON (#41257) +* [GH-41262](https://github.com/apache/arrow/issues/41262) - [Java][FlightSQL] Implement stateless prepared 
statements (#41237) +* [GH-41287](https://github.com/apache/arrow/issues/41287) - [Java] ListViewVector Implementation (#41285) +* [GH-41298](https://github.com/apache/arrow/issues/41298) - [Format][Docs] Add a canonical extension type specification for UUID (#41299) +* [GH-41301](https://github.com/apache/arrow/issues/41301) - [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373) +* [GH-41307](https://github.com/apache/arrow/issues/41307) - [Java] Use org.apache:apache parent pom version 31 (#41772) +* [GH-41307](https://github.com/apache/arrow/issues/41307) - [Java] Use org.apache:apache parent pom version 31 (#41309) +* [GH-41314](https://github.com/apache/arrow/issues/41314) - [CI][Python] Add a job on ARM64 macOS (#41313) +* [GH-41316](https://github.com/apache/arrow/issues/41316) - [CI][Python] Reduce CI time on macOS (#41378) +* [GH-41323](https://github.com/apache/arrow/issues/41323) - [R] Redo how summarize() evaluates expressions (#41223) +* [GH-41327](https://github.com/apache/arrow/issues/41327) - [Ruby] Show type name in Arrow::Table#to_s (#41328) +* [GH-41334](https://github.com/apache/arrow/issues/41334) - [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335) +* [GH-41349](https://github.com/apache/arrow/issues/41349) - [C#] Optimize DecimalUtility.GetBytes(SqlDecimal) on .NET 7+ (#42150) +* [GH-41358](https://github.com/apache/arrow/issues/41358) - [R] Support join "na_matches" argument (#41372) +* [GH-41361](https://github.com/apache/arrow/issues/41361) - [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362) +* [GH-41375](https://github.com/apache/arrow/issues/41375) - [C#] Move to .NET 8.0 (#41376) +* [GH-41385](https://github.com/apache/arrow/issues/41385) - [CI][MATLAB][Packaging] Add support for MATLAB `R2024a` in CI and crossbow packaging workflows (#41504) +* [GH-41389](https://github.com/apache/arrow/issues/41389) - [Python] 
Expose byte_width and bit_width of ExtensionType in terms of the storage type (#41413) +* [GH-41400](https://github.com/apache/arrow/issues/41400) - [MATLAB] Bump `libmexclass` version to commit `ca3cea6` (#41436) +* [GH-41410](https://github.com/apache/arrow/issues/41410) - [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411) +* [GH-41420](https://github.com/apache/arrow/issues/41420) - [R] Update NEWS.md for 16.1.0 (#41422) +* [GH-41427](https://github.com/apache/arrow/issues/41427) - [Go] Fix stateless prepared statements (#41428) +* [GH-41430](https://github.com/apache/arrow/issues/41430) - [Docs] Use sphinxcontrib-mermaid instead of generating images from .mmd (#41455) +* [GH-41435](https://github.com/apache/arrow/issues/41435) - [CI][MATLAB] Add job to build and test MATLAB Interface on `macos-14` (#41592) +* [GH-41450](https://github.com/apache/arrow/issues/41450) - [R][CI] rhub/container follow ons (#41451) +* [GH-41460](https://github.com/apache/arrow/issues/41460) - [C++] Use ASAN to poison temp vector stack memory (#41695) +* [GH-41480](https://github.com/apache/arrow/issues/41480) - [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705) +* [GH-41480](https://github.com/apache/arrow/issues/41480) - [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ (#41494) +* [GH-41493](https://github.com/apache/arrow/issues/41493) - [C++][S3] Add a new option to check existence before CreateDir (#41822) +* [GH-41507](https://github.com/apache/arrow/issues/41507) - [MATLAB][CI] Pass `strict: true` to `matlab-actions/run-tests@v2` (#41530) +* [GH-41527](https://github.com/apache/arrow/issues/41527) - [CI][Dev] Remove unnecessary requirements for six (#43087) +* [GH-41531](https://github.com/apache/arrow/issues/41531) - [MATLAB][Packaging] Bump `matlab-actions/setup-matlab` and `matlab-actions/run-command` from `v1` to `v2` in the 
`crossbow` job (#41532) +* [GH-41540](https://github.com/apache/arrow/issues/41540) - [R] Simplify arrow_eval() logic and bindings environments (#41537) +* [GH-41545](https://github.com/apache/arrow/issues/41545) - [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546) +* [GH-41547](https://github.com/apache/arrow/issues/41547) - [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548) +* [GH-41558](https://github.com/apache/arrow/issues/41558) - [C++] Improve fixed_width_test_util.h (#41575) +* [GH-41560](https://github.com/apache/arrow/issues/41560) - [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561) +* [GH-41590](https://github.com/apache/arrow/issues/41590) - [Java] Improve BaseRepeatedValueVector function on isEmpty and isNull operations (#41601) +* [GH-41596](https://github.com/apache/arrow/issues/41596) - [C++] fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597) +* [GH-41608](https://github.com/apache/arrow/issues/41608) - [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633) +* [GH-41611](https://github.com/apache/arrow/issues/41611) - [Docs][CI] Enable most sphinx-lint rules for documentation (#41612) +* [GH-41620](https://github.com/apache/arrow/issues/41620) - [Docs] Document merge.conf usage (#41621) +* [GH-41626](https://github.com/apache/arrow/issues/41626) - [R][CI] Update OpenSUSE to 15.5 from 15.3 (#41627) +* [GH-41652](https://github.com/apache/arrow/issues/41652) - [C++][CMake][Windows] Don't build needless object libraries (#41658) +* [GH-41653](https://github.com/apache/arrow/issues/41653) - [MATLAB] Add new `arrow.c.Array` MATLAB class which wraps a C Data Interface format `ArrowArray` C struct (#41655) +* [GH-41654](https://github.com/apache/arrow/issues/41654) - [MATLAB] Add new `arrow.c.Schema` MATLAB class which wraps a C Data Interface format `ArrowSchema` C struct (#41674) +* [GH-41656](https://github.com/apache/arrow/issues/41656) - 
[MATLAB] Add C Data Interface format import/export functionality for `arrow.array.Array` (#41737) +* [GH-41662](https://github.com/apache/arrow/issues/41662) - [Python] Ensure Buffer methods don't crash with non-CPU data (#41889) +* [GH-41664](https://github.com/apache/arrow/issues/41664) - [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010) +* [GH-41675](https://github.com/apache/arrow/issues/41675) - [Packaging][MATLAB] Add crossbow job to package MATLAB interface on macos-14 (#41677) +* [GH-41681](https://github.com/apache/arrow/issues/41681) - [GLib] Generate separate version macros for each GLib library (#41721) +* [GH-41691](https://github.com/apache/arrow/issues/41691) - [Doc] Remove notion of "logical type" (#41958) +* [GH-41702](https://github.com/apache/arrow/issues/41702) - [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703) +* [GH-41726](https://github.com/apache/arrow/issues/41726) - [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727) +* [GH-41730](https://github.com/apache/arrow/issues/41730) - [Java] Adding variadicBufferCounts to RecordBatch (#41732) +* [GH-41748](https://github.com/apache/arrow/issues/41748) - [Python][Parquet] Update BYTE_STREAM_SPLIT description in write_table() docstring (#41759) +* [GH-41749](https://github.com/apache/arrow/issues/41749) - [GLib] Allow getting a RecordBatchReader from a Dataset or Scanner (#41750) +* [GH-41755](https://github.com/apache/arrow/issues/41755) - [C++][ORC] Ensure setting detected ORC version (#41767) +* [GH-41760](https://github.com/apache/arrow/issues/41760) - [C++][Parquet] Add file metadata read/write benchmark (#41761) +* [GH-41770](https://github.com/apache/arrow/issues/41770) - [CI][GLib] Remove temporary files explicitly (#41807) +* [GH-41783](https://github.com/apache/arrow/issues/41783) - [C++] Make git-dependent definitions internal (#41781) +* 
[GH-41789](https://github.com/apache/arrow/issues/41789) - [Java] Clean up immutables and checkerframework dependencies (#41790) +* [GH-41797](https://github.com/apache/arrow/issues/41797) - [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798) +* [GH-41799](https://github.com/apache/arrow/issues/41799) - [Java] Migrate to com.gradle:develocity-maven-extension (#41800) +* [GH-41803](https://github.com/apache/arrow/issues/41803) - [MATLAB] Add C Data Interface format import/export functionality for `arrow.tabular.RecordBatch` (#41817) +* [GH-41804](https://github.com/apache/arrow/issues/41804) - [Swift] Add Struct (Nested) type (#43082) +* [GH-41806](https://github.com/apache/arrow/issues/41806) - [GLib][CI] Use vcpkg for C++ dependencies when building GLib libraries with MSVC (#41839) +* [GH-41818](https://github.com/apache/arrow/issues/41818) - [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819) +* [GH-41834](https://github.com/apache/arrow/issues/41834) - [R] Better error handling in dplyr code (#41576) +* [GH-41841](https://github.com/apache/arrow/issues/41841) - [R][CI] Remove more defunct rhub containers (#41828) +* [GH-41887](https://github.com/apache/arrow/issues/41887) - [Go] Run linter via pre-commit (#41888) +* [GH-41899](https://github.com/apache/arrow/issues/41899) - [C++] IPC: Minor enhance the code of writer (#41900) +* [GH-41905](https://github.com/apache/arrow/issues/41905) - [JS] Update dependencies (#41906) +* [GH-41910](https://github.com/apache/arrow/issues/41910) - [Python] Add support for Pyodide (#37822) +* [GH-41923](https://github.com/apache/arrow/issues/41923) - [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925) +* [GH-41929](https://github.com/apache/arrow/issues/41929) - [Java] pom.xml license formatting (#42049) +* [GH-41945](https://github.com/apache/arrow/issues/41945) - [Swift] Add interface ArrowArrayHolderBuilder (#41946) +* 
[GH-41947](https://github.com/apache/arrow/issues/41947) - [Java] Support catalog in JDBC driver with session options (#42035) +* [GH-41952](https://github.com/apache/arrow/issues/41952) - [R] Turn S3 and ZSTD on by default for macOS (#42210) +* [GH-41953](https://github.com/apache/arrow/issues/41953) - [C++] Minor enhance code style for FixedShapeTensorType (#41954) +* [GH-41955](https://github.com/apache/arrow/issues/41955) - [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956) +* [GH-41960](https://github.com/apache/arrow/issues/41960) - Expose new S3 option check_directory_existence_before_creation (#41972) +* [GH-41968](https://github.com/apache/arrow/issues/41968) - [Java] Implement TransferPair functionality for BinaryView (#41980) +* [GH-41970](https://github.com/apache/arrow/issues/41970) - [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971) +* [GH-41978](https://github.com/apache/arrow/issues/41978) - [Python] Fix pandas tests to follow downstream datetime64 unit changes (#41979) +* [GH-41983](https://github.com/apache/arrow/issues/41983) - [Dev] Run issue labeling bot only when opening an issue (not editing) (#41986) +* [GH-41994](https://github.com/apache/arrow/issues/41994) - [C++] : kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995) +* [GH-41999](https://github.com/apache/arrow/issues/41999) - [Swift] Add methods for adding array and vargs to arrow array (#42000) +* [GH-42002](https://github.com/apache/arrow/issues/42002) - [Java] Update Unit Tests for Vector Module (#42019) +* [GH-42013](https://github.com/apache/arrow/issues/42013) - [Python] Allow Array.filter() to take general array input (#42051) +* [GH-42016](https://github.com/apache/arrow/issues/42016) - [Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings (#42103) +* [GH-42020](https://github.com/apache/arrow/issues/42020) - [Swift] Add Arrow decoding 
implementation for Swift Codable (#42023) +* [GH-42021](https://github.com/apache/arrow/issues/42021) - [Swift] Add Arrow encoder implementation for Swift Codable (#43063) +* [GH-42025](https://github.com/apache/arrow/issues/42025) - [Java] Update Unit Tests for Algorithm Module (#42029) +* [GH-42030](https://github.com/apache/arrow/issues/42030) - [Java] Update Unit Tests for Adapter Module (#42038) +* [GH-42042](https://github.com/apache/arrow/issues/42042) - [Java] Update Unit Tests for Compressions Module (#42044) +* [GH-42045](https://github.com/apache/arrow/issues/42045) - [Java] Update Unit Tests for Flight Module (#42158) +* [GH-42087](https://github.com/apache/arrow/issues/42087) - [Swift] refactored to remove build warnings (#42088) +* [GH-42092](https://github.com/apache/arrow/issues/42092) - [Java] Update Unit Tests for Tools Module (#42093) +* [GH-42100](https://github.com/apache/arrow/issues/42100) - [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981) +* [GH-42101](https://github.com/apache/arrow/issues/42101) - [Java] Create File for Output Validation in FileRoundtrip (#42115) +* [GH-42109](https://github.com/apache/arrow/issues/42109) - [C++][CMake] Add preset for Valgrind (#42110) +* [GH-42112](https://github.com/apache/arrow/issues/42112) - [Python] Array gracefully fails on non-cpu device (#42113) +* [GH-42121](https://github.com/apache/arrow/issues/42121) - [Java] Cleanup spotless plugin configuration (#43019) +* [GH-42124](https://github.com/apache/arrow/issues/42124) - [Swift] Add methods for loading and validating builder by type (#42195) +* [GH-42126](https://github.com/apache/arrow/issues/42126) - [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127) +* [GH-42128](https://github.com/apache/arrow/issues/42128) - [Packaging][CentOS] Migrate CentOS 7 and CentOS Stream 8 packaging jobs to use vault.centos.org (#42129) +* [GH-42134](https://github.com/apache/arrow/issues/42134) - 
[C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135) +* [GH-42143](https://github.com/apache/arrow/issues/42143) - [R] Sanitize R metadata (#41969) +* [GH-42146](https://github.com/apache/arrow/issues/42146) - [MATLAB] Add IPC `RecordBatchFileReader` and `RecordBatchFileWriter` MATLAB classes (#42201) +* [GH-42162](https://github.com/apache/arrow/issues/42162) - [Java] Update Unit Tests for Dataset Module (#42163) +* [GH-42164](https://github.com/apache/arrow/issues/42164) - [Java] Update Unit Tests for Gandiva Module (#42166) +* [GH-42165](https://github.com/apache/arrow/issues/42165) - [Java] Update Unit Tests for Memory Module (#42161) +* [GH-42167](https://github.com/apache/arrow/issues/42167) - [CI] Upgrade the version of vcpkg in .env (#42171) +* [GH-42168](https://github.com/apache/arrow/issues/42168) - [Python][Parquet] Pyarrow store decimal as integer (#42169) +* [GH-42190](https://github.com/apache/arrow/issues/42190) - [Python] Add CI job for Numpy 1.X (#42189) +* [GH-42193](https://github.com/apache/arrow/issues/42193) - [Java] Update dependency to maintain JUnit 5 only (#42206) +* [GH-42228](https://github.com/apache/arrow/issues/42228) - [CI][Java] Suppress transfer progress log in java-jars (#42230) +* [GH-42235](https://github.com/apache/arrow/issues/42235) - [C++] list_parent_indices: Add support for list-view types (#42236) +* [GH-42243](https://github.com/apache/arrow/issues/42243) - [Swift] Update isValidBuilderType to not required instance of type (#42244) +* [GH-42245](https://github.com/apache/arrow/issues/42245) - [Swift] Ensure map behavior is the same for all key types (#42246) +* [GH-43020](https://github.com/apache/arrow/issues/43020) - [Java] Simplify flight.properties generation (#43028) +* [GH-43033](https://github.com/apache/arrow/issues/43033) - [CI][Docker] Enable linter for python-wheel-windows-test-vs2019 (#43034) +* [GH-43040](https://github.com/apache/arrow/issues/43040) - [C++] Reduce the recursion of 
many-join test (#43042) +* [GH-43045](https://github.com/apache/arrow/issues/43045) - [CI][Python] Pin openjdk=17 in python substrait integration (#43051) +* [GH-43060](https://github.com/apache/arrow/issues/43060) - [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064) +* [GH-43076](https://github.com/apache/arrow/issues/43076) - [C#] Upgrade Xunit and change how Python integration tests are skipped (#43091) + + + # Apache Arrow 6.0.1 (2021-11-18) ## Bug Fixes