Merged
53 commits
c7e702b
GH-25025: [C++] Move non core compute kernels into separate shared li…
raulcd Feb 24, 2025
e50a405
Try to fix some CGLib Ruby builds
raulcd Apr 29, 2025
205326d
Try to fix R builds
raulcd Apr 29, 2025
ba2ef33
Add mising arrow_compute to PKG_CFLAGS and PKG_LIBS on configure.win …
raulcd Apr 29, 2025
697cc4d
Try fixing static link with pkg-config
raulcd Apr 30, 2025
01c63da
Try fixing pkg-config to add arrow_compute to acero and dataset
raulcd Apr 30, 2025
27cfa05
Try changing link order
raulcd Apr 30, 2025
3a4089f
Add target_compile_definitions arrow_compute_static
raulcd May 2, 2025
fcfc7be
Fix Arrow compute for PyArrow
raulcd May 5, 2025
55e604e
Move from arrow::compute::RegisterComputeKernels to arrow::compute::I…
raulcd May 6, 2025
235cb38
Some minor comment and restructure improvements
raulcd May 6, 2025
c134c44
Move chunked_internal to main compute sources as originally
raulcd May 6, 2025
75918a3
Fix c-bridge test to add back RunEndEncode/Decode kernels as part of …
raulcd May 6, 2025
a66969c
Fix symbol visibility and minor docstring/comment updates
raulcd May 6, 2025
b79b98f
Move ree_util_internal to compute not on libarrow
raulcd May 6, 2025
b4a0197
Add documentation for the Initialize() function call and move R initi…
raulcd May 6, 2025
febf61a
Add Arrow Compute to the bundled libs for the wheels
raulcd May 6, 2025
50b00bc
Fix docs format
raulcd May 6, 2025
b59eec5
Try using static variable for R Compute Initialize call
raulcd May 6, 2025
35bcf84
Revert back where do we initialize Compute on R
raulcd May 7, 2025
9ca85d9
First attempt on debian type packages
raulcd May 7, 2025
404faf9
New attempt to apt and yum linux-packages
raulcd May 7, 2025
d256b4a
Explicitly exclude arrow-compute.pc from libarrow-dev
raulcd May 7, 2025
e2e6563
Remove arrow-compute.pc from libarrow-dev.install
raulcd May 7, 2025
38a9522
Try removing probably unnecessary libs from arrow_compute_core_target
raulcd May 8, 2025
c470e0a
Re-add XSIMD and Opentelemetry
raulcd May 8, 2025
b82b8bd
Prioritize using arrow/compute/kernels/api.h instead of arrow/compute…
raulcd May 8, 2025
26b86e9
Tackle some review comments -> remove registry.h and add forward decl…
raulcd May 22, 2025
e44f6cb
Check initialization status for R and Python. Update compute_init_sta…
raulcd May 22, 2025
bb7cb81
Update docs, remove unnecessary TODO and other minor review changes
raulcd May 22, 2025
57d60cd
Move all utility functions from arrow/compute/kernels/codegen_interna…
raulcd May 22, 2025
7fe463a
Export required symbols to libarrow_compute
raulcd May 22, 2025
eb7e722
Revert moving to compute/codegen_internal.cc and only export the requ…
raulcd May 22, 2025
b0dfe7b
Update to void Initialize() from Status Initialize() as we always ret…
raulcd May 23, 2025
4a7a559
Revert "Update to void Initialize() from Status Initialize() as we al…
raulcd May 23, 2025
e0247bf
Add function to validate arrow compute initialization for CGLib
raulcd May 23, 2025
3525cc5
Add garrow_compute_initialize()
kou May 30, 2025
fc99628
Reduce scope
kou May 30, 2025
a91696b
Apply suggestions from code review
raulcd Jun 3, 2025
e665992
Remove RegisterComputeKernels from being exposed on the api.h
raulcd Jun 3, 2025
aec389f
Fix arrow-c-bridge and dlpack tests to correctly link to test librari…
raulcd Jun 3, 2025
aadbc33
Update CInitializeCompute to InitializeCompute and move registration …
raulcd Jun 3, 2025
4fcb991
Use StopIfNotOk helper function instead of manually validating status…
raulcd Jun 3, 2025
1d90aeb
Move arrow/compute/kernels/registry.cc to arrow/compute/initialize.cc…
raulcd Jun 5, 2025
6928271
Try moving R Compute Initialize to static initializer instead of in G…
raulcd Jun 5, 2025
8c50aed
Create function to register compute kernels automatically on Init
raulcd Jun 5, 2025
0fcbdb8
Fix includes and update internal to anonymous namespace
raulcd Jun 6, 2025
eacf4a5
Simplify compute test env (#92)
zanmato1984 Jun 6, 2025
9001ec2
Some updates to R package compute registration
amoeba Jun 6, 2025
4081927
Remove redundant if directive
amoeba Jun 9, 2025
b9e6879
Update cpp/src/arrow/c/bridge_test.cc
raulcd Jun 10, 2025
fd48b7a
Add comment to Initialize to specify that it will only be available i…
raulcd Jun 10, 2025
7526369
Remove now unnecessary check ARROW_VERSION_MAJOR >= 21 for R
raulcd Jun 10, 2025
23 changes: 23 additions & 0 deletions c_glib/arrow-glib/compute.cpp
@@ -36,6 +36,7 @@

#include <arrow/acero/exec_plan.h>
#include <arrow/acero/options.h>
#include <arrow/compute/api.h>

template <typename ArrowType, typename GArrowArrayType>
typename ArrowType::c_type
@@ -160,6 +161,9 @@ G_BEGIN_DECLS
* @title: Computation on data
* @include: arrow-glib/arrow-glib.h
*
* You must call garrow_compute_initialize() explicitly before you use
* computation-related features.
*
* #GArrowExecuteContext is a class to customize how to execute a
* function.
*
@@ -250,6 +254,25 @@ G_BEGIN_DECLS
* There are many functions to compute data on an array.
*/

/**
* garrow_compute_initialize:
* @error: (nullable): Return location for a #GError or %NULL.
*
* You must call this explicitly before you use computation-related
* features.
*
* Returns: %TRUE if initializing the compute module completed successfully,
* %FALSE otherwise.
*
* Since: 21.0.0
*/
gboolean
garrow_compute_initialize(GError **error)
{
auto status = arrow::compute::Initialize();
return garrow::check(error, status, "[compute][initialize]");
}

typedef struct GArrowExecuteContextPrivate_
{
arrow::compute::ExecContext context;
4 changes: 4 additions & 0 deletions c_glib/arrow-glib/compute.h
@@ -25,6 +25,10 @@

G_BEGIN_DECLS

GARROW_AVAILABLE_IN_21_0
gboolean
garrow_compute_initialize(GError **error);

#define GARROW_TYPE_EXECUTE_CONTEXT (garrow_execute_context_get_type())
GARROW_AVAILABLE_IN_1_0
G_DECLARE_DERIVABLE_TYPE(
2 changes: 1 addition & 1 deletion c_glib/arrow-glib/meson.build
@@ -223,7 +223,7 @@ gio = cxx.find_library('gio-2.0', dirs: [gobject_libdir], required: false)
if not gio.found()
gio = dependency('gio-2.0')
endif
-dependencies = [arrow, arrow_acero, gobject, gio]
+dependencies = [arrow_acero, arrow_compute, arrow, gobject, gio]
libarrow_glib = library(
'arrow-glib',
sources: sources + enums,
13 changes: 12 additions & 1 deletion c_glib/meson.build
@@ -147,7 +147,13 @@ if arrow_cpp_build_lib_dir == ''
modules: ['ArrowCUDA::arrow_cuda_shared'],
required: false,
)
-# we do not support compiling GLib without Acero engine
+# we do not support compiling GLib without Compute and Acero engine
+arrow_compute = dependency(
+'arrow-compute',
+'ArrowCompute',
+kwargs: common_args,
+modules: ['ArrowCompute::arrow_compute_shared'],
+)
arrow_acero = dependency(
'arrow-acero',
'ArrowAcero',
@@ -215,6 +221,11 @@ main(void)
dirs: [arrow_cpp_build_lib_dir],
required: false,
)
+arrow_compute = cpp_compiler.find_library(
+'arrow_compute',
+dirs: [arrow_cpp_build_lib_dir],
+required: true,
+)
arrow_acero = cpp_compiler.find_library(
'arrow_acero',
dirs: [arrow_cpp_build_lib_dir],
1 change: 1 addition & 0 deletions c_glib/test/run-test.rb
@@ -31,6 +31,7 @@

Gio = GI.load("Gio")
Arrow = GI.load("Arrow")
Arrow.compute_initialize
module Arrow
class Buffer
alias_method :initialize_raw, :initialize
7 changes: 7 additions & 0 deletions cpp/CMakeLists.txt
@@ -553,6 +553,13 @@ if(ARROW_BUILD_STATIC)
string(APPEND ARROW_ACERO_PC_CFLAGS_PRIVATE " -DARROW_ACERO_STATIC")
endif()

# For arrow-compute.pc.
set(ARROW_COMPUTE_PC_CFLAGS "")
set(ARROW_COMPUTE_PC_CFLAGS_PRIVATE "")
if(ARROW_BUILD_STATIC)
string(APPEND ARROW_COMPUTE_PC_CFLAGS_PRIVATE " -DARROW_COMPUTE_STATIC")
endif()

# For arrow-cuda.pc.
set(ARROW_CUDA_PC_CFLAGS "")
set(ARROW_CUDA_PC_CFLAGS_PRIVATE "")
8 changes: 7 additions & 1 deletion cpp/examples/arrow/CMakeLists.txt
@@ -43,7 +43,13 @@ if(ARROW_SUBSTRAIT)
endif()

if(ARROW_COMPUTE AND ARROW_CSV)
-add_arrow_example(compute_and_write_csv_example)
+if(ARROW_BUILD_SHARED)
+set(COMPUTE_KERNELS_LINK_LIBS arrow_compute_shared)
+else()
+set(COMPUTE_KERNELS_LINK_LIBS arrow_compute_static)
+endif()
+add_arrow_example(compute_and_write_csv_example EXTRA_LINK_LIBS
+${COMPUTE_KERNELS_LINK_LIBS})
endif()

if(ARROW_FLIGHT)
1 change: 1 addition & 0 deletions cpp/examples/arrow/compute_and_write_csv_example.cc
@@ -41,6 +41,7 @@
// in the current directory

arrow::Status RunMain(int argc, char** argv) {
ARROW_RETURN_NOT_OK(arrow::compute::Initialize());
// Make Arrays
arrow::NumericBuilder<arrow::Int64Type> int64_builder;
arrow::BooleanBuilder boolean_builder;
1 change: 1 addition & 0 deletions cpp/examples/arrow/join_example.cc
@@ -82,6 +82,7 @@ arrow::Result<std::shared_ptr<arrow::dataset::Dataset>> CreateDataSetFromCSVData
}

arrow::Status DoHashJoin() {
ARROW_RETURN_NOT_OK(arrow::compute::Initialize());
arrow::dataset::internal::Initialize();

ARROW_ASSIGN_OR_RAISE(auto l_dataset, CreateDataSetFromCSVData(true));
2 changes: 1 addition & 1 deletion cpp/examples/tutorial_examples/CMakeLists.txt
@@ -37,7 +37,7 @@ target_link_libraries(file_access_example PRIVATE Arrow::arrow_shared
Parquet::parquet_shared)

add_executable(compute_example compute_example.cc)
-target_link_libraries(compute_example PRIVATE Arrow::arrow_shared)
+target_link_libraries(compute_example PRIVATE ArrowCompute::arrow_compute_shared)

add_executable(dataset_example dataset_example.cc)
target_link_libraries(dataset_example PRIVATE ArrowDataset::arrow_dataset_shared)
3 changes: 3 additions & 0 deletions cpp/examples/tutorial_examples/compute_example.cc
@@ -49,6 +49,9 @@ arrow::Status RunMain() {

schema = arrow::schema({field_a, field_b});

// Initialize the compute module to register the required compute kernels.
ARROW_RETURN_NOT_OK(arrow::compute::Initialize());

std::shared_ptr<arrow::Table> table;
table = arrow::Table::Make(schema, {some_nums, more_nums}, 5);
// (Doc section: Create Tables)
3 changes: 3 additions & 0 deletions cpp/examples/tutorial_examples/dataset_example.cc
@@ -19,6 +19,7 @@

// (Doc section: Includes)
#include <arrow/api.h>
#include <arrow/compute/api.h>
#include <arrow/dataset/api.h>
// We use Parquet headers for setting up examples; they are not required for using
// datasets.
@@ -75,6 +76,8 @@ arrow::Result<std::string> CreateExampleParquetDataset(
}

arrow::Status PrepareEnv() {
// Initialize the compute module to register the required kernels for Dataset
ARROW_RETURN_NOT_OK(arrow::compute::Initialize());
Member

Should this be done implicitly by the Dataset layer? Or is it not possible? cc @bkietz

Member

arrow::dataset::internal::Initialize could certainly call arrow::compute::Initialize. However, that function only registers dataset ExecNode factories and isn't used here. Implicit initialization is something I'd recommend avoiding until we can provide better guarantees.
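
For readers unfamiliar with the trade-off discussed in this thread, the following is a minimal, standard-library-only sketch of the explicit-initialization pattern. It is not Arrow's implementation: `ToyRegistry`, `GetToyRegistry`, and `InitializeToyCompute` are hypothetical names invented for illustration, and the real `arrow::compute::Initialize()` returns an `arrow::Status` rather than a `bool`. The sketch only shows why a caller-invoked, idempotent initializer is easy to reason about: nothing is registered until the caller asks, and repeated calls are harmless.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a function registry. This is NOT
// arrow::compute::FunctionRegistry; it only illustrates the pattern.
class ToyRegistry {
 public:
  void Add(const std::string& name, std::function<int(int)> fn) {
    assert(fn && "a registered kernel must be callable");
    functions_[name] = std::move(fn);
  }
  bool Has(const std::string& name) const { return functions_.count(name) > 0; }
  int Call(const std::string& name, int arg) const {
    return functions_.at(name)(arg);  // throws std::out_of_range if missing
  }
  std::size_t size() const { return functions_.size(); }

 private:
  std::map<std::string, std::function<int(int)>> functions_;
};

// Process-wide registry, constructed on first use.
ToyRegistry& GetToyRegistry() {
  static ToyRegistry registry;
  return registry;
}

// Hypothetical explicit initializer. The function-local static makes the
// registration run exactly once, even if callers invoke this repeatedly.
bool InitializeToyCompute() {
  static const bool done = [] {
    GetToyRegistry().Add("negate", [](int x) { return -x; });
    GetToyRegistry().Add("twice", [](int x) { return 2 * x; });
    return true;
  }();
  return done;
}
```

In this toy version, looking up "negate" before calling `InitializeToyCompute()` finds nothing, which mirrors the requirement stated in this PR that the compute module be initialized before compute features are used.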

// Get our environment prepared for reading, by setting up some quick writing.
ARROW_ASSIGN_OR_RAISE(auto src_table, CreateTable())
std::shared_ptr<arrow::fs::FileSystem> setup_fs;
38 changes: 38 additions & 0 deletions cpp/src/arrow/ArrowComputeConfig.cmake.in
@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# This config sets the following variables in your project::
#
# ArrowCompute_FOUND - true if Arrow Compute found on the system
#
# This config sets the following targets in your project::
#
# ArrowCompute::arrow_compute_shared - for linking as a shared library, if the shared library is built
# ArrowCompute::arrow_compute_static - for linking as a static library, if the static library is built

@PACKAGE_INIT@

include(CMakeFindDependencyMacro)
find_dependency(Arrow CONFIG)

include("${CMAKE_CURRENT_LIST_DIR}/ArrowComputeTargets.cmake")

arrow_keep_backward_compatibility(ArrowCompute arrow_compute)

check_required_components(ArrowCompute)

arrow_show_details(ArrowCompute ARROW_COMPUTE)