Closed
57 commits
69a9a1c
ARROW-11303: [Release][C++] Enable mimalloc in the windows verificati…
kszucs Jan 18, 2021
903b41c
ARROW-11309: [Release][C#] Use .NET 3.1 for verification
kou Jan 19, 2021
19e9559
ARROW-11315: [Packaging][APT][arm64] Add missing gir1.2 files
kou Jan 19, 2021
17a3fab
ARROW-11314: [Release][APT][Yum] Add support for verifying arm64 pack…
kou Jan 19, 2021
275fda1
ARROW-7633: [C++][CI] Create fuzz targets for tensors and sparse tensors
mrkn Jan 19, 2021
2d3e8f9
ARROW-11246: [Rust] Add type to Unexpected accumulator state error
ovr Jan 19, 2021
e20f439
ARROW-11254: [Rust][DataFusion] Add SIMD and snmalloc flags as option…
Dandandan Jan 19, 2021
18dc62c
ARROW-11074: [Rust][DataFusion] Implement predicate push-down for par…
yordan-pavlov Jan 19, 2021
127961a
ARROW-10489: [C++] Add Intel C++ compiler options for different warni…
jcmuel Jan 19, 2021
0e5d646
ARROW-9128: [C++] Implement string space trimming kernels: trim, ltri…
maartenbreddels Jan 19, 2021
f63cffa
ARROW-11305 Skip first argument (which is the program name) in parque…
jhorstmann Jan 19, 2021
7e0cb0a
ARROW-11108: [Rust] Fixed performance issue in mutableBuffer.
jorgecarleitao Jan 19, 2021
b448de7
ARROW-11216: [Rust] add doc example for StringDictionaryBuilder
alamb Jan 19, 2021
4a6eb19
ARROW-11268: [Rust][DataFusion] MemTable::load output partition support
Dandandan Jan 19, 2021
a4266a1
ARROW-11321: [Rust][DataFusion] Fix DataFusion compilation error
Dandandan Jan 19, 2021
bbc9029
ARROW-11156: [Rust][DataFusion] Create hashes vectorized in hash join
Dandandan Jan 19, 2021
8e218e0
ARROW-11313: [Rust] Fixed size_hint
jorgecarleitao Jan 19, 2021
35053fe
ARROW-11222: [Rust] Catch up with flatbuffers 0.8.1 which had some UB…
mqy Jan 19, 2021
50ba534
ARROW-11277: [C++] Workaround macOS 10.11: don't default construct co…
bkietz Jan 19, 2021
a7633c7
ARROW-11322: [Rust] Re-opening `memory` module as public
maxburke Jan 20, 2021
555643a
ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader
nevi-me Jan 20, 2021
e7c69e6
ARROW-11279: [Rust][Parquet] ArrowWriter Definition Levels Memory Usage
Jan 20, 2021
71572bd
ARROW-11318: [Rust] Support pretty printing timestamp, date, and time…
alamb Jan 20, 2021
ed709e0
ARROW-11311: [Rust] Fixed unset_bit
jorgecarleitao Jan 20, 2021
01c5aec
ARROW-11265: [Rust] Made bool not ArrowNativeType
jorgecarleitao Jan 20, 2021
6912869
ARROW-11290: [Rust][DataFusion] Address hash aggregate performance is…
Dandandan Jan 20, 2021
23550c2
ARROW-11149: [Rust] DF Support List/LargeList/FixedSizeList in create…
ovr Jan 20, 2021
a0e1244
ARROW-11329: [Rust] Don't rerun build.rs on every file change
mbrubeck Jan 20, 2021
8b56f85
ARROW-11220: [Rust] Implement GROUP BY support for Boolean
ovr Jan 21, 2021
4601c02
ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expre…
alamb Jan 21, 2021
84126d5
ARROW-11323: [Rust][DataFusion] Allow sort queries to return no results
alamb Jan 21, 2021
bd90043
ARROW-10831: [C++][Compute] Implement quantile kernel
cyb70289 Jan 21, 2021
72bf95a
ARROW-11334: [Python][CI] Fix failing pandas nightly tests
jorisvandenbossche Jan 21, 2021
bc5d8bf
ARROW-11320: [C++] Try to strengthen temporary dir creation
pitrou Jan 21, 2021
c413566
ARROW-11141: [Rust] Add basic Miri checks to CI pipeline
vertexclique Jan 21, 2021
6959e46
ARROW-11337: [C++] Compilation error with ThreadSanitizer
westonpace Jan 22, 2021
499b6d0
ARROW-11333: [Rust] Generalized creation of empty arrays.
jorgecarleitao Jan 22, 2021
629a6fd
ARROW-10299: [Rust] Use IPC Metadata V5 as default
nevi-me Jan 22, 2021
457fa91
ARROW-11343: [Rust][DataFusion] Simplified example with UDF.
jorgecarleitao Jan 22, 2021
251ecac
ARROW-10766: [Rust] [Parquet] Compute nested list definitions
nevi-me Jan 22, 2021
262bbdc
ARROW-11332: [Rust] Use MutableBuffer in take_string instead of Vec
Dandandan Jan 22, 2021
b44a4ad
ARROW-11299: [Python] Fix invalid-offsetof warnings
cyb70289 Jan 22, 2021
13e2134
ARROW-11291: [Rust] Add extend to MutableBuffer (-20% for arithmetic,…
jorgecarleitao Jan 23, 2021
67d0c2e
ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record b…
alamb Jan 23, 2021
c970549
Move Gandiva files and parameterize setup args
wjones127 Oct 8, 2020
1be4559
Setup CMake to be able to build python_gandiva
wjones127 Oct 11, 2020
6981b2a
Remove with-gandiva option
wjones127 Oct 11, 2020
9d8bc15
Add license header
wjones127 Oct 11, 2020
901a76c
import gandiva
wjones127 Oct 11, 2020
64f23ad
Add development instructions for pyarrow_gandiva
wjones127 Oct 11, 2020
676e107
Get tests to pass
wjones127 Oct 12, 2020
7906f34
Remove mark for gandiva, since it's now a package
wjones127 Oct 12, 2020
b3a424d
Remove mentions of gandiva in conftest
wjones127 Oct 12, 2020
605d2e0
Remove import, since it will error
wjones127 Nov 1, 2020
d77ac23
Add description and installation instructions in readme
wjones127 Nov 1, 2020
06f78f0
Fix rebase mistakes
Jan 24, 2021
4963060
Don't bundle libarrow and libarrow_python with pyarrow_gandiva
Jan 24, 2021
35 changes: 35 additions & 0 deletions .github/workflows/rust.yml
@@ -226,6 +226,41 @@ jobs:
cd rust
cargo clippy --all-targets --workspace -- -D warnings -A clippy::redundant_field_names

+miri-checks:
+name: Miri Checks
+runs-on: ubuntu-latest
+strategy:
+matrix:
+arch: [amd64]
+rust: [nightly-2021-01-19]
+steps:
+- uses: actions/checkout@v2
+with:
+submodules: true
+- uses: actions/cache@v2
+with:
+path: |
+~/.cargo/registry
+~/.cargo/git
+target
+key: ${{ runner.os }}-cargo-miri-${{ hashFiles('**/Cargo.lock') }}
+- name: Setup Rust toolchain
+run: |
+rustup toolchain install ${{ matrix.rust }}
+rustup default ${{ matrix.rust }}
+rustup component add rustfmt clippy miri
+- name: Run Miri Checks
+env:
+RUST_BACKTRACE: full
+RUST_LOG: 'trace'
+run: |
+export MIRIFLAGS="-Zmiri-disable-isolation"
+cd rust
+cargo miri setup
+cargo clean
+# Ignore MIRI errors until we can get a clean run
+cargo miri test || true
+
coverage:
name: Coverage
runs-on: ubuntu-latest
7 changes: 6 additions & 1 deletion ci/scripts/python_build.sh
@@ -35,7 +35,6 @@ export PYARROW_WITH_CUDA=${ARROW_CUDA:-OFF}
export PYARROW_WITH_HDFS=${ARROW_HDFS:-OFF}
export PYARROW_WITH_FLIGHT=${ARROW_FLIGHT:-OFF}
export PYARROW_WITH_PLASMA=${ARROW_PLASMA:-OFF}
-export PYARROW_WITH_GANDIVA=${ARROW_GANDIVA:-OFF}
export PYARROW_WITH_PARQUET=${ARROW_PARQUET:-OFF}
export PYARROW_WITH_DATASET=${ARROW_DATASET:-OFF}

@@ -51,4 +50,10 @@ ${PYTHON:-python} \
install --single-version-externally-managed \
--record $relative_build_dir/record.txt

+${PYTHON:-python} \
+setup.py build --target=pyarrow_gandiva \
+--build-base $build_dir \
+install --single-version-externally-managed \
+--record $relative_build_dir/record.txt
+
popd
2 changes: 2 additions & 0 deletions ci/scripts/python_test.sh
@@ -30,3 +30,5 @@ export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}
export PYTHONDEVMODE=1

pytest -r s ${PYTEST_ARGS} --pyargs pyarrow
+
+${PYTHON:-python} -m pytest -r s ${PYTEST_ARGS} --pyargs pyarrow_gandiva
4 changes: 4 additions & 0 deletions cpp/build-support/fuzzing/generate_corpuses.sh
@@ -42,6 +42,10 @@ rm -rf ${CORPUS_DIR}
${OUT}/arrow-ipc-generate-fuzz-corpus -file ${CORPUS_DIR}
${ARROW_CPP}/build-support/fuzzing/pack_corpus.py ${CORPUS_DIR} ${OUT}/arrow-ipc-file-fuzz_seed_corpus.zip

+rm -rf ${CORPUS_DIR}
+${OUT}/arrow-ipc-generate-tensor-fuzz-corpus -stream ${CORPUS_DIR}
+${ARROW_CPP}/build-support/fuzzing/pack_corpus.py ${CORPUS_DIR} ${OUT}/arrow-ipc-tensor-stream-fuzz_seed_corpus.zip
+
rm -rf ${CORPUS_DIR}
${OUT}/parquet-arrow-generate-fuzz-corpus ${CORPUS_DIR}
cp ${ARROW_CPP}/submodules/parquet-testing/data/*.parquet ${CORPUS_DIR}
25 changes: 23 additions & 2 deletions cpp/cmake_modules/SetupCxxFlags.cmake
@@ -267,6 +267,16 @@ if("${BUILD_WARNING_LEVEL}" STREQUAL "CHECKIN")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-deprecated-declarations")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-sign-conversion")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-unused-variable")
+elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Intel")
+if(WIN32)
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} /Wall")
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} /Wno-deprecated")
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} /Wno-unused-variable")
+else()
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wall")
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-deprecated")
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-unused-variable")
+endif()
else()
message(FATAL_ERROR "${UNKNOWN_COMPILER_MESSAGE}")
endif()
@@ -289,6 +299,12 @@ elseif("${BUILD_WARNING_LEVEL}" STREQUAL "EVERYTHING")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wpedantic")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wextra")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-unused-parameter")
+elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Intel")
+if(WIN32)
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} /Wall")
+else()
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wall")
+endif()
else()
message(FATAL_ERROR "${UNKNOWN_COMPILER_MESSAGE}")
endif()
@@ -304,9 +320,14 @@ else()
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} /W3")
elseif(CMAKE_CXX_COMPILER_ID STREQUAL "AppleClang"
OR CMAKE_CXX_COMPILER_ID STREQUAL "Clang"
-OR CMAKE_CXX_COMPILER_ID STREQUAL "GNU"
-OR CMAKE_CXX_COMPILER_ID STREQUAL "Intel")
+OR CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wall")
+elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Intel")
+if(WIN32)
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} /Wall")
+else()
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wall")
+endif()
else()
message(FATAL_ERROR "${UNKNOWN_COMPILER_MESSAGE}")
endif()
1 change: 1 addition & 0 deletions cpp/src/arrow/CMakeLists.txt
@@ -365,6 +365,7 @@ if(ARROW_COMPUTE)
compute/registry.cc
compute/kernels/aggregate_basic.cc
compute/kernels/aggregate_mode.cc
+compute/kernels/aggregate_quantile.cc
compute/kernels/aggregate_var_std.cc
compute/kernels/codegen_internal.cc
compute/kernels/scalar_arithmetic.cc
5 changes: 5 additions & 0 deletions cpp/src/arrow/compute/api_aggregate.cc
@@ -63,5 +63,10 @@ Result<Datum> Variance(const Datum& value, const VarianceOptions& options,
return CallFunction("variance", {value}, &options, ctx);
}

+Result<Datum> Quantile(const Datum& value, const QuantileOptions& options,
+ExecContext* ctx) {
+return CallFunction("quantile", {value}, &options, ctx);
+}
+
} // namespace compute
} // namespace arrow
41 changes: 41 additions & 0 deletions cpp/src/arrow/compute/api_aggregate.h
@@ -100,6 +100,33 @@ struct ARROW_EXPORT VarianceOptions : public FunctionOptions {
int ddof = 0;
};

+/// \brief Control Quantile kernel behavior
+///
+/// By default, returns the median value.
+struct ARROW_EXPORT QuantileOptions : public FunctionOptions {
+/// Interpolation method to use when quantile lies between two data points
+enum Interpolation {
+LINEAR = 0,
+LOWER,
+HIGHER,
+NEAREST,
+MIDPOINT,
+};
+
+explicit QuantileOptions(double q = 0.5, enum Interpolation interpolation = LINEAR)
+: q{q}, interpolation{interpolation} {}
+
+explicit QuantileOptions(std::vector<double> q,
+enum Interpolation interpolation = LINEAR)
+: q{std::move(q)}, interpolation{interpolation} {}
+
+static QuantileOptions Defaults() { return QuantileOptions{}; }
+
+/// quantile must be between 0 and 1 inclusive
+std::vector<double> q;
+enum Interpolation interpolation;
+};
+
/// @}

/// \brief Count non-null (or null) values in an array.
@@ -229,5 +256,19 @@ Result<Datum> Variance(const Datum& value,
const VarianceOptions& options = VarianceOptions::Defaults(),
ExecContext* ctx = NULLPTR);

+/// \brief Calculate the quantiles of a numeric array
+///
+/// \param[in] value input datum, expecting Array or ChunkedArray
+/// \param[in] options see QuantileOptions for more information
+/// \param[in] ctx the function execution context, optional
+/// \return resulting datum as an array
+///
+/// \since 4.0.0
+/// \note API not yet finalized
+ARROW_EXPORT
+Result<Datum> Quantile(const Datum& value,
+const QuantileOptions& options = QuantileOptions::Defaults(),
+ExecContext* ctx = NULLPTR);
+
} // namespace compute
} // namespace arrow
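The five `QuantileOptions::Interpolation` names match the interpolation modes accepted by NumPy's `quantile`/`percentile`. As a rough illustration only (this is not the `aggregate_quantile.cc` kernel), the standalone sketch below computes a single quantile from an already sorted, non-empty `std::vector<double>`, assuming NumPy-style semantics for each mode. The names `Interp` and `QuantileOfSorted` are hypothetical, and breaking exact `NEAREST` ties toward the even index is an assumption borrowed from NumPy, not something this diff shows.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical mirror of QuantileOptions::Interpolation (illustration only).
enum class Interp { LINEAR, LOWER, HIGHER, NEAREST, MIDPOINT };

// Returns the q-th quantile of `sorted`.
// Preconditions: `sorted` is non-empty and ascending, and 0 <= q <= 1.
double QuantileOfSorted(const std::vector<double>& sorted, double q, Interp interp) {
  assert(!sorted.empty() && q >= 0.0 && q <= 1.0);

  // Fractional rank of q within [0, n - 1].
  const double index = q * static_cast<double>(sorted.size() - 1);
  const auto lower = static_cast<std::size_t>(std::floor(index));
  const auto upper = static_cast<std::size_t>(std::ceil(index));
  const double fraction = index - static_cast<double>(lower);
  const double lo = sorted[lower];
  const double hi = sorted[upper];

  switch (interp) {
    case Interp::LINEAR:
      // Straight-line interpolation between the two neighbours.
      return lo + fraction * (hi - lo);
    case Interp::LOWER:
      return lo;
    case Interp::HIGHER:
      return hi;
    case Interp::NEAREST:
      if (fraction < 0.5) return lo;
      if (fraction > 0.5) return hi;
      // Exact tie: pick the neighbour with the even index (assumed, NumPy-style).
      return (lower % 2 == 0) ? lo : hi;
    case Interp::MIDPOINT:
      return (lo + hi) / 2.0;
  }
  return lo;  // not reached; keeps compilers from warning about a missing return
}
```

For `{1, 2, 3, 4}` and `q = 0.5` the fractional rank is 1.5, so `LINEAR` and `MIDPOINT` both give 2.5, `LOWER` gives 2, and `HIGHER` gives 3. Passing a `std::vector<double>` of several `q` values, as the second `QuantileOptions` constructor allows, would amount to calling this once per entry on the same sorted data.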
7 changes: 7 additions & 0 deletions cpp/src/arrow/compute/api_scalar.h
@@ -92,6 +92,13 @@ struct ARROW_EXPORT StrptimeOptions : public FunctionOptions {
TimeUnit::type unit;
};

+struct ARROW_EXPORT TrimOptions : public FunctionOptions {
+explicit TrimOptions(std::string characters) : characters(std::move(characters)) {}
+
+/// The individual characters that can be trimmed from the string.
+std::string characters;
+};
+
enum CompareOperator : int8_t {
EQUAL,
NOT_EQUAL,
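`TrimOptions::characters` is documented as a set of individual characters, any of which may be stripped from the ends of a string. A byte-oriented sketch of that behavior using `std::string::find_first_not_of` and `find_last_not_of` follows. `TrimLeft`, `TrimRight`, and `Trim` are hypothetical helpers, not Arrow functions, and they assume ASCII input; a UTF-8 aware kernel would have to compare whole code points rather than bytes.

```cpp
#include <string>

// Hypothetical helpers (not Arrow code): byte-wise, ASCII-only trimming.

// Drop leading bytes that occur anywhere in `characters`.
std::string TrimLeft(const std::string& s, const std::string& characters) {
  const std::string::size_type start = s.find_first_not_of(characters);
  return start == std::string::npos ? std::string() : s.substr(start);
}

// Drop trailing bytes that occur anywhere in `characters`.
std::string TrimRight(const std::string& s, const std::string& characters) {
  const std::string::size_type last = s.find_last_not_of(characters);
  return last == std::string::npos ? std::string() : s.substr(0, last + 1);
}

// Drop matching bytes from both ends.
std::string Trim(const std::string& s, const std::string& characters) {
  return TrimRight(TrimLeft(s, characters), characters);
}
```

Because `characters` is a set rather than a prefix or suffix, `Trim("xyhixy", "xy")` yields `"hi"`: every leading or trailing `x` or `y` is removed regardless of order.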
23 changes: 9 additions & 14 deletions cpp/src/arrow/compute/exec.cc
@@ -688,10 +688,9 @@ Status PackBatchNoChunks(const std::vector<Datum>& args, ExecBatch* out) {
switch (arg.kind()) {
case Datum::SCALAR:
case Datum::ARRAY:
+case Datum::CHUNKED_ARRAY:
length = std::max(arg.length(), length);
break;
-case Datum::CHUNKED_ARRAY:
-return Status::Invalid("Kernel does not support chunked array arguments");
default:
DCHECK(false);
break;
@@ -722,19 +721,15 @@ class VectorExecutor : public KernelExecutorImpl<VectorKernel> {
const std::vector<Datum>& outputs) override {
// If execution yielded multiple chunks (because large arrays were split
// based on the ExecContext parameters, then the result is a ChunkedArray
-if (kernel_->output_chunked) {
-if (HaveChunkedArray(inputs) || outputs.size() > 1) {
-return ToChunkedArray(outputs, output_descr_.type);
-} else if (outputs.size() == 1) {
-// Outputs have just one element
-return outputs[0];
-} else {
-// XXX: In the case where no outputs are omitted, is returning a 0-length
-// array always the correct move?
-return MakeArrayOfNull(output_descr_.type, /*length=*/0).ValueOrDie();
-}
-} else {
+if (kernel_->output_chunked && (HaveChunkedArray(inputs) || outputs.size() > 1)) {
+return ToChunkedArray(outputs, output_descr_.type);
+} else if (outputs.size() == 1) {
+// Outputs have just one element
return outputs[0];
+} else {
+// XXX: In the case where no outputs are omitted, is returning a 0-length
+// array always the correct move?
+return MakeArrayOfNull(output_descr_.type, /*length=*/0).ValueOrDie();
}
}
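The flattened condition in `VectorExecutor::WrapResults` chooses among three outcomes. The sketch below reproduces only that decision logic: `WrapKind` and `ChooseWrap` are hypothetical names, and the three parameters stand in for `kernel_->output_chunked`, `HaveChunkedArray(inputs)`, and `outputs.size()`. No `Datum` is actually built.

```cpp
#include <cstddef>

// Hypothetical labels for the three ways WrapResults packages its outputs.
enum class WrapKind { kChunkedArray, kSingleOutput, kEmptyArray };

// Mirrors the flattened branch: a ChunkedArray is produced only when the kernel
// emits chunked output AND (some input was chunked OR execution produced more
// than one output piece).
WrapKind ChooseWrap(bool output_chunked, bool have_chunked_input,
                    std::size_t num_outputs) {
  if (output_chunked && (have_chunked_input || num_outputs > 1)) {
    return WrapKind::kChunkedArray;  // ToChunkedArray(outputs, type)
  } else if (num_outputs == 1) {
    return WrapKind::kSingleOutput;  // outputs[0] returned as-is
  } else {
    return WrapKind::kEmptyArray;  // MakeArrayOfNull(type, /*length=*/0)
  }
}
```

One consequence is visible directly in the sketch: when `output_chunked` is false, any output count other than exactly one falls through to the 0-length branch.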
