Merged
2 changes: 1 addition & 1 deletion parquet-variant-compute/Cargo.toml
@@ -33,6 +33,7 @@ rust-version = { workspace = true }
[dependencies]
arrow = { workspace = true }
arrow-schema = { workspace = true }
half = { version = "2.1", default-features = false }
Contributor Author

Needed to reference f16 in the code and in the tests.

Contributor

Yeah, we should probably re-export the f16 type directly from the arrow crate (pub use) so that users don't have to depend on half explicitly. Maybe as a follow-on PR.

parquet-variant = { workspace = true }
parquet-variant-json = { workspace = true }

@@ -49,4 +50,3 @@ arrow = { workspace = true, features = ["test_utils"] }
[[bench]]
name = "variant_kernels"
harness = false

50 changes: 46 additions & 4 deletions parquet-variant-compute/src/cast_to_variant.rs
@@ -18,10 +18,11 @@
use crate::{VariantArray, VariantArrayBuilder};
use arrow::array::{Array, AsArray};
use arrow::datatypes::{
-    Float32Type, Float64Type, Int16Type, Int32Type, Int64Type, Int8Type, UInt16Type, UInt32Type,
-    UInt64Type, UInt8Type,
+    Float16Type, Float32Type, Float64Type, Int16Type, Int32Type, Int64Type, Int8Type, UInt16Type,
+    UInt32Type, UInt64Type, UInt8Type,
};
use arrow_schema::{ArrowError, DataType};
use half::f16;
use parquet_variant::Variant;

/// Convert the input array of a specific primitive type to a `VariantArray`
@@ -39,6 +40,22 @@ macro_rules! primitive_conversion {
}};
}

/// Convert the input array to a `VariantArray` row by row,
/// transforming each element with `cast_fn`
macro_rules! cast_conversion {
Contributor Author

This macro applies cast_fn to each element before converting it to a Variant. Some of the remaining types will require casting before being accepted by Variant::from.

We could also add the cast_fn argument to the primitive_conversion! macro instead of adding this new macro. That would require passing something like |v| v (i.e., a no-op function) to the existing primitive conversions that don't require casts.
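To make that alternative concrete, here is a std-only sketch of a single macro that always takes a cast_fn, with the identity closure |v| v for types Variant::from already accepts. Everything here is hypothetical: the one-case Variant enum stands in for parquet_variant::Variant, a slice of Option values stands in for the Arrow array, the returned Vec stands in for VariantArrayBuilder, and the convert_i32/convert_i16 helpers exist only for illustration. The i16 to i32 widening is chosen just to exercise a non-trivial cast.

```rust
/// Hypothetical stand-in for parquet_variant::Variant, reduced to one case.
#[derive(Debug, PartialEq)]
pub enum Variant {
    Int32(i32),
}

impl From<i32> for Variant {
    fn from(v: i32) -> Self {
        Variant::Int32(v)
    }
}

/// Unified conversion macro: each non-null element goes through `$cast_fn`
/// and is then wrapped with `Variant::from`; nulls are kept as `None`.
/// A slice of `Option`s stands in for the Arrow array, and the returned
/// `Vec` stands in for the VariantArrayBuilder.
macro_rules! cast_conversion {
    ($cast_fn:expr, $input:expr) => {{
        let mut out: Vec<Option<Variant>> = Vec::with_capacity($input.len());
        for value in $input.iter() {
            match value {
                None => out.push(None),
                Some(v) => out.push(Some(Variant::from($cast_fn(*v)))),
            }
        }
        out
    }};
}

/// A type `Variant::from` already accepts: pass the no-op closure `|v| v`.
pub fn convert_i32(input: &[Option<i32>]) -> Vec<Option<Variant>> {
    cast_conversion!(|v: i32| v, input)
}

/// A type that needs a cast first: widen `i16` to `i32` before wrapping.
pub fn convert_i16(input: &[Option<i16>]) -> Vec<Option<Variant>> {
    cast_conversion!(|v: i16| -> i32 { v.into() }, input)
}
```

The identity closure is likely inlined away after macro expansion, so the trade-off between one macro and two is mostly about how noisy the call sites become.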

Contributor Author

I'm working on a couple of these issues in parallel and made some additional tweaks to the macro here: https://github.com/apache/arrow-rs/pull/8074/files.

($t:ty, $cast_fn:expr, $input:expr, $builder:expr) => {{
let array = $input.as_primitive::<$t>();
for i in 0..array.len() {
if array.is_null(i) {
$builder.append_null();
continue;
}
let cast_value = $cast_fn(array.value(i));
$builder.append_variant(Variant::from(cast_value));
}
}};
}

/// Casts a typed arrow [`Array`] to a [`VariantArray`]. This is useful when you
/// need to convert a specific data type
///
@@ -92,6 +109,9 @@ pub fn cast_to_variant(input: &dyn Array) -> Result<VariantArray, ArrowError> {
DataType::UInt64 => {
primitive_conversion!(UInt64Type, input, builder);
}
DataType::Float16 => {
cast_conversion!(Float16Type, |v: f16| -> f32 { v.into() }, input, builder);
Contributor Author

Cast f16 to f32 so that the value can be wrapped by Variant::Float.

Contributor

Makes sense to me.

In general, I think a macro that knows how to convert various Arrow types to Variant is an important building block.
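Widening f16 to f32 is lossless: every IEEE 754 binary16 value, including f16::MIN and f16::MAX (-65504 and 65504), has an exact binary32 representation. That is why the Float16 test below can compare against exact f32 literals such as -1.5 and 1.5. The PR relies on half's f16 to f32 conversion; the std-only function here (f16_bits_to_f32 is a hypothetical name, not part of half or arrow) re-derives that widening bit by bit as an illustration.

```rust
/// Widen an IEEE 754 binary16 value, given as raw bits, to f32.
/// Every binary16 value is exactly representable in binary32, so no rounding occurs.
pub fn f16_bits_to_f32(bits: u16) -> f32 {
    // 2^-24 is the value of the least significant bit of a binary16 subnormal.
    const TWO_POW_NEG_24: f32 = 1.0 / 16_777_216.0;

    let sign = ((bits >> 15) & 0x1) as u32;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let mant = (bits & 0x3ff) as u32;

    let f32_bits = if exp == 0x1f {
        // Infinity or NaN: max out the f32 exponent and keep the payload bits.
        (sign << 31) | (0xff << 23) | (mant << 13)
    } else if exp == 0 {
        if mant == 0 {
            // Signed zero.
            sign << 31
        } else {
            // Subnormal binary16: value = mant * 2^-24, which is a normal f32.
            let magnitude = mant as f32 * TWO_POW_NEG_24;
            return if sign == 1 { -magnitude } else { magnitude };
        }
    } else {
        // Normal number: rebias the exponent from 15 to 127 and move the
        // 10-bit mantissa into the top of the 23-bit f32 mantissa field.
        (sign << 31) | ((exp + 127 - 15) << 23) | (mant << 13)
    };
    f32::from_bits(f32_bits)
}
```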

}
DataType::Float32 => {
primitive_conversion!(Float32Type, input, builder);
}
@@ -115,8 +135,8 @@ pub fn cast_to_variant(input: &dyn Array) -> Result<VariantArray, ArrowError> {
mod tests {
use super::*;
use arrow::array::{
-        ArrayRef, Float32Array, Float64Array, Int16Array, Int32Array, Int64Array, Int8Array,
-        UInt16Array, UInt32Array, UInt64Array, UInt8Array,
+        ArrayRef, Float16Array, Float32Array, Float64Array, Int16Array, Int32Array, Int64Array,
+        Int8Array, UInt16Array, UInt32Array, UInt64Array, UInt8Array,
};
use parquet_variant::{Variant, VariantDecimal16};
use std::sync::Arc;
@@ -284,6 +304,28 @@ mod tests {
)
}

#[test]
fn test_cast_to_variant_float16() {
run_test(
Arc::new(Float16Array::from(vec![
Some(f16::MIN),
None,
Some(f16::from_f32(-1.5)),
Some(f16::from_f32(0.0)),
Some(f16::from_f32(1.5)),
Some(f16::MAX),
])),
vec![
Some(Variant::Float(f16::MIN.into())),
None,
Some(Variant::Float(-1.5)),
Some(Variant::Float(0.0)),
Some(Variant::Float(1.5)),
Some(Variant::Float(f16::MAX.into())),
],
)
}

#[test]
fn test_cast_to_variant_float32() {
run_test(