Merged
Commits
24 commits
3e2a299
[draft] add shot circuit in BinaryExpr
acking-you Jul 3, 2024
57d6645
refactor: add check_short_circuit function
acking-you Jul 3, 2024
b85807d
refactor: change if condition to match
acking-you Jul 4, 2024
bf4e218
feat: Add support for --mem-pool-type and --memory-limit options to m…
Kontinuation Feb 14, 2025
c41465e
Chore/Add additional FFI unit tests (#14802)
timsaucer Feb 21, 2025
f171227
Improve feature flag CI coverage `datafusion` and `datafusion-functio…
alamb Mar 17, 2025
ac3c918
add extend sql & docs
acking-you Mar 27, 2025
a642394
feat: Add support for --mem-pool-type and --memory-limit options to m…
Kontinuation Feb 14, 2025
39855d3
Chore/Add additional FFI unit tests (#14802)
timsaucer Feb 21, 2025
2205edb
Improve feature flag CI coverage `datafusion` and `datafusion-functio…
alamb Mar 17, 2025
d79a75a
fix: incorrect false judgment
acking-you Mar 28, 2025
575c3f3
add test
acking-you Mar 29, 2025
8801063
separate q6 to new PR
acking-you Mar 31, 2025
ad1210e
Merge branch 'apache:main' into add_short_circuit
acking-you Apr 7, 2025
e190119
feat: Add support for --mem-pool-type and --memory-limit options to m…
Kontinuation Feb 14, 2025
f2c4caa
Chore/Add additional FFI unit tests (#14802)
timsaucer Feb 21, 2025
0ef29b1
Improve feature flag CI coverage `datafusion` and `datafusion-functio…
alamb Mar 17, 2025
6ea2502
feat: Add support for --mem-pool-type and --memory-limit options to m…
Kontinuation Feb 14, 2025
f8f4d6f
Chore/Add additional FFI unit tests (#14802)
timsaucer Feb 21, 2025
13742d2
Improve feature flag CI coverage `datafusion` and `datafusion-functio…
alamb Mar 17, 2025
1394f3f
add benchmark for boolean_op
acking-you Apr 7, 2025
59cfced
fix cargo doc
acking-you Apr 7, 2025
775e70a
add binary_op bench
acking-you Apr 7, 2025
11816e9
Better comments
acking-you Apr 8, 2025
82 changes: 82 additions & 0 deletions datafusion/physical-expr/src/expressions/binary.rs
@@ -359,6 +359,12 @@ impl PhysicalExpr for BinaryExpr {
        use arrow::compute::kernels::numeric::*;

        let lhs = self.left.evaluate(batch)?;

        // Optimize for short-circuiting `Operator::And` or `Operator::Or` operations and return early.
        if check_short_circuit(&lhs, &self.op) {
            return Ok(lhs);
        }

        let rhs = self.right.evaluate(batch)?;
        let left_data_type = lhs.data_type();
        let right_data_type = rhs.data_type();
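
For intuition on why returning the all-false `lhs` is correct even when the `rhs` contains nulls, here is a small standalone check (not part of the diff) using arrow's Kleene-logic kernel: under SQL three-valued logic, `FALSE AND x` is `FALSE` and `TRUE OR x` is `TRUE` for any `x`, including `NULL`.

```rust
use arrow::array::BooleanArray;
use arrow::compute::kernels::boolean::and_kleene;
use arrow::error::ArrowError;

fn main() -> Result<(), ArrowError> {
    // An all-false lhs fully determines the AND result, so the kernel output
    // is all false no matter what the rhs holds (true, false, or NULL).
    let lhs = BooleanArray::from(vec![Some(false), Some(false), Some(false)]);
    let rhs = BooleanArray::from(vec![Some(true), Some(false), None]);
    let result = and_kleene(&lhs, &rhs)?;
    assert_eq!(result.false_count(), result.len());
    Ok(())
}
```
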
@@ -805,6 +811,47 @@ impl BinaryExpr {
}
}

/// Checks whether the short-circuit condition is met:
/// 1. For the `AND` operator: the `lhs` values are all `false`
/// 2. For the `OR` operator: the `lhs` values are all `true`
/// 3. Otherwise, the short-circuit condition is not met
fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool {

Contributor:
I might be missing some obvious points, but

  1. applying the same check for rhs is redundant? (does it approximately require the same computation if we continue the execution as is?)
  2. why aren't we also covering the cases where lhs is all true and op is AND / lhs is all false and op is OR?

Contributor Author (@acking-you, Apr 7, 2025):
Thank you very much for your CR; your point is correct, and I actually did the same. I'm sorry, maybe my comment is a little confusing. I will fix it.

Contributor Author:
> I might be missing some obvious points, but
>
>   1. applying the same check for rhs is redundant? (does it approximately require the same computation if we continue the execution as is?)
>   2. why aren't we also covering the cases where lhs is all true and op is AND / lhs is all false and op is OR?

I've improved the comments, so they should be a bit clearer.

Contributor:
I think I wasn't clear enough in my last comment.

This PR short-circuits these cases:

[false,false,false...false] AND [xxx] => return lhs, which is [false,false,false...false]
[true, true, true... true] OR [xxx] => return lhs, which is [true, true, true...true]

I have 2 further ideas to discuss:
1)

[xxx] AND [false,false,false...false] => return rhs, which is [false,false,false...false]
[xxx] OR [true, true, true... true] => return rhs, which is [true, true, true...true]

Why don't we check this case as well?

2)

[xxx] AND [true, true, true... true] => return lhs, which is [xxx]
[xxx] OR [false, false, false... false] => return lhs, which is [xxx]

Isn't this case optimizable as well? Are we handling those cases in another place?

Contributor Author:
Okay, I understand now. However, the current execution order of BinaryExpr is fixed: the left side is evaluated first, then the right side. I had also considered the situation you mention, but wouldn't it be better to have the optimizer rewrite simpler expressions to the left side as much as possible?

Contributor:
> but wouldn't it be better to have the optimizer rewrite simpler expressions to the left side as much as possible?

Is there planned work to implement this? If so, trying the lhs first makes sense. But even in that case, wouldn't checking the rhs values as well, before doing the whole binary computation, be a preferable option?

Let's also imagine this case (I adapt my second example so that the homogeneous side is the lhs):

[true, true, true... true] AND [xxx] => return rhs, which is [xxx]
[false, false, false... false] OR [xxx] => return rhs, which is [xxx]

Doesn't this bring a clear gain by short-circuiting?

Contributor Author:
> Doesn't this bring a clear gain by short-circuiting?

I understand these now. Sorry, I didn't notice it just now and thought it was only a positional difference.

This is a great idea. I think we can open a new issue and add this optimization.

Contributor:
I don't think you can rewrite expressions like this without changing the short-circuit semantics (if the LHS is false, then don't run the RHS in `a AND b`).

Contributor:
> I don't think you can rewrite expressions like this without changing the short-circuit semantics (if the LHS is false, then don't run the RHS in `a AND b`).

We don't need a rewrite. evaluate() needs to return a ColumnarValue, so we can just return the short-circuited versions in my examples

Contributor:
> Doesn't this bring a clear gain by short-circuiting?
>
> I understand these now. Sorry, I didn't notice it just now and thought it was only a positional difference.
>
> This is a great idea. I think we can open a new issue and add this optimization.

What about these?

[xxx] AND [false,false,false...false] => return rhs, which is [false,false,false...false]
[xxx] OR [true, true, true... true] => return rhs, which is [true, true, true...true]
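
A rough sketch of the extension discussed above (all names here are hypothetical and not part of this PR): instead of a plain bool, the check could report which side the expression collapses to, which would also cover the return-rhs cases.

```rust
/// Hypothetical extension of `check_short_circuit`, illustrative only.
enum ShortCircuitStrategy {
    /// No short circuit applies; run the normal element-wise kernel.
    None,
    /// lhs is all false with AND, or all true with OR: the result is lhs and
    /// the rhs never needs to be evaluated.
    ReturnLhs,
    /// lhs is all true with AND, or all false with OR: the result equals rhs
    /// (TRUE AND x = x, FALSE OR x = x), so rhs is evaluated and returned and
    /// the boolean kernel is skipped.
    ReturnRhs,
}

// Roughly how `BinaryExpr::evaluate` could use it:
//
//     match short_circuit_strategy(&lhs, &self.op) {
//         ShortCircuitStrategy::ReturnLhs => return Ok(lhs),
//         ShortCircuitStrategy::ReturnRhs => return self.right.evaluate(batch),
//         ShortCircuitStrategy::None => {}
//     }
```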

    let data_type = arg.data_type();
    match (data_type, op) {
        (DataType::Boolean, Operator::And) => {
            match arg {
                ColumnarValue::Array(array) => {
                    if let Ok(array) = as_boolean_array(&array) {
                        return array.false_count() == array.len();

Contributor:
I remember this had some overhead (for calculating the counts) from a previous try.
I wonder if it helps to add an early exit to this check (e.g. stop as soon as we hit a bitmap chunk != 0).

Contributor Author (@acking-you, Mar 31, 2025):
> I wonder if it helps to add an early exit to this check (e.g. stop as soon as we hit a bitmap chunk != 0).

I think the overhead added here should be very small (the compiler optimization should work well), and the test results we discussed before were sometimes fast and sometimes slow (maybe noise).

Your suggestion of checking early and returning false seems like a good idea, but I'm not sure whether it will be effective.
The concern I have with this approach is that it requires adding an if condition inside the for loop, which will most likely prevent the compiler's SIMD optimization (I've encountered a similar situation before, and I had to unroll the loop and write the SIMD manually...).

Contributor (@Dandandan, Mar 31, 2025):
Either way, we can use bool_and (https://docs.rs/arrow/latest/arrow/compute/fn.bool_and.html) and bool_or, which operate on u64 values, to test performance changes.

Contributor Author:
> Either way, we can use bool_and (https://docs.rs/arrow/latest/arrow/compute/fn.bool_and.html) and bool_or, which operate on u64 values, to test performance changes.

Thank you for your suggestion. I will try it later.

Contributor:
Might be overkill, but one could try a sampling approach: Run the loop with the early exit for the first few chunks, and then switch over to the unconditional loop.

Almost seems like something the compiler could automagically do...

Contributor Author:
> Might be overkill, but one could try a sampling approach: Run the loop with the early exit for the first few chunks, and then switch over to the unconditional loop.

Thank you for your suggestion, but if we're only applying conditional checks to the first few blocks, then I feel this optimization might not be meaningful. If nearly all blocks can be filtered out by the preceding filter, the optimization will no longer be effective.

> If we find that this slows down some other performance we could also add some sort of heuristic check before calling false_count / true_count -- for example, if the rhs arg is "complex" (not a Column, for example)

I tend to agree with @alamb's point: if the overhead of the check turns out to be unacceptable, adopting some heuristic approach would be better.
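
A rough sketch of that heuristic (the function name is hypothetical, not part of the PR): only pay for the `false_count()`/`true_count()` scan when the rhs looks expensive to evaluate.

```rust
use datafusion_physical_expr::expressions::{Column, Literal};
use datafusion_physical_expr::PhysicalExpr;

/// Illustrative heuristic: a bare column or literal rhs is cheap to evaluate,
/// so the short-circuit check is unlikely to pay off; anything else is
/// treated as "complex" and worth the extra count scan.
fn rhs_is_complex(rhs: &dyn PhysicalExpr) -> bool {
    let any = rhs.as_any();
    any.downcast_ref::<Column>().is_none() && any.downcast_ref::<Literal>().is_none()
}
```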

Contributor:
I looked more carefully at bool_or, and I do think it would be faster than this implementation in the case where there are some true values (as it stops as soon as it finds a single non-zero value): https://docs.rs/arrow/latest/arrow/compute/fn.bool_or.html
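
For reference, a minimal sketch of that kind of early-exiting check using arrow's `bool_and`/`bool_or` kernels (the helper names are hypothetical; nulls and empty arrays behave slightly differently from the `false_count()`/`true_count()` comparison and would need extra care):

```rust
use arrow::array::BooleanArray;
use arrow::compute::{bool_and, bool_or};

/// All values are false: `bool_or` reports whether any non-null value is true
/// and can stop at the first non-zero bitmap chunk it sees.
fn all_false(array: &BooleanArray) -> bool {
    array.null_count() == 0 && bool_or(array) == Some(false)
}

/// All values are true: the symmetric check via `bool_and`.
fn all_true(array: &BooleanArray) -> bool {
    array.null_count() == 0 && bool_and(array) == Some(true)
}
```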

                    }
                }
                ColumnarValue::Scalar(scalar) => {
                    if let ScalarValue::Boolean(Some(value)) = scalar {
                        return !value;
                    }
                }
            }
            false
        }
        (DataType::Boolean, Operator::Or) => {
            match arg {
                ColumnarValue::Array(array) => {
                    if let Ok(array) = as_boolean_array(&array) {
                        return array.true_count() == array.len();
                    }
                }
                ColumnarValue::Scalar(scalar) => {
                    if let ScalarValue::Boolean(Some(value)) = scalar {
                        return *value;
                    }
                }
            }
            false
        }
        _ => false,
    }
}

fn concat_elements(left: Arc<dyn Array>, right: Arc<dyn Array>) -> Result<ArrayRef> {
    Ok(match left.data_type() {
        DataType::Utf8 => Arc::new(concat_elements_utf8(
@@ -4832,4 +4879,39 @@ mod tests {

        Ok(())
    }

    #[test]
    fn test_check_short_circuit() {
        use crate::planner::logical2physical;
        use datafusion_expr::col as logical_col;
        use datafusion_expr::lit;
        let schema = Arc::new(Schema::new(vec![
            Field::new("a", DataType::Int32, false),
            Field::new("b", DataType::Int32, false),
        ]));
        let a_array = Int32Array::from(vec![1, 3, 4, 5, 6]);
        let b_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
        let batch = RecordBatch::try_new(
            Arc::clone(&schema),
            vec![Arc::new(a_array), Arc::new(b_array)],
        )
        .unwrap();

        // op: AND left: all false
        let left_expr = logical2physical(&logical_col("a").eq(lit(2)), &schema);
        let left_value = left_expr.evaluate(&batch).unwrap();
        assert!(check_short_circuit(&left_value, &Operator::And));
        // op: AND left: not all false
        let left_expr = logical2physical(&logical_col("a").eq(lit(3)), &schema);
        let left_value = left_expr.evaluate(&batch).unwrap();
        assert!(!check_short_circuit(&left_value, &Operator::And));
        // op: OR left: all true
        let left_expr = logical2physical(&logical_col("a").gt(lit(0)), &schema);
        let left_value = left_expr.evaluate(&batch).unwrap();
        assert!(check_short_circuit(&left_value, &Operator::Or));
        // op: OR left: not all true
        let left_expr = logical2physical(&logical_col("a").gt(lit(2)), &schema);
        let left_value = left_expr.evaluate(&batch).unwrap();
        assert!(!check_short_circuit(&left_value, &Operator::Or));
    }
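
    // Not part of the PR: a hypothetical companion test sketching the scalar
    // paths of `check_short_circuit` (a scalar `false` lhs short-circuits AND,
    // a scalar `true` lhs short-circuits OR).
    #[test]
    fn test_check_short_circuit_scalar_sketch() {
        use datafusion_common::ScalarValue;
        let false_scalar = ColumnarValue::Scalar(ScalarValue::Boolean(Some(false)));
        let true_scalar = ColumnarValue::Scalar(ScalarValue::Boolean(Some(true)));
        assert!(check_short_circuit(&false_scalar, &Operator::And));
        assert!(!check_short_circuit(&true_scalar, &Operator::And));
        assert!(check_short_circuit(&true_scalar, &Operator::Or));
        assert!(!check_short_circuit(&false_scalar, &Operator::Or));
    }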
}