Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove type coercion in the binary physical expr #3396

Merged
merged 17 commits into from
Sep 24, 2022

Conversation

liukun4515
Copy link
Contributor

@liukun4515 liukun4515 commented Sep 8, 2022

Which issue does this PR close?

Closes #3388

depend on #3421 #3289 #3459 #3472

fixed the row filter predication for the data type of NULL value #3470

Rationale for this change

What changes are included in this PR?

In order to make the pr review more friendly for reviewer, I will explain what I did in this pr:

  1. remove the type coercion in the creation of the binary physical expr and fix the test case

  2. redo the simplify expression again after the type coercion in the optimizer. This is just an minor fix. If the issue move the type coercion out of the optimizer and refactor the optimizer #3582 done, we can remove the duplicated simplify expression in the optimizer.

  3. try do the type coercion before evaluate the logical expr in the file according to the comments in this issue simplify_expressions don't support different data type for binary #3556 from @alamb . This is just an temporary fix, and it will be remove when do move the type coercion out of the optimizer and refactor the optimizer #3582

  4. remove the between optimization in the simplify expression because of issue simplify between expr should consider the data type #3587, and will recovery when do move the type coercion out of the optimizer and refactor the optimizer #3582

  5. refresh the test cases

cc @alamb @andygrove

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions labels Sep 8, 2022
@liukun4515 liukun4515 force-pushed the remove_phy_coercion_#3387_#3388 branch from c150de1 to c2884f7 Compare September 8, 2022 13:39
@liukun4515 liukun4515 force-pushed the remove_phy_coercion_#3387_#3388 branch from 1d7c2a0 to 8c38433 Compare September 13, 2022 10:54
@github-actions github-actions bot added the optimizer Optimizer rules label Sep 13, 2022
@liukun4515
Copy link
Contributor Author

after rebase #3380, the test case evolved_schema_intersection_filter_with_filter_pushdown and evolved_schema_disjoint_schema_filter_with_pushdown failed

@liukun4515
Copy link
Contributor Author

this pr depend on the #3470 and #3472

@github-actions github-actions bot removed the optimizer Optimizer rules label Sep 16, 2022
@liukun4515
Copy link
Contributor Author

I meet so many issue when remove the binary type coercion which is caused from these pr #3301 #3246
For example is TRUE, these is a logical expr Expr::IsTrue but convert it to the physical binary op in https://github.com/andygrove/arrow-datafusion/blob/92b1a341aca334c54235ef65737c88037010e30b/datafusion/physical-expr/src/planner.rs#L86

@andygrove why not generate the logical binary op when create the Expr::IsTrue logical expr.

@liukun4515
Copy link
Contributor Author

liukun4515 commented Sep 16, 2022

I meet so many issue when remove the binary type coercion which is caused from these pr #3301 #3246 For example is TRUE, these is a logical expr Expr::IsTrue but convert it to the physical binary op in https://github.com/andygrove/arrow-datafusion/blob/92b1a341aca334c54235ef65737c88037010e30b/datafusion/physical-expr/src/planner.rs#L86

@andygrove why not generate the logical binary op when create the Expr::IsTrue logical expr.

cc @alamb

I create the issue #3509 to support other logical expr for the typ coercion.
If we can convert such expr like istrue, isfalse,like to binary op in the logical phase, we don't need to do this by adding more case in the TypeCoercion optimizer rule.
I don't know why we don't do this when implement the is true, is false in the https://github.com/apache/arrow-datafusion/pull/3301/files @andygrove , is there any optimization should be done in the logical phase with the Expr:IsTrue?

@alamb
Copy link
Contributor

alamb commented Sep 17, 2022

is there any optimization should be done in the logical phase with the Expr:IsTrue?

I think the only coercion it might need is that its argument is DataType::Boolean

@liukun4515 liukun4515 marked this pull request as ready for review September 21, 2022 15:00
@github-actions github-actions bot added the optimizer Optimizer rules label Sep 21, 2022
@codecov-commenter
Copy link

codecov-commenter commented Sep 22, 2022

Codecov Report

Merging #3396 (3062cd5) into master (b54a56f) will increase coverage by 0.05%.
The diff coverage is 98.52%.

@@            Coverage Diff             @@
##           master    #3396      +/-   ##
==========================================
+ Coverage   86.03%   86.08%   +0.05%     
==========================================
  Files         300      300              
  Lines       56253    56313      +60     
==========================================
+ Hits        48395    48477      +82     
+ Misses       7858     7836      -22     
Impacted Files Coverage Δ
datafusion/core/src/physical_plan/planner.rs 78.03% <ø> (+0.56%) ⬆️
datafusion/core/tests/sql/aggregates.rs 99.38% <ø> (+<0.01%) ⬆️
datafusion/core/tests/sql/decimal.rs 100.00% <ø> (ø)
datafusion/optimizer/src/type_coercion.rs 96.01% <92.30%> (+3.61%) ⬆️
datafusion/physical-expr/src/expressions/binary.rs 97.42% <98.33%> (-0.21%) ⬇️
datafusion/core/src/execution/context.rs 79.38% <100.00%> (+0.07%) ⬆️
...ore/src/physical_optimizer/aggregate_statistics.rs 100.00% <100.00%> (ø)
...sion/core/src/physical_plan/file_format/parquet.rs 94.68% <100.00%> (+0.02%) ⬆️
datafusion/core/tests/sql/predicates.rs 100.00% <100.00%> (ø)
datafusion/core/tests/sql/select.rs 99.78% <100.00%> (ø)
... and 28 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@liukun4515 liukun4515 marked this pull request as ready for review September 22, 2022 14:27
@andygrove
Copy link
Member

I've started reviewing this, but there are many changes and referenced issues I need to go read. I will continue with the review tomorrow.

@alamb
Copy link
Contributor

alamb commented Sep 23, 2022

I also plan to review this PR, but I may not get to it today

@liukun4515
Copy link
Contributor Author

I try to explain what I do and why i do in the description What changes are included in this PR?

I hope this works for your review.

cc @andygrove @alamb

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a great PR to read @liukun4515 -- really nice.

Thank you for sticking with it.

I think we should merge it asap to minimize conflicts.

@@ -1452,7 +1452,11 @@ impl SessionState {
rules.push(Arc::new(FilterNullJoinKeys::default()));
}
rules.push(Arc::new(ReduceOuterJoin::new()));
// TODO: https://github.com/apache/arrow-datafusion/issues/3557
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes sense to me that we need to simplify expressons after coercion

"+--------------+---------------------------+---------------------------+---------------------------+",
"| 1.5 | 2.5 | 3.5 | 2.5 |",
"+--------------+---------------------------+---------------------------+---------------------------+",
"+--------------+-------------------------+-------------------------+-------------------------+",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Operator::Modulo,
DataType::Decimal128(10, 2)
);
// TODO add other data type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should file another ticket to track this gap? Specifically, a "help wanted" ticket that explained what was needed might encourage some additional contributions

Operator::GtEq,
DataType::Decimal128(15, 3)
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the tests cases in this file carefully. Very nice 👌

DataType::Boolean,
DataType::Boolean,
Operator::Or,
DataType::Boolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any other types that are coerced to boolean for logical operations? Or are the tests for boolean just showing that boolean types are not changed when coerced?

@@ -69,14 +67,8 @@ impl OptimizerRule for TypeCoercion {
},
);

let mut execution_props = ExecutionProps::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice

@@ -892,25 +891,6 @@ impl BinaryExpr {
}
}

/// return two physical expressions that are optionally coerced to a
/// common type that the binary operator supports.
fn binary_cast(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@@ -1080,7 +1071,7 @@ mod tests {
Operator::Plus,
Int32Array,
DataType::Int32,
vec![2i32, 4i32]
vec![2i32, 4i32],
);
test_coercion!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if these tests are doing anything useful anymore now that we are coercing in the logical layer? 🤔 It seems like they are now testing the test code 😆

@alamb
Copy link
Contributor

alamb commented Sep 24, 2022

I plan to get this PR ready to merge by: add my suggested comments, fix clippy, and merge to master

@alamb
Copy link
Contributor

alamb commented Sep 24, 2022

(since I already have it checked out this will be a simple thing for me)

@alamb alamb merged commit d7c0e42 into apache:master Sep 24, 2022
@ursabot
Copy link

ursabot commented Sep 24, 2022

Benchmark runs are scheduled for baseline = b625277 and contender = d7c0e42. d7c0e42 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove type-coercion from physical planner
5 participants