-
Notifications
You must be signed in to change notification settings - Fork 264
Feat: Support arrays_overlap function #1312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1312 +/- ##
=============================================
- Coverage 56.12% 39.09% -17.04%
- Complexity 976 2065 +1089
=============================================
Files 119 260 +141
Lines 11743 60269 +48526
Branches 2251 12834 +10583
=============================================
+ Hits 6591 23562 +16971
- Misses 4012 32226 +28214
- Partials 1140 4481 +3341 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| expr.children(1), | ||
| inputs, | ||
| (builder, binaryExpr) => builder.setArrayAppend(binaryExpr)) | ||
| case ArraysOverlap(leftArrayExpr, rightArrayExpr) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we support this for all input data types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the PR only contains basic tests, could you add a check to enable this expression only if CometConf.COMET_CAST_ALLOW_INCOMPATIBLE is enabled? We can remove this check in a future PR that adds comprehensive tests and demonstrates that we have Spark-compatible behavior for all supported data types.`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense and COMET_CAST_ALLOW_INCOMPATIBLE check has been added.
| } | ||
| } | ||
|
|
||
| test("arrays_overlap") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add tests where one or both input expressions evaluate to null on some rows? This can be achieved with a CASE expression as in some of the other array tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for more coverage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Case Expression based test case has been added.
|
@erenavsarogullari could you rebase this PR when you have time? We can go ahead and merge this and then improve tests as part of #1269 |
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
Outdated
Show resolved
Hide resolved
32b78fd to
86d8f57
Compare
86d8f57 to
96f776e
Compare
Thanks @andygrove and @kazuyukitanimura for the review. PR has been rebased and addressed the comments. FYI, Currently, DataFusion does not support |
andygrove
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @erenavsarogullari. LGTM.
| checkSparkAnswerAndOperator(sql( | ||
| "SELECT arrays_overlap(array('a', null), array('b', null)) from t1 where _1 is not null")) | ||
| checkSparkAnswerAndOperator(spark.sql( | ||
| "SELECT arrays_overlap((CASE WHEN _2 =_3 THEN array(_6, _7) END), array(_6, _7)) FROM t1")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think _2 = _3 may always be true. The problem with makeParquetFileAllTypes is that every for each row, each column contains the same integer value cast to the column's type, so it is not ideal for tests like this. We can improve as part of #1269
Which issue does this PR close?
Related to Epic: #1042
arrays_overlap:select arrays_overlap(array('hello', '-', 'world'), array('hello'))=>trueDataFusion' s array_has_any has same behavior with Spark 's arrays_overlap function
Spark:https://docs.databricks.com/en/sql/language-manual/functions/arrays_overlap.htmlDataFusion:https://datafusion.apache.org/user-guide/sql/scalar_functions.html#array-has-anyRationale for this change
Defined under Epic: #1042
What changes are included in this PR?
planner.rs:Maps Spark 'sarrays_overlapfunction to DataFusionarray_has_anyphysical expression from Spark physical expression with return type: DataType::Boolean,expr.proto:arrays_overlap message has been added,QueryPlanSerde.scala:arrays_overlap pattern matching case has been added,CometExpressionSuite.scala:A new UT has been added for arrays_overlap function.How are these changes tested?
A new UT has been added.