Conversation
elharo
left a comment
Implementation looks good. I think this needs an issue, and perhaps an RFC.
Also if it's in the code is it a UDF or just a function?
presto-main/src/test/java/com/facebook/presto/type/TestArrayOperators.java
I suggest array_contains_all as the name.
presto-main/src/main/java/com/facebook/presto/operator/scalar/ArrayContainsAllFunction.java
steveburnett
left a comment
Thanks for the doc! Nit only.
50d6889 to 62668f4
steveburnett
left a comment
LGTM! (docs)
Pull updated branch, new local doc build, docs look good. Thanks!
468b26a to f86b0f8
presto-main/src/main/java/com/facebook/presto/operator/scalar/ArrayContainsAllFunction.java
        @SqlType("array(T)") Block firstArray,
        @SqlType("array(T)") Block secondArray)
{
    TypedSet firstSet = new TypedSet(elementType, firstArray.getPositionCount(), "arrayContainsAll");
Can you benchmark this version vs. using OptimizedTypedSet and checking the cardinality of the intersection of the two sets? I expect that using OptimizedTypedSet would be faster, except for the case where a large second array is able to short-circuit very early.
Actually curious - how about CONTAINS? @rschlussel did you fix that as well to be IS DISTINCT FROM? This function should behave like contains for that part
contains fails for all null values. If we want this to fail for all nulls, we can do that as well, but we shouldn't fail only for nulls nested in complex types while otherwise comparing nulls as equal (which is what TypedSet does by default).
@jainavi17 did you have a chance to benchmark vs. a version using OptimizedTypedSet?
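For readers following the benchmark request in this thread, here is a minimal sketch of the two strategies being compared. It uses plain java.util.HashSet as a stand-in, not Presto's TypedSet or OptimizedTypedSet, and the class and method names (ContainsAllSketch, containsAllShortCircuit, containsAllByIntersection) are hypothetical. It only illustrates the logic and says nothing about which version is faster in Presto.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: HashSet stands in for Presto's typed sets.
public class ContainsAllSketch
{
    // Strategy A: build a set from the first array, probe each element of the
    // second array, and stop at the first element that is missing.
    public static <T> boolean containsAllShortCircuit(List<T> first, List<T> second)
    {
        Set<T> firstSet = new HashSet<>(first);
        for (T element : second) {
            if (!firstSet.contains(element)) {
                return false;
            }
        }
        return true;
    }

    // Strategy B: intersect the distinct elements of both arrays and compare the
    // size of the intersection with the number of distinct elements in the second.
    public static <T> boolean containsAllByIntersection(List<T> first, List<T> second)
    {
        Set<T> secondSet = new HashSet<>(second);
        Set<T> intersection = new HashSet<>(first);
        intersection.retainAll(secondSet);
        return intersection.size() == secondSet.size();
    }

    public static void main(String[] args)
    {
        List<Integer> first = Arrays.asList(1, 2, 3, 4);
        System.out.println(containsAllShortCircuit(first, Arrays.asList(2, 4)));
        System.out.println(containsAllByIntersection(first, Arrays.asList(2, 5)));
    }
}
```

Strategy A can return as soon as one element of the second array is missing, which is the early short-circuit case mentioned above. Strategy B always processes both arrays in full.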
presto-main/src/test/java/com/facebook/presto/type/TestArrayContainsAll.java
FWIW, if anything, this function does not exist in Spark SQL, so adding this function increases the divergence between Presto SQL and Spark SQL. I don't know how much weight that carries here, though I do know some people care about this.
That's not a concern, especially because adding UDFs to Spark is easier.
@jainavi17: Could you please create an issue to add this function for the native engine?
Native engine as in?
d209068 to a411156
steveburnett
left a comment
Just a nit of formatting.
Scalar function that takes two arrays as input and checks if all elements of the right array are present in the left array
a411156 to 8311dd3
Hi @NikhilCollooru, could you review the PR please?
steveburnett
left a comment
LGTM! (docs)
Pull updated branch, new local doc build. Thanks!
@elharo, could you take a look at the changes please?
        @SqlType("array(T)") Block firstArray,
        @SqlType("array(T)") Block secondArray)
{
    TypedSet firstSet = new TypedSet(elementType, firstArray.getPositionCount(), "arrayContainsAll");
Can you use distinct semantics here so that we don't throw an exception for arrays of arrays with nulls? You pass the "distinct" function handle to the TypedSet, following how it's done for array_distinct.
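To make the difference between the two comparison modes concrete, here is a small sketch in plain Java. It is not Presto's implementation: DistinctSemanticsSketch, sqlEquals, and isDistinctFrom are hypothetical names, and java.util.List stands in for SQL arrays. The sketch assumes SQL-style equality returns unknown (modeled as a null Boolean) when a null is involved, while a distinct comparison always returns true or false and treats two nulls as not distinct.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: Lists stand in for SQL arrays.
public class DistinctSemanticsSketch
{
    // SQL-style equality: the result is unknown (null) when a null takes part
    // in the comparison and no other element pair already proves inequality.
    public static Boolean sqlEquals(Object left, Object right)
    {
        if (left == null || right == null) {
            return null;
        }
        if (left instanceof List && right instanceof List) {
            List<?> l = (List<?>) left;
            List<?> r = (List<?>) right;
            if (l.size() != r.size()) {
                return false;
            }
            boolean indeterminate = false;
            for (int i = 0; i < l.size(); i++) {
                Boolean result = sqlEquals(l.get(i), r.get(i));
                if (result == null) {
                    indeterminate = true;
                }
                else if (!result) {
                    return false;
                }
            }
            return indeterminate ? null : Boolean.TRUE;
        }
        return left.equals(right);
    }

    // Distinct comparison: always decidable; null is not distinct from null.
    public static boolean isDistinctFrom(Object left, Object right)
    {
        if (left == null || right == null) {
            return left != right;
        }
        if (left instanceof List && right instanceof List) {
            List<?> l = (List<?>) left;
            List<?> r = (List<?>) right;
            if (l.size() != r.size()) {
                return true;
            }
            for (int i = 0; i < l.size(); i++) {
                if (isDistinctFrom(l.get(i), r.get(i))) {
                    return true;
                }
            }
            return false;
        }
        return !left.equals(right);
    }

    public static void main(String[] args)
    {
        List<Integer> a = Arrays.asList(1, null);
        List<Integer> b = Arrays.asList(1, null);
        System.out.println(sqlEquals(a, b));
        System.out.println(isDistinctFrom(a, b));
    }
}
```

With equality, a set lookup for a nested array containing a null has no definite answer, which is where an exception can arise. With the distinct comparison, every lookup resolves to present or absent.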
        @SqlType("array(T)") Block firstArray,
        @SqlType("array(T)") Block secondArray)
{
    TypedSet firstSet = new TypedSet(elementType, firstArray.getPositionCount(), "arrayContainsAll");
@jainavi17 did you have a chance to benchmark vs. a version using OptimizedTypedSet?
    }

    @Test
    public void testNulls()
Add a test for arrays of arrays/rows containing nulls.
Native engine means that we need a corresponding C++ function for this, as we are in the process of migrating from Java workers to C++ workers.
Hi @rschlussel, I tried to run the benchmark test, but my IDE keeps breaking and I'm unable to set it up. Is there any way I can run it from the command line? Thanks.
Core function to return whether all elements of a second array are present in the first array
Test plan
Added unit tests.
Built successfully using the following terminal command:
mvn clean install -Dtest=TestArrayOperators -Dmaven.javadoc.skip=true -DskipUI -T1C -fn -pl presto-main