Skip to content

Add all_match, any_match and none_match functions for arrays#1045

Closed
lxynov wants to merge 3 commits intotrinodb:masterfrom
lxynov:array_match_functions
Closed

Add all_match, any_match and none_match functions for arrays#1045
lxynov wants to merge 3 commits intotrinodb:masterfrom
lxynov:array_match_functions

Conversation

@lxynov
Copy link
Member

@lxynov lxynov commented Jun 26, 2019

#1036

Test done:
Added TestArrayMatchFunctions. mvn clean install runs successfully in presto-main directory.

Notes:

  1. NULL elements in an array are considered as "not matching". E.g., select all_match(array[1,null], x -> x = 1) returns false.
  2. Empty arrays result in a false in all_match and any_match, but a true in none_match.
  3. NIT: Some interfaces have the same signature, so they can be coalesced into one. For example, interfaces ArrayAllMatchFunction.AllMatchBlockLambda, ArrayAnyMatchFunction.AnyMatchBlockLambda and ArrayFilterFunction.FilterBlockLambda are identical, and they can be replaced by a new PredicateLambdaBlock interface. I think this can be done in a separate commit if it's necessary.

@cla-bot cla-bot bot added the cla-signed label Jun 26, 2019
@findepi
Copy link
Member

findepi commented Jun 26, 2019

Empty arrays result in a false in all_match

Empty array should make all_match return true (just like java.util.stream.Stream#allMatch does).

NULL elements in an array are considered as "not matching".

We have two other options to consider

  1. invoke lambda for NULL values and see what it returns (my preferred)
  2. return NULL when array contains any NULL

@martint?

@dain
Copy link
Member

dain commented Jun 26, 2019

We have two other options to consider

  1. invoke lambda for NULL values and see what it returns (my preferred)

I would assume we want this so we can write all_match(my_array, e -> e is null)

@lxynov
Copy link
Member Author

lxynov commented Jun 26, 2019

@findepi @dain Thanks for the comments. I think these are good points. I'll update the PR.

@lxynov lxynov force-pushed the array_match_functions branch from e02e12b to e130e6d Compare June 26, 2019 21:56
@lxynov
Copy link
Member Author

lxynov commented Jun 26, 2019

@findepi @dain

The PR has been updated. Now the behaviors are:

  1. NULL elements are passed to lambda function
  2. all_match(empty array, lambda) and none_match(empty array, lambda return true, and any_match(empty array, lambda) returns false, which is consistent with relevant methods in java.util.stream.Stream

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. Please add documentation in presto-docs/src/main/sphinx/functions/array.rst as part of the same commit.

@lxynov lxynov force-pushed the array_match_functions branch 2 times, most recently from 650b0fe to 22e6f16 Compare June 26, 2019 23:41
@lxynov
Copy link
Member Author

lxynov commented Jun 26, 2019

@electrum Thanks for the review. I've updated the PR. Please note that I also changed the implementation because the previous one doesn't take into account the case when these functions return null.

Copy link
Contributor

@wagnermarkd wagnermarkd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect null handling for any_match(ARRAY[a, b], x -> f(x)) to behave the same as f(a) OR f(b). So for example, any_match(ARRAY[NULL, true], x -> x) should be true. Analogously, all_match(ARRAY[NULL, false], x -> x) ought to be false.

In addition to consistency with non array behavior, this has the benefit of regaining the ability to exit loops early.

The SQL standard defines the set functions EVERY, ANY, and SOME. @electrum do you know if there's a standardized correspondence of set functions and functions on array?

@findepi
Copy link
Member

findepi commented Jun 27, 2019

I expect null handling for any_match(ARRAY[a, b], x -> f(x)) to behave the same as f(a) OR f(b). So for example, any_match(ARRAY[NULL, true], x -> x) should be true. Analogously, all_match(ARRAY[NULL, false], x -> x) ought to be false.

@wagnermarkd this indeed is a reasonable expectation.

@martint
Copy link
Member

martint commented Jun 27, 2019

There are not many functions defined for arrays in the SQL standard, but there are ways to convert from array to table and viceversa. These functions should mimic the semantics of operations over tables as much as possible. EVERY, ANY and SOME are quantifiers for comparison operations, so they are not quite the same.

The closest to that would be an expression like:

true = any (select f(x) from unnest(a) as t(x))

@lxynov
Copy link
Member Author

lxynov commented Jun 27, 2019

I expect null handling for any_match(ARRAY[a, b], x -> f(x)) to behave the same as f(a) OR f(b). So for example, any_match(ARRAY[NULL, true], x -> x) should be true. Analogously, all_match(ARRAY[NULL, false], x -> x) ought to be false.

In addition to consistency with non array behavior, this has the benefit of regaining the ability to exit loops early.

@wagnermarkd Thanks for the review. I think this makes sense from the user's perspective. And if the lambda returns null for the element, should this element be regarded as "not matching"? In other words, should all_match(ARRAY[null, null], x -> x) return false or null?

@wagnermarkd
Copy link
Contributor

That should return the same as null AND null, which is null.

@lxynov
Copy link
Member Author

lxynov commented Jun 27, 2019

That should return the same as null AND null, which is null.

@wagnermarkd
Thanks! I see your point better now.

This is quite interesting. To summarize here, in Presto:

  • SELECT TRUE AND NULL returns NULL
  • SELECT FALSE AND NULL returns FALSE
  • SELECT NULL AND NULL returns NULL
  • SELECT TRUE OR NULL returns TRUE
  • SELECT FALSE OR NULL returns NULL
  • SELECT NULL OR NULL returns NULL

Not sure if these are SQL standards or Presto's own behaviors, but I will make the match functions be consistent with them.

@martint
Copy link
Member

martint commented Jun 27, 2019

Yes, that’s standard SQL behavior. In boolean expressions, NULL behaves like the UNKNOWN value. You can easily reason about those outcomes if you take that into account.

@dain
Copy link
Member

dain commented Jun 27, 2019

We even have documentation for that :) https://prestosql.io/docs/current/functions/logical.html

@lxynov lxynov force-pushed the array_match_functions branch from 22e6f16 to b647ec1 Compare July 9, 2019 23:05
@lxynov
Copy link
Member Author

lxynov commented Jul 9, 2019

@wagnermarkd @findepi @dain @electrum @martint
Thanks for the review. This PR has been updated to be consistent with the logical operators' behaviors in Presto.

@lxynov lxynov force-pushed the array_match_functions branch from b647ec1 to 5b4761c Compare July 9, 2019 23:38
Copy link
Contributor

@wagnermarkd wagnermarkd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @findepi or @electrum, would you take a look?

Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment, but otherwise it looks great.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of duplicating these interfaces across the multiple functions, pull them out and reuse them. There's nothing that ties them to each of the functions -- all that matters is that they are functional interfaces for XXX -> YYY, for some XXX and YYY types.

Also, I'd rename them to SliceToBooleanFunction, LongToBooleanFunction, DoubleToBooleanFunction and BooleanToBooleanFunction (the "lambda" is the syntactic construct of what gets passed in, but the methods that use them take a "function")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint
I'd also pull out interfaces in ArrayFilterFunction because they are same as those in ArrayAllMatchFunction and ArrayAnyMatchFunction.

Two questions:

  1. Should I put these interfaces in package io.prestosql.operator.scalar?
  2. Should I separate the changes in two commits? One for extracting and renaming interfaces from ArrayFilterFunction, the other one for adding match functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both seem reasonable

@lxynov lxynov force-pushed the array_match_functions branch from 5b4761c to 7bc5b2b Compare July 24, 2019 20:32
@martint
Copy link
Member

martint commented Jul 26, 2019

Merged, thanks!

@martint martint mentioned this pull request Jul 26, 2019
5 tasks
@dain
Copy link
Member

dain commented Aug 1, 2019

Looks like this was merged.

@dain dain closed this Aug 1, 2019
@lxynov lxynov deleted the array_match_functions branch August 1, 2019 20:58
wenleix added a commit to wenleix/presto that referenced this pull request Nov 21, 2019
Cherry-picked from trinodb/trino#1045

Co-authored-by: Xingyuan Lin <xinlin@linkedin.com>
wenleix added a commit to wenleix/presto that referenced this pull request Nov 21, 2019
Cherry-picked from: trinodb/trino#1045

Co-authored-by: Xingyuan Lin <xinlin@linkedin.com>
wenleix added a commit to wenleix/presto that referenced this pull request Nov 22, 2019
Cherry-picked from: trinodb/trino#1045

Co-authored-by: Xingyuan Lin <xinlin@linkedin.com>
wenleix added a commit to prestodb/presto that referenced this pull request Nov 22, 2019
Cherry-picked from trinodb/trino#1045

Co-authored-by: Xingyuan Lin <xinlin@linkedin.com>
wenleix added a commit to prestodb/presto that referenced this pull request Nov 22, 2019
Cherry-picked from: trinodb/trino#1045

Co-authored-by: Xingyuan Lin <xinlin@linkedin.com>
wenleix added a commit to prestodb/presto that referenced this pull request Nov 22, 2019
Cherry-picked from: trinodb/trino#1045

Co-authored-by: Xingyuan Lin <xinlin@linkedin.com>
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020
Cherry-picked from trinodb/trino#1045

Co-authored-by: Xingyuan Lin <xinlin@linkedin.com>
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020
Cherry-picked from: trinodb/trino#1045

Co-authored-by: Xingyuan Lin <xinlin@linkedin.com>
kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020
Cherry-picked from: trinodb/trino#1045

Co-authored-by: Xingyuan Lin <xinlin@linkedin.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants