Add SQL functions ARRAY_DUPES and ARRAY_HAS_DUPES#16317
rongrong merged 1 commit into prestodb:master
'repeated only once' is confusing. Just say the result is a set of elements that occur more than once in the original array.
You don't need to repeat for every type. Just make it ARRAY like we do for other functions.
Because those are the only two versions of the functions we spell out. It would probably be nice to support generic types, at least for these built-in SQL functions.
I think we already have array_frequency. If so just use that and do something like:
map_keys(map_filter(array_frequency(input), (k, v) -> v > 1))
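For readers unfamiliar with the pattern, the expression above builds a frequency map, keeps the entries whose count exceeds 1, and returns the remaining keys. A minimal Python sketch of that idea follows; it only models the logic and is not the Presto implementation. The function name `dupes_via_frequency` is hypothetical, and the ordering of the result (first occurrence) is an assumption of this sketch.

```python
from collections import Counter


def dupes_via_frequency(values):
    """Model of map_keys(map_filter(array_frequency(input), (k, v) -> v > 1))."""
    # Build an element -> count map (the role array_frequency plays in the suggestion).
    frequency = Counter(values)
    # Keep only the keys whose count exceeds 1 (map_filter), then return those keys (map_keys).
    return [key for key, count in frequency.items() if count > 1]


if __name__ == "__main__":
    print(dupes_via_frequency([1, 2, 2, 3, 3, 3]))  # prints [2, 3]
```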
Please double check: we might not allow using SQL functions inside SQL functions. We probably don't need to be this strict. It should be more like not allowing dynamic functions inside other dynamic functions (I don't remember what we were actually checking).
Also I wonder if we should rename this file to ArrayFunctions or something, it's not just arithmetic anymore.
036f58e to c931df0
Addressed comments
Let's say that element type of x has to be coercible to bigint or varchar and the return type will be bigint / varchar? Please refer to the documentation of array_sum.
> Let's say that element type of x has to be coercible to bigint or varchar and the return type will be bigint / varchar? Please refer to the documentation of array_sum.
Hmm, why only those two types? It should be applicable to any comparable type, right?
101e881 to 4c74c1a
CC: @tdcmeehan @highker
These functions are already used in production. Do you want to add aliases for them?
Perhaps we can add aliases, then update prod queries, then delete the original names. Are there a lot of these?
Do we have other examples of Presto functions where we use abbreviated names? Or other SQL engines that use these names? Would be nice to be consistent.
Added SQL functions:
ARRAY_DUPES: Returns the elements that occur more than once in the input array
ARRAY_HAS_DUPES: Returns a boolean indicating whether the array has any duplicates
null is also considered a valid element and is counted like any other. This follows what array_distinct does.
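The null handling described above can be illustrated with a small Python model, using `None` to stand in for SQL null. This is a sketch of the described semantics only, not the code added in this PR; the order of the returned elements (first repeat seen) is an assumption, since the description does not state one.

```python
def array_dupes(values):
    """Return each element that occurs more than once, treating None like any other element."""
    seen = set()
    dupes = []
    for value in values:
        # Report an element the first time it is seen again, and only once.
        if value in seen and value not in dupes:
            dupes.append(value)
        seen.add(value)
    return dupes


def array_has_dupes(values):
    """Return True if any element, including None, occurs more than once."""
    # A set drops repeats, so a smaller set means at least one duplicate.
    return len(set(values)) < len(values)


if __name__ == "__main__":
    print(array_dupes([1, None, 1, None, 2]))  # prints [1, None]
    print(array_has_dupes([1, 2, None]))       # prints False
```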
Tracking Issue: #15656