Skip to content

Add find_first lambda function#18316

Merged
rschlussel merged 1 commit intoprestodb:masterfrom
feilong-liu:array_udf_find
Sep 12, 2022
Merged

Add find_first lambda function#18316
rschlussel merged 1 commit intoprestodb:masterfrom
feilong-liu:array_udf_find

Conversation

@feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Sep 8, 2022

Fix #18304.

Add an UDF find for array, which returns the first array element which matches the predicate. Returns null if no match found.

  • find_first(array(T), function(T, boolean)) -> T
    Returns the first element which returns true for function(T, boolean). Return null if no match found.

  • find_first(array(T), startIndex, function(T, boolean)) -> T
    If startIndex>0, find match starting at the startIndex in array until the end of array. If startIndex<0, find match starting
    at the absolute(startIndex) counting from last, until the start of array. Array is 1 indexed, startIndex=0 will return exception.

Test plan - (Please fill in how you tested your changes)

Add unit tests.

== RELEASE NOTES ==

General Changes
* Add an UDF find for array
 Finds the first array element which matches a predicated. Returns null if no match found.
 
.. function:: find_first(array(E), function(T,boolean)) -> E

    Returns the first element of ``array`` which returns true for ``function(T,boolean)``. Returns ``NULL`` if no such element exists.

.. function:: find_first(array(E), index, function(T,boolean)) -> E

    Returns the first element of ``array`` which returns true for ``function(T,boolean)``. Returns ``NULL`` if no such element exists.
    If ``index`` > 0, the search for element starts at position ``index`` until the end of array.
    If ``index`` < 0, the search for element starts at position ``abs(index)`` counting from last, until the start of array.

        SELECT find_first(ARRAY[3, 4, 5, 6], 2, x -> x > 0); -- 4
        SELECT find_first(ARRAY[3, 4, 5, 6], -2, x -> x > 0); -- 5
        SELECT find_first(ARRAY[3, 4, 5, 6], 2, x -> x < 4); -- NULL
        SELECT find_first(ARRAY[3, 4, 5, 6], -2, x -> x > 5); -- NULL

@feilong-liu feilong-liu requested a review from a team as a code owner September 8, 2022 20:51
@feilong-liu feilong-liu force-pushed the array_udf_find branch 2 times, most recently from 23c66a6 to a7e2a12 Compare September 8, 2022 21:07
Copy link
Contributor

@kaikalur kaikalur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please see if we need these specializations or if we can just use boxed types and write a single method (for example like we do in: ArrayReduceFunction)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - you should not need two functions for this, I don't think. Just make it a variadic function and check in the constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get an error when trying to combine the two classes to be one, there is the same error in PR #15170 too from the conversation.

Copy link
Contributor

@pranjalssh pranjalssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits

@feilong-liu
Copy link
Contributor Author

Also please see if we need these specializations or if we can just use boxed types and write a single method (for example like we do in: ArrayReduceFunction)

Updated the diff. I see that there are some conversations in PR #15170 about boxed type vs. implementing multiple functions regarding performance which ends up in implementing multiple implementations. Should we still keep the same tradeoff here?

@kaikalur
Copy link
Contributor

kaikalur commented Sep 9, 2022

Also don't forget to add documentation in the rst file

@feilong-liu feilong-liu force-pushed the array_udf_find branch 2 times, most recently from 7e41b6b to c52f924 Compare September 9, 2022 05:06
@feilong-liu
Copy link
Contributor Author

Also please see if we need these specializations or if we can just use boxed types and write a single method (for example like we do in: ArrayReduceFunction)

Updated the diff. I see that there are some conversations in PR #15170 about boxed type vs. implementing multiple functions regarding performance which ends up in implementing multiple implementations. Should we still keep the same tradeoff here?

Discussed with @kaikalur offline, we are to keep it consistent with PR #15170 and implement multiple functions.

@feilong-liu
Copy link
Contributor Author

Also don't forget to add documentation in the rst file

Add document to rst file.

Add an UDF find for array, which returns the first array element which matches the predicate. Returns null if no match found.
Copy link
Contributor

@pranjalssh pranjalssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont have strong opinions of 2 vs 1 file. You could write multiple @ScalarFunction annotations but not very clean

Copy link
Contributor

@kaikalur kaikalur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@feilong-liu feilong-liu deleted the array_udf_find branch January 24, 2023 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add UDF to find the first element matching a predicate

4 participants