Skip to content

Add a function for splitting array into slices of given length#23264

Merged
abhinavmuk04 merged 1 commit intoprestodb:masterfrom
abhinavmuk04:splitudf
Jul 24, 2024
Merged

Add a function for splitting array into slices of given length#23264
abhinavmuk04 merged 1 commit intoprestodb:masterfrom
abhinavmuk04:splitudf

Conversation

@abhinavmuk04
Copy link
Contributor

@abhinavmuk04 abhinavmuk04 commented Jul 19, 2024

Description

Add a function for splitting array into slices of given length

Motivation and Context

(#22996)

Impact

UDF for more features within Presto

Test Plan

Various unit tests, similar to other existing UDF tests

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Note

== RELEASE NOTES ==
* Add a new built-in function for array splitting. Given an array and an integer N, split it into pieces of size N  :pr:`23264`

@abhinavmuk04 abhinavmuk04 linked an issue Jul 19, 2024 that may be closed by this pull request
@abhinavmuk04 abhinavmuk04 changed the title Create and test array split UDF Add a UDF for splitting array into slices of given length Jul 19, 2024
@abhinavmuk04 abhinavmuk04 requested a review from kaikalur July 19, 2024 21:53
kaikalur
kaikalur previously approved these changes Jul 19, 2024
Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this is a UDF and not just a function?

@abhinavmuk04
Copy link
Contributor Author

Is there a reason this is a UDF and not just a function?

@elharo With things such as preventing division by zero and checking for larger element it allows for a bit more customization and can be beneficial as a UDF

@abhinavmuk04 abhinavmuk04 requested a review from kaikalur July 23, 2024 16:37
@abhinavmuk04 abhinavmuk04 marked this pull request as ready for review July 23, 2024 16:52
@abhinavmuk04 abhinavmuk04 requested a review from a team as a code owner July 23, 2024 16:52
@abhinavmuk04 abhinavmuk04 requested a review from presto-oss July 23, 2024 16:52
kaikalur
kaikalur previously approved these changes Jul 23, 2024
elharo
elharo previously requested changes Jul 23, 2024
Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell this is NOT a UDF. It is a new built-in function. Is there something I'm missing?

@kaikalur
Copy link
Contributor

we use the term loosely also SQL UDFs is a term we use for all these functions that we add now. No need to split hairs

@kaikalur kaikalur requested a review from feilong-liu July 23, 2024 19:07
Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presto-docs/src/main/sphinx/functions/array.rst needs to be updated in this PR to reflect the new function

@elharo elharo changed the title Add a UDF for splitting array into slices of given length Add a function for splitting array into slices of given length Jul 23, 2024
@elharo elharo dismissed their stale review July 23, 2024 19:34

edited

@abhinavmuk04 abhinavmuk04 requested a review from feilong-liu July 23, 2024 23:11
@aditi-pandit
Copy link
Contributor

aditi-pandit commented Jul 24, 2024

@feilong-liu : Thanks for this function. Could we add a test in https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeArrayFunctionQueries.java as well.

Since this is a SQL function, the native engine should be able to execute it without any changes.

@amitkdutta

@amitkdutta
Copy link
Contributor

@feilong-liu : Thanks for this function. Could we add a test in https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeArrayFunctionQueries.java as well.

Since this is a SQL function, the native engine should be able to execute it without any changes.

@amitkdutta

@aditi-pandit We have disabled inline-sql-function in all our production systems. There is a historical reason for it, I heard its due to regresssion in planning time, but I am not totally sure (CC: @tdcmeehan). In java, this means planning time the function is passed as-is to worker. In worker, inline-sql-function config is not respected and java worker can expand and execute the functions. In case of C++, the expansion is not there so C++ worker fails to run the function, and in some cases, functions can be implemented in worker (both native and java) to be very efficient. Hence, we are trying to add the functions as a regular function if possible.

@kaikalur
Copy link
Contributor

@feilong-liu : Thanks for this function. Could we add a test in https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeArrayFunctionQueries.java as well.
Since this is a SQL function, the native engine should be able to execute it without any changes.
@amitkdutta

@aditi-pandit We have disabled inline-sql-function in all our production systems. There is a historical reason for it, I heard its due to regresssion in planning time, but I am not totally sure (CC: @tdcmeehan). In java, this means planning time the function is passed as-is to worker. In worker, inline-sql-function config is not respected and java worker can expand and execute the functions. In case of C++, the expansion is not there so C++ worker fails to run the function, and in some cases, functions can be implemented in worker (both native and java) to be very efficient. Hence, we are trying to add the functions as a regular function if possible.

This is not acceptable. SQL functions is a firstclass feature and we should support them.

@kaikalur
Copy link
Contributor

@feilong-liu : Thanks for this function. Could we add a test in https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeArrayFunctionQueries.java as well.

Since this is a SQL function, the native engine should be able to execute it without any changes.

@amitkdutta

By definition we should be able to support all presto features in the backend especially sql functions. There are tests added and native should run all tests if they want to be compatible. It's too much burden to test both. It should be part of CI

@abhinavmuk04 abhinavmuk04 merged commit 1caebcc into prestodb:master Jul 24, 2024
@abhinavmuk04 abhinavmuk04 deleted the splitudf branch July 24, 2024 18:05
@aditi-pandit
Copy link
Contributor

@amitkdutta : We too have disabled inline-sql-function based on your guidance. Though the current setup is very hacky.

We are also on the wrong track to add new functionality as SQL functions. Might be better to make them Java functions, and expect a C++ implementation as well.

Agree with @kaikalur, that SQL functions should be supported at the backend.

Another option is to have a per-function annotation to enable/disable individual SQL functions.

@tdcmeehan tdcmeehan mentioned this pull request Aug 23, 2024
34 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a UDF for splitting array into slices of given length

6 participants