Skip to content

Add support for array_least_frequent UDF#20752

Merged
NikhilCollooru merged 1 commit intoprestodb:masterfrom
jainavi17:array_least_frequent
Oct 8, 2023
Merged

Add support for array_least_frequent UDF#20752
NikhilCollooru merged 1 commit intoprestodb:masterfrom
jainavi17:array_least_frequent

Conversation

@jainavi17
Copy link
Contributor

@jainavi17 jainavi17 commented Sep 3, 2023

Description

UDF to return the least frequent element of an array

Motivation and Context

Currently there is no easy way to get least frequent element of an array. This change add that support natively.

Test Plan

./mvnw clean install -Dtest=TestArraySqlFunctions -Dmaven.javadoc.skip=true -T1C -fn -pl presto-main

== RELEASE NOTES ==

General Changes

  • Add function :func:array_least_frequent

    • SELECT ARRAY_LEAST_FREQUENT(ARRAY[1, 2, 2, 3, 3, 3]); -- 1

@jainavi17 jainavi17 requested a review from a team as a code owner September 3, 2023 21:16
@jainavi17 jainavi17 requested a review from presto-oss September 3, 2023 21:16
@github-actions
Copy link

github-actions bot commented Sep 3, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff ed0c6e7...2f66df7.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/functions/array.rst

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (docs), with a very small suggestion for consideration.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

@jainavi17 jainavi17 force-pushed the array_least_frequent branch 4 times, most recently from 8e580e9 to 55b37f2 Compare September 20, 2023 22:08
@jainavi17 jainavi17 requested a review from kaikalur September 20, 2023 22:09
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try transposing and sorting?

array_sort(TRANSFORM(MAP_ENTRIES(m),(k,v)->(v,k)))

Then you don't need the lambda for the comparison

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah what I'm doing is pretty much same. If we're to remove the lambda then it requires additional transform. This is how its gonna look which is not much different from what we have.

TRANSFORM(SLICE(ARRAY_SORT(TRANSFORM(MAP_ENTRIES(ARRAY_FREQUENCY(REMOVE_NULLS(input))), x -> ROW(x[2], x[1]))), 1, n), x -> x[2])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it will be good to do a sqlbenchmark to see which is best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran the banchmark test. Turned out the non lambda version is 0.7 secs faster than compared to the one with lambda.

Here is the result without lambda
Screenshot 2023-10-08 at 14 41 20

Here is the result with lambda
Screenshot 2023-10-08 at 14 30 29

UDF to return the least frequent eleement of an array
@jainavi17 jainavi17 force-pushed the array_least_frequent branch from e2ffedd to 2f66df7 Compare October 8, 2023 14:07
@NikhilCollooru NikhilCollooru merged commit 9d121df into prestodb:master Oct 8, 2023
@kaikalur
Copy link
Contributor

kaikalur commented Oct 8, 2023 via email

@wanglinsong wanglinsong mentioned this pull request Dec 8, 2023
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants