Skip to content

Fix map_top_n to use keys to break ties on NULL values. #22979

Merged
kgpai merged 1 commit intoprestodb:masterfrom
kgpai:map_top_n_fix
Jul 8, 2024
Merged

Fix map_top_n to use keys to break ties on NULL values. #22979
kgpai merged 1 commit intoprestodb:masterfrom
kgpai:map_top_n_fix

Conversation

@kgpai
Copy link
Contributor

@kgpai kgpai commented Jun 10, 2024

Description

Fix map_top_n to use keys to break ties for NULL values.

Motivation and Context

Map_top_n currently uses keys to break ties when values match. This is however not done when values are null , which leads to some non determinism when compared agains the presto_native implementation.

fixes #22778

Impact

Since current behavior is non deterministic, this has no impact on existing queries.

Test Plan

See attached unit test.

Release Notes

== RELEASE NOTES ==
General Changes
* Make map_top_n function more deterministic by using keys to break ties.

@kgpai kgpai requested a review from a team as a code owner June 10, 2024 22:01
@kgpai kgpai requested review from presto-oss and rschlussel June 10, 2024 22:01
{
assertFunction("MAP_TOP_N(NULL, 1)", mapType(UNKNOWN, UNKNOWN), null);

// If values are null , then use keys to break ties.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no space before comma

Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Since current behavior is non deterministic, this has no impact on existing queries."

Is it truly random, or is it possibly deterministic for some users in a way they might be depending on? I do think this one should be called out in the release notes.

@rschlussel
Copy link
Contributor

Please update the documentation as well.

@kgpai
Copy link
Contributor Author

kgpai commented Jun 13, 2024

@elharo Current behavior would depend on order of values in the valueBlock - I presume this means its dependent on order of input, and thus might be non deterministic. I will update and call out the present change to make it deterministic in the documentation.

@kgpai kgpai requested a review from steveburnett as a code owner June 13, 2024 17:55
@kgpai
Copy link
Contributor Author

kgpai commented Jun 13, 2024

cc: @elharo @rschlussel Please have a look when possible. Thanks !

.. function:: map_top_n_keys(x(K,V), n) -> array(K)

Returns top ``n`` keys in the map ``x`` by sorting its keys in descending order.
Returns top ``n`` keys in the map ``x`` by sorting its keys in descending order. Keys are used to break ties with the max key being chosen. Both keys and values should be orderable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you updated the wrong documentation. This should be at map_top_n (a few entries below), not map_top_n_keys

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies and Thank you @rschlussel . Fixed.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc update! Two small nits, nothing big.

.. function:: map_top_n(x(K,V), n) -> map(K, V)

Truncates map items. Keeps only the top N elements by value.
Truncates map items. Keeps only the top N elements by value. Keys are used to break ties with the max key being chosen. Both keys and values should be orderable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Truncates map items. Keeps only the top N elements by value. Keys are used to break ties with the max key being chosen. Both keys and values should be orderable.
Truncates map items. Keeps only the top ``n`` elements by value. Keys are used to break ties with the max key being chosen. Both keys and values should be orderable.

Nit, formatting.


Truncates map items. Keeps only the top N elements by value.
Truncates map items. Keeps only the top N elements by value. Keys are used to break ties with the max key being chosen. Both keys and values should be orderable.
``n`` must be a non-negative integer.::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``n`` must be a non-negative integer.::
``n`` must be a non-negative integer. ::

Nit: no space between the . and the first : results in "a non-negative integer.: " as shown in this screenshot
Screenshot 2024-06-18 at 10 56 21 AM

@kgpai
Copy link
Contributor Author

kgpai commented Jun 18, 2024

@steveburnett Thank you, made the changes you requested.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final nit in formatting, nothing else!

@kgpai
Copy link
Contributor Author

kgpai commented Jun 20, 2024

@steveburnett Thank you again ! Made changes as per your request.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, reviewed new local docs build, looks good. Thanks!

@rschlussel
Copy link
Contributor

@elharo can you review please?

public static String mapTopN()
{
return "RETURN IF(n < 0, fail('n must be greater than or equal to 0'), map_from_entries(slice(array_sort(map_entries(map_filter(input, (k, v) -> v is not null)), (x, y) -> IF(x[2] < y[2], 1, IF(x[2] = y[2], IF(x[1] < y[1], 1, -1), -1))) || map_entries(map_filter(input, (k, v) -> v is null)), 1, n)))";
return "RETURN IF(n < 0, fail('n must be greater than or equal to 0'), map_from_entries(slice(array_sort(map_entries(map_filter(input, (k, v) -> v is not null)), (x, y) -> IF(x[2] < y[2], 1, IF(x[2] = y[2], IF(x[1] < y[1], 1, -1), -1))) "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required but if there's any plausible way to use indentation and line breaks here to make this code easier to read, that would help a lot. Without that, it's very hard to follow the logic.

@kgpai kgpai merged commit 6a0b194 into prestodb:master Jul 8, 2024
@tdcmeehan tdcmeehan mentioned this pull request Aug 23, 2024
34 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve map_top_n to consistently break ties for null values

4 participants