Fix map_top_n to use keys to break ties on NULL values. #22979
kgpai merged 1 commit into prestodb:master
{
    assertFunction("MAP_TOP_N(NULL, 1)", mapType(UNKNOWN, UNKNOWN), null);

    // If values are null , then use keys to break ties.
elharo left a comment:
"Since current behavior is non deterministic, this has no impact on existing queries."
Is it truly random, or is it possibly deterministic for some users in a way they might be depending on? I do think this one should be called out in the release notes.
Please update the documentation as well.
@elharo The current behavior depends on the order of values in the valueBlock. I presume this means it depends on the order of the input, and thus might be nondeterministic. I will update the documentation and call out that this change makes the behavior deterministic.
cc: @elharo @rschlussel Please have a look when possible. Thanks!
  .. function:: map_top_n_keys(x(K,V), n) -> array(K)

-     Returns top ``n`` keys in the map ``x`` by sorting its keys in descending order.
+     Returns top ``n`` keys in the map ``x`` by sorting its keys in descending order. Keys are used to break ties with the max key being chosen. Both keys and values should be orderable.
I think you updated the wrong documentation. This should be at map_top_n (a few entries below), not map_top_n_keys
Apologies, and thank you @rschlussel. Fixed.
steveburnett left a comment:
Thanks for the doc update! Two small nits, nothing big.
  .. function:: map_top_n(x(K,V), n) -> map(K, V)

-     Truncates map items. Keeps only the top N elements by value.
+     Truncates map items. Keeps only the top N elements by value. Keys are used to break ties with the max key being chosen. Both keys and values should be orderable.
Suggested change:
-     Truncates map items. Keeps only the top N elements by value. Keys are used to break ties with the max key being chosen. Both keys and values should be orderable.
+     Truncates map items. Keeps only the top ``n`` elements by value. Keys are used to break ties with the max key being chosen. Both keys and values should be orderable.
Nit, formatting.
-     Truncates map items. Keeps only the top N elements by value.
+     Truncates map items. Keeps only the top N elements by value. Keys are used to break ties with the max key being chosen. Both keys and values should be orderable.
      ``n`` must be a non-negative integer.::
@steveburnett Thank you, made the changes you requested.
steveburnett left a comment:
One final nit in formatting, nothing else!
@steveburnett Thank you again! Made the changes as per your request.
steveburnett left a comment:
LGTM! (docs)
Pull updated branch, reviewed new local docs build, looks good. Thanks!
@elharo can you review please?
  public static String mapTopN()
  {
-     return "RETURN IF(n < 0, fail('n must be greater than or equal to 0'), map_from_entries(slice(array_sort(map_entries(map_filter(input, (k, v) -> v is not null)), (x, y) -> IF(x[2] < y[2], 1, IF(x[2] = y[2], IF(x[1] < y[1], 1, -1), -1))) || map_entries(map_filter(input, (k, v) -> v is null)), 1, n)))";
+     return "RETURN IF(n < 0, fail('n must be greater than or equal to 0'), map_from_entries(slice(array_sort(map_entries(map_filter(input, (k, v) -> v is not null)), (x, y) -> IF(x[2] < y[2], 1, IF(x[2] = y[2], IF(x[1] < y[1], 1, -1), -1))) "
Not required but if there's any plausible way to use indentation and line breaks here to make this code easier to read, that would help a lot. Without that, it's very hard to follow the logic.

Description
Fix map_top_n to use keys to break ties for NULL values.
Motivation and Context
map_top_n currently uses keys to break ties when values match. However, this is not done when values are null, which leads to nondeterminism when compared against the presto_native implementation.
fixes #22778
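
For illustration only, the intended ordering can be sketched in plain Java. This is not the Presto implementation, which is the SQL expression returned by mapTopN(). The class name MapTopNSketch and its method are hypothetical. Placing null-valued entries after all non-null entries follows the SQL expression in the diff. Ordering the null-valued entries by descending key is an assumption based on the documentation text ("max key being chosen").

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the ordering; not Presto code.
final class MapTopNSketch
{
    private MapTopNSketch() {}

    static <K extends Comparable<K>, V extends Comparable<V>> LinkedHashMap<K, V> mapTopN(Map<K, V> input, int n)
    {
        if (n < 0) {
            throw new IllegalArgumentException("n must be greater than or equal to 0");
        }
        List<Map.Entry<K, V>> entries = new ArrayList<>(input.entrySet());
        entries.sort((a, b) -> {
            V va = a.getValue();
            V vb = b.getValue();
            if (va == null && vb == null) {
                // Assumption: null-valued entries are ordered by descending key.
                return b.getKey().compareTo(a.getKey());
            }
            if (va == null) {
                return 1; // null values sort after non-null values
            }
            if (vb == null) {
                return -1;
            }
            int byValue = vb.compareTo(va); // descending by value
            // Equal values: the larger key comes first.
            return byValue != 0 ? byValue : b.getKey().compareTo(a.getKey());
        });
        // Keep the first n entries, preserving the sorted order.
        LinkedHashMap<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : entries.subList(0, Math.min(n, entries.size()))) {
            result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args)
    {
        Map<String, Integer> input = new HashMap<>();
        input.put("a", 1);
        input.put("b", 2);
        input.put("c", 2);
        input.put("d", null);
        input.put("e", null);
        System.out.println(mapTopN(input, 4)); // prints {c=2, b=2, a=1, e=null}
    }
}
```

With this ordering the result no longer depends on the iteration order of the input map: "c" wins the tie with "b" on value 2, and "e" is chosen over "d" among the null-valued entries.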
Impact
Since current behavior is non deterministic, this has no impact on existing queries.
Test Plan
See attached unit test.
Release Notes