Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for order sensitive NTH_VALUE aggregation, make reverse ARRAY_AGG more efficient #8841

Merged
merged 32 commits into from
Jan 16, 2024
Merged

Support for order sensitive NTH_VALUE aggregation, make reverse ARRAY_AGG more efficient #8841

merged 32 commits into from
Jan 16, 2024

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Currently, we have FIRST_VALUE, LAST_VALUE, ARRAY_AGG order sensitive aggregations functions. Similar to the window counter part one may need to index entries other than first and last. With NTH_VALUE aggregation function one can choose arbitrary indices. With this PR one can write following query, to get second row in the group.

SELECT a, b, NTH_VALUE(c, 2)
FROM multiple_ordered_table
GROUP BY a, b
ORDER BY a, b;

As a workaround following query can be used to produce desired output of the query above.

SELECT a, b, ARRAY_AGG(c)[2]
FROM multiple_ordered_table
GROUP BY a, b
ORDER BY a, b;

However, with dedicated aggregator, first query will use less memory.

What changes are included in this PR?

This PR adds NTH_VALUE aggregator support.

Are these changes tested?

Yes,

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Jan 12, 2024
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this PR and it looks good except one issue about clones during merge_batch operations. There seems to be an inefficiency there which is not introduced by this PR, but I discovered it during review. I will consult with @mustafasrepo tomorrow to discuss possible fixes.

@ozankabak ozankabak changed the title ORDER sensitive NTH_VALUE Aggregation support Support for order sensitive NTH_VALUE aggregation, make reverse ARRAY_AGG more efficient Jan 16, 2024
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this PR thoroughly again and the inefficiency I found in my first review is resolved -- we both have faster ARRAY_AGGs now, and also have an efficient NTH_VALUE.

@ozankabak ozankabak merged commit 8cf1abb into apache:main Jan 16, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants