Skip to content

Fix analyzer for lambda in aggregation#22539

Merged
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:fix_lambda_agg
Apr 19, 2024
Merged

Fix analyzer for lambda in aggregation#22539
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:fix_lambda_agg

Conversation

@feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Apr 16, 2024

Description

Fix issue #22558
Fix query compilation error for lambda function in aggregation.
Sample query

SELECT
    id,
    REDUCE_AGG(value, 0, (a, b) -> a + b + 0, (a, b) -> a + b)
FROM (
    VALUES
        (1, 2),
        (1, 3),
        (1, 4),
        (2, 20),
        (2, 30),
        (2, 40)
) AS t(id, value)
GROUP BY
    id

It will throw error (GENERIC_INTERNAL_ERROR) no type found for symbol 'expr_10'
This is because the 0 inside the lambda was rewritten to symbol 'expr_10'. The fix is to skip rewrite for constants within lambda function.

Motivation and Context

Fix bug.

Impact

Fix compilation error for query

Test Plan

Unit tests

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix compilation error for queries with lambda in aggregation function

@feilong-liu feilong-liu requested review from a team and jaystarshot as code owners April 16, 2024 21:13
@feilong-liu feilong-liu requested a review from presto-oss April 16, 2024 21:13
@feilong-liu feilong-liu marked this pull request as draft April 16, 2024 21:13
@feilong-liu feilong-liu force-pushed the fix_lambda_agg branch 2 times, most recently from 7f4bf51 to a8dd1dc Compare April 17, 2024 16:30
tempExpression = ((Cast) tempExpression).getExpression();
}

if (tempExpression instanceof Literal || tempExpression instanceof ArrayConstructor) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The elements in array constructor should also be Literals

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one more test for this in abstracttestqueries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This util is used for the reduce_agg function, and it will change the behaviour for the reduce_agg function.
For example, currently the query
SELECT id, reduce_agg(value, array[id, value], (a, b) -> a || b, (a, b) -> a || b) FROM ( VALUES (1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id will succeed as it considers expression array[id, value] as constant.
However, after this fix, it will fail with error REDUCE_AGG only supports non-NULL literal as the initial value as array[id, value] is not considered as constant now.
cc @kaikalur

@Override
public Expression rewriteLambdaExpression(LambdaExpression node, Boolean context, ExpressionTreeRewriter<Boolean> treeRewriter)
{
Expression result = super.rewriteLambdaExpression(node, true, treeRewriter);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context set to true for lambda expression

}

@Test
public void testLambdaInAggregation()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two will fail without fix here

@feilong-liu feilong-liu marked this pull request as ready for review April 18, 2024 17:24
@feilong-liu feilong-liu requested a review from rschlussel April 18, 2024 17:43
assertQuery("SELECT id, reduce_agg(value, 0, (a, b) -> a + b+0, (a, b) -> a + b) FROM ( VALUES (1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id", "values (1, 9), (2, 90)");
assertQuery("SELECT id, reduce_agg(value, 's', (a, b) -> concat(a, b, 's'), (a, b) -> concat(a, b, 's')) FROM ( VALUES (1, '2'), (1, '3'), (1, '4'), (2, '20'), (2, '30'), (2, '40') ) AS t(id, value) GROUP BY id",
"values (1, 's2s3s4s'), (2, 's20s30s40s')");
assertQueryFails("SELECT id, reduce_agg(value, array[id, value], (a, b) -> a || b, (a, b) -> a || b) FROM ( VALUES (1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the change of isConstant(Expression expression) function. Before change "array[id, value]" is considered constant and this query pass. Now it will throw exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feilong-liu explained to me that the reason we want this to fail is because semantically it doesn't make sense for the initial state of the lambda function to depend on the value of a column (because which row of the column is it even talking about? It wouldn't even be consistent within a query because on any given worker it would depend on which row it happened to read first)

@feilong-liu feilong-liu requested review from rschlussel and removed request for rschlussel April 18, 2024 20:03
@feilong-liu feilong-liu merged commit 71ad56f into prestodb:master Apr 19, 2024
@feilong-liu feilong-liu deleted the fix_lambda_agg branch April 19, 2024 16:53
@wanglinsong wanglinsong mentioned this pull request Jun 25, 2024
36 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants