fix: Revert #27199 Select Alias in Having Clause#27372
fix: Revert #27199 Select Alias in Having Clause#27372feilong-liu merged 1 commit intoprestodb:masterfrom
Conversation
Summary: This reverts the following PR GitHub Pull Request: prestodb#27199 Differential Revision: D97181706
Reviewer's guide (collapsed on small PRs)Reviewer's GuideReverts support for referencing SELECT output aliases in HAVING clauses and simplifies the OrderByExpressionRewriter to only apply to ORDER BY, restoring previous HAVING semantics and expectations. Class diagram for reverted HAVING alias handling in StatementAnalyzerclassDiagram
class StatementAnalyzer {
+Scope visitQuerySpecification(QuerySpecification node, Optional_Scope scope)
-void analyzeHaving(QuerySpecification node, Scope scope)
}
class QuerySpecification {
+Optional_Expression getHaving()
+Select getSelect()
}
class Analysis {
+void setOrderByExpressions(QuerySpecification node, List_Expression orderByExpressions)
+void setHaving(QuerySpecification node, Expression predicate)
+Expression getHaving(QuerySpecification node)
+void recordSubqueries(Node node, ExpressionAnalysis expressionAnalysis)
}
class ExpressionAnalysis {
+Type getType(Expression expression)
+Collection_Expression getWindowFunctions()
}
class OrderByExpressionRewriter {
-Multimap_QualifiedName_Expression assignments
+OrderByExpressionRewriter(Multimap_QualifiedName_Expression assignments)
+Expression rewriteIdentifier(Identifier reference, Void context, ExpressionTreeRewriter_RewriteContext_Void treeRewriter)
}
class SemanticException {
+SemanticException(SemanticErrorCode code, Node node, String messageFormat, Type predicateType)
}
class Identifier
class Expression
class Select
class Scope
class Type
class QualifiedName
class Multimap_QualifiedName_Expression
class ExpressionTreeRewriter_RewriteContext_Void
StatementAnalyzer --> Analysis : uses
StatementAnalyzer --> QuerySpecification : analyzes
StatementAnalyzer --> OrderByExpressionRewriter : uses
StatementAnalyzer --> ExpressionAnalysis : uses
OrderByExpressionRewriter --> Multimap_QualifiedName_Expression : has
OrderByExpressionRewriter --> Identifier : rewrites
ExpressionAnalysis --> Type : returns
%% Key reverted relationships
StatementAnalyzer ..> Expression : HAVING analyzed directly
Analysis ..> Expression : setHaving now stores original predicate
OrderByExpressionRewriter ..> Identifier : only for ORDER_BY ambiguity messages
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java" line_range="308-312" />
<code_context>
- analyze("SELECT sum(a) as sum_a FROM t1 GROUP BY b HAVING sum_a > 1");
- }
-
- @Test
- public void testHavingAmbiguousAlias()
- {
- // Ambiguous alias referenced in HAVING should throw appropriate error
- assertFails(AMBIGUOUS_ATTRIBUTE, "SELECT sum(a) AS x, count(b) AS x FROM t1 GROUP BY c HAVING x > 5");
- }
-
- @Test
- public void testHavingNonExistentAlias()
- {
- // Non-existent alias in HAVING should fail with MISSING_ATTRIBUTE
- assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) AS total FROM t1 GROUP BY b HAVING unknown_alias > 5");
- }
-
- @Test
- public void testHavingWindowFunctionViaAlias()
- {
- // Window functions are not allowed in HAVING, even when referenced via alias
- assertFails(NESTED_WINDOW, "SELECT row_number() OVER () AS rn FROM t1 GROUP BY b HAVING rn > 1");
+ assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) x FROM t1 HAVING x > 5");
}
</code_context>
<issue_to_address>
**suggestion (testing):** Add a couple more negative HAVING alias cases to fully cover reverted behavior
Given the revert, this now exercises only the simplest HAVING alias case. To better protect the reverted behavior, please add a couple more `assertFails(MISSING_ATTRIBUTE, ...)` cases mirroring the former positive tests, e.g.:
- `SELECT sum(a) AS total FROM t1 GROUP BY b HAVING total > 10`
- `SELECT count(*) AS cnt, sum(a) AS total FROM t1 GROUP BY b HAVING cnt > 5 AND total > 100`
so we catch any future reintroduction of alias resolution in HAVING.
```suggestion
@Test
public void testHavingReferencesOutputAlias()
{
assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) x FROM t1 HAVING x > 5");
assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) AS total FROM t1 GROUP BY b HAVING total > 10");
assertFails(MISSING_ATTRIBUTE, "SELECT count(*) AS cnt, sum(a) AS total FROM t1 GROUP BY b HAVING cnt > 5 AND total > 100");
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @Test | ||
| public void testHavingReferencesOutputAlias() | ||
| { | ||
| // HAVING now support referencing SELECT aliases for improved SQL compatibility | ||
| analyze("SELECT sum(a) x FROM t1 HAVING x > 5"); | ||
| analyze("SELECT sum(a) AS total FROM t1 GROUP BY b HAVING total > 10"); | ||
| analyze("SELECT count(*) AS cnt, sum(a) AS total FROM t1 GROUP BY b HAVING cnt > 5 AND total > 100"); | ||
| analyze("SELECT sum(a) as sum_a FROM t1 GROUP BY b HAVING sum_a > 1"); | ||
| } | ||
|
|
||
| @Test | ||
| public void testHavingAmbiguousAlias() | ||
| { | ||
| // Ambiguous alias referenced in HAVING should throw appropriate error | ||
| assertFails(AMBIGUOUS_ATTRIBUTE, "SELECT sum(a) AS x, count(b) AS x FROM t1 GROUP BY c HAVING x > 5"); | ||
| } | ||
|
|
||
| @Test | ||
| public void testHavingNonExistentAlias() | ||
| { | ||
| // Non-existent alias in HAVING should fail with MISSING_ATTRIBUTE | ||
| assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) AS total FROM t1 GROUP BY b HAVING unknown_alias > 5"); | ||
| } | ||
|
|
||
| @Test | ||
| public void testHavingWindowFunctionViaAlias() | ||
| { | ||
| // Window functions are not allowed in HAVING, even when referenced via alias | ||
| assertFails(NESTED_WINDOW, "SELECT row_number() OVER () AS rn FROM t1 GROUP BY b HAVING rn > 1"); | ||
| assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) x FROM t1 HAVING x > 5"); | ||
| } |
There was a problem hiding this comment.
suggestion (testing): Add a couple more negative HAVING alias cases to fully cover reverted behavior
Given the revert, this now exercises only the simplest HAVING alias case. To better protect the reverted behavior, please add a couple more assertFails(MISSING_ATTRIBUTE, ...) cases mirroring the former positive tests, e.g.:
SELECT sum(a) AS total FROM t1 GROUP BY b HAVING total > 10SELECT count(*) AS cnt, sum(a) AS total FROM t1 GROUP BY b HAVING cnt > 5 AND total > 100
so we catch any future reintroduction of alias resolution in HAVING.
| @Test | |
| public void testHavingReferencesOutputAlias() | |
| { | |
| // HAVING now support referencing SELECT aliases for improved SQL compatibility | |
| analyze("SELECT sum(a) x FROM t1 HAVING x > 5"); | |
| analyze("SELECT sum(a) AS total FROM t1 GROUP BY b HAVING total > 10"); | |
| analyze("SELECT count(*) AS cnt, sum(a) AS total FROM t1 GROUP BY b HAVING cnt > 5 AND total > 100"); | |
| analyze("SELECT sum(a) as sum_a FROM t1 GROUP BY b HAVING sum_a > 1"); | |
| } | |
| @Test | |
| public void testHavingAmbiguousAlias() | |
| { | |
| // Ambiguous alias referenced in HAVING should throw appropriate error | |
| assertFails(AMBIGUOUS_ATTRIBUTE, "SELECT sum(a) AS x, count(b) AS x FROM t1 GROUP BY c HAVING x > 5"); | |
| } | |
| @Test | |
| public void testHavingNonExistentAlias() | |
| { | |
| // Non-existent alias in HAVING should fail with MISSING_ATTRIBUTE | |
| assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) AS total FROM t1 GROUP BY b HAVING unknown_alias > 5"); | |
| } | |
| @Test | |
| public void testHavingWindowFunctionViaAlias() | |
| { | |
| // Window functions are not allowed in HAVING, even when referenced via alias | |
| assertFails(NESTED_WINDOW, "SELECT row_number() OVER () AS rn FROM t1 GROUP BY b HAVING rn > 1"); | |
| assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) x FROM t1 HAVING x > 5"); | |
| } | |
| @Test | |
| public void testHavingReferencesOutputAlias() | |
| { | |
| assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) x FROM t1 HAVING x > 5"); | |
| assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) AS total FROM t1 GROUP BY b HAVING total > 10"); | |
| assertFails(MISSING_ATTRIBUTE, "SELECT count(*) AS cnt, sum(a) AS total FROM t1 GROUP BY b HAVING cnt > 5 AND total > 100"); | |
| } |
|
@mehradpk please take a look. I will be reverting your change due to it breaking existing query shapes |
## Summary: This reverts the following PR GitHub Pull Request: #27199 GitHub Author: Deepak Mehra mehradeeeepak@gmail.com Meta internal revision:D96106074 Fully reverts the HAVING alias expansion feature, which threw 7 failed queries (2 different errors) on meta internal test suites. I have verified that these errors are all caused by this change by attempting to fix them and testing that the revert clears the errors. ## Errors: 1. java.sql.SQLException: Query failed (#20260316_083518_04833_pwriw): type of variable 'expr_189' is expected 2. to be varchar(25), but the actual type is varchar java.sql.SQLException: Query failed (#20260316_083713_05663_pwriw): Cannot nest aggregations inside aggregation 'sum': ["sum"(CTX_pe_revenue)] ## Root cause: The PR added SELECT alias resolution in HAVING by rewriting the HAVING predicate with OrderByExpressionRewriter. This caused two bugs: 1. Type mismatches (5 failures): (AI reasoning) When a column name matches a SELECT alias (e.g., SELECT CASE...END AS category), the rewriter expands the column reference in HAVING to the full CASE expression, changing its type from varchar(25) to unbounded varchar. QueryPlanner.filter() uses the rewritten expression via analysis.getHaving(), creating plan nodes with mismatched types that TypeValidator rejects. 2. Nested aggregation (1 failure): (confirmed) When a SELECT alias references an aggregation and HAVING uses that alias inside another aggregation (e.g., HAVING sum(total) where total = sum(x)), expansion creates invalid nested aggregations sum(sum(x)) 3. bigint vs double (1 failure): Same mechanism as issue 1 but with numeric type coercion differences. ## Fix: I tried to fix these issues, but only managed to clear the error for #2. Reverting the change due to these errors. According to AI: The HAVING alias resolution feature needs a different implementation that doesn't change expression types in the plan — likely by resolving aliases during planning rather than during analysis. ## Reproduction: I can't share the reproducing queries since they reference meta internal data, but these should give you an idea of how to reproduce the issues Bug 1: Type mismatch (varchar(25) vs varchar) ``` -- The SELECT aliases `category` to a CASE expression returning varchar strings. -- HAVING references `category` which the rewriter expands to the CASE expression, -- changing its type from the subquery's varchar(25) to unbounded varchar. -- Triggers: "type of variable 'expr_N' is expected to be varchar(25), but the actual type is varchar" WITH data AS ( SELECT CAST('cpu' AS varchar(25)) AS category, true AS is_cpu, 100.0 AS value UNION ALL SELECT CAST('gpu' AS varchar(25)), false, 200.0 ) SELECT (CASE WHEN COALESCE(category, CAST(is_cpu AS varchar)) IS NULL THEN 'total' WHEN is_cpu THEN 'cpu_total' ELSE category END) category, sum(value) AS total_value FROM data GROUP BY GROUPING SETS ((), (category), (is_cpu)) HAVING (CASE WHEN COALESCE(category, CAST(is_cpu AS varchar)) IS NULL THEN 'total' WHEN is_cpu THEN 'cpu_total' ELSE category END) IS NOT NULL ``` Bug 2: Nested aggregation (SYNTAX_ERROR) ``` -- SELECT defines `total` as sum(revenue). HAVING uses sum(total). -- The rewriter expands `total` to `sum(revenue)`, creating sum(sum(revenue)). -- Triggers: "Cannot nest aggregations inside aggregation 'sum': ["sum"(revenue)]" SELECT region, sum(revenue) AS total FROM ( VALUES ('us', 100), ('eu', 200), ('us', 150), ('eu', 50) ) AS t(region, revenue) GROUP BY region HAVING sum(total) > 100 ``` Differential Revision: D97181706 ## Summary by Sourcery Revert support for resolving SELECT output aliases in HAVING clauses and restore previous validation behavior. Enhancements: - Simplify OrderByExpressionRewriter by removing clause-specific error context now that it is only used for ORDER BY. Tests: - Update analyzer tests to again treat alias references in HAVING as invalid and remove related positive and error-coverage cases. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == RELEASE NOTES == General Changes * Fix 2 bugs caused by Select Alias references in Having clause Co-authored-by: mehradeeeepak@gmail.com <mehradeeeepak@gmail.com>
Summary:
This reverts the following PR
GitHub Pull Request: #27199
GitHub Author: Deepak Mehra mehradeeeepak@gmail.com
Meta internal revision:D96106074
Fully reverts the HAVING alias expansion feature, which threw
7 failed queries (2 different errors) on meta internal test suites. I have verified that these errors are all caused by this change by attempting to fix them and testing that the revert clears the errors.
Errors:
java.sql.SQLException: Query failed (#20260316_083713_05663_pwriw): Cannot nest aggregations inside aggregation 'sum': ["sum"(CTX_pe_revenue)]
Root cause:
The PR added SELECT alias resolution in HAVING by rewriting
the HAVING predicate with OrderByExpressionRewriter. This caused two bugs:
(e.g., SELECT CASE...END AS category), the rewriter expands the column
reference in HAVING to the full CASE expression, changing its type from
varchar(25) to unbounded varchar. QueryPlanner.filter() uses the
rewritten expression via analysis.getHaving(), creating plan nodes with
mismatched types that TypeValidator rejects.
aggregation and HAVING uses that alias inside another aggregation
(e.g., HAVING sum(total) where total = sum(x)), expansion creates
invalid nested aggregations sum(sum(x))
type coercion differences.
Fix:
I tried to fix these issues, but only managed to clear the error for #2. Reverting the change due to these errors.
According to AI:
The HAVING alias resolution feature needs a different implementation that
doesn't change expression types in the plan — likely by resolving aliases
during planning rather than during analysis.
Reproduction:
I can't share the reproducing queries since they reference meta internal data, but these should give you an idea of how to reproduce the issues
Bug 1: Type mismatch (varchar(25) vs varchar)
Bug 2: Nested aggregation (SYNTAX_ERROR)
Differential Revision: D97181706
Summary by Sourcery
Revert support for resolving SELECT output aliases in HAVING clauses and restore previous validation behavior.
Enhancements:
Tests:
Release Notes
Please follow release notes guidelines and fill in the release notes below.