
fix: Revert #27199 Select Alias in Having Clause #27372

Merged
feilong-liu merged 1 commit into prestodb:master from shelton408:export-D97181706
Mar 19, 2026

@shelton408
Contributor

@shelton408 shelton408 commented Mar 19, 2026

Summary:

This reverts the following PR
GitHub Pull Request: #27199
GitHub Author: Deepak Mehra mehradeeeepak@gmail.com
Meta internal revision: D96106074

Fully reverts the HAVING alias expansion feature, which caused 7 query failures (2 distinct errors) in Meta-internal test suites. I verified that this change causes all of these errors by attempting to fix them and by confirming that the revert clears them.

Errors:

  1. java.sql.SQLException: Query failed (#20260316_083518_04833_pwriw): type of variable 'expr_189' is expected to be varchar(25), but the actual type is varchar
  2. java.sql.SQLException: Query failed (#20260316_083713_05663_pwriw): Cannot nest aggregations inside aggregation 'sum': ["sum"(CTX_pe_revenue)]

Root cause:

The PR added SELECT alias resolution in HAVING by rewriting
the HAVING predicate with OrderByExpressionRewriter. This caused two classes of bugs, which showed up as three groups of failures:

  1. Type mismatches (5 failures): (AI reasoning) When a column name matches a SELECT alias
    (e.g., SELECT CASE...END AS category), the rewriter expands the column
    reference in HAVING to the full CASE expression, changing its type from
    varchar(25) to unbounded varchar. QueryPlanner.filter() uses the
    rewritten expression via analysis.getHaving(), creating plan nodes with
    mismatched types that TypeValidator rejects.
  2. Nested aggregation (1 failure): (confirmed) When a SELECT alias refers to an
    aggregation and HAVING uses that alias inside another aggregation
    (e.g., HAVING sum(total) where total = sum(x)), expansion creates
    the invalid nested aggregation sum(sum(x)).
  3. bigint vs double (1 failure): Same mechanism as issue 1, but the mismatched
    types are numeric (bigint vs double) because of type coercion differences.

Fix:

I tried to fix these issues, but only managed to clear the error for issue 2 (nested aggregation). I am reverting the change because of the remaining errors.
According to AI:
The HAVING alias resolution feature needs a different implementation that
does not change expression types in the plan, likely by resolving aliases
during planning rather than during analysis.

Reproduction:

I can't share the failing queries because they reference Meta-internal data, but the following examples should give an idea of how to reproduce the issues.

Bug 1: Type mismatch (varchar(25) vs varchar)

  -- The SELECT aliases `category` to a CASE expression returning varchar strings.
  -- HAVING references `category` which the rewriter expands to the CASE expression,
  -- changing its type from the subquery's varchar(25) to unbounded varchar.
  -- Triggers: "type of variable 'expr_N' is expected to be varchar(25), but the actual type is varchar"

  WITH data AS (
      SELECT
          CAST('cpu' AS varchar(25)) AS category,
          true AS is_cpu,
          100.0 AS value
      UNION ALL
      SELECT CAST('gpu' AS varchar(25)), false, 200.0
  )
  SELECT
      (CASE
          WHEN COALESCE(category, CAST(is_cpu AS varchar)) IS NULL THEN 'total'
          WHEN is_cpu THEN 'cpu_total'
          ELSE category
      END) category,
      sum(value) AS total_value
  FROM data
  GROUP BY GROUPING SETS ((), (category), (is_cpu))
  HAVING (CASE
      WHEN COALESCE(category, CAST(is_cpu AS varchar)) IS NULL THEN 'total'
      WHEN is_cpu THEN 'cpu_total'
      ELSE category
  END) IS NOT NULL
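
To make the error message above concrete, here is a minimal Java sketch of a strict plan type check. It is a toy model, not Presto's TypeValidator: the class name `VarcharTypeCheckSketch`, the `verify` method, and the rule that an actual varchar type may be no wider than the declared one are all assumptions made for illustration.

```java
// Hypothetical toy model of a strict plan type check; not Presto's TypeValidator.
final class VarcharTypeCheckSketch
{
    // Stand-in for unbounded varchar (assumption of this sketch).
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // Render a varchar type the way the error message in this PR does.
    static String display(int length)
    {
        return length == UNBOUNDED ? "varchar" : "varchar(" + length + ")";
    }

    // Assumption: the expression type is accepted only if it is no wider
    // than the type declared for the plan variable.
    static void verify(String variable, int declaredLength, int actualLength)
    {
        if (actualLength > declaredLength) {
            throw new IllegalArgumentException(String.format(
                    "type of variable '%s' is expected to be %s, but the actual type is %s",
                    variable, display(declaredLength), display(actualLength)));
        }
    }

    public static void main(String[] args)
    {
        // Declared varchar(25), expression typed varchar(25): accepted.
        verify("expr_1", 25, 25);

        // Declared varchar(25), rewritten expression typed unbounded varchar: rejected.
        try {
            verify("expr_1", 25, UNBOUNDED);
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the variable keeps the type it was declared with, so any rewrite that changes the type of the expression assigned to it is rejected at validation time.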

Bug 2: Nested aggregation (SYNTAX_ERROR)

  -- SELECT defines `total` as sum(revenue). HAVING uses sum(total).
  -- The rewriter expands `total` to `sum(revenue)`, creating sum(sum(revenue)).
  -- Triggers: "Cannot nest aggregations inside aggregation 'sum': ["sum"(revenue)]"

  SELECT
      region,
      sum(revenue) AS total
  FROM (
      VALUES ('us', 100), ('eu', 200), ('us', 150), ('eu', 50)
  ) AS t(region, revenue)
  GROUP BY region
  HAVING sum(total) > 100
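
The expansion described in the comments above can be sketched in a few lines of Java. This is a toy expression tree, not Presto's AST or analyzer: `HavingAliasExpansionSketch`, `Expr`, `expandAliases`, and `hasNestedAggregation` are hypothetical names, and treating only `sum` and `count` as aggregates is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical toy model; not Presto's AST or analyzer classes.
final class HavingAliasExpansionSketch
{
    // An expression is either an identifier (args == null) or a function call.
    static final class Expr
    {
        final String name;
        final List<Expr> args;

        private Expr(String name, List<Expr> args)
        {
            this.name = name;
            this.args = args;
        }

        static Expr id(String name)
        {
            return new Expr(name, null);
        }

        static Expr call(String name, Expr... args)
        {
            return new Expr(name, Arrays.asList(args));
        }

        boolean isCall()
        {
            return args != null;
        }

        @Override
        public String toString()
        {
            if (!isCall()) {
                return name;
            }
            return name + "(" + args.stream().map(Expr::toString).collect(Collectors.joining(", ")) + ")";
        }
    }

    // Replace every identifier that matches a SELECT alias with the aliased expression.
    static Expr expandAliases(Expr expr, Map<String, Expr> aliases)
    {
        if (!expr.isCall()) {
            return aliases.getOrDefault(expr.name, expr);
        }
        Expr[] rewritten = expr.args.stream().map(arg -> expandAliases(arg, aliases)).toArray(Expr[]::new);
        return Expr.call(expr.name, rewritten);
    }

    // True if an aggregate call appears inside the arguments of another aggregate call.
    static boolean hasNestedAggregation(Expr expr, boolean insideAggregate)
    {
        if (!expr.isCall()) {
            return false;
        }
        // Assumption: only these two functions count as aggregates in this sketch.
        boolean isAggregate = expr.name.equals("sum") || expr.name.equals("count");
        if (isAggregate && insideAggregate) {
            return true;
        }
        for (Expr arg : expr.args) {
            if (hasNestedAggregation(arg, insideAggregate || isAggregate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        // SELECT sum(revenue) AS total ... HAVING sum(total)
        Map<String, Expr> aliases = new HashMap<>();
        aliases.put("total", Expr.call("sum", Expr.id("revenue")));
        Expr having = Expr.call("sum", Expr.id("total"));

        Expr expanded = expandAliases(having, aliases);
        System.out.println(expanded);                              // sum(sum(revenue))
        System.out.println(hasNestedAggregation(expanded, false)); // true
    }
}
```

Before expansion, `sum(total)` contains a single aggregate, so the check passes; it is the blind substitution of the alias that introduces the nesting.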

Differential Revision: D97181706

Summary by Sourcery

Revert support for resolving SELECT output aliases in HAVING clauses and restore previous validation behavior.

Enhancements:

  • Simplify OrderByExpressionRewriter by removing clause-specific error context now that it is only used for ORDER BY.

Tests:

  • Update analyzer tests to again treat alias references in HAVING as invalid and remove related positive and error-coverage cases.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix 2 bugs caused by Select Alias references in Having clause

@sourcery-ai
Contributor

sourcery-ai bot commented Mar 19, 2026

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Reverts support for referencing SELECT output aliases in HAVING clauses and simplifies the OrderByExpressionRewriter to only apply to ORDER BY, restoring previous HAVING semantics and expectations.

Class diagram for reverted HAVING alias handling in StatementAnalyzer

classDiagram
    class StatementAnalyzer {
        +Scope visitQuerySpecification(QuerySpecification node, Optional_Scope scope)
        -void analyzeHaving(QuerySpecification node, Scope scope)
    }

    class QuerySpecification {
        +Optional_Expression getHaving()
        +Select getSelect()
    }

    class Analysis {
        +void setOrderByExpressions(QuerySpecification node, List_Expression orderByExpressions)
        +void setHaving(QuerySpecification node, Expression predicate)
        +Expression getHaving(QuerySpecification node)
        +void recordSubqueries(Node node, ExpressionAnalysis expressionAnalysis)
    }

    class ExpressionAnalysis {
        +Type getType(Expression expression)
        +Collection_Expression getWindowFunctions()
    }

    class OrderByExpressionRewriter {
        -Multimap_QualifiedName_Expression assignments
        +OrderByExpressionRewriter(Multimap_QualifiedName_Expression assignments)
        +Expression rewriteIdentifier(Identifier reference, Void context, ExpressionTreeRewriter_RewriteContext_Void treeRewriter)
    }

    class SemanticException {
        +SemanticException(SemanticErrorCode code, Node node, String messageFormat, Type predicateType)
    }

    class Identifier
    class Expression
    class Select
    class Scope
    class Type
    class QualifiedName
    class Multimap_QualifiedName_Expression
    class ExpressionTreeRewriter_RewriteContext_Void

    StatementAnalyzer --> Analysis : uses
    StatementAnalyzer --> QuerySpecification : analyzes
    StatementAnalyzer --> OrderByExpressionRewriter : uses
    StatementAnalyzer --> ExpressionAnalysis : uses
    OrderByExpressionRewriter --> Multimap_QualifiedName_Expression : has
    OrderByExpressionRewriter --> Identifier : rewrites
    ExpressionAnalysis --> Type : returns

    %% Key reverted relationships
    StatementAnalyzer ..> Expression : HAVING analyzed directly
    Analysis ..> Expression : setHaving now stores original predicate
    OrderByExpressionRewriter ..> Identifier : only for ORDER_BY ambiguity messages

File-Level Changes

Change: Restore original HAVING analysis behavior to ignore SELECT aliases and treat HAVING expressions as written.
Details:
  • Stop rewriting HAVING predicates through OrderByExpressionRewriter before analysis.
  • Analyze the original HAVING expression directly when computing expression types and validations.
  • Record the original HAVING expression in Analysis instead of the rewritten one.
  • Use the original HAVING expression node when validating that the predicate is BOOLEAN or UNKNOWN and when raising semantic exceptions.
Files: presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java

Change: Simplify OrderByExpressionRewriter to be ORDER BY-specific only.
Details:
  • Remove the clauseName field and the overloaded constructor that accepted a clause name.
  • Hard-code the semantic error message for ambiguous identifiers to refer specifically to ORDER BY.
Files: presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java

Change: Remove test coverage for HAVING alias support and restore prior failing behavior expectations.
Details:
  • Delete tests that assert HAVING can reference SELECT aliases, and related ambiguity/non-existent alias/window-function via alias cases.
  • Reintroduce a single test expecting MISSING_ATTRIBUTE when HAVING references a SELECT alias, matching pre-aliased behavior.
Files: presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java
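
The ORDER BY-only identifier rewriting described in the changes above can be pictured with a small Java sketch. It is a hypothetical stand-in, not the class in StatementAnalyzer.java: strings replace AST nodes, a `Map<String, List<String>>` replaces the multimap of assignments, and the exact wording of the ambiguity message is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an ORDER BY identifier rewriter; strings replace AST nodes.
final class OrderByAliasRewriterSketch
{
    // SELECT output name -> expressions assigned to that name.
    private final Map<String, List<String>> assignments;

    OrderByAliasRewriterSketch(Map<String, List<String>> assignments)
    {
        this.assignments = assignments;
    }

    String rewriteIdentifier(String identifier)
    {
        // De-duplicate: the same expression listed twice under one name is not ambiguous.
        Set<String> candidates = new LinkedHashSet<>(
                assignments.getOrDefault(identifier, Collections.<String>emptyList()));
        if (candidates.size() > 1) {
            // The clause name is fixed because only ORDER BY uses this rewriter.
            throw new IllegalStateException("'" + identifier + "' in ORDER BY is ambiguous");
        }
        if (candidates.size() == 1) {
            return candidates.iterator().next();
        }
        // Not a SELECT alias: leave the reference untouched.
        return identifier;
    }

    public static void main(String[] args)
    {
        Map<String, List<String>> assignments = new HashMap<>();
        assignments.put("total", Arrays.asList("sum(a)"));
        assignments.put("x", Arrays.asList("sum(a)", "count(b)"));
        OrderByAliasRewriterSketch rewriter = new OrderByAliasRewriterSketch(assignments);

        System.out.println(rewriter.rewriteIdentifier("total")); // sum(a)
        System.out.println(rewriter.rewriteIdentifier("b"));     // b
        try {
            rewriter.rewriteIdentifier("x");
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // 'x' in ORDER BY is ambiguous
        }
    }
}
```

With the clause name hard-coded, the rewriter no longer needs a constructor parameter for it, which is the simplification this revert restores.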


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java" line_range="308-312" />
<code_context>
-        analyze("SELECT sum(a) as sum_a FROM t1 GROUP BY b HAVING sum_a > 1");
-    }
-
-    @Test
-    public void testHavingAmbiguousAlias()
-    {
-        // Ambiguous alias referenced in HAVING should throw appropriate error
-        assertFails(AMBIGUOUS_ATTRIBUTE, "SELECT sum(a) AS x, count(b) AS x FROM t1 GROUP BY c HAVING x > 5");
-    }
-
-    @Test
-    public void testHavingNonExistentAlias()
-    {
-        // Non-existent alias in HAVING should fail with MISSING_ATTRIBUTE
-        assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) AS total FROM t1 GROUP BY b HAVING unknown_alias > 5");
-    }
-
-    @Test
-    public void testHavingWindowFunctionViaAlias()
-    {
-        // Window functions are not allowed in HAVING, even when referenced via alias
-        assertFails(NESTED_WINDOW, "SELECT row_number() OVER () AS rn FROM t1 GROUP BY b HAVING rn > 1");
+        assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) x FROM t1 HAVING x > 5");
     }

</code_context>
<issue_to_address>
**suggestion (testing):** Add a couple more negative HAVING alias cases to fully cover reverted behavior

Given the revert, this now exercises only the simplest HAVING alias case. To better protect the reverted behavior, please add a couple more `assertFails(MISSING_ATTRIBUTE, ...)` cases mirroring the former positive tests, e.g.:
- `SELECT sum(a) AS total FROM t1 GROUP BY b HAVING total > 10`
- `SELECT count(*) AS cnt, sum(a) AS total FROM t1 GROUP BY b HAVING cnt > 5 AND total > 100`
so we catch any future reintroduction of alias resolution in HAVING.

```suggestion
    @Test
    public void testHavingReferencesOutputAlias()
    {
        assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) x FROM t1 HAVING x > 5");
        assertFails(MISSING_ATTRIBUTE, "SELECT sum(a) AS total FROM t1 GROUP BY b HAVING total > 10");
        assertFails(MISSING_ATTRIBUTE, "SELECT count(*) AS cnt, sum(a) AS total FROM t1 GROUP BY b HAVING cnt > 5 AND total > 100");
    }
```
</issue_to_address>



@shelton408 shelton408 changed the title Revert D96106074 fix: Revert D96106074 Mar 19, 2026
@shelton408 shelton408 changed the title fix: Revert D96106074 fix: Revert #27199 Select Alias in Having Clause Mar 19, 2026
@shelton408
Contributor Author

@mehradpk please take a look. I will be reverting your change due to it breaking existing query shapes

@feilong-liu feilong-liu merged commit 3633703 into prestodb:master Mar 19, 2026
114 of 135 checks passed
shelton408 added a commit that referenced this pull request Mar 20, 2026
Co-authored-by: mehradeeeepak@gmail.com <mehradeeeepak@gmail.com>