Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop referring to ungrouped columns from outer query in subquery #27266

Closed
roji opened this issue Jan 24, 2022 · 14 comments · Fixed by #27931
Closed

Stop referring to ungrouped columns from outer query in subquery #27266

roji opened this issue Jan 24, 2022 · 14 comments · Fixed by #27931
Assignees
Labels
area-query area-test closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported type-bug
Milestone

Comments

@roji
Copy link
Member

roji commented Jan 24, 2022

PostgreSQL seems to be particularly strict around subqueries and GROUP BY. The following query (full repro below):

from s in (from i in ctx.Invoices
        group i by i.History.Month
        into g
        select new
        {
            Month = g.Key,
            Total = g.Sum(p => p.Amount),
        })
    select new
    {
        s.Month,
        s.Total,
        Payment = ctx.Payments.Where(p => p.History.Month == s.Month).Sum(p => p.Amount)
    }

... generates the following SQL on PostgreSQL:

SELECT date_part('month', i."History")::INT AS "Month", COALESCE(SUM(i."Amount"), 0)::INT AS "Total", (
    SELECT COALESCE(SUM(p."Amount"), 0)::INT
    FROM "Payments" AS p
    WHERE date_part('month', p."History")::INT = date_part('month', i."History")::INT) AS "Payment"
FROM "Invoices" AS i
GROUP BY date_part('month', i."History")::INT

... and errors with the following: subquery uses ungrouped column "i.History" from outer query.

The same works fine on SQL Server, SQLite and MariaDB.

Note also that the current SQL is less efficient even when supported, since the same expression gets evaluated more than once (see #27266 (comment)).

See suggested alternative SQL by @mojtabakaviani in npgsql/efcore.pg#2236 (comment).

Full repro
await using var ctx = new BlogContext();
await ctx.Database.EnsureDeletedAsync();
await ctx.Database.EnsureCreatedAsync();

var query = from s in (from i in ctx.Invoices
        group i by i.History.Month
        into g
        select new
        {
            Month = g.Key,
            Total = g.Sum(p => p.Amount),
        })
    select new
    {
        s.Month,
        s.Total,
        Payment = ctx.Payments.Where(p => p.History.Month == s.Month).Sum(p => p.Amount)
    };

_ = query.ToList();

public class BlogContext : DbContext
{
    public DbSet<Invoice> Invoices { get; set; }
    public DbSet<Payment> Payments { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            // .UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0")
            .UseNpgsql(@"Host=localhost;Username=test;Password=test")
            // .UseSqlite("Data Source=/tmp/foo.sql")
            // .UseMySql("Server=localhost;user=test;password=test;database=test", new MariaDbServerVersion("10.5"))
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();
}

public class Invoice
{
    public int Id { get; set; }
    public int Amount { get; set; }

    public DateTime History { get; set; }
}

public class Payment
{
    public int Id { get; set; }
    public int Amount { get; set; }

    public DateTime History { get; set; }
}

Raised by @mojtabakaviani in npgsql/efcore.pg#2236

@smitpatel
Copy link
Member

Cannot really do much, probably something which is just unsupported.

@mojtabakaviani
Copy link

Yes, but if EF Core avoid convert query to join work properly.

@smitpatel
Copy link
Member

There is no conversion to join here.

@mojtabakaviani
Copy link

@smitpatel I mean, converting nested query to scalar result, for this example.

@smitpatel
Copy link
Member

But there is no conversion EF Core performs here. It is nested query which happens to be scalar result. Every scalar results are same regardless of where they come from.

@roji
Copy link
Member Author

roji commented Jan 26, 2022

Note @mojtabakaviani alternate SQL proposal in npgsql/efcore.pg#2236:

SELECT i0."Month", i0."Total", (
    SELECT COALESCE(SUM(p.amount), 0.0) FROM payments AS p
    WHERE date_part('month', p.history)::INT = i0."Month"::INT) AS "Payment"
FROM (
    SELECT date_part('month', i.history)::INT AS "Month", COALESCE(SUM(i.amount), 0.0) AS "Total"
    FROM invoices AS i
    GROUP BY date_part('month', i.history)::INT) AS i0;

Note that this is also better from a perf standpoint than what we currently produce (for any database), as the month is calculated only once rather than three times in the current query. If the GROUP BY key is a complex function or even a subquery, that could be very significant (see the similarity with #27285).

@mojtabakaviani
Copy link

Thanks @roji for proposal, I test and work in the both SQL Server and Postgresql. also solve nested query issue #27267

@roji
Copy link
Member Author

roji commented Jan 26, 2022

@mojtabakaviani I ended up removing the WITH alternative since it doesn't really add anything that a simple subquery doesn't already accomplish (as in your original proposal).

@ajcvickers ajcvickers added this to the 7.0.0 milestone Jan 26, 2022
@ajcvickers
Copy link
Member

Note from triage: add tests for this case.

@roji
Copy link
Member Author

roji commented Jan 26, 2022

@ajcvickers note that this has what could be an important perf aspect as well (multiple function/subquery evaluation which could be avoided), so this isn't just about finding a solution for PG. We could add a test here, and have another issue tracking changing the SQL we generate (which would both help perf and make the query work on PG).

@smitpatel
Copy link
Member

Pending selector work covers that already.

@ajcvickers ajcvickers removed this from the 7.0.0 milestone Jan 26, 2022
@roji
Copy link
Member Author

roji commented Jan 26, 2022

Great! In that case I guess this can just remind us to add the test.

@ajcvickers ajcvickers added this to the 7.0.0 milestone Jan 28, 2022
@roji roji changed the title PostgreSQL disallows using ungrouped columns from outer query in subquery Stop referring to ungrouped columns from outer query in subquery Feb 3, 2022
@smitpatel smitpatel assigned smitpatel and unassigned maumar Apr 27, 2022
@smitpatel
Copy link
Member

Poaching this. Pending selector work is on hold now. #27433 has same issue for SqlServer in different scenario. Also #27132 work detected that we may not be able to lift always causing subquery. An ideal fix here would be to basically pushdown into subquery when grouping key is complex to avoid issues generated by subquery having correlation predicate with grouping key. Does that sound good to you @roji ?

@roji
Copy link
Member Author

roji commented Apr 27, 2022

I think so - that would mean the key expression would be present only once in the SQL, instead of twice (as above), right? Sounds great.

@smitpatel smitpatel added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label May 2, 2022
smitpatel added a commit that referenced this issue May 2, 2022
Design:
- Introduce `SqlEnumerableExpression` - a holder class which indicates the `SqlExpression` is in form of a enumerable (or group) coming as a result of whole table selection or a grouping element. It also stores details about if `Distinct` is applied over grouping or if there are any orderings.
- Due to above `DistinctExpression` has been removed. The token while used to denote `Distinct` over grouping element were not valid in other parts of SQL tree hence it makes more sense to combine it with `SqlEnumerableExpression`.
- To support dual pass, `GroupByShaperExpression` contains 2 forms of grouping element. One element selector form which correlates directly with the parent grouped query, second subquery form which correlates to parent grouped query through a correlation predicate. Element selector is first used to translate aggregation. If that fails we use subquery form to translate as a subquery. Due to 2 forms of same component, GroupByShaperExpression disallows calling into VisitChildren method, any visitor which is visiting a tree containing GroupByShaperExpression (which appears only in `QueryExpression.ShaperExpression` or LINQ expression after remapping but before translation) must intercept the tree and either ignore or process it appropriately.
- An internal visitor (`GroupByAggregateChainProcessor`) inside SqlTranslator visits and process chain of queryable operations on a grouping element before aggregate is called and condense it into `SqlEnumerableExpression` which is then passed to method which translates aggregate. This visitor only processes Where/Distinct/Select for now. Future PR will add processing for OrderBy/ThenBy(Descending) operations to generate orderings.
- Side-effect above is that joins expanded over the grouping element (due to navigations used on aggregate chain), doesn't translate to aggregate anymore since we need to translate the join on parent query, remove the translated join if the chain didn't end in aggregate and also de-dupe same joins. Filing issue to improve this in future. Due to fragile nature of matching to lift the join, we shouldn't try to lift joins.
- To support custom aggregate operations, we will either reused `IMethodCallTranslator` or create a parallel structure for aggregate methods and call into it from SqlTranslator by passing translated SqlEnumerableExpression as appropriate.
- For complex grouping key, we cause a pushdown so that we can reference the grouping key through columns only. This allows us to reference the grouping key in correlation predicate for subquery without generating invalid SQL in many cases.
- With complex grouping key converting to columns, now we are able to correctly generate identifiers for grouping queries which makes more queries with correlated collections (where either parent or inner both queries can be groupby query) translatable.

Resolves #27132
Introduce a pass to translate grouping element chain ending in aggregate before translating it as a subquery.

Resolves #27266
Resolves #27433
Cause a pushdown when grouping key is complex so that SQL parser doesn't throw error that ungrouped columns are being referenced.

Relates to #22957
smitpatel added a commit that referenced this issue May 3, 2022
Design:
- Introduce `SqlEnumerableExpression` - a holder class which indicates the `SqlExpression` is in form of a enumerable (or group) coming as a result of whole table selection or a grouping element. It also stores details about if `Distinct` is applied over grouping or if there are any orderings.
- Due to above `DistinctExpression` has been removed. The token while used to denote `Distinct` over grouping element were not valid in other parts of SQL tree hence it makes more sense to combine it with `SqlEnumerableExpression`.
- To support dual pass, `GroupByShaperExpression` contains 2 forms of grouping element. One element selector form which correlates directly with the parent grouped query, second subquery form which correlates to parent grouped query through a correlation predicate. Element selector is first used to translate aggregation. If that fails we use subquery form to translate as a subquery. Due to 2 forms of same component, GroupByShaperExpression disallows calling into VisitChildren method, any visitor which is visiting a tree containing GroupByShaperExpression (which appears only in `QueryExpression.ShaperExpression` or LINQ expression after remapping but before translation) must intercept the tree and either ignore or process it appropriately.
- An internal visitor (`GroupByAggregateChainProcessor`) inside SqlTranslator visits and process chain of queryable operations on a grouping element before aggregate is called and condense it into `SqlEnumerableExpression` which is then passed to method which translates aggregate. This visitor only processes Where/Distinct/Select for now. Future PR will add processing for OrderBy/ThenBy(Descending) operations to generate orderings.
- Side-effect above is that joins expanded over the grouping element (due to navigations used on aggregate chain), doesn't translate to aggregate anymore since we need to translate the join on parent query, remove the translated join if the chain didn't end in aggregate and also de-dupe same joins. Filing issue to improve this in future. Due to fragile nature of matching to lift the join, we shouldn't try to lift joins.
- To support custom aggregate operations, we will either reused `IMethodCallTranslator` or create a parallel structure for aggregate methods and call into it from SqlTranslator by passing translated SqlEnumerableExpression as appropriate.
- For complex grouping key, we cause a pushdown so that we can reference the grouping key through columns only. This allows us to reference the grouping key in correlation predicate for subquery without generating invalid SQL in many cases.
- With complex grouping key converting to columns, now we are able to correctly generate identifiers for grouping queries which makes more queries with correlated collections (where either parent or inner both queries can be groupby query) translatable.
- Erase client projection when applying aggregate operation over GroupBy result.
- When processing result selector in GroupBy use the updated key selector if the select expression was pushed down.

Resolves #27132
Resolves #27266
Resolves #27433
Resolves #23601
Resolves #27721
Resolves #27796
Resolves #27801
Resolves #19683

Relates to #22957
smitpatel added a commit that referenced this issue May 3, 2022
Design:
- Introduce `SqlEnumerableExpression` - a holder class which indicates the `SqlExpression` is in form of a enumerable (or group) coming as a result of whole table selection or a grouping element. It also stores details about if `Distinct` is applied over grouping or if there are any orderings.
- Due to above `DistinctExpression` has been removed. The token while used to denote `Distinct` over grouping element were not valid in other parts of SQL tree hence it makes more sense to combine it with `SqlEnumerableExpression`.
- To support dual pass, `GroupByShaperExpression` contains 2 forms of grouping element. One element selector form which correlates directly with the parent grouped query, second subquery form which correlates to parent grouped query through a correlation predicate. Element selector is first used to translate aggregation. If that fails we use subquery form to translate as a subquery. Due to 2 forms of same component, GroupByShaperExpression disallows calling into VisitChildren method, any visitor which is visiting a tree containing GroupByShaperExpression (which appears only in `QueryExpression.ShaperExpression` or LINQ expression after remapping but before translation) must intercept the tree and either ignore or process it appropriately.
- An internal visitor (`GroupByAggregateChainProcessor`) inside SqlTranslator visits and process chain of queryable operations on a grouping element before aggregate is called and condense it into `SqlEnumerableExpression` which is then passed to method which translates aggregate. This visitor only processes Where/Distinct/Select for now. Future PR will add processing for OrderBy/ThenBy(Descending) operations to generate orderings.
- Side-effect above is that joins expanded over the grouping element (due to navigations used on aggregate chain), doesn't translate to aggregate anymore since we need to translate the join on parent query, remove the translated join if the chain didn't end in aggregate and also de-dupe same joins. Filing issue to improve this in future. Due to fragile nature of matching to lift the join, we shouldn't try to lift joins.
- To support custom aggregate operations, we will either reused `IMethodCallTranslator` or create a parallel structure for aggregate methods and call into it from SqlTranslator by passing translated SqlEnumerableExpression as appropriate.
- For complex grouping key, we cause a pushdown so that we can reference the grouping key through columns only. This allows us to reference the grouping key in correlation predicate for subquery without generating invalid SQL in many cases.
- With complex grouping key converting to columns, now we are able to correctly generate identifiers for grouping queries which makes more queries with correlated collections (where either parent or inner both queries can be groupby query) translatable.
- Erase client projection when applying aggregate operation over GroupBy result.
- When processing result selector in GroupBy use the updated key selector if the select expression was pushed down.

Resolves #27132
Resolves #27266
Resolves #27433
Resolves #23601
Resolves #27721
Resolves #27796
Resolves #27801
Resolves #19683

Relates to #22957
smitpatel added a commit that referenced this issue May 3, 2022
Design:
- Introduce `SqlEnumerableExpression` - a holder class which indicates the `SqlExpression` is in form of a enumerable (or group) coming as a result of whole table selection or a grouping element. It also stores details about if `Distinct` is applied over grouping or if there are any orderings.
- Due to above `DistinctExpression` has been removed. The token while used to denote `Distinct` over grouping element were not valid in other parts of SQL tree hence it makes more sense to combine it with `SqlEnumerableExpression`.
- To support dual pass, `GroupByShaperExpression` contains 2 forms of grouping element. One element selector form which correlates directly with the parent grouped query, second subquery form which correlates to parent grouped query through a correlation predicate. Element selector is first used to translate aggregation. If that fails we use subquery form to translate as a subquery. Due to 2 forms of same component, GroupByShaperExpression disallows calling into VisitChildren method, any visitor which is visiting a tree containing GroupByShaperExpression (which appears only in `QueryExpression.ShaperExpression` or LINQ expression after remapping but before translation) must intercept the tree and either ignore or process it appropriately.
- An internal visitor (`GroupByAggregateChainProcessor`) inside SqlTranslator visits and process chain of queryable operations on a grouping element before aggregate is called and condense it into `SqlEnumerableExpression` which is then passed to method which translates aggregate. This visitor only processes Where/Distinct/Select for now. Future PR will add processing for OrderBy/ThenBy(Descending) operations to generate orderings.
- Side-effect above is that joins expanded over the grouping element (due to navigations used on aggregate chain), doesn't translate to aggregate anymore since we need to translate the join on parent query, remove the translated join if the chain didn't end in aggregate and also de-dupe same joins. Filing issue to improve this in future. Due to fragile nature of matching to lift the join, we shouldn't try to lift joins.
- To support custom aggregate operations, we will either reused `IMethodCallTranslator` or create a parallel structure for aggregate methods and call into it from SqlTranslator by passing translated SqlEnumerableExpression as appropriate.
- For complex grouping key, we cause a pushdown so that we can reference the grouping key through columns only. This allows us to reference the grouping key in correlation predicate for subquery without generating invalid SQL in many cases.
- With complex grouping key converting to columns, now we are able to correctly generate identifiers for grouping queries which makes more queries with correlated collections (where either parent or inner both queries can be groupby query) translatable.
- Erase client projection when applying aggregate operation over GroupBy result.
- When processing result selector in GroupBy use the updated key selector if the select expression was pushed down.

Resolves #27132
Resolves #27266
Resolves #27433
Resolves #23601
Resolves #27721
Resolves #27796
Resolves #27801
Resolves #19683

Relates to #22957
smitpatel added a commit that referenced this issue May 4, 2022
Design:
- Introduce `SqlEnumerableExpression` - a holder class which indicates the `SqlExpression` is in form of a enumerable (or group) coming as a result of whole table selection or a grouping element. It also stores details about if `Distinct` is applied over grouping or if there are any orderings.
- Due to above `DistinctExpression` has been removed. The token while used to denote `Distinct` over grouping element were not valid in other parts of SQL tree hence it makes more sense to combine it with `SqlEnumerableExpression`.
- To support dual pass, `GroupByShaperExpression` contains 2 forms of grouping element. One element selector form which correlates directly with the parent grouped query, second subquery form which correlates to parent grouped query through a correlation predicate. Element selector is first used to translate aggregation. If that fails we use subquery form to translate as a subquery. Due to 2 forms of same component, GroupByShaperExpression disallows calling into VisitChildren method, any visitor which is visiting a tree containing GroupByShaperExpression (which appears only in `QueryExpression.ShaperExpression` or LINQ expression after remapping but before translation) must intercept the tree and either ignore or process it appropriately.
- An internal visitor (`GroupByAggregateChainProcessor`) inside SqlTranslator visits and process chain of queryable operations on a grouping element before aggregate is called and condense it into `SqlEnumerableExpression` which is then passed to method which translates aggregate. This visitor only processes Where/Distinct/Select for now. Future PR will add processing for OrderBy/ThenBy(Descending) operations to generate orderings.
- Side-effect above is that joins expanded over the grouping element (due to navigations used on aggregate chain), doesn't translate to aggregate anymore since we need to translate the join on parent query, remove the translated join if the chain didn't end in aggregate and also de-dupe same joins. Filing issue to improve this in future. Due to fragile nature of matching to lift the join, we shouldn't try to lift joins.
- To support custom aggregate operations, we will either reused `IMethodCallTranslator` or create a parallel structure for aggregate methods and call into it from SqlTranslator by passing translated SqlEnumerableExpression as appropriate.
- For complex grouping key, we cause a pushdown so that we can reference the grouping key through columns only. This allows us to reference the grouping key in correlation predicate for subquery without generating invalid SQL in many cases.
- With complex grouping key converting to columns, now we are able to correctly generate identifiers for grouping queries which makes more queries with correlated collections (where either parent or inner both queries can be groupby query) translatable.
- Erase client projection when applying aggregate operation over GroupBy result.
- When processing result selector in GroupBy use the updated key selector if the select expression was pushed down.

Resolves #27132
Resolves #27266
Resolves #27433
Resolves #23601
Resolves #27721
Resolves #27796
Resolves #27801
Resolves #19683

Relates to #22957
smitpatel added a commit that referenced this issue May 4, 2022
…7931)

Design:
- Introduce `SqlEnumerableExpression` - a holder class which indicates the `SqlExpression` is in form of a enumerable (or group) coming as a result of whole table selection or a grouping element. It also stores details about if `Distinct` is applied over grouping or if there are any orderings.
- Due to above `DistinctExpression` has been removed. The token while used to denote `Distinct` over grouping element were not valid in other parts of SQL tree hence it makes more sense to combine it with `SqlEnumerableExpression`.
- To support dual pass, `GroupByShaperExpression` contains 2 forms of grouping element. One element selector form which correlates directly with the parent grouped query, second subquery form which correlates to parent grouped query through a correlation predicate. Element selector is first used to translate aggregation. If that fails we use subquery form to translate as a subquery. Due to 2 forms of same component, GroupByShaperExpression disallows calling into VisitChildren method, any visitor which is visiting a tree containing GroupByShaperExpression (which appears only in `QueryExpression.ShaperExpression` or LINQ expression after remapping but before translation) must intercept the tree and either ignore or process it appropriately.
- An internal visitor (`GroupByAggregateChainProcessor`) inside SqlTranslator visits and process chain of queryable operations on a grouping element before aggregate is called and condense it into `SqlEnumerableExpression` which is then passed to method which translates aggregate. This visitor only processes Where/Distinct/Select for now. Future PR will add processing for OrderBy/ThenBy(Descending) operations to generate orderings.
- Side-effect above is that joins expanded over the grouping element (due to navigations used on aggregate chain), doesn't translate to aggregate anymore since we need to translate the join on parent query, remove the translated join if the chain didn't end in aggregate and also de-dupe same joins. Filing issue to improve this in future. Due to fragile nature of matching to lift the join, we shouldn't try to lift joins.
- To support custom aggregate operations, we will either reused `IMethodCallTranslator` or create a parallel structure for aggregate methods and call into it from SqlTranslator by passing translated SqlEnumerableExpression as appropriate.
- For complex grouping key, we cause a pushdown so that we can reference the grouping key through columns only. This allows us to reference the grouping key in correlation predicate for subquery without generating invalid SQL in many cases.
- With complex grouping key converting to columns, now we are able to correctly generate identifiers for grouping queries which makes more queries with correlated collections (where either parent or inner both queries can be groupby query) translatable.
- Erase client projection when applying aggregate operation over GroupBy result.
- When processing result selector in GroupBy use the updated key selector if the select expression was pushed down.

Resolves #27132
Resolves #27266
Resolves #27433
Resolves #23601
Resolves #27721
Resolves #27796
Resolves #27801
Resolves #19683

Relates to #22957
@ajcvickers ajcvickers modified the milestones: 7.0.0, 7.0.0-preview5 May 25, 2022
@ajcvickers ajcvickers modified the milestones: 7.0.0-preview5, 7.0.0 Nov 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query area-test closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported type-bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants