Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL Server: transform aggregate functions over subqueries #34262

Merged
merged 1 commit into from
Jul 30, 2024

Conversation

roji
Copy link
Member

@roji roji commented Jul 21, 2024

Single-handed weekend project!

Closes #34256

@roji roji requested a review from a team July 21, 2024 10:50
/// any release. You should only use it directly in your code with extreme caution and knowing that
/// doing so can result in application failures when updating to a new Entity Framework Core release.
/// </summary>
protected override ShapedQueryExpression? TranslateAverage(ShapedQueryExpression source, LambdaExpression? selector, Type resultType)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR basically reverts #32374, which added detection of parameterized Contains inside aggregate function invocations, to avoid translating that to an OPENJSON subquery (as usual) but to instead fall back to the traditional IN+constants translation.

This PR now adds general support for IN+subquery - OPENJSON or otherwise - by lifting it out to OUTER APPLY, so we can remove the special casing and use OPENJSON.

Note that this creates more complicated SQL (see below).

SELECT COUNT(CASE
WHEN [c].[City] IN (N'London', N'Berlin') THEN 1
WHEN [s].[value] = CAST(1 AS bit) THEN 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the parameterized Contains change, as mentioned above. We now do IN+subquery with OPENJSON, just like everywhere else; the new postprocessing step then lifts that subquery up to an OUTER APPLY.

This makes the SQL more complex; I don't think there's a significant perf aspect here since I'm assuming no indexes would be used anyway for these cases. In any case, the OPENJSON vs. constants question is its own separate discussion, so I think it's the right thing to remove the special-casing here for SQL Server.

/cc @cincuranet

AssertSql(
"""
SELECT COALESCE(SUM(CASE
WHEN [s].[value] = CAST(1 AS bit) THEN [s0].[OrderID]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranma42 you may be slightly interested in this (I know you have no context whatsoever)... This specific test shows us lifting an EXISTS subquery out when it's nested inside an aggregate function call (not supported by SQL Server), and integrating it as an OUTER APPLY, at which point it can be referenced from inside the aggregate function call.

Specifically with EXISTS, it's funny/painful to see how we need to translate the boolean (search condition) to bit for projection out of the OUTER APPLY, and then to translate it back to bool via CASE/WHEN in the projection... So much trouble and ceremony due to the lack of a proper bool type.

Copy link
Contributor

@ranma42 ranma42 Jul 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would something like [s].[value] * [s0].[OrderID] make sense for this?

EDIT: already handled by #34262 (comment)

FROM [Customers] AS [c]
CROSS JOIN (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the CROSS JOIN instead of CROSS APPLY here, since the subquery is uncorrelated.

@roji roji force-pushed the SqlServerAggregateOverAggregate branch from e7752d9 to 95146a8 Compare July 21, 2024 11:59
Comment on lines +1152 to +1158
SELECT 5 + (
SELECT MAX([o0].[ProductID])
FROM [Order Details] AS [o0]
WHERE [o].[OrderID] = [o0].[OrderID]) AS [value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to have

    OUTER APPLY (
        SELECT 5 + MAX([o0].[ProductID]) AS [value]
        FROM [Order Details] AS [o0]
        WHERE [o].[OrderID] = [o0].[OrderID]
    ) AS [s]

here?

I don't expect this to perform better; it is mainly for readability, so just ignore this if it is not trivial 😇

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good suggestion but it seems orthogonal to this PR (i.e. the original SQL on the left had the same thing). Maybe open an issue to track that potential optimization?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #34314 to track this

130,
(await Assert.ThrowsAsync<SqlException>(
async () => await base.Average_over_nested_subquery_is_client_eval(async))).Number);
// Expected: 46.7126322751323
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use custom asserter to verify results with some error margin:

        => AssertAverage(
            async,
            ss => ss.Set<Customer>().OrderBy(c => c.CustomerID).Take(3),
            selector: c => (decimal)c.Orders.Average(o => 5 + o.OrderDetails.Average(od => od.ProductID)),
            asserter: (e, a) => Assert.Equal(e, a, precision: 3));

@roji roji force-pushed the SqlServerAggregateOverAggregate branch from 39c1584 to 2e1f0f8 Compare July 30, 2024 17:38
@roji roji enabled auto-merge (squash) July 30, 2024 17:38
@roji roji merged commit 390b5b0 into dotnet:main Jul 30, 2024
7 checks passed
@roji roji deleted the SqlServerAggregateOverAggregate branch July 30, 2024 19:17
@roji roji changed the title Transform aggregate functions over subqueries on SQL Server SQL Server: transform aggregate functions over subqueries Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL Server: Rewrite query to support aggregate functions over aggregates/subqueries
4 participants