Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group By combined with Where Exists generates inefficient SQL #27773

Closed
negatedx opened this issue Apr 6, 2022 · 4 comments
Closed

Group By combined with Where Exists generates inefficient SQL #27773

negatedx opened this issue Apr 6, 2022 · 4 comments

Comments

@negatedx
Copy link

negatedx commented Apr 6, 2022

Group By combined with Where Exists generates inefficient SQL

After upgrading to .net 6 and ef core 6 one of our queries started breaking. The query contained a group by and a where exists clause and on examination we found that essentially the entire query was being reproduced as a sub query just to apply an aggregate function to the group. The query failed because one of the where clauses was referencing a column that was not included in the group by which I couldn't reproduce as the query that broke was very complicated but fixing the inefficient sql generation would fix the breakage anyway. In our specific case we managed to work around the issue by replacing the "where exists" with a join but since this does not necessarily result in failure but rather inefficient sql I thought it was worth raising.

I have attached a complete project reproducing the issue but it comes down to the fact that this code:
GroupByIssue.zip

var groupByWithWhereExists = db.Blogs
    .Where(x => db.Posts.Select(y => new { y.BlogId }).Any(y => y.BlogId == x.BlogId))
    .Where(x => x.BlogId > 0)
    .GroupBy(x => new { x.BlogId })
    .Select(x => new { x.Key.BlogId, Url = x.Min(y => y.Url) });

generates this sql

SELECT
	[b].[BlogId]
   ,(SELECT
			MIN([b0].[Url])
		FROM [Blogs] AS [b0]
		WHERE (EXISTS (SELECT
				1
			FROM [Posts] AS [p0]
			WHERE [p0].[BlogId] = [b0].[BlogId])
		AND ([b0].[BlogId] > 0))
		AND ([b].[BlogId] = [b0].[BlogId]))
	AS [Url]
FROM [Blogs] AS [b]
WHERE EXISTS (SELECT
		1
	FROM [Posts] AS [p]
	WHERE [p].[BlogId] = [b].[BlogId])
AND ([b].[BlogId] > 0)
GROUP BY [b].[BlogId]

But I would expect it to produce this sql

SELECT
	[b].[BlogId]
   ,MIN([b].[Url]) AS [Url]
FROM [Blogs] AS [b]
WHERE EXISTS (SELECT
		1
	FROM [Posts] AS [p]
	WHERE [p].[BlogId] = [b].[BlogId])
AND ([b].[BlogId] > 0)
GROUP BY [b].[BlogId]

Provider and version information

EF Core version: 6.0.3
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET 6.0
Operating system: Windows
IDE: Visual Studio Code

@maumar
Copy link
Contributor

maumar commented Apr 6, 2022

dupe of #27433

@maumar
Copy link
Contributor

maumar commented Apr 6, 2022

workaround: use switch to revert to pre-patch behavior:

AppContext.SetSwitch("Microsoft.EntityFrameworkCore.Issue27102", true);

@negatedx
Copy link
Author

negatedx commented Apr 6, 2022

Thanks for the reply @maumar. I searched for an existing issue but didn't find it sorry.

@maumar
Copy link
Contributor

maumar commented Apr 6, 2022

@negatedx all good, we can always use more scenarios for potential regression tests to be added to our test suite.

@maumar maumar closed this as completed Apr 6, 2022
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants