-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query: optimize queries projecting correlated collections, so that they don't result in N+1 database queries #9282
Comments
This feature is quite important for complex DTO/ViewModel generation. Here is a simple example public class ChatInfo
{
public long Id { get; set; }
public string Title { get; set; }
public List<ChatEntry> Entries { get; set; } = new List<ChatEntry>();
public List<ChatAssignment> Assignments { get; set; } = new List<ChatAssignment>();
}
public class ChatEntry
{
public long Id { get; set; }
public ChatInfo Chat { get; set; }
public string UserId { get; set; }
public string Message { get; set; }
public DateTime Time { get; set; }
}
public class ChatAssignment
{
public long Id { get; set; }
public ChatInfo Chat { get; set; }
public string UserId { get; set; }
} A common view model for a list of chat items: public class ChatListDto
{
public string Title { get; set; }
public List<string> AssignedUserIds { get; set; }
public string LatestEntry { get; set; }
}
public class ChatEntryDto
{
public string UserId { get; set; }
public string Message { get; set; }
public DateTime Time { get; set; }
} A typical query to retrieve a list of chat list view models: await _context.Chats
.Include(c => c.Assignments)
.Where(c => c.Assignments.Any(a => a.UserId == "Steinmeier"))
.Select(c => new ChatListDto
{
Title = c.Title,
AssignedUserIds = c.Assignments.Select(a => a.UserId).ToList(),
LatestEntry = c.Entries
.OrderByDescending(e => e.Time)
.Select(e => new ChatEntryDto {
UserId = e.UserId,
Message = e.Message,
Time = e.Time
})
.FirstOrDefault()
})
.ToListAsync(); This generates n+1 queries (2 for each list item):
|
I think I'm running into the same problem with the following projection query. The one-to-manys/collections are causing N+1 queries to occur. This is different than the behavior in Entity Framework 6 which creates nested JOINs and no N+1 queries. This is a use case that is important for a reporting application that I have. I have been wanting to switch to Entity Framework Core for everything, but, have run into issues like this. I think this is one of the last remaining issues that I have.
Does anyone know if this issue will be solved sometime soon? |
…r queries projecting composed collection navigations (filters and projections) This feature optimizes a number of queries that project correlated collections. Previously those would produce N+1 queries. Now, we rewrite queries similarly to how Include pipeline does it, producing only two queries and correlating them on the client. To enable the feature the inner subquery needs to be wrapped around ToList() or ToArray() call. Current limitations: - only works for sync queries, - child entities are not being tracked, - no fixup between parent and child, - doesn't work if the parent query results in a CROSS JOIN, - doesn't work with result operators (i.e. Skip/Take/Distinct) - doesn't work if outer query needs client evaluation anywhere outside projection (e.g. order by or filter by NonMapped property) - doesn't work if inner query is correlated with query two (or more) levels up, (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name == c.Name))) Optimization process: original query: from c in ctx.Customers where c.CustomerID != "ALFKI" orderby c.City descending select (from o in c.Orders where o.OrderID > 100 orderby o.EmployeeID select new { o.OrderID, o.CustomerID }).ToList() nav rewrite converts it to: from c in customers where c.CustomerID != "ALFKI" order by c.City descending select (from o in orders where o.OrderID > 100 order by o.EmployeeID where c.CustomerID ?= o.CustomerID select new { o.OrderID, o.CustomerID }).ToList() which gets rewritten to (simplified): from c in customers where c.CustomerID != "ALFKI" order by c.City desc, c.CustomerID asc select CorrelateSubquery( outerKey: new { c.CustomerID }, correlationPredicate: (outer, inner) => outer.GetValue(0) == null || inner.GetValue(0) == null ? false : outer.GetValue(0) == inner.GetValue(0) correlatedCollectionFactory: () => from o in orders where o.OrderID > 100 join _c in from c in customers where c.CustomerID != "ALFKI" select new { c.City, c.CustomerID } on o.CustomerID equals _c.GetValue(1) order by _c.GetValue(0) descending, _c.GetValue(1), o.EmployeeID select new { InnerResult = new { o.OrderID, o.CustomerID } InnerKey = new { o.CustomerID }, OriginKey = new { _c.GetValue(1) } }).ToList() CorrelateSubquery is the method that combines results of outer and inner queries. Because order for both queries is the same we can perform only one pass thru inner query. We use correlation predicate (between outerKey parameter passed to CorrelateSubquery and InnerKey which is part of the final result) to determine whether giver result of the inner query belongs to the outer. We also remember latest origin key (i.e. PK of the outer, which is not always the same as outer key). If the origin key changes, it means that all inners for that outer have already been encountered.
…r queries projecting composed collection navigations (filters and projections) This feature optimizes a number of queries that project correlated collections. Previously those would produce N+1 queries. Now, we rewrite queries similarly to how Include pipeline does it, producing only two queries and correlating them on the client. To enable the feature the inner subquery needs to be wrapped around ToList() or ToArray() call. Current limitations: - only works for sync queries, - child entities are not being tracked, - no fixup between parent and child, - doesn't work if the parent query results in a CROSS JOIN, - doesn't work with result operators (i.e. Skip/Take/Distinct) - doesn't work if outer query needs client evaluation anywhere outside projection (e.g. order by or filter by NonMapped property) - doesn't work if inner query is correlated with query two (or more) levels up, (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name == c.Name).ToList()).ToList()) - doesn't work in nested scenarios where the outer collection is streaming (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name != "Foo").ToList())) - to make it work, outer collection must also be wrapped in ToList(). However it is OK to stream inner collection - in that case outer collection will take advantage of the optimization. Optimization process: original query: from c in ctx.Customers where c.CustomerID != "ALFKI" orderby c.City descending select (from o in c.Orders where o.OrderID > 100 orderby o.EmployeeID select new { o.OrderID, o.CustomerID }).ToList() nav rewrite converts it to: from c in customers where c.CustomerID != "ALFKI" order by c.City descending select (from o in orders where o.OrderID > 100 order by o.EmployeeID where c.CustomerID ?= o.CustomerID select new { o.OrderID, o.CustomerID }).ToList() which gets rewritten to (simplified): from c in customers where c.CustomerID != "ALFKI" order by c.City desc, c.CustomerID asc select CorrelateSubquery( outerKey: new { c.CustomerID }, correlationPredicate: (outer, inner) => outer.GetValue(0) == null || inner.GetValue(0) == null ? false : outer.GetValue(0) == inner.GetValue(0) correlatedCollectionFactory: () => from o in orders where o.OrderID > 100 join _c in from c in customers where c.CustomerID != "ALFKI" select new { c.City, c.CustomerID } on o.CustomerID equals _c.GetValue(1) order by _c.GetValue(0) descending, _c.GetValue(1), o.EmployeeID select new { InnerResult = new { o.OrderID, o.CustomerID } InnerKey = new { o.CustomerID }, OriginKey = new { _c.GetValue(1) } }).ToList() CorrelateSubquery is the method that combines results of outer and inner queries. Because order for both queries is the same we can perform only one pass thru inner query. We use correlation predicate (between outerKey parameter passed to CorrelateSubquery and InnerKey which is part of the final result) to determine whether giver result of the inner query belongs to the outer. We also remember latest origin key (i.e. PK of the outer, which is not always the same as outer key). If the origin key changes, it means that all inners for that outer have already been encountered.
…r queries projecting composed collection navigations (filters and projections) This feature optimizes a number of queries that project correlated collections. Previously those would produce N+1 queries. Now, we rewrite queries similarly to how Include pipeline does it, producing only two queries and correlating them on the client. To enable the feature the inner subquery needs to be wrapped around ToList() or ToArray() call. Current limitations: - only works for sync queries, - child entities are not being tracked, - no fixup between parent and child, - doesn't work if the parent query results in a CROSS JOIN, - doesn't work with result operators (i.e. Skip/Take/Distinct) - doesn't work if outer query needs client evaluation anywhere outside projection (e.g. order by or filter by NonMapped property) - doesn't work if inner query is correlated with query two (or more) levels up, (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name == c.Name).ToList()).ToList()) - doesn't work in nested scenarios where the outer collection is streaming (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name != "Foo").ToList())) - to make it work, outer collection must also be wrapped in ToList(). However it is OK to stream inner collection - in that case outer collection will take advantage of the optimization. Optimization process: original query: from c in ctx.Customers where c.CustomerID != "ALFKI" orderby c.City descending select (from o in c.Orders where o.OrderID > 100 orderby o.EmployeeID select new { o.OrderID, o.CustomerID }).ToList() nav rewrite converts it to: from c in customers where c.CustomerID != "ALFKI" order by c.City descending select (from o in orders where o.OrderID > 100 order by o.EmployeeID where c.CustomerID ?= o.CustomerID select new { o.OrderID, o.CustomerID }).ToList() which gets rewritten to (simplified): from c in customers where c.CustomerID != "ALFKI" order by c.City desc, c.CustomerID asc select CorrelateSubquery( outerKey: new { c.CustomerID }, correlationPredicate: (outer, inner) => outer.GetValue(0) == null || inner.GetValue(0) == null ? false : outer.GetValue(0) == inner.GetValue(0) correlatedCollectionFactory: () => from o in orders where o.OrderID > 100 join _c in from c in customers where c.CustomerID != "ALFKI" select new { c.City, c.CustomerID } on o.CustomerID equals _c.GetValue(1) order by _c.GetValue(0) descending, _c.GetValue(1), o.EmployeeID select new { InnerResult = new { o.OrderID, o.CustomerID } InnerKey = new { o.CustomerID }, OriginKey = new { _c.GetValue(1) } }).ToList() CorrelateSubquery is the method that combines results of outer and inner queries. Because order for both queries is the same we can perform only one pass thru inner query. We use correlation predicate (between outerKey parameter passed to CorrelateSubquery and InnerKey which is part of the final result) to determine whether giver result of the inner query belongs to the outer. We also remember latest origin key (i.e. PK of the outer, which is not always the same as outer key). If the origin key changes, it means that all inners for that outer have already been encountered.
…r queries projecting composed collection navigations (filters and projections) This feature optimizes a number of queries that project correlated collections. Previously those would produce N+1 queries. Now, we rewrite queries similarly to how Include pipeline does it, producing only two queries and correlating them on the client. To enable the feature the inner subquery needs to be wrapped around ToList() or ToArray() call. Current limitations: - only works for sync queries, - child entities are not being tracked, - no fixup between parent and child, - doesn't work if the parent query results in a CROSS JOIN, - doesn't work with result operators (i.e. Skip/Take/Distinct) - doesn't work if outer query needs client evaluation anywhere outside projection (e.g. order by or filter by NonMapped property) - doesn't work if inner query is correlated with query two (or more) levels up, (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name == c.Name).ToList()).ToList()) - doesn't work in nested scenarios where the outer collection is streaming (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name != "Foo").ToList())) - to make it work, outer collection must also be wrapped in ToList(). However it is OK to stream inner collection - in that case outer collection will take advantage of the optimization. Optimization process: original query: from c in ctx.Customers where c.CustomerID != "ALFKI" orderby c.City descending select (from o in c.Orders where o.OrderID > 100 orderby o.EmployeeID select new { o.OrderID, o.CustomerID }).ToList() nav rewrite converts it to: from c in customers where c.CustomerID != "ALFKI" order by c.City descending select (from o in orders where o.OrderID > 100 order by o.EmployeeID where c.CustomerID ?= o.CustomerID select new { o.OrderID, o.CustomerID }).ToList() which gets rewritten to (simplified): from c in customers where c.CustomerID != "ALFKI" order by c.City desc, c.CustomerID asc select CorrelateSubquery( outerKey: new { c.CustomerID }, correlationPredicate: (outer, inner) => outer.GetValue(0) == null || inner.GetValue(0) == null ? false : outer.GetValue(0) == inner.GetValue(0) correlatedCollectionFactory: () => from o in orders where o.OrderID > 100 join _c in from c in customers where c.CustomerID != "ALFKI" select new { c.City, c.CustomerID } on o.CustomerID equals _c.GetValue(1) order by _c.GetValue(0) descending, _c.GetValue(1), o.EmployeeID select new { InnerResult = new { o.OrderID, o.CustomerID } InnerKey = new { o.CustomerID }, OriginKey = new { _c.GetValue(1) } }).ToList() CorrelateSubquery is the method that combines results of outer and inner queries. Because order for both queries is the same we can perform only one pass thru inner query. We use correlation predicate (between outerKey parameter passed to CorrelateSubquery and InnerKey which is part of the final result) to determine whether giver result of the inner query belongs to the outer. We also remember latest origin key (i.e. PK of the outer, which is not always the same as outer key). If the origin key changes, it means that all inners for that outer have already been encountered.
…ons, so that they don't result in N+1 database queries This feature optimizes a number of queries that project correlated collections. Previously those would produce N+1 queries. Now, we rewrite queries similarly to how Include pipeline does it, producing only two queries and correlating them on the client. To enable the feature the inner subquery needs to be wrapped around ToList() or ToArray() call. Current limitations: - only works for sync queries, - child entities are not being tracked, - no fixup between parent and child, - doesn't work if the parent query results in a CROSS JOIN, - doesn't work with result operators (i.e. Skip/Take/Distinct) - doesn't work if outer query needs client evaluation anywhere outside projection (e.g. order by or filter by NonMapped property) - doesn't work if inner query is correlated with query two (or more) levels up, (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name == c.Name).ToList()).ToList()) - doesn't work in nested scenarios where the outer collection is streaming (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name != "Foo").ToList())) - to make it work, outer collection must also be wrapped in ToList(). However it is OK to stream inner collection - in that case outer collection will take advantage of the optimization. Optimization process: original query: from c in ctx.Customers where c.CustomerID != "ALFKI" orderby c.City descending select (from o in c.Orders where o.OrderID > 100 orderby o.EmployeeID select new { o.OrderID, o.CustomerID }).ToList() nav rewrite converts it to: from c in customers where c.CustomerID != "ALFKI" order by c.City descending select (from o in orders where o.OrderID > 100 order by o.EmployeeID where c.CustomerID ?= o.CustomerID select new { o.OrderID, o.CustomerID }).ToList() which gets rewritten to (simplified): from c in customers where c.CustomerID != "ALFKI" order by c.City desc, c.CustomerID asc select CorrelateSubquery( outerKey: new { c.CustomerID }, correlationPredicate: (outer, inner) => outer.GetValue(0) == null || inner.GetValue(0) == null ? false : outer.GetValue(0) == inner.GetValue(0) correlatedCollectionFactory: () => from o in orders where o.OrderID > 100 join _c in from c in customers where c.CustomerID != "ALFKI" select new { c.City, c.CustomerID } on o.CustomerID equals _c.GetValue(1) order by _c.GetValue(0) descending, _c.GetValue(1), o.EmployeeID select new { InnerResult = new { o.OrderID, o.CustomerID } InnerKey = new { o.CustomerID }, OriginKey = new { _c.GetValue(1) } }).ToList() CorrelateSubquery is the method that combines results of outer and inner queries. Because order for both queries is the same we can perform only one pass thru inner query. We use correlation predicate (between outerKey parameter passed to CorrelateSubquery and InnerKey which is part of the final result) to determine whether giver result of the inner query belongs to the outer. We also remember latest origin key (i.e. PK of the outer, which is not always the same as outer key). If the origin key changes, it means that all inners for that outer have already been encountered.
…ons, so that they don't result in N+1 database queries This feature optimizes a number of queries that project correlated collections. Previously those would produce N+1 queries. Now, we rewrite queries similarly to how Include pipeline does it, producing only two queries and correlating them on the client. To enable the feature the inner subquery needs to be wrapped around ToList() or ToArray() call. Current limitations: - only works for sync queries, - child entities are not being tracked, - no fixup between parent and child, - doesn't work if the parent query results in a CROSS JOIN, - doesn't work with result operators (i.e. Skip/Take/Distinct) - doesn't work if outer query needs client evaluation anywhere outside projection (e.g. order by or filter by NonMapped property) - doesn't work if inner query is correlated with query two (or more) levels up, (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name == c.Name).ToList()).ToList()) - doesn't work in nested scenarios where the outer collection is streaming (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name != "Foo").ToList())) - to make it work, outer collection must also be wrapped in ToList(). However it is OK to stream inner collection - in that case outer collection will take advantage of the optimization. Optimization process: original query: from c in ctx.Customers where c.CustomerID != "ALFKI" orderby c.City descending select (from o in c.Orders where o.OrderID > 100 orderby o.EmployeeID select new { o.OrderID, o.CustomerID }).ToList() nav rewrite converts it to: from c in customers where c.CustomerID != "ALFKI" order by c.City descending select (from o in orders where o.OrderID > 100 order by o.EmployeeID where c.CustomerID ?= o.CustomerID select new { o.OrderID, o.CustomerID }).ToList() which gets rewritten to (simplified): from c in customers where c.CustomerID != "ALFKI" order by c.City desc, c.CustomerID asc select CorrelateSubquery( outerKey: new { c.CustomerID }, correlationPredicate: (outer, inner) => outer.GetValue(0) == null || inner.GetValue(0) == null ? false : outer.GetValue(0) == inner.GetValue(0) correlatedCollectionFactory: () => from o in orders where o.OrderID > 100 join _c in from c in customers where c.CustomerID != "ALFKI" select new { c.City, c.CustomerID } on o.CustomerID equals _c.GetValue(1) order by _c.GetValue(0) descending, _c.GetValue(1), o.EmployeeID select new { InnerResult = new { o.OrderID, o.CustomerID } InnerKey = new { o.CustomerID }, OriginKey = new { _c.GetValue(1) } }).ToList() CorrelateSubquery is the method that combines results of outer and inner queries. Because order for both queries is the same we can perform only one pass thru inner query. We use correlation predicate (between outerKey parameter passed to CorrelateSubquery and InnerKey which is part of the final result) to determine whether giver result of the inner query belongs to the outer. We also remember latest origin key (i.e. PK of the outer, which is not always the same as outer key). If the origin key changes, it means that all inners for that outer have already been encountered.
…ons, so that they don't result in N+1 database queries This feature optimizes a number of queries that project correlated collections. Previously those would produce N+1 queries. Now, we rewrite queries similarly to how Include pipeline does it, producing only two queries and correlating them on the client. To enable the feature the inner subquery needs to be wrapped around ToList() or ToArray() call. Current limitations: - only works for sync queries, - child entities are not being tracked, - no fixup between parent and child, - doesn't work if the parent query results in a CROSS JOIN, - doesn't work with result operators (i.e. Skip/Take/Distinct) - doesn't work if outer query needs client evaluation anywhere outside projection (e.g. order by or filter by NonMapped property) - doesn't work if inner query is correlated with query two (or more) levels up, (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name == c.Name).ToList()).ToList()) - doesn't work in nested scenarios where the outer collection is streaming (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name != "Foo").ToList())) - to make it work, outer collection must also be wrapped in ToList(). However it is OK to stream inner collection - in that case outer collection will take advantage of the optimization. Optimization process: original query: from c in ctx.Customers where c.CustomerID != "ALFKI" orderby c.City descending select (from o in c.Orders where o.OrderID > 100 orderby o.EmployeeID select new { o.OrderID, o.CustomerID }).ToList() nav rewrite converts it to: from c in customers where c.CustomerID != "ALFKI" order by c.City descending select (from o in orders where o.OrderID > 100 order by o.EmployeeID where c.CustomerID ?= o.CustomerID select new { o.OrderID, o.CustomerID }).ToList() which gets rewritten to (simplified): from c in customers where c.CustomerID != "ALFKI" order by c.City desc, c.CustomerID asc select CorrelateSubquery( outerKey: new { c.CustomerID }, correlationPredicate: (outer, inner) => outer.GetValue(0) == null || inner.GetValue(0) == null ? false : outer.GetValue(0) == inner.GetValue(0) correlatedCollectionFactory: () => from o in orders where o.OrderID > 100 join _c in from c in customers where c.CustomerID != "ALFKI" select new { c.City, c.CustomerID } on o.CustomerID equals _c.GetValue(1) order by _c.GetValue(0) descending, _c.GetValue(1), o.EmployeeID select new { InnerResult = new { o.OrderID, o.CustomerID } InnerKey = new { o.CustomerID }, OriginKey = new { _c.GetValue(1) } }).ToList() CorrelateSubquery is the method that combines results of outer and inner queries. Because order for both queries is the same we can perform only one pass thru inner query. We use correlation predicate (between outerKey parameter passed to CorrelateSubquery and InnerKey which is part of the final result) to determine whether giver result of the inner query belongs to the outer. We also remember latest origin key (i.e. PK of the outer, which is not always the same as outer key). If the origin key changes, it means that all inners for that outer have already been encountered.
…ons, so that they don't result in N+1 database queries This feature optimizes a number of queries that project correlated collections. Previously those would produce N+1 queries. Now, we rewrite queries similarly to how Include pipeline does it, producing only two queries and correlating them on the client. To enable the feature the inner subquery needs to be wrapped around ToList() or ToArray() call. Current limitations: - only works for sync queries, - doesn't work if the parent query results in a CROSS JOIN, - doesn't work with result operators (i.e. Skip/Take/Distinct) - doesn't work if outer query needs client evaluation anywhere outside projection (e.g. order by or filter by NonMapped property) - doesn't work if inner query is correlated with query two (or more) levels up, (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name == c.Name).ToList()).ToList()) - doesn't work in nested scenarios where the outer collection is streaming (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name != "Foo").ToList())) - to make it work, outer collection must also be wrapped in ToList(). However it is OK to stream inner collection - in that case outer collection will take advantage of the optimization. Optimization process: original query: from c in ctx.Customers where c.CustomerID != "ALFKI" orderby c.City descending select (from o in c.Orders where o.OrderID > 100 orderby o.EmployeeID select new { o.OrderID, o.CustomerID }).ToList() nav rewrite converts it to: from c in customers where c.CustomerID != "ALFKI" order by c.City descending select (from o in orders where o.OrderID > 100 order by o.EmployeeID where c.CustomerID ?= o.CustomerID select new { o.OrderID, o.CustomerID }).ToList() which gets rewritten to (simplified): from c in customers where c.CustomerID != "ALFKI" order by c.City desc, c.CustomerID asc select CorrelateSubquery( outerKey: new { c.CustomerID }, correlationPredicate: (outer, inner) => outer.GetValue(0) == null || inner.GetValue(0) == null ? false : outer.GetValue(0) == inner.GetValue(0) correlatedCollectionFactory: () => from o in orders where o.OrderID > 100 join _c in from c in customers where c.CustomerID != "ALFKI" select new { c.City, c.CustomerID } on o.CustomerID equals _c.GetValue(1) order by _c.GetValue(0) descending, _c.GetValue(1), o.EmployeeID select new { InnerResult = new { o.OrderID, o.CustomerID } InnerKey = new { o.CustomerID }, OriginKey = new { _c.GetValue(1) } }).ToList() CorrelateSubquery is the method that combines results of outer and inner queries. Because order for both queries is the same we can perform only one pass thru inner query. We use correlation predicate (between outerKey parameter passed to CorrelateSubquery and InnerKey which is part of the final result) to determine whether giver result of the inner query belongs to the outer. We also remember latest origin key (i.e. PK of the outer, which is not always the same as outer key). If the origin key changes, it means that all inners for that outer have already been encountered.
…ons, so that they don't result in N+1 database queries This feature optimizes a number of queries that project correlated collections. Previously those would produce N+1 queries. Now, we rewrite queries similarly to how Include pipeline does it, producing only two queries and correlating them on the client. To enable the feature the inner subquery needs to be wrapped around ToList() or ToArray() call. Current limitations: - only works for sync queries, - doesn't work if the parent query results in a CROSS JOIN, - doesn't work with result operators (i.e. Skip/Take/Distinct) - doesn't work if outer query needs client evaluation anywhere outside projection (e.g. order by or filter by NonMapped property) - doesn't work if inner query is correlated with query two (or more) levels up, (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name == c.Name).ToList()).ToList()) - doesn't work in nested scenarios where the outer collection is streaming (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name != "Foo").ToList())) - to make it work, outer collection must also be wrapped in ToList(). However it is OK to stream inner collection - in that case outer collection will take advantage of the optimization. Optimization process: original query: from c in ctx.Customers where c.CustomerID != "ALFKI" orderby c.City descending select (from o in c.Orders where o.OrderID > 100 orderby o.EmployeeID select new { o.OrderID, o.CustomerID }).ToList() nav rewrite converts it to: from c in customers where c.CustomerID != "ALFKI" order by c.City descending select (from o in orders where o.OrderID > 100 order by o.EmployeeID where c.CustomerID ?= o.CustomerID select new { o.OrderID, o.CustomerID }).ToList() which gets rewritten to (simplified): from c in customers where c.CustomerID != "ALFKI" order by c.City desc, c.CustomerID asc select CorrelateSubquery( outerKey: new { c.CustomerID }, correlationPredicate: (outer, inner) => outer.GetValue(0) == null || inner.GetValue(0) == null ? false : outer.GetValue(0) == inner.GetValue(0) correlatedCollectionFactory: () => from o in orders where o.OrderID > 100 join _c in from c in customers where c.CustomerID != "ALFKI" select new { c.City, c.CustomerID } on o.CustomerID equals _c.GetValue(1) order by _c.GetValue(0) descending, _c.GetValue(1), o.EmployeeID select new { InnerResult = new { o.OrderID, o.CustomerID } InnerKey = new { o.CustomerID }, OriginKey = new { _c.GetValue(1) } }).ToList() CorrelateSubquery is the method that combines results of outer and inner queries. Because order for both queries is the same we can perform only one pass thru inner query. We use correlation predicate (between outerKey parameter passed to CorrelateSubquery and InnerKey which is part of the final result) to determine whether giver result of the inner query belongs to the outer. We also remember latest origin key (i.e. PK of the outer, which is not always the same as outer key). If the origin key changes, it means that all inners for that outer have already been encountered.
fixed in b95f23f |
Much thanks. Definitely looking forward to testing this one out. |
@jemiller0 please let us know about any issues you encounter. Given how complex your model/queries are, the feedback would be extremely helpful in finding bugs that slipped through our internal testing. |
@maumar I just tested the following query using
Unfortunately, it looks like it's still using N+1 queries for my query.
Can you tell me of the fix is included in the MyGet package that I'm using? |
@jemiller0 optimization cannot be applied if you are streaming a collection. @maumar while I understand the purpose of not applying optimization for streamed collections, in most cases that's not what people need. Sometimes you can just forget to add |
@SSkovboiSS Thanks for the tip. I was going to say, I actually don't mind the fact that you need to use I added
That works as far as preventing it from performing N+1 queries. However, if you have a
Without the filter, it looks like what it's doing is acceptable, and seems like it's probably better than what EF 6 did with the big single query. If I take off the
|
The following is what the query roughly looks like in EF 6. It looks horrid, but, actually works not that bad. If EF Core could just add the filters rather than doing N+1s, probably it would be better than EF 6. It seems like it would be nice if there was a way to specify that you want to use the one big query method though. People could use it for backwards compatibility if nothing else.
|
@SSkovboiSS the main reason why we don't apply optimization for streaming collection is practical - it get's quite tricky to do for some edge cases, e.g. when the inner streaming collection enumerator is enumerated multiple times, or out of order. We would have to keep track of quite a bit of data to determine when the inner query should be re-issued, and when it is safe to continue enumerating. We were considering making all collections buffered and apply the optimization globally, with a switch to disable it altogether but that would not allow for mixing buffering and streaming in a single query. |
@jemiller0 what you encountered looks a like a bug. I believe the reason why optimization is not being applied is Take() - optimization currently can't handle paging operations, the translation of a correlated paged query is much more complicated. However, we should have detected that and fallback to the old behavior - essentially there should be no difference between queries with ToArray() and without it. Can you double check that the Take() is culprit here? |
@jemiller0 I looked at the query you provided, and the problem is that adding filter after the projection makes it a much complicated query, from EF perspective - we create quite a complicated predicate, which gets client-evaluated and that disables correlated collection optimization and also this is why you don't see a WHERE clause in your query. You can modify the query slightly (move the filter before the projection): using (var lc = new LibraryContext())
{
var q = from oi in lc.OrderItems
where oi.OrderItemAccounts.Select(oia => oia.Account.Code).Any(s => s == "XPUTT")
select new
{
oi.Id,
Order = oi.Order.Id,
oi.OrderId,
oi.Order.CreationTime,
OrderStatus = oi.Order.OrderStatus.Name,
oi.Order.OrderStatusId,
OrderType = oi.Order.OrderType.Name,
oi.Order.OrderTypeId,
ReceiptStatus = oi.Holding.ReceiptStatus.Name,
oi.Holding.ReceiptStatusId,
Vendor = oi.Order.Vendor.Name,
oi.Order.Vendor.EmailAddress,
oi.Order.VendorId,
Accounts = oi.OrderItemAccounts.Select(oia => oia.Account.Code).ToArray(),
Donors = oi.OrderItemDonors.Select(oid => oid.Donor.Code).ToArray(),
oi.VendorNotes,
oi.SpecialNotes,
oi.MiscellaneousNotes,
oi.SelectorNotes,
oi.Order.VendorCustomerId,
DeliveryRoom = oi.Order.DeliveryRoom.Name,
oi.Order.DeliveryRoomId,
ExtendedCost = oi.UnitPrice * oi.Quantity,
oi.VendorItemId,
Bib = oi.Bib.Title,
oi.BibId,
oi.Bib.Author,
oi.Bib.OclcNumber,
OclcNumbers = oi.Bib.OclcNumbers.Select(n => n.Content).ToArray(),
Isbns = oi.Bib.Isbns.Select(n => n.Content),
Issns = oi.Bib.Issns.Select(n => n.Content),
OtherNumbers = oi.Bib.Numbers.Where(n => n.Type == "024").Select(n => n.Content).ToArray(),
PublisherNumbers = oi.Bib.Numbers.Where(n => n.Type == "028").Select(n => n.Content).ToArray(),
Editions = oi.Bib.Editions.Select(n => n.Content).ToArray(),
Publications = oi.Bib.Publications.Select(n => n.Content).ToArray()
};
var l = q.Take(5).ToArray();
foreach (var oi in l) oi.Accounts.ToArray();
} and you will get much better queries. (Disclaimer: I only tested it with a simplified model so there still might be more issues.) |
Thanks @maumar. I'm sorry I didn't get back to you yet. Some things came up a work and I haven't had a chance to test. The main reason I'm putting the filter at the bottom like that is that I'm using the query with a grid control (RadGrid) from a company called Telerik. The grid control allows you to enable filtering controls at the top of the grid and generates Dynamic LINQ filter strings. The output projection field names are what the grid knows. Otherwise, I agree, it would be best to do the filtering at the top. |
filed #10811 to track the RadGrid issue, perhaps we can do some QM rewrite to improve the queries, making them take advantage of N+1 optimization (and others) |
Hi, what about this query, which is still producing N+1 queries in EF2.1 preview: dbContext.ForumThreads.Include(t => t.Posts) https://stackoverflow.com/questions/49346001/how-to-avoid-n1-queries-in-ef-core-2-1/ |
I did not try this in 2.1, but I would like to mention another scenario (FirstOrDefault):
|
I have the code above. The table map has a one to many relation to MapPoints. So in this query I'm trying to list all maps and in the same query count the number of points from the MapPoints table. We are using EF core 2.2 and have problem with N+1. @maumar Is this supposed to be fixed in that version or is this specific issue that I have not fixed in 2.2? |
@patcor 2.1 and 2.2 fix has a number of limitations, using Skip/Take is one of them. We have addressed them all in 3.0 which now produces a single query for all nested navigation/include scenarios |
In #8584 we allowed include pipeline to be reused for queries that project collection navigations as long as they are not composed on. This results in generating much better queries for those cases - no longer producing N+1 queries.
This could be further enhanced to support collection navigations that are composed on (e.g. customers.Select(c => c.Orders.Where(o => o.City == "London")), so that those queries also stop producing N+1 queries.
The text was updated successfully, but these errors were encountered: