Fix to #9282 - Query: optimize queries projecting correlated collections, so that they don't result in N+1 database queries

This feature optimizes a number of queries that project correlated collections. Previously, those would produce N+1 queries. Now we rewrite them similarly to how the Include pipeline does it, producing only two queries and correlating their results on the client.

To enable the feature, the inner subquery needs to be wrapped in a ToList() or ToArray() call.

Current limitations:
- only works for sync queries,
- doesn't work if the parent query results in a CROSS JOIN,
- doesn't work with result operators (e.g. Skip/Take/Distinct),
- doesn't work if the outer query needs client evaluation anywhere outside the projection (e.g. order by or filter by a NonMapped property),
- doesn't work if the inner query is correlated with a query two or more levels up (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name == c.Name).ToList()).ToList())),
- doesn't work in nested scenarios where the outer collection is streaming (e.g. customers.Select(c => c.Orders.Select(o => o.OrderDetails.Where(od => od.Name != "Foo").ToList()))); to make it work, the outer collection must also be wrapped in ToList(). However, it is OK to stream the inner collection; in that case the outer collection still takes advantage of the optimization.
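To make the N+1 problem concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than EF Core. The schema, data and helper names (make_db, CountingConnection, n_plus_one, two_queries) are hypothetical. The naive version issues one query per outer row; the rewritten version repeats the outer filter inside the inner query via a join, so two queries suffice no matter how many customers match. For brevity it correlates on the client with a dictionary instead of the ordered one-pass merge this commit uses.

```python
import sqlite3

# Hypothetical schema and data, loosely modeled on the Customers/Orders example.
def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id TEXT PRIMARY KEY);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id TEXT);
        INSERT INTO customers VALUES ('ALFKI'), ('ANATR'), ('BERGS'), ('CACTU');
        INSERT INTO orders VALUES (101, 'ANATR'), (102, 'BERGS'), (103, 'ANATR'), (104, 'ALFKI');
    """)
    return conn

class CountingConnection:
    """Wraps a sqlite3 connection and counts the queries sent to the database."""
    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()

OUTER_SQL = "SELECT id FROM customers WHERE id <> 'ALFKI' ORDER BY id"

def n_plus_one(db):
    result = {}
    for (cid,) in db.query(OUTER_SQL):  # 1 query for the outer rows...
        rows = db.query("SELECT id FROM orders WHERE customer_id = ? ORDER BY id", (cid,))
        result[cid] = [oid for (oid,) in rows]  # ...plus 1 query per outer row
    return result

def two_queries(db):
    # Every outer row starts with an empty collection (customers without orders).
    result = {cid: [] for (cid,) in db.query(OUTER_SQL)}
    # The inner query re-applies the outer filter through a join, so a single
    # round trip returns the orders of all outer rows at once.
    inner = db.query("""
        SELECT c.id, o.id FROM orders AS o
        JOIN (SELECT id FROM customers WHERE id <> 'ALFKI') AS c ON o.customer_id = c.id
        ORDER BY c.id, o.id
    """)
    for cid, oid in inner:  # correlate the two result sets on the client
        result[cid].append(oid)
    return result
```

With three matching customers the naive version sends four queries, while the rewritten version always sends two; both produce the same collections.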
Optimization process:

original query:

    from c in ctx.Customers
    where c.CustomerID != "ALFKI"
    orderby c.City descending
    select (from o in c.Orders
            where o.OrderID > 100
            orderby o.EmployeeID
            select new { o.OrderID, o.CustomerID }).ToList()

nav rewrite converts it to:

    from c in customers
    where c.CustomerID != "ALFKI"
    order by c.City descending
    select (from o in orders
            where o.OrderID > 100
            order by o.EmployeeID
            where c.CustomerID ?= o.CustomerID
            select new { o.OrderID, o.CustomerID }).ToList()

which gets rewritten to (simplified):

    from c in customers
    where c.CustomerID != "ALFKI"
    order by c.City desc, c.CustomerID asc
    select CorrelateSubquery(
        outerKey: new { c.CustomerID },
        correlationPredicate: (outer, inner) => outer.GetValue(0) == null || inner.GetValue(0) == null
            ? false
            : outer.GetValue(0) == inner.GetValue(0),
        correlatedCollectionFactory: () =>
            from o in orders
            where o.OrderID > 100
            join _c in (from c in customers
                        where c.CustomerID != "ALFKI"
                        select new { c.City, c.CustomerID })
                on o.CustomerID equals _c.GetValue(1)
            order by _c.GetValue(0) descending, _c.GetValue(1), o.EmployeeID
            select new
            {
                InnerResult = new { o.OrderID, o.CustomerID },
                InnerKey = new { o.CustomerID },
                OriginKey = new { _c.GetValue(1) }
            }).ToList()

CorrelateSubquery is the method that combines the results of the outer and inner queries. Because both queries are ordered the same way (the outer ordering is extended with c.CustomerID, and the inner query is ordered by the same columns before o.EmployeeID), a single pass through the inner query is enough. We use the correlation predicate (between the outerKey argument passed to CorrelateSubquery and the InnerKey that is part of each inner result) to determine whether a given inner result belongs to the current outer element. We also remember the latest origin key (i.e. the PK of the outer, which is not always the same as the outer key). If the origin key changes, all inner results for that outer element have already been encountered.
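The CorrelateSubquery step can be sketched in Python as a one-pass merge over the ordered inner results. This is not EF Core's implementation: CorrelatedCollection, InnerRow and correlation_predicate are hypothetical names, keys are modeled as tuples standing in for GetValue(i), and the inner query is a factory that runs once, on first use. As an assumption about the exact bookkeeping, the sketch checks the correlation predicate against the pending inner row, then takes that row's origin key and consumes rows until the origin key changes.

```python
from typing import Any, Callable, Iterable, Iterator, List, NamedTuple, Optional, Tuple

class InnerRow(NamedTuple):
    inner_result: Any   # InnerResult, e.g. new { o.OrderID, o.CustomerID }
    inner_key: Tuple    # InnerKey, e.g. new { o.CustomerID }
    origin_key: Tuple   # OriginKey: PK of the outer row this inner row was joined to

def correlation_predicate(outer_key: Tuple, inner_key: Tuple) -> bool:
    # Mirrors the predicate above: a null key on either side never matches.
    if outer_key[0] is None or inner_key[0] is None:
        return False
    return outer_key[0] == inner_key[0]

class CorrelatedCollection:
    """Hypothetical one-pass correlator; one instance per correlated collection."""

    def __init__(self, factory: Callable[[], Iterable[InnerRow]],
                 predicate: Callable[[Tuple, Tuple], bool]):
        self._factory = factory
        self._predicate = predicate
        self._rows: Optional[Iterator[InnerRow]] = None
        self._pending: Optional[InnerRow] = None  # one-row lookahead

    def correlate(self, outer_key: Tuple) -> List[Any]:
        if self._rows is None:
            # Run the inner query only once, when the first outer row asks for it.
            self._rows = iter(self._factory())
            self._pending = next(self._rows, None)
        result: List[Any] = []
        first = self._pending
        if first is None or not self._predicate(outer_key, first.inner_key):
            return result  # pending row belongs to a later outer row (or none left)
        origin = first.origin_key
        # Consume rows until the origin key changes: at that point every inner
        # row of this outer row has been seen, and the new row stays pending.
        while self._pending is not None and self._pending.origin_key == origin:
            result.append(self._pending.inner_result)
            self._pending = next(self._rows, None)
        return result
```

Calling correlate once per outer row, in the outer query's order, yields each row's collection. An outer row with no matching inner rows, or with a null key, gets an empty list, and the lookahead row simply waits for the next outer row.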