Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue when using multiple includes and QueryFilters #25731

Closed
sergiorpvn opened this issue Aug 26, 2021 · 5 comments
Closed

Performance issue when using multiple includes and QueryFilters #25731

sergiorpvn opened this issue Aug 26, 2021 · 5 comments
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported

Comments

@sergiorpvn
Copy link

I'm having a performance issue when using multiple includes with SplitQuery and QueryFilters.

When I do:

dbContext.Principal
   .Include(p => p.Entity1)
   .Include(p => p.Entity2)
  ....
   .Include(p => p.EntityN)
   .Include(p => p.RelatedList0)
   .Include(p => p.RelatedList1)
   ...

All tables have SoftDeletion, so I have a .HasQueryFilter("IsDeleted = 0") on them.

The generated queries for the RelatedLists are adding LEFT JOIN with ALL 'n' tables to filter the IsDeleted and then use its IDs in the ORDER BY clause like:

SELECT [tM].Columns, [p].[ID], [t0].[RelatedID0], [t1].[RelatedID1]... [tn].[RelatedIDN]
FROM [PrincipalTable] AS [p]
LEFT JOIN (
    SELECT [x].[RelatedID0]
    FROM [RelatedTable0] AS [x] 
    WHERE [x].[IsDeleted] = CAST(0 AS bit)
) AS [t0] ON [p].[RelatedID0] = [t0].[RelatedID0]
LEFT JOIN (
    SELECT [x].[RelatedID1]
    FROM [RelatedTable1] AS [x] 
    WHERE [x].[IsDeleted] = CAST(0 AS bit)
) AS [t0] ON [p].[RelatedID1] = [t1].[RelatedID0]
LEFT JOIN ...
INNER JOIN 
( table it's really loading the data) as [tM] ON [p].[PrincipalID] = [tM].[PrincipalID]
ORDER BY [p].[PrincipalID], [t0].[RelatedID0], [t1].[RelatedID1]... [tn].[RelatedIDN]

This is making the query too complex for the database, and if you use:

.Include(p => p.RelatedList1).ThenInclude(p => p.AnotherList1)
or
.Include(p => p.Entity1).ThenInclude(p => p.RelatedList1)

It adds more complexity.

The number of 'reads' in SQL Server can be really high

I believe that EF Core could improve the performance by avoiding some LEFT JOINs.
I'm not sure how it's implemented, but I think it could at least remove the Entities IDs from ORDER BY and SELECT clauses if the Entity doesn't have a '.ThenInclude', and we won't need the LEFT JOIN there.

And when we have the ThenInclude, check if we really need to do the LEFT JOIN or if it's ok to use the ID that's on the 'Principal' table, the difference is that it could have a deleted ID instead of NULL there, but if it was already filtered in the main query, I think we don't care if it will appear as an ID that will be ignored or if it's NULL and will be ignored anyway. But again, I didn't look at the EF Core code to check if this information is useful.

EF Core version: 6.0.0-preview.7.21378.4
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: NET6.0-preview7
Operating system: Windows 10
IDE: VS 2022 Preview 17.0.0 Preview 3.1

@roji
Copy link
Member

roji commented Aug 26, 2021

You should probably take a look at this documentation page; if you're using lots of collection joins, where the dependent table has high cardinality (many dependent rows for each principal rows on average), then you're probably running into the so-called "cartesian explosion" problem. This can be mitigated by using the split query feature of EF Core.

I'm not sure how it's implemented, but I think it could at least remove the Entities IDs from ORDER BY and SELECT clauses if the Entity doesn't have a '.ThenInclude', and we won't need the LEFT JOIN there.

The orderings are necessary in order for EF Core to properly load the entity graph. We've recently removed the last ORDER BY (#19828), but the others cannot be removed at this point (#19571). Here again, switching to split query may improve things.

And when we have the ThenInclude, check if we really need to do the LEFT JOIN or if it's ok to use the ID that's on the 'Principal' table, the difference is that it could have a deleted ID instead of NULL there, but if it was already filtered in the main query, I think we don't care if it will appear as an ID that will be ignored or if it's NULL and will be ignored anyway. But again, I didn't look at the EF Core code to check if this information is useful.

I'm not sure I follow, but the LEFT JOIN is necessary in order to load back the rows that do pass your query filter.

@sergiorpvn
Copy link
Author

Hi @roji thanks for the quick reply, I'm already using the SplitQuery() as default in my DBContext, and I'm not calling the AsSingleQuery() method anywhere, so it should be using the split query behavior.

I forgot to mention that I'm migrating my project from .netcore 2.1 to .net6-preview and testing now, and I thought that with this new SplitQuery behavior I could remove my custom loads and have some performance improvements (less roundtrips/connections open/close). On .netcore 2.1 I'm first loading the main entity with the related lists (it splits the query) and then loading the entities that aren't collections and so on, to not fall in this cartesian explosion issue.

This #19828 may (or could) fix the first point that I suggested (that you quoted), if I'm not loading anything in it, we don't need to order by its ID.

The other point, I understand that you use these order bys to load the data efficiently in the collection, but in that scenario I mentioned, we could maintain the 1st select as it is (maybe just switching from using the RelatedEntityID from the left joins and using from MainTable.RelatedEntityID in the order by):

1st select:
SELECT * FROM Main M 
LEFT JOIN (
   select * OtherTable where ...
) O ON O.OtherID = M.OtherID 
ORDER BY Main.MainID, Main.OtherID

other selects:
SELECT * FROM Main M
INNER JOIN AnotherTable A ON... ORDER BY M.MainID, M.OtherID

@ajcvickers
Copy link
Member

/cc @roji

@roji
Copy link
Member

roji commented Sep 6, 2021

@sergiorpvn sorry for not replying earlier; I've taken a look at your comments again, and I'm not sure I understand...

The query above with the multiple LEFT JOINs should get generated in single query mode: I'm assuming that RelatedTable0 and RelatedTable1 correspond to the two collection navigations on Principal, in which case they wouldn't be loaded via a join when using split query. I've done a quick repro based on your code (see below for the full code), and I see the following two queries when using split query:

-- EF Core loads the principals first
SELECT [p].[Id]
FROM [Principal] AS [p]
ORDER BY [p].[Id]

-- Since EF found at least one principal, it then loads the related entities:
SELECT [t].[RelatedID0], [t].[IsDeleted], [t].[PrincipalId], [p].[Id]
FROM [Principal] AS [p]
INNER JOIN (
    SELECT [r].[RelatedID0], [r].[IsDeleted], [r].[PrincipalId]
    FROM [Related0] AS [r]
    WHERE [r].[IsDeleted] = CAST(0 AS bit)
) AS [t] ON [p].[Id] = [t].[PrincipalId]
ORDER BY [p].[Id]

Note that there are no LEFT JOINs.

Can you please post an actual, runnable C# code sample, along with the SQL EF generates from it, and the SQL you'd like to see instead? This way we're sure we're talking about the same thing.

Repro code
await using (var ctx = new BlogContext())
{
    await ctx.Database.EnsureDeletedAsync();
    await ctx.Database.EnsureCreatedAsync();

    ctx.Principal.Add(new());
    await ctx.SaveChangesAsync();
}

await using (var ctx = new BlogContext())
{
    _ = await ctx.Principal
        .AsSplitQuery()
        .Include(p => p.RelatedList0)
        .Include(p => p.RelatedList1)
        .ToListAsync();
}

public class BlogContext : DbContext
{
    public DbSet<Principal> Principal { get; set; }

    static ILoggerFactory ContextLoggerFactory
        => LoggerFactory.Create(b => b.AddConsole().AddFilter("", LogLevel.Information));

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0")
            .EnableSensitiveDataLogging()
            .UseLoggerFactory(ContextLoggerFactory);

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Related0>().HasQueryFilter(r => !r.IsDeleted);
        modelBuilder.Entity<Related1>().HasQueryFilter(r => !r.IsDeleted);
    }
}

public class Principal
{
    public int Id { get; set; }

    public List<Related0> RelatedList0 { get; set; }
    public List<Related1> RelatedList1 { get; set; }
}

public class Related0
{
    [Key]
    public int RelatedID0 { get; set; }
    public bool IsDeleted { get; set; }

    public Principal Principal { get; set; }
}

public class Related1
{
    [Key]
    public int RelatedID1 { get; set; }
    public bool IsDeleted { get; set; }

    public Principal Principal { get; set; }
}

@AndriySvyryd AndriySvyryd added the closed-no-further-action The issue is closed and no further action is planned. label Sep 16, 2021
@sergiorpvn
Copy link
Author

Hi @roji ,

Sorry for the (too) late reply, I was working on other stuff, vacations, etc.. but now I can go back to this.
Thanks for sending this code, it works as you told in your reply.
However, after reviewing my code, I think I missed some data for you. My problem is a bit different, but I was able to modify your code a little bit to reproduce my issue.

Imagine that you have a class DataRequest that is a single data request that you can send via fax or email (or phone, whatsapp, pigeon, etc), and in each fax/email you can add multiple DataRequests. If I have some requests IDs and want to get all other requests that were sent together, I'd have the following code:

using Microsoft.Extensions.Logging;
using System.ComponentModel.DataAnnotations;

await using (var ctx = new BlogContext())
{
    await ctx.Database.EnsureDeletedAsync();
    await ctx.Database.EnsureCreatedAsync();
    var related0 = new EmailSent() { DatasRequested = new List<DataRequest>() { new() { PersonName = "X" }, new() { PersonName = "Z" } } };
    var related1 = new FaxSent() { DatasRequested = new List<DataRequest>() { new() { PersonName = "Y" } } };

    ctx.EmailsSent.Add(related0);
    ctx.FaxesSent.Add(related1);
    await ctx.SaveChangesAsync();
}

await using (var ctx = new BlogContext())
{
    _ = await ctx.DataRequests
        .AsSplitQuery()
        .Include(p => p.EmailSent).ThenInclude(p => p.DatasRequested)
        .Include(p => p.FaxSent).ThenInclude(p => p.DatasRequested)
        .Where(p => p.Id == 1)
        .ToListAsync();
}

public class BlogContext : DbContext
{
    public DbSet<DataRequest> DataRequests { get; set; }
    public DbSet<EmailSent> EmailsSent { get; set; }
    public DbSet<FaxSent> FaxesSent { get; set; }

    static ILoggerFactory ContextLoggerFactory
        => LoggerFactory.Create(b => b.AddConsole().AddFilter("", LogLevel.Information));

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(@"Server=localhost;Database=test;Integrated Security=True;Trusted_Connection=True;Connect Timeout=60;ConnectRetryCount=0")
            .EnableSensitiveDataLogging()
            .UseLoggerFactory(ContextLoggerFactory);

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<DataRequest>().HasQueryFilter(r => !r.IsDeleted);
        modelBuilder.Entity<EmailSent>().HasQueryFilter(r => !r.IsDeleted);
        modelBuilder.Entity<FaxSent>().HasQueryFilter(r => !r.IsDeleted);
    }
}

public class DataRequest
{
    public int Id { get; set; }
    public bool IsDeleted { get; set; }

    public string PersonName { get; set; }

    public int? EmailSentId { get; set; }
    public int? FaxSentId { get; set; }

    public EmailSent EmailSent { get; set; }
    public FaxSent FaxSent { get; set; }
}

public class EmailSent
{
    [Key]
    public int EmailSentId { get; set; }
    public bool IsDeleted { get; set; }

    public List<DataRequest> DatasRequested { get; set; }
}

public class FaxSent
{
    [Key]
    public int FaxSentId { get; set; }
    public bool IsDeleted { get; set; }

    public List<DataRequest> DatasRequested { get; set; }
}

This is generating the following selects using version 6.0.0-rc.1.21452.10:

SELECT [d].[Id], [d].[EmailSentId], [d].[FaxSentId], [d].[IsDeleted], [d].[PersonName], [t].[EmailSentId], [t].[IsDeleted], [t0].[FaxSentId], [t0].[IsDeleted]
FROM [DataRequests] AS [d]
LEFT JOIN (
    SELECT [e].[EmailSentId], [e].[IsDeleted]
    FROM [EmailsSent] AS [e]
    WHERE [e].[IsDeleted] = CAST(0 AS bit)
) AS [t] ON [d].[EmailSentId] = [t].[EmailSentId]
LEFT JOIN (
    SELECT [f].[FaxSentId], [f].[IsDeleted]
    FROM [FaxesSent] AS [f]
    WHERE [f].[IsDeleted] = CAST(0 AS bit)
) AS [t0] ON [d].[FaxSentId] = [t0].[FaxSentId]
WHERE ([d].[IsDeleted] = CAST(0 AS bit)) AND ([d].[Id] = 1)
ORDER BY [d].[Id], [t].[EmailSentId], [t0].[FaxSentId]
go

SELECT [t1].[Id], [t1].[EmailSentId], [t1].[FaxSentId], [t1].[IsDeleted], [t1].[PersonName], [d].[Id], [t].[EmailSentId], [t0].[FaxSentId]
FROM [DataRequests] AS [d]
LEFT JOIN (
    SELECT [e].[EmailSentId]
    FROM [EmailsSent] AS [e]
    WHERE [e].[IsDeleted] = CAST(0 AS bit)
) AS [t] ON [d].[EmailSentId] = [t].[EmailSentId]
LEFT JOIN (
    SELECT [f].[FaxSentId]
    FROM [FaxesSent] AS [f]
    WHERE [f].[IsDeleted] = CAST(0 AS bit)
) AS [t0] ON [d].[FaxSentId] = [t0].[FaxSentId]
INNER JOIN (
    SELECT [d0].[Id], [d0].[EmailSentId], [d0].[FaxSentId], [d0].[IsDeleted], [d0].[PersonName]
    FROM [DataRequests] AS [d0]
    WHERE [d0].[IsDeleted] = CAST(0 AS bit)
) AS [t1] ON [t].[EmailSentId] = [t1].[EmailSentId]
WHERE ([d].[IsDeleted] = CAST(0 AS bit)) AND ([d].[Id] = 1)
ORDER BY [d].[Id], [t].[EmailSentId], [t0].[FaxSentId]
go

SELECT [t1].[Id], [t1].[EmailSentId], [t1].[FaxSentId], [t1].[IsDeleted], [t1].[PersonName], [d].[Id], [t].[EmailSentId], [t0].[FaxSentId]
FROM [DataRequests] AS [d]
LEFT JOIN (
    SELECT [e].[EmailSentId]
    FROM [EmailsSent] AS [e]
    WHERE [e].[IsDeleted] = CAST(0 AS bit)
) AS [t] ON [d].[EmailSentId] = [t].[EmailSentId]
LEFT JOIN (
    SELECT [f].[FaxSentId]
    FROM [FaxesSent] AS [f]
    WHERE [f].[IsDeleted] = CAST(0 AS bit)
) AS [t0] ON [d].[FaxSentId] = [t0].[FaxSentId]
INNER JOIN (
    SELECT [d0].[Id], [d0].[EmailSentId], [d0].[FaxSentId], [d0].[IsDeleted], [d0].[PersonName]
    FROM [DataRequests] AS [d0]
    WHERE [d0].[IsDeleted] = CAST(0 AS bit)
) AS [t1] ON [t0].[FaxSentId] = [t1].[FaxSentId]
WHERE ([d].[IsDeleted] = CAST(0 AS bit)) AND ([d].[Id] = 1)
ORDER BY [d].[Id], [t].[EmailSentId], [t0].[FaxSentId]
go

I could group the Email/Fax/etc in another class, but the issue would remain if I had another entity linked to the DataRequest in the same way as Email/Fax classes are.

I hope this code helps you.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported
Projects
None yet
Development

No branches or pull requests

4 participants