
Execute multiple LINQ queries in a single round-trip (aka Expose batching read API to users) #10879

Open
roji opened this issue Feb 5, 2018 · 34 comments


@roji
Member

roji commented Feb 5, 2018

While users can execute multiple updates in a single ADO batch, no such feature currently exists for reading - each query is executed separately. For example, if a single web request requires multiple queries to fulfill (fairly typical case IMHO), then that many database roundtrips have to occur.

Reducing roundtrips can bring a very significant performance boost, so some sort of API which allows a user to define multiple queries and have them executed together could be useful.

Note that there is a trade-off here: batching typically implies buffering the earlier query results, so the memory overhead is (potentially much) larger than executing different queries and streaming their results; if the results are small, batching makes sense, but if they're huge, multiple queries may make more sense.

The buffering overhead could be mitigated by MARS, where supported. In other words, the user-defined "read batch" could actually execute multiple MARS queries (in one roundtrip, if supported), and allow the user to stream the results. Ideally, the user-facing "batching" API shouldn't expose this implementation detail.
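At the ADO.NET level, a single batched read could look roughly like the sketch below: two SELECTs sent as one command (one roundtrip), with `NextResultAsync` advancing between result sets. This is illustrative only, not EF API; the connection, SQL, and table names are made up.

```csharp
using System.Data.Common;
using System.Threading.Tasks;

static class BatchReadSketch
{
    // Sketch only: `conn` is an already-open DbConnection from some provider;
    // the tables/columns are hypothetical.
    public static async Task ReadBatchAsync(DbConnection conn)
    {
        await using var cmd = conn.CreateCommand();
        cmd.CommandText =
            "SELECT Id, Name FROM Blogs WHERE Name LIKE 'foo%'; " +
            "SELECT Id, Username FROM Users WHERE Username LIKE '%bar';";

        // One roundtrip for both queries:
        await using var reader = await cmd.ExecuteReaderAsync();

        while (await reader.ReadAsync())
        {
            // stream blog rows as they arrive
        }

        await reader.NextResultAsync(); // advance to the second result set

        while (await reader.ReadAsync())
        {
            // stream user rows
        }
    }
}
```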

Note: this issue isn't about using batching in internally-generated queries (that's what #10878 is about).

@smitpatel
Member

Related #8127

@popcatalin81

This feature would be very useful to have for the apps I'm currently working on (online-offline sync by reading delta changes from a set of tables)

@NinoFloris

Was just looking for this!

Our scenario absolutely demands collection-based retrieval instead of joins, as our joined tables have low cardinality and a lot of columns/data. However, EF currently doesn't provide any option other than trading transfer time for roundtrips.

https://github.com/zzzprojects/EntityFramework-Plus/tree/master/src/shared/Z.EF.Plus.QueryFuture.Shared brings this functionality, but at a very high cost: an 80ms request becomes 100ms, so there is no win for small result sets at all.
Most likely this is because of all the ad-hoc reflection and reparsing of queries; it works, but it's far from optimal.

I would love a read batching api.

@AndriySvyryd
Member

Related: #18990

@roji
Member Author

roji commented Aug 10, 2020

This seems to be a feature of EF Plus: https://entityframework-plus.net/query-future

@roji
Member Author

roji commented Mar 3, 2021

Possibly consider batching updates as well.

@AndriySvyryd
Member

Possibly consider batching updates as well.

Both bulk and regular ones.

Also consider FromSql and Find

@ziaulhasanhamim

Will this feature ever arrive?

@roji
Member Author

roji commented Sep 9, 2022

@ziaulhasanhamim this feature is in the backlog, and currently has only 21 votes. It's also quite complex to design and implement, so it may take some time before we get around to it.

@roji
Member Author

roji commented Jul 27, 2023

As a possible, preliminary API shape (thanks @divega for bringing this up recently!):

var batchReader = await context.BatchAsync(
    ctx => ctx.Blogs.Where(b => b.Name.StartsWith("foo")),
    ctx => ctx.Users.Where(u => u.Username.EndsWith("bar")),
    ctx => ctx.Blogs.Where(b => b.Id > 3).ExecuteDelete());

var blogs = await batchReader.NextAsListAsync<Blog>();

foreach (var user in batchReader.NextAsEnumerable<User>())
{
    // ...
}

var numDeleted = await batchReader.NextAsync<int>();

public static Task<BatchResultsReader> BatchAsync<TContext>(
    this TContext context, params Expression<Func<TContext, object>>[] queries)
{
    throw new NotImplementedException();
}

Some notes/caveats:

  • I also considered an API which accepts a single Expression lambda that returns an anonymous type, and simply returns that anonymous type populated with the results of the different queries. While it's a simpler API shape, it wouldn't support any kind of streaming - BatchAsync would have to return after all results for all batch queries have completed. The above API does allow for full streaming.
  • Fortunately ExecuteUpdate/Delete (and any future ExecuteInsert) return the number of rows affected (so do the SQL APIs, e.g. ctx.Database.ExecuteSql). Otherwise, a void-returning method would not be usable with this API.

@bachratyg

Would this support using IQueryables directly? I have a LOT of dynamically generated queries that would be difficult to create in an expression context.

var q1 = context.Table1.Where(...);
var q2 = context.Table2.Where(...);
var batchReader = await context.BatchAsync(ctx => q1, ctx => q2);

BTW it's nice that you can mix queries and updates or handle multiple updates with this. The current API shape is not really helpful in getting the command of a batch update (to combine them manually) like ToQueryString on queries.

@bachratyg

A query with an unspeakable type won't work with this shape, at least not in a typed manner e.g.

var batchReader = await context.BatchAsync(
    ctx => ctx.Blogs.Select(b => new { PostCount = b.Posts.Count() })
);
var postCounts = await batchReader.NextAsListAsync<???>();

@roji
Member Author

roji commented Jul 27, 2023

Would this support using IQueryables directly?

I think that could be doable, yes... Though ToQueryString doesn't seem that important a reason - after all you can always copy-paste the expression out temporarily and/or call ToQueryString in the debugger or something.

A query with an unspeakable type won't work with this shape

Yeah, that's true and is an unfortunate limitation. Projecting out an anonymous type would solve this, but wouldn't allow streaming, i.e.:

var batchReader = await context.BatchAsync(ctx => new
{
    Blogs = ctx.Blogs.Where(b => b.Name.StartsWith("foo")),
    Posts = ctx.Users.Where(u => u.Username.EndsWith("bar")).Select(p => new { ... }),
    Rows = ctx.Blogs.Where(b => b.Id > 3).ExecuteDelete()
});

Not sure if there's a single API that could support both...

@alrz

alrz commented Jul 27, 2023

Couldn't return a generic type around the inferred anon type?

var batchReader = await context.BatchAsync(ctx => new { Blogs = .. , Posts = .., Rows = .. });
var blogs = await batchReader.NextAsListEtc(x => x.Blogs); // projection over inferred type argument

var batchReader = await context.BatchAsync(ctx => ctx.Blogs.Select(b => new { PostCount = b.Posts.Count() }));
var projection = await batchReader.NextAsListEtc(); // the inferred type argument itself
// projection : IEnumerable<{int PostCount}>

@bachratyg

I think that could be doable, yes... Though ToQueryString doesn't seem that important a reason - after all you can always copy-paste the expression out temporarily and/or call ToQueryString in the debugger or something.

My bad. It was actually CreateDbCommand. Right now that can be used to manually stitch together multiple queries but not updates. This feature would definitely supersede the manual approach.

@roji
Member Author

roji commented Jul 27, 2023

@alrz what's the type returned by your BatchAsync? It's not possible to magically have the type of the expression assigned to Blogs somehow be reflected in the type returned by BatchAsync.

If we want to go down this approach, though, then for the original proposal above:

var batchReader = await context.BatchAsync(
    ctx => ctx.Blogs.Where(b => b.Name.StartsWith("foo")),
    ctx => ctx.Users.Where(u => u.Username.EndsWith("bar")),
    ctx => ctx.Blogs.Where(b => b.Id > 3).ExecuteDelete());

BatchAsync could be generic over 3 types, which are inferred from the return types of the lambdas. It would then return a BatchResultReader that's also generic over those three types; we would no longer expose a simple "Next", but rather something like GetFirstResult, GetSecondResult etc., each returning the correct type based on the generic definition.

Of course, we'd need to create overloads of BatchAsync and BatchResultsReader for each lambda count (for 2 batched lambdas, for 3 batched lambdas, ...), up to some arbitrary limit. It's not great, but it does provide both streaming and the ability to use anonymous types.
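Concretely, the 3-arity case might look like the following sketch (all names and shapes hypothetical, extending the BatchAsync idea above):

```csharp
// Hypothetical shape, one overload per batched-lambda count:
public static Task<BatchResultsReader<T1, T2, T3>> BatchAsync<TContext, T1, T2, T3>(
    this TContext context,
    Expression<Func<TContext, IQueryable<T1>>> query1,
    Expression<Func<TContext, IQueryable<T2>>> query2,
    Expression<Func<TContext, T3>> query3)
    => throw new NotImplementedException();

// The reader is generic over the same types, so each positional accessor
// returns the right type without the caller repeating it:
public class BatchResultsReader<T1, T2, T3>
{
    public Task<List<T1>> GetFirstResultAsync() => throw new NotImplementedException();
    public IAsyncEnumerable<T2> GetSecondResultAsEnumerable() => throw new NotImplementedException();
    public Task<T3> GetThirdResultAsync() => throw new NotImplementedException();
}
```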

@alrz

alrz commented Jul 27, 2023

what's the type returned by your BatchAsync?

I think this could demonstrate the gist of the idea: sharplab.io - (add some Expression<..> where Func doesn't suffice.)

#nullable disable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var context = new MyContext();
var batchReader = await context.BatchAsync(ctx => new {
    Blogs = ctx.Blogs.Where(b => b.Name.StartsWith("foo")),
    Users = ctx.Users.Where(u => u.Username.EndsWith("bar")),
    //Deleted = ctx.Blogs.Where(b => b.Id > 3).ExecuteDelete()
});

var blogs = await batchReader.GetResultAsync(x => x.Blogs);

static class ContextExt {
    public static Task<Reader<T>> BatchAsync<TContext, T>(this TContext context, Func<TContext, T> func) where TContext : DbContext => throw null;
}

class DbContext {}
class MyContext : DbContext {
    public List<Blog> Blogs { get;set; }
    public List<User> Users { get;set; }
}

class Reader<T> {
    public Task<TResult> GetResultAsync<TResult>(Func<T, TResult> func) => throw null;
    // more specific overloads as needed ..
    // public Task<TResult> GetResultAsync<TResult>(Func<T, IEnumerable<TResult>> func) => throw null;
}

record Blog(string Name, int Id);
record User(string Username);

GetFirstResult, GetSecondResult

I intend to avoid exactly that, so you'd need only one generic overload for all of these.

@bachratyg

we would no longer expose a simple "Next", but rather something like GetFirstResult, GetSecondResult etc., each returning the correct type based on the generic definition

Not the best experience but if you split NextAs... to GetResult (ToListAsync etc) and MoveNext then you can slice off the results one by one e.g.

public class BatchResult<T1, T2, T3>
{
    public Task<List<T1>> ToListAsync();
    public Task<BatchResult<T2, T3>> GetNextResultAsync();
}

var blogs = await batch.ToListAsync();
var batch2 = await batch.GetNextResultAsync();
var users = await batch2.ToListAsync();
...

How about using some wrapper?

var blogsCmd = BatchItem.Create(ctx => ctx.Blogs);
var postsCmd = BatchItem.Create(ctx => ctx.Posts);
await context.BatchAsync(blogsCmd, postsCmd); // returns void
var blogs = await blogsCmd.GetResult(); // As long as you execute them in the original order it's streamed even without MARS
var posts = await postsCmd.GetResult();

@roji
Member Author

roji commented Jul 27, 2023

@alrz oh I see now, yeah, that doesn't look too bad. It's a bit weird to provide temporary naming to each query result (Blog, Posts...) but it solves both streaming and unspeakable types nicely. What do you think @bachratyg?

@bachratyg

bachratyg commented Jul 27, 2023

I'm a bit reluctant since these commands could require a specific order to execute and that's not communicated explicitly through an anonymous type. In practice the compiler emits them in order but I don't think that's a guarantee and we may or may not get an optimization in the future where {A=1, B=2} and {B=1,A=2} are the same type.
If the exact order can always be determined from the member init expression instead of the concrete type then this is of no concern.

@roji
Member Author

roji commented Jul 27, 2023

If the exact order can always be determined from the member init expression instead of the concrete type then this is of no concern.

Yeah, we'd definitely not look at the type for this. For an anonymous type instantiation, we get a NewExpression where the Members is an ordered collection of MemberInfos, with the order corresponding to the Arguments. So the ordering written by the user should be fully preserved in the expression node that comes out.

In addition... The expressions representing the values to be assigned may have side effects; at that point, if the ordering weren't the same as what the user specified, that may change the actual values as there may be interactions between them.

I'm also not a huge fan of using an anonymous type here, but I think it should be safe and does solve the problem nicely.
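The ordering guarantee discussed above can be checked directly. In this minimal standalone sketch, the members are deliberately written out of alphabetical order; `NewExpression.Members` still reports them in the order they were written:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// For an anonymous type instantiation, the expression tree's NewExpression
// carries a Members collection ordered to match Arguments, i.e. the order
// the user wrote the members in:
Expression<Func<object>> e = () => new { B = 2, A = 1 };
var newExpr = (NewExpression)e.Body;

Console.WriteLine(string.Join(", ", newExpr.Members!.Select(m => m.Name)));
// B, A
```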

@bachratyg

It's a bit weird to provide temporary naming to each query result

If value tuple is allowed then you only need "names" when getting the result

var batchReader = await context.BatchAsync(ctx => (ctx.Blogs, ctx.Posts));
var blogs = await batchReader.GetResultAsync(x => x.Item1);

@alrz

alrz commented Jul 27, 2023

Regarding tradeoffs:

// order guaranteed but NO type-safety and NO connection between the type and the actual query
var blogs = await batchReader.NextAsListAsync<Blog>();
foreach (var user in batchReader.NextAsEnumerable<User>())
var numDeleted = await batchReader.NextAsync<int>();
// type-safe but order-sensitive, and since queries are named it's clear what the result is
var blogs = await batchReader.GetResultAsync(x => x.Blogs);
foreach (var user in batchReader.GetEnumerable(x => x.Users))
var numDeleted = await batchReader.GetResultAsync(x => x.Deleted);

Perhaps an analyzer could warn where the result is fetched out of order.

@roji
Member Author

roji commented Jul 27, 2023

BTW everyone, just to set expectations, this definitely isn't going to happen for EF Core 8.0, and I'm not sure it'll be high-priority enough for 9 - I mainly wanted to jot down some ideas on the API surface from a recent discussion.

@m-gasser

I think the following API surface would be neat:

var blogsTask = ctx.Blogs.Where(b => b.Name.StartsWith("foo")).ToListLazyAsync();
var usersTask = ctx.Users.Where(u => u.Username.EndsWith("bar")).AsAsyncEnumerableLazy();
var deleteTask = ctx.Blogs.Where(b => b.Id > 3).ExecuteDeleteLazyAsync();

var blogs = await blogsTask; // all 3 commands get sent to the server here

await foreach (var user in usersTask) // no IO here
{
    // ...
}

var deleteResult = await deleteTask; // no IO here

The synchronous APIs would work similarly:

var blogsLazy = ctx.Blogs.Where(b => b.Name.StartsWith("foo")).ToListLazy();
var usersLazy = ctx.Users.Where(u => u.Username.EndsWith("bar")).AsEnumerableLazy();
var deleteLazy = ctx.Blogs.Where(b => b.Id > 3).ExecuteDeleteLazy();

var blogs = blogsLazy.Value; // all 3 commands get sent to the server here

foreach (var user in usersLazy.Value) // no IO here
{
    // ...
}

var deleteResult = deleteLazy.Value; // no IO here

@roji
Member Author

roji commented Jul 28, 2023

@m-gasser for one thing, it's important to be able to stream the results back, even though all commands get sent to the server together. In other words, the foreach enumerating over users should definitely do I/O when necessary (same for the deleteTask). Otherwise we're needlessly loading all the queries' result sets into memory.

Otherwise, I can't say I'm a fan of having to duplicate all the possible terminating operators (ToListLazyAsync, ExecuteDeleteLazyAsync)... Your proposed API also leaves it unclear what happens if a non-lazy query/update is executed in between the lazy calls - just having a single method (e.g. BatchAsync) that accepts all the lambdas together solves both of these issues.

@ArtemAvramenko

ArtemAvramenko commented Jan 26, 2024

I used to use Entity Framework Plus, which has a very convenient API, but for reasons of high CPU load I gave up on it (zzzprojects/EntityFramework-Plus#779).

It would be great if one of the future EF versions would include a batch API that would not require explicitly specifying entity types when reading, as they could be inferred automatically.

Still, a convenient, though compromised, solution might be something like this:

var blogs = ctx.Queue(ctx => ctx.Blogs.Where(b => b.Name.StartsWith("foo")).ToListAsync());
var users = ctx.Queue(ctx => ctx.Users.Where(u => u.Username.EndsWith("bar")).ToListAsync());
var numDeleted = ctx.Queue(ctx => ctx.Blogs.Where(b => b.Id > 3).ExecuteDeleteAsync());

await ctx.ProcessQueueAsync(); // Materialize everything in the queue

foreach (var blog in blogs.Value) // List<Blog>
{
    // ...
}

foreach (var user in users.Value) // List<User>
{
    // ...
}

Log($"Deleted {numDeleted.Value} blogs"); // int

or a variation of this solution with a fluent syntax (may be more difficult to implement and more ambiguous):

var blogs = ctx.Blogs.Where(b => b.Name.StartsWith("foo")).Queue(x => x.ToListAsync());
var users = ctx.Users.Where(u => u.Username.EndsWith("bar")).Queue(x => x.ToListAsync());
var numDeleted = ctx.Blogs.Where(b => b.Id > 3).Queue(x => x.ExecuteDeleteAsync());

@roji
Member Author

roji commented Jan 26, 2024

@ArtemAvramenko thanks, that proposal does look pretty good.

One thing which I'm not a fan of is that this API shape seems to be incompatible with streaming; in other words, it should be possible to have ProcessQueueAsync not buffer and materialize everything, but rather allow users to first stream the blogs, then the users, etc. But I think the API shape can be tweaked to allow this too.

@ArtemAvramenko

@roji Do I understand correctly that if there is no materialization, the navigation properties will not be populated after the batch is processed? So we will have to use lookup dictionaries or something similar to link multiple entities together?

@roji
Member Author

roji commented Feb 2, 2024

@ArtemAvramenko well first, there may not be any related entities, right? Like in the above sample you posted, Blogs and Users are just loaded without anything related to them.

But even when there are related entities, for a regular single query, EF currently injects orderings into the query, to ensure that all the rows belonging to a single Blog are grouped together as a single group, so that EF can stream the results (i.e. it returns the Blog with all its Posts once it sees a row for a new Blog). While this allows streaming and is useful for huge resultsets, we plan to look at removing those orderings and use lookup dictionaries instead, since the orderings make the queries a lot heavier on the database side (see #29171).

In any case, I'm just saying that whatever API shape we draw up here shouldn't be fundamentally incompatible with streaming rows from the database as they arrive without buffering the entire resultset in memory.

This could be done by having Queue() simply return an IEnumerable<Blog>, which the user could enumerate directly, streaming results. If in the middle of enumerating the blogs, the code starts enumerating users, this would consume and discard all remaining blogs and start enumerating the users which come afterwards. However, in this scenario Queue() over ExecuteDeleteAsync obviously cannot return a simple int - we could have a different method returning a Task<int> instead, or something similar.
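One possible shape for a streaming-friendly variant of the Queue proposal, as a sketch only (all names hypothetical):

```csharp
// Hypothetical sketch: Queue returns a lazily-streamed sequence; ExecuteDelete
// gets its own method since it yields a scalar rather than a result set.
// Advancing to a later result implicitly consumes/discards earlier ones.
public interface IBatchQueue
{
    IAsyncEnumerable<T> Queue<T>(IQueryable<T> query);
    Task<int> QueueExecuteDeleteAsync<T>(IQueryable<T> query);
    Task SendAsync(); // one roundtrip; results stream as they're enumerated
}
```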

@ArtemAvramenko

Hopefully it will be at least IAsyncEnumerable, otherwise the threads will be locked, which will negatively affect performance. Intuitively it seems to me ToListAsync is faster than iterating row by row, but I haven't examined how ToListAsync is implemented under the hood, maybe it's the same iterating there.

@roji
Member Author

roji commented Feb 3, 2024

Hopefully it will be at least IAsyncEnumerable, otherwise the threads will be locked, which will negatively affect performance.

Absolutely, EF supports async everywhere.

Intuitively it seems to me ToListAsync is faster than iterating row by row, but I haven't examined how ToListAsync is implemented under the hood, maybe it's the same iterating there.

ToList and ToListAsync do nothing more than enumerate over the rows and insert them into a list - they're definitely slower than simply enumerating directly over the results; especially considering that the list being populated may need to be resized many times (and therefore copied) during this process.
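The resizing cost can be observed directly: growing a `List<T>` without a known capacity reallocates (and copies) the backing array each time it fills up. A minimal standalone check (the exact growth factors are a runtime implementation detail):

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();
var capacities = new List<int>();

// Record the backing-array capacity each time it changes:
for (var i = 0; i < 1000; i++)
{
    if (list.Capacity != (capacities.Count > 0 ? capacities[^1] : -1))
        capacities.Add(list.Capacity);
    list.Add(i);
}

Console.WriteLine(string.Join(" -> ", capacities));
// e.g. 0 -> 4 -> 8 -> 16 -> 32 -> 64 -> 128 -> 256 -> 512 -> 1024
```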

