-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Holistic end-to-end pagination feature #33160
Comments
If done, would this be a part of Microsoft.EntityFrameworkCore or some kind of additional official EF Core pagination package? I'm asking because this seems to be the first "end-to-end" api in EF Core that is this high level (requires configuration params such as whether we need the additional info, and returns a custom model). A few quick thoughts, and raising some questions. If this single abstract api is to support the different pagination techniques, how would we differentiate between wanting an offset or a keyset query (especially in the first call where you don't have a continuation token)? In addition, these two techniques feel a bit different from Cosmos, which is an altogether different db and has an adapter, and you could just use keyset pagination with it anyway (I'm assuming). How would you call this api with the intention of using keyset pagination, but your adapter happens to be Cosmos? (I'm not really experienced with Cosmos though so I might be missing something) Opaque tokens are great, it was something I was investigating in KeysetPagination. In the case of keyset pagination, if the keyset includes a string column then this token could be large in size. I think it's ok and that it's the user's responsibility (in the first place, an unbounded string column in the keyset is orthogonal to using keyset pagination for perf), but worth noting especially that this token could end up being a part of an http header. For offset pagination, the abstract api would also have to support the ability to do random access through specifying a certain page directly (the only advantage to offset), without having a token first. Going back to my prev question about how to choose what technique we want in this api, it seems to me that reconciling these different techniques needs either some kind of an involved options object as a param, or more than one api. In the case of MR.AspNetCore.Pagination I have different apis, but in this case I would prefer some kind of options param if possible. |
Note that we've implemented a 1st-class query pagination API for Cosmos in 9.0 (#24513), and discussed the possibility of doing something similar at a more global level, including relational. |
|
@glen-84 Keyset pagination also has functional disadvantages in certain cases (random page access). It's not a good idea to enforce such a decision on the framework level. |
I'm actually generally leaning towards indeed only supporting keyset pagination via this API... That would simplify things (e.g. we wouldn't need to design an API which allows the users to choose the pagination technique), would indeed guide the user in the right direction, and offset pagination is simple enough to implement that it doesn't seem like this higher-level API would be that useful for it... But that's just my current state of thinking, nothing definitive. I also agree that we should consider injecting the PK into the ordering as needed, although this may be a bit trickier than it looks. For example, if the user orders by some property in the model, we can indeed find out from the model if it's unique, but they may order by arbitrarily complex expressions containing a column (or several) - I'm not sure it's feasible for EF to calculate whether such arbitrary expressions represent a unique value or not. It may be better to simply warn about the possible non-uniqueness and let the user react explicitly (though here again, I'm not sure). |
If this high level api only supports keyset pagination, wouldn't that be a problem in terms of discoverability especially for new users? Dotting into the query and finding a That said, I find it a bit of a waste for a new convenient api with the ability to get total number of rows without additional roundtrips (I'm assuming this is possible using window functions), etc, to be limited to keyset. While offset itself is simple, these additional requirements are not, especially if done in a performant way. This would be giving the more complicated method (which also has more caveats) more convenience in the framework, while the easier more common method remains barebones. Feels a bit unbalanced to me (even in my most complicated app data wise, I use both methods, so they both have a place in my opinion), although I get that supporting multiple pagination methods would complicate the design and impl, and that they're too different (offset doesn't need opaque tokens, nor has prev/next). |
That's a good question, and I don't have a fully firm idea of what we should do here yet. I do know that we want to gently push users towards doing the right thing, i.e. keyset pagination. For example, one possible user flow would be the following:
True, though the point here for me is also to make the more complicated method less complicated for the user, by implementing it under a nicer ToPageAsync API; in other words, the point here is to remove the complexity barrier to keyset pagination.
That's a good point - allowing easily getting of total rows (and also possibly HasNext/HasPrev?) could be a good reason to support offset pagination as well. Note that the ultimate goal here is for users to be able to express all of these queries themselves, via LINQ - including using window functions for the row count; ideally ToPageAsync() should only be a bit of sugar, to generate SQL that can also be generated via regular LINQ queries. So even if we don't support offset pagination with this new feature, long-term users would still be able to do offset pagination with row count - just without the nice sugar (just like they can manually do Skip()/Take() today).
I do actually think that offset pagination could be implemented via the same API shape, exposing opaque tokens and prev/next: the opaque token would simply encode th offset, and passing that token back into the next query would simply increment/decrement it. In other words, unless I'm missing something, we could come up with a scheme where the only difference between the two pagination techniques would be in what's actually stored in the opaque token (EF would of course recognize the different token types and produce the appropriate query). My main reason for not loving this is that it requires the user to specify which pagination method they want, e.g. via some parameter to ToPageAsync(), or entirely different operators (ToOffsetPageAsync()), or similar; but maybe that's OK. |
Right, that sounds good and does solve a big part of my concern if this particular api becomes keyset only.
Regarding opaque tokens and offset pagination, I assume what you have in mind, if we pivot to the frontend part, is something like the following: But the most commonly used paginator, and the one that makes use of what little advantage offset pagination has over keyset, is this: I'm referring in particular to the many page buttons. This paginator is usually handled in the front using the following props:
Since opaque tokens get created inside the framework, that means you would need to configure this api further with a window size (or similar) to get all the required opaque tokens for the page buttons above. But that's still not enough to be able to render the above paginator (2), because the frontend won't know what tokens belong to what page and also won't be able to render boundaries (...). This is why I said "offset doesn't need opaque tokens, nor has prev/next" (has prev/next can also be handled from just knowing the number of pages), and is why I said they're too different. Does that make sense? |
As mentioned by Shay, there's nothing stopping you from continuing to use skip/take, but why make it easier for developers to do something less performant?
I agree with this. If none of the order-by expressions are simple property accesses on a unique column, then log a warning.
I had the same thought, but I still believe that it should be limited to keyset pagination. Offset pagination is simply inefficient, "go to page 20", and you have to load all 20 pages just to drop the first 19. Why make that easier to do? If anything,
While I'm still against it, I'm not sure why it would require specifying the pagination method, since this is encoded in the token. // The tokens would be opaque, just showing the decoded values here.
query.ToPageAsync(token: "offset=50", pageSize: 10); // offset=n -> use offset pagination
query.ToPageAsync(token: "age:21,id:123", pageSize: 10); // prop1:val1,... -> use keyset pagination Utility methods to create paging tokens: Page.CreatePagingToken(offset: 50);
Page.CreatePagingToken(keySet: new Dictionary<string, object> { { "age", 21 }, { "id", 123 } }); |
Yeah, I generally agree with what you're saying here. If what the user wants is a paginator with a list of all/many pages (your 2nd image - sequential pagination UI), then they're looking for offset pagination. The ToPageAsync API shape we've been discussing here is indeed geared for sequential pagination rather than random access, and so has little to offer as an API for people wanting random access... People are better off just doing manual Skip/Take themselves and managing things themselves. But there's a different way to look at this... An EF pagination API could support both pagination methods. That is, as long as the user clicks next/perv (sequential pagination), we use the regular keyset pagination already discussed. But EF's API would also allow obtaining an opaque token given an offset via a specific API; you'd then be able to pass that token to ToPageAsync to perform a random access jump. And the client code obtaining the token can know which page they're on, since they've calculated the offset and the page size is known. This "hybrid" pagination API would allow both efficient keyset pagination when the user clicks next/prev, and random access pagination when the user clicks on an arbitrary page - assuming the developer chooses to expose that via the UI (and understands the perf implications). Now, re perf implications... I do think there may be cases where someone may want to use offset pagination even for next/prev (i.e. no keyset pagination at all); EF could implement that by having even the next/prev opaque tokens EF exposes backed by offset pagination. One possible scenario here is for UIs which allow pagination over arbitrarily-sorted data - the user chooses which field to sort by; in these scenarios, it's very unlikely there will be database indexes for all sorting combinations. Of course, without an index, keyset pagination probably doesn't work better than offset pagination (maybe even worse?). So to support this, there would need to be some way for the user to indicate which pagination method to use (keyset vs. offset). The above is of course what we could do, rather than hard priorities etc. If it all makes sense (please let me know your thoughts), we could split things down to the following features:
In your example above, the pagination method is specified in the parameter passed to Page.CreatePagingToken (rather than to ToPageAsync) - I'm not sure that's how it's going to work (or whether it's even possible)... For keyset pagination, I don't think we want a weakly-typed dictionary as in your example; ideally we'd simply infer the keyset (and so the token) from the LINQ query itself, by looking at the orderings of the top-level SELECT; this way, users simply append a simple ToPageAsync() on a query and things just work. If we go with that kind of API, then some user gesture would be necessary to allow users to tell us that they want offset pagination instead (assuming we want to implement this, feature 4 in my list above). |
That's not really "specifying the pagination method", but rather "generating a pagination token", which is opaque and understood by EF internals. If tokens are taken from the It seems cleaner to use the token to identify the type of pagination, as opposed to a different method name or argument: query.ToPageAsync(token: "", pageSize: 10);
query.ToKeySetPageAsync(token: "", pageSize: 10);
query.ToOffsetPageAsync(token: "", pageSize: 10);
query.ToPageAsync(token: "", pageSize: 10, type: KEY_SET);
query.ToPageAsync(token: "", pageSize: 10, type: OFFSET);
Even if you infer, you still need to generate next/prev tokens, which would need to encode the first/last values, right? The |
Right, I understand that but my comment here was particularly in reply to "offset pagination could be implemented via the same API shape, exposing opaque tokens". I was trying to emphasize that there's a difference between using offset pagination sequentially (same keyset api shape possible) and using it for random access (common use case, same api shape not possible).
But from what you say above in your 2nd para and the list after, we're on the same page, as what we're describing here isn't just the same API shape anymore, but an additional offset specific api that complements that.
Looks good to me. But off the top of my head, there might be a missing piece with how to create the tokens for first/last pages for example. That sounds very similar to the offset specific api, and could be a similar api? (also, creating first/last pages tokens is very different between keyset and offset)
Again, offset has valid functional advantages. That's a good enough reason to make it more performant if it's a low hanging fruit. What does "higher values" mean? When do you cross that threshold? Skipping time varies widely depending on the db engine, the machine it's running on, the indexing employed, the number of columns in the table, the query, etc. Acceptable query time also varies widely based on requirements. It's hard to objectively determine when a warning log is warranted without confusing users.
Not sure about this. That's way too loosely typed and isn't idiomatic C#. Looks closer to a frontend representation. |
I'm curious how often people use random page access. I don't even use the 2nd page of Google results most of the time. 😆
How can you make it more performant?
There's some data here, and it may also increase memory usage. But yes, the threshold would be somewhat arbitrary, so EF is unlikely to add this warning.
You have to take the offset or keyset from the front end/client, how else will the server know what to fetch?
What API did you have in mind? One issue with using the same You could do something like |
By querying the total count without an additional round trip using window functions. Casual users are unlikely to know or care enough to use window functions even if they get linq expr support.
Since it is an opaque token and the idea is to have a single In order to be able to do that though, you need specific token creation methods for the different cases (we already discussed one for random access offset, but this in general can also solve the first/last page tokens for keyset too). So that you either start by creating a token (first/last page) or you get an already created token from the frontend, and then call |
To summarize/comment on the above comments... These are mainly concerned with allowing the use of the basic, sequential paging API (ToPageAsync with prev/next opaque tokens) over offset pagination under the hood; this is case (4) in my comment above. Note that this is distinct from allowing "occasional" random access (via offset) when using keyset pagination - that's case (3) above and is (mostly) unrelated to this discussion. Obviously, if we want the same API (ToPageAsync) to be usable with different actual techniques under the hood (keyset, offset), the user must have some way of telling us which technique they want. From a user-facing API standpoint, the differentiation can be done in two places (this mainly summarizes @mrahhal and @glen-84 above):
First, I do feel that it's important to just allow users to do ToPageAsync() with a null token, and for that to start paginating from the start, using the default/best technique - I don't think we should force users to jump through API hoops and manually create "start of pagination" tokens which they then pass to ToPageAsync(), only because some users may want to explicitly choose a different technique (presumably an edge case). Second, as has been discussed above, no part of this API should be weakly-typed, so for me an API to create a keyset pagination token with a Dictionary isn't a good direction (as in #33160 (comment)). We should try to make this all work with EF itself getting all it needs from the LINQ query itself. This makes it difficult to imagine a "token creation API" (Page.CreateToken()) that's separate from queries. This all nudges me in the direction of making everything work with ToPageAsync (in the direction of option 1) rather than a token creation API (option 2). Rather than making the distinction in the overload names (ToKeySetPageAsync, ToOffsetPageAsync), I'd rather keep the name simple (ToPageAsync) and make the distinction via some parameter: this allows for an implicit default, and doesn't confront the user with the keyset/offset distinction unless they explicitly want that. @mrahhal you've also mentioned the need to start pagination from the end; I think this is similar in that it could be addressed either via a separate token creation API, or via a parameter to ToPageAsync() (fromEnd: true). As an additional note, the Cosmos provider has its own, built-in form of forward-only continuation token pagination; this was implemented for EF 9.0 via #24513. This could be considered a 3rd pagination technique alongside keyset and offset, where the server itself is the one generating the opaque continuation token; and it should be default technique when using Cosmos. All that could be another motivation for integrating the concept of a "pagination technique" into the API - but also raises the complication of supporting additional, provider-specific pagination techniques (rather than just keyset/offset). |
Very much agree with this.
Agreed. |
I think I was in the mindset of how this all works internally, what actual information is required to make keyset pagination work, but of course the ideal is to automate as much as possible, and in fact this is probably how it already works in Hot Chocolate. So I guess the complete API might look something like: Page page = query.ToPageAsync(
string? pagingToken,
int pageSize,
includeTotalCount = true,
forward = true,
method = PagingMethod.KeySet,
CancellationToken cancellationToken); |
EF currently provides various primitives for performing pagination (Take/Skip as well as keyset pagination - see docs). However, we regularly see a need for a higher-level, more end-to-end solution that would provide easy, efficient pagination capabilities without having to manually work everything together each time.
The basic API could be something like a terminating ToPageAsync() call (proposed by @AndriySvyryd in #24513, as well as by @michaelstaib in his work on GraphQL EF integration). The query would return a page containing the rows in the page, an easy means for fetching the next/previous page, and some additional data (HasNextPage, HasPreviousPage, TotalCount).
Following is a summary of general notes/requirements (thanks @michaelstaib for the useful discussion!):
context.Blogs.Include(b => b.Posts.OrderBy(...).ThenBy(...).Page(...)
). This has particular importance to GraphQL (/cc @michaelstaib).Prior art: MR.EntityFrameworkCore.KeysetPagination, Pagination.EntityFrameworkCore.Extensions
/cc @michaelstaib @mrahhal @SitholeWB
The text was updated successfully, but these errors were encountered: