Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow users to express SQL row value comparison syntax #26822

Open
Tracked by #27047 ...
mrahhal opened this issue Nov 24, 2021 · 29 comments
Open
Tracked by #27047 ...

Allow users to express SQL row value comparison syntax #26822

mrahhal opened this issue Nov 24, 2021 · 29 comments
Labels
area-query customer-reported punted-for-7.0 Originally planned for the EF Core 7.0 (EF7) release, but moved out due to resource constraints. type-enhancement
Milestone

Comments

@mrahhal
Copy link

mrahhal commented Nov 24, 2021

Row value is part of the SQL standard -though not supported by every database in all places- and it allows using the following syntax (for example in a where clause):

WHERE (timestamp, id) > ('2019-03-10 00:00:00+00:00', 1032749689)

This is in particular a highly valuable syntax when doing keyset/seek/cursor pagination, and being able to use it with databases that support it will be a big win for seek pagination in particular, but also composite conditions that can gain from this syntax in general.

The row value syntax above is equivalent, in theory, to something like this:

WHERE
    (timestamp > '2019-03-10 00:00:00+00:00') OR
    (timestamp = '2019-03-10 00:00:00+00:00' AND id > 1032749689)

But, for one, this logic condition becomes more complex the more columns you have. In addition, the db implementation of row value will usually be much more efficient than this (there are a few things you can do that can help the db engine in the above condition, but it doesn't look nice or readable).

More info about row value here: https://use-the-index-luke.com/sql/partial-results/fetch-next-page#sb-row-values

My use case is a package I worked on that fully implements keyset pagination with EF Core, but does so by building the expressions so that it translates to the logical condition syntax above. I could use a row value translator.

For a related issue to keyset pagination see this: #9115

Proposed solution

What I would love to see is a db function that EF Core understands and translates into the row value syntax for db providers that support it. Something like this:

var timestamp = ...;
var id = 1032749689;

// One option:
.Where(b => EF.Functions.GreaterThan(new[] { b.Column1, b.Column2 }, new[] { 1, 2 }))
// Another option with a tuple override:
.Where(b => EF.Functions.GreaterThan((b.Column1, b.Column2), new[] { 1, 2 }))

// Translates to sql:
// WHERE (timestamp, id) > ('2019-03-10 00:00:00+00:00', 1032749689)

One thing I'm not sure of is if there's a system in place that would allow a consumer to know if the current database provider supports this syntax or not. Because I want to be able to fallback to my implementation in such a case.

If this seems like an interesting and feasible addition, I'm willing to work on it.

@roji
Copy link
Member

roji commented Nov 24, 2021

This is supported by PostgreSQL, SQLite, and MySQL (though your linked article claims that it isn't index compatible - that may have changed); it's not supported by SQL Server. I've just been looking into this recently, with regards to keyset pagination (see this deck, slides 22-23).

First, since row value comparison ((x, y) > (1, 2)) is logically the same as the decomposed version (x > 1 OR x = 1 AND y > 2), there may not be any actual value in specifying the former rather than the latter. The SQL is indeed terser/nicer, but the C# to express something like this wouldn't be (since C# doesn't support comparison over tuples). So if this has no actual perf benefits, I'd say it's better for users to simply write out their keyset pagination clause in C# as above. Are you aware of any such perf benefit?

Note that in the PostgreSQL, I recently implemented pattern matching over the decomposed version, identifying it and generating row value comparison in SQL instead (see npgsql/efcore.pg#2111, this was mostly a fun little project without any specific benefits in mind). This means that the C# side stays ugly, but the SQL becomes terser/nicer. We could promote this to relational, but I do think that should be done if there's a tangible benefit to doing so (beyond terser SQL).

@mrahhal
Copy link
Author

mrahhal commented Nov 25, 2021

Are you aware of any such perf benefit?

I've investigated this, and yes I'm sure of at least one perf benefit. In PostgreSQL, I created a table with a million records, and I have an index on created + id. I'm doing a simple seek + limit on the 2 columns. Here's the execution plan for a logical condition (x > a OR x = a AND y > b):

image

There's an Index Scan with a filter operation.

In contrast, here's the execution plan when using row value instead:

image
There's an Index Condition instead, which at least utilizes an access predicate on created, and maybe other columns if there's a valid index to use.

The difference stems from the fact that no db will be able to realize that it can use an access predicate on the 1st column when using the logical condition, but it understands exactly what access predicates to use when using row value. One way to force all dbs to recognize this without using row value is to add a redundant clause as following:

(x >= 1) AND
(x > 1 OR (x = 1 AND y > 2))

The first line above results in the db using an access predicate before filtering:

image
This optimization is something I also implemented here.

I'm not sure if this exactly matches in perf a db implemented row value for 2 columns, but it definitely won't match it for more than 2 columns (it'll be harder to keep generating these access predicate hints in the logic conditions properly then).

(There's one alternative form to do this without using a redundant clause, but it's much much harder for a human to understand.)

My test table is extremely simple (which is also why the filter had nothing to do after the access predicate), I'm sure the perf benefit will be more apparent in the real world.


I want to emphasize again how much easier it is to form the row value syntax than a generalized logical condition when we're dealing with more than a few columns. Granted, it's rare to have a query doing this over more than 2-3 columns, but anyway, the generalized condition expression for this is:

(x > a) OR
(x = a AND y > b) OR
(x = a AND y = b AND z > c) OR...

It's very easy to get wrong (and a horror to maintain).


npgsql/efcore.pg#2111

This is great. Trying to translate into row value from a db function instead of pattern matching will obviously be much easier too.

Also, when using efcore.pg, I'm wondering (haven't checked much of the code) if this will pick up the generated condition expressions I dynamically form in my package, since there's the additional optimization I have with the access predicate clause which might throw your pattern matcher off. (But with a quick look, it seems that it needs a very specific form when there's 3 columns, which wouldn't work with the generalized logical condition, and no support for more than that?).


In conclusion, I do think it has performance benefits as the db can optimize the execution because the intent is clear, and it would still be much easier to write in C# (even if uglier than others) than forming the error-prone logical condition ourselves.

I think there's also a lot to gain if the EFCore db provider that's being used automatically either choses to translate it to row value when supported and applicable, or form the logical condition otherwise (for reference, I do this here).

@roji
Copy link
Member

roji commented Nov 25, 2021

@mrahhal thank you for the details and for the perf investigation! I got slightly different-looking query plans in my test (see full details below), but the end result is the same; row values do indeed improve performance in PostgreSQL. This shows once again that it's always worth investigating generating simpler SQL.

Perf investigation

Database setup

CREATE TABLE products
(
    id INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    sort1 INT,
    sort2 INT,
    name TEXT
);
CREATE INDEX IX_name ON products(sort1, sort2);

DO $$BEGIN
FOR i IN 1..5000000 LOOP
    INSERT INTO products (sort1, sort2, name) VALUES (i % 10, i % 5, 'Product' || i);
END LOOP;
END$$;

With row value comparison

EXPLAIN SELECT id, name FROM products WHERE (sort1, sort2) > (5, 2);

Resulting plan:

Bitmap Heap Scan on products  (cost=15989.82..73511.39 rows=1384438 width=36)
  Recheck Cond: (ROW(sort1, sort2) > ROW(5, 2))
  ->  Bitmap Index Scan on ix_name  (cost=0.00..15643.72 rows=1384438 width=0)
        Index Cond: (ROW(sort1, sort2) > ROW(5, 2))

Without row value comparison

EXPLAIN SELECT id, name FROM products WHERE sort1 > 5 OR sort1 = 5 AND sort2 > 2;

Resulting plan:

Bitmap Heap Scan on products  (cost=24815.80..99807.51 rows=2108806 width=18)
  Recheck Cond: ((sort1 > 5) OR ((sort1 = 5) AND (sort2 > 2)))
  ->  BitmapOr  (cost=24815.80..24815.80 rows=2184955 width=0)
        ->  Bitmap Index Scan on ix_name  (cost=0.00..21247.50 rows=1994009 width=0)
              Index Cond: (sort1 > 5)
        ->  Bitmap Index Scan on ix_name  (cost=0.00..2513.89 rows=190946 width=0)
              Index Cond: ((sort1 = 5) AND (sort2 > 2))

On the question of how users can express this in C#...

  • One way is to use the approach from npgsql/efcore.pg#2111, i.e. require users to write the decomposed, complicated version and pattern-match it into row value comparisons in EF Core.
    • I agree this gets complicated as more keys are added
    • The pattern matching approach indeed requires a specific tree structure. My work in npgsql/efcore.pg#2111 should pick up an arbitrary number of keys and combine them into a single row value comparison, but there may be query shape variations which wouldn't be picked up - I haven't done a full analysis there. Though we'd document the recommend query shape properly, so this may not be a concern.
  • The other way is to add a special EF Core API
    • The natural way to represent a row value in C# is value tuples, but unfortunately those have no comparison operators.
    • We can allow users to use arrays to express row values, e.g. ctx.Blogs.Where(b => EF.Functions.GreaterThan(new[] { b.Column1, b.Column2 }, new[] { 1, 2 }). This seems like an easier way to express things than the proposal above which chains Add. If the values are heterogeneous, users would need to specify new object[].
    • We could optionally also provide specialized overloads for some small value tuple sizes, which would be slightly more natural, and also avoid the problem of heterogeneity (e.g. ctx.Blogs.Where(b => EF.Functions.GreaterThan((b.Column1, b.Column2), (1, 2))). Though this may not be worth the effort.
  • Since row values are standard across almost all database types (except SQL Server), it makes sense to introduce something into relational; but SQL Server would need a way to opt out of this: EF.Functions.GreaterThan should need to throw an informative message pointing users to the decomposed form instead. We can translate to the more complex expanded form which doesn't use row values.

In any case, I've opened dotnet/EntityFramework.Docs#3582 to track adding docs on good pagination practices.

Thanks again for opening this issue!

@mrahhal
Copy link
Author

mrahhal commented Nov 25, 2021

Here's my own code for the table I created in my perf investigation for completeness.
CREATE TABLE t1 (
  id SERIAL PRIMARY KEY,
  created timestamp with time zone
);

CREATE INDEX IX_created ON t1 (
  created
);

CREATE INDEX IX_created_id ON t1 (
  created,
  id
);

-- Generates about 1.5 million records
INSERT INTO t1 (created)
SELECT g.created FROM generate_series(
  timestamp with time zone '2019-01-01',
  timestamp with time zone '2021-11-17',
  '1 minute'
) AS g (created);

And the particular queries (originally from the ef log when I was testing inside a sample app so forgive the verboseness):

-- Logical condition
SELECT "u"."id", "u"."created"
FROM t1 AS "u"
WHERE ("u"."created" < '2020-11-24 18:51:23.831772') OR (("u"."created" = '2020-11-24 18:51:23.831772') AND ("u"."id" < 21))
ORDER BY "u"."created" DESC, "u"."id" DESC
LIMIT 20

-- Logical condition with an added clause to hint for access predicate
SELECT "u"."id", "u"."created"
FROM t1 AS "u"
WHERE ("u"."created" <= '2020-11-24 18:51:23.831772') AND (("u"."created" < '2020-11-24 18:51:23.831772') OR (("u"."created" = '2020-11-24 18:51:23.831772') AND ("u"."id" < 21)))
ORDER BY "u"."created" DESC, "u"."id" DESC
LIMIT 20

-- Row value
SELECT "u"."id", "u"."created"
FROM t1 AS "u"
WHERE ("u"."created", "u"."id") < ('2020-11-24 18:51:23.831772', 21)
ORDER BY "u"."created" DESC, "u"."id" DESC
LIMIT 20

The other way is to add a special EF Core API

The natural way to represent a row value in C# is value tuples, but unfortunately those have no comparison operators.

Now that you mention this, it would be great having this in C# itself with tuples...

But otherwise, I really like this one you mentioned:

ctx.Blogs.Where(b => EF.Functions.GreaterThan(new[] { b.Column1, b.Column2 }, new[] { 1, 2 })

Better than what I proposed, looks great and very readable. The tuple overload is a good idea too, otherwise maybe always having it new object[] { ... } is better for maintainability (it's an expression so no boxing to worry about).

Since row values are standard across almost all database types (except SQL Server), it makes sense to introduce something into relational; but SQL Server would need a way to opt out of this: EF.Functions.GreaterThan should need to throw an informative message pointing users to the decomposed form instead.

What do you think about EFCore translating it to conditional logic (as I do in my package) in such a case? I think that's a pretty valid alternative for row value (still efficient + supported in all dbs) and would save consumers, especially package authors who want to use this and don't have control over the provider, from having to worry about whether it'll be supported or not. Otherwise, there will need to be some discoverability method to know beforehand whether we can use it and then use an alternative.

And thanks for the doc issue!

@roji
Copy link
Member

roji commented Nov 25, 2021

Better than what I proposed, looks great and very readable. The tuple overload is a good idea too, otherwise maybe always having it new object[] { ... } is better for maintainability (it's an expression so no boxing to worry about).

Yep, there's definitely no efficiency issue here. I do suspect that in 90% of cases, users will have 2, maybe 3 keys at most, so a shorter value tuple construct would be a bit nicer... But definitely not critical.

What do you think about EFCore translating it to conditional logic (as I do in my package) in such a case?

Absolutely. Assuming the team agrees, the thing I have in mind is a general, relational translator which produces a new RowValueExpression, which then gets generated in SQL. SQL Server (and possibly other databases which don't support rowsets) would instead opt into translating to the decomposed conditional logic variant instead. Since it's always possible to translate this and the question is only whether to choose between a more efficient variant and a less efficient one, that should be an internal detail controlled by the provider.

@roji roji changed the title Db function to translate row value syntax Allow users to express SQL row value comparison syntax Nov 25, 2021
@mrahhal
Copy link
Author

mrahhal commented Nov 25, 2021

Awesome! If possible I'd love to work on this when a design is decided.

@roji
Copy link
Member

roji commented Dec 6, 2021

Design decision: we think it's a good idea to introduce this as a building block.

Here's what this would introduce:

  • A new RowValueExpression expression type in EFCore.Relational.
  • Support for it in all the relevant places (QuerySqlGenerator, SqlExpressionFactory, SqlNullabilityProcessor).
  • EF.Functions extensions in RelationalDbFunctionsExtensions for GreaterThan, LessThan, GreatThanOrEqual, LessThanOrEqual.
  • A method translator for the above extensions in RelationalMethodCallTranslatorProvider which produces comparisons over RowValueExpressions.
  • A fallback method translator for SQL Server which translates to the expanded comparisons without RowValueExpressions.

@smitpatel does all the above sound right? If so, @mrahhal we'd be happy to accept a PR for this.

/cc @michaelstaib, once this is implemented, you probably want to implement your paging support with this.

@mrahhal
Copy link
Author

mrahhal commented Dec 6, 2021

@roji Thanks for the details. Will give this a go soon and keep you updated.

@roji roji self-assigned this Dec 6, 2021
@ajcvickers ajcvickers added this to the 7.0.0 milestone Dec 10, 2021
@roji
Copy link
Member

roji commented Dec 14, 2021

Some more examples of row value usage across databases, to help inform our decision on how to design the support:

PostgreSQL allows referring to an actual table row as a row value, comparing it to another literal row value:

CREATE TABLE foo (bar1 INT, bar2 INT);
INSERT INTO foo VALUES (1, 2), (3, 4);

SELECT * FROM foo WHERE foo = (1, 2);

PostgreSQL also has composite types, like a UDT (class/struct type definition in the database). The literal for those is also a row value:

CREATE TYPE complex AS (r INT, i INT);
CREATE TABLE bar (id INT, complex complex);
INSERT INTO bar (id, complex) VALUES (1, (1, 2)), (2, (3, 4));

SELECT * FROM bar WHERE complex = (1, 2);

However, PostgreSQL also has 1st-class arrays:

CREATE TABLE baz (id INT, arr INT[]);
INSERT INTO baz (id, arr) VALUES (1, ARRAY[1, 2]), (2, ARRAY[3, 4]);

SELECT * FROM baz WHERE arr = ARRAY[1, 2];

So expression tree-wise, I think row values and arrays probably need to be two different node types (row values in relational, arrays in PG only).

Note that SQLite also supports various cool things with row values:

SELECT * FROM info WHERE (year,month,day) BETWEEN (2015,9,12) AND (2016,9,12);

SELECT ordid, prodid, qty
FROM item
WHERE (prodid, qty) IN (SELECT prodid, qty FROM item WHERE ordid = 365);

UPDATE tab1 SET (a,b)=(b,a);

MySQL/MariaDB also supports at least some forms of row values:

CREATE TABLE foo (bar1 INT, bar2 INT);

SELECT * FROM foo WHERE (bar1, bar2) = (3, 4);
SELECT * FROM foo WHERE (bar1, bar2) > (3, 4);
SELECT * FROM foo WHERE (bar1, bar2) IN ((1, 2), (3, 4));

@mrahhal
Copy link
Author

mrahhal commented Dec 15, 2021

Since this is being given more time in design. I can't help but think about how C# tuples would have been perfect to represent row value in several places, if there was a way to give them comparison operators that is (only equality operators are defined at the moment). This isn't possible without custom operator overloading (extension everything), tracked here: dotnet/csharplang#192. Would have allowed to represent all kinds of row value you want to support cleanly in C#:

.Where(b => (b.Column1, b.Column2) > (1, 2))
// instead of
.Where(b => EF.Functions.GreaterThan(new[] { b.Column1, b.Column2 }, new[] { 1, 2 }))

There doesn't seem to be progress at all in this proposal (dotnet/csharplang#192) unfortunately, but it might be good to keep it in mind.

@roji
Copy link
Member

roji commented Dec 15, 2021

@mrahhal you're right that value tuple comparison would have been great here - but I think this is trickier than it looks:

  • It only makes sense to do comparison over tuples if all the contained item types themselves support comparison. There isn't any way to express that in C# AFAIK (e.g. no generic constraints for "comparable" in the sense of "having comparison operators defined").
  • Value tuples with different numbers of items are different types. So any extension over tuples (including comparison operators) would have to be defined for ValueTuple<T1>, for ValueTuple<T1, T2>, and so on.

To summarize, I don't think this would be very feasible with extension operators; ideally this would simply be a native capability in C#. However, I'm not aware of many uses for tuple comparisons beyond pagination, which is more of a database thing. In other words, I'm not sure how useful this would be as a C# feature beyond helping us express SQL in EF Core LINQ queries.

@mrahhal
Copy link
Author

mrahhal commented Dec 15, 2021

Yes there are a lot of tricky aspects to it. There's also no good namespace to add these operator extensions to since a lot of EF consumer code is tied to IQueryable and not to an EF Core namespace. As for the tuple types, I thought it'll be feasible to overload some of the common scalar types for 1-3 arguments, but yeah not very pretty. In any case this isn't doable for now yeah.

@roji
Copy link
Member

roji commented May 7, 2022

Note that this was done for PostgreSQL in npgsql/efcore.pg#2350, using ITuple (ValueTuple/Tuple) to represent a row value. There's nothing PG-specific in that implementation, so if we want to, we should be able to bring at least some of it over to relational.

Note that since SQL Server doesn't support row value comparisons, we could have a post-processing step that converts row value comparisons to the expanded form (i.e. instead of (a, b) > (3, 4), SQL Server would generate a > 3 || (a == 3 && b > 4).

@roji roji removed their assignment May 7, 2022
@roji roji removed the blocked label May 7, 2022
@mrahhal
Copy link
Author

mrahhal commented May 8, 2022

Looks good! And I agree that using the currently uglier value tuples is better than arrays. Hopefully they get added to expression trees soon.

Note that since SQL Server doesn't support row value comparisons, we could have a post-processing step that converts row value comparisons to the expanded form (i.e. instead of (a, b) > (3, 4), SQL Server would generate a > 3 || (a == 3 && b > 4).

I would also want to consider generating some redundant clauses that help get us closer to row value perf (as discussed here. I implement a simple optimization on the 1st column here). Although I haven't actually done perf comparisons with large data in the real world. This is purely based on the observed query plans.

@roji
Copy link
Member

roji commented May 8, 2022

Good suggestion - we should definitely keep this in mind if/when we get around to implementing this for SQL Server (and analyze the perf to make sure).

Beyond the perf aspect, hopefully this improvement (currently in PG only) helps people adopt keyset pagination as it removes the need to deal with the complex expanded comparisons...

@mrahhal
Copy link
Author

mrahhal commented May 11, 2022

Hey @roji. You removed the blocked tag, so do you think I can start working on this? I'll take your work in pg for reference and then implement the expansion part depending on the provider (we can discuss any possible additional optimizations after).
I don't have much time lately so it might take a bit of time though.

@roji
Copy link
Member

roji commented May 11, 2022

Introducing my PG work into EF Core's relational layer would include considerable work, and @smitpatel would have to devote quite a bit of time to reviewing etc. So I think it's up to him to decide if it's a good idea to work on this.

@Eli-Black-Work
Copy link

Perhaps this is a dumb question, but what about pagination where we're sorting one column in ASC order and another column in DESC order (e.g. SORT BY name ASC id DESC)? Is there a way to implement pagination using row value comparison (e.g.(name, id) > ("Joe", 1234)) for that situation?

Or do most pagination systems just not support this?

Sorry, I know this is kind of tangential to the topic, but I've been implementing LINQ-based keyset pagination for an arbitrary number of columns and having been struggling with this 😄

@roji
Copy link
Member

roji commented Aug 29, 2022

@Bosch-Eli-Black not a dumb questions at all... AFAIK this isn't something you can do with row value comparison (and it seems to be generally a pretty exotic thing with pagination systems...)

@mrahhal
Copy link
Author

mrahhal commented Aug 29, 2022

Yes this isn't something you can do with row value, I do support it in MR.EntityFrameworkCore.KeysetPagination though.

@Eli-Black-Work
Copy link

@roji Thanks 🙂 I was trying to make my pagination system be general-purpose, but maybe supporting something this exotic is a bit too general purpose 😅

@mrahhal Thanks! I've starred your repo. After taking another look at our requirements, I've decided to go with offset pagination, though, in part because it was easier to implement but also in part because our company's style guide mandates that users should be able to navigate to arbitrary pages of results 🙂

@roji
Copy link
Member

roji commented Aug 31, 2022

@Bosch-Eli-Black you can still implement a hybrid system where clicking "next/prev page" uses row set navigation, and only random-access jumping to arbitrary pages uses offset navigation.

@Eli-Black-Work
Copy link

@roji Ooooh, hadn't thought of that! Thanks! 🙂

@ryanelian
Copy link

ryanelian commented Oct 12, 2022

Hello, I just want do dump this code here just in case someone needs keyset pagination using EF Core 6 (strictly for PostgreSQL) with the row value comparison syntax (for the perf benefit), since unfortunately the EF.Functions.GreaterThan on ValueTuple only available in Npgsql EF Core 7. (Can't upgrade due to company policy sticking to .NET 6 LTS)

Show code:

Code has only been lightly tested. Use it at your own risk.

PaginatePostgreSqlDbContextExtensions.cs

using Microsoft.EntityFrameworkCore.Metadata;
using System.Linq.Expressions;
using System.Reflection;

namespace Microsoft.EntityFrameworkCore
{
    /// <summary>
    /// Contains methods for querying <typeparamref name="TEntity"/> using keyset pagination logic
    /// </summary>
    /// <typeparam name="TEntity"></typeparam>
    public class DbSetPagination<TEntity> where TEntity : class
    {
        private readonly DbSet<TEntity> _dbSet;
        private readonly IEntityType _entityType;
        private readonly string _tableName;
        private readonly string? _schema;

        /// <summary>
        /// Constructs an instance of <see cref="DbSetPagination{TEntity}"/>
        /// </summary>
        /// <param name="dbSet"></param>
        /// <param name="entityType"></param>
        /// <exception cref="InvalidOperationException"></exception>
        public DbSetPagination(DbSet<TEntity> dbSet, IEntityType entityType)
        {
            _dbSet = dbSet;
            _entityType = entityType;
            _tableName = entityType.GetTableName() ??
                throw new InvalidOperationException($"Cannot get table name for {typeof(TEntity).Name}");
            _schema = entityType.GetSchema();
        }

        /// <summary>
        /// Gets a fully-formatted schema + table name for querying in PostgreSQL
        /// </summary>
        /// <returns></returns>
        private string GetTableFullName()
        {
            // https://www.npgsql.org/efcore/modeling/table-column-naming.html
            if (_schema != null)
            {
                return $"{PaginatePostgreSqlDbContextExtensions.Quote(_schema)}.{PaginatePostgreSqlDbContextExtensions.Quote(_tableName)}";
            }

            return PaginatePostgreSqlDbContextExtensions.Quote(_tableName);
        }

        /// <summary>
        /// Get the PostgreSQL column name for property <typeparamref name="TColumn"/>
        /// </summary>
        /// <typeparam name="TColumn"></typeparam>
        /// <param name="columnSelector"></param>
        /// <returns></returns>
        /// <exception cref="InvalidOperationException"></exception>
        private string GetColumnName<TColumn>(Expression<Func<TEntity, TColumn>> columnSelector)
        {
            if (columnSelector.Body is not MemberExpression memex)
            {
                throw new InvalidOperationException($"Selected column is not a valid MemberExpression");
            }
            if (memex.Member is not PropertyInfo propertyInfo)
            {
                throw new InvalidOperationException($"Selected member is not a valid Property");
            }

            var soid = StoreObjectIdentifier.Table(_tableName, _schema);
            var columnName = _entityType.GetProperty(propertyInfo.Name)?.GetColumnName(soid) ??
                throw new InvalidOperationException($"Cannot get column name for {propertyInfo.Name}");

            return columnName;
        }

        /// <summary>
        /// Query <typeparamref name="TEntity"/> using keyset pagination logic on a column with read-only result. <br></br>
        /// <paramref name="column1Selector"/> <b>must be the unique column of the table</b>, such as Primary Key or Unique Index.
        /// </summary>
        /// <typeparam name="TColumn1"></typeparam>
        /// <param name="column1Selector"></param>
        /// <param name="lastColumn1"></param>
        /// <param name="descending"></param>
        /// <returns></returns>
        public IQueryable<TEntity> With<TColumn1>(
            Expression<Func<TEntity, TColumn1>> column1Selector,
            TColumn1 lastColumn1,
            bool descending = false)
        {
            if (lastColumn1 is null)
            {
                if (descending)
                {
                    return _dbSet.AsNoTracking().OrderByDescending(column1Selector);
                }

                return _dbSet.AsNoTracking().OrderBy(column1Selector);
            }

            var comparer = descending ? '<' : '>';
            var column1Name = GetColumnName(column1Selector);
            var sql = $"SELECT * FROM {GetTableFullName()} WHERE {PaginatePostgreSqlDbContextExtensions.Quote(column1Name)} {comparer} {{0}}";

            var query = _dbSet.FromSqlRaw(sql, lastColumn1).AsNoTracking();
            if (descending)
            {
                return query.OrderByDescending(column1Selector);
            }
            return query.OrderBy(column1Selector);
        }

        /// <summary>
        /// Query <typeparamref name="TEntity"/> using keyset pagination logic on two columns with read-only result. <br></br>
        /// <paramref name="column1Selector"/> is usually a non-unique column of the table. <br></br>
        /// <paramref name="column2Selector"/> <b>must be the unique column of the table</b>, such as Primary Key or Unique Index.
        /// </summary>
        /// <typeparam name="TColumn1"></typeparam>
        /// <typeparam name="TColumn2"></typeparam>
        /// <param name="column1Selector"></param>
        /// <param name="column2Selector"></param>
        /// <param name="lastColumn1"></param>
        /// <param name="lastColumn2"></param>
        /// <param name="descending"></param>
        /// <returns></returns>
        public IQueryable<TEntity> With<TColumn1, TColumn2>(
            Expression<Func<TEntity, TColumn1>> column1Selector,
            Expression<Func<TEntity, TColumn2>> column2Selector,
            TColumn1 lastColumn1,
            TColumn2 lastColumn2,
            bool descending = false)
        {
            if (lastColumn1 is null || lastColumn2 is null)
            {
                if (descending)
                {
                    return _dbSet.AsNoTracking()
                        .OrderByDescending(column1Selector)
                        .ThenByDescending(column2Selector);
                }

                return _dbSet.AsNoTracking()
                    .OrderBy(column1Selector)
                    .ThenBy(column2Selector);
            }

            var comparer = descending ? '<' : '>';
            var column1Name = GetColumnName(column1Selector);
            var column2Name = GetColumnName(column2Selector);

            // https://learn.microsoft.com/en-us/ef/core/querying/pagination#multiple-pagination-keys
            // https://github.com/dotnet/efcore/issues/26822
            var sql = $"SELECT * FROM {GetTableFullName()} WHERE ({PaginatePostgreSqlDbContextExtensions.Quote(column1Name)}, {PaginatePostgreSqlDbContextExtensions.Quote(column2Name)}) {comparer} ({{0}}, {{1}})";

            var query = _dbSet.FromSqlRaw(sql, lastColumn1, lastColumn2).AsNoTracking();
            if (descending)
            {
                return query.OrderByDescending(column1Selector).ThenByDescending(column2Selector);
            }
            return query.OrderBy(column1Selector).ThenBy(column2Selector);
        }
    }

    /// <summary>
    /// Contains extension methods for querying keyset pagination against <see cref="DbContext"/>
    /// </summary>
    public static class PaginatePostgreSqlDbContextExtensions
    {
        /// <summary>
        /// Begin pagination query for entity <typeparamref name="TEntity"/>
        /// </summary>
        /// <typeparam name="TEntity"></typeparam>
        /// <param name="dbContext"></param>
        /// <returns></returns>
        /// <exception cref="InvalidOperationException"></exception>
        public static DbSetPagination<TEntity> Paginate<TEntity>(this DbContext dbContext) where TEntity : class
        {
            var entityType = dbContext.Model.FindEntityType(typeof(TEntity)) ??
                throw new InvalidOperationException($"Cannot get entity type for {typeof(TEntity).Name}");

            return new DbSetPagination<TEntity>(dbContext.Set<TEntity>(), entityType);
        }

        /// <summary>
        /// Put quote around the string. Used to place quotation marks around identifiers used in PostgreSQL raw SQL query. 
        /// Read more: <br></br> https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
        /// </summary>
        /// <param name="value"></param>
        /// <returns></returns>
        internal static string Quote(string value)
        {
            return $"\"{value}\"";
        }
    }
}

Usage

var result = _db.Paginate<User>().With(Q => Q.GivenName, Q => Q.Id, request.PreviousGivenName, request.PreviousId)
    .Where(Q => Q.SearchVector.Matches(request.Search))
    .Select(Q => new ListUserResponse
    {
        Id = Q.Id,
        GivenName = Q.GivenName,
        FamilyName = Q.FamilyName,
        Email = Q.Email,
    }).Take(10).AsAsyncEnumerable();

Generated SQL Example

SELECT a."Id", a."GivenName", a."FamilyName", a."Email"
FROM (
    SELECT * FROM "AspNetUsers" WHERE ("GivenName", "Id") > (@p0, @p1)
) AS a
WHERE a."SearchVector" @@ plainto_tsquery(@__request_Search_1)
ORDER BY a."GivenName", a."Id"
LIMIT @__p_2
SELECT a."Id", a."GivenName", a."FamilyName", a."Email"
FROM (
    SELECT * FROM "AspNetUsers" WHERE ("GivenName", "Id") < (@p0, @p1)
) AS a
WHERE a."SearchVector" @@ plainto_tsquery(@__request_Search_1)
ORDER BY a."GivenName" DESC, a."Id" DESC
LIMIT @__p_2

@tnlthanzeel
Copy link

@roji Ooooh, hadn't thought of that! Thanks! 🙂

you are working on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query customer-reported punted-for-7.0 Originally planned for the EF Core 7.0 (EF7) release, but moved out due to resource constraints. type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants