Partitioned collections

A very common use case, and a common question, is how to handle a collection that you don't want to download in full: for example, issues in an issue tracker, which you might want to download by project, status, created date, etc. The current solution is to create a separate collection for each part you want to download and manage them all yourself.

I would like to propose "Partitioned Collections": collections that are automatically partitioned in the background and progressively download data as you query them. With the issue tracker example you would have a single `issuesCollection`, and as you query it (or preload it), it downloads the partitions, however many there are, that are covered by the query.

For example, say the collection is partitioned by `archived` and by the month of its `created` value, and you run this query:

```ts
await todos.preload((q) =>
  q.from({ t: todos })
   .where(({ t }) =>
     and(
       eq(t.archived, true),
       and(
         gte(t.created, new Date('2025-06-01')),
         lt(t.created, new Date()) // today
       )
     )
   )
)
```
The collection sees that it is partitioned by both `archived` and `created` (by month) and ensures that the partitions covered by the query are downloaded.
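
To make "covered partitions" concrete, here is a minimal sketch of how the month partitions for a query like the one above could be enumerated. It is only an illustration: `PartitionKey`, `coveredMonths` and `coveredPartitions` are hypothetical names, not part of any existing API, and it assumes month boundaries are taken in UTC and that the date range is half-open (`from` inclusive, `to` exclusive).

```ts
// Hypothetical partition key: the archived flag plus a month label such as '2025-06'.
type PartitionKey = { archived: boolean; createdMonth: string }

// List every month (UTC) that overlaps the half-open range [from, to).
function coveredMonths(from: Date, to: Date): string[] {
  const months: string[] = []
  if (to.getTime() <= from.getTime()) return months
  let year = from.getUTCFullYear()
  let month = from.getUTCMonth() // 0-based
  // Keep adding months while the first instant of the month is before `to`.
  while (Date.UTC(year, month, 1) < to.getTime()) {
    months.push(`${year}-${String(month + 1).padStart(2, '0')}`)
    month += 1
    if (month === 12) {
      month = 0
      year += 1
    }
  }
  return months
}

// Combine the equality on `archived` with the month range on `created`.
function coveredPartitions(archived: boolean, from: Date, to: Date): PartitionKey[] {
  return coveredMonths(from, to).map((createdMonth) => ({ archived, createdMonth }))
}

// archived = true, created in [2025-06-01, 2025-08-15) covers June, July and August.
const keys = coveredPartitions(true, new Date('2025-06-01'), new Date('2025-08-15'))
console.log(keys.map((k) => k.createdMonth))
```

Each key returned here would correspond to one partition that has to be loaded before the query can be answered.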

I have been considering the API for defining these collections, with these objectives:
1. It must be as similar to the existing API as possible.
2. It should reuse the same collection option factories.
3. It should have a simple way to define the partitioning, ideally using the same expression API we have for queries and the new indexing functionality (aside: this needs documenting).

This is what I am thinking:

```ts
import { createPartitionedCollection, datePart } from '@tanstack/db'
import { queryCollectionOptions } from '@tanstack/query-db-collection'
import { addDays } from 'date-fns' // assumed source of addDays; any equivalent helper works
import { todoSchema } from './schema'

// datePart('day', date) -> Date at 00:00:00 of that day
// this is a new expression function, we would model this and similar additions on the standard SQL functions

export const todos = createPartitionedCollection<Todo>({
  id: 'todos',

  partitionBy: (row) => ({
    archived: row.archived,
    createdDay: datePart('day', row.created),
  }),

  partitionOptions: ({ partitionKey, parentId }) =>
    queryCollectionOptions({  // <=== this is a standard queryCollectionOptions but inside a callback
      queryKey: [parentId, partitionKey.archived, partitionKey.createdDay.getTime()],
      queryFn: async () => {
        const from = partitionKey.createdDay
        const to = addDays(from, 1)

        const res = await fetch(
          `/api/todos?archived=${partitionKey.archived}` +
          `&from=${from.toISOString()}` +
          `&to=${to.toISOString()}`
        )
        return res.json()
      },
      getKey: (item) => item.id,
      schema: todoSchema,
    }),
})
```
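
To illustrate the truncation semantics described in the `datePart` comment, here is a plain-function sketch. `truncateDate` is a hypothetical stand-in, not the proposed expression function itself, and it assumes truncation in UTC; the proposal does not say which time zone `datePart` would use.

```ts
type DateUnit = 'day' | 'month' | 'year'

// Truncate a date to the start of its day, month or year (in UTC).
function truncateDate(unit: DateUnit, date: Date): Date {
  const year = date.getUTCFullYear()
  const month = unit === 'year' ? 0 : date.getUTCMonth()
  const day = unit === 'day' ? date.getUTCDate() : 1
  return new Date(Date.UTC(year, month, day))
}

const created = new Date('2025-06-15T13:45:00Z')
console.log(truncateDate('day', created).toISOString())   // 2025-06-15T00:00:00.000Z
console.log(truncateDate('month', created).toISOString()) // 2025-06-01T00:00:00.000Z
```

Every row whose `created` value truncates to the same date would land in the same partition, which is what makes the truncated value usable as part of the partition key.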

For an Electric collection it would look like:

```ts
export const todos = createPartitionedCollection<Todo>({
  id: 'todos',

  partitionBy: (row) => ({
    archived: row.archived,
    createdDay: datePart('day', row.created),
  }),

  partitionOptions: ({ partitionKey, parentId }) =>
    electricCollectionOptions({  // <=== this is a standard electricCollectionOptions but inside a callback
      id: `${parentId}:${partitionKey.archived}:${partitionKey.createdDay}`,
      shapeOptions: {
        url: 'https://example.com/v1/shape',
        params: {
          table: 'todos',
          where: 'archived = $1 and created >= $2 and created < $3',
          params: [
            partitionKey.archived,
            partitionKey.createdDay,
            addDays(partitionKey.createdDay, 1),
          ],
        },
      },
      getKey: (t) => t.id,
      schema: todoSchema,
    }),
})
```

There are a few things to consider here:
1. `getKey` and `schema` most likely belong in the parent collection config, not in the partition options.
2. Type inference on the `partitionBy` expression builder would require the outer config options to be typed with the schema. This may be complex, or it may be super simple...
3. What do we do when a query doesn't provide enough bounds to infer all the partitions covered? Can we infer an upper/lower bound, or provide defaults?
4. Complex queries with multiple `and` and `or` clauses could need some complex code to work out the partitions covered, although I think this is all very doable.
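
As a rough illustration of points 3 and 4, the sketch below resolves a nested `and`/`or` expression over a single partition field into the ranges of partitions it covers, falling back to default bounds when the query leaves one side open. Everything here is an assumption made for the example: the `Expr` shape is a toy stand-in for the real query expression tree, partitions are identified by an integer index (for example a day number), and `defaults` plays the role of the "provide defaults" option.

```ts
// Toy expression tree over one numeric partition field.
type Expr =
  | { op: 'gte' | 'lt'; value: number }
  | { op: 'and' | 'or'; args: Expr[] }

// Half-open range of partition indexes: [from, to).
type Range = { from: number; to: number }

// Pairwise intersection of two range lists; empty results are dropped.
function intersect(a: Range[], b: Range[]): Range[] {
  const out: Range[] = []
  for (const x of a) {
    for (const y of b) {
      const from = Math.max(x.from, y.from)
      const to = Math.min(x.to, y.to)
      if (from < to) out.push({ from, to })
    }
  }
  return out
}

// Sort ranges and merge any that overlap or touch.
function merge(ranges: Range[]): Range[] {
  const sorted = [...ranges].sort((a, b) => a.from - b.from)
  const out: Range[] = []
  for (const r of sorted) {
    const last = out[out.length - 1]
    if (last && r.from <= last.to) last.to = Math.max(last.to, r.to)
    else out.push({ ...r })
  }
  return out
}

// `and` intersects its arguments, `or` unions them; open sides use the defaults.
function coveredRanges(expr: Expr, defaults: Range): Range[] {
  switch (expr.op) {
    case 'gte':
      return intersect([{ from: expr.value, to: defaults.to }], [defaults])
    case 'lt':
      return intersect([{ from: defaults.from, to: expr.value }], [defaults])
    case 'and':
      return expr.args.reduce(
        (acc, arg) => intersect(acc, coveredRanges(arg, defaults)),
        [defaults]
      )
    case 'or':
      return merge(expr.args.flatMap((arg) => coveredRanges(arg, defaults)))
  }
}

// Expand ranges into the individual partition indexes to load.
function expand(ranges: Range[]): number[] {
  return ranges.flatMap((r) => Array.from({ length: r.to - r.from }, (_, i) => r.from + i))
}

// (day >= 3 and day < 6) or day >= 28, with known partitions 0..29
const query: Expr = {
  op: 'or',
  args: [
    { op: 'and', args: [{ op: 'gte', value: 3 }, { op: 'lt', value: 6 }] },
    { op: 'gte', value: 28 },
  ],
}
console.log(expand(coveredRanges(query, { from: 0, to: 30 }))) // [3, 4, 5, 28, 29]
```

A real implementation would also need to handle several partition fields at once and operators such as `eq` and `in`, but the same intersect/union structure should carry over.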

There is also some overlap with the functionality of [React Query infinite queries](https://tanstack.com/query/latest/docs/framework/react/guides/infinite-queries). These provide a way to page through an infinite list of results, but they don't necessarily define a firm start/end of a page that can be inferred from a query. I think there are two ways we could use them:
1. We could use them to support progressively loading results when used with an orderBy, starting from the beginning and paging through until we reach the limit or bound that was set in the query.
2. For standard collections, we could support them as a way to load very large collections split across multiple requests.

The use of React Query infinite queries needs more thought.
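
As a starting point for that discussion, the first option (page through ordered results until the query's limit is reached) could look roughly like the sketch below. It deliberately does not use the React Query API: `Page`, `fetchPage` and `loadUntil` are hypothetical names, and the numeric cursor is an assumption made to keep the example self-contained.

```ts
// Hypothetical page shape: a batch of rows plus the cursor for the next page, or null at the end.
type Page<T> = { items: T[]; nextCursor: number | null }

// Fetch pages in order until `limit` rows are loaded or the source is exhausted.
async function loadUntil<T>(
  fetchPage: (cursor: number) => Promise<Page<T>>,
  limit: number
): Promise<T[]> {
  const rows: T[] = []
  let cursor: number | null = 0
  while (cursor !== null && rows.length < limit) {
    const page: Page<T> = await fetchPage(cursor)
    rows.push(...page.items)
    cursor = page.nextCursor
  }
  // The last page may overshoot the limit, so trim the result.
  return rows.slice(0, limit)
}

// In-memory stand-in for a paged, ordered API: 10 rows, 3 per page.
const allRows = Array.from({ length: 10 }, (_, i) => i)
const fakeFetchPage = async (cursor: number): Promise<Page<number>> => {
  const items = allRows.slice(cursor, cursor + 3)
  const next = cursor + 3
  return { items, nextCursor: next < allRows.length ? next : null }
}

loadUntil(fakeFetchPage, 7).then((rows) => console.log(rows)) // [0, 1, 2, 3, 4, 5, 6]
```

Unlike partitions, the pages here have no boundaries that can be inferred from the query, so the loader can only stop based on the limit or on running out of data.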


Partitioned collections #315
