
Partitioned collections #315

@samwillis


A very common use case, and question, is how to handle collections where you don't want to download all of the data. Issues in an issue tracker are a typical example: you might want to download them by project, status, created date, etc. The current solution is to create a separate collection for each part you want to download and manage all of this yourself.

I would like to propose "Partitioned Collections": collections that are automatically partitioned in the background and progressively download data as you query them. In the issue tracker example you would have a single issuesCollection, and as you query it (or preload it), it downloads whichever partitions the query covers, however many there are.

For example, say a todos collection is partitioned by archived and by the month of its created value, and you perform this query:

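import { and, eq, gte, lt } from '@tanstack/db'

// proposed: preload downloads every partition this query covers
await todos.preload((q) =>
  q.from({ t: todos })
    .where(({ t }) =>
      and(
        eq(t.archived, true),
        and(
          gte(t.created, new Date('2025-06-01')),
          lt(t.created, new Date()) // today
        )
      )
    )
)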

The collection sees that it is partitioned by both archived and created (by month), and ensures that the partitions covered by the query are downloaded.
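To make the partition lookup concrete, here is a minimal TypeScript sketch of the range-to-partition step for this example, assuming month partitions are identified by the first instant of the month in UTC. monthPartitionsCovered and partitionsForQuery are hypothetical helpers, not part of @tanstack/db; the half-open [from, to) range mirrors the gte/lt pair in the query.

```typescript
// Hypothetical: a month partition is identified by its first instant in UTC.
function startOfMonthUtc(d: Date): Date {
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1))
}

// Every month partition touched by the half-open range [from, to).
function monthPartitionsCovered(from: Date, to: Date): Date[] {
  const partitions: Date[] = []
  if (to.getTime() <= from.getTime()) return partitions // empty range
  let cursor = startOfMonthUtc(from)
  while (cursor.getTime() < to.getTime()) {
    partitions.push(cursor)
    // Date.UTC rolls month 12 over into January of the next year.
    cursor = new Date(Date.UTC(cursor.getUTCFullYear(), cursor.getUTCMonth() + 1, 1))
  }
  return partitions
}

type MonthPartitionKey = { archived: boolean; createdMonth: Date }

// Combine the eq(archived) condition with the created range to get full partition keys.
function partitionsForQuery(archived: boolean, from: Date, to: Date): MonthPartitionKey[] {
  return monthPartitionsCovered(from, to).map((createdMonth) => ({ archived, createdMonth }))
}
```

For the query above, this would yield one key per month from June 2025 up to and including the current month, each with archived set to true.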

I have been considering the API for defining these collections, with these objectives:

  1. It must be as similar to the existing API as possible.
  2. It should reuse the same collection option factories.
  3. It should have a simple way to define the partitioning, ideally using the same expression API we have for queries and the new indexing functionality (aside: this needs documenting).

This is what I am thinking:

import { createPartitionedCollection, datePart } from '@tanstack/db'
import { queryCollectionOptions } from '@tanstack/query-db-collection'
import { addDays } from 'date-fns'
import { queryClient } from './queryClient' // hypothetical: the app's shared QueryClient
import { todoSchema, type Todo } from './schema' // assumes ./schema also exports the Todo type

// datePart('day', date) -> Date at 00:00:00 of that day
// this is a new expression function; we would model this and similar additions on the standard SQL functions

export const todos = createPartitionedCollection<Todo>({
  id: 'todos',

  partitionBy: (row) => ({
    archived: row.archived,
    createdDay: datePart('day', row.created),
  }),

  partitionOptions: ({ partitionKey, parentId }) =>
    queryCollectionOptions({  // <=== this is a standard queryCollectionOptions but inside a callback
      queryKey: [parentId, partitionKey.archived, partitionKey.createdDay.getTime()],
      queryFn: async () => {
        const from = partitionKey.createdDay
        const to = addDays(from, 1)

        // URLSearchParams handles encoding of the ISO timestamps
        const search = new URLSearchParams({
          archived: String(partitionKey.archived),
          from: from.toISOString(),
          to: to.toISOString(),
        })
        const res = await fetch(`/api/todos?${search}`)
        if (!res.ok) throw new Error(`Failed to load todos partition: ${res.status}`)
        return res.json()
      },
      queryClient,
      getKey: (item) => item.id,
      schema: todoSchema,
    }),
})
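One way the parent collection might use the partitionOptions callback is to call it at most once per distinct partition key and cache the result. The sketch below assumes that design; PartitionRegistry, serializePartitionKey and createPartition are hypothetical names, and the factory here returns a plain value standing in for a collection.

```typescript
// Hypothetical: a partition key is a flat record of primitive values or Dates.
type PartitionKey = Record<string, string | number | boolean | Date>

// Serialize with sorted field names so key order doesn't matter; Dates become ISO strings.
function serializePartitionKey(key: PartitionKey): string {
  const entries = Object.keys(key)
    .sort()
    .map((name) => {
      const value = key[name]
      return [name, value instanceof Date ? value.toISOString() : value]
    })
  return JSON.stringify(entries)
}

// Lazily creates one partition per distinct key and returns the cached one afterwards.
class PartitionRegistry<K extends PartitionKey, P> {
  private readonly partitions = new Map<string, P>()
  private readonly parentId: string
  private readonly createPartition: (args: { partitionKey: K; parentId: string }) => P

  constructor(
    parentId: string,
    createPartition: (args: { partitionKey: K; parentId: string }) => P,
  ) {
    this.parentId = parentId
    this.createPartition = createPartition
  }

  getOrCreate(partitionKey: K): P {
    const id = serializePartitionKey(partitionKey)
    const existing = this.partitions.get(id)
    if (existing !== undefined) return existing
    const created = this.createPartition({ partitionKey, parentId: this.parentId })
    this.partitions.set(id, created)
    return created
  }

  get size(): number {
    return this.partitions.size
  }
}
```

Serializing with sorted field names and ISO dates means two queries that reach the same partition share one sub-collection instead of downloading it twice.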

For an Electric collection it would look like:

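// plus the createPartitionedCollection, datePart, todoSchema and Todo imports from the previous example
import { electricCollectionOptions } from '@tanstack/electric-db-collection'
import { addDays } from 'date-fns'

export const todos = createPartitionedCollection<Todo>({
  id: 'todos',

  partitionBy: (row) => ({
    archived: row.archived,
    createdDay: datePart('day', row.created),
  }),

  partitionOptions: ({ partitionKey, parentId }) =>
    electricCollectionOptions({  // <=== this is a standard electricCollectionOptions but inside a callback
      id: `${parentId}:${partitionKey.archived}:${partitionKey.createdDay.toISOString()}`,
      shapeOptions: {
        url: 'https://example.com/v1/shape',
        params: {
          table: 'todos',
          where: 'archived = $1 and created >= $2 and created < $3',
          // positional where params are passed to Electric as strings
          params: [
            String(partitionKey.archived),
            partitionKey.createdDay.toISOString(),
            addDays(partitionKey.createdDay, 1).toISOString(),
          ],
        },
      },
      getKey: (t) => t.id,
      schema: todoSchema,
    }),
})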
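Because the partition key holds a boolean and a Date, the Electric example needs them converted before they go into the shape params. The helper below is a hypothetical sketch of that conversion, assuming positional where params are sent as strings and that createdDay is already truncated to UTC midnight; todoPartitionShapeParams and todoPartitionId are illustrative names only.

```typescript
type TodoPartitionKey = { archived: boolean; createdDay: Date }

const DAY_MS = 24 * 60 * 60 * 1000

// Build the shape params for one day partition (assumes createdDay is UTC midnight,
// so adding 24 hours lands exactly on the next UTC midnight).
function todoPartitionShapeParams(key: TodoPartitionKey) {
  const from = key.createdDay
  const to = new Date(from.getTime() + DAY_MS)
  return {
    table: 'todos',
    where: 'archived = $1 and created >= $2 and created < $3',
    params: [String(key.archived), from.toISOString(), to.toISOString()],
  }
}

// Stable, timezone-independent id for the partition's collection.
function todoPartitionId(parentId: string, key: TodoPartitionKey): string {
  return `${parentId}:${key.archived}:${key.createdDay.toISOString()}`
}
```

Using toISOString rather than interpolating the Date directly keeps the id independent of the client's local timezone.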

There are a few things to consider here:

  1. getKey and schema probably belong in the parent collection config rather than in each partition.
  2. Type inference for the partitionBy expression builder would require the outer config options to be typed with the schema; this may be complex, or very simple.
  3. What do we do when a query doesn't provide enough bounds to infer all the partitions it covers? Can we infer an upper/lower bound, or provide defaults?
  4. Complex queries with multiple nested and/or clauses could require some complex code to work out which partitions are covered, although I think this is all very doable.
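Items 3 and 4 can be explored with a small model. The sketch below assumes a simplified expression tree containing only range conditions on created, combined with and/or; Range, Expr, toRanges and dayPartitions are hypothetical, and falling back to caller-supplied default bounds is just one possible answer to item 3.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000

// Half-open range: from <= created < to. An undefined bound means the query left it open.
type Range = { from?: Date; to?: Date }

// A simplified stand-in for the where-clause AST, restricted to `created`.
type Expr =
  | { op: 'range'; range: Range }
  | { op: 'and'; args: Expr[] }
  | { op: 'or'; args: Expr[] }

function maxDate(a?: Date, b?: Date): Date | undefined {
  if (!a) return b
  if (!b) return a
  return a.getTime() >= b.getTime() ? a : b
}

function minDate(a?: Date, b?: Date): Date | undefined {
  if (!a) return b
  if (!b) return a
  return a.getTime() <= b.getTime() ? a : b
}

// `or` keeps each alternative as its own range; `and` intersects every combination.
function toRanges(expr: Expr): Range[] {
  if (expr.op === 'range') return [expr.range]
  if (expr.op === 'or') return expr.args.flatMap(toRanges)
  return expr.args.map(toRanges).reduce<Range[]>(
    (acc, next) =>
      acc.flatMap((a) => next.map((b) => ({ from: maxDate(a.from, b.from), to: minDate(a.to, b.to) }))),
    [{}],
  )
}

// Missing bounds fall back to caller-supplied defaults, then each range is cut into UTC days.
function dayPartitions(expr: Expr, defaults: { from: Date; to: Date }): Set<string> {
  const days = new Set<string>()
  for (const range of toRanges(expr)) {
    const from = range.from ?? defaults.from
    const to = range.to ?? defaults.to
    if (from.getTime() >= to.getTime()) continue // empty range, e.g. a contradictory `and`
    let t = Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate())
    for (; t < to.getTime(); t += DAY_MS) days.add(new Date(t).toISOString())
  }
  return days
}
```

Expanding and over or this way can grow combinatorially for deeply nested queries, so a real implementation would probably merge overlapping ranges as it goes.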

There is also a crossover with the functionality of React Query infinite queries. These provide a way to page through an unbounded list of results, but they don't necessarily define a firm start and end for each page that can be inferred from a query. I think there are two ways we could use them:

  1. To progressively load results for a query with an orderBy, starting from the beginning and paging through until we reach the limit or bound set in the query.
  2. For standard collections, as a way to load very large collections split across multiple requests.
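Option 1 amounts to a paging loop that stops once the query's limit is met or the source is exhausted. The sketch below assumes a cursor-based page function; loadUntilLimit, fetchPage and Page are hypothetical names standing in for an infinite query's page fetcher, not TanStack Query APIs.

```typescript
type Page<T, C> = { items: T[]; nextCursor: C | undefined }

// Fetch pages in order, handing each batch to onPage, until `limit` rows have
// been loaded or the source reports no next cursor. Returns the number of pages fetched.
async function loadUntilLimit<T, C>(
  fetchPage: (cursor: C | undefined) => Promise<Page<T, C>>,
  limit: number,
  onPage: (items: T[]) => void, // e.g. write the rows into the collection as they arrive
): Promise<number> {
  let loaded = 0
  let pages = 0
  let cursor: C | undefined = undefined
  while (loaded < limit) {
    const page: Page<T, C> = await fetchPage(cursor)
    pages++
    const take = page.items.slice(0, limit - loaded) // don't overshoot the limit
    onPage(take)
    loaded += take.length
    if (page.nextCursor === undefined) break // source exhausted
    cursor = page.nextCursor
  }
  return pages
}
```

Passing Infinity as the limit turns the same loop into option 2, loading an entire large collection across multiple requests.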

How React Query infinite queries fit in needs more thought.
