Description
A very common use case, and question, is how to handle collections that you don't want to download in full, such as issues in an issue tracker, where you would download by project/status/createdDate etc. The current solution is to create a separate collection for each part you want to download and to manage them all yourself.
I would like to propose "Partitioned Collections": collections that are automatically partitioned in the background and progressively download data as you query them. With the issue tracker example you would have a single issuesCollection, and as you query it (or preload it), it will download the partitions, however many there are, that are covered by the query.
For example, say it's partitioned by archived and by the "month" of its created value. When you perform this query:
await todos.preload((q) =>
  q.from({ t: todos })
    .where(({ t }) =>
      and(
        eq(t.archived, true),
        and(
          gte(t.created, new Date('2025-06-01')),
          lt(t.created, new Date()) // today
        )
      )
    )
)
the collection will see that it is partitioned by both archived and created (by month) and ensure that the covered partitions are downloaded.
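As a rough illustration of that resolution step, here is a minimal sketch that enumerates the month partitions touched by a half-open date range. The `MonthPartition` shape and the `coveredMonthPartitions` helper are hypothetical names invented for this example, and months are computed in UTC as an assumption; nothing here is part of the existing API.

```typescript
// Hypothetical partition key for the example above: archived flag + calendar month.
type MonthPartition = { archived: boolean; month: string } // month as "YYYY-MM" (UTC)

// Enumerate the month partitions touched by the half-open range [from, to).
function coveredMonthPartitions(
  archived: boolean,
  from: Date,
  to: Date
): MonthPartition[] {
  const partitions: MonthPartition[] = []
  if (from.getTime() >= to.getTime()) return partitions // empty range covers nothing

  // Start at the month containing `from` (UTC, to avoid timezone drift).
  let year = from.getUTCFullYear()
  let month = from.getUTCMonth() // 0-based

  // Keep adding months while the first instant of the month is before `to`.
  while (Date.UTC(year, month, 1) < to.getTime()) {
    partitions.push({
      archived,
      month: `${year}-${String(month + 1).padStart(2, '0')}`,
    })
    month += 1
    if (month === 12) {
      month = 0
      year += 1
    }
  }
  return partitions
}

const covered = coveredMonthPartitions(
  true,
  new Date('2025-06-01T00:00:00Z'),
  new Date('2025-08-15T00:00:00Z')
)
console.log(covered.map((p) => p.month)) // ["2025-06", "2025-07", "2025-08"]
```

Each returned entry would correspond to one partition that needs to be loaded before the query can be answered.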
I have been considering the API for defining these collections, with these objectives:
- must be as similar to the existing API as possible
- reuse the same collection option factories
- have a simple way to define the partitioning, ideally using the same expression API we have for queries and the new indexing functionality (aside: this needs documenting).
This is what I am thinking:
import { createPartitionedCollection, datePart } from '@tanstack/db'
import { queryCollectionOptions } from '@tanstack/query-db-collection'
import { addDays } from 'date-fns' // assumed source of addDays; any equivalent helper works
import { todoSchema } from './schema'

// datePart('day', date) -> Date at 00:00:00 of that day
// this is a new expression function; we would model this and similar additions on the standard SQL functions
export const todos = createPartitionedCollection<Todo>({
  id: 'todos',
  // each distinct { archived, createdDay } pair identifies one partition
  partitionBy: (row) => ({
    archived: row.archived,
    createdDay: datePart('day', row.created),
  }),
  partitionOptions: ({ partitionKey, parentId }) =>
    queryCollectionOptions({ // <=== this is a standard queryCollectionOptions but inside a callback
      queryKey: [parentId, partitionKey.archived, partitionKey.createdDay.getTime()],
      queryFn: async () => {
        // fetch only the rows that fall inside this partition's one-day window
        const from = partitionKey.createdDay
        const to = addDays(from, 1)
        const res = await fetch(
          `/api/todos?archived=${partitionKey.archived}` +
            `&from=${from.toISOString()}` +
            `&to=${to.toISOString()}`
        )
        return res.json()
      },
      getKey: (item) => item.id,
      schema: todoSchema,
    }),
})
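To make the intended behaviour of the proposed datePart concrete, here is a minimal sketch of date truncation as a plain function. `truncateDate`, `DateUnit`, and `partitionId` are hypothetical names for this illustration only, and truncating in UTC is an assumption; the real expression function may well differ.

```typescript
// Hypothetical runtime semantics for a datePart-style truncation (not the proposed API itself).
type DateUnit = 'year' | 'month' | 'day'

// Truncate a date to the start of the given unit, in UTC (an assumption).
function truncateDate(unit: DateUnit, date: Date): Date {
  const y = date.getUTCFullYear()
  const m = unit === 'year' ? 0 : date.getUTCMonth()
  const d = unit === 'day' ? date.getUTCDate() : 1
  return new Date(Date.UTC(y, m, d))
}

// Derive a stable string for a partition, e.g. for use in a queryKey or collection id.
function partitionId(archived: boolean, created: Date): string {
  return `${archived}:${truncateDate('day', created).toISOString()}`
}

console.log(partitionId(true, new Date('2025-06-14T17:45:00Z')))
// "true:2025-06-14T00:00:00.000Z"
```

Two rows created on the same UTC day with the same archived flag produce the same string, which is what lets them share a partition.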
For an Electric collection it would look like:
import { electricCollectionOptions } from '@tanstack/electric-db-collection'
// other imports as in the previous example

export const todos = createPartitionedCollection<Todo>({
  id: 'todos',
  partitionBy: (row) => ({
    archived: row.archived,
    createdDay: datePart('day', row.created),
  }),
  partitionOptions: ({ partitionKey, parentId }) =>
    electricCollectionOptions({ // <=== this is a standard electricCollectionOptions but inside a callback
      id: `${parentId}:${partitionKey.archived}:${partitionKey.createdDay.getTime()}`,
      shapeOptions: {
        url: 'https://example.com/v1/shape',
        params: {
          table: 'todos',
          // one shape per partition, bounded to that partition's one-day window
          where: 'archived = $1 and created >= $2 and created < $3',
          params: [
            partitionKey.archived,
            partitionKey.createdDay,
            addDays(partitionKey.createdDay, 1),
          ],
        },
      },
      getKey: (t) => t.id,
      schema: todoSchema,
    }),
})
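Both variants rely on the parent calling partitionOptions once per partition and reusing the result afterwards. One way that bookkeeping might look is sketched below; `PartitionRegistry` and its key serialization are hypothetical and only illustrate the lazy, create-on-first-use idea, not how createPartitionedCollection would actually be implemented.

```typescript
// Hypothetical sketch: a registry that creates one child per partition key on first use.
type PartitionKey = Record<string, string | number | boolean>
type PartitionFactory<T> = (args: { partitionKey: PartitionKey; parentId: string }) => T

class PartitionRegistry<TPartition> {
  private partitions = new Map<string, TPartition>()
  private parentId: string
  private create: PartitionFactory<TPartition>

  constructor(parentId: string, create: PartitionFactory<TPartition>) {
    this.parentId = parentId
    this.create = create
  }

  // Serialize with sorted field names so { a, b } and { b, a } map to the same partition.
  private serialize(key: PartitionKey): string {
    return Object.keys(key)
      .sort()
      .map((field) => `${field}=${String(key[field])}`)
      .join('&')
  }

  // Return the existing partition for this key, or create it on first access.
  get(partitionKey: PartitionKey): TPartition {
    const id = this.serialize(partitionKey)
    let partition = this.partitions.get(id)
    if (partition === undefined) {
      partition = this.create({ partitionKey, parentId: this.parentId })
      this.partitions.set(id, partition)
    }
    return partition
  }

  get size(): number {
    return this.partitions.size
  }
}

let created = 0
const registry = new PartitionRegistry('todos', ({ partitionKey, parentId }) => {
  created += 1
  return { id: `${parentId}:${JSON.stringify(partitionKey)}` }
})

registry.get({ archived: true, createdDay: '2025-06-14' })
registry.get({ createdDay: '2025-06-14', archived: true }) // same partition, reused
console.log(created, registry.size) // 1 1
```

In a real implementation the factory would return the collection built from the partition's options, and the registry would also need an eviction or garbage-collection policy, which this sketch leaves out.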
There are a few things to consider here:
- the getKey and schema likely really want to be part of the parent collection config, not the partition.
- type inference on the partitionBy expression builder would require the outer config options to be typed with the schema. This may be complex, or super simple...
- what do we do when a query doesn't fully provide the bounds to infer all partitions covered? Can we infer an upper/lower bound? Provide defaults?
- complex queries that combine multiple and(...) and or(...) expressions could result in some complex code to work out the partitions covered, although I think this is all very doable.
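The last two points can be sketched on a deliberately tiny filter language. The `Filter` tree, `Coverage` type, and `coverage` function below are hypothetical and support only equality on string fields: and intersects the candidate partition values, or unions them, and any branch that says nothing about the partition field yields 'all', which is exactly the unbounded case raised above.

```typescript
// Hypothetical, simplified filter tree: only eq / and / or.
type Filter =
  | { type: 'eq'; field: string; value: string }
  | { type: 'and'; args: Filter[] }
  | { type: 'or'; args: Filter[] }

// 'all' means the filter does not narrow this partition field.
type Coverage = 'all' | Set<string>

function intersect(a: Coverage, b: Coverage): Coverage {
  if (a === 'all') return b
  if (b === 'all') return a
  return new Set(Array.from(a).filter((v) => b.has(v)))
}

function union(a: Coverage, b: Coverage): Coverage {
  if (a === 'all' || b === 'all') return 'all'
  return new Set(Array.from(a).concat(Array.from(b)))
}

// Work out which values of `partitionField` a filter can possibly match.
function coverage(filter: Filter, partitionField: string): Coverage {
  switch (filter.type) {
    case 'eq':
      // Predicates on other fields cannot narrow the partition set.
      return filter.field === partitionField ? new Set([filter.value]) : 'all'
    case 'and': {
      let result: Coverage = 'all'
      for (const arg of filter.args) result = intersect(result, coverage(arg, partitionField))
      return result
    }
    case 'or': {
      let result: Coverage = new Set<string>()
      for (const arg of filter.args) result = union(result, coverage(arg, partitionField))
      return result
    }
  }
}

const eq = (field: string, value: string): Filter => ({ type: 'eq', field, value })

// (status = open AND project = a) OR status = closed
const bounded = coverage(
  { type: 'or', args: [{ type: 'and', args: [eq('status', 'open'), eq('project', 'a')] }, eq('status', 'closed')] },
  'status'
)
// status = open OR project = a: the second branch does not bound status
const unbounded = coverage({ type: 'or', args: [eq('status', 'open'), eq('project', 'a')] }, 'status')

console.log(bounded === 'all' ? bounded : Array.from(bounded)) // ["open", "closed"]
console.log(unbounded) // "all"
```

Range predicates such as gte and lt would need interval arithmetic instead of sets, but the and/or combination rules stay the same.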
There is also a crossover with the functionality of React Query infinite queries. These provide a way to page through an infinite list of results, but they don't necessarily define a firm start/end of a page that can be inferred from a query. I think there are two ways we could use these:
- We could use them to support progressively loading results when used with an orderBy, starting loading from the beginning and paging through until we reach the limit or bound that was set in the query.
- For standard collections support them as a way to just load very large collections, split across multiple requests.
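The first option can be sketched as a plain async loop, independent of React Query itself. The `Page` shape, the numeric cursor, `loadUntilLimit`, and the in-memory `fakeFetch` are all assumptions made for this illustration; a real integration would go through the infinite query machinery rather than this loop.

```typescript
// Hypothetical page shape, modelled loosely on cursor-based pagination.
type Page<T> = { items: T[]; nextCursor: number | null }

// Fetch pages in order until `limit` items are loaded or the source is exhausted.
async function loadUntilLimit<T>(
  fetchPage: (cursor: number) => Promise<Page<T>>,
  limit: number
): Promise<T[]> {
  const loaded: T[] = []
  let cursor: number | null = 0
  while (cursor !== null && loaded.length < limit) {
    const page: Page<T> = await fetchPage(cursor)
    loaded.push(...page.items)
    cursor = page.nextCursor
  }
  return loaded.slice(0, limit) // trim any overshoot from the final page
}

// In-memory stand-in for a paged, ordered endpoint (25 rows, 10 per page).
const source = Array.from({ length: 25 }, (_, i) => i)
const fakeFetch = async (cursor: number): Promise<Page<number>> => {
  const next = cursor + 10
  return {
    items: source.slice(cursor, next),
    nextCursor: next < source.length ? next : null,
  }
}

loadUntilLimit(fakeFetch, 12).then((items) => console.log(items.length)) // 12
```

Here a limit of 12 is satisfied after two page requests, so the third page is never fetched.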
How React Query infinite queries fit in needs more thought.