Document snapshots & synchronisation algorithm #27

ThisIsMissEm · 2024-12-23T05:37:48Z

We don't necessarily want consumers to fetch the entire history of the dataset just to subscribe to it and consume it.
Therefore we have snapshots, which are essentially the latest change for each (entity_kind, entity_key) pair: effectively a group by those columns, keeping the row with the highest change_id in each group.
However, calculating a snapshot may be expensive, so we should cache it to avoid possible denial-of-service attacks through too many snapshot requests.
Caching the snapshot means our initial fetch may already be out of date by up to the maximum cache time on a snapshot.
Therefore, after receiving a snapshot, the consumer must (or at least should) run the sync algorithm, starting from the maximum change_id in the snapshot.
Snapshots could also be set to not expire, effectively acting as a checkpoint for the dataset.
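
To make the "latest change per (entity_kind, entity_key)" idea concrete, here is a minimal sketch in Python. It is only an illustration, not the project's implementation: the `Change` class, its `payload` field and the `build_snapshot` function are hypothetical names, and it assumes change_id is an integer that increases with every change.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    # Field names mirror the columns mentioned above; `payload` is a
    # hypothetical stand-in for whatever data a change carries.
    change_id: int
    entity_kind: str
    entity_key: str
    payload: dict = field(default_factory=dict)


def build_snapshot(changes):
    """Keep only the latest change for each (entity_kind, entity_key) pair."""
    latest = {}
    for change in changes:
        key = (change.entity_kind, change.entity_key)
        current = latest.get(key)
        # Assumption: a higher change_id always means a newer change.
        if current is None or change.change_id > current.change_id:
            latest[key] = change
    # Return the surviving changes ordered by change_id.
    return sorted(latest.values(), key=lambda c: c.change_id)
```

The highest change_id in the returned list is the value a consumer would use as its starting point for synchronisation.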
Relatedly, we need to document the synchronisation algorithm, ideally in a way that can make use of HTTP caching.
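
Since the synchronisation algorithm is not documented yet, the following Python sketch shows only one way the catch-up step after a snapshot could look. Everything in it is an assumption: `sync` and `fetch_changes_since` are hypothetical names, changes are modelled as plain dicts, change_id is assumed to be a positive, increasing integer, and deletions are not modelled.

```python
def sync(snapshot, fetch_changes_since):
    """Catch up from a (possibly stale) snapshot to the current dataset.

    `fetch_changes_since(cursor)` is a hypothetical callable that returns a
    list of changes with change_id greater than `cursor`, or an empty list
    when there is nothing newer.
    """
    # Index the snapshot by (entity_kind, entity_key).
    state = {(c["entity_kind"], c["entity_key"]): c for c in snapshot}
    # Start from the maximum change_id in the snapshot (0 if it is empty).
    cursor = max((c["change_id"] for c in snapshot), default=0)

    while True:
        batch = fetch_changes_since(cursor)
        if not batch:
            break  # Nothing newer than the cursor: we are up to date.
        for change in sorted(batch, key=lambda c: c["change_id"]):
            # A newer change replaces the previous one for the same entity.
            state[(change["entity_kind"], change["entity_key"])] = change
            cursor = max(cursor, change["change_id"])

    return state, cursor
```

Because each request is identified solely by the cursor, a response for a given cursor could in principle be served from an HTTP cache, though how that would work in practice is still to be decided.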

ThisIsMissEm added the documentation (Improvements or additions to documentation) label Dec 23, 2024

ThisIsMissEm self-assigned this Dec 23, 2024
