Document snapshots & synchronisation algorithm #27

Open
ThisIsMissEm opened this issue Dec 23, 2024 · 0 comments
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@ThisIsMissEm
Contributor

  • We don't necessarily want to fetch the entire history of the dataset just to subscribe to it and consume it.
  • Therefore we have snapshots, which are essentially the latest change for each (entity_kind, entity_key) pair: effectively a group-by on those columns that keeps the row with the highest change_id.
  • However, calculating a snapshot may be expensive, so we should cache it to avoid possible denial-of-service attacks through requesting too many snapshots.
  • Caching the snapshot means our initial fetch may already be out of date by up to the maximum cache time on a snapshot.
  • Therefore, after receiving a snapshot, the consumer must (or at least should) perform the sync algorithm, starting from the maximum change_id in the snapshot.
  • Snapshots could also be set to not expire, effectively acting as a checkpoint for the dataset.
  • Relatedly, we need to document the synchronisation algorithm, ideally in a way that can make use of HTTP caching.
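
To make the points above concrete, here is a minimal Python sketch of building a snapshot and then catching up from it. It is an illustration under stated assumptions, not the project's implementation: the `Change` type, the `payload` field, the helper names, and the `changes_since` callback are all hypothetical, and the sketch assumes `change_id` is an integer that only ever increases. The actual synchronisation algorithm is still to be documented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Change:
    # Hypothetical record shape; only change_id, entity_kind and entity_key
    # come from the issue text. payload is a stand-in for the change body.
    change_id: int  # assumed to increase monotonically
    entity_kind: str
    entity_key: str
    payload: dict


Snapshot = Dict[Tuple[str, str], Change]


def build_snapshot(changes: Iterable[Change]) -> Snapshot:
    """Keep only the latest change per (entity_kind, entity_key) pair."""
    latest: Snapshot = {}
    for change in changes:
        key = (change.entity_kind, change.entity_key)
        current = latest.get(key)
        if current is None or change.change_id > current.change_id:
            latest[key] = change
    return latest


def max_change_id(snapshot: Snapshot) -> int:
    """Highest change_id in the snapshot, or 0 if the snapshot is empty."""
    return max((c.change_id for c in snapshot.values()), default=0)


def sync(snapshot: Snapshot,
         changes_since: Callable[[int], Iterable[Change]]) -> Snapshot:
    """Catch up from a possibly stale snapshot.

    changes_since(cursor) is an assumed callback that returns every change
    with change_id greater than cursor (for example, from a changes feed).
    """
    state = dict(snapshot)
    cursor = max_change_id(snapshot)
    for change in sorted(changes_since(cursor), key=lambda c: c.change_id):
        state[(change.entity_kind, change.entity_key)] = change
    return state


# A tiny in-memory change log standing in for the dataset.
log = [
    Change(1, "domain", "a.example", {"action": "block"}),
    Change(2, "domain", "b.example", {"action": "silence"}),
    Change(3, "domain", "a.example", {"action": "silence"}),
]

# The snapshot is computed once and cached...
cached = build_snapshot(log)

# ...and goes stale as new changes arrive while it sits in the cache.
log.append(Change(4, "domain", "b.example", {"action": "block"}))
log.append(Change(5, "actor", "c.example", {"action": "block"}))

# The consumer syncs from the snapshot's maximum change_id to catch up.
state = sync(cached, lambda cursor: [c for c in log if c.change_id > cursor])
```

Because the consumer always continues from the snapshot's maximum `change_id`, a stale cached snapshot costs only a slightly longer catch-up rather than missed changes, which is what makes aggressive caching (or non-expiring checkpoint snapshots) safe in this sketch.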
@ThisIsMissEm ThisIsMissEm added the documentation Improvements or additions to documentation label Dec 23, 2024
@ThisIsMissEm ThisIsMissEm self-assigned this Dec 23, 2024