Local-First Sync Engine
em is designed to be local-first, allowing the user to work fully offline without giving up collaborative editing and multi-device use when they are online. As it turns out, this is a harder problem than one might expect! After failed attempts with Firebase and YJS, it's time to take a step back and evaluate the available options based on lessons learned and a clarification of requirements. There have been fundamental advancements in CRDTs and related technology since 2018, when the project started, and a new movement has been catalyzed to solve the challenges of local-first software.
This document describes the requirements for local-first functionality in em, and contains notes on several existing local-first sync engines. The goal is to identify the sync engine that can best support the desired functionality, and craft a plan to implement it.
Read an introduction to the philosophy of local-first software.
- Full replica on every device
  - Not just a cache.
- Partial replication
  - Share a subtree for collaborative editing.
- Replication progress
  - Provide user with % completed.
- Maintain the same data across clients (eventual consistency).
- Update multiple docs atomically (transactions).
  - Or have a way to repair the tree if some updates fail.
- Conflict resolution: Unobtrusively handle conflicts if two nodes are edited concurrently.
- Maintain doubly linked parent-child connection.
- One parent per node.
- No cycles.
- Prevent subtrees becoming orphaned and inaccessible from the root node.
- Preserve ancestors (See: Deleted parent example below)
Details
In the Deleted parent example below, Client A and Client B are assumed to both be offline when they perform the add and the delete, respectively. The child created by Client A should never be deleted indirectly as a consequence of Client B deleting one of its ancestors. Otherwise Client A could go to the trouble of adding an entire subtree only for it to be deleted accidentally by Client B.

A consequence of this, I believe, is that a thought must be explicitly deleted, never deleted as a byproduct of its parent being deleted. If a user wants to delete a subtree of 10 thoughts, the system needs to issue 10 deletes. Thoughts are always deleted explicitly by the system, but the UI still offers the convenience of deleting all known descendants: when the user deletes a thought, we can lazily iterate the subtree and delete its descendants in the background, as sketched below.
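A minimal sketch of that lazy background deletion, assuming a hypothetical store API (`getChildrenIds` and `deleteThought` are placeholder names, not em's actual implementation):

```ts
// Hypothetical local store API; names are placeholders for illustration.
interface ThoughtStore {
  getChildrenIds(id: string): Promise<string[]>
  deleteThought(id: string): Promise<void>
}

/**
 * Delete a thought and all of its known descendants, issuing one explicit
 * delete per thought. Nothing is ever deleted as a byproduct of its parent
 * being deleted, so concurrent additions under this subtree are preserved
 * until they are explicitly deleted themselves.
 */
const deleteSubtree = async (store: ThoughtStore, id: string): Promise<void> => {
  const queue: string[] = [id]
  while (queue.length > 0) {
    const current = queue.shift()!
    // Enqueue known children before deleting the current thought.
    queue.push(...(await store.getChildrenIds(current)))
    await store.deleteThought(current)
  }
}
```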
Note that a thought may be a leaf on one client, but a parent on a different client before they have synced up. So we can never know for sure if a thought is a leaf on all clients. There may always be descendants from other clients that just haven't synced yet.
One idea I had was to automatically un-tombstone all ancestors whenever a thought is added or moved. That ensures it stays connected to the tree in the face of concurrent deletes. I haven't thought through how to get this to converge though.
The downside of this approach is the re-appearance of a parent after its deletion. In order to be conservative with user data and avoid thoughts being deleted unintentionally by other clients, there will sometimes be cases where you have to re-delete a thought that you already deleted. This seems like a small price to pay for ensuring that your new thoughts are never deleted when you go back online.
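A rough sketch of that un-tombstone rule, assuming each thought carries a hypothetical `tombstoned` flag and a simple `getThought`/`putThought` store (how this converges under concurrent edits is still an open question):

```ts
// Hypothetical shapes and store API; names are placeholders for illustration.
interface Thought {
  id: string
  parentId: string | null
  tombstoned: boolean
}

interface ThoughtStore {
  getThought(id: string): Promise<Thought | undefined>
  putThought(thought: Thought): Promise<void>
}

/**
 * When a thought is added or moved, walk up its ancestor chain and clear any
 * tombstones so the thought remains reachable from the root, even if another
 * client concurrently deleted one of its ancestors.
 */
const unTombstoneAncestors = async (store: ThoughtStore, parentId: string | null): Promise<void> => {
  let currentId = parentId
  while (currentId !== null) {
    const ancestor = await store.getThought(currentId)
    if (!ancestor) break
    if (ancestor.tombstoned) {
      await store.putThought({ ...ancestor, tombstoned: false })
    }
    currentId = ancestor.parentId
  }
}
```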
Concurrent add example:

- Client A: Adds child `a` to `x`.
- Client B: Adds child `b` into `x`.
- When synced, parent `x` needs to contain both `a` and `b` (see the merge sketch below).
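A sketch of the merge this implies, assuming a simple `Thought` record with a `children` array (hypothetical field names; a CRDT set type would normally provide this union automatically):

```ts
// Hypothetical thought shape; field names are illustrative only.
interface Thought {
  id: string
  parentId: string | null
  children: string[]
}

/**
 * Merge two replicas of the same parent by taking the union of their
 * children, so concurrent adds from different clients are both kept.
 */
const mergeChildren = (local: Thought, remote: Thought): Thought => ({
  ...local,
  children: [...new Set([...local.children, ...remote.children])],
})

// After syncing, x contains both a and b:
// mergeChildren(
//   { id: 'x', parentId: null, children: ['a'] },
//   { id: 'x', parentId: null, children: ['b'] },
// )
// => { id: 'x', parentId: null, children: ['a', 'b'] }
```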
Concurrent move example:

- Client A moves `x` into `a`.
  - `x.parentId = a`
  - `a.children = [x, ...]`
- Client B moves `x` into `b`.
  - `x.parentId = b`
  - `b.children = [x, ...]`
- Either operation is valid, as long as the parentId and children remain consistent (see the sketch below).
- `x` should not end up in both `a` and `b`'s children.
- If `x` ends up in `a`, then `x.parentId = a`.
- If `x` ends up in `b`, then `x.parentId = b`.
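One way to keep `parentId` and `children` consistent is to treat `parentId` as the source of truth: resolve concurrent moves of the same node (here with a naive last-writer-wins on a hypothetical timestamp) and then derive every parent's `children` from the winning `parentId` values. This is only a sketch; a proper CRDT tree move, such as the algorithm referenced in the Loro notes below, handles cycles and causality more rigorously.

```ts
// Hypothetical move record; the timestamp stands in for whatever causal
// ordering the sync engine actually provides.
interface Move {
  nodeId: string
  newParentId: string
  timestamp: number
}

/** Pick a single winning parent for a node moved concurrently on two clients. */
const resolveParent = (moves: Move[]): string =>
  moves.reduce((a, b) => (b.timestamp > a.timestamp ? b : a)).newParentId

/** Rebuild children from the winning parentId values, so a node appears under exactly one parent. */
const deriveChildren = (parentIds: Map<string, string>): Map<string, string[]> => {
  const children = new Map<string, string[]>()
  for (const [nodeId, parentId] of parentIds) {
    children.set(parentId, [...(children.get(parentId) ?? []), nodeId])
  }
  return children
}
```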
Deleted parent example:

- Client A: Adds `y` as a child of `x`.
- Client B: Deletes `x`.
- When synced, `x` must be restored, otherwise `y` will be orphaned.
- getById
  - Efficiently access a document by id.
- getByProp
  - Efficiently access a document by an indexed property.
- delete
  - Efficiently delete a large subtree.
- import
  - Efficiently import 100k+ nodes.
- replicate
  - Efficiently replicate 100k+ nodes to a new device.
- longevity
  - Should not slow down after 1M+ operations.
- Subscribe to remote changes over websocket
- Web: IndexedDB
- Mobile: SQLite (Capacitor app)
- Per document permissions
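Roughly, the operations above amount to a storage/sync interface along these lines (a hypothetical sketch for discussion, not an existing em API):

```ts
// Hypothetical interface summarizing the required operations; not an existing em API.
interface SyncEngine<T> {
  /** Efficiently access a document by id. */
  getById(id: string): Promise<T | undefined>
  /** Efficiently access documents by an indexed property. */
  getByProp(prop: string, value: unknown): Promise<T[]>
  /** Efficiently delete a large subtree rooted at the given document. */
  delete(id: string): Promise<void>
  /** Efficiently import 100k+ nodes. */
  import(docs: T[]): Promise<void>
  /** Efficiently replicate 100k+ nodes to a new device, reporting % completed. */
  replicate(onProgress: (percent: number) => void): Promise<void>
  /** Subscribe to remote changes over websocket; returns an unsubscribe function. */
  subscribe(onChange: (doc: T) => void): () => void
}
```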
Some handy lists of sync engines:
These sync engines are active candidates for use in em. Some of them have not been researched at all.
- CozoDB
  - Graph db
  - Embeddable
- Ditto
  - Control replication with queries.
  - No built-in synced event or progress percentage.
- DXOS
- Electric-next
- Fireproof
- Jazz
- Liveblocks
- Loro
  - Entire Doc loaded into memory
    - Then how can there be lazy loading?
  - For applications requiring directory-like data manipulation, Loro utilizes the algorithm from A Highly-Available Move Operation for Replicated Trees, simplifying moving and reorganizing hierarchical data structures.
  - Alpha software
  - Similar to YJS
    - Entire Doc loaded into memory
- NextGraph
- Partykit
- PowerSync
  - Partial replication provided at compile-time on the server
  - Partial replication
    - Sync buckets
  - E2E Encryption
    - For end-to-end encryption, the encrypted data can be synced using PowerSync. The data can then either be encrypted and decrypted directly in memory by the application, or a separate local-only table can be used to persist the decrypted data — allowing querying the data directly.
- RxDB
  - No transactions
  - Conflict resolution through revisions
  - SQLite plugin
- SyncProxy
- Tinybase
- Triplit
- WatermelonDB
These sync engines have been considered and ruled out for reasons described below.
- BlockSuite
  - Built on YJS
- ElectricSQL
  - Basically deprecated as they transition to electric-next
  - Sync required before query
    - Once the initial data has synced, queries can run against it.
    - But shapes can be subscribed and unsubscribed on the fly.
  - Dynamic partial replication
    - There is no support for where clauses to filter the initial target rows or select clauses to filter the include tree. As a result, current calls to db.tablename.sync({...}) will "over sync" additional data onto the device.
  - Capacity
    - thousands of rows are definitely viable, and we have that working with our own internal development tests. Hundreds of thousands or millions may cause issues right now, but are something we do want to support.
  - Size
    - ~1.1MB WASM download
      - https://news.ycombinator.com/item?id=37584522
      - Faster than 1MB JS since no parsing
- Evolu
  - Too alpha
- Gun.js
  - No full replica
    - The local instance only replicates the parts of the graph that you’ve accessed, so when you’re offline you can only read things you’ve already read. But you can update & add things without limit.
  - Known syncing issues
  - Known codebase quality issues
- Instant
  - No full replica
  - More offline-cached than offline-first
- Levelgraph
  - No replication
    - Levelgraph itself does not support replication in any way, as that is not part of the scope of the library (triple storage and queries).
    - https://github.com/levelgraph/levelgraph/issues/195#issuecomment-997205147
- Replicache
  - To be replaced by ZeroSync, which is cache-based
  - Downloads everything
- SurrealDB
  - Not offline-first
    - surrealdb.wasm
    - Replication bridge?
    - Need conflict resolution
- Verdant
  - No partial replication
  - Too alpha
  - No encryption
- YJS
  - Entire DB loaded into memory
- ZeroSync
  - No full replica (cache-based)