Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat(electric): Proposed speedup of Sent Rows Graph popping method (#…
…1389) Addresses [VAX-1977](https://linear.app/electric-sql/issue/VAX-1977/unsubscribes-really-slow-due-to-sent-rows-graph-popping-implementation) While running Linearlite multi-project, I noticed unsubscribes took a very long time, and I found that the `Graph.delete_vertices` operation is what takes the most time to run by far, not scaling very well. It doesn't make sense to me that deleting a bunch of vertices from a tree graph would take this long especially when traversing the whole graph is fast, so reconstructing it should really not be too much trouble (?), so my conjecture is that [the `delete_vertices` implementation](https://github.com/bitwalker/libgraph/blob/460cdfd9a163a533bdab0a160ba7ccf888047927/lib/graph.ex#L890-L926) which interacts with a map quite a lot is just suboptimal. The proposed change is a minimal diff to do the same thing which with the same workload I'm observing >100x improvement in speed with much better scaling properties, but obviously when you get this kind of speedup it casts doubt on its correctness. Instead of deleting vertices, we keep track of all edges to be removed, as well as vertices to be removed - then run `Graph.remove_edge` which seems to have a fairly optimal implementation for _all_ edges to be popped, and instead of removing the vertices with `Graph.delete_vertices` we use `Graph.subgraph` with a diff of the graph's existing vertices minus the ones to be removed, which should generate a maximally connected subgraph containing the specified vertices. Since the request related edges have been removed, along with the disconnected vertices, the subgraph should be (?) the same as what we were producing before, but I suspect because of [the implementation of `Graph.subgraph`](https://github.com/bitwalker/libgraph/blob/460cdfd9a163a533bdab0a160ba7ccf888047927/lib/graph.ex#L2224-L2260) being more aimed at recreating a graph from scratch through traversing the existing one rather than modifying the existing one it is more efficient. edit: I suspect also that since once a vertex that forms a subtree that is going to be removed entirely is marked as removed, constructing a subgraph ignores the whole subtree whereas `delete_vertices` will try to clean up the subtree that we're going to entirely ignore anyway (?) Even if there is an issue with the correctness of this approach, I believe we should be able to traverse the graph from the root to recreate a graph without the specified edges and vertices much faster than with `Graph.delete_vertices`, even if it requires a more custom implementation. Testing with Linearlite varying number of issues and comments, split over 3 projects/shapes. The unsubscribe `Shapes.SentRowsGraph.pop_by_request_ids` call takes: #### For 3k rows / shape (single shape) - old ~1200ms - new ~ 30ms #### For 10k rows / shape (single shape) - old ~11000ms - new ~50ms #### For 20k rows / shape (multi-shape) ##### Old - sub 1 shape unsub 1 shape ~ 46000ms - sub 2 shapes unsub 1 shape ~ 180000ms - ( didn't bother with 3 shapes) ##### New - sub 1 shape unsub 1 shape ~100ms - sub 2 shapes unsub 1 shape ~ 300ms - sub 3 shapes unsub 1 shape ~ 400ms --------- Co-authored-by: Ilia Borovitinov <[email protected]>
- Loading branch information