getting rid of the replication "stream" tables #13456

Open
matrixbot opened this issue Dec 20, 2023 · 0 comments
matrixbot commented Dec 20, 2023

This issue has been migrated from #13456.


Currently we have a number of tables in the database that exist only to record data for the replication streams. These include:

  • cache_invalidation_stream_by_instance
  • current_state_delta_stream
  • ex_outlier_stream
  • presence_stream
  • push_rules_stream

(and there may well be others).

Essentially, whenever we record a change to a table that needs to be replicated to workers, we add a row to one of these tables; the rows are then used in one of two ways:

  • ReplicationStreamer._run_notifier_loop regularly polls them (via the *Stream._update_function methods) and sends out a NOTIFY over Redis pubsub with the data from the table.
  • If a worker gets disconnected from Redis (so misses notifications), it can catch up with any missed notification by reading the relevant table itself.
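
The two uses above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Synapse's implementation: `demo_stream`, `record_change`, `get_updates`, `Notifier` and `Worker` are hypothetical names, SQLite stands in for postgres, and a plain list of subscribers stands in for Redis pubsub.

```python
import sqlite3

# Hypothetical, simplified stand-in for a stream table such as presence_stream.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_stream (stream_id INTEGER PRIMARY KEY, payload TEXT)"
)


def record_change(payload):
    """Writer side: persist a row describing the change and return its id."""
    cur = conn.execute("INSERT INTO demo_stream (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def get_updates(from_token, limit=100):
    """Loose analogue of an update function: rows newer than from_token."""
    return conn.execute(
        "SELECT stream_id, payload FROM demo_stream "
        "WHERE stream_id > ? ORDER BY stream_id LIMIT ?",
        (from_token, limit),
    ).fetchall()


class Worker:
    """Tracks its own position so that it can catch up after a disconnect."""

    def __init__(self):
        self.position = 0
        self.seen = []

    def on_row(self, stream_id, payload):
        self.seen.append(payload)
        self.position = stream_id

    def catch_up(self):
        # Second use: read the table directly for anything that was missed.
        for stream_id, payload in get_updates(self.position):
            self.on_row(stream_id, payload)


class Notifier:
    """First use: poll the table and push new rows to connected subscribers."""

    def __init__(self):
        self.last_sent = 0
        self.subscribers = []  # stand-in for Redis pubsub subscribers

    def poll_once(self):
        rows = get_updates(self.last_sent)
        for stream_id, payload in rows:
            for sub in self.subscribers:
                sub.on_row(stream_id, payload)
            self.last_sent = stream_id
        return len(rows)


def demo():
    notifier, worker = Notifier(), Worker()
    notifier.subscribers.append(worker)
    record_change("a")
    notifier.poll_once()                  # worker receives "a" via the notifier
    notifier.subscribers.remove(worker)   # worker is disconnected
    record_change("b")
    notifier.poll_once()                  # "b" is sent, but the worker misses it
    worker.catch_up()                     # worker reads the table itself
    return worker.seen
```

In this sketch the worker ends up with both changes even though it only received the first one as a notification, which is the property the stream tables currently provide.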

The reason we use this arrangement is twofold:

  1. It allows workers which miss the memo (because they were disconnected from Redis) to catch up with anything they missed.
  2. Since the Redis notifications are sent out asynchronously by _run_notifier_loop, it is possible for the "writing" process to abort between updating the database and sending the notification to Redis. Persisting the data in postgres ensures that we can replay anything that wasn't sent when we restart.
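
The second point can be illustrated with a small simulation. It is only a sketch under assumptions: `write_change`, `replay_unsent` and `CrashBeforeNotify` are hypothetical names, SQLite stands in for postgres, a list stands in for Redis, and the position of the last row known to have been sent is assumed to be available from somewhere (for example, a worker's own last-seen token).

```python
import sqlite3


class CrashBeforeNotify(Exception):
    """Simulates the writing process aborting after the commit."""


def write_change(conn, outbox, payload, crash=False):
    # Step 1: the stream row is committed to the database.
    conn.execute("INSERT INTO demo_stream (payload) VALUES (?)", (payload,))
    conn.commit()
    # Step 2 happens separately; aborting here loses the notification
    # but not the persisted row.
    if crash:
        raise CrashBeforeNotify(payload)
    outbox.append(payload)  # stand-in for sending the notification over Redis


def replay_unsent(conn, last_sent_id):
    """After a restart, re-read every row newer than the last one known sent."""
    return conn.execute(
        "SELECT stream_id, payload FROM demo_stream "
        "WHERE stream_id > ? ORDER BY stream_id",
        (last_sent_id,),
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_stream (stream_id INTEGER PRIMARY KEY, payload TEXT)"
    )
    outbox = []
    write_change(conn, outbox, "first")
    try:
        write_change(conn, outbox, "second", crash=True)
    except CrashBeforeNotify:
        pass  # the process "restarts" here
    return outbox, replay_unsent(conn, last_sent_id=1)
```

Here only "first" was ever notified, yet the row for "second" is still recoverable from the table after the simulated restart. Any replacement for the stream tables would need to close the same window some other way.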

However, I assert that these extra tables are a source of complexity, as well as increased database I/O and storage, not least because we never clear them out (matrix-org/synapse#5888). Worse, whenever we need to add a new type of replication stream, we have to add a load of extra paraphernalia in the shape of a new stream table. It would be good to consider how to get rid of them.

matrixbot changed the title from "Dummy issue" to "getting rid of the replication "stream" tables" on Dec 21, 2023
matrixbot reopened this on Dec 21, 2023