-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Remove old rows from the cache_invalidation_stream_by_instance
table automatically. (This table is not used when Synapse is configured to use SQLite.)
#15868
Conversation
Signed-off-by: Olivier Wilkinson (reivilibre) <[email protected]>
d8e5721
to
fe7c6a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor things
# Determine whether we are caught up or still catching up | ||
txn.execute( | ||
""" | ||
SELECT invalidation_ts FROM cache_invalidation_stream_by_instance | ||
WHERE stream_id > ? | ||
ORDER BY stream_id ASC | ||
LIMIT 1 | ||
""", | ||
(cutoff_stream_id,), | ||
) | ||
row = txn.fetchone() | ||
if row is None: | ||
in_backlog = False | ||
else: | ||
# We are in backlog if the next row could have been deleted | ||
# if we didn't have such a small batch size | ||
in_backlog = row[0] <= delete_up_to_millisec | ||
|
||
txn.execute( | ||
""" | ||
DELETE FROM cache_invalidation_stream_by_instance | ||
WHERE ? <= stream_id AND stream_id <= ? | ||
""", | ||
(earliest_stream_id, cutoff_stream_id), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could easily determine the outcome of the deletion.
There is this to see how many rows were deleted if we're interested in that boilerplate.
(not saying we have to use it as the current way is simple, just cumbersome).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered checking the count of deleted rows but these streams can technically have gaps so you don't actually know how many rows are to be deleted, only the upper bound.
DELETE ... RETURNING ...
was an option too but it felt less obvious to me and I wanted it to be 'obviously correct', whereas I wouldn't have as much confidence writing something that way
Co-authored-by: Eric Eastwood <[email protected]>
Co-authored-by: Eric Eastwood <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. Have you had a chance to test this on your homeserver?
Only two other thoughts spring to mind is:
- is it worth adding logging for this, so you can see the cleanup is making forward progress? I guess not; you can always
SELECT min(stream_id) FROM ...
the table. - Is this a generic mechanism/are there other streams whose data is safe to clean out periodically? I suggest you ignore this question so we can land this improvement as-is.
def _clean_up_cache_invalidation_wrapper(self) -> None: | ||
async def _clean_up_cache_invalidation_background() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be slightly clearer if we just had one function that was decorated with @wrap_as_background_process
? (Is that even equivalent? 🤷)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only learnt about this wrapper very recently, I will give it a go :)
# Determine whether we are caught up or still catching up | ||
txn.execute( | ||
""" | ||
SELECT invalidation_ts FROM cache_invalidation_stream_by_instance | ||
WHERE stream_id > ? | ||
ORDER BY stream_id ASC | ||
LIMIT 1 | ||
""", | ||
(cutoff_stream_id,), | ||
) | ||
row = txn.fetchone() | ||
if row is None: | ||
in_backlog = False | ||
else: | ||
# We are in backlog if the next row could have been deleted | ||
# if we didn't have such a small batch size | ||
in_backlog = row[0] <= delete_up_to_millisec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, but I probably would have queried for something like
SELECT stream_id FROM cache_invalidation_stream_by_instance
WHERE stream_id > ? OR invalidation_ts > ?
ORDER BY stream_id ASC
LIMIT 1
i.e. to see if the complement of rows covered by the previous query is nonempty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I genuinely do only want to look at the first row after our current marker: adding more conditions like you suggest is either going to cause more work for the DB as it'd have to scan all the remaining rows until finding one (I think?) or in the case of the OR
condition, it's going to not be able to use any index for that part of the condition.
Have to be honest I don't quite follow your suggestion's query, at first I was thinking you wanted WHERE stream_id > ? AND invalidation_ts <= ?
but altogether not sure. Ignoring that I don't fully grok what you're saying, I think OR invalidation_ts > ?
is potentially problematic as we don't have an index on invalidation_ts
— I can't see how the query planner would make a generally efficient plan for the query :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I thought you were deleting things based on both stream_id and invalidation_ts. If so, it seemed odd to not take both into account when working out if had deleted everything that you wanted to.
I see now that you're only deleting things based on stream_id, using the invalidation_ts only to select a sensible upper limit on stream_id.
Paranoid note: it is not necessarily the case that x.stream_id > y.stream_id implies x.invalidation_ts > y.invalidation_ts (consider e.g. one worker is slow to complete transactions or has a laggy clock). I wouldn't expect this to have any meaingful effect in practice because any drifts should be small.
yup! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy iff CI is.
366154b
to
ca52419
Compare
No significant changes since 1.90.0rc1. - Scope transaction IDs to devices (implement [MSC3970](matrix-org/matrix-spec-proposals#3970)). ([\matrix-org#15629](matrix-org#15629)) - Remove old rows from the `cache_invalidation_stream_by_instance` table automatically (this table is unused in SQLite). ([\matrix-org#15868](matrix-org#15868)) - Fix a long-standing bug where purging history and paginating simultaneously could lead to database corruption when using workers. ([\matrix-org#15791](matrix-org#15791)) - Fix a long-standing bug where profile endpoint returned a 404 when the user's display name was empty. ([\matrix-org#16012](matrix-org#16012)) - Fix a long-standing bug where the `synapse_port_db` failed to configure sequences for application services and partial stated rooms. ([\matrix-org#16043](matrix-org#16043)) - Fix long-standing bug with deletion in dehydrated devices v2. ([\matrix-org#16046](matrix-org#16046)) - Add `org.opencontainers.image.version` labels to Docker containers [published by Matrix.org](https://hub.docker.com/r/matrixdotorg/synapse). Contributed by Mo Balaa. ([\matrix-org#15972](matrix-org#15972), [\matrix-org#16009](matrix-org#16009)) - Add a internal documentation page describing the ["streams" used within Synapse](https://matrix-org.github.io/synapse/v1.90/development/synapse_architecture/streams.html). ([\matrix-org#16015](matrix-org#16015)) - Clarify comment on the keys/upload over replication enpoint. ([\matrix-org#16016](matrix-org#16016)) - Do not expose Admin API in caddy reverse proxy example. Contributed by @NilsIrl. ([\matrix-org#16027](matrix-org#16027)) - Remove support for legacy application service paths. ([\matrix-org#15964](matrix-org#15964)) - Move support for application service query parameter authorization behind a configuration option. ([\matrix-org#16017](matrix-org#16017)) - Update SQL queries to inline boolean parameters as supported in SQLite 3.27. ([\matrix-org#15525](matrix-org#15525)) - Allow for the configuration of the backoff algorithm for federation destinations. ([\matrix-org#15754](matrix-org#15754)) - Allow modules to check whether the current worker is configured to run background tasks. ([\matrix-org#15991](matrix-org#15991)) - Update support for [MSC3958](matrix-org/matrix-spec-proposals#3958) to match the latest revision of the MSC. ([\matrix-org#15992](matrix-org#15992)) - Allow modules to schedule delayed background calls. ([\matrix-org#15993](matrix-org#15993)) - Properly overwrite the `redacts` content-property for forwards-compatibility with room versions 1 through 10. ([\matrix-org#16013](matrix-org#16013)) - Fix building the nix development environment on MacOS systems. ([\matrix-org#16019](matrix-org#16019)) - Remove leading and trailing spaces when setting a display name. ([\matrix-org#16031](matrix-org#16031)) - Combine duplicated code. ([\matrix-org#16023](matrix-org#16023)) - Collect additional metrics from `ResponseCache` for eviction. ([\matrix-org#16028](matrix-org#16028)) - Fix endpoint improperly declaring support for MSC3814. ([\matrix-org#16068](matrix-org#16068)) - Drop backwards compat hack for event serialization. ([\matrix-org#16069](matrix-org#16069)) * Update PyYAML to 6.0.1. ([\matrix-org#16011](matrix-org#16011)) * Bump cryptography from 41.0.2 to 41.0.3. ([\matrix-org#16048](matrix-org#16048)) * Bump furo from 2023.5.20 to 2023.7.26. ([\matrix-org#16077](matrix-org#16077)) * Bump immutabledict from 2.2.4 to 3.0.0. ([\matrix-org#16034](matrix-org#16034)) * Update certifi to 2023.7.22 and pygments to 2.15.1. ([\matrix-org#16044](matrix-org#16044)) * Bump jsonschema from 4.18.3 to 4.19.0. ([\matrix-org#16081](matrix-org#16081)) * Bump phonenumbers from 8.13.14 to 8.13.18. ([\matrix-org#16076](matrix-org#16076)) * Bump regex from 1.9.1 to 1.9.3. ([\matrix-org#16073](matrix-org#16073)) * Bump serde from 1.0.171 to 1.0.175. ([\matrix-org#15982](matrix-org#15982)) * Bump serde from 1.0.175 to 1.0.179. ([\matrix-org#16033](matrix-org#16033)) * Bump serde from 1.0.179 to 1.0.183. ([\matrix-org#16074](matrix-org#16074)) * Bump serde_json from 1.0.103 to 1.0.104. ([\matrix-org#16032](matrix-org#16032)) * Bump service-identity from 21.1.0 to 23.1.0. ([\matrix-org#16038](matrix-org#16038)) * Bump types-commonmark from 0.9.2.3 to 0.9.2.4. ([\matrix-org#16037](matrix-org#16037)) * Bump types-jsonschema from 4.17.0.8 to 4.17.0.10. ([\matrix-org#16036](matrix-org#16036)) * Bump types-netaddr from 0.8.0.8 to 0.8.0.9. ([\matrix-org#16035](matrix-org#16035)) * Bump types-opentracing from 2.4.10.5 to 2.4.10.6. ([\matrix-org#16078](matrix-org#16078)) * Bump types-setuptools from 68.0.0.0 to 68.0.0.3. ([\matrix-org#16079](matrix-org#16079))
No significant changes since 1.90.0rc1. - Scope transaction IDs to devices (implement [MSC3970](matrix-org/matrix-spec-proposals#3970)). ([\matrix-org#15629](matrix-org#15629)) - Remove old rows from the `cache_invalidation_stream_by_instance` table automatically (this table is unused in SQLite). ([\matrix-org#15868](matrix-org#15868)) - Fix a long-standing bug where purging history and paginating simultaneously could lead to database corruption when using workers. ([\matrix-org#15791](matrix-org#15791)) - Fix a long-standing bug where profile endpoint returned a 404 when the user's display name was empty. ([\matrix-org#16012](matrix-org#16012)) - Fix a long-standing bug where the `synapse_port_db` failed to configure sequences for application services and partial stated rooms. ([\matrix-org#16043](matrix-org#16043)) - Fix long-standing bug with deletion in dehydrated devices v2. ([\matrix-org#16046](matrix-org#16046)) - Add `org.opencontainers.image.version` labels to Docker containers [published by Matrix.org](https://hub.docker.com/r/matrixdotorg/synapse). Contributed by Mo Balaa. ([\matrix-org#15972](matrix-org#15972), [\matrix-org#16009](matrix-org#16009)) - Add a internal documentation page describing the ["streams" used within Synapse](https://matrix-org.github.io/synapse/v1.90/development/synapse_architecture/streams.html). ([\matrix-org#16015](matrix-org#16015)) - Clarify comment on the keys/upload over replication enpoint. ([\matrix-org#16016](matrix-org#16016)) - Do not expose Admin API in caddy reverse proxy example. Contributed by @NilsIrl. ([\matrix-org#16027](matrix-org#16027)) - Remove support for legacy application service paths. ([\matrix-org#15964](matrix-org#15964)) - Move support for application service query parameter authorization behind a configuration option. ([\matrix-org#16017](matrix-org#16017)) - Update SQL queries to inline boolean parameters as supported in SQLite 3.27. ([\matrix-org#15525](matrix-org#15525)) - Allow for the configuration of the backoff algorithm for federation destinations. ([\matrix-org#15754](matrix-org#15754)) - Allow modules to check whether the current worker is configured to run background tasks. ([\matrix-org#15991](matrix-org#15991)) - Update support for [MSC3958](matrix-org/matrix-spec-proposals#3958) to match the latest revision of the MSC. ([\matrix-org#15992](matrix-org#15992)) - Allow modules to schedule delayed background calls. ([\matrix-org#15993](matrix-org#15993)) - Properly overwrite the `redacts` content-property for forwards-compatibility with room versions 1 through 10. ([\matrix-org#16013](matrix-org#16013)) - Fix building the nix development environment on MacOS systems. ([\matrix-org#16019](matrix-org#16019)) - Remove leading and trailing spaces when setting a display name. ([\matrix-org#16031](matrix-org#16031)) - Combine duplicated code. ([\matrix-org#16023](matrix-org#16023)) - Collect additional metrics from `ResponseCache` for eviction. ([\matrix-org#16028](matrix-org#16028)) - Fix endpoint improperly declaring support for MSC3814. ([\matrix-org#16068](matrix-org#16068)) - Drop backwards compat hack for event serialization. ([\matrix-org#16069](matrix-org#16069)) * Update PyYAML to 6.0.1. ([\matrix-org#16011](matrix-org#16011)) * Bump cryptography from 41.0.2 to 41.0.3. ([\matrix-org#16048](matrix-org#16048)) * Bump furo from 2023.5.20 to 2023.7.26. ([\matrix-org#16077](matrix-org#16077)) * Bump immutabledict from 2.2.4 to 3.0.0. ([\matrix-org#16034](matrix-org#16034)) * Update certifi to 2023.7.22 and pygments to 2.15.1. ([\matrix-org#16044](matrix-org#16044)) * Bump jsonschema from 4.18.3 to 4.19.0. ([\matrix-org#16081](matrix-org#16081)) * Bump phonenumbers from 8.13.14 to 8.13.18. ([\matrix-org#16076](matrix-org#16076)) * Bump regex from 1.9.1 to 1.9.3. ([\matrix-org#16073](matrix-org#16073)) * Bump serde from 1.0.171 to 1.0.175. ([\matrix-org#15982](matrix-org#15982)) * Bump serde from 1.0.175 to 1.0.179. ([\matrix-org#16033](matrix-org#16033)) * Bump serde from 1.0.179 to 1.0.183. ([\matrix-org#16074](matrix-org#16074)) * Bump serde_json from 1.0.103 to 1.0.104. ([\matrix-org#16032](matrix-org#16032)) * Bump service-identity from 21.1.0 to 23.1.0. ([\matrix-org#16038](matrix-org#16038)) * Bump types-commonmark from 0.9.2.3 to 0.9.2.4. ([\matrix-org#16037](matrix-org#16037)) * Bump types-jsonschema from 4.17.0.8 to 4.17.0.10. ([\matrix-org#16036](matrix-org#16036)) * Bump types-netaddr from 0.8.0.8 to 0.8.0.9. ([\matrix-org#16035](matrix-org#16035)) * Bump types-opentracing from 2.4.10.5 to 2.4.10.6. ([\matrix-org#16078](matrix-org#16078)) * Bump types-setuptools from 68.0.0.0 to 68.0.0.3. ([\matrix-org#16079](matrix-org#16079)) # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE8SRSDO7gYkSP4chELS76LzL74EcFAmTbUOEACgkQLS76LzL7 # 4Efskw/+J4X30PoqSvBWbilr8kTzwNSDXrkefYXR2sLburgowCyuAtKtCdbvnZUX # 3KRwii5/GDsduXiNY836oRxO/KWE43b1ce9C9qJM7V6NmgkJBgHRvnh69wdlmBqt # 6b6TQHoEByYS7yK90+QsRm1Bqrw7eoVO9oxcZ+4lb7Mjswf491Pga8kFJqdvjtTX # UXo4vAqYyP6Yn7sUrQmXy0N8gZ5ZFHhZEvZZ8+iEsNjPO468cSVGq8/iPB1EwBm2 # nbfZWMDnD2p7plJezXOPEBxnVR3RrWbCbK08SiiNMcQynCvBgAUfkd3GnsO726jb # 19i8p6tjuj2r41UgqYCTG2i2ij6uJquA/qq3rIiVNQVKG9aPHQ8hJfu9XOdEvaJe # Je9H6QFrU/hR640tFvb5Hdc/4ThabvtC5xgl4ZGT6y6I0s5LNwk8fJiz3sDFt0i1 # XKsqGVemBGopZnwjQIPFJaHjPT7of33PXLE/hf1vX+oXU/6MNbFYkDLY9nnnQeOx # 0GEbiYaxrj8SfxNmEykMLNCfxwJ719cSR1q8vPYn6r6TOS1pJMV0SgciXoaQ/VW6 # WlRpZolvXYSye34JW8Rg4ojAz0oYfJ2IiUpwY7eSEq4DtuosTjEECKrgB8DLqKlM # +qDd4/yeqVN5/sui5oGsR71aTMy/jdnzqmdmuFvsSwz9/7PfMEU= # =FWp5 # -----END PGP SIGNATURE----- # gpg: Signature made Tue Aug 15 11:18:09 2023 BST # gpg: using RSA key F124520CEEE062448FE1C8442D2EFA2F32FBE047 # gpg: Can't check signature: No public key # Conflicts: # .github/workflows/docker.yml # CHANGES.md # Cargo.lock # debian/changelog # docs/upgrade.md # flake.lock # poetry.lock # pyproject.toml # rust/src/push/base_rules.rs # synapse/events/utils.py # synapse/handlers/pagination.py # synapse/http/site.py # synapse/rest/client/devices.py # synapse/storage/databases/main/roommember.py # synapse/storage/schema/__init__.py # tests/rest/client/test_devices.py # tests/rest/client/test_redactions.py
I propose a mechanism to clean up old (> 7d old, to be very generous) rows from the
cache_invalidation_stream_by_instance
table.On LPnet, this table is 700 MB (4e6 rows) and has never been cleaned before. On Morg, this table is 191 GB (8e8 rows) but seems to have been manually cleaned in 2020.
These rows are only needed for mere instants whilst other workers catch up with the stream, so there is no utility in keeping them around for longer than a few instants.
However, to allow for clock drift, debugging and just generally out of caution, I set the cut-off to 7 days.
Fixes: #3665
Base:
develop
Original commit schedule, with full messages:
Add a cache invalidation clean-up task
Run the cache invalidation stream clean-up on the background worker
Tune down
call_later is in millis!