This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Race condition with replication means that publishing room aliases lacks read-after-write consistency between workers #14210

Open
DMRobertson opened this issue Oct 17, 2022 · 0 comments
Labels
  • A-Testing: Issues related to testing in complement, synapse, etc.
  • A-Workers: Problems related to running Synapse in Worker Mode (or replication)
  • O-Uncommon: Most users are unlikely to come across this or unexpected workflow
  • S-Tolerable: Minor significance, cosmetic issues, low or no impact to users.
  • T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
  • Z-Read-After-Write: A lack of read-after-write consistency, usually due to cache invalidation races with workers

Comments


DMRobertson commented Oct 17, 2022

Consider the following sequence of events:

  1. Alice creates a room without any aliases.
  2. Alice lists aliases for that room.
  3. Alice sets an alias for that room.
  4. Alice lists aliases for that room.

If the alias writes occur on a separate worker to the reads, this is vulnerable to a classic worker cache invalidation race:

  • (2) succeeds because the reader has no cached alias information for the room. It queries the database (which was written to before (1) completed), finds no aliases, and caches that empty result.
  • (3) succeeds on the writer, which fires off a message telling readers to invalidate their caches.
  • ⚠️ If request (4) arrives before the reader has received and processed the invalidation, the reader will return the (now stale) data in its cache. This means Alice has failed to read her own write.
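The race above can be reproduced deterministically with a toy model. This is a minimal sketch, not Synapse's actual code: the Writer and Reader classes, the shared dict standing in for the database, and the deque standing in for the replication stream are all hypothetical. The key point it shows is that the invalidation is only queued by the writer, so a read served before the reader drains that queue returns the stale cached value.

```python
from collections import deque


class Writer:
    """Hypothetical writer worker: updates the shared DB and queues invalidations."""

    def __init__(self, db, replication):
        self.db = db
        self.replication = replication

    def set_alias(self, room_id, alias):
        self.db.setdefault(room_id, []).append(alias)
        # The invalidation is only *queued*; the reader processes it later.
        self.replication.append(("invalidate", room_id))


class Reader:
    """Hypothetical reader worker with a per-room alias cache."""

    def __init__(self, db, replication):
        self.db = db
        self.replication = replication
        self.cache = {}

    def list_aliases(self, room_id):
        if room_id not in self.cache:
            # Cache miss: read from the DB and cache a copy of the result.
            self.cache[room_id] = list(self.db.get(room_id, []))
        return self.cache[room_id]

    def process_replication(self):
        # Drain queued invalidations, dropping the affected cache entries.
        while self.replication:
            _, room_id = self.replication.popleft()
            self.cache.pop(room_id, None)


db = {}
replication = deque()
writer, reader = Writer(db, replication), Reader(db, replication)

room = "!room:example.org"
db[room] = []                                 # (1) create room, no aliases
assert reader.list_aliases(room) == []        # (2) reader caches []
writer.set_alias(room, "#alice:example.org")  # (3) write + queued invalidation
stale = reader.list_aliases(room)             # (4) served before invalidation
reader.process_replication()                  # invalidation finally arrives
fresh = reader.list_aliases(room)
print(stale, fresh)  # [] ['#alice:example.org']
```

Moving the `process_replication()` call to before step (4) makes the stale read disappear, which is exactly the timing dependence described above.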

I don't think real users set an alias and then immediately list the room's aliases very often, so I suggest we don't worry about fixing this (i.e. I think this only manifests as test flakes). But I wanted to write this up as a reference. (It would be nice to have a catalogue of known races like this.)
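Since the practical impact is flaky tests, one common test-side mitigation is to poll for the expected state rather than asserting on the first read. This is a minimal Python sketch; `wait_until` is a hypothetical helper, not an existing Synapse or Complement API, and the timeout and interval values are arbitrary assumptions.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical test helper: it tolerates replication lag instead of
    asserting on the very first read after a write.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


calls = {"n": 0}


def list_aliases():
    calls["n"] += 1
    # Simulate two stale reads before the cache invalidation lands.
    return [] if calls["n"] <= 2 else ["#alice:example.org"]


wait_until(lambda: "#alice:example.org" in list_aliases(), interval=0.01)
print(calls["n"])  # 3
```

This hides the race from the test rather than fixing it, which matches the suggestion above not to fix the underlying behaviour.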

History:

See issues labeled Z-Read-After-Write (a lack of read-after-write consistency, usually due to cache invalidation races with workers).

And previous related history specifically around aliases:

DMRobertson added the A-Workers, S-Tolerable, T-Defect, O-Uncommon and Z-Read-After-Write labels Oct 17, 2022
MadLittleMods added the A-Testing label Oct 17, 2022
Projects
None yet
Development

No branches or pull requests

2 participants