Skip to content
This repository has been archived by the owner on Sep 10, 2024. It is now read-only.

Fully sync the devices with the homeserver #2971

Merged
merged 6 commits into from
Jul 16, 2024
Merged

Conversation

sandhose
Copy link
Member

@sandhose sandhose commented Jul 12, 2024

Fixes #2626

When we end session, instead of imperatively delete devices, we now sync the device list with Synapse.
This means that if two sessions have the same device, it will not wrongly delete the device.
It also means it's less likely for the device list to get out of sync.

This also makes all the devices creation synchronous, to make sure they exist on Synapse before we start using the new sessions.

This also uses per-user PG advisory locks around device syncing, to make sure we don't have race conditions when syncing those

Old rant about locking kept for posterity

One thing which I'm very not sure about is that there is potential races between the device creations and the device sync.
If there is an ongoing sync job, the sync job might build a list of devices from the database that doesn't have a new device. That device may already have been created from somewhere else on the homeserver, but haven't been committed yet to the database.
If this is the case, the device syncing job will delete from the homeserver the newly added device.

This is very unlikely to happen though, as we would need the user to login and logout at the same time. I'm not sure what the best way would be to make sure it never happens. A few possibilities I have in mind:

  • use a PG advisory lock around any device list modification
  • make device syncing/deletion only consider devices that were recently deleted (looking at the finished_at timestamp)
  • put the devices to delete in a staged zone, so that they get deleted a bit later on (30s-1min), confirming that they are indeed not in the database anymore
  • rework the job queue, so that we can schedule jobs in the future (like in 30s) and dedupe them. This way, any modification to the device list would trigger a full sync a little later, and those sync would happen only if there was no sync for that user in the last 30s
  • revert the 'synchronous' part of device creation, so that we do them through device syncs, which, assuming jobs are run one after another, should mean we don't have devices flipping from one state to another

Copy link

cloudflare-workers-and-pages bot commented Jul 12, 2024

Deploying matrix-authentication-service-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: dd2d73c
Status: ✅  Deploy successful!
Preview URL: https://84d4f02c.matrix-authentication-service-docs.pages.dev
Branch Preview URL: https://quenting-sync-devices.matrix-authentication-service-docs.pages.dev

View logs

@sandhose sandhose marked this pull request as draft July 12, 2024 16:27
sandhose added 4 commits July 15, 2024 10:19
This means Synapse won't have to provision them on the fly anymore
Because we now provision devices synchronously, we need to update the
tests so that the users exist on the fake homeserver.
@sandhose sandhose marked this pull request as ready for review July 15, 2024 12:14
@sandhose sandhose requested a review from reivilibre July 15, 2024 12:14
Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all seems reasonable.

I'm not super convinced about the pagination given we have to fetch all the rows anyway, I guess I'd normally fetch the rows in a streaming style but maybe that's just hard to do here?

@sandhose sandhose merged commit 3eab106 into main Jul 16, 2024
16 checks passed
@sandhose sandhose deleted the quenting/sync-devices branch July 29, 2024 12:06
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Devices can get out of sync between Synapse and MAS
2 participants