Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix bug in account data replication stream. #7656

Merged
merged 4 commits into from
Jun 9, 2020

Conversation

erikjohnston
Copy link
Member

This (hopefully) fixes both the root cause of stream IDs being allocated more than once and handling such cases in the replication code.

See the two commits for details.

Fixes #7617

The account data stream is shared between three tables, and the maximum
allocated ID was tracked in a dedicated table. Updating the max ID
happened outside the transaction that allocated the ID, leading to a
race where if the server was eg restarted then the same ID could be
allocated but the max ID failed to be updated, leading it to be reused.

The ID generators have support for tracking across multiple tables, so
we may as well use that instead of a dedicated table.
If the same stream ID was used in both global and room account data then
the getting udpates for the replication stream would fail due to
`heapq.merge(..)` trying to compare a `str` with a `None`. (This is
because you'd have two rows like `(534, '!room')` and `(534, None)` from
the room and global account data tables).

Fix is just to order by stream ID, since we don't rely on the ordering
beyond that. The bug where stream IDs can be reused should be fixed now,
so this case shouldn't happen going forward.

Fixes #7617
@erikjohnston erikjohnston requested a review from a team June 8, 2020 16:58
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Do we need to remove any database tables/rows as a database migration if things were previously being tracked in a dedicated table?

@erikjohnston
Copy link
Member Author

I was going to leave them in so to allow for fallback, but that won't actually work here.

@erikjohnston
Copy link
Member Author

I've added the updates back so that its possible for server admins to roll the synapse version back. Next time we update the database version we can remove the code and drop the table. We did something similar for #7436

Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good to know! Lgtm :)

@erikjohnston erikjohnston merged commit 664409b into develop Jun 9, 2020
@erikjohnston erikjohnston deleted the erikj/fix_account_data branch June 9, 2020 15:28
babolivier added a commit that referenced this pull request Jun 10, 2020
…rg-hotfixes

Synapse 1.15.0rc1 (2020-06-09)
==============================

Features
--------

- Advertise support for Client-Server API r0.6.0 and remove related unstable feature flags. ([\#6585](#6585))
- Add an option to disable autojoining rooms for guest accounts. ([\#6637](#6637))
- For SAML authentication, add the ability to pass email addresses to be added to new users' accounts via SAML attributes. Contributed by Christopher Cooper. ([\#7385](#7385))
- Add admin APIs to allow server admins to manage users' devices. Contributed by @dklimpel. ([\#7481](#7481))
- Add support for generating thumbnails for WebP images. Previously, users would see an empty box instead of preview image. ([\#7586](#7586))
- Support the standardized `m.login.sso` user-interactive authentication flow. ([\#7630](#7630))

Bugfixes
--------

- Allow new users to be registered via the admin API even if the monthly active user limit has been reached. Contributed by @dkimpel. ([\#7263](#7263))
- Fix email notifications not being enabled for new users when created via the Admin API. ([\#7267](#7267))
- Fix str placeholders in an instance of `PrepareDatabaseException`. Introduced in Synapse v1.8.0. ([\#7575](#7575))
- Fix a bug in automatic user creation during first time login with `m.login.jwt`. Regression in v1.6.0. Contributed by @olof. ([\#7585](#7585))
- Fix a bug causing the cross-signing keys to be ignored when resyncing a device list. ([\#7594](#7594))
- Fix metrics failing when there is a large number of active background processes. ([\#7597](#7597))
- Fix bug where returning rooms for a group would fail if it included a room that the server was not in. ([\#7599](#7599))
- Fix duplicate key violation when persisting read markers. ([\#7607](#7607))
- Prevent an entire iteration of the device list resync loop from failing if one server responds with a malformed result. ([\#7609](#7609))
- Fix exceptions when fetching events from a remote host fails. ([\#7622](#7622))
- Make `synctl restart` start synapse if it wasn't running. ([\#7624](#7624))
- Pass device information through to the login endpoint when using the login fallback. ([\#7629](#7629))
- Advertise the `m.login.token` login flow when OpenID Connect is enabled. ([\#7631](#7631))
- Fix bug in account data replication stream. ([\#7656](#7656))

Improved Documentation
----------------------

- Update the OpenBSD installation instructions. ([\#7587](#7587))
- Advertise Python 3.8 support in `setup.py`. ([\#7602](#7602))
- Add a link to `#synapse:matrix.org` in the troubleshooting section of the README. ([\#7603](#7603))
- Clarifications to the admin api documentation. ([\#7647](#7647))

Internal Changes
----------------

- Convert the identity handler to async/await. ([\#7561](#7561))
- Improve query performance for fetching state from a PostgreSQL database. ([\#7567](#7567))
- Speed up processing of federation stream RDATA rows. ([\#7584](#7584))
- Add comment to systemd example to show postgresql dependency. ([\#7591](#7591))
- Refactor `Ratelimiter` to limit the amount of expensive config value accesses. ([\#7595](#7595))
- Convert groups handlers to async/await. ([\#7600](#7600))
- Clean up exception handling in `SAML2ResponseResource`. ([\#7614](#7614))
- Check that all asynchronous tasks succeed and general cleanup of `MonthlyActiveUsersTestCase` and `TestMauLimit`. ([\#7619](#7619))
- Convert `get_user_id_by_threepid` to async/await. ([\#7620](#7620))
- Switch to upstream `dh-virtualenv` rather than our fork for Debian package builds. ([\#7621](#7621))
- Update CI scripts to check the number in the newsfile fragment. ([\#7623](#7623))
- Check if the localpart of a Matrix ID is reserved for guest users earlier in the registration flow, as well as when responding to requests to `/register/available`. ([\#7625](#7625))
- Minor cleanups to OpenID Connect integration. ([\#7628](#7628))
- Attempt to fix flaky test: `PhoneHomeStatsTestCase.test_performance_100`. ([\#7634](#7634))
- Fix typos of `m.olm.curve25519-aes-sha2` and `m.megolm.v1.aes-sha2` in comments, test files. ([\#7637](#7637))
- Convert user directory, state deltas, and stats handlers to async/await. ([\#7640](#7640))
- Remove some unused constants. ([\#7644](#7644))
- Fix type information on `assert_*_is_admin` methods. ([\#7645](#7645))
- Convert registration handler to async/await. ([\#7649](#7649))
phil-flex pushed a commit to phil-flex/synapse that referenced this pull request Jun 16, 2020
* Ensure account data stream IDs are unique.

The account data stream is shared between three tables, and the maximum
allocated ID was tracked in a dedicated table. Updating the max ID
happened outside the transaction that allocated the ID, leading to a
race where if the server was restarted then the same ID could be
allocated but the max ID failed to be updated, leading it to be reused.

The ID generators have support for tracking across multiple tables, so
we may as well use that instead of a dedicated table.

* Fix bug in account data replication stream.

If the same stream ID was used in both global and room account data then
the getting updates for the replication stream would fail due to
`heapq.merge(..)` trying to compare a `str` with a `None`. (This is
because you'd have two rows like `(534, '!room')` and `(534, None)` from
the room and global account data tables).

Fix is just to order by stream ID, since we don't rely on the ordering
beyond that. The bug where stream IDs can be reused should be fixed now,
so this case shouldn't happen going forward.

Fixes matrix-org#7617
babolivier pushed a commit that referenced this pull request Sep 1, 2021
…dinsic-release-v1.15.x

* 'release-v1.15.0' of github.com:matrix-org/synapse: (55 commits)
  1.15.0
  Fix some attributions
  Update CHANGES.md
  1.15.0rc1
  Revert "1.15.0rc1"
  1.15.0rc1
  Fix bug in account data replication stream. (#7656)
  Convert the registration handler to async/await. (#7649)
  Accept device information at the login fallback endpoint. (#7629)
  Convert user directory handler and related classes to async/await. (#7640)
  Add an option to disable autojoin for guest accounts (#6637)
  Clarifications to the admin api documentation (#7647)
  Update to the stable SSO prefix for UI Auth. (#7630)
  Fix type information on `assert_*_is_admin` methods (#7645)
  Remove some unused constants. (#7644)
  Typo fixes.
  Allow new users to be registered via the admin API even if the monthly active user limit has been reached (#7263)
  Add device management to admin API (#7481)
  Attempt to fix PhoneHomeStatsTestCase.test_performance_100 being flaky. (#7634)
  Support CS API v0.6.0 (#6585)
  ...
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed to handle stream account_data: TypeError: '<' not supported between instances of 'str' and 'NoneType'
2 participants