Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Huge database queries from POST /_matrix/client/v3/keys/query #13580

Closed
richvdh opened this issue Aug 22, 2022 · 2 comments · Fixed by #13956
Closed

Huge database queries from POST /_matrix/client/v3/keys/query #13580

richvdh opened this issue Aug 22, 2022 · 2 comments · Fixed by #13956
Assignees
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@richvdh
Copy link
Member

richvdh commented Aug 22, 2022

We frequently see a rapid sequence of calls to POST /_matrix/client/v3/keys/query from a single device (usually Element/iOS, afaict from eyeballing the logs).

Each request typically contain queries for a large number of user ids (like, tens of thousands), of which many/most can be on the local server. We then translate that into a single SELECT query on devices and e2e_device_keys_json with a huge WHERE clause (here). This can then consume vast amounts of disk space on the database server, because it logs the query thousands of times (it creates many temporary files, and the query is logged each time).

There are several easy improvements to be made here:

  • Firstly, the query should be split up into much smaller batches.
  • Secondly, we should use an IN query rather than a huge list of (user_id = foo) OR (user_id = bar). This is more complicated than normal because we need to query on user_id and device_id, however I believe that postgres at least supports IN for tuples. It's more complicated still because we may have a mix of queries, some which match only user_id and some which match both user_id and device_id, so we may need to split the query list in two.
  • I don't really know why we query against devices at all. AFAICT the only thing that we pull out of it is the display_name, which is not in the spec for POST /_matrix/client/v3/keys/query (though it is mentioned in the examples and currently returned in practice. Still, I don't think we should return other peoples' device display names here.) Related: Device names are returned over federation by /keys/query even if allow_device_name_lookup_over_federation is false #13114, which discusses this further.
@richvdh
Copy link
Member Author

richvdh commented Aug 22, 2022

Related: #12890

@anoadragon453 anoadragon453 added S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Aug 22, 2022
@richvdh
Copy link
Member Author

richvdh commented Aug 23, 2022

no no, it's definitely not tolerable.

@richvdh richvdh added S-Major Major functionality / product severely impaired, no satisfactory workaround. and removed S-Tolerable Minor significance, cosmetic issues, low or no impact to users. labels Aug 23, 2022
@reivilibre reivilibre added O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db labels Aug 24, 2022
squahtx pushed a commit that referenced this issue Sep 29, 2022
Instead of running a single large query, run a single query for
user-only lookups and additional queries for batches of user device
lookups.

Resolves #13580.

Signed-off-by: Sean Quah <[email protected]>
squahtx pushed a commit that referenced this issue Sep 29, 2022
Instead of running a single large query, run a single query for
user-only lookups and additional queries for batches of user device
lookups.

Resolves #13580.

Signed-off-by: Sean Quah <[email protected]>
squahtx added a commit that referenced this issue Oct 3, 2022
Instead of running a single large query, run a single query for
user-only lookups and additional queries for batches of user device
lookups.

Resolves #13580.

Signed-off-by: Sean Quah <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants