Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Experimental Federation Speedup #9702

Merged
merged 27 commits into from
Apr 14, 2021

Conversation

ShadowJonathan
Copy link
Contributor

@ShadowJonathan ShadowJonathan commented Mar 26, 2021

Bit of a continuation from #9639's speedup intent and #9651's confusion.

This basically speeds up federation by "squeezing" each individual dual database call (to destinations and destination_rooms), which previously happened per every event, into one call for an entire batch (100 max).

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
  • Pull request includes a sign off
  • Code style is correct (run the linters)

Signed-off-by: Jonathan de Jong <[email protected]>

@ShadowJonathan
Copy link
Contributor Author

(Note: Some CI tests are failing with SyntaxError: 'await' expressions in comprehensions are not supported, however, they are supported in 3.6+, with 1.31.0 being the last release that supports 3.5 (for which a release candidate is already out), i'm ignoring these tests.)

@ShadowJonathan ShadowJonathan marked this pull request as ready for review March 31, 2021 11:20
@clokep clokep requested a review from a team March 31, 2021 11:32
# Handle all events, skip if handle_event returns None
handled_events = (
ret
for ret in [await handle_event(event) for event in events]
Copy link
Contributor Author

@ShadowJonathan ShadowJonathan Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small warning: if we change this [] comprehension into a generator (()), python will start throwing errors about async iterators, because to work with an async iterator/generator in a sub-comprehension, you have to prefix every for with async for.

I had split it before i realised it didn't solve the problem, and then i set this small generator to [], which fixed it, but kept the split.

This'll be a small memory tradeoff for readability, as this'd create a list of len(events), but then the surrounding generator will filter out all None values, which will have itself take up no space until it is consumed by the dict comprehension

so this is a len(events) list of Optional[Tuple[Eventbase, Collection[str]], then comprehended down into Dict[EventBase, Collection[str]]

synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
)

# Send corresponding events to each destination queue
self._distribute_pdus(event_to_dests)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distribution happens after database access, because then a database-access-crash-loop will cause no events to be spammed to other servers, but a crash loop will also cause federation to stall, which is much more noticeable in the logs (as only the errors will be visible)

also, i think it's "safer" to not have a race condition where destination_room values are set after pdus are actually distributed

anywho, if either of these functions throw, it wont reach update_federation_out_pos, and it'll loop around with finally: and the next poke from a database sequence

room_with_dest_stream_ordering = {} # type: Dict[Tuple[str, str], int]

# List of events to send to each destination
dest_to_events = {} # type: Dict[str, List[EventBase]]
Copy link
Contributor Author

@ShadowJonathan ShadowJonathan Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually fairly cheap, the memory for EventBase is already allocated, and it being an object, it's simply referenced.

Each list takes 64 bytes up-front*, and then 8 bytes with each pointer, this means that for the absolute worst-case scenario (100 events destined for #matrix, with around 2k servers), it'll be about 1728000 bytes (2000 * (64 + (8*100))), 1.7 Mb, excluding memory use for destination strings (which are immutably referenced) and dict internals for mapping.

(sending 1 event to 2000 servers that way is 144000 bytes, 144kb, btw)

(* the 64 bytes is the up-front cost for a python list, python's objects are quite large that way. That number is for python 3.6, and in python 3.7 they're 72 bytes, but in python 3.8 they're 58 bytes. Lists use C pointers, which are 8 bytes each.)

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few initial thoughts here. It seems generally plausible.

synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/storage/databases/main/transactions.py Outdated Show resolved Hide resolved
synapse/storage/databases/main/transactions.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
Co-authored-by: Richard van der Hoff <[email protected]>
@ShadowJonathan ShadowJonathan requested a review from richvdh March 31, 2021 19:54
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/per_destination_queue.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
@ShadowJonathan ShadowJonathan requested a review from richvdh April 2, 2021 08:35
@ShadowJonathan
Copy link
Contributor Author

check-style is currently borked due to #9734 :/

@ShadowJonathan
Copy link
Contributor Author

Its fixed on develop, merged it into here to ensure that CI runs again

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is looking good, save a few outstanding nits.

Have you tried it in practice? Does it make much difference?

synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
synapse/federation/sender/per_destination_queue.py Outdated Show resolved Hide resolved
synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
@ShadowJonathan
Copy link
Contributor Author

Have you tried it in practice? Does it make much difference?

I haven't, but i also need to set up a generic "dev server" somewhere, and i may do so after/during this PR

ShadowJonathan and others added 2 commits April 9, 2021 10:11
Co-authored-by: Richard van der Hoff <[email protected]>
@ShadowJonathan ShadowJonathan requested a review from richvdh April 9, 2021 08:28
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good other than the below. I'm going to try running it on sw1v.org.

synapse/federation/sender/__init__.py Outdated Show resolved Hide resolved
@richvdh
Copy link
Member

richvdh commented Apr 14, 2021

🤷‍♂️

@richvdh richvdh enabled auto-merge (squash) April 14, 2021 15:49
@richvdh richvdh merged commit 05e8c70 into matrix-org:develop Apr 14, 2021
anoadragon453 added a commit that referenced this pull request Apr 27, 2021
anoadragon453 added a commit that referenced this pull request Apr 28, 2021
anoadragon453 added a commit that referenced this pull request Apr 28, 2021
@anoadragon453
Copy link
Member

anoadragon453 commented Apr 29, 2021

We reverted this PR before cutting 1.33.0rc1 due to some performance regressions in sending PDUs over federation. The details are in #9863.

ShadowJonathan added a commit to ShadowJonathan/synapse that referenced this pull request Apr 29, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants