Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Handling state_ids responses loads huge amounts of events into memory at once, causing OOMs #6597

Open
richvdh opened this issue Dec 24, 2019 · 6 comments
Labels
A-Federation A-Performance Performance, both client-facing and admin-facing O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@richvdh
Copy link
Member

richvdh commented Dec 24, 2019

For example: the response to matrix://matrix.org/_matrix/federation/v1/state_ids/%21jWRVQAlVGjigKCRGwS%3Amatrix.org?event_id=%241577162865947686RYsuB%3Amatrix.org includes an auth chain with 22327 events. surely there can't be that many events in $1577162865947686RYsuB:matrix.org's auth chain?

@richvdh
Copy link
Member Author

richvdh commented Dec 25, 2019

I wonder if this has been caused (or exacerbated) by us deploying #6556

@richvdh
Copy link
Member Author

richvdh commented Jan 2, 2020

right, so the problem here is that synapse is returning the auth_events for each of the events in the state of the room at the given event_id. This is consistent with the spec; however it seems to be utterly pointless. For a given event in the state, either:

  • we already have it, and will therefore know its auth_events
  • we don't have it, so will have to request it via /event which will tell us its auth events.

@richvdh
Copy link
Member Author

richvdh commented Jul 18, 2020

we don't have it, so will have to request it via /event which will tell us its auth events.

unfortunately, as of #7817, synapse won't recursively fetch auth events in this way :/

@richvdh richvdh changed the title matrix.org is serving insane state_ids responses which causes OOMs Handling state_ids responses loads huge amounts of events into memory at once, causing OOMs Jul 19, 2020
@richvdh
Copy link
Member Author

richvdh commented Jul 19, 2020

On reflection, it's correct that we return all these auth events. The problem is that we load them all into memory at once.

@richvdh
Copy link
Member Author

richvdh commented Mar 19, 2021

this might be a bit better since #9601, but it's still problematic

@MadLittleMods MadLittleMods added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-Federation labels Aug 24, 2022
@DMRobertson
Copy link
Contributor

To summarise:

  • /state_ids gives you two lists of strings: auth event ids and state event ids
  • We think that is completely sensible.
  • But loading all of these event ids into memory in order to provide that list to someone who is requesting /state_ids can be expensive.
  • If we had some way to stream in event ids from the DB and stream them out as part of the response, we could smooth out the spike in memory usage
    • Do we have an example of this? IIRC ijson is just for parsing?

Relevant bits of source:

class FederationStateIdsServlet(BaseFederationServerServlet):
PATH = "/state_ids/(?P<room_id>[^/]*)/?"
async def on_GET(
self,
origin: str,
content: Literal[None],
query: Dict[bytes, List[bytes]],
room_id: str,
) -> Tuple[int, JsonDict]:
return await self.handler.on_state_ids_request(
origin,
room_id,
parse_string_from_args(query, "event_id", None, required=True),
)

@trace
@tag_args
async def on_state_ids_request(
self, origin: str, room_id: str, event_id: str
) -> Tuple[int, JsonDict]:
if not event_id:
raise NotImplementedError("Specify an event")
await self._event_auth_handler.assert_host_in_room(room_id, origin)
origin_host, _ = parse_server_name(origin)
await self.check_server_matches_acl(origin_host, room_id)
resp = await self._state_ids_resp_cache.wrap(
(room_id, event_id),
self._on_state_ids_request_compute,
room_id,
event_id,
)
return 200, resp

@trace
@tag_args
async def _on_state_ids_request_compute(
self, room_id: str, event_id: str
) -> JsonDict:
state_ids = await self.handler.get_state_ids_for_pdu(room_id, event_id)
auth_chain_ids = await self.store.get_auth_chain_ids(room_id, state_ids)
return {"pdu_ids": state_ids, "auth_chain_ids": list(auth_chain_ids)}

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Federation A-Performance Performance, both client-facing and admin-facing O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

No branches or pull requests

4 participants