Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Faster joins: check for partial state when handling backfill #13002

Closed
richvdh opened this issue Jun 9, 2022 · 4 comments · Fixed by #13355
Closed

Faster joins: check for partial state when handling backfill #13002

richvdh opened this issue Jun 9, 2022 · 4 comments · Fixed by #13355
Assignees
Labels
A-Federated-Join joins over federation generally suck T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@richvdh
Copy link
Member

richvdh commented Jun 9, 2022

Suppose:

  • We have two chunks of room timeline, each with partial state.
  • We attempt to backfill backwards from the later chunk, and fill the gap (so we reach the earlier chunk)

I think there is a bug here whereby we will incorrectly assume we have full state at the start of the backfilled chunk, which sounds messy.

state_ids = await self._resolve_state_at_missing_prevs(origin, event)
# TODO(faster_joins): make sure that _resolve_state_at_missing_prevs does
# not return partial state
# https://github.com/matrix-org/synapse/issues/13002
await self._process_received_pdu(
origin, event, state_ids=state_ids, backfilled=backfilled
)

Part of #12646

@richvdh richvdh added A-Federated-Join joins over federation generally suck T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Jun 9, 2022
@squahtx
Copy link
Contributor

squahtx commented Jul 5, 2022

A proposed fix which calculates the partial state flag for the backfilled event:
https://github.com/matrix-org/synapse/tree/squah/faster_room_joins_calculate_partial_state_flag_for_backfill

I'll try writing a test for it, but it looks slightly tricky to set up.

@squahtx
Copy link
Contributor

squahtx commented Jul 8, 2022

It turns out that _resolve_state_at_missing_prevs would block instead of returning partial state. So I modified it to return partial state and a partial_state flag instead of blocking, in the hope that we would be able to accept incoming and backfilled federation events before we had full state.

Except now _check_for_soft_fail is blocking:

File "/usr/local/lib/python3.9/site-packages/synapse/federation/federation_server.py", line 1158, in _process_incoming_pdus_in_room_inner
  await self._federation_event_handler.on_receive_pdu(
File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 238, in on_receive_pdu
  await self._get_missing_events_for_pdu(
File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 714, in _get_missing_events_for_pdu
  await self._process_pulled_events(origin, missing_events, backfilled=False)
File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 752, in _process_pulled_events
  await self._process_pulled_event(origin, ev, backfilled=backfilled)
File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 806, in _process_pulled_event
  await self._process_received_pdu(
File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 1174, in _process_received_pdu
  await self._check_for_soft_fail(event, state_ids, origin=origin)
File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 1696, in _check_for_soft_fail
  state_sets_d = await self._state_storage_controller.get_state_groups_ids(
File "/usr/local/lib/python3.9/site-packages/synapse/storage/controllers/state.py", line 105, in get_state_groups_ids
  event_to_groups = await self.get_state_group_for_events(
File "/usr/local/lib/python3.9/site-packages/synapse/storage/controllers/state.py", line 343, in get_state_group_for_events
  await self._partial_state_events_tracker.await_full_state(event_ids)
File "/usr/local/lib/python3.9/site-packages/synapse/storage/util/partial_state_events_tracker.py", line 78, in await_full_state
  logger.info( 

and I'm stuck because it's unclear to me what should happen when we have partial state there.

@richvdh
Copy link
Member Author

richvdh commented Jul 19, 2022

and I'm stuck because it's unclear to me what should happen when we have partial state there.

I think we discussed this in a DM. To record our conclusions: for now at least, we should just skip the soft-fail check when we have partial state. Hopefully state will sync up quick enough that it won't open too much of an abuse vector, but in any case we can improve this in the future.

@squahtx
Copy link
Contributor

squahtx commented Jul 21, 2022

Associated complement tests: matrix-org/complement#419

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Federated-Join joins over federation generally suck T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants