Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Filter out auth chain queries that don't exist #16552

1 change: 1 addition & 0 deletions changelog.d/16552.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Reduce a little database load while processing state auth chains.
8 changes: 6 additions & 2 deletions synapse/storage/databases/main/event_federation.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that chain IDs are never 0. They come from a DB-backed sequence:

self.event_chain_id_gen = build_sequence_generator(
db_conn,
database.engine,
get_chain_id_txn,
"event_auth_chain_id",
table="event_auth_chains",
id_column="chain_id",
)

CREATE SEQUENCE IF NOT EXISTS event_auth_chain_id;

which per https://www.postgresql.org/docs/16/sql-createsequence.html#id-1.9.3.81.6 should start at 1 and increase by 1.

Therefore querying for events in chain 0 will never return anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, your point is that the sequence_number in the chains is never 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these are also seeded at 1 and grow by 1:

# We found a chain ID/sequence number candidate, check its
# not already taken.
proposed_new_id = existing_chain_id[0]
proposed_new_seq = existing_chain_id[1] + 1
if chain_to_max_seq_no[proposed_new_id] < proposed_new_seq:
new_chain_tuple = (
proposed_new_id,
proposed_new_seq,
)

Original file line number Diff line number Diff line change
Expand Up @@ -299,9 +299,13 @@ def _get_auth_chain_ids_using_cover_index_txn(
)

# Add the initial set of chains, excluding the sequence corresponding to
# initial event.
# initial event. Ultimately, the chains dict will be what is pulled from the
# database, there is a chance that a sequence number here will end up being a
# '0', which doesn't exist. Don't bother sending that to the database query.
for chain_id, seq_no in event_chains.items():
chains[chain_id] = max(seq_no - 1, chains.get(chain_id, 0))
max_sequence_result = max(seq_no - 1, chains.get(chain_id, 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we understand when max_sequence_result is 0?

I think I've convinced myself that sequence numbers are strictly positive, and chains is a map to sequence numbers. Therefore we would need to have seq_no == 1 and chain_id not in chains. Is there any particular meaning to this situation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this corresponds to a brand-new chain that isn't the target of some other chain. Which sounds like the kind of thing that only happens at the start of a room? (cc @erikjohnston: is any of this sane?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically handling the case where an initial event is the first thing in the chain (and nothing else references it). This is actually pretty common, as we start a new chain whenever we see a new type / state key pair.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd think I'd replace this change with something like:

        for chain_id, seq_no in event_chains.items():
            # Check if the initial event is the first item in the chain. If so, then
            # there is nothing new to add from this chain.
            if seq_no == 1:
                continue
                
            chains[chain_id] = max(seq_no - 1, chains.get(chain_id, 0))

Maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that is clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced logic in 94e22c2

Would you like me to lose that comment above the condition?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yeah, I think that comment is now redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 3eaeaff

if max_sequence_result > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if comparing to != 0 would be clearer?

Suggested change
if max_sequence_result > 0:
if max_sequence_result != 0:

chains[chain_id] = max_sequence_result
Copy link
Contributor

@DMRobertson DMRobertson Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your change, is it possible that chains can be empty after this loop? If so, will the query below fall over if we pass in an empty chains.items() to execute_values?


# Now for each chain we figure out the maximum sequence number reachable
# from *any* event ID. Events with a sequence less than that are in the
Expand Down
Loading