Initial /sync isn't as fast as it should be when LL is enabled #3720
I went ahead anyway and implemented a WIP of this over at #3726.
Fwiw the value will be a
#3726 seems to have sped things up so that LL is now about 30% faster than non-LL (based on my primary matrix.org account). However, it should be way faster than this, so i suspect something is still going wrong - it needs to be profiled.
some datapoints: testing against sync7 with limit 1 and LL on/off -
I think @matthew:matrix.org might be blowing out the state event cache for the whole synchrotron, though. I suspect we can improve this by reducing the RAM usage of the sync worker (i.e. by moving to py3) and then tuning the cache sizing.
(these figures should be taken with a pinch of salt as they were done against the live sync7, and there will be large caching artefacts). I just reran the LL one for matthew2 and it took 18s, for instance.
uncached (by waiting long enough between syncs to let the cache churn), i'm seeing 23s for non-LL and 17s for LL on matthew2, which matches the ~30% improvement I got for the matthew account.
tested again against sync-test; the original #2970 code ran 5x faster in LL than in full. However, by now it runs at the same speed with & without LL. It looks like one major factor is that:
is a massive issue for the room summary behaviour, as it loads all events from the DB and so eliminates all the benefit of trying to avoid loading membership events from the DB. Separately, incremental syncs seem to be much slower than they should be too. It feels like something else has slowed things down during the review process of #2970 - possibly the split cache isn't helping either in terms of txn count.
re-triaging given the feature doesn't work otherwise.
I haven't had a chance to profile on my account using the test synchrotron yet, but using sync7 it takes ~3m to calculate a normal initial sync but ~4.5m to calculate one with the LL filter turned on.
I'm going to assume this is because we blow out the state group cache by forcing it to reload everything from the DB, since the cache isn't able to handle wildcards (i.e. the bit where we say "get any state... other than members, which should be limited to this subset") due to #3383 (I think?).
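(For reference, a minimal sketch of the kind of filter involved - the homeserver URL and the way the filter is attached to the request are illustrative only; `lazy_load_members` is the flag introduced by the lazy-loading work:)

```python
import json
import urllib.parse

# The lazy_load_members flag in the room state filter is what makes /sync return
# all non-member state plus only the member events needed for the returned timeline.
lazy_load_filter = {
    "room": {
        "state": {
            "lazy_load_members": True,
        },
    },
}

# Illustrative only: passing the filter inline on an initial /sync request.
sync_url = (
    "https://matrix.example.org/_matrix/client/r0/sync?filter="
    + urllib.parse.quote(json.dumps(lazy_load_filter))
)
print(sync_url)
```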
I had an idea for a solution to this, though, which I wanted to jot down here for feedback:
AIUI, `DictionaryCache` currently lets you query a subset of `dict_keys` for a given cache entry - e.g. for `state_group_cache` the entry key would be a group ID, and the value would be a dict of the form `{ (m.room.member, user_id1): { state_event1 }, (m.room.member, user_id2): { state_event2 } }`. If you try to `get` the cache for a given group ID with `dict_keys` of `[(m.room.member, user_id1)]` then you'll just get that cache entry.

However: can we fix this by just splitting the cache into two halves - one for members, and the other for non-member state events? Then our query becomes one request to load all state (for non-member state, which can be cached efficiently) - and another to load specific member keys (which lacks a wildcard, and so will also be cached efficiently).
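To make the idea concrete, here's a rough sketch - not the real `DictionaryCache` implementation; `SimpleDictionaryCache`, `SplitStateGroupCache` and `get_lazy_loaded_state` are made-up names for illustration - of how splitting the state group cache into a members half and a non-members half would let a lazy-loading query be served as one wildcard-free lookup per half:

```python
# Illustrative sketch only; not the actual synapse.util.caches API.
MEMBER_TYPE = "m.room.member"


class SimpleDictionaryCache:
    """Toy dict-valued cache that can serve either a whole entry or a subset of its dict_keys."""

    def __init__(self):
        # key -> full dict of (event_type, state_key) -> event_id
        self._entries = {}

    def update(self, key, value):
        self._entries[key] = dict(value)

    def get(self, key, dict_keys=None):
        entry = self._entries.get(key)
        if entry is None:
            return None  # cache miss: caller falls back to the database
        if dict_keys is None:
            return dict(entry)  # "give me everything": no wildcard support needed
        return {k: entry[k] for k in dict_keys if k in entry}


class SplitStateGroupCache:
    """Hypothetical split cache: member and non-member state cached separately."""

    def __init__(self):
        self._members = SimpleDictionaryCache()
        self._non_members = SimpleDictionaryCache()

    def update(self, group_id, state):
        members = {k: v for k, v in state.items() if k[0] == MEMBER_TYPE}
        non_members = {k: v for k, v in state.items() if k[0] != MEMBER_TYPE}
        self._members.update(group_id, members)
        self._non_members.update(group_id, non_members)

    def get_lazy_loaded_state(self, group_id, wanted_members):
        """All non-member state, plus only the member events for wanted_members."""
        non_members = self._non_members.get(group_id) or {}
        members = self._members.get(
            group_id,
            dict_keys=[(MEMBER_TYPE, user_id) for user_id in wanted_members],
        ) or {}
        return {**non_members, **members}


# Example usage:
cache = SplitStateGroupCache()
cache.update(
    42,
    {
        ("m.room.name", ""): "$name_event",
        (MEMBER_TYPE, "@alice:example.org"): "$alice_join",
        (MEMBER_TYPE, "@bob:example.org"): "$bob_join",
    },
)
print(cache.get_lazy_loaded_state(42, ["@alice:example.org"]))
# -> the room name plus only @alice's membership event
```

The point being that the non-member half can always be asked for "everything it has" (so it never needs wildcard support), while the member half is only ever asked for exact `(m.room.member, user_id)` keys - so both halves stay cacheable.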
@richvdh, does this sound plausible?