Synchrotron causes increased CPU usage requesting non-existing users' devices #11928
Since restarting might fix it, I will just let the worker keep running for now and see if it recovers on its own.
I have now deleted those users from the
Could you try to profile the synchrotron using py-spy? It would help figure out where the CPU is spending its time. Alternatively, you could try removing the users that are causing issues from the
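For reference, a minimal py-spy invocation against the synchrotron process could look like this (the PID and output filename are placeholders):

```shell
# Live, top-like view of where the worker's CPU time goes
# (attaching may require root or CAP_SYS_PTRACE).
py-spy top --pid 12345

# Record a ~60s flame graph that can be attached to this issue.
py-spy record --pid 12345 --duration 60 --output synchrotron-flame.svg
```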
I actually already removed those users. It seems that only a specific client of mine causes the issue, because it sets the presence in the sync call. I'm still digging into why, though.
Since I don't think any update caused it, could it be Postgres switching to a different query plan?
So when you close this client, the issue disappears?
Maybe. Looking at both your Grafana screenshots and the flame graph, it looks like it's struggling on this request: synapse/synapse/storage/databases/main/roommember.py, lines 187–194 (at commit 10a88ba)
So it could be worth doing an
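In case it helps anyone reproducing this, here is a minimal sketch of checking the Postgres query plan for that request in psql. The exact query string and arguments should be taken from the SQL debug log; the statement below is only illustrative and not Synapse's literal query:

```sql
-- Illustrative only: substitute the exact query and arguments that the
-- SQL debug log shows for the roommember.py call referenced above.
EXPLAIN (ANALYZE, BUFFERS)
SELECT room_id
  FROM current_state_events
 WHERE type = 'm.room.member'
   AND state_key = '@user:example.org'   -- placeholder user ID
   AND membership = 'join';
```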
I'll do that later today; the CPU usage persists across restarts anyway...
Yes.
Oh, interesting. In that case I would suggest turning on debug logging for SQL requests for this synchrotron (see this log config doc) and hunting down the room ID (and whether there's just one problematic room or multiple ones). With debug logging on, Synapse logs the request's query string on one line and the arguments on another, so that's how you retrieve them. Using the room ID with
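Something like the following in the synchrotron's log config should do it, assuming the `synapse.storage.SQL` logger is the one carrying the query strings (double-check the logger name against the log config doc):

```yaml
# Excerpt from the worker's log config; only the logger of interest is shown.
loggers:
    synapse.storage.SQL:
        # At DEBUG, each query string is logged on one line and its bound
        # arguments on a following line.
        level: DEBUG
```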
Just to be clear, is this the only client setting the presence in the sync request? Also, re
Sorry, I completely missed that you mentioned it when I first read your first few messages 😅
No, there are a few other clients too, but not for that account; I stopped them all for debugging.
Also, that client saw no code changes related to presence in the last few months, so while it could be a client-side bug, I don't think it was an issue before.
I got a few device_list updates and a few thousand presence updates at that time, and then it stopped doing that, so that probably unbricked it.
This is the event creator log from around that time:
I think one event over federation caused this to fix itself, and it seems the event creator was what fixed it. Could it have been the missing chain IDs?
So it started happening again all of a sudden, and I think I have a useful logfile, but as it contains my whole room list I would prefer not to post it publicly. Feel free to DM me for the file; I also sent it to @babolivier already. The gist of it is that it generates a room entry for pretty much my whole room list (and also fetches all of their users). My assumption is that maybe a state reset happened in one of the rooms, which confused the sync worker and caused it to fetch more rooms than necessary. It is basically entering synapse/synapse/handlers/sync.py, line 1579 (at commit 64ec45f)
FTR #11974 may help with |
I guess I should close this, since it got fixed by just increasing the size of that specific cache. Having auto-adapting caches in the future would be great though :3
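For anyone hitting the same thing: per-cache sizes can be raised in homeserver.yaml along these lines. The cache name below is purely illustrative, since the thread doesn't spell out which cache was resized:

```yaml
caches:
  # Multiplier applied to all cache sizes; left at the default here.
  global_factor: 1.0
  per_cache_factors:
    # Illustrative name only -- replace with the cache that shows up as
    # undersized in the Grafana cache metrics.
    get_rooms_for_user: 5.0
```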
Description
My Synapse shows increased load on its sync worker:
In the logs it repeats the following:
master.log:
Synchrotron logs just show normal sync traffic:
Steps to reproduce
I have no idea, but it might be related to the spam attack servers getting deleted?
Version information
If not matrix.org:
Version: 1.52.0_rc1
Install method: ebuild