Synapse makes postgres eat all RAM until OOM #9182
I've seen similar things on my homeserver before switching my db over to much faster storage, and I believe @jcgruenhage is also currently encountering this.
Storage is Upcloud MaxIOPS, so it shouldn't be exactly sluggish, especially for a Synapse instance of this size; it is still a relatively small instance (a few tens of users and a bunch of bridges).
Yeah, no, it's definitely not storage related in my case either. My postgres DB is on an NVMe SSD, so IOPS shouldn't be a problem. I first observed this issue when migrating from Fedora (pg in docker) to Void (pg still in docker) on the host machine. Hardware didn't change, but the SSD switched from being mounted directly to having a LUKS encryption layer in between. Worth noting: all RAM means 32GB of RAM and 16GB of swap in my case. All other services stay under 8GB (total, not each).
Throwing my voice into the ring: since upgrading to 1.25.0 I've had 5 instances of the out-of-memory killer going after Synapse. Not exactly sure how to go about debugging it on my end. It's a low-spec VPS, 2 vCPU and 4GB vRAM, but I didn't have a high frequency of OOM kills prior to 1.25.0, at least not enough to require quickly writing up a monitoring script to detect Synapse being down and restart it if it is.
I'm not even sure when exactly this began; it has been quite a long time now and it is not getting better on its own :7
Added all used bridges to the issue post, just to list everything possible, while the root issue is still Synapse's postgres processes growing. :)
I've recently seen this on my homeserver (intelfx.name), yet to reproduce.
Upcloud has some of the best storage performance in the business (even better than AWS EC2's solution), it really shouldn't be sluggish. And yeah, synapse 1.25 (and even prior) absolutely guzzles down working memory, with the more active rooms (both local and remotely accessed) and users on the server.
@kaijuagenda This problem is not synapse eating RAM itself, but synapse making postgres eat all RAM.
A couple of things here:
I'd like to reiterate that for whatever odd reason it is specifically the synapse-related postgres processes that are growing uncontrollably; the others behave. And, more so, only at hacklab.fi. I'll try to dig around more on point 3 to see if I can muster something meaningful out of anywhere.
Now I, too, have started experiencing this regularly on my homeserver, intelfx.name. It needs to be noted that it runs on an absolute monster of a machine, with 128Gi RAM, which does not really help. Over the course of one or two days, the machine eats all available RAM and OOMs. This is the situation after 13h of uptime:
per-unit memory usage
systemd does not show per-unit swap usage, so here it is: per-unit swap usage
Basically, postgres causes so much memory pressure that almost everything else spills into swap. The PostgreSQL instance itself has been configured using PGTune with 32 GiB of RAM as the usage target, because I'm dumb and don't know anything better ¯\_(ツ)_/¯: postgresql.conf excerpt
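For readers unfamiliar with PGTune: the actual excerpt isn't reproduced above, but a configuration targeting 32 GiB typically looks roughly like the following. These values are assumptions for illustration only, not the settings from this server.

```sql
-- Illustrative PGTune-style values for a 32 GiB memory target
-- (assumed values, not the actual excerpt from this server).
ALTER SYSTEM SET shared_buffers = '8GB';          -- needs a full postgres restart
ALTER SYSTEM SET effective_cache_size = '24GB';
ALTER SYSTEM SET maintenance_work_mem = '2GB';
ALTER SYSTEM SET work_mem = '16MB';
SELECT pg_reload_conf();                          -- reload the other settings
```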
Now, you can say "you told it to use 32 GiB so here it is, using 32 GiB". But, looking at memory.stat for the postgres cgroup, most of that memory is used for anonymous memory (i.e. in-process allocations) rather than page cache (as it should have been): memory.stat
I have even less experience debugging PostgreSQL than @olmari, so... Yeah, help wanted.
https://dba.stackexchange.com/a/222815:
At least indirectly, this matches what I'm seeing. Synapse uses connection pooling internally, and all connections Synapse makes are long-lived:
@erikjohnston is there any way to make synapse's (twisted's?) connection pooler close database connections unconditionally (but cleanly) after a set amount of time?
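For reference, connection age can be checked from the postgres side. A minimal sketch, assuming the Synapse database is named `synapse`:

```sql
-- List Synapse's backends with their age; the long-lived ones are
-- the processes whose resident memory keeps growing.
SELECT pid,
       usename,
       state,
       now() - backend_start AS connection_age,
       now() - xact_start    AS current_txn_age
FROM pg_stat_activity
WHERE datname = 'synapse'   -- assumption: adjust to your database name
ORDER BY backend_start;
```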
So, as a dirty hack, I've just set up pgbouncer with transaction pooling mode and connection lifetime of 600s and routed synapse through that. I have no idea if anything will break due to using a connection pooler on top of synapse's built-in pooler (or whether I'll lose performance due to mixing queries from different workers within a single connection), but I'll see how it goes.
see #4473: ideally you'd find a way of changing the transaction isolation level on connections from the pgbouncer
Oh well. It's a tough choice, then: have your server thrash and OOM every 20 hours, or live with a risk of internal inconsistency.
Major update: the issue seems to be completely gone now, after observing for ~24h following a big cleanup of our HS, and whole-machine RAM RES seems to settle at just over 4G, which is perfect. I still can't say what is/was the root cause for this behaviour, but we can have educated guesses. Hacklab.fi maintenance tasks done:
I'd suspect the missing media might have been the culprit, but I still honestly can't tell for sure... While at this point I do feel the issue was an implication of something else being the root cause, it would mean that synapse doesn't know how to handle something in that situation... So at this point this ticket could also be closed, while I feel there is now important info here on what might have been the trigger for this issue, and in the long run maybe synapse could handle things more automatically, say "purging" of nonexistent media perhaps? IDK, ideas though :)
@olmari I also spent quite some time investigating this over the last few weeks. The problem is in
Adding an index with:
So we should have a proper fix soonish; I am still gathering some metrics and seeing if we can do better.
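The exact statement from this comment isn't preserved above; as an assumption based on the later discussion of the `state_groups_state` table, the index being talked about is roughly of this shape:

```sql
-- Hypothetical reconstruction of the discussed index; verify against the
-- actual fix before applying. CONCURRENTLY avoids locking the table while
-- the index is built.
CREATE INDEX CONCURRENTLY state_groups_state_type_idx
    ON state_groups_state (state_group, type, state_key);
```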
Not really. Despite me regularly performing most of those optimizations, my homeserver still suffers from the issue (I'm assuming that there is a better solution to the memory leak than just shooting off old backends).
Interesting. So the postgresql query is slower than the sqlite one and is also the one that causes the leak, and both of these are due to a missing index?
With that index created, relevant queries still take up to 100ms for me:
Should I assume this is normal?
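If you want to time such lookups yourself, a sketch (the table name is taken from the discussion above and the state group id is a placeholder):

```sql
-- Time a single state-group lookup; 12345 is a placeholder id.
EXPLAIN (ANALYZE, BUFFERS)
SELECT type, state_key, event_id
FROM state_groups_state
WHERE state_group = 12345;
```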
Interesting that things seem to get better after various clean-ups, which somewhat suggests that maybe part of the problem here is large fragmentation of the DB? I don't really understand how any of it is related to memory usage of connections though :/ FTR, the new index really shouldn't be changing much, since we already have an index that covers
Also, if you're seeing the SQLite-style queries going faster than the postgres one, then something is going catastrophically wrong with your DB. The SQLite version is the same as the recursive query, except that the loop has been lifted up into Python.
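For context, the recursive postgres query being discussed is approximately of this shape. This is paraphrased from memory of the Synapse source, so treat it as an approximation rather than the exact statement:

```sql
-- Walk the state_group_edges chain from one state group back through its
-- ancestors, then keep the most recent row per (type, state_key).
WITH RECURSIVE state(state_group) AS (
    VALUES (12345::bigint)          -- placeholder state group id
    UNION ALL
    SELECT prev_state_group
    FROM state_group_edges e, state s
    WHERE s.state_group = e.state_group
)
SELECT DISTINCT ON (type, state_key) type, state_key, event_id
FROM state_groups_state
WHERE state_group IN (SELECT state_group FROM state)
ORDER BY type, state_key, state_group DESC;
```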
Yeah, ideally such large queries would be cached, but it's easy for the Synapse caches to be too small. Making our caches more intelligent is on our roadmap. We're also wanting to implement "incremental state res" at some point, which works off deltas of state and should hopefully reduce the number of times we pull large amounts of state from the DB.
(Also FWIW I've been trying to figure out what we can do next to hunt this down, but I'm really drawing a bit of a blank 😞 )
I wasn't especially trying to tackle this problem with the cleanup, but it did indeed end up making this issue go away more or less fully in practice, which is of course awesome, even though it doesn't tell us anything about the underlying root issue(s), be the failure in Synapse or in PostgreSQL. Instead of having a timer to restart Postgres every 6h because it is eating all RAM, the server's RAM RES now grows maybe 0.3G per day or thereabouts... Way more manageable, even if something small might still be leaking in theory (and that growth covers every single program on the server, so who knows).
It seems our HS instance is starting to exhibit this behaviour again; the rate slowly keeps climbing... We do have autoretention enabled, if that has something to do with how such unreferenced indexes might get created in the first place(?) I have not wanted to do the manual PSQL index "fix" on our running production system. So, @MatMaul, any news on how things are going with the "official" fix? :)
@olmari Sorry, I haven't had the time to iterate on this. I think the safest is probably to change the code to use the non-RECURSIVE version of
However, my own homeserver has been running with the extra index for more than a month now with no drawback, outside of the space used by the index, which seems to be none? When looking at index sizes I found ~5G for the new index, however the other index on this table (
I just deleted and re-created the index to check; the size of the indexes on this table only grew by 10% (from 5GB to 5.5GB).
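For anyone wanting to check the same thing, per-index on-disk sizes can be listed like this (assuming the table in question is `state_groups_state`):

```sql
-- Show each index on state_groups_state together with its on-disk size.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'state_groups_state';
```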
So, this? #9182 (comment) :) Will do tonight, I think.
Yes!
This is probably related to abusive traffic patterns on the network which we're currently mitigating.
Are there any details on the problem and the supposed mitigation? Just like administrative problems are rarely solved by technical means, technical problems are also rarely solved by administrative means...
Indeed it seems that the mentioned abusive traffic amplifies this bug very much, so instead of "just" the attack vector itself we have this on top of it, doubling the trouble.
Semi-related to dealing with some hosts post-spam, we had one host in EMS which ended up running some very, very expensive queries. The query is:
As soon as a sync request happened after restart, the host went into a pretty much locked state. We added a synchrotron to free the master to resolve admin APIs and such, but this only caused the db load to happen via the synchrotron. All sync attempts were timing out. According to Synapse logs the query was running for up to 20 minutes. Instead of the PostgreSQL server being exhausted of RAM, it was being exhausted of CPU credits. At the time of these queries the postgres server still had plenty of credits left though, so it was not constrained on ops yet.
Before creating the index we tried upping resources by a bucketful and also increasing the caches by a lot. Some of the rooms related to the long-running queries were not high-traffic rooms like Matrix HQ. One of the long-running queries was linked to a state group for a room with only 500 members. After being pointed to this issue by @MatMaul I applied the index in #9182 (comment) . Pretty much immediately after that was created, syncs started resolving. The index was created approx 3.50pm (EEST, for the times in the below graph and possible logs reference). Some grafana graphs: Happy to provide timestamps and server name on request to help debug this. The issue with sync taking a massive amount of database time started during Monday the 28th when we were going through hosts that had users in Matrix HQ and using the admin FE APIs to try and help hosts with issues. At some point during that the host just stopped responding to all API calls (due to sync hogging the database), so the possibility of this being related to FE clearing via the admin APIs should not be ruled out (in this particular case).
I was pointed to this issue and it very much reflects the issue we've been seeing on our server over the recent few months. Of course the OS is correct in swapping out unused processes, which is what happens here. The annoying part is that
We don't see any load or performance issues, other than having to restart the service every 30GB of new swap to keep it running. Yes, the service disruption is less than a minute, but users don't like it despite it being short. Just as an additional data point: I cannot observe this on a smaller setup without workers. So maybe this is related only to a specific type of worker?
How many and what kind of workers do you have? I have 4 generic and 4 federation senders, and it seems to level off at ~6 GiB of swap after 2 days or so. Total RAM is 8 GiB, and 16 GiB of swap.
Our Workers:
We also had
For us it went up until our 32GB of swap were filled and the OOM killer started doing its work.
@MacLemon wrote:
How are you counting the number of connections? Synapse's connection pooling is implemented by the Twisted library, but it should only create a new connection when there is more room in the pool, or when an existing connection fails. Do you see any errors in the server log?
Bringing in my two cents. I added a really dirty hack to synapse to force reconnecting the connections in the pool. This keeps the memory usage stable and fairly low for me on a small HS. I expect the downside to be more IO, but with a sensible timer it would probably be irrelevant. I think the correct approach to work around runaway psql memory would be to have a configurable time/txn limit for connections in the pool until they are reconnected, and to default it to a high value (~1 hour?). Let me know if this hack does anything good in a larger setup; please configure the random reconnect time from 30 seconds up if you're running a large HS. I don't know if this works with workers.
I experience this constant memory growth until OOM on two HS instances as well, and after a while of observing things and trying out almost every trick in this thread, nothing seems to work; some tricks only lessen the issue (which I think is simply due to reducing the amount of data and hence the rate of growth). However, based on my observations it seems that the actual root cause is indeed simply having long-lived connections to postgres (which is bad, as mentioned already). Simply by forcing the connections to be closed and reopened, the memory usage drops to expected low values no matter how it's done: restarting synapse and workers, or killing the postgres backend processes responsible for the specific connections. And apparently the nice "hack" by hifi doing the same works as well, although I haven't tried it yet. Btw #5728 is loosely related to this; perhaps a fix for both would be logic that periodically goes through the connections and either closes them (if more are open than the pool minimum) or reconnects them.
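For completeness, the "kill the backend" variant of this workaround can also be done from the postgres side. A sketch, assuming the database is named `synapse`; note this is deliberately disruptive, as any in-flight use of a terminated connection fails once before Synapse's pool reconnects:

```sql
-- Terminate idle Synapse backends older than an hour (requires superuser
-- or the pg_signal_backend role); the connection pool will reconnect.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'synapse'                        -- assumption: database name
  AND state = 'idle'                             -- only backends not mid-query
  AND backend_start < now() - interval '1 hour';
```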
I feel that there is something inherently wrong or buggy in PostgreSQL (or some process leading to it) if it accumulates all the RAM despite any limits set in the configuration. Of course, if database-connection recycling cures the symptoms, I'd welcome it being used, especially if the underlying root cause can't be fixed.
Based on my understanding it's a rather "unusual" (in the world of databases) but intentional design choice to minimize performance overhead by not doing garbage collection or other freeing operations in the middle of everything else, and instead doing it all when the connection is no longer in active use, to minimize the performance impact on running things. Though this then easily leads to people not realizing this design and using long-lived connections rather than short-lived ones.
I'll have to try this (it'll take a bit of extra work because I'll have to build my own docker image). I wonder if maybe this could be a config option or something, since there are probably a lot of people (myself included) who would gladly take a bit higher IO over high memory usage.
Opened #10440 with a more mergeable, deterministic and less hacky approach. I'm running my small HS with a 1000 transaction limit, but a sensible default is likely in the ballpark of 10,000 to 100,000, as it's not really required to keep reconnecting them all the time.
We discussed this at some length in #synapse-dev yesterday. Relevant points were:
So, for whatever reason, matrix.org isn't suffering this memory bloat in quite the same way as is reported here. My main concern with the approach of just closing Postgres connections is that we are fixing the symptoms rather than understanding the problem. Using a connection pool is an extremely common technique in SQL client apps, and if Postgres expects you to close the connection every so often, then I'd expect that to be well-documented. I'd be much happier with this approach if:
I might be talking well out of my ass here (my HS stopped experiencing this problem a while ago; of course that isn't a solution, but I can't debug a problem that doesn't happen...), but could it be heap fragmentation due to use of sbrk by the underlying allocator? Try to strace one of the misbehaving postgres workers and see if it calls
I've been running #10440 for two days now, and the only memory leak I've had is with synapse itself, which is slowly accumulating but is not as bad as the postgres workers were. PostgreSQL itself has been freeing memory as configured and I haven't seen any issues so far. The total increase of memory usage is somewhere around 200MB on the whole system in two days, which is pretty much all synapse.
Thank you all for the fix! Especially @hifi.
@MatMaul Should this index stuff warrant its own issue? I mean, "obviously" it does do something (things are worse without it), so I feel this now more or less side-tracked topic shouldn't be totally forgotten either :)
FTR, we've also been suffering from a PostgreSQL leak with synapse that was solved by turning off LLVM JIT compilation (
I'm looking forward to trying the transaction limit as an alternative. Still, it would be nice to know if the people suffering from memory leaks here have similar success turning off the JIT.
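For anyone wanting to try the same: JIT can be disabled either with `jit = off` in postgresql.conf or at runtime, for example:

```sql
-- Disable LLVM JIT compilation server-wide and reload the configuration.
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();
```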
Looks like a proper fix was committed to PostgreSQL recently and should be part of the next set of minor releases (
Description
Synapse makes the related postgres processes slowly eat all RAM (16G) and swap until OOM hits.
Steps to reproduce
I don't really have anything concrete to tell here; I will describe the things I've tried later on.
Version information
When postgres and synapse are started, all synapse-related postgres processes have a RES of almost exactly 250MB; after 12h or so their RES approaches 1200MB each, with a median of 1100MB. Obviously I have tried all the normal postgres tunables, but nothing really makes a perceivable difference in this behaviour. Restarting either the synapse or postgres process makes memory go back to a normal RES, just to climb towards OOM again.
I have tried to dig into what could possibly cause this behaviour, but at this point I am out of ideas. A comparable but maybe twice as big HS, kapsi.fi, which does have redis, workers and whatnot enabled, does not experience this behaviour; its postgres processes are quite happy with pretty much the memory they have at startup.
Here is
postgres=# select datname, usename, pid, backend_start, xact_start, query_start, state_change, wait_event_type, wait_event, state, query from pg_stat_activity;
I've also tried to run
(gdb) p MemoryContextStats(TopMemoryContext)
on one such process; here are the results from startup and then when RAM is almost eaten. So, something in our instance of synapse makes it query something from postgres that never gets released, or accumulates over time, but I can't figure out what. Debugging help wanted, if there are no known things to try.
ADD: While the root issue is specifically the synapse-related postgres processes growing, we also deploy the following bridges on the same server, should they have any bearing in this equation:
mautrix-telegram
mautrix-whatsapp
mautrix-facebook
mautrix-signal
SmsMatrix bot
appservice-slack
appservice-discord