Memory leak #1625
Unfortunately I don't have the logs from the attached screenshot session. If this is a blocker for investigating the issue, then next time I will attach another screenshot and copy the logs as well. |
I have a similar setup and was surprised to see zellij take 900 MB; I assume this is not expected? |
Thank you for reporting this! Sorry we took so long to reply. I tried to reproduce this with the latest zellij version locally and I can confirm this behavior. That clearly shouldn't happen! I have a suspicion where this may originate. I'll investigate and get back to you once I've found something. |
So this turns out to be more subtle than I had imagined. But I have another question: Do you have a scrollback limit set? Or is your scrollback infinite? |
No, I don't have any limit set by default. Should I set up a limit? |
I have been encountering this issue as well (I have seen Zellij take up > 1 GB of memory). I have attempted to reproduce this synthetically by:
Note: this may be distinct from the > 1 GB memory leak reported above, or it may be the same issue. |
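(Editorial aside: the exact reproduction steps in the comment above weren't preserved. For anyone wanting to try something similar, a trivial way to generate a lot of scrollback pressure is a small program that prints a large number of distinct lines into a pane. This is only an illustration; the line count and wording are arbitrary.)

```rust
use std::io::{self, BufWriter, Write};

fn main() -> io::Result<()> {
    // Buffer stdout so we're mostly measuring zellij's scrollback handling,
    // not per-line write syscall overhead.
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());

    // Print a million distinct lines; watch the zellij server's RSS
    // (e.g. with `ps`) before and after, and again after closing the pane.
    for i in 0..1_000_000u32 {
        writeln!(out, "scrollback filler line {i}")?;
    }
    out.flush()
}
```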
Update: It appears that the memory leak may originate from here: zellij/zellij-client/src/stdin_handler.rs, line 14 (in 5ad0429). If that part of the loop block is removed, the memory leak appears to go away. More investigation is needed, though, to figure out a fix and confirm the cause of the memory leak. I'm not quite sure what […] |
I can confirm that the memory leaks are related to the scrollback. The memory is not (fully) released when the pane or tab is closed.
See also #2104 |
I did a bit of testing on this, and I'm starting to doubt that this is really a memory leak. I repeated the test above 10 times in the same zellij session and saw memory usage stay pretty much constant; if the memory were actually leaked, I would have expected usage to go up roughly 10x. Some searching around supports this idea (given that, by default, Rust on Linux uses malloc): https://stackoverflow.com/questions/45538993/why-dont-memory-allocators-actively-return-freed-memory-to-the-os/45539066#45539066. Repeating the same kinds of tests after switching to jemalloc gives different results: I observed memory usage drop significantly after closing a tab. |
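(Editorial aside: for anyone who wants to try the same comparison, here is a minimal sketch of how a Rust binary can be switched to jemalloc via the tikv-jemallocator crate. It only illustrates the technique; it is not necessarily how zellij's jemalloc branch wires it up.)

```rust
// Cargo.toml (illustrative):
// [dependencies]
// tikv-jemallocator = "0.5"

// Registering jemalloc as the global allocator makes every heap allocation
// in this binary go through jemalloc instead of the system allocator
// (glibc malloc on most Linux setups).
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
    // Allocate (and touch) a large buffer, drop it, then idle so RSS can be
    // observed from outside with `ps`. How quickly the freed memory is
    // returned to the OS depends on the allocator in use.
    let big: Vec<u8> = vec![1u8; 512 * 1024 * 1024];
    println!("allocated {} bytes", big.len());
    drop(big);
    std::thread::sleep(std::time::Duration::from_secs(60));
}
```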
I think you are correct, I have not repeated my tests often enough to average out the kind-of-nondeterministic behavior of the allocator. |
this layout causes zellij to rapidly consume all memory on the system, a recoverable condition only because I have 64 GB of RAM and it takes a bit to fill that much up. It gets to 20 GB in seconds. It is unresponsive.

```
$ stty size
45 167
```
|
moved to #2407. This issue is more about memory usage in sessions that are long-running and/or have had a lot of tabs / panes / scrollback lines. |
When I have btop running in its own tab, zellij will gobble up 3–5 GB RAM in a matter of a couple of days.
|
This is a comment regarding a sub-thread on HN where possible memory leaking is mentioned.

TL;DR: Zellij v0.37.2 is still leaking memory with btop running in a separate tab.

Setup

In a freshly launched zellij with a fairly default configuration (I believe I changed the mouse value only) I opened 3 tabs: […]

In all three tabs I exec'ed into the running program, so […]

To capture zellij's memory consumption I ran

```sh
while :; do
    ps faux \
    | rg '[z]ellij --server' \
    | timestamp \
    | tee -a zellij-memory-leak.txt
    # sleep for 1m 11s
    sleep 71
done
```

Results

[…]
2023-06-27 19:51:32 kas 894490 2.2 6.1 38396436 125172 ? Sl 19:40 0:14 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
2023-06-27 19:52:43 kas 894490 2.1 6.3 38412820 128628 ? Sl 19:40 0:15 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
2023-06-27 19:53:54 kas 894490 1.9 6.4 38412820 130676 ? Sl 19:40 0:15 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
[…]
2023-06-28 06:16:43 kas 894490 0.6 61.3 40483420 1238556 ? Sl Jun27 4:05 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
2023-06-28 06:17:54 kas 894490 0.6 61.4 40483420 1240604 ? Sl Jun27 4:06 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
2023-06-28 06:19:05 kas 894490 0.6 62.2 53066496 1257172 ? Sl Jun27 4:07 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
[…]
[ btop is killed, and then:]
[…]
2023-06-28 06:21:56 kas 894490 0.6 4.6 13220532 94736 ? Sl Jun27 4:10 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain

Initially zellij was using ~93 MB RAM, which immediately climbed in roughly 2 kB increments per loop iteration above. Interestingly, it plateaued at 148 MB after 20 minutes and stayed there for almost half an hour, but then it continued climbing (except for 5 occasions during the test period, where memory consumption decreased by between 1_664 B and 41 kB from one loop round to the next). The loop ran from 2023-06-27 @ 19:40 to 2023-06-28 @ 06:19 (i.e., 10h 39m), and at the end zellij was consuming 1_228 MB RAM (1.2 GB), all of which was released when btop was killed.

Notes

I probably wouldn't notice this on a modern machine with lots of RAM, but on a small VPS with 2 GB RAM or less, zellij will eventually be killed by the system with an OOM error. For that reason I went back to using tmux on remote machines, because zellij would be killed after a day or two since I usually run btop in a separate tab.

Non-standard software used: […]

PS: The memory I am referring to is the RSS size as reported by ps. |
PS:

```
$ stty size
50 211
$ uname -av
Linux fyhav 6.3.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000 x86_64 GNU/Linux
```

The whole thing is running on a 2 GB Linode VPS with 1 CPU, running an Arch Linux installation. I'm not comfortable with attaching zellij's debug logs, as they seem to include everything that zellij has seen during its entire runtime. (I reran the loop in a […] |
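(Editorial aside: since the report above hinges on what ps shows as RSS, here is a small, self-contained Rust sketch — purely illustrative, not part of zellij — that reads a process's own resident set size from /proc/self/status on Linux. It is essentially the same number that appears in the RSS column of the monitoring loop.)

```rust
use std::fs;

/// Read this process's resident set size (VmRSS) in kB from /proc/self/status.
/// Linux-only; returns None if the field is missing or unparsable.
fn rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        // The line looks like: "VmRSS:    94736 kB"
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}

fn main() {
    match rss_kb() {
        Some(kb) => println!("current RSS: {} kB", kb),
        None => eprintln!("could not read VmRSS (not on Linux?)"),
    }
}
```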
Hey, just to confirm: I'm reproducing this with the btop tab. Thanks for the detailed reproduction. I'll poke around and see what's up, hopefully some time this month. |
Alright - so I issued a fix for this in #2675 - it will be released in the next version. Thank you very much @kseistrup for the reproduction. This issue has become a bit of a grab bag for memory issues, some of them actual issues, others symptoms of other problems (e.g. things we need to give an error about instead of crashing). So I'm going to close it now after this fix. If anyone is still experiencing memory issues (starting from the next release), please open a separate issue and be sure to provide a detailed and minimal reproduction. Thanks everyone! |
I'm afraid the changes made in #2675 haven't had any effect on the memleak issue when running btop:

2023-08-28 10:57:06 kas 2730169 8.4 0.7 50470704 85532 ? Sl 10:51 0:26 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 10:57:06 kas 2730826 1.5 0.4 12687136 56976 ? Sl 10:52 0:04 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
2023-08-28 10:58:17 kas 2730169 7.3 0.7 50476792 91100 ? Sl 10:51 0:28 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 10:58:17 kas 2730826 1.5 0.5 12687136 61432 ? Sl 10:52 0:04 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
2023-08-28 10:59:28 kas 2730169 6.6 0.8 50478992 93504 ? Sl 10:51 0:30 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 10:59:28 kas 2730826 1.4 0.5 12687136 61432 ? Sl 10:52 0:05 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
[…]
2023-08-28 12:16:28 kas 2730169 3.0 2.2 50724728 262328 ? Sl 10:51 2:34 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 12:16:28 kas 2730826 1.2 0.5 12687200 62464 ? Sl 10:52 1:02 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
2023-08-28 12:17:39 kas 2730169 3.0 2.2 50724728 264840 ? Sl 10:51 2:36 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 12:17:39 kas 2730826 1.2 0.5 12687200 62464 ? Sl 10:52 1:03 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
2023-08-28 12:18:50 kas 2730169 3.0 2.3 50724728 268596 ? Sl 10:51 2:38 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 12:18:50 kas 2730826 1.2 0.5 12687200 62464 ? Sl 10:52 1:04 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
[ btop is killed, and then: ]
2023-08-28 12:20:01 kas 2730169 3.0 0.6 37873300 79588 ? Sl 10:51 2:40 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 12:20:01 kas 2730826 1.2 0.5 12687168 62336 ? Sl 10:52 1:05 \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy

I only ran the loop for slightly more than an hour, but the results are clear: for every loop iteration an average of 2653 bytes is lost (min: 1044, max: 5568), which is at least as much as in v0.37.2, and possibly a little worse. (The […])

Zellij v0.38.0 hasn't reached my Linux distribution yet, so I used the pre-compiled […] |
@kseistrup - thanks for giving this a test so quickly. I spent some time on this issue in the past couple of months, and for me the fixes I made solved the problem with the separate btop tab. I'm sorry to hear they didn't for you. I am growing suspicious that this is caused by the behavior of the Rust allocator, which might be a bit too reluctant to release memory here and there. If you are comfortable compiling the repo from source and giving it a try, I'm wondering if you'd be able to reproduce the issue with a different allocator: main...jemalloc

EDIT: You might want to wait a day until I finish updating […] |
I'm not sure how a […]. I also thought I wouldn't be able to compile […] |
Honestly I'm also baffled. I've been over this area with a very fine-tooth comb, and the only thing I found that helps is the linked fix in the recent version. We keep the colors in a stack-allocated structure that's essentially an elaborate "terminal character configuration". It's the same size for each character whether you have styles or not (and it's really not that big). I thought this was somehow us not dropping (and thus eventually deallocating) the terminal lines, but that wasn't it either. The particularities of this issue across machines and setups are the reason I currently suspect the allocator. Let's see. |
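(Editorial aside: to make the "same size whether you have styles or not" point concrete, here is a hypothetical, simplified model. These types are loosely inspired by the description above and are not zellij's actual definitions.)

```rust
use std::mem::size_of;

// Hypothetical stand-ins for a styled terminal cell.
#[derive(Clone, Copy)]
enum Color {
    Rgb(u8, u8, u8),
    EightBit(u8),
}

#[derive(Clone, Copy, Default)]
struct CharacterStyles {
    foreground: Option<Color>,
    background: Option<Color>,
    bold: Option<bool>,
}

#[derive(Clone, Copy)]
struct TerminalCharacter {
    character: char,
    styles: CharacterStyles,
}

fn main() {
    // The size is fixed at compile time: an unstyled character and a heavily
    // styled one occupy exactly the same number of bytes, none of it on the heap.
    println!("TerminalCharacter: {} bytes", size_of::<TerminalCharacter>());

    let _plain = TerminalCharacter { character: 'a', styles: CharacterStyles::default() };
    let _styled = TerminalCharacter {
        character: 'a',
        styles: CharacterStyles {
            foreground: Some(Color::Rgb(255, 0, 0)),
            background: Some(Color::EightBit(4)),
            bold: Some(true),
        },
    };
}
```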
The jemalloc variant started out promising: it immediately climbed from 132 MB to 162 MB and stayed there for almost 20 minutes, so that looked great. But after one hour we're at 226 MB and climbing with every loop. 😒 I'll let it run until tomorrow to see if some sort of garbage collection takes place at some point. |
PS: I believe I'm being sloppy with B, kB and MB: […] |
I may have found something - the key seems to be having the tab open but not being rendered! @kseistrup would you be able to give #2745 a try? I did a quick test and got:
|
I'm not a developer, so I hope I got it right, but I believe I have your branch up and running now. I'll be back as soon as I see a trend. Even the jemalloc approach blew up during the night: I woke up to a memory consumption that was 10 times what it was when I went to bed. |
TL;DR: The @tlinford branch does seem to act differently, but it is still leaking memory here. At first I thought it had stabilised at 154_632 (whatever unit […]). It doesn't seem to matter which tab is active: in more stable periods it doesn't change anything to switch away from […]. I hope you guys can make more sense of it than I can. |
Thanks for checking it out! I'll keep investigating :) |
It's really a strange bug. I have been monitoring another instance (with your branch) since I wrote my previous reply: it rose rather fast to 303908 and has stayed there since. I'm running the same tabs each time, so I find it very strange that the behaviour can be so different… Computer programs are never creative, so there must be something that triggers the leak. |
This is what made me suspect the allocator, which in Rust's case doesn't immediately free memory when the relevant data is dropped, and so can be particular to the whole state of its memory space (the screen thread in this case). But it's really just guessing. We're chipping away at this (I think @tlinford found a really interesting leak in the output buffer) and trust him to find whatever can be found here. We might eventually "explain" this away, but I think it's a good idea to try and track down every stray byte we cannot explain at the very least. Thank you very much for helping us out @kseistrup ! |
I had no luck with a long-running instance of @tlinford's branch: over 75 minutes it climbed steadily from 130_748 to 303_908, where it stayed for 3 hours and then climbed on. When I woke up this morning it had reached 1_421_400. So, the same pattern as before. The […] |
If this is still an issue - regarding zellij/zellij-server/src/panes/grid.rs, lines 2631 to 2649 (in f5f8521): instead of mem::replace-ing the fields with new ones, could it just .clear() the fields in self.alternate_screen_state and then mem::swap them with self.lines_above, self.viewport etc.? That way it probably doesn't need to free and then reallocate every time that code path gets hit.

Those were touched in #2675, so I'm assuming that is a hot-ish loop. This change would make no difference if there is actually a leak, but it might give the allocator a chance to catch up. |
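(Editorial aside: to illustrate the suggestion, here is a standalone sketch with made-up field names, not zellij's actual Grid code. Clearing the retained buffers and swapping them keeps their existing heap capacity around for reuse, whereas mem::replace with freshly constructed values frees the old allocations and forces the new ones to grow from scratch.)

```rust
use std::mem;

// Hypothetical stand-ins for the grid state discussed above.
#[derive(Default)]
struct AltScreenState {
    lines_above: Vec<String>,
    viewport: Vec<String>,
}

#[derive(Default)]
struct Grid {
    lines_above: Vec<String>,
    viewport: Vec<String>,
    alternate_screen_state: Option<AltScreenState>,
}

impl Grid {
    // Churn variant: stash the current buffers and continue with brand-new,
    // zero-capacity vectors. Any previously stashed buffers are dropped
    // (freed), and the new ones have to reallocate as they grow again.
    fn enter_alternate_screen_with_replace(&mut self) {
        let lines_above = mem::replace(&mut self.lines_above, Vec::new());
        let viewport = mem::replace(&mut self.viewport, Vec::new());
        self.alternate_screen_state = Some(AltScreenState { lines_above, viewport });
    }

    // Reuse variant: clear the previously stashed buffers (which keeps their
    // capacity) and swap them with the live ones, so this path frees nothing
    // and the next fill can reuse the existing allocations.
    fn enter_alternate_screen_with_swap(&mut self) {
        let alt = self.alternate_screen_state.get_or_insert_with(AltScreenState::default);
        alt.lines_above.clear();
        alt.viewport.clear();
        mem::swap(&mut self.lines_above, &mut alt.lines_above);
        mem::swap(&mut self.viewport, &mut alt.viewport);
    }
}

fn main() {
    // Exercise both paths (illustrative only).
    let mut grid = Grid::default();
    grid.lines_above = vec!["scrollback line".to_string(); 10_000];
    grid.enter_alternate_screen_with_replace();
    grid.enter_alternate_screen_with_swap();
}
```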
@tlinford did a thorough investigation of this a couple of months ago. Maybe he can help us out with a quick writeup? |
I still see a minor leak when I run |
Thanks @kseistrup - @tlinford showed me graphs that show the memory being released after a few days. I know he's very busy, so I'm going to give him some time to write this up here. |
So, it's been a while now since I looked at this, but I ran an experiment and gathered some data on it (thanks for being patient @imsnif). At least for me the results were quite eye-opening, and they challenged what I thought I knew about memory (probably not a lot to start with 😄). I started a couple of different sessions on a VPS and left them running for some days, all using this layout:
A couple of times I logged back in, reattached to each, and then left them detached again. This chart shows the memory usage over time (logged using @kseistrup's script) of these sessions, where:
Note: the peak for […]

What I found surprising in all cases is that the memory usage takes quite a long time to go down from the initial peak, but it does not seem to grow in a significant way afterwards (although the […]).

Another thing to note is that depending on how a user builds zellij, one could end up with a program that uses one of several allocators, and each of these has different algorithms and behaviors:
I'm hoping that someone else more knowledgeable about allocators can provide some insights into this, but for now my take is that one can't just look at […].

There have also been some substantial changes to the code since I ran this, so it might be worth giving another experiment like this a try. |
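(Editorial aside: one way to see more than RSS alone shows — assuming a jemalloc build, and not something the zellij codebase is confirmed to do — is to ask jemalloc itself how many bytes the application has allocated versus how much physical memory the allocator is holding, via the tikv-jemalloc-ctl crate. A minimal sketch:)

```rust
// Cargo.toml (illustrative):
// [dependencies]
// tikv-jemallocator = "0.5"
// tikv-jemalloc-ctl = "0.5"

use tikv_jemalloc_ctl::{epoch, stats};

#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
    // jemalloc caches its statistics; advancing the epoch refreshes them.
    epoch::advance().unwrap();

    // Bytes currently allocated by the application and not yet freed.
    let allocated = stats::allocated::read().unwrap();
    // Bytes of physical memory jemalloc is holding (>= allocated; the gap is
    // memory the allocator has kept around rather than returned to the OS).
    let resident = stats::resident::read().unwrap();

    println!("allocated: {allocated} B, resident: {resident} B");
}
```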
@tlinford The memory jump when re-attaching is interesting. I have only ever seen numbers from an attached session, as I “never” detach a session. (I'm using the |
Thank you for taking the time to file this issue! Please follow the instructions and fill in the missing parts below the instructions, if it is meaningful. Try to be brief and concise.

In Case of Graphical or Performance Issues

Delete the contents of /tmp/zellij-1000/zellij-log, i.e. with cd /tmp/zellij-1000/ and rm -fr zellij-log/, then run zellij --debug. Please attach the files that were created in /tmp/zellij-1000/zellij-log/ to the extent you are comfortable with.

Basic information

zellij --version: zellij 0.30.0
stty size: 46 197
uname -av or ver (Windows): Linux ip-### Wed Dec 16 22:44:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

List of programs you interact with, as PROGRAM --version (output cropped where meaningful), for example:
nvim --version: NVIM v0.5.0-dev+1299-g1c2e504d5 (used the appimage release)
alacritty --version: alacritty 0.7.2 (5ac8060b)

Further information

Reproduction steps, noticeable behavior, related issues, etc.

I have a long-running zellij instance to which I connect and disconnect often (more than once per day). After weeks of using the same session, I notice that more than 2 GB of RAM are being used by the zellij server. See the attached screenshot.