
writer is never closed #108

Closed
loic001 opened this issue Mar 20, 2018 · 12 comments · Fixed by #118

Comments

loic001 commented Mar 20, 2018

Hi,
https://github.com/lanpa/tensorboard-pytorch/blob/master/tensorboardX/record_writer.py
self._writer = open(path, 'wb') is never closed
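For context, a minimal sketch of the pattern being reported and the `close()` method the issue asks for. The class and attribute names mirror `record_writer.py`, but the bodies are illustrative stand-ins, not the library's actual code:

```python
import os
import tempfile

class RecordWriter:
    """Illustrative stand-in for tensorboardX's RecordWriter."""

    def __init__(self, path):
        # Opened here but, per the issue, never closed by the library.
        self._writer = open(path, 'wb')

    def write(self, data):
        self._writer.write(data)

    def flush(self):
        self._writer.flush()

    def close(self):
        # The fix being requested: release the OS file descriptor.
        if not self._writer.closed:
            self._writer.close()

path = os.path.join(tempfile.mkdtemp(), 'events.out')
w = RecordWriter(path)
w.write(b'record')
w.close()
print(w._writer.closed)  # True
```

Each `open()` without a matching `close()` holds one file descriptor until the process exits, which is what the loop below demonstrates.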

mrshu (Contributor) commented Apr 7, 2018

loic001 (Author) commented Apr 8, 2018

Yes but...

from tensorboardX import SummaryWriter
for i in range(10000):
    writer = SummaryWriter()
    writer.close()

OSError: [Errno 24] Too many open files

It depends on the Linux system's limit on the maximum number of open files, but this error shouldn't occur regardless. That's why I said the file should be closed when the SummaryWriter is closed.
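The per-process limit being hit here can be inspected from Python with the stdlib `resource` module (Unix-only); a small sketch:

```python
import resource

# Soft and hard limits on open file descriptors for this process.
# Each leaked SummaryWriter file handle counts against the soft limit;
# once the process exceeds it, open() raises
# OSError: [Errno 24] Too many open files.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)
```

On many Linux distributions the default soft limit is 1024, which explains why a loop of 10000 unclosed writers fails well before finishing.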

mrshu added a commit to mrshu/tensorboard-pytorch that referenced this issue Apr 9, 2018
* Make sure calling .close() on SummaryWriter also closes the event
  file.

* Fixes lanpa#108

Signed-off-by: mr.Shu <[email protected]>
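The change the commit describes can be sketched as follows. The class names and bodies here are illustrative stand-ins, not tensorboardX's actual implementation; the point is that `close()` on the top-level writer cascades to the underlying event file:

```python
import io

class EventFileWriter:
    """Illustrative stand-in for the event file writer."""

    def __init__(self):
        self._file = io.BytesIO()  # stand-in for the on-disk event file
        self.closed = False

    def flush(self):
        pass  # would drain pending events to disk here

    def close(self):
        self.flush()
        self._file.close()
        self.closed = True

class SummaryWriter:
    def __init__(self):
        self.file_writer = EventFileWriter()

    def close(self):
        # The fix: propagate close() down to the event file
        # instead of leaking its descriptor.
        self.file_writer.close()

w = SummaryWriter()
w.close()
print(w.file_writer.closed)  # True
```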
mrshu (Contributor) commented Apr 9, 2018

@loic001 Ah, you are indeed right. I've proposed a fix that should deal with this.

lanpa (Owner) commented Apr 28, 2018

The test fails on macOS.

@lanpa lanpa reopened this Apr 28, 2018
mrshu (Contributor) commented Apr 30, 2018

@lanpa interesting. Does it fail with OSError: [Errno 24] Too many open files? I sadly do not have a way of easily testing it so more information would be very helpful.

Thanks!

lanpa (Owner) commented Apr 30, 2018

@mrshu It's RuntimeError: can't start new thread. I also confirmed that it fails on Ubuntu if we use a bigger number, e.g. for i in range(1000000):. That should be a separate issue from this one.

mrshu (Contributor) commented Apr 30, 2018

@lanpa Thanks. Looking at this (http://python.6.x6.nabble.com/maximum-number-of-threads-td1124579.html), it seems the maximum number of threads depends heavily on the memory address space of the box the test runs on. In other words, it is not related to SummaryWriter itself but rather to _EventLoggerThread.

I'd say the test is working, but using such a high number does not make much sense. Something like 100 would probably be sufficient.

NilsRethmeier commented May 9, 2018

@lanpa
IMHO this should be fixed asap; it is a very serious issue. Opening 10 SummaryWriters creates close to 3k open files, even after updating to tensorboardX==1.2.

import gc
from tensorboardX import SummaryWriter

def test_summary_writer_close():
    # Opening and closing SummaryWriter repeatedly should not run into
    # OSError: [Errno 24] Too many open files
    passed = True
    try:
        for i in range(10):
            writer = SummaryWriter()
            writer.file_writer.flush()
            writer.file_writer.close()
            del writer
            gc.collect()
    except OSError:
        passed = False

    assert passed

test_summary_writer_close()
lsof | grep ' yourusername ' | awk '{print $NF}' | sort | wc -l

The .close() method has no effect.
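As an alternative to the lsof pipeline above, open descriptors can be counted from inside the process on Linux, where /proc/self/fd lists one entry per open descriptor. A small illustrative sketch (not part of tensorboardX):

```python
import os
import tempfile

def open_fd_count():
    """Count this process's open file descriptors (Linux-only)."""
    return len(os.listdir('/proc/self/fd'))

if os.path.isdir('/proc/self/fd'):
    before = open_fd_count()
    f = tempfile.TemporaryFile()       # simulate one leaked handle
    delta = open_fd_count() - before
    f.close()
    print(delta)  # 1: the new handle shows up immediately
```

Running the same before/after count around a `SummaryWriter()` construction would show how many descriptors each writer holds.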

@lanpa lanpa added the bug label May 30, 2018
Robin-des-Bois commented Aug 8, 2018

I also find myself in a situation where I need many summary writers and hence run into RuntimeError: can't start new thread, even though I close the summary writers.

What about adding a terminate flag to the close method as a quick fix: when set to True, the _EventLoggerThread gets killed.

Ideally, close would kill the thread and reopen would create a new one.
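The shutdown being proposed can be sketched with a sentinel on the event queue. The class below is a simplified stand-in for tensorboardX's _EventLoggerThread, not its actual code: close() enqueues the sentinel and joins the thread instead of leaking one thread per writer.

```python
import queue
import threading

_SENTINEL = object()  # unique marker telling the logger thread to exit

class _EventLoggerThread(threading.Thread):
    """Simplified stand-in: drains events from a queue until told to stop."""

    def __init__(self, q):
        super().__init__(daemon=True)
        self._queue = q

    def run(self):
        while True:
            event = self._queue.get()
            if event is _SENTINEL:
                return  # terminate cleanly
            # ... write `event` to the record writer here ...

q = queue.Queue()
t = _EventLoggerThread(q)
t.start()

q.put(_SENTINEL)   # close() would enqueue this...
t.join(timeout=5)  # ...and wait for the thread to drain and exit
print(t.is_alive())  # False
```

Because the sentinel goes through the same queue as the events, every event enqueued before close() is still processed before the thread exits.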

hangg7 commented Aug 10, 2018

Any updates on this?

veugene commented Aug 24, 2019

An additional problem is that the SummaryWriter doesn't close the file at program exit and actually hangs. This can cause some events to never be flushed if the flusher thread hasn't drained the queue by the time its stop() method is called. Proposing a fix in #498
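One common shape for such a fix (a hedged sketch, not necessarily what #498 does; class and attribute names are illustrative) is to register cleanup with atexit, so the file is flushed and closed at normal interpreter shutdown even when the user forgets:

```python
import atexit

class FileWriter:
    """Illustrative stand-in for an event file writer with exit-time cleanup."""

    def __init__(self):
        self.flushed = False
        self.closed = False
        atexit.register(self.close)  # safety net at interpreter shutdown

    def flush(self):
        self.flushed = True  # would drain the event queue to disk here

    def close(self):
        if not self.closed:
            self.flush()  # flush *before* stopping, so no events are lost
            self.closed = True
            atexit.unregister(self.close)  # already closed; skip at exit

w = FileWriter()
w.close()  # explicit close works; atexit only fires if it never happened
print(w.flushed, w.closed)  # True True
```

Ordering matters here: flushing before the flusher thread is stopped is exactly what prevents the dropped-events symptom described above.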

jeandut commented Oct 15, 2019

Would love to have a fix for this issue! Can we get #498 accepted now that it passes tests?

@lanpa lanpa added this to the Multi process logging milestone Feb 3, 2020
lanpa added a commit that referenced this issue Feb 26, 2020
@lanpa lanpa closed this as completed in e56a07a Feb 26, 2020