document that memory exhaustion is possible when using parallelism #1189
Comments
This happens when the search results for a single file exceed the amount of memory available. It is fundamentally a consequence of combining parallelism with the requirement that the output of each file not be interleaved. It sounds like you said you'd be OK with the output from different files being interleaved, but I'm not keen on adding that option to ripgrep. Instead, assuming we've correctly diagnosed the problem, you have a few workarounds available to you:
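A sketch of those workarounds (assumed, pieced together from the -j1 note later in this thread; the pattern and paths are placeholders):

```sh
# Workaround 1 (assumed from the -j1 note below): disable parallelism so
# results stream to stdout as they are found, instead of being buffered
# per file until the whole file has been searched.
rg -j1 'pattern' /path/to/corpus > out.txt

# Workaround 2: search the oversized files individually; a single-file
# search also streams its output directly.
rg 'pattern' huge-file.txt > out.txt
```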
Thank you for your quick response! Awesome! That being said, it could still be handy to have a "single file, multiple workers, as fast as possible, interleave if you must" grep mode for these rare occasions, as long as a single result always arrives in one piece in the output. But the occasion might be rare enough. And a different idea: print which files it choked on when it crashes (just to have fewer GitHub support issues). Keep up the good work ;) (Oh, off-topic: that it tries to allocate 5 exabytes seems like a fun overflow somewhere below the Rust layer...)
Rust's standard library doesn't allow one to easily recover from allocation failure, so it's impractical to print the file on which this choked. Of course, I agree it would be nice to improve the failure modes, but I think we're stuck here. Your proposed option is undoubtedly handy, but it's not a good fit, since it's an extraordinarily niche feature with some simple workarounds. I'll mark this ticket as a doc bug, find a place to add a note about memory exhaustion being possible when parallelism is enabled, and document the workarounds.
Note that memory exhaustion is not unique to ripgrep, or even to parallelism. Both grep and ripgrep are subject to memory exhaustion when searching a file that contains a single line exceeding available memory. For example, a command that produces one enormous newline-free "line" will do it; see the sketch below.
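A minimal, hypothetical illustration (do not actually run this on a machine you care about):

```sh
# Generate an endless stream of 'y' characters with every newline removed,
# i.e. one unbounded "line". Both rg and grep must buffer the entire line
# before they can report a match, so memory use grows without limit.
yes | tr -d '\n' | rg 'needle'
# The same holds for grep:
yes | tr -d '\n' | grep 'needle'
```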
What version of ripgrep are you using?
ripgrep 0.10.0 (rev 8a7db1a)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)
How did you install ripgrep?
Precompiled msvc binary for Windows-x64
What operating system are you using ripgrep on?
Windows 7, some current patch level
Describe your question, feature request, or bug.
If I redirect the ripgrep output to a file on Windows, the memory usage of rg.exe increases slowly but steadily, probably with each found match. After a few GB, rg.exe crashes with a segmentation fault. Inside procmon, I see that the results only seem to be flushed to the file after the crash. A workaround is to force single-threading with -j1. This seems to be directly related to this old issue. Or are the reads and hits simply too fast to be written to disk when multi-threaded?
If this is a bug, what are the steps to reproduce the behavior?
Trying to extract all email addresses from a large, current password leak (130k files, 8k folders, 1.62 TB in total, with varying file sizes) crashes rg.
Inside git bash, run:
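A hypothetical reconstruction of such an invocation (the regex, flags, and output path are placeholders, not the reporter's exact command):

```sh
# Placeholder invocation: print only the matching email-like substrings
# (-o), suppress line numbers (-N) and file names (-I), and redirect the
# results to a file. Redirecting to a file with default parallelism is
# what triggers the unbounded buffering described above.
rg -oNI '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' /path/to/leak > addresses.txt
```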
For obvious reasons, I cannot include the corpus here.
If this is a bug, what is the actual behavior?
If this is a bug, what is the expected behavior?
The user should be able to run rg multi-threaded on any kind of large dataset. Maybe a synchronization point when the buffer gets too big, or the choice to disable caching (I don't need the output in order, for example), could be options.