ripgrep slower than grep #838

bbbsg · 2018-02-27T17:17:56Z

What version of ripgrep are you using?

ripgrep 0.8.1 (rev c8e9f25)
+SIMD -AVX

What operating system are you using ripgrep on?

Debian Stretch 64 bit

Describe your question, feature request, or bug.

ripgrep is at least 4 times slower than grep for a particular use case (I never waited long enough for it to finish). See the attached test files and data.
ripgrep_data_test.tar.gz

The attached file is around 15 MB uncompressed. It may be possible to reduce the sample further, but the original files I was dealing with were 100X larger.

See README.md and test.sh. I tried with and without --dfa-size-limit.

Another minor issue I noticed is that ripgrep suggests using --fixed-strings even when -F is already being used; perhaps this only happens when --regex-size-limit is not passed.

Ideally ripgrep would be faster than grep :-)

If this is a bug, what are the steps to reproduce the behavior?

Assuming rg is in the current directory or already on the PATH, run test.sh.


BurntSushi · 2018-02-27T17:59:18Z

Thanks for the bug report! I was able to reproduce this. It looks like most of the time spent here is just compiling the patterns (since there are >100K). As you pointed out on IRC, using Aho-Corasick here should help quite a bit.

This makes the case of searching for a dictionary of a very large number of literals much faster (~10x or so). In particular, we achieve this by short-circuiting the construction of a full regex when we know we have a simple alternation of literals. Building the regex for a large dictionary (>100,000 literals) turns out to be quite slow, even if it internally will dispatch to Aho-Corasick.

Even that isn't quite enough. It turns out that even *parsing* such a regex is quite slow. So when the -F/--fixed-strings flag is set, we short-circuit regex parsing completely and jump straight to Aho-Corasick. We aren't quite as fast as GNU grep here, but it's much closer (less than 2x slower).

In general, this is somewhat of a hack. In particular, it seems plausible that this optimization could be implemented entirely in the regex engine. Unfortunately, the regex engine's internals are just not amenable to this at all, so it would require a larger refactoring effort. For now, it's good enough to add this fairly simple hack at a higher level.

Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will be slower, because of the aforementioned missing optimization. Moreover, passing flags like `-i` or `-S` will cause ripgrep to abandon this optimization and fall back to something potentially much slower. Again, this fix really needs to happen inside the regex engine, although we might be able to special-case -i when the input literals are pure ASCII via Aho-Corasick's `ascii_case_insensitive`.

Fixes #497, Fixes #838
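The dispatch described in the commit message boils down to a small decision rule. The sketch below is hypothetical Python, not ripgrep's implementation: `choose_strategy`, `alternation` and `ascii_ci_candidate` are illustrative names, and the keyword arguments merely stand in for the -F, -i and -S flags. It also shows the escaped `lit1|lit2|...` alternation that the slow path would otherwise have to parse, and the pure-ASCII check that the possible `ascii_case_insensitive` special case would depend on.

```python
import re


def choose_strategy(fixed_strings, ignore_case=False, smart_case=False):
    """Pick a matcher following the rule in the commit message (hypothetical sketch).

    The booleans stand in for ripgrep's -F, -i and -S flags.
    """
    if fixed_strings and not (ignore_case or smart_case):
        # Fast path: hand the literals straight to Aho-Corasick, skipping
        # regex parsing and compilation entirely.
        return "aho-corasick"
    # Slow path: the literals become one huge regex alternation that must be
    # parsed and compiled before any searching happens.
    return "regex"


def alternation(literals):
    """Build the escaped `lit1|lit2|...` pattern the slow path has to parse."""
    return "|".join(re.escape(lit) for lit in literals)


def ascii_ci_candidate(literals):
    """True if every literal is pure ASCII, the precondition for the possible
    `ascii_case_insensitive` special case for -i (not implemented by this fix)."""
    return all(lit.isascii() for lit in literals)
```

For example, `choose_strategy(fixed_strings=True)` returns `"aho-corasick"`, while adding `ignore_case=True` or `smart_case=True` falls back to `"regex"`, which is the limitation the commit message calls out.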

BurntSushi added the bug A bug. label Feb 27, 2018

BurntSushi added the libripgrep An issue related to modularizing ripgrep into libraries. label Mar 10, 2018

BurntSushi added this to the libripgrep milestone Mar 10, 2018

BurntSushi removed the libripgrep An issue related to modularizing ripgrep into libraries. label Aug 19, 2018

BurntSushi removed this from the libripgrep milestone Aug 19, 2018

BurntSushi mentioned this issue Aug 27, 2018

perf: Skip the first goto call in full construction (-35%) BurntSushi/aho-corasick#33

Closed

BurntSushi added the icebox A feature that is recognized as possibly desirable, but is unlikely to be implemented any time soon. label Jan 27, 2019

BurntSushi mentioned this issue Apr 7, 2019

regex: make multi-literal searcher faster #1238

Merged

BurntSushi closed this as completed in #1238 Apr 7, 2019
