addresses issue #188 #314
prevent state explosions with epsilon transitions
Signed-off-by: Tim Bray <[email protected]>
Codecov Report
Attention: Patch coverage is …

@@            Coverage Diff             @@
##             main     #314      +/-   ##
==========================================
- Coverage   96.39%   96.10%   -0.30%
==========================================
  Files          18       18
  Lines        1718     1744      +26
==========================================
+ Hits         1656     1676      +20
- Misses         35       40       +5
- Partials       27       28       +1
Oh drat, I somehow missed …
You may have noticed a bit of thrashing in this PR. There's an important trade-off to consider.

Performance

To summarize, #314 cleanly kills issue #188, which means that you can freely use "*" globs, as many as you want, and you won't get an O(2**N) explosion in your memory size and …

However, there is a cost. #314 adds "epsilon transitions" and multi-state traversal to our automata, which solves a lot of problems but has a performance cost. The latest versions, compared to the current …

On the other hand: …
Question for the audience: Should we go ahead and accept the performance penalty in exchange for opening the gates to lots of nice features? Is there anyone now using Quamina who's going to be troubled by a ~20% performance hit?
That performance hit won't cause me any heartburn. Does this work help at all with a future …
I'm not sure. Because of the way smallTables work, I don't think there's any need for nondeterministic automata to do that, which in fact we managed to do with Ruler back in the day.
Hearing no objections… Once again, I acknowledge that potential reviewers are probably not eager to dig through finite-automata theory, so unless someone screams STOP, I'll push this Monday.
prevent state explosions with epsilon transitions
The introduction of proper NFA epsilon transitions (see https://swtch.com/~rsc/regexp/regexp1.html) totally fixed this problem. The most dramatic illustration is line 96 of shell_style_test.go. Previously, you could only use 20 or so shell_style patterns before an O(2**N) explosion led to millions of states. Now you can add all 12K patterns in the test file. The addPattern() calls run almost instantly, although matching slows down, which is a fair trade-off. Once this is landed, we can start implementing all the cool stuff that aws-ruler has without worrying about exploding NFAs.