Exponential blowup in recursive parsers #74
The problem here is subtle, but is actually a more general problem with backtracking parsers rather than an issue with chumsky itself. A similar problem will appear with a hand-written recursive descent parser. Within your code, you have two patterns that can be parsed that both look very similar: an atom (an identifier or a parenthesised expression) followed by `+` and a second atom, versus a lone atom.

Notice something important: both patterns are ambiguous right the way up until the point at which the `+` is (or is not) reached. Normally, this is not a problem. For a shallow expression like `a+b`, re-parsing the atom after a wrong guess costs almost nothing. But when the atom itself contains a parenthesised sub-expression, a wrong guess forces the parser to throw away and re-parse that entire sub-expression, including every wrong guess made within it. In fact, for every layer of recursion for which the wrong guess is made, the amount of work required to parse the input doubles. Your input has 21 layers of nesting, where the parser defaults to the wrong guess each time, meaning that the resulting parser is doing 2^21 = 2,097,152x the amount of work that strictly needs to be done. Hopefully, it should now be clear why we're seeing such pessimistic numbers here.

### Fixing this

Fixing this problem requires restating the grammar in a way that gives the parser less work to do. While your parser is logically correct, in practice it explodes the space of possible parse routes, and recursive descent isn't smart enough to handle such cases (there are techniques, such as memoisation, that can avoid this exponential blowup, but they usually have worse performance for non-pessimistic grammars and inputs, so chumsky does not implement them). A better approach is to restate the grammar like so:

```rust
use chumsky::prelude::*;

fn parser() -> impl Parser<char, String, Error = Simple<char>> {
    recursive(|expr| {
        let atom = text::ident()
            .or(expr.clone().delimited_by(just('('), just(')')));

        let expression = atom.clone()
            .then(just('+').ignore_then(atom).or_not())
            .map(|(a, b)| if let Some(b) = b {
                format!("{}{}", a, b)
            } else {
                a
            });

        expression
    }).then_ignore(end())
}

fn main() {
    println!("{:?}", parser().parse("((((((((((((((((((((((a+b))))))))))))))))))))))"));
}
```

Here, we use `or_not` to turn the `+ atom` part into an optional suffix: the atom is parsed exactly once, and a single `+` character then decides which of the two patterns applies, so no backtracking over the atom is ever required.

I hope this helps, and sorry about not replying to this earlier: GitHub has not been showing me all of my notifications for some reason.
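The same restatement carries over to a hand-written recursive descent parser. Here is a minimal stdlib-only sketch (hypothetical code, not from the issue) of the restated grammar, in which a single character of lookahead replaces the backtracking alternation:

```rust
// Restated grammar, mirroring the chumsky fix above:
//   expr := atom ('+' atom)?
//   atom := ident | '(' expr ')'
// Each function returns the parsed value plus the next input position.

fn ident(s: &[u8], pos: usize) -> Option<(String, usize)> {
    let mut p = pos;
    while p < s.len() && s[p].is_ascii_alphabetic() {
        p += 1;
    }
    if p > pos {
        Some((String::from_utf8_lossy(&s[pos..p]).into_owned(), p))
    } else {
        None
    }
}

fn atom(s: &[u8], pos: usize) -> Option<(String, usize)> {
    if let Some(r) = ident(s, pos) {
        return Some(r);
    }
    if s.get(pos) == Some(&b'(') {
        let (v, p) = expr(s, pos + 1)?;
        if s.get(p) == Some(&b')') {
            return Some((v, p + 1));
        }
    }
    None
}

fn expr(s: &[u8], pos: usize) -> Option<(String, usize)> {
    // The atom is committed to exactly once; only the single `+` character
    // decides whether the optional suffix applies, so nothing is re-parsed.
    let (a, p) = atom(s, pos)?;
    if s.get(p) == Some(&b'+') {
        if let Some((b, p)) = atom(s, p + 1) {
            return Some((format!("{}{}", a, b), p));
        }
    }
    Some((a, p))
}

fn parse(input: &str) -> Option<String> {
    match expr(input.as_bytes(), 0) {
        Some((v, p)) if p == input.len() => Some(v),
        _ => None,
    }
}

fn main() {
    // Deep nesting now takes time linear in the input length.
    let input = format!("{}a+b{}", "(".repeat(22), ")".repeat(22));
    println!("{:?}", parse(&input)); // Some("ab")
}
```

Because the atom is never re-parsed, parse time grows linearly with input length regardless of nesting depth.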
Thank you for the in-depth explanation. I completely understand the underlying issue now and can easily fix it in my own parser. I guess the recommended approach can, for many cases, be distilled down to "prefer conditional matching over alternation where possible, in order to prevent branching during recursion". That being said, I can't help but feel that my original parser looks a lot more intuitive than your modified version. It would be nice if it were possible to write code like the first version that performs like the second version. Can't Chumsky do that transformation automatically somehow, at least in simple cases? That is, if there is an alternation whose branches share a common prefix, couldn't the common prefix be factored out automatically?
I'm not sure whether it's feasible to transform one into the other, but memoisation would at least prevent the exponential blowup by 'remembering' past parse failures at each position, at the cost of increased memory usage and poorer performance in less pessimistic cases. I'll open an issue for memoisation, because it might be desirable for some particularly ambiguous grammars. It's not going to be an immediate focus for me, but it's possible that it might be worked into #82.
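For illustration, here is a hypothetical packrat-style sketch (plain Rust, no chumsky, not from the thread) of what memoisation buys: caching each rule's outcome per input position turns the fallback branch's re-parse into a table lookup, at the cost of the memo table's memory:

```rust
use std::collections::HashMap;

// Grammar kept in the ambiguous form from the issue:
//   expr := atom '+' atom | atom
//   atom := ident | '(' expr ')'
struct Packrat<'a> {
    s: &'a [u8],
    memo: HashMap<usize, Option<usize>>, // atom result per input position
    atom_work: u64,                      // times an atom is actually computed
}

impl<'a> Packrat<'a> {
    fn ident(&self, pos: usize) -> Option<usize> {
        let mut p = pos;
        while p < self.s.len() && self.s[p].is_ascii_alphabetic() {
            p += 1;
        }
        if p > pos { Some(p) } else { None }
    }

    // Memoised `atom`: a repeated attempt at the same position is a cache hit.
    fn atom(&mut self, pos: usize) -> Option<usize> {
        if let Some(&r) = self.memo.get(&pos) {
            return r;
        }
        self.atom_work += 1;
        let r = self.atom_uncached(pos);
        self.memo.insert(pos, r);
        r
    }

    fn atom_uncached(&mut self, pos: usize) -> Option<usize> {
        if let Some(p) = self.ident(pos) {
            return Some(p);
        }
        if self.s.get(pos) == Some(&b'(') {
            if let Some(p) = self.expr(pos + 1) {
                if self.s.get(p) == Some(&b')') {
                    return Some(p + 1);
                }
            }
        }
        None
    }

    // The ambiguous alternation: the fallback branch now costs one
    // HashMap lookup instead of a full re-parse of the atom.
    fn expr(&mut self, pos: usize) -> Option<usize> {
        if let Some(p) = self.atom(pos) {
            if self.s.get(p) == Some(&b'+') {
                if let Some(p) = self.atom(p + 1) {
                    return Some(p);
                }
            }
        }
        self.atom(pos)
    }
}

fn atom_work(input: &str) -> u64 {
    let mut p = Packrat { s: input.as_bytes(), memo: HashMap::new(), atom_work: 0 };
    assert_eq!(p.expr(0), Some(input.len()));
    p.atom_work
}

fn main() {
    // Work grows linearly with nesting depth instead of doubling per layer.
    for n in [5, 10, 20] {
        let input = format!("{}a+b{}", "(".repeat(n), ")".repeat(n));
        println!("{:2} layers -> {:3} atom computations", n, atom_work(&input));
    }
}
```

With the memo in place, the number of atom computations grows linearly with nesting depth (roughly one per input position), even though the grammar is still stated in the ambiguous form.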
Consider this simple parser:
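The parser in question has the ambiguous `atom '+' atom` / `atom` shape discussed in the replies; a hypothetical hand-written backtracking parser of the same shape (stdlib-only, not the original chumsky code), instrumented to count atom-parse attempts, reproduces the blowup:

```rust
// Ambiguous grammar, as described in the replies:
//   expr := atom '+' atom | atom
//   atom := ident | '(' expr ')'

fn ident(s: &[u8], pos: usize) -> Option<usize> {
    let mut p = pos;
    while p < s.len() && s[p].is_ascii_alphabetic() {
        p += 1;
    }
    if p > pos { Some(p) } else { None }
}

fn atom(s: &[u8], pos: usize, calls: &mut u64) -> Option<usize> {
    *calls += 1; // count every attempt to parse an atom
    if let Some(p) = ident(s, pos) {
        return Some(p);
    }
    if s.get(pos) == Some(&b'(') {
        let p = expr(s, pos + 1, calls)?;
        if s.get(p) == Some(&b')') {
            return Some(p + 1);
        }
    }
    None
}

fn expr(s: &[u8], pos: usize, calls: &mut u64) -> Option<usize> {
    // First alternative: `atom '+' atom`. If the `+` never shows up, all
    // the work spent on the atom is thrown away...
    if let Some(p) = atom(s, pos, calls) {
        if s.get(p) == Some(&b'+') {
            if let Some(p) = atom(s, p + 1, calls) {
                return Some(p);
            }
        }
    }
    // ...and the second alternative re-parses the same atom from scratch.
    atom(s, pos, calls)
}

fn count_for(input: &str) -> u64 {
    let mut calls = 0;
    assert_eq!(expr(input.as_bytes(), 0, &mut calls), Some(input.len()));
    calls
}

fn main() {
    // Each extra layer of parentheses roughly doubles the work.
    for n in 0..10 {
        let input = format!("{}a+b{}", "(".repeat(n), ")".repeat(n));
        println!("{:2} layers -> {:5} atom attempts", n, count_for(&input));
    }
}
```

The attempt count follows 2^(n+2) − 2 for n layers of parentheses, which is exactly the exponential doubling described in the replies.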
Parsing the string `((((((((((((((((((((((a+b))))))))))))))))))))))` takes 8 seconds when compiled in release mode (around a minute in debug mode), and that time rapidly grows as more pairs of parentheses are added.

When the parser inside the `recursive` closure is more complex, the problem becomes even worse. With the parser I'm working on, I'm seeing multi-second parse times already for 3-4 levels of nested brackets.

Boxing parsers does not solve this issue. What can be done here?