Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't finish or takes forever with postgres16 grammar #473

Open
mingodad opened this issue Nov 21, 2024 · 24 comments · Fixed by softdevteam/sparsevec#25
Open

Can't finish or takes forever with postgres16 grammar #473

mingodad opened this issue Nov 21, 2024 · 24 comments · Fixed by softdevteam/sparsevec#25

Comments

@mingodad
Copy link

mingodad commented Nov 21, 2024

While testing this project with a fresh clone and build with a postgres16 grammar the nimbleparse takes forever (I've killed it) see attached grammar/lexer/input.

/usr/bin/time ../target/release/nimbleparse postgresql-16-naked.l postgresql-16-naked.y input.txt

This is the output of bison :

/usr/bin/time bison-nb postgresql-16-naked.y
>/usr/bin/time bison-nb postgresql-16-naked.y
postgresql-16-naked.y: warning: 8 shift/reduce conflicts [-Wconflicts-sr]
postgresql-16-naked.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
1.37user 0.02system 0:01.37elapsed 101%CPU (0avgtext+0avgdata 17944maxresident)k
0inputs+4016outputs (0major+10957minor)pagefaults 0swaps

Notice that the 8 shift/reduce conflicts are due to the changes in the grammar to be accepted by nimbleparse that doesn't accept empty rules with %prec that are used in the postgres16 grammar:

...
opt_existing_window_name :
	ColId
	| %empty //%prec Op /*16L*/
	;
...
json_key_uniqueness_constraint_opt :
	WITH /*13N*/ UNIQUE /*11N*/ KEYS /*12N*/
	| WITH /*13N*/ UNIQUE /*11N*/
	| WITHOUT /*13N*/ UNIQUE /*11N*/ KEYS /*12N*/
	| WITHOUT /*13N*/ UNIQUE /*11N*/
	| %empty //%prec KEYS /*12N*/
	;
...

You can also see how https://github.com/BenHanson/parsertl14 handle it and several others with wasm in the browser here https://mingodad.github.io/parsertl-playground/playground/ (select Postgresql parser (be patient) from Examples then click Parse to see a parse tree for the content in Input source).

build master parser -> Time taken : 70ms
parse user grammar -> Time taken : 6ms
start build user grammar -> Time taken : 2ms
build user parser -> Time taken : 1077ms
dump input parser tree -> Time taken : 10ms

postgres16.zip

@ratmice
Copy link
Collaborator

ratmice commented Nov 21, 2024

I haven't had a chance to actually look through the grammar yet, but this could perhaps be another occurrence of #290
This could perhaps be ruled out by modifying the goto function and checking whether you hit an added branch such as Some(i) if i == usize::from(stidx) + 1 => panic!("goto lacks forward progress!") as discussed in the original #290 report.

@ratmice
Copy link
Collaborator

ratmice commented Nov 21, 2024

Had a little look at it now I don't think this is another occurrence of that issue I had mentioned above.
That issue happens at parser runtime, while the time appears to be taken by parser build time.

Here at least it eventually finishes due to the shift/reduce conflicts nimbleparse eventually errors (after about 2 min 10 s) printing the statetable. The state table is pretty big at about 500k lines. However if I panic before the state table is printed
here it still takes 2 mins, so the majority of the time does appear to be taken by building the parser, rather than being taken up just printing stuff.

Edit: So at least from what I can observe this is more "takes forever" than a case of it never actually finishing.

@ltratt
Copy link
Member

ltratt commented Nov 21, 2024

It wouldn't surprise me if there's some low-hanging optimisation fruit here: I didn't put much effort into optimising grammar generation time (I did put some effort into reducing the size of the resulting tables though).

@ratmice
Copy link
Collaborator

ratmice commented Nov 22, 2024

Indeed I haven't looked at it much myself either, as it has never really been a problem for me...
Looking at the output of CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph --bin nimbleparse --ignore-status -- ../postgres16/postgresql-16-naked.{l,y} ../postgres16/input.txt

A lot of the time is being taken up by sparsevec::fits through the StateTable::new -> sparsevec::from -> sparsevec::compress sequence of calls.

It might be that there could be some form of assert or unsafe that could hoist the bounds checking in that function up out of the loop if the optimizer isn't doing so already, such as assert!(target.len() <= v.len() + d) or some such 🤷. But afaict that accounts for like 98% of the time.

@ltratt
Copy link
Member

ltratt commented Nov 22, 2024

@ratmice Thanks for the analysis! I had a quick look at sparsevec and although I don't remember the algorithm at all (I think @ptersilie implemented this bit), one simple optimisation opportunity which I could do in 20 minutes jumped out at me which I've done in softdevteam/sparsevec#25. On my machine this speeds this example up by over 30x. Warning: only lightly tested!

@ltratt
Copy link
Member

ltratt commented Nov 22, 2024

A sidenote. Before and after the sparsevec change I notice that nimbleparse borks with:

thread 'main' panicked at nimbleparse/src/diagnostics.rs:308:13:
assertion failed: !r_prod_spans.is_empty()

I think that's a (separate) pretty-printing error that might not be too difficult to fix?

@ratmice
Copy link
Collaborator

ratmice commented Nov 22, 2024

Yeah, I saw that too, but I have yet to investigate, or form any hypothesis other than maybe a wild guess that somehow a conflict could be occurring with the implicit start rule and thus it doesn't have any span info? But even in that case I would expect it to be a pretty simple fix of removing the assert and doing some special case printing. But other than that I don't have any idea how we could get a production without span.

@ratmice
Copy link
Collaborator

ratmice commented Nov 24, 2024

Reopening as I believe we still need to bump version numbers etc, before this will take effect in grmtools.

@ltratt
Copy link
Member

ltratt commented Nov 24, 2024

Because this is a library, if a user runs cargo update locally they'll naturally get the latest sparsevec and get a speed-up without us updating grmtools itself. We could force the latest version of sparsevec in Cargo.toml but I don't think it's worth making a release just to do that.

@mingodad
Copy link
Author

I just did cargo update but I'm still getting takes forever !

@ltratt
Copy link
Member

ltratt commented Nov 24, 2024

Ah, sorry, I wasn't clear: I need to make a sparsevec release and then grmtools users will only need to cargo update (without us making a grmtools release).

@ltratt
Copy link
Member

ltratt commented Nov 24, 2024

@mingodad sparsevec is now updated, so hopefully you'll see the speed up!

@mingodad
Copy link
Author

Hello @ltratt thank you !
Now on my machine:

/usr/bin/time ../target/release/nimbleparse postgresql-16-naked.l postgresql-16-naked.y input.txt
[Warning] in postgresql-16-naked.y
    3686| 	select_no_parens %prec UMINUS /*22R*/
                                  ^^^^^^ Unused token
    
Shift/Reduce conflicts:
   State 3199: Shift("WITHOUT") / Reduce(json_key_uniqueness_constraint_opt:)
   State 3199: Shift("WITH") / Reduce(json_key_uniqueness_constraint_opt:)
   State 3998: Shift("WITH") / Reduce(json_key_uniqueness_constraint_opt:)
   State 3998: Shift("WITHOUT") / Reduce(json_key_uniqueness_constraint_opt:)
   State 4097: Shift("ROWS") / Reduce(opt_existing_window_name:)
   State 4097: Shift("PARTITION") / Reduce(opt_existing_window_name:)
   State 4097: Shift("GROUPS") / Reduce(opt_existing_window_name:)
   State 4097: Shift("RANGE") / Reduce(opt_existing_window_name:)
...
thread 'main' panicked at nimbleparse/src/diagnostics.rs:308:13:
assertion failed: !r_prod_spans.is_empty()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Command exited with non-zero status 101
6.13user 0.77system 0:10.47elapsed 65%CPU (0avgtext+0avgdata 151348maxresident)k
0inputs+0outputs (0major+60131minor)pagefaults 0swaps

@ltratt
Copy link
Member

ltratt commented Nov 24, 2024

Glad to see the performance has improved :) We still need to fix that assert failure though!

@ratmice
Copy link
Collaborator

ratmice commented Nov 24, 2024

Definitely something weird going on with the span parsing with this file,
And appears to have nothing to do with the productions in the implicit start rule that would lack spans.

I believe that the assert is valid, in the sense that the rule on 4840 does have productions, but the list of spans for those productions is empty, so if we e.g. remove the assert it will just fail to print the production in conflict within the rule.
Thus the assert here is just ensuring that there is something to print, some production to point to.

In this case though, there is nothing, I would chalk it up to something going wrong in the grammar parsing.
This is likely to require some amount of effort in minimizing it and debugging.

The other weirdness I notice is this initial warning, which underlines %prec as a token, when it seems like it should be referring to UMINUS.

[Warning] in postgresql-16-naked.y
    3686| 	select_no_parens %prec UMINUS /*22R*/
                                  ^^^^^^ Unused token

Edit: another thing to note is that _r_prod_names is also empty, so it isn't just the spans that are missing but the production names too.

@ltratt
Copy link
Member

ltratt commented Nov 28, 2024

@ratmice I don't suppose you've had any luck with this yet?

@ratmice
Copy link
Collaborator

ratmice commented Nov 28, 2024

Nope, I spent some time trying to minimize it, but so far none of my attempts have managed to reproduce the problem

@ltratt
Copy link
Member

ltratt commented Nov 28, 2024

I wonder if the problem is in the yacc parser: maybe fuzzing grammar inputs might make the problem pop out?

@ratmice
Copy link
Collaborator

ratmice commented Nov 28, 2024

That might be worth a shot, I think it will need to make some custom code that triggers the same assertion with that input I think, because the assertion is in nimbleparse/binary code rather than lrpar/library code. But hopefully that would also be an opportunity to fuzz faster.

I don't know when I'll be able to work on that, it being a holiday over here.

@ratmice
Copy link
Collaborator

ratmice commented Dec 2, 2024

So using cargo afl I was able to minimize it fairly nicely, (I have yet to investigate the crash)
I'd started running cargo afl tmin on the postgres-naked-16.y file to minimize it,
but I think because of the size of the postgres yacc file that was taking forever...
Instead using a corpus of lrpar/examples/calc_parsetree/src/calc.y (it needed a good input) and the crashing example...

The following .y file should be a much smaller reproducer, you run it through nimbleparse with any .l file really

%start Expr
%%
Expr: %empty | Factor ;
Factor: ')' Expr ')';

The actual fuzzing testcase used is below, one will note the testcase uses a modified version of fn pidx_prods_data from
the one currently in nimbleparse which uses an if/else where the else returns empty vecs. in the case where usize::from(pidx) < ast.prods.len(). Neither this minimal fuzzed test, and Domingo's original test file are hitting that else branch or the assert!(usize::from(pidx) < ast.prods.len()); I changed it to. So it appears it is a case of prod.symbols .iter() being empty, and not a pidx outside the ast.prods range.

use cfgrammar::{
    PIdx, Span,
    yacc::{ast::{ASTWithValidityInfo, GrammarAST, Symbol}, YaccGrammar, YaccKind},
};
use lrtable::Minimiser;

fn pidx_prods_data<StorageT>(ast: &GrammarAST, pidx: PIdx<StorageT>) -> (Vec<String>, Vec<Span>)
where
    usize: num_traits::AsPrimitive<StorageT>,
    StorageT: 'static + num_traits::PrimInt + num_traits::Unsigned,
{
    assert!(usize::from(pidx) < ast.prods.len());
    let prod = &ast.prods[usize::from(pidx)];
    prod.symbols
            .iter()
            .map(|sym| match sym {
                Symbol::Rule(name, span) => (format!("'{}'", name), span),
                Symbol::Token(name, span) => (format!("'{}'", name), span),
            })
            .unzip()
}

fn main() {
 afl::fuzz!(|data: &[u8]| {
    if let Ok(yacc_src) = std::str::from_utf8(data) {
        let yacckind = YaccKind::Original(cfgrammar::yacc::YaccOriginalActionKind::GenericParseTree);
        let ast_validation = ASTWithValidityInfo::new(yacckind, &yacc_src);
        if let Ok(grm) = YaccGrammar::<u32>::new_from_ast_with_validity_info(yacckind, &ast_validation) {
            if let Ok((_sgraph, stable)) = lrtable::from_yacc(&grm, Minimiser::Pager) {
                if let Some(c) = stable.conflicts() {
                    for (_s_tok_idx, r_prod_idx, _st_idx) in c.sr_conflicts() {
                       let (_r_prod_names, r_prod_spans) = pidx_prods_data(ast_validation.ast(), *r_prod_idx);
                       assert!(!r_prod_spans.is_empty());
                    }
                }
            }
        }
    }
  });
}

I'll have to leave it for another day to try and actually look into what is going on here, but it seems we have a minimal reproducer now.

@ltratt
Copy link
Member

ltratt commented Dec 2, 2024

That's very cool -- hopefully the fix will pop out to you now!

@ratmice
Copy link
Collaborator

ratmice commented Dec 3, 2024

I had a little bit of a look at it now, adding a eprintln!("add_prod: {rule_name} {symbols:?}"); to add_prod in ast.rs.
It looks to me as though, the problem is that %empty productions as well as their implicit variations Rule: |other; and Rule: other | ; are not actually represented by an ast::Symbol and so the spans (possibly empty in case of the implicit variations) get dropped and lost.

This only became a problem when we added the fancy diagnostic printing code though...

I think the right thing to do might be to add a variant ast::Symbol::Empty(Span), and then in parser.rs::parse_rule at the various places that we call add_rule we need to check if syms is empty for the | and ; cases and add a zero length span at the previous character, and when we run across %empty we need to add the Symbol::Empty.

@ltratt
Copy link
Member

ltratt commented Dec 3, 2024

The only thing that makes me slightly unsure if Symbol::Empty might work is that I wonder if we ask at various points "is this production's length 0?". If so, we might have to bodge things a bit.

A possible alternative (which I haven't fully thought through) is to give each production (empty or not) a Span and use that for this particular error purpose?

@ratmice
Copy link
Collaborator

ratmice commented Dec 3, 2024

I added PR #476 which tries the Symbol::Empty, the only place I ran into that checked the length was in the patch context where YaccGrammarErrorKind::NonEmptyProduction is produced, for cases like A: %empty "not-actually-empty"; and the like, In that case we do that check before adding the production so it works out. But it is probably wise to look for more checks like that.

It is also worth mentioning that there was no equivalent added for cfgrammar::Symbol, so it is merely an ast thing.
Edit: On this last point, e.g. it never makes it into YaccGrammar::prod just getting as far as GrammarAST.prods. My thought was that this reduces the amount of bodging which you are concerned about above (e.g. this isn't the prods field for which we have YaccGrammar::prod_len it is the other one. Still though, I'll try and have a look through the uses of prods to confirm.

Thoughts about alternative proposal: I'm a little bit hesitant simply because e.g. from the SIdx or Symbol Index perspective this list of spans doesn't really align with any of the other fields when an empty exists. Though now that I say it, that same argument can be used for YaccGrammar::prods , in that it makes YaccGrammar::prods and GrammarAST::prods no longer bijective, now I suppose it would be surjective, so that is another thing to worry about with the Symbol::Empty approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants