Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise by filtering out empty values. #25

Merged
merged 1 commit into from
Nov 24, 2024

Conversation

ltratt
Copy link
Member

@ltratt ltratt commented Nov 22, 2024

The fits function (and to a lesser extent apply) previously did vast amounts of pointless work on empty values that could not succeed. This commit does something very simple: it pre-filters out all the empty values so that fits has a lot less work to do.

On small grammars, the quantity of pointless work probably wasn't very noticeable, but on large grammars like Postgresql's, it became punishing. On my machine this commit takes nimbleparse's running time down from 101s to 3s.

Fixes softdevteam/grmtools#473 (comment). @ratmice I have only very lightly tested this, so if you're able to also verify whether this doesn't change anything (other than performance) for your use cases, I'd be very grateful!

src/lib.rs Outdated Show resolved Hide resolved
@ratmice
Copy link
Collaborator

ratmice commented Nov 22, 2024

Cool, well while I'm not really familiar with this code, or algorithm that is the only comments I had.
It appears to me that it should be isomorphic to the original implementation, so if it improves things...

That said I'm happy to give @ptersilie time to weigh in on things, if he feels inclined?

Verified

This commit was signed with the committer’s verified signature.
ltratt Laurence Tratt
The `fits` function (and to a lesser extent `apply`) previously did vast
amounts of pointless work on empty values that could not succeed. This
commit does something very simple: it pre-filters out all the empty
values so that `fits` has a lot less work to do.

On small grammars, the quantity of pointless work probably wasn't very
noticeable, but on large grammars like Postgresql's, it became
punishing. On my machine this commit takes nimbleparse's running time
down from 101s to 3s.
@ltratt
Copy link
Member Author

ltratt commented Nov 23, 2024

Just to check: this does the right thing on the grammars you care about? If so, you'll be an excellent reviewer and should feel free to merge (I've force pushed a squash).

@ratmice
Copy link
Collaborator

ratmice commented Nov 23, 2024

I've run it on a few grammars I care about, and everything seems to work.
I'd like to do some more testing such as adding a nimbleparse option to dump the state table.
Then run that on my own as well as some of the "Used by" dependents listed on github here https://github.com/softdevteam/grmtools/network/dependents comparing the output with the current output.

Sadly a great portion of the grammars I have laying around don't produce a valid state table,
because they are testing various error conditions, with tool work usurping much of my usage.

So I'll try and do that tomorrow.

@ltratt
Copy link
Member Author

ltratt commented Nov 23, 2024

I think if it works on the ones you care about, we're probably good to go. I wouldn't object to a PR to add a -d (or whatever) option to nimbleparse to dump the statetable -- it should be straightforward as we already do that for the "bad" cases.

@ratmice
Copy link
Collaborator

ratmice commented Nov 23, 2024

Well, running through things adding the -d flag hasn't quite helped as much as I'd hoped.
The issue is that while the actual state graph stidx's and the like seem to be deterministic,
they are printed from pp_core_states and the like in a non-deterministic order.

As such the below output part of a diff of the -d flag, from 2 runs of nimbeparse both before this patch.

 Stategraph:
 0:   [^ -> . File, {'$'}]
-     'val' -> 3
-     'annotated' -> 4
-     File -> 1
-     'term' -> 6
      D -> 2
+     File -> 1
+     'annotated' -> 4
+     'val' -> 3
      'proof' -> 5
+     'term' -> 6

If I look at that through diff -u st-old.txt st-old2.txt | grep ^[+-] | sort it seems like the state graph is probably unchanged, in that the + lines appear to align with the - ones besides the order of output. But that isn't a very satisfying or thorough test.

Edit: I'll have a short investigation into how hard it would be to make the pp_core_states output deterministic.

@ratmice ratmice added this pull request to the merge queue Nov 24, 2024
Merged via the queue into softdevteam:master with commit f285566 Nov 24, 2024
2 checks passed
@ratmice
Copy link
Collaborator

ratmice commented Nov 24, 2024

So I locally tested with fixed the pretty printing, checked the state tables of my projects and a bunch of projects from github with no hints of any unexpected differences.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't finish or takes forever with postgres16 grammar
2 participants