Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Built in declarative APIs. #480

Closed
bd82 opened this issue May 30, 2017 · 12 comments
Closed

Built in declarative APIs. #480

bd82 opened this issue May 30, 2017 · 12 comments

Comments

@bd82
Copy link
Member

bd82 commented May 30, 2017

Chevrotain's APIs are imperative in nature.
This is due to the fact the origin of Chevrotain as a utility library to make
creating hand built parser easier.

This has large advantages in terms of debugging, but it does introduce some constraints
such as:

  • Larger, verbose Grammars.
  • The requirement to use numerical suffixes for the parsing methods.
  • Fragility when introducing code transformation tools such as webpack.
@bd82
Copy link
Member Author

bd82 commented May 30, 2017

Thanks to the CST output capabilities of Chevrotain embedded actions are no longer mandatory.
This means It should be possible to use Chevrotain as a pure runtime. for a more declaratively declared grammar API.

To make this useful stronger debugging and tracing support will be needed.

@cdiggins
Copy link
Contributor

FWIW: In my experience I have found that working with a CST makes parsers considerably slower and lead to more code churn than using ASTs, so I tend to skip this in my parse libraries. For example if you make a minor change to the grammar, you have to update all tree visitors or transforms downstream. If instead you use a mechanism to label which rules should generate AST nodes, then parse tree transforms tend to be less affected. This is the strategy I use in myna.

@bd82
Copy link
Member Author

bd82 commented May 30, 2017

CST creation is indeed slower, particularly when it is done by a sort of interpreter
without the benefits of code generation.
http://sap.github.io/chevrotain/performance/cst/

I've taken a deeper look at Myna's AST creation.
If I understand correctly in Myna the ability to enable to disable AST creation
is at a top level rule level?

        this.array   = m.bracketed(m.delimited(this.value.ws, this.comma)).ast;

Would something like the following be possible?
I.E calling ".ast" on part of a rule?

        this.array   = m.bracketed(m.delimited(this.value.ast.ws, this.comma));

@bd82
Copy link
Member Author

bd82 commented May 30, 2017

Random thoughts.

A Declarative API could actually result in improved performance.
Lets look at the existing imperative APIs:

 this.RULE("value", function() {
        $.OR([
            {ALT: function() { $.CONSUME(StringLiteral) }},
            {ALT: function() { $.CONSUME(NumberLiteral) }},
            {ALT: function() { $.SUBRULE($.object) }},
            {ALT: function() { $.SUBRULE($.array) }},
            {ALT: function() { $.CONSUME(True) }},
            {ALT: function() { $.CONSUME(False) }},
            {ALT: function() { $.CONSUME(Null) }}
        ]);
    });

every single time the "value" rule is invoked
*seven different anonymous functions will be created and a single array instance
to contain them. That could add up to quite a big fixed overhead.

It is not possible using the new API to optimize this because the existing
API uses Function.toString to analyze the rule's content's.
And even if it that was not a problem it would not be easy to read and write
if every anonymous function would have to be extracted to a named utility function.

Additionally if the declarative APIs will be implemented by generating old style parsing
functions. the CST creation code can be inserted "manually" as "embedded"
actions behind the scenes.

This by itself will provide a ~15% boost.
http://sap.github.io/chevrotain/performance/cst/

@bd82
Copy link
Member Author

bd82 commented May 30, 2017

That could add up to quite a big fixed overhead.

I ran a little benchmark to see what would happen if I got rid of the overhead of the anonymous functions.

Using the hack below to "trick" Chevrotain
I measured a 20% performance boost under Chrome Canary 61 in the JSON Benchmark.
I would guess that if this transformation was done to all the anonymous functions
and array inits as much as 30% boost could be achieved.

 $.HACKED_OR = $.OR
 $.RULE("value", function () {
        // in template string literal to allow the Functoin.prototype.toString
       // used in "self analysis" to still work
        `
        $.OR([
            {ALT: function () { $.CONSUME(StringLiteral) }},
            {ALT: function () { $.CONSUME(NumberLiteral) }},
            {ALT: function () { $.SUBRULE($.object) }},
            {ALT: function () { $.SUBRULE($.array) }},
            {ALT: function () { $.CONSUME(True) }},
            {ALT: function () { $.CONSUME(False) }},
            {ALT: function () { $.CONSUME(Null) }}
        ]);
        `
        // actual code that will be invoked at runtime, not using the original $.OR API
       // to avoid triggering validation errors.
        $.HACKED_OR([
            {ALT: a},
            {ALT: b},
            {ALT: c},
            {ALT: d},
            {ALT: e},
            {ALT: f},
            {ALT: g}
        ]);
    });

    function a() { $.CONSUME(StringLiteral) }

    function b() { $.CONSUME(NumberLiteral) }

    function c() { $.SUBRULE($.object) }

    function d() { $.SUBRULE($.array) }

    function e() { $.CONSUME(True) }

    function f() { $.CONSUME(False) }

    function g() { $.CONSUME(Null) }

@bd82
Copy link
Member Author

bd82 commented May 30, 2017

Feature Parity.

I'm uncertain if certain capabilities which rely on the user inserting code into the parser
can still work with a Declarative API, at least when using the existing engine.

  • Gates on alternatives may be limited.
  • Gates normally use closures to access state variables, but a gate function defined in as part of a declarative API usage will not have access(closure) to those variables nor will the user be able to easily modify and create those state variables without direct access to the parser's code.
    Gates that rely on parser state (via this) should still work though.
  • Passing arguments to parsing rules.
    • This again could rely on state managed by the parser's author.
      But via the declarative APIs such state does not exit, so it would be limited to hard coded values.

Both problems seem to be related to extra state management.
Perhaps a declarative API could provide callback hooks to external functions to manage
state. This state could be reused in Gates/Arguments via closures.
So effectively the Parser's extra state would be managed outside of it.

  • Will have to also provide a reset() utility to clean up the state between parser invocations.

@cdiggins
Copy link
Contributor

@bd82 , To answer your question, while .ast is possible inside of a rule it won't generate a named node because only rules that are properties of an object are given names.

@bd82
Copy link
Member Author

bd82 commented May 31, 2017

while .ast is possible inside of a rule it won't generate a named node because only rules that are properties of an object are given names.

So the AST creation feature is effectively CST creation with the ability to ignore/skip
certain top level rules?
I think I could introduce such a capability to Chevrotain,
I will evaluate this on a larger grammar as part of other POCs I'm involved with.
Need to see how relevant it is when the AST/CST leaf nodes are already simplified
because there is a separate tokenizer.

Thanks for the feedback!

@bd82
Copy link
Member Author

bd82 commented Jun 13, 2017

Another Random thought:

If declarative APIs are implemented by code generation this allows greater flexibility
for advanced features such as Streaming and Async APis
#311

@bd82
Copy link
Member Author

bd82 commented Jul 20, 2017

More random thoughts:

No need to reimplement a variant of EBNF syntax for a parser generator like API.
Instead we should try to use a subset of an existing one (For example Antlr's), this may allow
partial re-use of existing editor tools (syntax highlights, autocomplete).

For example antlr's syntax is supported in markdown

myRule:
   "hello" worldRule?

@bd82
Copy link
Member Author

bd82 commented Jul 24, 2018

I believe I will remove this from the scope for now, Chevrotain is already ~40K lines of code
(including tests / examples / website) adding a 2nd kind of API may increase the scope of the project too far.

e.g:

  • Duplicate Examples.
  • Increase capabilities of CUSTOM APIS (gates / embedded actions).
  • Open up issues related to editor support for the EBNF APIs.

There is also the conceptual problem that while I see the advantages of a more declarative API.
I won't actually use such an API for my own projects, I do believe an upgraded hand crafted parser (a.k.a Chevrotain) is a better way to write parsers... 😄

@bd82 bd82 closed this as completed Jul 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants