Built in declarative APIs. #480

bd82 · 2017-05-30T06:40:15Z

Chevrotain's APIs are imperative in nature.
This is due to the fact the origin of Chevrotain as a utility library to make
creating hand built parser easier.

This has large advantages in terms of debugging, but it does introduce some constraints
such as:

Larger, verbose Grammars.
The requirement to use numerical suffixes for the parsing methods.
Fragility when introducing code transformation tools such as webpack.

bd82 · 2017-05-30T06:54:35Z

Thanks to the CST output capabilities of Chevrotain embedded actions are no longer mandatory.
This means It should be possible to use Chevrotain as a pure runtime. for a more declaratively declared grammar API.

To make this useful stronger debugging and tracing support will be needed.

bd82 · 2017-05-30T06:55:09Z

Influences

cdiggins · 2017-05-30T12:20:34Z

FWIW: In my experience I have found that working with a CST makes parsers considerably slower and lead to more code churn than using ASTs, so I tend to skip this in my parse libraries. For example if you make a minor change to the grammar, you have to update all tree visitors or transforms downstream. If instead you use a mechanism to label which rules should generate AST nodes, then parse tree transforms tend to be less affected. This is the strategy I use in myna.

bd82 · 2017-05-30T14:03:29Z

CST creation is indeed slower, particularly when it is done by a sort of interpreter
without the benefits of code generation.
http://sap.github.io/chevrotain/performance/cst/

I've taken a deeper look at Myna's AST creation.
If I understand correctly in Myna the ability to enable to disable AST creation
is at a top level rule level?

        this.array   = m.bracketed(m.delimited(this.value.ws, this.comma)).ast;

Would something like the following be possible?
I.E calling ".ast" on part of a rule?

        this.array   = m.bracketed(m.delimited(this.value.ast.ws, this.comma));

bd82 · 2017-05-30T18:23:56Z

Random thoughts.

A Declarative API could actually result in improved performance.
Lets look at the existing imperative APIs:

 this.RULE("value", function() {
        $.OR([
            {ALT: function() { $.CONSUME(StringLiteral) }},
            {ALT: function() { $.CONSUME(NumberLiteral) }},
            {ALT: function() { $.SUBRULE($.object) }},
            {ALT: function() { $.SUBRULE($.array) }},
            {ALT: function() { $.CONSUME(True) }},
            {ALT: function() { $.CONSUME(False) }},
            {ALT: function() { $.CONSUME(Null) }}
        ]);
    });

every single time the "value" rule is invoked
*seven different anonymous functions will be created and a single array instance
to contain them. That could add up to quite a big fixed overhead.

It is not possible using the new API to optimize this because the existing
API uses Function.toString to analyze the rule's content's.
And even if it that was not a problem it would not be easy to read and write
if every anonymous function would have to be extracted to a named utility function.

Additionally if the declarative APIs will be implemented by generating old style parsing
functions. the CST creation code can be inserted "manually" as "embedded"
actions behind the scenes.

This by itself will provide a ~15% boost.
http://sap.github.io/chevrotain/performance/cst/

bd82 · 2017-05-30T18:58:47Z

That could add up to quite a big fixed overhead.

I ran a little benchmark to see what would happen if I got rid of the overhead of the anonymous functions.

Using the hack below to "trick" Chevrotain
I measured a 20% performance boost under Chrome Canary 61 in the JSON Benchmark.
I would guess that if this transformation was done to all the anonymous functions
and array inits as much as 30% boost could be achieved.

 $.HACKED_OR = $.OR
 $.RULE("value", function () {
        // in template string literal to allow the Functoin.prototype.toString
       // used in "self analysis" to still work
        `
        $.OR([
            {ALT: function () { $.CONSUME(StringLiteral) }},
            {ALT: function () { $.CONSUME(NumberLiteral) }},
            {ALT: function () { $.SUBRULE($.object) }},
            {ALT: function () { $.SUBRULE($.array) }},
            {ALT: function () { $.CONSUME(True) }},
            {ALT: function () { $.CONSUME(False) }},
            {ALT: function () { $.CONSUME(Null) }}
        ]);
        `
        // actual code that will be invoked at runtime, not using the original $.OR API
       // to avoid triggering validation errors.
        $.HACKED_OR([
            {ALT: a},
            {ALT: b},
            {ALT: c},
            {ALT: d},
            {ALT: e},
            {ALT: f},
            {ALT: g}
        ]);
    });

    function a() { $.CONSUME(StringLiteral) }

    function b() { $.CONSUME(NumberLiteral) }

    function c() { $.SUBRULE($.object) }

    function d() { $.SUBRULE($.array) }

    function e() { $.CONSUME(True) }

    function f() { $.CONSUME(False) }

    function g() { $.CONSUME(Null) }

bd82 · 2017-05-30T19:20:59Z

Feature Parity.

I'm uncertain if certain capabilities which rely on the user inserting code into the parser
can still work with a Declarative API, at least when using the existing engine.

Gates on alternatives may be limited.

Gates normally use closures to access state variables, but a gate function defined in as part of a declarative API usage will not have access(closure) to those variables nor will the user be able to easily modify and create those state variables without direct access to the parser's code.
Gates that rely on parser state (via this) should still work though.

Passing arguments to parsing rules.
- This again could rely on state managed by the parser's author.
  But via the declarative APIs such state does not exit, so it would be limited to hard coded values.

Both problems seem to be related to extra state management.
Perhaps a declarative API could provide callback hooks to external functions to manage
state. This state could be reused in Gates/Arguments via closures.
So effectively the Parser's extra state would be managed outside of it.

Will have to also provide a reset() utility to clean up the state between parser invocations.

cdiggins · 2017-05-30T22:32:16Z

@bd82 , To answer your question, while .ast is possible inside of a rule it won't generate a named node because only rules that are properties of an object are given names.

bd82 · 2017-05-31T06:28:31Z

while .ast is possible inside of a rule it won't generate a named node because only rules that are properties of an object are given names.

So the AST creation feature is effectively CST creation with the ability to ignore/skip
certain top level rules?
I think I could introduce such a capability to Chevrotain,
I will evaluate this on a larger grammar as part of other POCs I'm involved with.
Need to see how relevant it is when the AST/CST leaf nodes are already simplified
because there is a separate tokenizer.

Thanks for the feedback!

bd82 · 2017-06-13T14:35:54Z

Another Random thought:

If declarative APIs are implemented by code generation this allows greater flexibility
for advanced features such as Streaming and Async APis
#311

bd82 · 2017-07-20T14:43:28Z

More random thoughts:

No need to reimplement a variant of EBNF syntax for a parser generator like API.
Instead we should try to use a subset of an existing one (For example Antlr's), this may allow
partial re-use of existing editor tools (syntax highlights, autocomplete).

For example antlr's syntax is supported in markdown

myRule:
   "hello" worldRule?

bd82 · 2018-07-24T22:17:54Z

I believe I will remove this from the scope for now, Chevrotain is already ~40K lines of code
(including tests / examples / website) adding a 2nd kind of API may increase the scope of the project too far.

e.g:

Duplicate Examples.
Increase capabilities of CUSTOM APIS (gates / embedded actions).
Open up issues related to editor support for the EBNF APIs.

There is also the conceptual problem that while I see the advantages of a more declarative API.
I won't actually use such an API for my own projects, I do believe an upgraded hand crafted parser (a.k.a Chevrotain) is a better way to write parsers... 😄

bd82 mentioned this issue May 31, 2017

Benchmark page - Methodology definition. #482

Closed

bd82 mentioned this issue Jun 13, 2017

Streaming and Asynchronous operation #311

Closed

bd82 mentioned this issue Jun 23, 2017

add code config option to enable alternative rule DSL #451

Closed

bd82 mentioned this issue Jul 24, 2017

API for Tokens Emitting in custom token patterns #415

Open

bd82 mentioned this issue Aug 29, 2017

WIP - POC with Chevrotain Parser. jhipster/jhipster-core#142

Merged

bd82 added this to the V2.00 milestone Nov 17, 2017

bd82 mentioned this issue Dec 5, 2017

Increase DSL methods suffix indices. #606

Closed

bd82 added API New Feature labels Dec 8, 2017

bd82 removed this from the V2.00 milestone Feb 8, 2018

bd82 closed this as completed Jul 24, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Built in declarative APIs. #480

Built in declarative APIs. #480

bd82 commented May 30, 2017

bd82 commented May 30, 2017

bd82 commented May 30, 2017

cdiggins commented May 30, 2017

bd82 commented May 30, 2017

bd82 commented May 30, 2017

bd82 commented May 30, 2017

bd82 commented May 30, 2017

cdiggins commented May 30, 2017

bd82 commented May 31, 2017

bd82 commented Jun 13, 2017

bd82 commented Jul 20, 2017

bd82 commented Jul 24, 2018

Built in declarative APIs. #480

Built in declarative APIs. #480

Comments

bd82 commented May 30, 2017

bd82 commented May 30, 2017

bd82 commented May 30, 2017

Influences

cdiggins commented May 30, 2017

bd82 commented May 30, 2017

bd82 commented May 30, 2017

Random thoughts.

bd82 commented May 30, 2017

bd82 commented May 30, 2017

Feature Parity.

cdiggins commented May 30, 2017

bd82 commented May 31, 2017

bd82 commented Jun 13, 2017

Another Random thought:

bd82 commented Jul 20, 2017

More random thoughts:

bd82 commented Jul 24, 2018