Internal combinator name annotation #236

j-mie6 · 2024-04-17T14:20:24Z

As shown in #235, the new parsley.debugger functionality suffers from exposing underlying implementations in a way that might not be obvious at all to a user. A looping .foldLeft reports that pure is to blame, even when the user has not used it. In fact, this is the folding function's pure being executed first, so witnessing the bad trace. The other problem is the internal names given to combinators when the parser has no reflectively determined name is often not going to be what the user expects themselves at all; and care must be taken to properly abstract combinators from across parsley with the same naming machinery.

There are three solutions that I can think of for this problem.

`Freeze`/`Thaw`

Two combinators, freeze and thaw (names pending), can be used together to render the internals of a combinator opaque, with the arguments exposed via the thaw. This way, the automatic tagging machinery used by parsley.debugger will just skip over frozen nodes and take the name directly from the top node (this means the prettyName field of LazyParsley can be removed). The Freeze and Thaw nodes themselves can basically be no-op nodes that just forward to the underlying implementation, meaning they just incur an allocation cost and a cheap method forwarding: they would have no equivalent within StrictParsley, so would not affect the optimiser.

The problems with this approach are:

A need to annotate every combinator within the library by hand to ensure they have the correct name and opacity, which is a minor maintenance burden.
Additional overheads (it's not clear how much) on the "happy" path, which is not desirable.
Increased memory overhead (of around 16 bytes) on every combinator
To reduce the above two, it would be necessary to split out internal implementations from external facing API, so that no tagging needs to occur within combinators, this increases code size and creates some small maintenance burden.

The advantages are:

Perfectly accurate debugging traces that are meaningful to the user -- including within bridges.
The user could themselves use freeze/thaw if they wanted to define their own combinators.
Less machinery required within parsley-debug to index internal parsley modules and the lexer, which reduces maintenance burden.
Reduces the overheads of parsley-debug combinators, which will have sparser traces.

If it turns out that the cost on the "happy" path is larger than anticipated, it would be possible to lock the combinators behind a conditional guard:

A method like parsley.debugger.enable() is exposed from parsley-debug, which the user must call in their main function
This method sets an internal flag at the definition side of freeze/thaw (which might be inside parsley.debug), which enables the combinators.
If the flag is not set, the combinators are just the identity function: this trades an allocation on the "happy" path for a cheap conditional check instead (and this will likely be JIT'd out).
The downside to this is the obvious usability overhead.

`Opaque`/`transparent`

The second method instead opts for one AST node, and one meta-combinator that can destroy that node. In this case, the debugger is only allowed to latch onto Opaque nodes, and everything else is ignored. Notice that in the freeze/thaw approach, it is highly likely that every thaw combinator is wrapped around a freeze from another combinator. It's this insight that allows it to be dropped in favour of a reduced and more explicit set.

The advantages here are as follows:

Less overall allocations are required for each individual combinator.
User could still use opaque on their own combinators, and use transparent to hide library ones (even in wider parsers to suppress).
Debug traces are still accurate, and the same combinator can be given multiple names via separate Opaque on top of each other if desired in a user combinator.
parsley-debug logic is vastly simplified, as there is one attachment site, with no counting required to deal with anti-opaque nodes
Less maintenance burden, as the combinator only need be labeled at the top.
Simple extension to allow for @parsley.debuggable to automatically annotate defs with an opaque combinator, without needing to traverse the parser definition to also add in the inverse operation -- this improves the overall experience.
The prettyName field can be removed, since the only attachment site is the Opaque node.

The disadvantages are as above, with less overall memory overheads.

`opaque`/`transparent`

While the above strategy is certainly an improvement over the original scheme, it still does incur additional allocation costs on the "happy" path. An alternative approach would be to instead wholely rely on the prettyName field of the existing AST nodes. Instead of having an Opaque node, make prettyName a var, where null represents a transparent node. The opaque combinator simply sets this field, for no additional allocation cost, and transparent simply clears it back to null.
Primitives can be parameterised by this name so that no additional call is required, they can be set at initialisation (then potentially cleared when used with transparent, or separate internal implementation set to null initially.

This has the following advantages:

Close to zero-cost along the "happy" path, with a small overhead for setting the initial name/clearing it.
Setting is cheap, so less duplication of internal API is required.
User can still annotate their own internal combinators, and hide other parsley combinators if they wish.
The @parsley.debuggable annotation can still be used to annotate defs as above.

The disadvantages are:

The combinators are strictly less expressive: the user cannot ascribe multiple names to a combinator, as it is overriden by another opaque (classic nullable/Option).
Singleton AST nodes that can be exposed in the top-level API must be removed, and replaced by class
Any val parsers would either need to be reformulated with def, or would otherwise not be allowed to appear opaquely as another combinator (there is a globally unique naming for these singletons). While you could argue that this is avoidable by using val and allowing the name to be set that way, parsers like:
```
def choose(b: Boolean) = if b then digit else letter
```
would mark with opaque and destroy the name of both digit and letter in other parts of the code. As such, it becomes
necessary to switch them to def, and allocate uniquely for them, which will increase memory overheads a bit.
The updating of a name across shared parsers is now not thread-safe. However, since the results of annotation are not used in the happy path, this is in some sense ok: the names may race, but they are never observed. The machinery behind the debugger is not thread-safe anyway, so single-threaded debugging is assumed.
The use of null and the mutable variables in the LazyParsley tree is a code smell, however is necessary for allocationless and cheap operation (reusing an existing field).
The complexity of the debugging machinery remains the same, as any node requires attachment.

The text was updated successfully, but these errors were encountered:

j-mie6 · 2024-04-18T15:23:19Z

In fact, a hybrid approach between approaches 2 and 3 might be the best course of action:

`Opaque`/`unsafeOpaque`/`transparent`/`unsafeTransparent`

In this scheme, proceed as option 3, except additionally have the Opaque node: now, the user-facing API is in terms of the Opaque node, which means there is no val/def singleton problem anymore. The @parsley.debuggable uses a safe opaque combinator, which creates an Opaque node. The safe transparent operator either tears down the Opaque node or defers to unsafeTransparent otherwise.

The advantages are:

The user API has almost the same level of power as option 2, where they can cascade naming and cannot accidentally rename singletons via the annotation-based approach.
The internal cost is still zero-cost, and users can opt-in with the annotation, meaning zero-cost for them on the happy path too.
The user can still hide internal combinators by relying on the unsafeTransparent fallback.

However, the downsides are:

The maintenance burden is as bad as option 3, with an additional node to care for too.
The transparent combinator in inherently dangerous: if the user marks an internal val parser with transparent, they will erase its name across all other locations (this is less bad than the opaque issue, since it is a non-automatic combinator and transparency is probably less confusing than an ill-named opaque combinator. This would still necessitate no more vals within the internal parsley API to avoid, but it seems more permissable to drop that and just shrug it off.
There is still a race condition, with the caveats as above.

To address the transparent issue, it could be that instead of unsafeTransparent it has a Transparency node... but this takes us back to the horrid implementation challenges of option 1.

j-mie6 added the enhancement New feature or request label Apr 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Internal combinator name annotation #236

Internal combinator name annotation #236

j-mie6 commented Apr 17, 2024 •

edited

Loading

j-mie6 commented Apr 18, 2024 •

edited

Loading

Internal combinator name annotation #236

Internal combinator name annotation #236

Comments

j-mie6 commented Apr 17, 2024 • edited Loading

Freeze/Thaw

Opaque/transparent

opaque/transparent

j-mie6 commented Apr 18, 2024 • edited Loading

Opaque/unsafeOpaque/transparent/unsafeTransparent

j-mie6 commented Apr 17, 2024 •

edited

Loading

`Freeze`/`Thaw`

`Opaque`/`transparent`

`opaque`/`transparent`

j-mie6 commented Apr 18, 2024 •

edited

Loading

`Opaque`/`unsafeOpaque`/`transparent`/`unsafeTransparent`