Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal combinator name annotation #236

Open
j-mie6 opened this issue Apr 17, 2024 · 1 comment
Open

Internal combinator name annotation #236

j-mie6 opened this issue Apr 17, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@j-mie6
Copy link
Owner

j-mie6 commented Apr 17, 2024

As shown in #235, the new parsley.debugger functionality suffers from exposing underlying implementations in a way that might not be obvious at all to a user. A looping .foldLeft reports that pure is to blame, even when the user has not used it. In fact, this is the folding function's pure being executed first, so witnessing the bad trace. The other problem is the internal names given to combinators when the parser has no reflectively determined name is often not going to be what the user expects themselves at all; and care must be taken to properly abstract combinators from across parsley with the same naming machinery.

There are three solutions that I can think of for this problem.

Freeze/Thaw

Two combinators, freeze and thaw (names pending), can be used together to render the internals of a combinator opaque, with the arguments exposed via the thaw. This way, the automatic tagging machinery used by parsley.debugger will just skip over frozen nodes and take the name directly from the top node (this means the prettyName field of LazyParsley can be removed). The Freeze and Thaw nodes themselves can basically be no-op nodes that just forward to the underlying implementation, meaning they just incur an allocation cost and a cheap method forwarding: they would have no equivalent within StrictParsley, so would not affect the optimiser.

The problems with this approach are:

  • A need to annotate every combinator within the library by hand to ensure they have the correct name and opacity, which is a minor maintenance burden.
  • Additional overheads (it's not clear how much) on the "happy" path, which is not desirable.
  • Increased memory overhead (of around 16 bytes) on every combinator
  • To reduce the above two, it would be necessary to split out internal implementations from external facing API, so that no tagging needs to occur within combinators, this increases code size and creates some small maintenance burden.

The advantages are:

  • Perfectly accurate debugging traces that are meaningful to the user -- including within bridges.
  • The user could themselves use freeze/thaw if they wanted to define their own combinators.
  • Less machinery required within parsley-debug to index internal parsley modules and the lexer, which reduces maintenance burden.
  • Reduces the overheads of parsley-debug combinators, which will have sparser traces.

If it turns out that the cost on the "happy" path is larger than anticipated, it would be possible to lock the combinators behind a conditional guard:

  • A method like parsley.debugger.enable() is exposed from parsley-debug, which the user must call in their main function
  • This method sets an internal flag at the definition side of freeze/thaw (which might be inside parsley.debug), which enables the combinators.
  • If the flag is not set, the combinators are just the identity function: this trades an allocation on the "happy" path for a cheap conditional check instead (and this will likely be JIT'd out).
  • The downside to this is the obvious usability overhead.

Opaque/transparent

The second method instead opts for one AST node, and one meta-combinator that can destroy that node. In this case, the debugger is only allowed to latch onto Opaque nodes, and everything else is ignored. Notice that in the freeze/thaw approach, it is highly likely that every thaw combinator is wrapped around a freeze from another combinator. It's this insight that allows it to be dropped in favour of a reduced and more explicit set.

The advantages here are as follows:

  • Less overall allocations are required for each individual combinator.
  • User could still use opaque on their own combinators, and use transparent to hide library ones (even in wider parsers to suppress).
  • Debug traces are still accurate, and the same combinator can be given multiple names via separate Opaque on top of each other if desired in a user combinator.
  • parsley-debug logic is vastly simplified, as there is one attachment site, with no counting required to deal with anti-opaque nodes
  • Less maintenance burden, as the combinator only need be labeled at the top.
  • Simple extension to allow for @parsley.debuggable to automatically annotate defs with an opaque combinator, without needing to traverse the parser definition to also add in the inverse operation -- this improves the overall experience.
  • The prettyName field can be removed, since the only attachment site is the Opaque node.

The disadvantages are as above, with less overall memory overheads.

opaque/transparent

While the above strategy is certainly an improvement over the original scheme, it still does incur additional allocation costs on the "happy" path. An alternative approach would be to instead wholely rely on the prettyName field of the existing AST nodes. Instead of having an Opaque node, make prettyName a var, where null represents a transparent node. The opaque combinator simply sets this field, for no additional allocation cost, and transparent simply clears it back to null.
Primitives can be parameterised by this name so that no additional call is required, they can be set at initialisation (then potentially cleared when used with transparent, or separate internal implementation set to null initially.

This has the following advantages:

  • Close to zero-cost along the "happy" path, with a small overhead for setting the initial name/clearing it.
  • Setting is cheap, so less duplication of internal API is required.
  • User can still annotate their own internal combinators, and hide other parsley combinators if they wish.
  • The @parsley.debuggable annotation can still be used to annotate defs as above.

The disadvantages are:

  • The combinators are strictly less expressive: the user cannot ascribe multiple names to a combinator, as it is overriden by another opaque (classic nullable/Option).

  • Singleton AST nodes that can be exposed in the top-level API must be removed, and replaced by class

  • Any val parsers would either need to be reformulated with def, or would otherwise not be allowed to appear opaquely as another combinator (there is a globally unique naming for these singletons). While you could argue that this is avoidable by using val and allowing the name to be set that way, parsers like:

    def choose(b: Boolean) = if b then digit else letter

    would mark with opaque and destroy the name of both digit and letter in other parts of the code. As such, it becomes
    necessary to switch them to def, and allocate uniquely for them, which will increase memory overheads a bit.

  • The updating of a name across shared parsers is now not thread-safe. However, since the results of annotation are not used in the happy path, this is in some sense ok: the names may race, but they are never observed. The machinery behind the debugger is not thread-safe anyway, so single-threaded debugging is assumed.

  • The use of null and the mutable variables in the LazyParsley tree is a code smell, however is necessary for allocationless and cheap operation (reusing an existing field).

  • The complexity of the debugging machinery remains the same, as any node requires attachment.

@j-mie6 j-mie6 added the enhancement New feature or request label Apr 17, 2024
@j-mie6
Copy link
Owner Author

j-mie6 commented Apr 18, 2024

In fact, a hybrid approach between approaches 2 and 3 might be the best course of action:

Opaque/unsafeOpaque/transparent/unsafeTransparent

In this scheme, proceed as option 3, except additionally have the Opaque node: now, the user-facing API is in terms of the Opaque node, which means there is no val/def singleton problem anymore. The @parsley.debuggable uses a safe opaque combinator, which creates an Opaque node. The safe transparent operator either tears down the Opaque node or defers to unsafeTransparent otherwise.

The advantages are:

  • The user API has almost the same level of power as option 2, where they can cascade naming and cannot accidentally rename singletons via the annotation-based approach.
  • The internal cost is still zero-cost, and users can opt-in with the annotation, meaning zero-cost for them on the happy path too.
  • The user can still hide internal combinators by relying on the unsafeTransparent fallback.

However, the downsides are:

  • The maintenance burden is as bad as option 3, with an additional node to care for too.
  • The transparent combinator in inherently dangerous: if the user marks an internal val parser with transparent, they will erase its name across all other locations (this is less bad than the opaque issue, since it is a non-automatic combinator and transparency is probably less confusing than an ill-named opaque combinator. This would still necessitate no more vals within the internal parsley API to avoid, but it seems more permissable to drop that and just shrug it off.
  • There is still a race condition, with the caveats as above.

To address the transparent issue, it could be that instead of unsafeTransparent it has a Transparency node... but this takes us back to the horrid implementation challenges of option 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant