You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As shown in #235, the new parsley.debugger functionality suffers from exposing underlying implementations in a way that might not be obvious at all to a user. A looping .foldLeft reports that pure is to blame, even when the user has not used it. In fact, this is the folding function's pure being executed first, so witnessing the bad trace. The other problem is the internal names given to combinators when the parser has no reflectively determined name is often not going to be what the user expects themselves at all; and care must be taken to properly abstract combinators from across parsley with the same naming machinery.
There are three solutions that I can think of for this problem.
Freeze/Thaw
Two combinators, freeze and thaw (names pending), can be used together to render the internals of a combinator opaque, with the arguments exposed via the thaw. This way, the automatic tagging machinery used by parsley.debugger will just skip over frozen nodes and take the name directly from the top node (this means the prettyName field of LazyParsley can be removed). The Freeze and Thaw nodes themselves can basically be no-op nodes that just forward to the underlying implementation, meaning they just incur an allocation cost and a cheap method forwarding: they would have no equivalent within StrictParsley, so would not affect the optimiser.
The problems with this approach are:
A need to annotate every combinator within the library by hand to ensure they have the correct name and opacity, which is a minor maintenance burden.
Additional overheads (it's not clear how much) on the "happy" path, which is not desirable.
Increased memory overhead (of around 16 bytes) on every combinator
To reduce the above two, it would be necessary to split out internal implementations from external facing API, so that no tagging needs to occur within combinators, this increases code size and creates some small maintenance burden.
The advantages are:
Perfectly accurate debugging traces that are meaningful to the user -- including within bridges.
The user could themselves use freeze/thaw if they wanted to define their own combinators.
Less machinery required within parsley-debug to index internal parsley modules and the lexer, which reduces maintenance burden.
Reduces the overheads of parsley-debug combinators, which will have sparser traces.
If it turns out that the cost on the "happy" path is larger than anticipated, it would be possible to lock the combinators behind a conditional guard:
A method like parsley.debugger.enable() is exposed from parsley-debug, which the user must call in their main function
This method sets an internal flag at the definition side of freeze/thaw (which might be inside parsley.debug), which enables the combinators.
If the flag is not set, the combinators are just the identity function: this trades an allocation on the "happy" path for a cheap conditional check instead (and this will likely be JIT'd out).
The downside to this is the obvious usability overhead.
Opaque/transparent
The second method instead opts for one AST node, and one meta-combinator that can destroy that node. In this case, the debugger is only allowed to latch onto Opaque nodes, and everything else is ignored. Notice that in the freeze/thaw approach, it is highly likely that every thaw combinator is wrapped around a freeze from another combinator. It's this insight that allows it to be dropped in favour of a reduced and more explicit set.
The advantages here are as follows:
Less overall allocations are required for each individual combinator.
User could still use opaque on their own combinators, and use transparent to hide library ones (even in wider parsers to suppress).
Debug traces are still accurate, and the same combinator can be given multiple names via separate Opaque on top of each other if desired in a user combinator.
parsley-debug logic is vastly simplified, as there is one attachment site, with no counting required to deal with anti-opaque nodes
Less maintenance burden, as the combinator only need be labeled at the top.
Simple extension to allow for @parsley.debuggable to automatically annotate defs with an opaque combinator, without needing to traverse the parser definition to also add in the inverse operation -- this improves the overall experience.
The prettyName field can be removed, since the only attachment site is the Opaque node.
The disadvantages are as above, with less overall memory overheads.
opaque/transparent
While the above strategy is certainly an improvement over the original scheme, it still does incur additional allocation costs on the "happy" path. An alternative approach would be to instead wholely rely on the prettyName field of the existing AST nodes. Instead of having an Opaque node, make prettyName a var, where null represents a transparent node. The opaque combinator simply sets this field, for no additional allocation cost, and transparent simply clears it back to null.
Primitives can be parameterised by this name so that no additional call is required, they can be set at initialisation (then potentially cleared when used with transparent, or separate internal implementation set to null initially.
This has the following advantages:
Close to zero-cost along the "happy" path, with a small overhead for setting the initial name/clearing it.
Setting is cheap, so less duplication of internal API is required.
User can still annotate their own internal combinators, and hide other parsley combinators if they wish.
The @parsley.debuggable annotation can still be used to annotate defs as above.
The disadvantages are:
The combinators are strictly less expressive: the user cannot ascribe multiple names to a combinator, as it is overriden by another opaque (classic nullable/Option).
Singleton AST nodes that can be exposed in the top-level API must be removed, and replaced by class
Any val parsers would either need to be reformulated with def, or would otherwise not be allowed to appear opaquely as another combinator (there is a globally unique naming for these singletons). While you could argue that this is avoidable by using val and allowing the name to be set that way, parsers like:
defchoose(b: Boolean) =if b then digit else letter
would mark with opaque and destroy the name of both digit and letter in other parts of the code. As such, it becomes
necessary to switch them to def, and allocate uniquely for them, which will increase memory overheads a bit.
The updating of a name across shared parsers is now not thread-safe. However, since the results of annotation are not used in the happy path, this is in some sense ok: the names may race, but they are never observed. The machinery behind the debugger is not thread-safe anyway, so single-threaded debugging is assumed.
The use of null and the mutable variables in the LazyParsley tree is a code smell, however is necessary for allocationless and cheap operation (reusing an existing field).
The complexity of the debugging machinery remains the same, as any node requires attachment.
The text was updated successfully, but these errors were encountered:
In fact, a hybrid approach between approaches 2 and 3 might be the best course of action:
Opaque/unsafeOpaque/transparent/unsafeTransparent
In this scheme, proceed as option 3, except additionally have the Opaque node: now, the user-facing API is in terms of the Opaque node, which means there is no val/def singleton problem anymore. The @parsley.debuggable uses a safe opaque combinator, which creates an Opaque node. The safe transparent operator either tears down the Opaque node or defers to unsafeTransparent otherwise.
The advantages are:
The user API has almost the same level of power as option 2, where they can cascade naming and cannot accidentally rename singletons via the annotation-based approach.
The internal cost is still zero-cost, and users can opt-in with the annotation, meaning zero-cost for them on the happy path too.
The user can still hide internal combinators by relying on the unsafeTransparent fallback.
However, the downsides are:
The maintenance burden is as bad as option 3, with an additional node to care for too.
The transparent combinator in inherently dangerous: if the user marks an internal val parser with transparent, they will erase its name across all other locations (this is less bad than the opaque issue, since it is a non-automatic combinator and transparency is probably less confusing than an ill-named opaque combinator. This would still necessitate no more vals within the internal parsley API to avoid, but it seems more permissable to drop that and just shrug it off.
There is still a race condition, with the caveats as above.
To address the transparent issue, it could be that instead of unsafeTransparent it has a Transparency node... but this takes us back to the horrid implementation challenges of option 1.
As shown in #235, the new
parsley.debugger
functionality suffers from exposing underlying implementations in a way that might not be obvious at all to a user. A looping.foldLeft
reports thatpure
is to blame, even when the user has not used it. In fact, this is the folding function'spure
being executed first, so witnessing the bad trace. The other problem is the internal names given to combinators when the parser has no reflectively determined name is often not going to be what the user expects themselves at all; and care must be taken to properly abstract combinators from across parsley with the same naming machinery.There are three solutions that I can think of for this problem.
Freeze
/Thaw
Two combinators,
freeze
andthaw
(names pending), can be used together to render the internals of a combinator opaque, with the arguments exposed via thethaw
. This way, the automatic tagging machinery used byparsley.debugger
will just skip over frozen nodes and take the name directly from the top node (this means theprettyName
field ofLazyParsley
can be removed). TheFreeze
andThaw
nodes themselves can basically be no-op nodes that just forward to the underlying implementation, meaning they just incur an allocation cost and a cheap method forwarding: they would have no equivalent withinStrictParsley
, so would not affect the optimiser.The problems with this approach are:
The advantages are:
freeze
/thaw
if they wanted to define their own combinators.parsley-debug
to index internal parsley modules and the lexer, which reduces maintenance burden.parsley-debug
combinators, which will have sparser traces.If it turns out that the cost on the "happy" path is larger than anticipated, it would be possible to lock the combinators behind a conditional guard:
parsley.debugger.enable()
is exposed fromparsley-debug
, which the user must call in theirmain
functionfreeze
/thaw
(which might be insideparsley.debug
), which enables the combinators.Opaque
/transparent
The second method instead opts for one AST node, and one meta-combinator that can destroy that node. In this case, the debugger is only allowed to latch onto
Opaque
nodes, and everything else is ignored. Notice that in thefreeze
/thaw
approach, it is highly likely that everythaw
combinator is wrapped around afreeze
from another combinator. It's this insight that allows it to be dropped in favour of a reduced and more explicit set.The advantages here are as follows:
opaque
on their own combinators, and usetransparent
to hide library ones (even in wider parsers to suppress).Opaque
on top of each other if desired in a user combinator.parsley-debug
logic is vastly simplified, as there is one attachment site, with no counting required to deal with anti-opaque nodes@parsley.debuggable
to automatically annotatedef
s with anopaque
combinator, without needing to traverse the parser definition to also add in the inverse operation -- this improves the overall experience.prettyName
field can be removed, since the only attachment site is theOpaque
node.The disadvantages are as above, with less overall memory overheads.
opaque
/transparent
While the above strategy is certainly an improvement over the original scheme, it still does incur additional allocation costs on the "happy" path. An alternative approach would be to instead wholely rely on the
prettyName
field of the existing AST nodes. Instead of having anOpaque
node, makeprettyName
avar
, wherenull
represents a transparent node. Theopaque
combinator simply sets this field, for no additional allocation cost, andtransparent
simply clears it back tonull
.Primitives can be parameterised by this name so that no additional call is required, they can be set at initialisation (then potentially cleared when used with
transparent
, or separate internal implementation set tonull
initially.This has the following advantages:
@parsley.debuggable
annotation can still be used to annotatedef
s as above.The disadvantages are:
The combinators are strictly less expressive: the user cannot ascribe multiple names to a combinator, as it is overriden by another opaque (classic nullable/
Option
).Singleton AST nodes that can be exposed in the top-level API must be removed, and replaced by
class
Any
val
parsers would either need to be reformulated withdef
, or would otherwise not be allowed to appear opaquely as another combinator (there is a globally unique naming for these singletons). While you could argue that this is avoidable by usingval
and allowing the name to be set that way, parsers like:would mark with
opaque
and destroy the name of bothdigit
andletter
in other parts of the code. As such, it becomesnecessary to switch them to
def
, and allocate uniquely for them, which will increase memory overheads a bit.The updating of a name across shared parsers is now not thread-safe. However, since the results of annotation are not used in the happy path, this is in some sense ok: the names may race, but they are never observed. The machinery behind the debugger is not thread-safe anyway, so single-threaded debugging is assumed.
The use of
null
and the mutable variables in theLazyParsley
tree is a code smell, however is necessary for allocationless and cheap operation (reusing an existing field).The complexity of the debugging machinery remains the same, as any node requires attachment.
The text was updated successfully, but these errors were encountered: