Inconsistent grabbing of whitespace #519
Comments
I'm new to pest and also facing this, and it sounds like a bug to me. According to the manual, a silent rule will never appear in parse results. Therefore, I think whitespace appearing inside the pair violates this specification.
I'm not certain around the difference between I've never worked with this part of the codebase, so this is just conjecture, but the following "desugaring" should help illustrate why I believe this is occurring:
I don't actually know how " Ultimately I agree that this manifests as an inconsistency to the user, and ideally should be improved. However, I'm really not certain how to go about doing so. I think the behavior most end users would expect is to never include trailing whitespace in a capture. But this is problematic in practice; consider the simple case of
This refers specifically to the fact that the
the end of rule whitespace handling is weird and inconsistent, see pest-parser/pest#396 and pest-parser/pest#519
This behavior is expected, see https://pest.rs/book/examples/ini.html?highlight=whitespace#whitespace. To fix your problem, mark the
For non-atomic rules, the
Automatic whitespace handling, on the other hand, is disabled for atomic rules.
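To make the atomic versus non-atomic difference concrete, here is a small hand-written Rust model. It is not pest's generated code: `match_integer` is a hypothetical helper, and it assumes (as conjectured elsewhere in this thread) that `e+` is treated as `e ~ e*`, with an implicit whitespace skip at each `~` only when the rule is non-atomic. Whitespace is modeled as plain spaces.

```rust
/// Hypothetical model of `integer = { ASCII_DIGIT+ }` (atomic = false)
/// versus `integer = @{ ASCII_DIGIT+ }` (atomic = true).
/// Returns the text the rule would capture from the start of `input`.
fn match_integer(input: &str, atomic: bool) -> Option<&str> {
    let bytes = input.as_bytes();
    let is_digit = |i: usize| bytes.get(i).map_or(false, |b| b.is_ascii_digit());
    // Implicit whitespace skipping only happens in non-atomic rules.
    let skip = |mut i: usize| {
        if !atomic {
            while bytes.get(i) == Some(&b' ') {
                i += 1;
            }
        }
        i
    };

    // The first `e` of `e ~ e*` must match.
    if !is_digit(0) {
        return None;
    }
    // The `~` between `e` and `e*` skips whitespace (assumption), and that
    // skip is kept even if `e*` then matches nothing.
    let mut end = skip(1);
    if is_digit(end) {
        end += 1;
        // Further repetitions: skip, try a digit, undo the skip on failure.
        loop {
            let next = skip(end);
            if is_digit(next) {
                end = next + 1;
            } else {
                break;
            }
        }
    }
    Some(&input[..end])
}
```

Under these assumptions, `match_integer("2 d 3", false)` keeps the trailing space, while `match_integer("2 d 3", true)` captures only the digit, which is the effect the atomic marker is meant to have.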
Consider the following grammar:
When parsing the input `2 d 3` as a diceroll, its bottom-level tokens are the integer `2 `, the d `d`, and the integer `3`. Note how spaces are captured in the integers but not in the d.
If the d rule is changed to `d = { "d"+ }`, suddenly it starts capturing the space after the d too. The same doesn't happen for `d = { "d"* }`, which continues to capture only `d`.
If you increase the number of spaces in a given position, they'll all be captured; so, given the input:
The first integer will be captured as:
(Apologies for the full-line code blocks there; inline ones collapse multiple spaces down to just one in traditional HTML fashion, whereas full-line ones don't.)
If the integers are more than one digit long, spaces are no longer captured after them. The first integer of `22 d 3` is `22`, no space included. This holds irrespective of how many spaces are present in the source text; the number always turns to zero, it doesn't just decrement by one per digit or anything along those lines.
This behavior all seems to ultimately flow down from the WHITESPACE rule; with the rules as defined here, parsing `2 ` as an integer yields `2 `, but with the WHITESPACE rule removed, it instead yields just `2`.
I'm not at all sure what the intended behavior here is (whether the intention is that the whitespace be consistently captured in the token to its left, or that it be consistently not captured), but I'm almost certain this isn't the intended behavior. There are too many weird inconsistencies, with the spaces only being captured after one-character inputs whose rules use plus signs. But I figure it's worth raising this issue as an alert of "things are probably not working the way they're supposed to", even in the absence of knowledge of exactly how they are supposed to work.
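The pattern reported above (spaces kept after a single match under `+`, dropped after two or more matches, and never kept under `*`) can be reproduced with a toy matcher. This is only a sketch under an assumption, not pest's actual implementation: it supposes that `e+` is rewritten to `e ~ e*`, that `~` skips implicit whitespace, and that `e*` backtracks over a skip when the following `e` fails. All function names here are hypothetical.

```rust
type Pred = fn(u8) -> bool;

fn is_digit(b: u8) -> bool { b.is_ascii_digit() }
fn is_d(b: u8) -> bool { b == b'd' }

/// Implicit WHITESPACE skipping, modeled as "consume any run of spaces".
fn skip_ws(input: &[u8], mut pos: usize) -> usize {
    while pos < input.len() && input[pos] == b' ' {
        pos += 1;
    }
    pos
}

/// Match one byte satisfying `p` at `pos`; return the position after it.
fn one(input: &[u8], pos: usize, p: Pred) -> Option<usize> {
    match input.get(pos) {
        Some(&b) if p(b) => Some(pos + 1),
        _ => None,
    }
}

/// `e*`, modeled as `(e ~ (WHITESPACE* ~ e)*)?`. A failed
/// `WHITESPACE* ~ e` attempt is undone, so no trailing spaces are kept.
fn star(input: &[u8], pos: usize, p: Pred) -> usize {
    let mut end = match one(input, pos, p) {
        Some(end) => end,
        None => return pos, // zero repetitions: succeed without consuming
    };
    while let Some(next) = one(input, skip_ws(input, end), p) {
        end = next;
    }
    end
}

/// `e+`, modeled as `e ~ e*` (assumption). The `~` skips whitespace before
/// `e*` runs; if `e*` then matches zero times, the skipped spaces stay.
fn plus(input: &[u8], pos: usize, p: Pred) -> Option<usize> {
    let after_first = one(input, pos, p)?;
    Some(star(input, skip_ws(input, after_first), p))
}

/// Text captured by a rule `{ e+ }` at the start of `input`.
fn capture_plus(input: &str, p: Pred) -> Option<&str> {
    plus(input.as_bytes(), 0, p).map(|end| &input[..end])
}

/// Text captured by a rule `{ e* }` at the start of `input`.
fn capture_star(input: &str, p: Pred) -> &str {
    &input[..star(input.as_bytes(), 0, p)]
}
```

In this model, `capture_plus("2 d 3", is_digit)` keeps the space, `capture_plus("22 d 3", is_digit)` does not, and `capture_star("d 3", is_d)` captures only the `d`, matching the three observations in the report. If the assumption holds, the inconsistency comes from where the implicit skip sits relative to the backtracking point, not from the WHITESPACE rule itself.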