one or more rules consume trailing whitespace when exactly matched once #396

Closed
ejoebstl opened this issue Jun 15, 2019 · 4 comments · Fixed by #878
Comments


ejoebstl commented Jun 15, 2019

Given the following grammar:

a = { "a" ~ b+ | a }
b = { "b" }
root = { SOI ~ a* ~ EOI }

Then, parsing the following text:

a b

Will associate the trailing newline after b with the rule for a, instead of the rule for root.

That seems to work properly with zero-or-more expressions, where the whitespace will be associated with the root rule.

Also, it works properly for more than one b. The following example:

a b b

Puts the whitespace correctly inside root.

@ejoebstl changed the title from "one or more rules consume trailing whitespace" to "one or more rules consume trailing whitespace when exactly matched once" on Jun 15, 2019
@ejoebstl
Author

After looking through the code, I'd say this is because Expr::RepOnce is translated to a sequence.

E.g. for pest, b+ is equal to b ~ b*.
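To make the effect of that desugaring concrete, here is a minimal sketch in plain Rust. It is a toy model, not pest's actual code: the helpers `skip`, `match_b`, `rep_b`, `rep_once_as_sequence`, and `rep_once_dedicated` are hypothetical names, and the model assumes that a sequence inserts an implicit whitespace skip between its elements, while a repetition only keeps skipped whitespace if another item follows it.

```rust
// Toy model of implicit whitespace handling; an illustration, not pest's real code.

/// Models the implicit whitespace skip: consume spaces and newlines from `pos`.
fn skip(input: &[u8], mut pos: usize) -> usize {
    while pos < input.len() && (input[pos] == b' ' || input[pos] == b'\n') {
        pos += 1;
    }
    pos
}

/// Match the literal "b" at `pos`, returning the position after it.
fn match_b(input: &[u8], pos: usize) -> Option<usize> {
    if input.get(pos) == Some(&b'b') {
        Some(pos + 1)
    } else {
        None
    }
}

/// `b*`: the first item is tried without a skip; every later item is tried
/// as (skip, b), and the skip is undone if no `b` follows.
fn rep_b(input: &[u8], pos: usize) -> usize {
    let mut pos = match match_b(input, pos) {
        Some(next) => next,
        None => return pos,
    };
    loop {
        match match_b(input, skip(input, pos)) {
            Some(next) => pos = next,
            None => return pos, // whitespace skipped in this attempt is not kept
        }
    }
}

/// `b+` desugared to the sequence `b ~ b*`.
fn rep_once_as_sequence(input: &[u8], pos: usize) -> Option<usize> {
    let after_first = match_b(input, pos)?;
    // The sequence skips whitespace between its two elements, and that skip
    // is kept even when `b*` then matches zero times.
    let after_skip = skip(input, after_first);
    Some(rep_b(input, after_skip))
}

/// `b+` handled as its own repetition: whitespace is only kept when another
/// `b` actually follows it.
fn rep_once_dedicated(input: &[u8], pos: usize) -> Option<usize> {
    let mut pos = match_b(input, pos)?;
    loop {
        match match_b(input, skip(input, pos)) {
            Some(next) => pos = next,
            None => return Some(pos),
        }
    }
}

fn main() {
    for text in ["b\n", "b b\n"] {
        let bytes = text.as_bytes();
        println!(
            "{:?}: sequence ends at {:?}, dedicated repetition ends at {:?}",
            text,
            rep_once_as_sequence(bytes, 0),
            rep_once_dedicated(bytes, 0)
        );
    }
}
```

In this model, the sequence form swallows the trailing newline only when b matches exactly once, which lines up with the behaviour reported above; with two or more matches both variants stop right after the last b.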

@ejoebstl
Author

Proposed fix in #397.

@dragostis
Contributor

Ah, finally. I knew this day was going to come. Unfortunately, this implementation bug is probably something that people make use of in their logic. Fixing it might actually break them, but I'm happy to give it a try regardless; maybe no actual dependent relies on it.

I have a fix for 3.0, but I haven't had that much time to work on it.

doy added a commit to doy/nbsh that referenced this issue Jan 6, 2022
the end of rule whitespace handling is weird and inconsistent, see
pest-parser/pest#396 and
pest-parser/pest#519

keks commented Apr 6, 2022

This just bit us. Are there still plans to resolve this? Is there evidence that people rely on this bug?

And lastly, is there a good workaround? The obvious idea would be to strip the white space off of everything pest returns, but I suspect there is a more elegant way, since otherwise there probably would be more people commenting here.

In our case we are having trouble parsing data in the format "x <- foo" (see this page).

Also, otherwise we are having a great time using pest, thank you!

cc @siccegge @chrisbrzuska

Update:

I found a satisfying workaround, documented here for future visitors. Instead of

identifier = { (ASCII_ALPHA | "_")+ }

I now use

identifier = @{ (ASCII_ALPHA | "_") ~ (ASCII_ALPHA | "_")* }

where @ means that no internal white space is allowed and all subrules are muted (which is not important here since there are no internal rules).
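For future visitors who want to see why the atomic form helps, here is a small sketch in plain Rust. It is a toy model under stated assumptions, not pest's API: `is_ident_char`, `skip`, `identifier_normal`, and `identifier_atomic` are hypothetical helpers, and the model assumes a normal rule skips whitespace between sequence elements while an atomic rule never skips.

```rust
// Toy model only; these helpers are hypothetical and are not part of pest.

/// Models `ASCII_ALPHA | "_"`.
fn is_ident_char(c: u8) -> bool {
    c.is_ascii_alphabetic() || c == b'_'
}

/// True if there is an identifier character at `pos`.
fn ident_at(input: &[u8], pos: usize) -> bool {
    input.get(pos).copied().map_or(false, is_ident_char)
}

/// Models the implicit whitespace skip of a normal (non-atomic) rule.
fn skip(input: &[u8], mut pos: usize) -> usize {
    while input.get(pos) == Some(&b' ') {
        pos += 1;
    }
    pos
}

/// Models `{ (ASCII_ALPHA | "_")+ }` desugared to `x ~ x*` in a normal rule.
/// Returns the end position of the match.
fn identifier_normal(input: &[u8]) -> Option<usize> {
    if !ident_at(input, 0) {
        return None;
    }
    // The skip between `x` and `x*` is kept even if `x*` matches nothing.
    let mut pos = skip(input, 1);
    if ident_at(input, pos) {
        pos += 1;
        loop {
            // Later repetitions try (skip, x) and discard the skip on failure.
            let next = skip(input, pos);
            if ident_at(input, next) {
                pos = next + 1;
            } else {
                break;
            }
        }
    }
    Some(pos)
}

/// Models `@{ (ASCII_ALPHA | "_") ~ (ASCII_ALPHA | "_")* }`: no skipping at all.
fn identifier_atomic(input: &[u8]) -> Option<usize> {
    if !ident_at(input, 0) {
        return None;
    }
    let mut pos = 1;
    while ident_at(input, pos) {
        pos += 1;
    }
    Some(pos)
}

fn main() {
    for text in ["x <- foo", "foo <- x"] {
        let bytes = text.as_bytes();
        println!(
            "{:?}: normal ends at {:?}, atomic ends at {:?}",
            text,
            identifier_normal(bytes),
            identifier_atomic(bytes)
        );
    }
}
```

In this model only the single-character identifier picks up the trailing space under the normal rule, which is consistent with the "x <- foo" case above; the atomic form stops at the end of the identifier in both inputs.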

@tomtau tomtau added this to the v3.0 milestone Nov 26, 2022
@tomtau tomtau linked a pull request Jun 19, 2023 that will close this issue
tomtau added a commit to tomtau/pest that referenced this issue Jun 30, 2023
the fix was feature-guarded with "grammar-extras" in order to keep
the semver compatibility. plus a test was added.

Closes pest-parser#396

Co-authored-by: Emanuel Joebstl