Support for Python-style comments in Lark grammar #1230

Merged (1 commit), Jan 30, 2023

Conversation

vincent-hugot
Contributor

Given that

  • most (all?) editors are unaware of Lark's syntax
  • most lark grammars live in Python strings
  • most editors will use # when asked to comment lines or blocks within Lark strings (e.g. PyCharm's CTRL+D)
  • commenting lines and blocks is frequently done while developing a grammar (debugging...)
  • adding this style of comments should not break existing grammars

I propose in this small PR to enable Python-style comments in Lark grammars. If accepted, I'll do another PR to reflect that in documentation.

@MegaIng
Member

MegaIng commented Dec 7, 2022

In most editors it's easy to add support for lark syntax and there are a few plugins for it already in other repos in this organization.

@vincent-hugot
Contributor Author

Yes, I saw.

The existence of plugins does not mean things should be more complicated than they need to be by default.

@erezsh
Member

erezsh commented Dec 7, 2022

most (all?) editors are unaware of Lark's syntax

That's just not true. Most major IDEs have plugins that you can easily install.

most editors will use # when asked to comment lines or blocks within Lark strings (e.g. PyCharm's CTRL+D)

That is an interesting point. Do you know how to enable this feature in vscode?

I misread that. Yeah, that's a good point. And we're not using # for anything else.

@erezsh
Member

erezsh commented Dec 7, 2022

But also:

most lark grammars live in Python strings

I really hope that's not true! To be honest, I don't know why anyone would use strings instead of a .lark file, except when doing a quick test in the interpreter.

@vincent-hugot
Contributor Author

That's just not true. Most major IDEs have plugins that you can easily install.

Yes, I meant by default. There are also various small editors like idle3 and specialized pedagogical editors that can't or won't get plugins, that people use in pedagogical settings. (I'm thinking about using Lark in language theory classes, personally).

I really hope that's not true! To be honest, I don't know why anyone would use strings ..

Most of the examples in the documentation, for starters :) Because it's very convenient.

In large projects, absolutely, grammars will go into dedicated files; in very small projects, short exercises in a lab class, and during first contact with Lark, as I did yesterday, ... that's different. Convenience is key.

@erezsh
Member

erezsh commented Dec 7, 2022

I think you make a good argument, and I don't see the harm in allowing it. It would be nice to keep the # symbol open for future syntax extensions, but I don't have anything specific in mind, and we still have plenty of unused ASCII characters to choose from.

@MegaIng Do you have any objections?

@vincent-hugot
Contributor Author

I'll add one last thing, re plugins. When I said

editors are unaware of Lark's syntax

I didn't mean that with respect to syntax highlighting, which is irrelevant here. I meant it in the context of the two lines that immediately followed that statement: that of block commenting inside Lark strings.

Unless the syntax highlighting plugins I saw also know whether lines should be commented with # or // when pressing CTRL+D (in PyCharm's case, for instance), that is, whether I am in Python or in the middle of a Lark r""" string, they don't help in that situation.

@erezsh
Member

erezsh commented Dec 7, 2022

Unless the syntax highlighting plugins I saw also provide awareness of which lines should be commented

Maybe there's a way to add it?

@vincent-hugot
Contributor Author

vincent-hugot commented Dec 7, 2022

Maybe there's a way

If your plugin is arbitrary code, then certainly it's possible (in simple cases).

But is that desirable behaviour? Do you want your editor to be "smart" and syntax/context aware when commenting lines?

PyCharm (and everything else I've used), when commenting a block, does not take context into account beyond indentation. It does not matter if it cleaves a """ string in two or anything; it just puts # at the beginning of the line no matter what, as expected.

Having a block comment operation suddenly do context-aware stuff would be more trouble than it's worth, imo.
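
To make the point concrete, here is a minimal sketch of a context-unaware comment command applied to a Python file that holds a grammar in a string. The helper `comment_block` and the sample grammar are hypothetical, and placing the `#` at the indentation level (rather than at column 0) is an assumption; real editors differ in that detail.

```python
def comment_block(source: str, first: int, last: int) -> str:
    """Prefix lines first..last (0-based, inclusive) with '# '.

    Hypothetical helper mimicking an editor command that looks only at
    indentation and ignores whether the line sits inside a string.
    """
    lines = source.split("\n")
    for i in range(first, last + 1):
        line = lines[i]
        if line.strip():  # leave blank lines untouched
            indent = len(line) - len(line.lstrip())
            lines[i] = line[:indent] + "# " + line[indent:]
    return "\n".join(lines)


# A Python snippet whose grammar lives inside a raw triple-quoted string.
SOURCE = '''grammar = r"""
    start: greeting NAME
    greeting: "hello" | "hi"
    NAME: /[a-z]+/
"""'''

# Comment out the "greeting" rule, as one might while debugging.
print(comment_block(SOURCE, 2, 2))
```

The commented line ends up inside the grammar string, so it only behaves as a comment if the grammar language itself accepts `#`.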

Having # just work for both contexts, solving the problem for all editors in literally one line of code, seems a lot more parsimonious to me ;)
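
As a rough illustration of why such a change can be this small, the sketch below uses a single regular expression that accepts both comment styles. The pattern and the helper `is_comment_line` are assumptions made for illustration with Python's standard `re` module; they are not taken from the actual commit, and the check works line by line without accounting for `#` appearing inside string or regexp literals.

```python
import re

# Assumed pattern, for illustration only: optional leading whitespace,
# then either "//" or "#", then everything up to the end of the line.
COMMENT = re.compile(r"\s*(?://|#)[^\n]*")


def is_comment_line(line: str) -> bool:
    """Return True if the whole line is a comment in either style."""
    return COMMENT.fullmatch(line) is not None


for sample in ["// classic comment", "    # Python-style comment", "start: NAME"]:
    print(repr(sample), is_comment_line(sample))
```

Since `#` was not otherwise used in the grammar syntax (as noted earlier in the thread), adding one alternative to the pattern is enough in this simplified setting.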

Out of curiosity, what was the rationale for the initial choice of // for a tool in the Python ecosystem? Habit from using lex/yacc?

@erezsh
Member

erezsh commented Dec 7, 2022

Not so much habit, as wanting to make the syntax familiar, and easy to copy from other grammars. It seems like most other parsers, like yacc, bison, antlr, grammatica, etc., use a similar style.

Also, I think I felt that making it look purposefully different from Python would be less confusing when they are side by side.

parsimonious

Don't mention our competition!! 😛

@vincent-hugot
Contributor Author

I felt that making it look purposefully different

Ok. Didn't expect that. It didn't occur to me to see that as a positive. I favour consistency where possible, and I've personally never looked at a lexer/grammar and been unsure whether I was looking at code. Not even when working with yacc/ocamlyacc/menhir, where you put actual target code in your production, which I don't think can ever occur in Lark. Maybe I've just not spent enough time on such things for the confusion to occur :P

Leaving the choice to the individual user probably won't hurt, though.

Don't mention our competition!!

You've got to give it to them: [censored] is a spiffy name for a parsing tool.

@erezsh
Member

erezsh commented Dec 7, 2022

Well, at the time, the most popular library was PyParsing, so the line between grammar and code was a bit more blurred. But anyway, that's what I thought at the time.

[censored] is a spiffy name for a parsing tool.

Indeed it is. And what's worse, it even has a well-designed API!

@erezsh erezsh merged commit 223af36 into lark-parser:master Jan 30, 2023
@erezsh
Member

erezsh commented Jan 30, 2023

@vincent-hugot Sorry for the delay. Thanks for contributing!

@vincent-hugot
Contributor Author

vincent-hugot commented Jan 30, 2023

@erezsh No worries. I should submit a PR to reflect that change in documentation in a couple of weeks.

I'll have an intern start work on parser-adjacent stuff in ~3 months (using Lark); I might bug you with more documentation PRs at that point.

Cheers.

@erezsh
Member

erezsh commented Jan 30, 2023

I should submit a PR to reflect that change in documentation in a couple of weeks.

Great!

start work on parser-adjacent stuff in ~3 months (using Lark);

Cool!

I might bug you with more documentation PRs

Don't threaten me with a good time ;)
