List of reserved words taken from https://ocaml.org/manual/5.3/lex.html#sss:keywords |
|
nice! |
|
I was working on this myself at 314eter/tree-sitter-ocaml. But the tests are failing because the Python bindings don't support 0.25 yet, so I was waiting on that to get released to create a PR. Some things I did that are missing here:
|
|
Oh okay - should I close this PR then?
…On Mon, Apr 21, 2025 at 10:54 AM Pieter Goetschalckx < ***@***.***> wrote:
I was working on this myself at 314eter/tree-sitter-ocaml
<https://github.com/314eter/tree-sitter-ocaml/tree/tree-sitter-0.25>. But
the tests are failing because the Python bindings don't support 0.25 yet
<tree-sitter/py-tree-sitter#333>, so I was
waiting on that to get released to create a PR.
Some things I did that are missing here:
- Upgraded the dependencies to tree-sitter 0.25
- Excluded the nonrec keyword. It's new since OCaml 4.02, so old code
may be using it as a variable.
- Included the binary operators or, lor, lxor, mod, land, lsl, lsr and
asr by making them tokens in the grammar.
- Used a different set of keywords for attribute_id.
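To make two of the items in the list above concrete: the operator keywords are ordinary infix operators in OCaml, and `nonrec` is a modifier in type definitions. The snippet below is a minimal sketch with made-up names (`M`, `xs`, and the `let` bindings are for illustration only). `or` is left out because current compilers flag it as deprecated in favour of `||`.

```ocaml
(* Keyword operators are written infix, like symbolic operators. *)
let remainder = 7 mod 3          (* 1 *)
let bit_and = 6 land 3           (* 2 *)
let bit_or = 6 lor 3             (* 7 *)
let bit_xor = 6 lxor 3           (* 5 *)
let shifted_left = 1 lsl 4       (* 16 *)
let shifted_right = 16 lsr 2     (* 4 *)
let arith_shift = (-16) asr 2    (* -4 *)

(* They can also be parenthesised and passed around as functions. *)
let remainders = List.map2 ( mod ) [7; 8; 9] [3; 3; 3]   (* [1; 2; 0] *)

(* `nonrec` as a keyword: the `t` on the right-hand side refers to the
   outer `t`, not to the type being defined. *)
type t = int

module M = struct
  type nonrec t = t list
end

let xs : M.t = [1; 2; 3]

(* Code written before the keyword existed could contain something like
   `let nonrec = 1`. Current compilers reject that, so it is only shown
   here as a comment. *)
```

A parser that treats `nonrec` as reserved would refuse to read it as a `value_name` in such old code, which is the trade-off the list item describes.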
|
|
It looks like tree-sitter/py-tree-sitter#333 has stalled (the maintainers aren't responding to the author's requests for write access); can we consider merging this and not waiting on 0.25? Re: For the other grammar changes to binary operators and |
|
If we merge this, tree-sitter-ocaml will become incompatible with py-tree-sitter. That's not a huge problem, but it's annoying, and there's nothing urgent about the reserved keywords feature. It just improves error recovery. About It's a logical decision not to test new opam packages on 4.02, since probably nobody is still writing new code using 4.02. But old 4.02 code does still exist, so I think tools like tree-sitter should try to support a wide range of versions for a longer time. |
|
I don't have visibility into how annoying incompatibility with py-tree-sitter is, but I can say more about my use case. It's not just minor cosmetics - I have custom query logic (providing functionality, not highlighting) that does not work properly with the previous grammar because it tries to detect an error case that does not appear in that position b/c the tree is fundamentally changed (e.g., the Re: |
|
Ok, I didn't expect error recovery to make such a difference. The problem with being one of the first to move to tree-sitter ABI 15 (none of the officially supported grammars have been updated) is that many tools (language bindings for Python and Swift, editors like Emacs) will not be ready yet. So new features that get added, or bugs that get fixed, will not be available to many users. I'd prefer to get at least OCaml 5.4 support done first in a 0.24 version, and then we can move to 0.25 (temporarily disabling the Python and Swift tests). |
Tree-sitter now supports reserved keywords for better error recovery.
This commit updates the OCaml grammar to mark reserved words. For
example, before
```ocaml
let x =
type t = int
```
was parsed as
```
(compilation_unit ; [0, 0] - [4, 0]
(value_definition ; [0, 0] - [2, 12]
"let" ; [0, 0] - [0, 3]
(let_binding ; [0, 4] - [2, 12]
pattern: (value_name) ; [0, 4] - [0, 5]
"=" ; [0, 6] - [0, 7]
body: (infix_expression ; [2, 0] - [2, 12]
left: (application_expression ; [2, 0] - [2, 6]
function: (value_path ; [2, 0] - [2, 4]
(value_name)) ; [2, 0] - [2, 4]
argument: (value_path ; [2, 5] - [2, 6]
(value_name))) ; [2, 5] - [2, 6]
operator: (rel_operator) ; [2, 7] - [2, 8]
right: (value_path ; [2, 9] - [2, 12]
(value_name)))))) ; [2, 9] - [2, 12]
```
and now it is parsed as
```
(compilation_unit ; [0, 0] - [4, 0]
(value_definition ; [0, 0] - [0, 5]
"let" ; [0, 0] - [0, 3]
(let_binding ; [0, 4] - [0, 5]
pattern: (value_name))) ; [0, 4] - [0, 5]
(ERROR ; [0, 6] - [0, 7]
"=") ; [0, 6] - [0, 7]
(type_definition ; [2, 0] - [2, 12]
"type" ; [2, 0] - [2, 4]
(type_binding ; [2, 5] - [2, 12]
name: (type_constructor) ; [2, 5] - [2, 6]
"=" ; [2, 7] - [2, 8]
equation: (type_constructor_path ; [2, 9] - [2, 12]
(type_constructor))))) ; [2, 9] - [2, 12]
```
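One way to read the two trees above: in the first, `type` was accepted as a `value_name` because no keyword `type` was expected in expression position; in the second, `type` is reserved and can no longer be read as an identifier. The following is a toy model of that idea, not tree-sitter's actual lexer. The names `classify`, `reserved_words`, and `expected_in_expression` are hypothetical, and the word lists are small illustrative subsets.

```ocaml
(* Toy model only: this is NOT how tree-sitter is implemented. *)
type token =
  | Keyword of string
  | Identifier of string

(* A small, illustrative subset of OCaml's reserved words. *)
let reserved_words = ["let"; "type"; "in"; "match"; "with"]

(* [expected] lists the keywords the parser could accept at this point.
   Without reserved words, any other word falls back to an identifier.
   With reserved words, a reserved word is never an identifier. *)
let classify ~use_reserved ~expected word =
  if List.mem word expected then Keyword word
  else if use_reserved && List.mem word reserved_words then Keyword word
  else Identifier word

(* After `let x =` an expression is expected, so `type` is not among
   the keywords that could start one. *)
let expected_in_expression = ["let"; "match"]

let before =
  classify ~use_reserved:false ~expected:expected_in_expression "type"

let after =
  classify ~use_reserved:true ~expected:expected_in_expression "type"
```

In this model `before` is `Identifier "type"`, which corresponds to the `value_name` in the first tree, and `after` is `Keyword "type"`, which leaves the parser free to end the broken `let` early and start a `type_definition`.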
That's not true (and "officially supported" doesn't mean much these days). I'd have to check with the Emacs people, but Neovim definitely supports ABI 15. It all depends on whether a project uses a language-specific binding or uses the lib directly (in C or Rust), which I believe is the case for the majority of "large" consumers. But of course OCaml is special, and there may be important language-specific projects you know and care about that use the Python bindings. (To put some numbers to it, out of the 319 parsers I track, 44 are ABI 15, 249 are ABI 14, and 26 are still ABI 13.) In any case, moving to ABI 15 just means consumers stuck on ABI 14 can't update your parser to the latest version and miss out on the benefits of this PR. Just mark the next release as breaking and let users decide. And all this is moot since you already bumped the ABI to 15 in #123 ;) (Tree-sitter 0.25 defaults to ABI 15; you need to (You would be the first language to use the new reserved words feature, though ;)) |
|
I only looked at the version numbers of other grammars. Most of them follow the tree-sitter version, so if they are still on version 0.24, I assumed they were not on ABI 15, but it looks like that's not true anymore. That, combined with the fact that 2 out of 5 official language bindings (official meaning they're in the tree-sitter organization and the tree-sitter-cli templates) don't support ABI 15 yet, made me conclude it's not widely supported yet. For most grammars that doesn't really matter, since you can regenerate with ABI 14 if necessary. But if we start using reserved keywords immediately, that will be impossible. The master branch is indeed on ABI 15 now, as I said I'd do after #122. I rebased this PR on master and added my fixes, so I think it's ready to be merged now. |
No, that's not true. That was a convention that some -- but by no means all -- stick to. The C parser is definitely at ABI 15, and more will follow.
That is true. |