Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement experimental TSX-support via tree-sitter, if available. #155

Closed
wants to merge 1 commit into from

Conversation

josteink
Copy link
Member

Implement support using recipe suggested by @josemiguelo in this
comment: #4 (comment)

This closes #4.

@josteink josteink mentioned this pull request Sep 28, 2021
@ramblehead
Copy link
Contributor

ramblehead commented Oct 13, 2021

I think, typescript.el needs to be somehow made aware of react-style elements, otherwise it does "extra colouring":

Screenshot from 2021-10-13 21-35-04

In line 27 it uses font-lock-type-face for Next.js;
In lines 38 and 43 in is using font-lock-keyword-face;
and in line 60 public is coloured with typescript-access-modifier-face.

@ramblehead
Copy link
Contributor

ramblehead commented Oct 16, 2021

The following seems to do the trick without any changes to typescript.el:

(add-hook
 'typescript-mode-hook
 (lambda ()
   (setq-local font-lock-defaults '(()))
   (tree-sitter-hl-mode 1)))

;; and

(tree-sitter-require 'tsx)
(add-to-list
 'tree-sitter-major-mode-language-alist
 '(typescript-mode . tsx))

There is no need for tsx mode - it is just typescript and some projects I have seen are using .ts files for both - plain and react:

Screenshot from 2021-10-16 00-38-38

See my init.el here:
https://github.com/ramblehead/.emacs.d/blob/e38df5f63e5054a2a64928d08c767723fa804928/init.el#L4202

Thus, possibly, we just need a small manual of how to setup typescript.el with tree-sitter instead of changing anything in typescript.el. This is font-lock only of-course - indentation, code folding and forward-list-alike would still require some deeper integration with tree-sitter.

@theothornhill
Copy link
Collaborator

I think typescript-mode should enable this automatically, or behind a defcustom, if in fact tree-sitter exists. I think having a separate mode for this is desirable, and is in fact, how js-mode does it if i'm not mistaken. The recipe that @ramblehead is suggesting seems quite simple, though, so it might be a good idea to merge the two approaches?

@theothornhill theothornhill marked this pull request as draft April 20, 2022 12:44
@theothornhill
Copy link
Collaborator

Converted to draft since this is very much a wip for some time. People are free to start using it and adding patches. Hopefully we can do this one quickly :)

@theothornhill
Copy link
Collaborator

@josteink if you're feeling adventurous I've added some instructions for how to get the tree sitter integration to work for the new emacs proper version. It is pretty much done, but seeing how it isn't even in a feature branch of core emacs, setup is a little involved.

I'd love some input though, if you or someone else wants to try!

(I also vendored and modified the indentation engine for the rust tree sitter package so that we don't need to depend on too many things. I expect the rust tree sitter variant not to make it into master, but that is up for debate)

@josteink
Copy link
Member Author

For me these last 2 weeks I've been really busy with organizational stuff at work or things which are otherwise 100% C#, so I haven't yet had time to fully look into your changes.

Because I know how frustrating it can be to feel that there is complete "radio-silence" when you are making big changes, and are looking for feedback, and getting none...

So just to be clear: I'm definitely watching the changes as they are chugging along, and I'm very excited about the progress you report 😄

From what I understand, besides our (or more like it, your) implementation of TypeScript & TSX-support in typescript-mode using tree-sitter... There's a few issue we need to address or at least decide on:

  • tree-sitter does not yet have an official release with TSX support... So for now we are building it ourselves, but only for x64 Linux.
  • Emacs may or may not ship with tree-sitter built in in an upcoming version. Should we double down on that, and use that only, or can we optionally fall back to the tree-sitter packages for older Emacs versions?

Is that a correct understanding? Or are there other things I've missed? Or some things I've gotten 100% backwards?

Both the Emacs & tree-sitter ecosystems are clearly in motion, and things are feeling a bit "messy" right now. That said, I don't think that should be a reason not to try to get onto the band-wagon. We just need to be somewhat considerate about how we do it 😉

As for questions of my own...

  • What's up with TSX as its own language? Does it have to be that? Can't we say TypeScript is TSX and TSX is TypeScript? Or is that a decision which has been made on our behalf already upstream in the tree-sitter project?
  • And should we really keep maintaining two different versions of typescript-mode within one project? Could/should we instead try to create a new typescript-tsx-mode package, based entirely on tree-sitter?

@theothornhill
Copy link
Collaborator

For me these last 2 weeks I've been really busy with organizational stuff at work or things which are otherwise 100% C#, so I haven't yet had time to fully look into your changes.

Oh don't worry, I'm just reporting things when I remember them

So just to be clear: I'm definitely watching the changes as they are chugging along, and I'm very excited about the progress you report smile

Great :)

From what I understand, besides our (or more like it, your) implementation of TypeScript & TSX-support in typescript-mode using tree-sitter... There's a few issue we need to address or at least decide on:

* tree-sitter does not yet have an official release with TSX support... So for now we are building it ourselves, but only for x64 Linux.

Yes, or mac. I can add back the mac compat to the install-tsx.sh. I just hacked it together so that it worked for me. As for windows, I barely know how to login, much less use it, I'm afraid.

* Emacs may or may not ship with tree-sitter built in in an upcoming version. Should we double down on that, and use that only, or can we optionally fall back to the tree-sitter packages for older Emacs versions?

It is prioritized for emacs 29, so I think it'd be wise to try to support it, much like many people supported nativecomp when it still was in a feature branch.

Is that a correct understanding? Or are there other things I've missed? Or some things I've gotten 100% backwards?

I think you're on the money!

Both the Emacs & tree-sitter ecosystems are clearly in motion, and things are feeling a bit "messy" right now. That said, I don't think that should be a reason not to try to get onto the band-wagon. We just need to be somewhat considerate about how we do it wink

Yes I agree. I think personally that we should support both variants. However I'm not sure how to do this either. I believe the rust variant will wither slowly if the upstream variant is good enough, which it has to be. So a part of me wants to just support the upstream variant. Another reason for the upstream variant to be the blessed one is that indentation is just that much easier.

As for questions of my own...

* What's up with TSX as its own language? Does it have to be that? Can't we say TypeScript is TSX and TSX is TypeScript? Or is that a decision which has been made on our behalf already upstream in the tree-sitter project?

This is the exact reasoning. It is a choice made upstream, but we should support both yet again.

https://github.com/tree-sitter/tree-sitter-typescript/blob/master/common/define-grammar.js#L205

* And should we really keep maintaining two different versions of typescript-mode within one project? Could/should we instead try to create a new typescript-tsx-mode package, based entirely on tree-sitter?

Not sure. I was even thinking starting an effort of something like tree-sitter-modes as its own package that is cross language. I see no need for modes to be bound to their own repo and having so much maintenance anymore, considering it is that much easier to make modes. It could also help in generalizing what a tree sitter major mode is supposed to support. Right now I believe C# and Typescript are the only modes supporting this out of the box. Again, I'm not sure

@josteink
Copy link
Member Author

josteink commented Apr 28, 2022

Not sure. I was even thinking starting an effort of something like tree-sitter-modes as its own package that is cross language. I see no need for modes to be bound to their own repo and having so much maintenance anymore, considering it is that much easier to make modes. It could also help in generalizing what a tree sitter major mode is supposed to support. Right now I believe C# and Typescript are the only modes supporting this out of the box. Again, I'm not sure

I think that idea sounds pretty neat, but it might come across in a way which is not entirely expected by the rest of the Emacs ecosystem (in that many packages still pressume major-mode = language)?

That said, perhaps tree-sitter-modes-mode could become a "new CC-mode" of sorts which allows one to centralize lots of common logic across languages?

That way a typescript-tsx-mode becomes purely a derived-mode which bundles its config-along. And it could be in the same package, because as you say, there's not that much reason for each mode having its own repo if implementing them is simple enough.

For a central model to succeed though, I guess one would also want a more structured approach to testing? Testing is always a bit more ad-hoc and messy than one expects it to 😉

@theothornhill
Copy link
Collaborator

Absolutely. I'm thinking something in the lines of nvim-treesitter. But not sure. I think it'll be smart to wait out the feature branch a little first.

@theothornhill
Copy link
Collaborator

@josteink now the feature branch has landed, and if you (or anybody else) want to try things the installation is "pretty simple" and described at the bottom of the readme on feature/tsx-support branch. I'd love for some feedback on the new core version :)

@theothornhill theothornhill force-pushed the feature/tsx-support branch 2 times, most recently from 5bd2305 to fba29e0 Compare May 8, 2022 20:10
@josteink
Copy link
Member Author

josteink commented May 8, 2022

Sounds exciting! I'll see what I get time for tomorrow.

My latest project at work is heavy into TSX so it's definitely a welcome feature 🙂

@theothornhill theothornhill force-pushed the feature/tsx-support branch from fba29e0 to 22ad4f1 Compare May 8, 2022 20:11
@theothornhill
Copy link
Collaborator

I actually think this is ready for merge into master, but I'll hold off until you've tried it. Need to look at ci now anyways....

@theothornhill theothornhill force-pushed the feature/tsx-support branch 2 times, most recently from f182be8 to 9e977a4 Compare May 8, 2022 20:25
@theothornhill theothornhill marked this pull request as ready for review May 8, 2022 21:32
There are for the time being two tree sitter implementations.  The first one is
the older, rust based variant written by @ubolonton which is widely used in the
emacs community, and the other is the newer variant from emacs proper, which for
the time being is situated in the feature/tree-sitter branch.
@theothornhill theothornhill force-pushed the feature/tsx-support branch from 9e977a4 to 2e2995d Compare May 9, 2022 05:54
@josteink
Copy link
Member Author

josteink commented May 9, 2022

Ok, so I've tested and tried your latest work... Ready for the verdict? It's gonna be a long one! 😄

Executive executive summary:

  • I'm against merging this PR here. 😮
  • But I'm actually super-stoked about this work, and think 100% that what we have here is the future typescript-mode 😉

After that initial shocker, let's get down and dirty with the details!

As for typescript-tree-sitter.el

  • Dead simple installation. If we tweak the package manifest, literally no documentation will be needed.
  • Results seems somewhat inconsistent/buggy to me. Is this conflicting with having typescript-ts-mode.el loaded at the same time?
  • Sometimes regular class methods gets highlighted as function-calls.
  • And for TSX I'm getting several (at least three) different faces for and other element-names in HTML markup (basic, tree-sitter-hl-face:type => font-lock-type-face, tree-sitter-hl-face:type.parameter)
  • Indentation seems buggy too. New line always at beginning. Trying to re-flow a full existing document creates runtime error.

Which is weird. I remember this working better last time I tried?

As for typescript-ts-mode.el:

  • No need to depend on third-party packages for what Emacs can provide for us!
  • A subtle benefit over relying on @ubolonton's package is that his package only provides libtree-sitter builds for a few select platforms and architectures, with the increasingly popular Aarch64 noticably missing. Having libtree-sitter as a "proper" native dependency, means it will probably be provided by Emacs (or the system package-manager) when installed, for a lot more architectures and platforms.
  • Nice and accurate installation instructions. They will be required for the time being!
  • BUT!! No fault of you, but until this lands in a final releases (both emacs, and tree-sitter), this option will be "experts" only.

I'm having this weird issue with indentation being somewhat off during editing, but not while re-indenting existing code. New-lines often seem to produce inaccurate results. Is that a known issue?

Other technical issues aside: It looks glorious! ❤️

General thoughts

Are these two implementations supposed to be "equals", or have you perhaps invested more into the Emacs nativetypescript-ts-mode.el? It certainly seems to look and act better.

Looking at the code for typescript-ts-mode.el, I can see you've codified concrete rules, while for typescript-tree-sitter.el, you've mostly piggybacked on top of tree-sitter-hl-mode. Is that correct?

While the latter sounds "nice", the former sounds like something which is easier for us to maintain/tweak when we need to. Which may already be evident in the implementations we have here? 😉

The biggest big issue 😁

Now ... all this amazing effort leaves us with three (3!) typescript-modes which is a bit of a luxury, but also something I think can become a maintenance problem, and very much a communications-issue.

If we merge this as is, I'm not going to say it will be a "nightmare", but it's going to become quite involved quite quickly.

There's several challenges I can see already now:

  • How are we to instruct people which mode to use, and why and when?
  • How are we to "thoughtfully" declare our dependencies so that things works as simple as possible for as many as possible?
  • How are we going to deal with bugs... Which may be present in some or all of the implementations?
  • How do we autoload extension-mapping consistently when we have several modes in the same package which should handle them all? We can't!
  • Since tree-sitter may not (yet) be available for all platforms and architectures, relying on it is a actually breaking change from the previous 100% Elisp-based implementation, which worked "everywhere".
  • Installation instructions will also be wildly different based on which mode you prefer

You get the picture.

The simple (and only right) solution IMO is when you've made a new major-mode, is that you also make a new package.

We can then "sunset" this repo and package, with a note on new development having moved to the new repo.

The new package can have new requirements, working declared dependencies, its own issue-tracker, focused installation instructions and can allow itself to have "breaking" changes, since nobody is getting force-upgraded to it from the old package. Etc etc etc.

And to avoid the long list of issues listed above for this new package/repo, we should decide on one implementation only and just go with it, and leave the other behind.

And for me, that vote is that we keep and further develop the Emacs-native version.

Not a doubt in my mind! Good job looking into that! That it even existed as something usable completely escaped my mind!

The result of going with that implementation will be that people who want TSX-support will have to be early adopters, both for Emacs and our new package/major-mode. Right now, that's going to be a show-stopper for some people, but not everyone.

Over time though, this package will be the natural choice for "everyone" when Emacs is shipped with native tree-sitter support, and the platform/architecture support issue has been solved upstream. And the best part is that we don't break innocent bystander's workflow while getting there.

Which means that I'm actually against merging this PR in this repo.

** phewww **

End long reply.

That's all my thought, laid out, hopefully in a understandable and agreeable form.

What do you think? 😄

@theothornhill
Copy link
Collaborator

Ok, so I've tested and tried your latest work... Ready for the verdict? It's gonna be a long one! smile

Executive executive summary:

* I'm against merging this PR here. open_mouth

WAAAAAAA :)

As for typescript-tree-sitter.el

* Dead simple installation. If we tweak the package manifest, literally no documentation will be needed.

* Results seems somewhat inconsistent/buggy to me. Is this conflicting with having `typescript-ts-mode.el` loaded at the same time?

No, but as you've discovered, it is a little more inconsistent. This is mainly due to the indentation engine being what it is. It is just not really that good... This is for me the biggest showstopper. I'm not really sure how to deal with that. But more details later!

* Sometimes regular class methods gets highlighted as function-calls.

Highlighting is for now relayed completely over to the highlights described by the emacs-tree-sitter project. We could define our own, but I haven't.

* And for TSX I'm getting several (at least three) different faces for  and other element-names in HTML markup (`basic`,  `tree-sitter-hl-face:type` => `font-lock-type-face`, `tree-sitter-hl-face:type.parameter`)

* Indentation seems buggy too. New line always at beginning. Trying to re-flow a full existing document creates runtime error.

Really? I haven't noticed that. I can look into that.

As for typescript-ts-mode.el:

* No need to depend on third-party packages for what Emacs can provide for us!

* A subtle benefit over relying on @ubolonton's package is that his package only provides `libtree-sitter` builds for a few select platforms and architectures, with the increasingly popular Aarch64 noticably missing. Having libtree-sitter as a "proper" native dependency, means it will probably be provided by Emacs (or the system package-manager) when installed, for a lot more architectures and platforms.

Yep!

* Nice and accurate installation instructions. They _will_ be required for the time being!

That is very good to hear.

* **BUT!!** No fault of you, but until this lands in a final releases (both emacs, and tree-sitter), this option will be "experts" only.

Agreed!

I'm having this weird issue with indentation being somewhat off during editing, but not while re-indenting existing code. New-lines often seem to produce inaccurate results. Is that a known issue?

This can be due to two things.

  1. emacs treesit library doesn't handle ERROR nodes well enough
  2. an indentation rule for that specific node is missing

Most likely it is a mix of the two. I've seen number 1 lots of times. You can usually see it when font-locking suddenly malfunctions, and is fixed by reverting the buffer. This is a known and reported bug.
Number 2 is very likely too, as ts is a big language.

Other technical issues aside: It looks glorious! heart

Great! I've put at lot more effort in typescript-ts-mode, so that is good to hear.

General thoughts

Are these two implementations supposed to be "equals", or have you perhaps invested more into the Emacs nativetypescript-ts-mode.el? It certainly seems to look and act better.

They are supposed to be equals, but I'm not sure whether or not we really should support both variants. I think it will become a messy world, and I'd rather support the native version myself.

Looking at the code for typescript-ts-mode.el, I can see you've codified concrete rules, while for typescript-tree-sitter.el, you've mostly piggybacked on top of tree-sitter-hl-mode. Is that correct?

Hearsay! ... No, that's correct.

If we merge this as is, I'm not going to say it will be a "nightmare", but it's going to become quite involved quite quickly.

There's several challenges I can see already now:

* How are we to instruct people which mode to use, and why and when?

My opinion is to not use the melpa tree sitter, but I'm open to suggestions.

* How are we to "thoughtfully" declare our dependencies so that things works as simple as possible for as many as possible?

I think the best idea is to be dependency free. By that I think that we should only use what emacs itself supports, even if that requires installing some libraries to your computer.

* How are we going to deal with bugs... Which may be present in some or all of the implementations?

Yeah, This can get messy.

* How do we autoload extension-mapping consistently when we have several modes in the same package which should handle them all? We can't!

This is true. We need to defcustom it, I think. If we keep tree-sitter in this repo.

The simple (and only right) solution IMO is when you've made a new major-mode, is that you also make a new package.

We can then "sunset" this repo and package, with a note on new development having moved to the new repo.

I agree.

And for me, that vote is that we keep and further develop the Emacs-native version.

+10

Not a doubt in my mind! Good job looking into that! That it even existed as something usable completely escaped my mind!

Yeah it is pretty new. I've bothered @casouri a lot trying to get this mainlined, hehe:)

What do you think? smile

I'm very thankful for some level-headed advice and great suggestions! I think the wisest plan is to go for the separate repo. I also think that keeping this repo tree sitter free is smart, because it has some lineage, while tree sitter does not. I'll steal my code then and create the repo.

As far as I can tell this issue is settled, and I'll make my own package for this.

Thanks again for looking at this. I really think @casouri has done some amazing work here, and hope it'll be mainlined very soon!

@josteink
Copy link
Member Author

josteink commented May 9, 2022

I think the wisest plan is to go for the separate repo. ... As far as I can tell this issue is settled, and I'll make my own package for this.

Make it so number one 😉

I also think that keeping this repo tree sitter free is smart, because it has some lineage, while tree sitter does not.

Then I guess we're at a 100% agreement.

Thanks again for looking at this.

Thanks for putting in the effort! I'm too busy to implement this code from scratch myself, but I'm more than happy to provide feedback and guidance in the ways that I can 😄

@theothornhill
Copy link
Collaborator

My pleasure! Closing this!

@ubolonton
Copy link

  • A subtle benefit over relying on @ubolonton's package is that his package only provides libtree-sitter builds for a few select platforms and architectures, with the increasingly popular Aarch64 noticably missing.

If this is about aarch64-apple-darwin, it's no longer correct. And it's extremely easy to build for other platforms. I do agree with the high-level point though, as there are a lot more people building and distributing Emacs.

emacs treesit library doesn't handle ERROR nodes well enough

Most likely it is a mix of the two. I've seen number 1 lots of times. You can usually see it when font-locking suddenly malfunctions, and is fixed by reverting the buffer. This is a known and reported bug.

Which bug is this? AFAICT there is no such bug confirmed for elisp-tree-sitter. Do you mean bugs in the indentation library?

And for me, that vote is that we keep and further develop the Emacs-native version.

This is my vote too, though not (yet) on a technical ground.

@theothornhill
Copy link
Collaborator

  • A subtle benefit over relying on @ubolonton's package is that his package only provides libtree-sitter builds for a few select platforms and architectures, with the increasingly popular Aarch64 noticably missing.

If this is about aarch64-apple-darwin, it's no longer correct. And it's extremely easy to build for other platforms. I do agree with the high-level point though, as there are a lot more people building and distributing Emacs.

emacs treesit library doesn't handle ERROR nodes well enough
Most likely it is a mix of the two. I've seen number 1 lots of times. You can usually see it when font-locking suddenly malfunctions, and is fixed by reverting the buffer. This is a known and reported bug.

Which bug is this? AFAICT there is no such bug confirmed for elisp-tree-sitter. Do you mean bugs in the indentation library?

This is not a bug in your package :) this is a bug in the new native version.

And for me, that vote is that we keep and further develop the Emacs-native version.

This is my vote too, though not (yet) on a technical ground.

Right!

@ubolonton
Copy link

This is not a bug in your package :) this is a bug in the new native version.

Sorry, I misread 😅

@josteink
Copy link
Member Author

josteink commented Jun 9, 2022

As far as I can tell this issue is settled, and I'll make my own package for this.

Any news on this? :)

@theothornhill
Copy link
Collaborator

I have started work on this over at https://git.sr.ht/~theo/tree-sitter-modes for now! I might just move the tree sitter over to this org, but I'm still waiting for the feature branch to settle a little. The api changes from time to time, so it might be smart to hold off just a little bit. But if you are interested, the code is there, and works with the feature branch in core.

@josteink
Copy link
Member Author

josteink commented Jun 9, 2022

RIghto. Was just curious about how things were going since there had been absolute radio silence for a month or so now 😁

@theothornhill
Copy link
Collaborator

Yeah, I know, sorry! I'm just waiting for now, but development hasn't halted at least :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support JSX/TSX
4 participants