Support for semantic tokens #615

leungbk · 2021-01-31T06:15:46Z

https://microsoft.github.io/language-server-protocol/specifications/specification-3-16/#textDocument_semanticTokens

Introduced in 3.16, and provided by the clangd and rust-analyzer servers. It would be nice if Eglot supported this feature.
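For readers unfamiliar with the feature: per the 3.16 specification linked above, a `textDocument/semanticTokens/full` response carries a flat integer array with five integers per token (deltaLine, deltaStartChar, length, tokenType, tokenModifiers). Types are indices into the legend the server advertises in its capabilities, and modifiers are a bitset over that legend. The Python sketch below (not Eglot code; `decode_tokens` and `Token` are illustrative names) decodes that array into absolute positions. Character offsets are UTF-16 code units unless another position encoding is negotiated.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    line: int              # absolute, 0-based line
    start: int             # absolute start character on that line
    length: int
    token_type: str        # name looked up in legend.tokenTypes
    modifiers: List[str]   # names whose bits are set in the modifier bitset


def decode_tokens(data: List[int], token_types: List[str],
                  token_modifiers: List[str]) -> List[Token]:
    """Decode the flat semantic-token array (5 integers per token) into absolute positions."""
    if len(data) % 5 != 0:
        raise ValueError("semantic token data length must be a multiple of 5")
    tokens: List[Token] = []
    line, start = 0, 0
    for i in range(0, len(data), 5):
        delta_line, delta_start, length, type_index, modifier_bits = data[i:i + 5]
        if delta_line:
            line += delta_line
            start = delta_start   # first token on a new line: offset from column 0
        else:
            start += delta_start  # same line: offset from the previous token's start
        modifiers = [name for bit, name in enumerate(token_modifiers)
                     if modifier_bits & (1 << bit)]
        tokens.append(Token(line, start, length, token_types[type_index], modifiers))
    return tokens
```

Fed the specification's own example, `[2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0]` with token types `property, type, class` and modifiers `private, static`, this yields tokens starting at (2, 5), (2, 10) and (5, 2).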


joaotavora · 2021-01-31T11:56:12Z

In your opinion, what should Eglot do with this? Fontify? These things are normally handled by the major mode already. Maybe it can be added to Eglot's responsibilities, but it's not clear how just yet.

joaotavora · 2021-02-01T10:31:34Z

Calling this "minor" because I don't (yet) understand what could be gained here.

leungbk · 2021-02-05T00:11:44Z

The semantic highlighting makes it a bit easier for users to connect the dots when reading code. This is what we see without semantic highlighting (default Emacs theme + rustic-mode, which should not meaningfully differ from rust-mode here):

Notice that the arguments ctx, cap, etc. are light-brownish inside the argument list, but colored black within the function body. When we use lsp-mode's semantic highlighting, they are colored consistently:

The emacs-tree-sitter package does something similar.

joaotavora · 2021-02-05T00:16:44Z

Two questions:

Is semantic highlighting conceptually different from what can be achieved with traditional Emacs font-lock and font-lock-keywords? It'd be great to get rid of that complexity, but it successfully parses a lot of languages and is very well integrated into Emacs.
In the lsp-mode example, is font-lock also active, or is all of the code being highlighted with "semantic tokens"?

leungbk · 2021-02-05T00:40:17Z

Is semantic highlighting conceptually different from what can be achieved with traditional Emacs font-lock and font-lock-keywords? It'd be great to get rid of that complexity, but it successfully parses a lot of languages and is very well integrated into Emacs.

Not sure I understand your question. Semantic highlighting should be based on an AST, in contrast to traditional regexp-based font-lock.

In the lsp-mode example, is font-lock also active, or is all of the code being highlighted with "semantic tokens"?

font-lock-mode is on; lsp-mode simply advises font-lock-fontify-region-function, as you can see here.

joaotavora · 2021-02-05T00:43:36Z

Not sure I understand your question. Semantic highlighting should be based on an AST, in contrast to traditional regexp-based font-lock.

font-lock.el doesn't have to be based on regexps, as you yourself have noted in your next response. But it often is, and reasonably successfully.

font-lock-mode is on; lsp-mode simply advises font-lock-fontify-region-function, as you can see here.

Right. But are the font-lock keywords in rust-mode doing any work? Will you get the same colors if you simply remove those keywords? Or is it doing some work?

leungbk · 2021-02-05T01:07:44Z

Right. But are the font-lock keywords in rust-mode doing any work? Will you get the same colors if you simply remove those keywords? Or is it doing some work?

When redefining the major-mode to run

  (setq-local font-lock-keywords nil)
  (setq-local font-lock-defaults nil)

then with rustic-mode + the rust-analyzer language server, we still see semantic-token-based highlighting when lsp-mode is enabled and the user has requested that semantic highlighting be enabled.

joaotavora · 2021-02-05T01:11:51Z

then with rustic-mode + the rust-analyzer language server, we still see semantic-token-based highlighting when lsp-mode is enabled and the user has requested that semantic highlighting be enabled.

And does it look exactly like the second example you posted here: #615 (comment) ?

Also, does this work with rust-mode?

leungbk · 2021-02-05T01:26:40Z

And does it look exactly like the second example you posted here: #615 (comment) ?

Also, does this work with rust-mode?

I redid this with rust-mode, and with a different file (font-locking took longer on the other one). The font-lock-keywords are doing a small amount of work, presumably for keywords where either rust-analyzer or lsp-mode is mum.
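The layering described here, with rust-mode's font-lock-keywords filling in wherever the server is silent, can be modeled as two layers of faces: font-lock underneath and semantic tokens on top. The Python sketch below is a hypothetical model of that precedence rule only; the `merge_layers` helper and its per-character representation are illustrative assumptions, not how lsp-mode or font-lock actually store faces.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # half-open (start, end) character range plus a face name


def merge_layers(length: int, font_lock: List[Span],
                 semantic: List[Span]) -> List[Optional[str]]:
    """Return one face per character: semantic faces win, font-lock fills the gaps."""
    faces: List[Optional[str]] = [None] * length
    for layer in (font_lock, semantic):  # the later layer overwrites the earlier one
        for start, end, face in layer:
            for pos in range(max(start, 0), min(end, length)):
                faces[pos] = face
    return faces
```

Characters the server never reports keep their font-lock face, which matches the observation that the keywords still do a small amount of work when both layers are active.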

Here's a file with the full fruit salad, semantic highlighting combined with rust-mode's non-nil font-lock-keywords and font-lock-defaults:

And here's the same file with semantic highlighting but with nil font-lock-defaults and font-lock-keywords:

terlar · 2021-02-05T03:41:20Z

It's nice to hear that semantic highlighting has become a thing in LSP, as that brings this feature to all major editors. Previously I have used these two packages with varying degrees of success:

How well they worked depended on the language. For Emacs Lisp it works great, but for many other languages it usually highlights too many things. Personally, I would only like to see it for variables, to follow them through the code (I generally use a colorless theme, so it becomes really obvious where the variables are used). These modes are slightly different, as they generally assign one color per variable.

maan2003 · 2021-02-05T06:05:49Z

There is a lot more to semantic highlighting than the default lsp-mode highlighting. For example, I have added a face for the mutable modifier that rust-analyzer sends. There is also highlighting within doc comments.

joaotavora · 2021-02-05T13:23:44Z

@maan2003 what major mode are you using? And would you generally recommend rust-analyzer over rls?

stephe-ada-guru · 2021-02-05T19:13:08Z

João Távora writes:

* Is semantic highlighting conceptually different from what can be achieved with traditional Emacs font-lock and font-lock-keywords? It'd be great to get rid of that complexity, but it successfully parses a lot of languages and is very well integrated into Emacs.

The example shown above requires "name resolution"; it matches the parameter names with the same names in the function code. That can only be done by a parser, which is what LSP is for. So using the semantic highlighting from LSP is a good way to get that level of highlighting for an Emacs major mode.

…

-- Stephe

joaotavora · 2021-02-05T19:25:06Z

That can only be done by a parser, which is what LSP is for.

Right. But "parse" is a very broad term. I think you mean LSP is aware of the program's AST more fully than Emacs is (for most major modes, perhaps not all). Regexps are one way of parsing certain bits of the language, and syntax tables (for parenthesis matching, for one) are another, covering cases where regexps would struggle. Neither is as good as having the AST, but they are still pretty good in many cases.

But excluding font-lock-keywords, it'd be nice to hook the font-lock machinery into the parse results provided by LSP or by some other incremental parser designed for editors (like tree-sitter). Do you have an idea of how that could be achieved properly, i.e. without some dirty hack or advice? If I learn of a suitable interface, hooking it onto LSP doesn't seem like an extraordinary amount of work. But I would also like to experiment with hooking it onto "tree sitter" (https://tree-sitter.github.io/tree-sitter/).

leungbk · 2021-02-05T20:18:10Z

cc @ubolonton

stephe-ada-guru · 2021-02-06T19:23:36Z

João Távora writes:

> That can only be done by a parser, which is what LSP is for.

Right. But "parse" is a very broad term. I think you mean LSP is aware of the program's AST more fully than Emacs is (for most major modes, perhaps not all).

Yes.

Regexps are one way of parsing certain bits of the language, and syntax tables (for parenthesis matching, for one) are another, covering cases where regexps would struggle. Neither is as good as having the AST, but they are still pretty good in many cases.

Right.

But excluding font-lock-keywords, it'd be nice to hook the font-lock machinery into the parse results provided by LSP or by some other incremental parser designed for editors (like tree-sitter).

Yes.

Do you have an idea of how that could be achieved properly, i.e. without some dirty hack or advice? If I learn of a suitable interface, hooking it onto LSP doesn't _seem_ like an extraordinary amount of work. But I would also like to experiment with hooking it onto "tree sitter" (https://tree-sitter.github.io/tree-sitter/).

I have not studied the LSP SemanticToken interface in detail, but from a quick look, I would expect that each Emacs language major mode would provide a mapping from SemanticTokens to faces, based on the language grammar. Eglot could specify a format for that mapping, and use it when available. It might be reasonable for Eglot to provide a default or base mapping. For example, tokenType 'variable' with modifier 'readonly' could map to font-lock-constant-face, tokenType 'function' to font-lock-function-name-face, tokenType 'type' to font-lock-type-face, and tokenType 'keyword' to font-lock-keyword-face. Those are the only faces currently used by ada-mode, but a smarter parser could use more.

The GNU ELPA package 'wisi' provides a way of specifying faces directly in the language grammar, as actions. It's used in ada-mode, and has a custom interface to the parser in an external process (which is built by WisiToken). The external process also has code for computing face, indent, and navigation in post-parse actions (based on the grammar actions). (I wrote all of that code.)

It might be reasonable to use an LSP server as a wisi backend, but the interfaces are very different. It would be reasonable to use tree-sitter as a wisi backend. I started down that road, but the tree-sitter parser generator can't handle the Ada grammar (at least, not without a lot of work to eliminate more conflicts). There would have to be a separate mapping from AST nodes to wisi actions for face, indent, and navigation; the tree-sitter grammar syntax doesn't support actions.

Most of the hard work in computing face from the AST is in the external process post-parse code, and in the grammar, where face actions are tied to productions. So there's no code in wisi that Eglot can use with an existing LSP server.

Getting back to the highlight actions that started this thread, the WisiToken parser for Ada does not do name resolution, so it cannot provide the highlighting suggested.
The AdaCore parser libadalang does do name resolution, and is used by the AdaCore Ada language server. However, it does not (yet) do indent, and does not support incremental parsing. I don't know if it provides SemanticTokens yet. I'm currently upgrading wisi and WisiToken to support incremental parsing.
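The base-mapping idea above can be sketched as a small lookup table that a major mode may extend or override. This is a hypothetical Python model, not an Eglot or wisi API: `BASE_FACES`, `face_for` and the `ada-type-face` face used in the usage note are illustrative names, while the four base entries are the examples given in the comment.

```python
from typing import Dict, Iterable, Optional, Tuple

FaceTable = Dict[Tuple[str, Optional[str]], str]

# Hypothetical base table: (tokenType, modifier or None) -> Emacs face name.
BASE_FACES: FaceTable = {
    ("variable", "readonly"): "font-lock-constant-face",
    ("function", None): "font-lock-function-name-face",
    ("type", None): "font-lock-type-face",
    ("keyword", None): "font-lock-keyword-face",
}


def face_for(token_type: str, modifiers: Iterable[str],
             mode_faces: Optional[FaceTable] = None) -> Optional[str]:
    """Pick a face: major-mode entries override the base table, and a
    (type, modifier) entry beats the plain (type, None) entry."""
    table = dict(BASE_FACES)
    if mode_faces:
        table.update(mode_faces)
    for modifier in modifiers:
        face = table.get((token_type, modifier))
        if face is not None:
            return face
    return table.get((token_type, None))  # None means: leave it to font-lock
```

For example, `face_for("variable", ["readonly"])` gives `font-lock-constant-face`, while a plain `variable` gets no face at all, and a mode passing `{("type", None): "ada-type-face"}` would override the base entry for types.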

…

-- Stephe

maan2003 · 2021-02-09T16:37:12Z

What major mode are you using?

Rustic

And would you generally recommend rust-analyzer over rls?

Yes, rust-analyzer is a lot better.

maan2003 · 2021-02-09T17:18:29Z

Also, LSP-based highlighting is a lot more advanced than tree-sitter's. Here are a few cases from the above example:

Example 1:

matches!(
    leaf.kind()

Highlighting inside macros (matches! in this case) requires macro expansion.

Example 2:

let item_keyword = ctx.token_at_offset().find(|leaf| {

Also, find is underlined because it takes &mut self as an argument. That requires a lot of parts of the language:

type inference
library and module lookup
trait resolution (finding out the definition of the function)

nowislewis · 2021-08-10T05:39:07Z

Can anyone tell me how semantic highlighting is going now? I just switched from lsp-mode to Eglot after a one-year friendship with lsp-mode. I really like the cleanliness and speed of Eglot.

joaotavora · 2021-08-10T09:19:35Z

Eglot doesn't do "semantic highlighting" with information taken from the LSP server. The current understanding is that it is much slower (though potentially more accurate in some languages) than Emacs's own syntax highlighting, which is called "font lock" in Emacs.

nowislewis · 2021-08-10T13:26:33Z

Thanks for your amazing work. I want to know whether semantic highlighting is difficult to achieve elegantly. If it does not affect the cleanliness and beauty of Eglot, we would be very grateful for an option that lets the user switch it on. But if the implementation is troublesome and destructive, don't let it affect the normal development of Eglot.

joaotavora · 2021-08-10T13:29:46Z

I don't know that it is particularly destructive, nor very laborious. Feel free to give it a go. I believe lsp-mode has an implementation for it (but I haven't looked). You can see a summarized description here. It seems that it uses advice, but that can probably be avoided.

AkibAzmain · 2022-02-18T08:15:52Z

I'm using Eglot and very happy with it. But I miss semantic tokens, and I have no idea about how to implement it.

AkibAzmain · 2022-02-21T12:37:43Z

I've implemented it. See #839.

AkibAzmain · 2022-03-12T12:02:03Z

@nowislewis Check out #839.

artempyanykh · 2022-08-21T20:10:56Z

One more data point about semantic highlighting. I maintain a Markdown LSP server. In addition to regular Markdown links, it also supports [[wiki-style#links]]. These wiki-style links are not highlighted in any way by editors. Semantic tokens allow me to add extra highlighting that works uniformly across all LSP clients, and in my opinion this improves the overall experience quite a bit.

An example:

With semantic tokens

Without semantic tokens
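From the server author's side, a wiki-style link is just one more token to emit. The Python sketch below shows the encoding step a server performs, the inverse of what a client decodes: absolute tokens become the relative five-integer groups defined in the 3.16 specification. The `encode_tokens` name is illustrative, and the legend indices in the usage note are hypothetical, not taken from any particular Markdown server.

```python
from typing import List, Tuple

# (line, start_char, length, type_index, modifier_bits), all absolute
AbsToken = Tuple[int, int, int, int, int]


def encode_tokens(tokens: List[AbsToken]) -> List[int]:
    """Encode absolute tokens into the relative LSP semantic-token integer array."""
    data: List[int] = []
    prev_line, prev_start = 0, 0
    for line, start, length, type_index, modifier_bits in sorted(tokens):  # document order
        delta_line = line - prev_line
        # Same line: offset from the previous token; new line: offset from column 0.
        delta_start = start - prev_start if delta_line == 0 else start
        data.extend([delta_line, delta_start, length, type_index, modifier_bits])
        prev_line, prev_start = line, start
    return data
```

With a hypothetical legend whose type 0 is a link type, a 20-character `[[wiki-style#links]]` starting at column 4 of the first line encodes as `[0, 4, 20, 0, 0]`.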

slondr · 2023-09-04T14:18:06Z

@stephe-ada-guru Do I understand correctly that this feature being implemented in eglot would allow face support in ada-mode without wisi? Wisi doesn't compile on my machine, so I'm investigating my options for a functional ada-mode.

coffeemug · 2023-12-28T21:03:46Z

Another data point: clangd implements a semantic highlighting extension for inactive code. So, for example, if you're working on a C file, in VSCode the #ifdef sections for other platforms are grayed out and you can focus on the platform you're developing on. For big files this is a godsend.

HaraldKi · 2024-04-05T10:49:05Z

Regex-based syntax highlighting, when the server knows all the details without fumbling and guessing, seems sub-optimal. I started an eglot-semtok experiment for semantic-token-based highlighting. But I have never used Elisp before, so don't wait for this. Friendly comments about Elisp botches I make are welcome, though (except about hanging parens; read the README).

joaotavora · 2024-04-05T12:01:11Z

Regex based syntax highlighting when the server knows all the details without fumbling and guessing seems sub-optimal

Most -ts- major modes use in-process incremental tree-sitter parsing. Tree sitter uses language definitions. I'd say this makes communicating buffer changes over stdin/out/network to an external process and then constantly asking for loads of JSON to say where things are located seem sub-optimal, except maybe for small or toy cases.

But I never used Elisp before, so don't wait for this. But friendly comments about elisp botches I make are welcome (except hanging parens, read the README).

For the reasons above I'm not "waiting for this". Maybe others are, so feel free to propose this to Emacs proper (via the Emacs mailing lists); maybe it's useful in some way. I haven't looked at the implementation. If you ask me, chances are your parens quirk will decrease the chances that I or others will ever want to.

Anyway, there are already implementations for this around this repo. Last I checked, the challenge was doing this with a tight, efficient connection to the fontification machinery. You'd want a connection as tight as the tree-sitter implementation has, but that engine is inherently faster since it lives in C. With LSP/Elisp it's very challenging, since you have to deal with asynchronicity (and failure) and take care not to over-request for very large buffers. It's just like with inlay hints, where I ended up using jit-lock-register, which is decent but still not 100% accurate. Good luck!
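One way to avoid over-requesting for very large buffers is the specification's `textDocument/semanticTokens/range` request (available only if the server advertises range support), asking just for the chunk about to be fontified instead of the whole file. The Python sketch below builds the params for such a request; the `range_params` helper, the 20-line padding margin and the idea of driving it from jit-lock chunks are illustrative assumptions, not anything Eglot does.

```python
from typing import Any, Dict, List


def range_params(uri: str, line_lengths: List[int], first: int, last: int,
                 margin: int = 20) -> Dict[str, Any]:
    """Params for textDocument/semanticTokens/range covering lines first..last
    (inclusive) plus `margin` lines on either side, clamped to the buffer.
    line_lengths[i] is the length of line i in the negotiated position encoding."""
    if not line_lengths:
        raise ValueError("empty buffer")
    start_line = max(0, first - margin)
    end_line = min(len(line_lengths) - 1, last + margin)
    return {
        "textDocument": {"uri": uri},
        "range": {
            "start": {"line": start_line, "character": 0},
            "end": {"line": end_line, "character": line_lengths[end_line]},
        },
    }
```

The padding trades a slightly larger response for fewer round-trips while scrolling; the asynchronicity and failure handling mentioned above still have to live around this.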

appetrosyan · 2024-08-07T13:20:01Z

Regex based syntax highlighting when the server knows all the details without fumbling and guessing seems sub-optimal.

I would unfortunately have to disagree on this one. While there is no mandate to use incremental parsing for any major mode, the vast majority of the upstream major modes are exceptionally polished and not very... as you say... fumbling. They do the job well, especially for lisp modes. So this may not necessarily help with accuracy.

A bigger problem, IMO, and one that @joaotavora pointed out, is that most communication with a language server happens via JSON. The inefficiency of parsing JSON, creating a delta, and then applying it results in significant performance degradation. Couple that with the fact that you cannot, even in principle, apply this fontification asynchronously.

While I appreciate that semantic tokens can be useful in some situations, applying them to general fontification is a bad idea. Implementing it shouldn't be too difficult, but I don't see using it.
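For context on the "delta" mentioned above: with `textDocument/semanticTokens/full/delta`, the server sends only `SemanticTokensEdit` entries (`start`, `deleteCount`, optional `data`) against the integer array from its previous response. A minimal Python sketch of the client-side splice is below, assuming the edits do not overlap (`apply_delta` is an illustrative name). It shows the splice itself is cheap list surgery; the JSON still has to be parsed and the faces re-applied afterwards.

```python
from typing import Any, Dict, List


def apply_delta(old: List[int], edits: List[Dict[str, Any]]) -> List[int]:
    """Apply SemanticTokensEdit objects to the previous semantic-token data array.
    Offsets refer to the old array, so edits are applied back to front."""
    result = list(old)
    for edit in sorted(edits, key=lambda e: e["start"], reverse=True):
        start, count = edit["start"], edit["deleteCount"]
        result[start:start + count] = edit.get("data", [])  # missing data = pure deletion
    return result
```

Applying back to front means an earlier edit never shifts the offsets of a later one, so every `start` can be used exactly as the server sent it.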

HaraldKi · 2024-08-16T09:16:07Z

Is this on an order of magnitude the user cares about?

The inefficiency of parsing JSON, creating a delta, and then applying it results in significant performance degradation.

A great deal of the Internet these days relies on exactly this, sending JSON (or similar) over the net. And that's not just localhost. So yes, no doubt there is a performance degradation, but do we have numbers? Would the user notice?

joaotavora · 2024-08-16T10:06:12Z

So yes, no doubt there is a performance degradation, but do we have numbers?

We have a fair amount of anecdotal evidence that large quantities of JSON emitted towards Emacs slow down the user experience, especially because Emacs is single-threaded. This has been getting better. I cannot produce numbers. But you can implement all this yourself, do the measurements, and come to your own conclusions.

Would the user notice?

No idea.

joaotavora added the enhancement and Minor labels on Feb 1, 2021.

AkibAzmain linked a pull request on Feb 21, 2022 that will close this issue: Close #615: Add support for semantic tokens #839 (open, 5 tasks).
