Support for semantic tokens #615
Comments
In your opinion, what should Eglot do with this? Fontify? These things are normally handled by the major mode already, though maybe it can be added to Eglot's responsibilities; it's not clear how just yet. |
Calling this "minor" because I don't (yet) understand what could be gained here. |
The semantic highlighting makes it a bit easier for users to connect the dots when reading code. This is what we see without semantic highlighting (default Emacs theme + Notice that the arguments The |
Two questions:
|
Not sure I understand your question. Semantic highlighting should be based on an AST, in contrast to traditional regexp-based font-lock.
|
font-lock.el doesn't have to be based on regexps, as you yourself have noted in the next response. But it often is, and reasonably successfully.
Right. But are the font-lock keywords in rust-mode doing any work? Will you get the same colors if you simply remove those keywords? Or is it doing some work? |
When redefining the major-mode to run (setq-local font-lock-keywords nil)
(setq-local font-lock-defaults nil) then with |
And does it look exactly like the second example you posted here: #615 (comment) ? Also, does this work with |
I redid with Here's a file with the full fruit salad, semantic highlighting combined with And here's the same file with semantic highlighting but with nil |
It's nice to hear that semantic highlighting has become a thing in LSP, as that brings this feature to all major editors. Previously I have used these two packages with varying degrees of success:
However, the results vary by language: for Emacs Lisp it works great, but for many other languages it usually highlights too many things. Personally, I would only like to see it for variables, to follow them through the code (I generally use a colorless theme, so it becomes really obvious where the variables are used). These modes are slightly different, as they generally assign one color per variable. |
@maan2003 what major mode are you using? And would you generally recommend |
João Távora <[email protected]> writes:
* Is semantic highlighting conceptually different from what can be
achieved with traditional Emacs font-lock and font-lock-keywords? It'd
be great to get rid of that complexity. But it successfully parses a
lot of languages and is very well integrated into Emacs.
The example shown above requires "name resolution"; it matches the
parameter names with the same names in the function code. That can only
be done by a parser, which is what LSP is for.
So using the semantic highlighting from LSP is a good way to get that
level of highlighting for an Emacs major mode.
-- Stephe
|
Right. But "parse" is a very broad term. I think you mean LSP is aware of the program's AST more fully than Emacs is (for most major modes, perhaps not all). Regexps are a way of parsing certain bits of the language, and syntax tables (for parenthesis matching, if nothing else) are another, for cases where regexps will struggle. Neither is as good as having the AST, but they are still pretty good in many cases. But excluding font-lock-keywords, it'd be nice to hook the font-lock machinery into the parse results provided by LSP or by some other incremental parser designed for editors (like tree-sitter). Do you have an idea of how that could be achieved properly, i.e. without some dirty hack or advice? If I learn of a suitable interface, hooking it onto LSP doesn't seem like an extraordinary amount of work. But I would also like to experiment with hooking it onto "tree sitter" (https://tree-sitter.github.io/tree-sitter/) |
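One avenue that avoids advice is `jit-lock-register`, which lets any function fontify a region on demand. The following is only a minimal sketch under that assumption, not how Eglot or any existing package does it: the `my-semtok-` names and the cache format are hypothetical, and filling the cache from a server or parser is left out.

```elisp
(defvar-local my-semtok-cache nil
  "Hypothetical cache: a list of (BEG END FACE) buffer positions.")

(defun my-semtok-fontify-region (beg end)
  "Apply cached faces that overlap BEG..END as `font-lock-face' properties."
  ;; `font-lock-face' is displayed as a face while `font-lock-mode' is on.
  (with-silent-modifications
    (dolist (tok my-semtok-cache)
      (pcase-let ((`(,tbeg ,tend ,face) tok))
        (when (and (< tbeg end) (> tend beg))
          (put-text-property (max tbeg beg) (min tend end)
                             'font-lock-face face))))))

(defun my-semtok-enable ()
  "Register the fontifier with jit-lock in the current buffer."
  (jit-lock-register #'my-semtok-fontify-region))
```

Because jit-lock only asks for the regions it is about to display, a function like this never needs to walk the whole buffer at once.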
cc @ubolonton |
João Távora <[email protected]> writes:
>> That can only be done by a parser, which is what LSP is for.
> Right. But "parse" is a very broad term. I think you mean LSP is aware
> of the program's AST more fully than Emacs is (for most major modes,
> perhaps not all).
Yes.
> Regexps are a way of parsing certain bits of the language, and syntax
> tables (for parenthesis matching, if nothing else) are another, for
> cases where regexps will struggle. Neither is as good as having the
> AST, but they are still pretty good in many cases.
Right.
> But excluding font-lock-keywords, it'd be nice to hook the font-lock
> machinery into the parse results provided by LSP or by some other
> incremental parser designed for editors (like tree-sitter).
Yes.
> Do you have an idea of how that could be achieved properly, i.e.
> without some dirty hack or advice? If I learn of a suitable interface,
> hooking it onto LSP doesn't _seem_ like an extraordinary amount of
> work. But I would also like to experiment with hooking it onto "tree
> sitter" (https://tree-sitter.github.io/tree-sitter/)
I have not studied the LSP SemanticToken interface in detail, but from a
quick look, I would expect that each Emacs language major mode would
provide a mapping from SemanticTokens to faces, based on the language
grammar. Eglot could specify a format for that mapping, and use it when
available.
It might be reasonable for eglot to provide a default or base mapping.
For example, tokenType 'variable', modifier 'readonly' could map to
font-lock-constant-face, tokenType 'function' to
font-lock-function-name-face, tokenType 'type' to font-lock-type-face,
tokenType 'keyword' to font-lock-keyword-face. Those are the only faces
currently used by ada-mode, but a smarter parser could use more.
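As a rough illustration of such a base mapping (a sketch only: the
`my-semtok-` names are hypothetical and nothing here is an Eglot
interface), the token type and modifier names below are the standard LSP
ones and the faces are the stock font-lock faces:

```elisp
(defvar my-semtok-type-faces
  '(("function" . font-lock-function-name-face)
    ("type"     . font-lock-type-face)
    ("keyword"  . font-lock-keyword-face)
    ("variable" . font-lock-variable-name-face))
  "Hypothetical alist mapping LSP semantic token types to faces.")

(defun my-semtok-face (type modifiers)
  "Return a face for token TYPE (a string) with MODIFIERS (a list of strings).
Return nil when TYPE has no entry, leaving the major mode's face alone."
  (if (and (equal type "variable") (member "readonly" modifiers))
      ;; A read-only variable is treated as a constant.
      'font-lock-constant-face
    (cdr (assoc type my-semtok-type-faces))))
```

A major mode could then override or extend the alist buffer-locally for
token types specific to its language.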
The GNU ELPA package 'wisi' provides a way of specifying faces directly
in the language grammar, as actions. It's used in ada-mode, and has a
custom interface to the parser in an external process (which is built by
WisiToken). The external process also has code for computing face,
indent, and navigation in post-parse actions (based on the grammar
actions). (I wrote all of that code).
It might be reasonable to use an LSP server as a wisi backend, but the
interfaces are very different.
It would be reasonable to use tree-sitter as a wisi backend. I started
down that road, but the tree-sitter parser generator can't handle the
Ada grammar (at least, not without a lot of work to eliminate more
conflicts). There would have to be a separate mapping from AST nodes to
wisi actions for face, indent, navigate; the tree-sitter grammar syntax
doesn't support actions.
Most of the hard work in computing face from AST is in the external
process post-parse code, and in the grammar, where face actions are tied
to productions. So there's no code in wisi that eglot can use with an
existing LSP.
Getting back to the highlight actions that started this thread, the
WisiToken parser for Ada does not do name resolution, so it cannot
provide the highlighting suggested. The AdaCore parser libadalang does
do name resolution, and is used by the AdaCore Ada language server.
However, it does not (yet) do indent, and does not support incremental
parsing. I don't know if it provides SemanticTokens yet.
I'm currently upgrading wisi and WisiToken to support incremental
parsing.
-- Stephe
|
Yes, |
Also, LSP-based highlighting is a lot more advanced than tree-sitter. I have a few cases from the above example. Example 1:
Highlighting inside macros (matches in this case) requires macro expansion. Example 2:
also
|
Can anyone tell me how the semantic highlighting is going now? I just switched from lsp to eglot after a one-year friendship with lsp. I really like the cleanliness and speed of eglot. |
Eglot doesn't do "semantic highlighting" with information taken from the LSP server. The current understanding is that it is much slower (though potentially more accurate in some languages) than Emacs's own syntax highlighting, which is called "font lock" in Emacs. |
Thanks for your amazing work. I want to know whether semantic highlight is difficult to achieve elegantly. If it does not affect the cleanliness and beauty of eglot, then we would be very grateful if there is an option for the user to switch. But if the implementation is troublesome and destructive, don’t let it affect the normal development of eglot. |
I don't know that it is particularly destructive, nor very laborious. Feel free to give it a go. I believe lsp-mode has an implementation for it (but I haven't looked). You can see a summarized description here. It seems that it uses advice, but that can probably be avoided. |
I'm using Eglot and very happy with it. But I miss semantic tokens, and I have no idea about how to implement it. |
I've implemented it. See #839. |
@nowislewis Check out #839. |
One more datapoint about semantic highlighting. I maintain a Markdown LSP server. In addition to regular markdown links it also support An example: |
@stephe-ada-guru Do I understand correctly that this feature being implemented in eglot would allow face support in ada-mode without wisi? Wisi doesn't compile on my machine, so I'm investigating my options for a functional ada-mode. |
Another data point-- clangd implements a semantic highlighting extension for inactive code. So for example if you're working on a C file, in VSCode the |
Regex based syntax highlighting when the server knows all the details without fumbling and guessing seems sub-optimal. I started an eglot-semtok experiment for semantic token based highlighting. But I never used Elisp before, so don't wait for this. But friendly comments about elisp botches I make are welcome (except hanging parens, read the README). |
Most
For the reasons above I'm not "waiting for this". Maybe others are, so feel free to propose this to Emacs proper (via [email protected]); maybe it's useful in some way. I haven't looked at the implementation. If you ask me, chances are your parens quirk will decrease the chances that I or others will ever want to. Anyway, there are already implementations for this around this repo. Last I checked, the challenge was doing this with a tight, efficient connection to the fontification machinery. You'd want a connection as tight as the tree-sitter implementation has, but that engine is inherently faster since it lives in C. With LSP/Elisp it's very challenging, since you have to deal with asynchronicity (and failure) and take care not to over-request for very large buffers. Just like with inlay hints, where I ended up using |
I would unfortunately have to disagree on this one. While there is no mandate to use incremental parsing for any major mode, the vast majority of the upstream major modes are exceptionally polished and not very... as you say... fumbling. They do the job well, especially for lisp modes. So this may not necessarily help with accuracy. A bigger problem IMO and that @joaotavora pointed out, is that most communication with a language server happens via JSON. The inefficiency of parsing JSON, creating a delta, and then applying it results in significant performance degradation. Couple that with the fact that you cannot, even in principle, apply this fontification asynchronously. While I appreciate that semantic tokens can be useful in some situations, applying them to general fontification is a bad idea. Implementing it shouldn't be too difficult, but I don't see using it. |
Is this in an order of magnitude the user cares about:
A great deal of the Internet these days relies on exactly this, sending JSON (or similar) over the net. And that's not just localhost. So yes, no doubt there is a performance degradation, but do we have numbers? Would the user notice? |
We have a fair amount of anecdotal evidence that large quantities of JSON emitted towards Emacs slow the user experience, especially because Emacs is single-threaded. This has been getting better. I cannot produce numbers. But you can implement all this yourself, do the measurements, and come to your own conclusions.
No idea. |
https://microsoft.github.io/language-server-protocol/specifications/specification-3-16/#textDocument_semanticTokens
Introduced in 3.16, and provided by the clangd and rust-analyzer servers. It would be nice if Eglot supported this feature.
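For readers unfamiliar with the feature: in the specification linked above, a `textDocument/semanticTokens/full` response carries a flat `data` array of integers, five per token (delta line, delta start character, length, token type index, modifier bit set), with positions relative to the previous token. A minimal decoding sketch follows, assuming the array has already been converted to a list; the function name is hypothetical and not part of Eglot.

```elisp
(defun my-semtok-decode (data token-types)
  "Decode DATA, a list of integers in the LSP relative encoding.
TOKEN-TYPES is the server's legend, a list of type names.
Return a list of (LINE START LENGTH TYPE MODIFIER-BITS), zero-based."
  (let ((line 0) (start 0) (result '()))
    (while data
      (let ((delta-line (nth 0 data))
            (delta-start (nth 1 data))
            (len (nth 2 data))
            (type (nth 3 data))
            (mods (nth 4 data)))
        ;; The start delta is relative to the previous token only when
        ;; both tokens are on the same line.
        (setq start (if (zerop delta-line) (+ start delta-start) delta-start)
              line (+ line delta-line))
        (push (list line start len (nth type token-types) mods) result)
        (setq data (nthcdr 5 data))))
    (nreverse result)))
```

With the legend `("property" "type" "class")`, the list `(2 5 3 0 3 0 5 4 1 0 3 2 7 2 0)` decodes to tokens at line 2 column 5, line 2 column 10, and line 5 column 2. Modifier bits are left undecoded here; each set bit indexes into the legend's modifier list.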