Help TS team to assume ownership of semantic highlighting in TS tooling #92789
In #92963 (comment), @alexdima mentioned something about a delta strategy. I can't find any place where this delta strategy is documented. Can you please elaborate on that?
@DanielRosenwasser Here is the relevant documentation.
One of the things we brought up when this was first discussed is that TypeScript doesn't have a model where we can correlate the current semantic state of the world with the last semantic state. I don't think that's accounted for in this model. In Visual Studio we've had APIs to provide semantic classifications which are handled by Roslyn, and we simply haven't had to worry about any issues there. This has worked for years. Given the constraints I've mentioned above, along with the fact that with this API each language server has to figure out how to achieve this, I think we need to rethink the approach here.
I would agree with this. It doesn't make sense to me for languages to have to figure it out; it's non-trivial and would involve a lot of work per language. In Roslyn we allow the languages to just be 'dumb': they compute their syntactic or semantic classifications however they see fit, and we update our host intelligently with the minimal change that's appropriate. Note that this is what we do with all our taggers: classification, outlining, line separators, reference highlighting, and so on. All of these sit on systems where the language is simply asked to compute, and the host (i.e. Roslyn) intelligently manages those results and ensures minimal updates to actual clients. Having this complex logic centralized has been a boon: all Roslyn languages can focus on language tasks rather than on the complexities of semantic changes and diffs. I don't think it's reasonable or appropriate to push that back out onto all the languages again.
Sending deltas only solves a small part of the problem. We also have to consider that the language side is going to end up spending a lot of time computing semantic information for highlights that are off-screen and remain off-screen; a user editing a large file might only ever scroll a tiny portion of it into view. For TypeScript, a question like "Is this JS identifier actually a class or a function?" involves a surprisingly large amount of work, e.g. looking for prototype assignments, among other checks.

Worse, edits near the top of the file are going to invalidate a lot of semantic highlights as the file goes in and out of a syntactically valid state, so even under a delta-based scheme we're going to have to send back a bunch of (nontrivially computed) diffs for off-screen tokens.

Even setting all of that aside, we're looking at 5+ seconds of extra startup responsiveness delay for a large file as VS Code processes the initial highlighting data, and deltas will not fix any of that. I'd like to revisit #92963 to address the underlying concern here and ship a performant feature. VS Code has already invested a lot in making the experience shine in large files, and I think we need to continue to keep these scenarios in mind.
The purpose of I'm going to try to explain the idea behind I find it best to reason about an example.

So, suppose you open Let's suppose the 1.6MB semantic tokens have made it to the editor; they are now rendered and all is good. But suppose a single edit is done: a new line is inserted at the top of The simple small edit is communicated to the language service with a single text delta event that is small: it will just say that at So that is where the

Now, for the simple implementation I have promised. On the language server side, when a For a subsequent call to You need to compute one or more edits which brings If you don't want to do this, then that is ok, simply don't implement
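The relative encoding used by the semantic tokens API is what makes the line-insertion example above cheap: each token is five integers (deltaLine, deltaStartChar, length, tokenType, tokenModifiers), with line numbers stored relative to the previous token. A small sketch (the token shapes are invented for the demo) showing that inserting a line at the top of a file changes exactly one integer in the encoded array:

```ts
// Encode absolute token positions into the relative 5-integers-per-token layout.
function encodeTokens(tokens: { line: number; char: number; len: number; type: number }[]): number[] {
  const out: number[] = [];
  let prevLine = 0;
  let prevChar = 0;
  for (const t of tokens) {
    const deltaLine = t.line - prevLine;
    // deltaStartChar is relative to the previous token only when on the same line.
    const deltaChar = deltaLine === 0 ? t.char - prevChar : t.char;
    out.push(deltaLine, deltaChar, t.len, t.type, 0);
    prevLine = t.line;
    prevChar = t.char;
  }
  return out;
}

const before = encodeTokens([
  { line: 0, char: 4, len: 3, type: 1 },
  { line: 2, char: 0, len: 5, type: 2 },
  { line: 2, char: 8, len: 2, type: 0 },
]);

// Insert one blank line at the top: every absolute line moves down by one...
const after = encodeTokens([
  { line: 1, char: 4, len: 3, type: 1 },
  { line: 3, char: 0, len: 5, type: 2 },
  { line: 3, char: 8, len: 2, type: 0 },
]);

// ...but only the first token's deltaLine differs in the encoded arrays.
const diffs = before.map((v, i) => (v === after[i] ? -1 : i)).filter(i => i >= 0);
console.log(diffs);
```

Because all later tokens are expressed relative to their predecessor, the edit the server needs to send back is a single-integer replacement, regardless of file size.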
The thing is that there's a huge cost in this one step:
The minimum overhead of this is:
We're already seeing the negative effects of this - that's the 5+ second delays we're seeing locally. On the other hand, doing that huge JSON serialization and then going across the wire is also unacceptable. Serializing and deserializing about 2 million integers in arrays of arrays takes about 500ms on my machine, and nobody wants that:

```js
const result = [];
for (let i = 0; i < 2000000; i += 5) {
  const tuple = [];
  for (let j = 0; j < 5; j++) {
    tuple.push(j);
  }
  result.push(tuple);
}

const start = Date.now();
const roundTripped = JSON.parse(JSON.stringify(result));
const end = Date.now();
console.log(end - start);
```

So both of these problems need to be fixed. Regardless of the API, the bottom line is that the editor is potentially going to be asking language services to calculate too much stuff.
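One mitigation for the serialization cost, independent of deltas, is to keep the tokens in a single flat array rather than arrays of small tuples (which is in fact the shape the semantic tokens API settled on). A hypothetical size comparison, with the counts chosen to mirror the benchmark above:

```ts
// Sketch: the same 2 million integers as one flat Uint32Array-backed sequence
// instead of 400k nested 5-tuples. The flat layout drops all the per-tuple
// brackets from the JSON, and a typed array can often skip JSON entirely.
const TOKEN_COUNT = 400_000;

const flat = new Uint32Array(TOKEN_COUNT * 5);
for (let i = 0; i < flat.length; i++) {
  flat[i] = i % 5; // same values as the nested version below
}

const nested: number[][] = [];
for (let i = 0; i < TOKEN_COUNT; i++) {
  nested.push([0, 1, 2, 3, 4]);
}

const flatJson = JSON.stringify(Array.from(flat));
const nestedJson = JSON.stringify(nested);
console.log(flatJson.length < nestedJson.length); // flat JSON is measurably smaller
```

This does not remove the cost of computing the tokens, but it shrinks both the stringify work and the bytes on the wire for the same data.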
@DanielRosenwasser So the slowness of serializing a large result over and over again, and the slowness of downloading it over and over again (potentially over wi-fi, in the remote SSH case), is resolved by the proposed API, which allows an edit to be sent. But to be honest, the slowness of walking the AST and generating semantic tokens is purely under the control of the TS language service, so I cannot prescribe in the VS Code API how to eliminate that slowness.

I can only ask: why are suggestions so fast? A simple implementation for suggestions, one that could be written in under a day, would be to walk the entire program and all the ASTs and collect all possible reachable symbols... Yet perhaps you are doing something faster, since it doesn't take 5+ seconds? Perhaps you spent more than one day on this? Perhaps something can be engineered by a TS developer with intimate knowledge of the compiler such that it is not necessary to type check the entire program when computing semantic tokens? Perhaps, when looking at an identifier, it can be decided more quickly whether it is a class type name, an enum type name, a variable, a constant or an argument, etc., without having to check all type constraints? Perhaps it is possible to cache semantic tokens with the AST and invalidate them only selectively when typing? For example, if I am typing inside a comment or in a method body, computing semantic tokens could be super fast since most of the symbols in the file have not changed.

Also, if the 5s+ is spent type-checking the program, it means those 5s+ would have been spent anyway when computing diagnostics, or when hovering, right? How much of this is work that would have been done anyway? Another idea is that you could delay answering the request until you do have a type-checked program.
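The selective-invalidation idea suggested above can be sketched without any compiler machinery: cache computed tokens per top-level span and drop only the spans an edit touches. Everything here (the `Span` shape, `computeTokens`) is illustrative, not TS server API:

```ts
interface Span { start: number; end: number; tokens?: number[] }

let computeCalls = 0;
function computeTokens(span: Span): number[] {
  computeCalls++; // stands in for the expensive checker work
  return [span.start, span.end - span.start];
}

function tokensFor(spans: Span[]): number[] {
  const out: number[] = [];
  for (const s of spans) {
    if (!s.tokens) s.tokens = computeTokens(s); // cache hit skips the checker entirely
    out.push(...s.tokens);
  }
  return out;
}

function applyEdit(spans: Span[], editStart: number, editEnd: number): void {
  for (const s of spans) {
    // Invalidate only spans that overlap the edited range.
    if (s.start < editEnd && editStart < s.end) s.tokens = undefined;
  }
}

const spans: Span[] = [{ start: 0, end: 100 }, { start: 100, end: 200 }, { start: 200, end: 300 }];
tokensFor(spans);           // first pass computes all three spans
applyEdit(spans, 150, 160); // an edit inside a single method body
tokensFor(spans);           // recomputes just the one overlapping span
console.log(computeCalls);
```

A real implementation would also have to invalidate spans whose symbols are affected by non-local changes, which is exactly the hard part being debated in this thread; the sketch only shows the bookkeeping shape.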
There is one more thing I forgot to mention: you can throw

All I'm trying to say is that I believe something better than walking the entire AST and typechecking the entire program can be built, but I think you are in the best position to build that. But I agree: if the dev budget for this feature is one day, then it will take 5s+.
Another idea I just had. If So if
@alexdima it sounds like what you're asking for is some sort of eureka-style optimization that we can apply to a problem space which we've thought about for years (reusing semantic information between compiles) for a feature that's already been implemented reasonably well for other editors. I really urge you to reconsider: is this eureka-style optimization something you think every language service is going to be able to implement independently? Does it exist?
This question is pertinent, and I'm glad that you asked it. Apart from incremental parsing, the reason that features like completion, quick info, go-to-definition, etc. are reasonably quick isn't that they're doing super-special optimizations; it's that they're doing the minimum work necessary to provide an answer for a client. If you were to ask for quick info on every identifier in the program, you would hit similar scaling issues on any reasonably-sized file.

The way that semantic highlighting stays reasonably fast in Visual Studio is through a very similar approach: editors have only requested highlights around the user's viewport. The server only needs to dive into the subset of the tree within the range requested by the editor, and it only resolves symbols in that range. That cuts out a large amount of computation. So ultimately that's what we're asking about: can the API ask for less in an intelligent manner?
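The range-scoped traversal described above can be sketched as a tree walk that skips any subtree falling entirely outside the requested range. The node shape and the resolution counter are invented for the demo:

```ts
interface AstNode { start: number; end: number; children: AstNode[] }

let resolved = 0;
function classifyInRange(node: AstNode, rangeStart: number, rangeEnd: number): void {
  // The whole subtree is off-screen: never touched, never resolved.
  if (node.end <= rangeStart || node.start >= rangeEnd) return;
  resolved++; // stands in for the expensive symbol resolution
  for (const child of node.children) classifyInRange(child, rangeStart, rangeEnd);
}

const file: AstNode = {
  start: 0, end: 1000,
  children: [
    { start: 0, end: 400, children: [{ start: 10, end: 50, children: [] }] },
    { start: 400, end: 700, children: [{ start: 450, end: 460, children: [] }] },
    { start: 700, end: 1000, children: [{ start: 800, end: 900, children: [] }] },
  ],
};

// Ask only for the viewport: the first and last top-level subtrees are pruned.
classifyInRange(file, 420, 480);
console.log(resolved);
```

The work done is proportional to the visible range plus the spine of the tree above it, not to file size, which is the scaling property being argued for here.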
@DanielRosenwasser Sure, then implement only a

I ask that you double-check that Visual Studio is doing 0 synchronous calls from the UI thread to
I can obviously eagerly request 20 lines above and below the viewport, but the flash will be there given a higher latency or a higher scrolling speed.
This is what we'd like the VS Code TS plugin to do, then. The theme colors should be designed such that the flickering is not distracting. VS does this and it works very well: by choosing good colors, it's e.g. a shift from blue to darker blue, or from blue to a teal color. When jumping between file locations it is certainly visible, but it is not at all bothersome; we don't receive complaints about it. Having

A good semantic highlighting scheme makes the semantic information subtly different from the nonsemantic coloring of the same identifier. I think people get tempted to add very bright "it's obviously working!"-style colors when they first add a new form of colorization, but users don't like this kind of highlighting, as I think you've seen from feedback.
@CyrusNajmabadi The reason why we ask the server to do some delta encoding is to minimize the data that needs to go over the wire (e.g. from the server to any client), because semantic coloring for a whole file can be heavyweight. In a single-process world I do agree that the language smarts shouldn't be bothered with this. We tried to make it very easy for a server to compute the diff, since it is not a semantic diff: it is a number-array diff, which in most cases should be easy to compute.

I also want to point out that, from the discussions I've had with other teams, semantically coloring only the current range is not always enough. Doing so means the minimap will show no semantic colors, or incorrect ones. For example, C++ uses semantic coloring to highlight active and inactive regions (rendered gray) in the minimap.
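The "number array diff" mentioned above can be as simple as trimming the common prefix and suffix of the old and new token arrays and sending back one `{ start, deleteCount, data }` edit; real providers may emit several edits, but one is enough to show the idea:

```ts
interface TokensEdit { start: number; deleteCount: number; data: number[] }

// Returns undefined when the arrays are identical, otherwise a single edit
// that transforms prev into next.
function diffTokens(prev: number[], next: number[]): TokensEdit | undefined {
  let prefix = 0;
  while (prefix < prev.length && prefix < next.length && prev[prefix] === next[prefix]) prefix++;
  if (prefix === prev.length && prefix === next.length) return undefined;

  let suffix = 0;
  while (
    suffix < prev.length - prefix &&
    suffix < next.length - prefix &&
    prev[prev.length - 1 - suffix] === next[next.length - 1 - suffix]
  ) suffix++;

  return {
    start: prefix,
    deleteCount: prev.length - prefix - suffix,
    data: next.slice(prefix, next.length - suffix),
  };
}

// With relative line encoding, inserting a line at the top of the file changes
// only the first integer, so the diff collapses to a one-integer replacement.
const prevTokens = [0, 4, 3, 1, 0, 2, 0, 5, 2, 0];
const nextTokens = [1, 4, 3, 1, 0, 2, 0, 5, 2, 0];
const edit = diffTokens(prevTokens, nextTokens);
console.log(edit);
```

This is deliberately not a semantic computation: the server regenerates the full array however it likes and diffs the raw integers afterwards.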
In my role as an API provider, this is a great outcome. The vscode API comes with both

As a daily user of TypeScript, this is not the best outcome, because if I ever see a flicker I will be able to tell why that is, and I will know it could have been better. But I have hope that maybe one day you could implement an efficient

In conclusion, I think this is under the control of the TS extension and TS language server, so the API is good to ship in its current form this milestone.
@alexdima The repeated implication that we just haven't thought about this problem very hard, and are unwilling to do so for more than a day, isn't useful here. I want to make sure you understand that we have indeed thought about this quite hard, so that you have realistic expectations going forward, and for other languages.

Consider this example: The amount of work required to color the two

You mentioned before doing incremental semantic analysis based on invalidating only certain things based on what in the program changed. If we were capable of producing differential semantic outputs, we would have done so years ago. An earlier version of the TS checker tried to maintain meaningful semantic information between edits, and it failed miserably. The Flow team at Facebook is trying to do this, and it takes up so much of their time that they had to blog about the fact that they didn't appear to be doing much of anything else. Other languages that were designed with static analysis in mind will surely have an easier time of this, but JS is not a language designed with easy static analysis in mind.

Maybe semantic highlighting is not a good fit for TS/JS given the efficiency constraints being implied here; it seems like this feature has been designed with the idea that computing the coloring of a token is nearly free from the provider's perspective, but that just isn't the case for TS/JS. What do you think?
@RyanCavanaugh You have a very good point. I apologize for speculating without knowing how

I only ran each test once, so the absolute times don't mean too much, since the variation is large, but it helps to get an idea of what is going on. With semantic highlighting turned off:
With semantic highlighting turned on:
I find two things interesting: computing semantic tokens is more expensive than typechecking, and it looks like some things (but not all) get cached between calls on the same project state, because

I then went ahead and made one change: delaying the server request from the TS extension in order to run the semantic highlighting after the diagnostics pass, in the hope of benefiting from that caching:
This confirms that some things are cached in the typechecker, but computing semantic tokens is still very expensive. I thought there must be something wrong with the way semantic tokens are computed, since even now computing them is slower than computing semantic diagnostics. I then started looking at the plugin's implementation of

I took a peek at the typechecker itself, and that is written differently, in what I understood to be a more traditional "top-down, visit each node in a typed manner" approach, with functions which are far less polymorphic than the plugin's very generic

I began writing a more traditional visitor. I haven't finished yet (I still have nodes left to visit with
And here is where things get interesting. A lot of the time is actually not spent in the visitor, but again in the typechecker. So while some things are cached, I have a strong suspicion (without really understanding the typechecker itself) that not a lot of things are cached, since I can see in the profiles that a lot of time is still spent typechecking while generating the semantic tokens on a project with the same state.

I am not finished, but I wanted to give a quick update on what my thinking currently is. I think that if more caching were done in the typechecker, the semantic tokens generation would be significantly cheaper (maybe an optimistic 10x). This also makes me think of the C++ team who, AFAIK, generate the semantic tokens while reconciling (they call it the "IntelliSense pass") to avoid computing things twice.

When I reference in my comments above a dev budget of one day, I don't mean that I am asking someone from the TS team to invent a new kind of compiler or rewrite all the compiler infrastructure for this feature. I mean that someone spends time, because I consider this to be an engineering problem that could be solved with caching and better timing of things. I will continue on this visitor and let you know the results, but I still believe that someone with deep knowledge of the TS compiler/domain would be much better suited for the task.
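The "more traditional visitor" described above can be contrasted with a generic-callback traversal in miniature: a switch on the node's kind, calling into a (stand-in) checker only for identifiers, with results memoized across repeated occurrences of the same name. All node shapes and the classification rule here are invented for illustration:

```ts
type Kind = "identifier" | "call" | "block";
interface AstNode { kind: Kind; text?: string; children: AstNode[] }

const checkerCache = new Map<string, string>();
let checkerCalls = 0;
function classify(name: string): string {
  const hit = checkerCache.get(name);
  if (hit !== undefined) return hit; // cached across repeated occurrences
  checkerCalls++; // stands in for an expensive checker query
  const kind = name[0] === name[0].toUpperCase() ? "class" : "variable";
  checkerCache.set(name, kind);
  return kind;
}

const tokens: string[] = [];
function visit(node: AstNode): void {
  switch (node.kind) {
    case "identifier":
      tokens.push(`${node.text}:${classify(node.text!)}`);
      break;
    default:
      for (const c of node.children) visit(c);
  }
}

const ast: AstNode = {
  kind: "block",
  children: [
    { kind: "call", children: [
      { kind: "identifier", text: "Foo", children: [] },
      { kind: "identifier", text: "bar", children: [] },
    ]},
    { kind: "identifier", text: "Foo", children: [] }, // second occurrence: cache hit
  ],
};
visit(ast);
console.log(tokens, checkerCalls);
```

The visitor shape is monomorphic, and the memo means the checker is consulted once per distinct name; the open question in the thread is how much of the real checker's work is cacheable this way.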
Here is a quick update: In the end, I was able to reach about 10x overall improvements. Here are the average times to compute semantic tokens for
I have compared semantic tokens generated by the

The
Outcome of discussion today:
@aeschli I'm going to unassign myself from this issue and tentatively move it to the next iteration. However, if you need help coordinating this work with the TS team, just let me know!
I was under the impression that someone from the TypeScript team would move the plugin code to the server and adapt it to their liking. Did I misunderstand? I'm happy to maintain the plugin in the meantime and to adopt the TS server when this is ready.
@aeschli Yes, I think the TS team would do the code changes. You would be driving the work by following up with the TS team and providing them with any support they need for this.
The PR integrating these features into TS is microsoft/TypeScript#39119 👍
Great news, @orta! As the migration is now well on its way, I'll close this issue. I'm happy to help anywhere, just ping me!
We've been working on our semantic coloring support. This is currently powered by a TS Server plugin but we’d like to merge this work back into TS Server.
The plugin's readme describes its purpose and lists the implemented features.
Also see the Tests for more details: https://github.com/aeschli/typescript-vscode-sh-plugin/blob/master/src/test/semanticTokens.test.ts
Along with the adoption goes the request to improve the encodedSemanticClassifications-full command to also support deltas. See #92963 (comment).