-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OOM when working with large files because of huge DocumentSymbols response #815
Comments
(I am quoting this part of your OP message because it was an edit people only following email notifications won't see it)
The only LSP4J change I can think of that would have a small (tiny?) effect is that messages have a new field As to your main question, I don't have much thoughts on it. Having a configurable max message size seems ok, but not sure where that should be configured. Very large JSON messages can translate into enormous Objects, or relatively small ones (for example if most of the JSON message is simply long strings such as file contents). I don't think LSP4J's jsonrpc should have a default max size. I wonder if there is more efficient way (space wise) to process the JSON input that would avoid all the intermediary objects. i.e equivalent to sax vs dom xml parsing. Perhaps jackson? It would expose how much of the LSP4J API leaks out GSON API I guess. |
When thinking about replacing GSON I recommend to check out https://github.com/fabienrenaud/java-json-benchmark esp. the deserialization performance graphs. |
Useful link. Thanks for sharing.
We need to adjust benchmark payload to 100× that and look at memory usage too. |
I completely agree. When the number of nodes in the input message is huge, there is only so much LSP4J can do, GSON or not -- the resulting deserialized object graph would still be huge. However, LSP supports partial results streaming for such use cases, which the DocumentSymbols request also supports: So, it does support batching. It is just that both the server and the client need to support this option. |
@pisv does LSP4J support partial results? I am wondering how I can convert final var params = new DocumentSymbolParams(LSPEclipseUtils.toTextDocumentIdentifier(documentURI));
symbols = outlineViewerInput.wrapper.execute(ls -> ls.getTextDocumentService().documentSymbol(params)); |
Yes, LSP4J supports partial results, i.e. it provides mapping of all the necessary LSP structures to Java. For a usage example on the client-side, you can have a look at See also AbstractPartialResultProgress and related classes in LXTK. |
I had a look at the project. To be honest I don't really grasp it, there are so many layers of abstractions/indirection that I got lost in the code. It looks like you needed to implement a whole infrastructure on top of lsp4j just to get partial results working. So far I understand I need to set partialResultToken and workDoneToken on the request params and then can send the request with the same tokens multiple times. A very naive implementation of my current understanding is: var params = new DocumentSymbolParams(new TextDocumentIdentifier(documentURI));
params.setPartialResultToken(UUID.randomUUID().toString());
params.setWorkDoneToken(UUID.randomUUID().toString());
do {
List<Either<SymbolInformation,DocumentSymbol>> result = languageServer.getTextDocumentService().documentSymbol(params).get());
} while ( /* when to exit ?? */ ); What I don't understand is, how to know when all results are returned. |
No, it does not work so. First, work done progress is independent of partial result progress, so you do not actually need to set a work done token if all you need is partial results. Second, you only need to send the request once; partial results, if the server supports them for this request, will be sent by the server via one or more So, the key here is that partial results need to be actually supported by the server for a given request. Then, you need to ensure that I'd suggest reading the following sections of the specification for more information: (The latter describes client-initiated work done progress, but the same principles apply equally to partial result progress.) |
@sebthom are you working currently on a PR for this issue here or in LSP4E? |
@ghentschke no I am not. go for it :-) |
While working on #703 LSSymbolsContentProvider populating the outline view freezes UI for minutes for large files I am now facing the issue that when loading large JSON on log files (in my case 23MB with 11k lines) my Eclipse IDE with 3GB heap exits with an OOM while rendering the outline.
I traced the issue down to the fact that the JSON language server responds to the DocumentSymbols request with a whooping 160MB large JSON response which is then processed by the lsp4j's ConcurrentMessageProcessor + StreamMessageProducer. Unfortunately the StreamMessageProducer isn't really stream processing but tries to deserialize the JSON string DOM-style like into a huge object with countless of GSON LinkedTreeMap instances which eventually trigger the OOM.
A first optimization I see is to avoid this string allocation and instead pass an InputStreamReader instance at:
lsp4j/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/json/StreamMessageProducer.java
Lines 191 to 193 in c1d4cd3
In my case that would avoid the temporary blocking 160MB of heap and make it available for the JSON processing.
However ultimately this will not be enough. The main issue here is that the DocumentSymbols request does not support batching microsoft/language-server-protocol#1533
So I was wondering if it wouldn't make sense in LSP4J to not process incoming messages of a certain size and handle them gracefully with a RuntimeException that is properly logged.
Btw. I did not encounter these OOMs half a year ago when I opened the eclipse/lsp4e#703. At that time I was using Eclipse 2023-06 and the respective lsp4j version. Can it be that lsp4j allocates more memory in the latest release?
The text was updated successfully, but these errors were encountered: