-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Share indexes among workspaces #2341
Share indexes among workspaces #2341
Conversation
org.eclipse.jdt.ls.core/src/org/eclipse/jdt/ls/core/internal/handlers/IndexUtils.java
Outdated
Show resolved
Hide resolved
File localIndexFile = localIndexLocation.getIndexFile(); | ||
if (localIndexLocation.exists()) { | ||
File sharedIndexFile = sharedIndexLocation.getIndexFile(); | ||
if (sharedIndexFile == null || localIndexFile == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can check if sharedIndexFile
exists on disk. If yes then skip the file copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a use case to not skip the file copy.
Each index file contains a header metadata (e.g. INDEX VERSION 1.131
) at the begining. When the sharedIndexFile is generated by an outdated index scheme (e.g. 1.130), jdt.ls will find it not to match current expected index version, and then regenerate a new version (e.g. 1.131) in local workspace. In such case, we need to overwrite the outdated one in shared location.
I have optimized the code to check if the shared index is being used by current workspace. If so, skip copy and delete local index directly. If not, that means it should be an outdated index and language server will use local index to overwrite it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can check if sharedIndexFile exists on disk. If yes then skip the file copy?
If we support grouping the shared indexes by version, then it's ok to skip the file copy if shared index exists.
I made a PR in jdt.core to support reading shared indexes by version, @rgrunber could you help a review on that? eclipse-jdt/eclipse.jdt.core#564
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can check if sharedIndexFile exists on disk. If yes then skip the file copy?
@rgrunber This should be the last open question.
Since we cannot tell from file name if the sharedIndexFile contains a correct index version, currently we just always overwrite the sharedIndexFile with local copies. But the disadvantage is that the shared indexes cannot be used by index engine of different versions.
If we skip the file copy when the sharedIndexFile exists, this may result in the shared location containing an outdated version. The indexes are hidden from users, that means if our tools don't keep the index versions update, no one will do that and evetually the shared folder becomes useless.
As a tradeoff, if we save the index versions in different folder, then we can skip file copy if the same shared index version exists.
needDeleteLocalIndex = true; | ||
} else { | ||
Index localIndex = indexManager.getIndex(localIndexLocation); | ||
if (localIndex == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might happen that after project imported, the index manager has not setup the index locations, then we skip a lot of local index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the job to write out the indices is done in the background, so you can definitely have a case where for large libraries, they are still being written to disk after import. This is one of the reasons why in Eclipse (and likely VS Code) you can try to search after the project is imported and not get any results.
Would something like https://github.com/eclipse/eclipse.jdt.ls/blob/250ac4ff3d169ad56bf68bf6cb7bb48ef181be1f/org.eclipse.jdt.ls.core/src/org/eclipse/jdt/ls/core/internal/JobHelpers.java#L242 help ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waitUntilIndexesReady
seems to work quite well. It can ensure all indexes are generated before copying.
if (needDeleteLocalIndex) { | ||
try { | ||
// Remove the local index after it has been copied to the shared index location. | ||
Files.deleteIfExists(localIndexFile.toPath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen after we delete the location index file, but the location mappings are not updated in the index manager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not completely sure, but I think it would only switch over to the shared index, if a change is detected on the index (doesn't really happen for a jar), or if we find a way to manually clear it out.
I see there's a clearCache()
in the index manager. Maybe that's enough to re-read the entries without re-writing anything. I don't remember ever testing the case of copying over an index from the "local" area to the "shared" area after import has completed (to see if the shared area takes over).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen after we delete the location index file, but the location mappings are not updated in the index manager?
This is a really good question.
I've tried it and it can cause the cached indexes to be unusable. This is because the search engine will use the cached Index/DiskIndex instances in memory for results. Simply deleting the index file may break the cached DiskIndex instance on that file, because the query operation on the DiskIndex is lazily loading the index data on disk and may get a FileNotFound error.
I came up with a conservative way to solve this problem. The first time we only copy the local index to the shared location, but without deleting the local copy. The next time when you reload the language server, it is safe to delete the local indexes because the language server will automatically switch to the shared index and local indexes become unused.
From what I've tried, the new approach works well.
5c05918
to
1a0a3d4
Compare
}, IResourceChangeEvent.PRE_REFRESH); | ||
} | ||
|
||
private static void copyIndexesToSharedLocation(IJavaProject[] javaProjects) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that if it is necessary to schedule this method into a Job, since it's dealing with IO and probably takes a couple of seconds to finish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What benefits do you expect from wrapping it as a Job? Lock the workspace? Or show some progress notification through the Job?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial thought is to wrap it to a job with a very low priority - not sure how much benefit will get.
Here is the performance benefit we can get from shared indexes. The sample project is spring-petclinic with 112 external libraries dependencies, and the time and storage costs for indexing these external jars are as follows.
|
Just an oddity I've found, though it doesn't seem to have any negative side effect. With the following Maven project, maven-my-app.zip. The important detail is that it uses a JDK (eg. 19) that is not the same as the one embedded (eg .JDK 17) After importing the project initially, the indexes will be duplicated over into the
I'm not sure why the JDK 19 index in the shared location isn't used after the first reload. |
Can you reproduce this problem consistently? I tried your sample project, and after first reload, local index for jdk 19 is removed and it is switched to the shared index automatically. |
In my workspace there are two Java runtimes. One that is used by the project itself, and one that is used by the default java project (jdt.ls-java-project/). The one used by the default project doesn't seem to get switched to use the shared index and continues to use the local index. I've tried on a fresh VS Code instance just to rule out any other settings having an impact. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I know what's happening. On the initial project import all projects are indexed (the Maven/Gradle project & jdt.ls-java-project
. My guess is the default project gets indexed as part of its initial creation. After reloading, the default project doesn't get re-indexed at all (it isn't needed), so it's libraries are never switched over.
If the Maven/Gradle project uses a JRE different from the default project (default JRE), then the default project JRE never gets shared and remains in the local area.
I don't think this is harmful, but it just means the workspace storage won't be as small as if we re-indexed it on re-load.
Feel free to squash and merge when ready.
Thanks for looking into the root cause. Since it doesn't have impact on the functionality, i will merge this PR first. And track it with a new issue #2402. |
The upstream PR https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/152349/ allows reusing library indexes from a common location if they exist. If the common index is not hit, then it will fallback to create indexes in local workspace instead of writing back to the common location.
This PR attempts to solve the writing problem and support copying local library indexes back to the common location after the projects have been imported.
To avoid conflicts between two processes writing to a shared index, each Java process first copies the local index (e.g. 1977137404.index) to a unique tmp file (e.g. 1977137404.index.<timestamp>) on the shared folder, and then renames the tmp file name to the desired index file name. Because renaming is atomic, it does not leave a file with corrupted contents.
JobHelpers.waitUntilIndexesReady() can force the indexes to be saved.