Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share indexes among workspaces #2341

Merged
merged 7 commits into from
Jan 16, 2023

Conversation

testforstephen
Copy link
Contributor

@testforstephen testforstephen commented Nov 22, 2022

The upstream PR https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/152349/ allows reusing library indexes from a common location if they exist. If the common index is not hit, then it will fallback to create indexes in local workspace instead of writing back to the common location.

This PR attempts to solve the writing problem and support copying local library indexes back to the common location after the projects have been imported.

To avoid conflicts between two processes writing to a shared index, each Java process first copies the local index (e.g. 1977137404.index) to a unique tmp file (e.g. 1977137404.index.<timestamp>) on the shared folder, and then renames the tmp file name to the desired index file name. Because renaming is atomic, it does not leave a file with corrupted contents.

One known issue is that current sync behavior between local indexes and shared location is incremental. It's because IndexManager keeps indexes in memory until JobManager is idle and then writes them back to disk. This means that when we open a fresh workspace, its local indexes may still be in memory after the project has been imported. Copying the local indexes may not do anything. But the good part is that if you reload the same workspace a few times, these local indexes will eventually be saved to disk and get chance to be synced up with the shared location.

JobHelpers.waitUntilIndexesReady() can force the indexes to be saved.

File localIndexFile = localIndexLocation.getIndexFile();
if (localIndexLocation.exists()) {
File sharedIndexFile = sharedIndexLocation.getIndexFile();
if (sharedIndexFile == null || localIndexFile == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can check if sharedIndexFile exists on disk. If yes then skip the file copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a use case to not skip the file copy.

Each index file contains a header metadata (e.g. INDEX VERSION 1.131) at the begining. When the sharedIndexFile is generated by an outdated index scheme (e.g. 1.130), jdt.ls will find it not to match current expected index version, and then regenerate a new version (e.g. 1.131) in local workspace. In such case, we need to overwrite the outdated one in shared location.

I have optimized the code to check if the shared index is being used by current workspace. If so, skip copy and delete local index directly. If not, that means it should be an outdated index and language server will use local index to overwrite it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can check if sharedIndexFile exists on disk. If yes then skip the file copy?

If we support grouping the shared indexes by version, then it's ok to skip the file copy if shared index exists.

I made a PR in jdt.core to support reading shared indexes by version, @rgrunber could you help a review on that? eclipse-jdt/eclipse.jdt.core#564

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can check if sharedIndexFile exists on disk. If yes then skip the file copy?

@rgrunber This should be the last open question.

Since we cannot tell from file name if the sharedIndexFile contains a correct index version, currently we just always overwrite the sharedIndexFile with local copies. But the disadvantage is that the shared indexes cannot be used by index engine of different versions.

If we skip the file copy when the sharedIndexFile exists, this may result in the shared location containing an outdated version. The indexes are hidden from users, that means if our tools don't keep the index versions update, no one will do that and evetually the shared folder becomes useless.

As a tradeoff, if we save the index versions in different folder, then we can skip file copy if the same shared index version exists.

needDeleteLocalIndex = true;
} else {
Index localIndex = indexManager.getIndex(localIndexLocation);
if (localIndex == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might happen that after project imported, the index manager has not setup the index locations, then we skip a lot of local index.

Copy link
Contributor

@rgrunber rgrunber Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the job to write out the indices is done in the background, so you can definitely have a case where for large libraries, they are still being written to disk after import. This is one of the reasons why in Eclipse (and likely VS Code) you can try to search after the project is imported and not get any results.

Would something like https://github.com/eclipse/eclipse.jdt.ls/blob/250ac4ff3d169ad56bf68bf6cb7bb48ef181be1f/org.eclipse.jdt.ls.core/src/org/eclipse/jdt/ls/core/internal/JobHelpers.java#L242 help ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waitUntilIndexesReady seems to work quite well. It can ensure all indexes are generated before copying.

if (needDeleteLocalIndex) {
try {
// Remove the local index after it has been copied to the shared index location.
Files.deleteIfExists(localIndexFile.toPath());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen after we delete the location index file, but the location mappings are not updated in the index manager?

Copy link
Contributor

@rgrunber rgrunber Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not completely sure, but I think it would only switch over to the shared index, if a change is detected on the index (doesn't really happen for a jar), or if we find a way to manually clear it out.

I see there's a clearCache() in the index manager. Maybe that's enough to re-read the entries without re-writing anything. I don't remember ever testing the case of copying over an index from the "local" area to the "shared" area after import has completed (to see if the shared area takes over).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen after we delete the location index file, but the location mappings are not updated in the index manager?

This is a really good question.

I've tried it and it can cause the cached indexes to be unusable. This is because the search engine will use the cached Index/DiskIndex instances in memory for results. Simply deleting the index file may break the cached DiskIndex instance on that file, because the query operation on the DiskIndex is lazily loading the index data on disk and may get a FileNotFound error.

I came up with a conservative way to solve this problem. The first time we only copy the local index to the shared location, but without deleting the local copy. The next time when you reload the language server, it is safe to delete the local indexes because the language server will automatically switch to the shared index and local indexes become unused.

From what I've tried, the new approach works well.

}, IResourceChangeEvent.PRE_REFRESH);
}

private static void copyIndexesToSharedLocation(IJavaProject[] javaProjects) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that if it is necessary to schedule this method into a Job, since it's dealing with IO and probably takes a couple of seconds to finish.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What benefits do you expect from wrapping it as a Job? Lock the workspace? Or show some progress notification through the Job?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought is to wrap it to a job with a very low priority - not sure how much benefit will get.

@testforstephen
Copy link
Contributor Author

testforstephen commented Jan 3, 2023

Here is the performance benefit we can get from shared indexes.

The sample project is spring-petclinic with 112 external libraries dependencies, and the time and storage costs for indexing these external jars are as follows.

Indexing external libraries Time Cost Storage Cost
Generating indexes for 112 library dependencies of spring-petclinic (including JDK17) 30s 104 M
Generating index for JDK17 jrt-fs.jar 19s 37.6 M

@rgrunber
Copy link
Contributor

rgrunber commented Jan 12, 2023

Just an oddity I've found, though it doesn't seem to have any negative side effect.

With the following Maven project, maven-my-app.zip. The important detail is that it uses a JDK (eg. 19) that is not the same as the one embedded (eg .JDK 17)

After importing the project initially, the indexes will be duplicated over into the .jdt/index location.
Once the editor is reloaded, the language server switches to the shared index location for the external libraries, except for the embedded JDK 19.

file:/home/rgrunber/.jdt/index/1.132/567126731.index -> Index for /home/rgrunber/git/vscode-java/jre/17.0.4.1-linux-x86_64/lib/jrt-fs.jar

file:/home/rgrunber/.config/Code/User/workspaceStorage/d282e03119115911b164ce5d98de7d04/redhat.java/jdt_ws/.metadata/.plugins/org.eclipse.jdt.core/1023491807.index -> Index for /usr/lib/jvm/java-19-openjdk/lib/jrt-fs.jar

I'm not sure why the JDK 19 index in the shared location isn't used after the first reload.
It takes yet another reload for the JDK 19 shared index to be used. (not sure what I did to get this to happen but wasn't able to reproduce)

@testforstephen
Copy link
Contributor Author

Just an oddity I've found, though it doesn't seem to have any negative side effect.

With the following Maven project, maven-my-app.zip. The important detail is that it uses a JDK (eg. 19) that is not the same as the one embedded (eg .JDK 17)

After importing the project initially, the indexes will be duplicated over into the .jdt/index location. Once the editor is reloaded, the language server switches to the shared index location for the external libraries, except for the embedded JDK 19.

file:/home/rgrunber/.jdt/index/1.132/567126731.index -> Index for /home/rgrunber/git/vscode-java/jre/17.0.4.1-linux-x86_64/lib/jrt-fs.jar

file:/home/rgrunber/.config/Code/User/workspaceStorage/d282e03119115911b164ce5d98de7d04/redhat.java/jdt_ws/.metadata/.plugins/org.eclipse.jdt.core/1023491807.index -> Index for /usr/lib/jvm/java-19-openjdk/lib/jrt-fs.jar

I'm not sure why the JDK 19 index in the shared location isn't used after the first reload. It takes yet another reload for the JDK 19 shared index to be used.

Can you reproduce this problem consistently?

I tried your sample project, and after first reload, local index for jdk 19 is removed and it is switched to the shared index automatically.

@rgrunber
Copy link
Contributor

In my workspace there are two Java runtimes. One that is used by the project itself, and one that is used by the default java project (jdt.ls-java-project/). The one used by the default project doesn't seem to get switched to use the shared index and continues to use the local index. I've tried on a fresh VS Code instance just to rule out any other settings having an impact.

Copy link
Contributor

@rgrunber rgrunber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I know what's happening. On the initial project import all projects are indexed (the Maven/Gradle project & jdt.ls-java-project. My guess is the default project gets indexed as part of its initial creation. After reloading, the default project doesn't get re-indexed at all (it isn't needed), so it's libraries are never switched over.

If the Maven/Gradle project uses a JRE different from the default project (default JRE), then the default project JRE never gets shared and remains in the local area.

I don't think this is harmful, but it just means the workspace storage won't be as small as if we re-indexed it on re-load.

Feel free to squash and merge when ready.

@testforstephen
Copy link
Contributor Author

Ok, I think I know what's happening. On the initial project import all projects are indexed (the Maven/Gradle project & jdt.ls-java-project. My guess is the default project gets indexed as part of its initial creation. After reloading, the default project doesn't get re-indexed at all (it isn't needed), so it's libraries are never switched over.

If the Maven/Gradle project uses a JRE different from the default project (default JRE), then the default project JRE never gets shared and remains in the local area.

I don't think this is harmful, but it just means the workspace storage won't be as small as if we re-indexed it on re-load.

Feel free to squash and merge when ready.

Thanks for looking into the root cause. Since it doesn't have impact on the functionality, i will merge this PR first. And track it with a new issue #2402.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants