Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve PreloadEmbeddedSchema performance #1369

Merged
merged 3 commits into from
Aug 30, 2023

Conversation

dbanck
Copy link
Member

@dbanck dbanck commented Aug 30, 2023

Background

In #1071, we have changed the way we store and load the bundled schemas to reduce our memory footprint. We released the work with version 0.29.3. User reports indicate that in previous versions, e.g. 0.29.2, the observed performance was better.

The Problem

TL:DR we read a schema file from disk every time, even if we have already loaded the file

We schedule the PreloadEmbeddedSchema job when decoding a module (textDocument/didOpen, textDocument/didChange) and from the indexer when walking the workspace files.

The job reads the required providers from a module and checks which schemas are still missing in our internal memory. We use the list of missing schemas to load schemas from disk. After loading a schema from disk, we store it with the address (e.g. registry.terraform.io/hashicorp/aws) and the specific version (found in the schema path) (e.g. 4.33.0) in memory. The next time we come across the same provider in another module, we would reuse the stored schema and not load it again. At least in theory.

Provider requirements usually contain a source and a version (it is a good practice to always include a version). When checking for missing schemas, we were using the unique combination of both to check if a schema already exists in memory. The schemas we bundle with the language server are only the most recent versions of all official and partner providers, so a mismatch between those versions is likely.

A version mismatch triggers a read of the schema from disk (if the schema is part of our bundle) and we try to store it in memory (again). Storing fails with schema for %s is already loaded, because we already store the specific combination of address and version of the schema file from disk.

The Fix

When checking for missing schemas, we now only compare the address and use a version constraint that matches any version (> 0.0.0). We also account for legacy sources and the builtin Terraform provider.

Results

0.29.2 CPU & Memory after launch
cpu_0_29_2
memory_0_29_2

>= 0.29.3 CPU & Memory after launch
cpu_0_29_3
memory_0_29_3

This PR CPU & Memory after launch
cpu_fix
memory_fix

As a result of this fix, we're back to 0.29.2 levels of CPU usage while maintaining the memory improvements of 0.29.3. This also reduces the work & IO we do on each change event.

@dbanck dbanck requested a review from a team as a code owner August 30, 2023 11:41
@dbanck dbanck self-assigned this Aug 30, 2023
@dbanck dbanck added bug Something isn't working performance Gotta go fast labels Aug 30, 2023
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 30, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working performance Gotta go fast
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants