
Lazy-load embedded provider schemas to improve performance #990

Closed
radeksimko opened this issue Jul 6, 2022 · 2 comments · Fixed by #1071


radeksimko commented Jul 6, 2022

Background

As documented in https://github.com/hashicorp/terraform-ls/blob/main/docs/benchmarks.md#memory-usage, we currently load one big JSON blob with all ~200 provider schemas into memory, which makes up the majority of the ~300MB of memory usage.

log.Println("creating schemas file")
schemasFile, err := os.Create(filepath.Join("data", "schemas.json"))
if err != nil {
return err
}
log.Println("writing schemas to file")
err = json.NewEncoder(schemasFile).Encode(ps)
if err != nil {
return err
}
log.Println("generating embedded go file")
return vfsgen.Generate(fs, vfsgen.Options{
Filename: "schemas_gen.go",
PackageName: "schemas",
VariableName: "files",
})

Aside from memory usage, it also impacts the initialize response time, i.e. the time it takes for the server to become ready for other requests. Loading the schemas currently accounts for 90+% of the ~2 seconds.

```go
err = schemas.PreloadSchemasToStore(svc.stateStore.ProviderSchemas)
if err != nil {
	return err
}
```

Related:

Proposal

There's no exact proposal yet, but there are early ideas we have discussed within the team:

  1. breaking the single JSON blob down into one file per provider
  2. lazily loading a schema into memdb, i.e. loading it only if/when we parse the relevant provider constraint
  3. distributing the JSON files alongside the LS binary, as opposed to embedding them into it

Each approach comes with different trade-offs, which may need to be considered alongside the issues referenced above; a rough sketch of ideas 1 and 2 follows below.
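To make ideas 1 and 2 concrete, here is a minimal sketch, assuming per-provider JSON files embedded under a data/ directory. The file layout, the ProviderSchema type, and the loadSchema helper are all hypothetical, not the actual terraform-ls API:

```go
package schemas

import (
	"embed"
	"encoding/json"
	"fmt"
	"path"
)

// Hypothetical layout: one file per provider, e.g.
// data/registry.terraform.io/hashicorp/aws.json
//go:embed data
var files embed.FS

// ProviderSchema is a placeholder for the decoded schema representation.
type ProviderSchema map[string]any

// loadSchema reads and decodes a single provider's schema on demand,
// e.g. only once the relevant provider constraint has been parsed.
func loadSchema(providerAddr string) (ProviderSchema, error) {
	b, err := files.ReadFile(path.Join("data", providerAddr+".json"))
	if err != nil {
		return nil, fmt.Errorf("no embedded schema for %q: %w", providerAddr, err)
	}
	var ps ProviderSchema
	if err := json.Unmarshal(b, &ps); err != nil {
		return nil, err
	}
	return ps, nil
}
```

Since embed only allocates a file's contents into memory when it is actually read (as confirmed in the comment below), such a layout would defer both the allocation and the JSON parsing until a matching provider is actually in use.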

radeksimko added the enhancement (New feature or request) label Jul 6, 2022
radeksimko changed the title Improve memory usage by lazy-loading embedded provider schemas → Lazy-load embedded provider schemas to improve performance Aug 8, 2022
radeksimko self-assigned this Aug 11, 2022

radeksimko commented Sep 12, 2022

Based on various sources and experimentation, it appears that:

  • any files embedded via embed are only allocated into memory when actually used/needed
  • the memory allocated as a result of this does get garbage collected

In my experiment, I tried loading a ~70MiB JSON file with schemas into memdb and then deleting it. This resulted in an allocation of 198MiB, which dropped back to zero after garbage collection. The garbage collector ran automatically ~2 minutes after deletion, but I was also able to force it via runtime.GC().
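For reference, a rough sketch of how this allocation behaviour can be observed; the 70 MiB byte slice is a stand-in for the decoded schema data and readHeap is an illustrative helper:

```go
package main

import (
	"fmt"
	"runtime"
)

// readHeap prints the current heap allocation in MiB.
func readHeap(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapAlloc = %d MiB\n", label, m.HeapAlloc/1024/1024)
}

func main() {
	readHeap("before load")

	// Stand-in for loading ~70MiB of schema JSON into memory.
	data := make([]byte, 70<<20)
	data[0] = 1
	readHeap("after load")

	// Drop the only reference, then force a collection rather than
	// waiting (~2 minutes in the experiment above) for the GC to run.
	data = nil
	runtime.GC()
	readHeap("after GC")
}
```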

Crucially, the memory usage/profile remains the same regardless of whether the file is loaded from disk via os or from the embed FS. Therefore we shouldn't need to distribute separate JSON files alongside the binary, which means we don't have to worry about distribution problems in various packaging systems, such as Homebrew, or APT/DNF on Linux.

Furthermore, I can confirm that breaking down the JSON would have a positive impact on peak memory usage. When I tried to store just one small part of the ~70MiB JSON into memdb, the whole JSON had to be parsed and allocated first. That memory was later garbage collected, but loading just the necessary data from a smaller JSON file means less pressure on the GC, lower peak memory usage, and likely also less CPU time spent repeatedly parsing JSON data.
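To illustrate why the single blob is costly: even deferring decoding of individual entries via json.RawMessage (a sketch with an invented blob structure below) still requires reading and holding the raw bytes of the whole file, which per-provider files would avoid:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Invented stand-in for the embedded schemas blob.
	blob := []byte(`{
		"registry.terraform.io/hashicorp/aws":    {"version": "4.0.0"},
		"registry.terraform.io/hashicorp/google": {"version": "4.0.0"}
	}`)

	// The outer object is parsed, but each provider's schema is kept
	// as raw, undecoded bytes.
	var schemas map[string]json.RawMessage
	if err := json.Unmarshal(blob, &schemas); err != nil {
		panic(err)
	}

	// Fully decode only the provider we actually need.
	var aws struct {
		Version string `json:"version"`
	}
	if err := json.Unmarshal(schemas["registry.terraform.io/hashicorp/aws"], &aws); err != nil {
		panic(err)
	}
	fmt.Println(aws.Version) // 4.0.0
}
```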


github-actions bot commented Nov 4, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 4, 2022