
Lazy-load embedded provider schemas to improve performance #990

Closed
radeksimko opened this issue Jul 6, 2022 · 2 comments · Fixed by #1071


radeksimko commented Jul 6, 2022

Background

As documented in https://github.com/hashicorp/terraform-ls/blob/main/docs/benchmarks.md#memory-usage, we currently load one big JSON blob with all ~200 provider schemas into memory, which makes up the majority of the ~300MB of memory usage.

log.Println("creating schemas file")
schemasFile, err := os.Create(filepath.Join("data", "schemas.json"))
if err != nil {
return err
}
log.Println("writing schemas to file")
err = json.NewEncoder(schemasFile).Encode(ps)
if err != nil {
return err
}
log.Println("generating embedded go file")
return vfsgen.Generate(fs, vfsgen.Options{
Filename: "schemas_gen.go",
PackageName: "schemas",
VariableName: "files",
})

Aside from memory usage, it also impacts the initialize response time, i.e. the time it takes for the server to become ready for other requests. Loading the schemas currently accounts for 90+% of the ~2 seconds.

```go
err = schemas.PreloadSchemasToStore(svc.stateStore.ProviderSchemas)
if err != nil {
	return err
}
```

Related:

Proposal

There's no exact proposal yet, but there are early ideas we have discussed within the team:

  1. breaking the single JSON blob down into one file per provider
  2. lazily loading a schema into memdb, i.e. loading it only if/when we parse the relevant provider constraint
  3. distributing the JSON files alongside the LS binary, as opposed to embedding them into it

Each approach comes with different trade-offs, which may need to be considered alongside the issues referenced above; a rough sketch of ideas 1 and 2 follows below.
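To make ideas 1 and 2 concrete, here is a minimal sketch, assuming per-provider JSON files embedded under a data/ directory. The file layout, the ProviderSchema type, and the loadSchema helper are all hypothetical, not the actual terraform-ls API:

```go
package schemas

import (
	"embed"
	"encoding/json"
	"fmt"
	"path"
)

// Hypothetical layout: one file per provider, e.g.
// data/registry.terraform.io/hashicorp/aws.json
//go:embed data
var files embed.FS

// ProviderSchema is a placeholder for the decoded schema representation.
type ProviderSchema map[string]any

// loadSchema reads and decodes a single provider's schema on demand,
// e.g. only once the relevant provider constraint has been parsed.
func loadSchema(providerAddr string) (ProviderSchema, error) {
	b, err := files.ReadFile(path.Join("data", providerAddr+".json"))
	if err != nil {
		return nil, fmt.Errorf("no embedded schema for %q: %w", providerAddr, err)
	}
	var ps ProviderSchema
	if err := json.Unmarshal(b, &ps); err != nil {
		return nil, err
	}
	return ps, nil
}
```

Since embed only allocates a file's contents into memory when it is actually read (as confirmed in the comment below), such a layout would defer both the allocation and the JSON parsing until a matching provider is actually in use.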

radeksimko added the enhancement (New feature or request) label Jul 6, 2022
radeksimko changed the title Improve memory usage by lazy-loading embedded provider schemas → Lazy-load embedded provider schemas to improve performance Aug 8, 2022
radeksimko self-assigned this Aug 11, 2022

radeksimko commented Sep 12, 2022

Based on various sources and experimentation, it appears that:

  • any files embedded via embed are only allocated into memory when actually used/needed
  • the memory allocated as a result of this does get garbage collected

In my experiment, I tried loading a ~70MiB JSON file with schemas into memdb and then deleting it. This resulted in an allocation of 198MiB, which dropped back to zero after garbage collection. The garbage collector ran automatically ~2 minutes after deletion, but I was also able to force it via runtime.GC().
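For reference, a rough sketch of how this allocation behaviour can be observed; the 70 MiB byte slice is a stand-in for the decoded schema data and readHeap is an illustrative helper:

```go
package main

import (
	"fmt"
	"runtime"
)

// readHeap prints the current heap allocation in MiB.
func readHeap(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapAlloc = %d MiB\n", label, m.HeapAlloc/1024/1024)
}

func main() {
	readHeap("before load")

	// Stand-in for loading ~70MiB of schema JSON into memory.
	data := make([]byte, 70<<20)
	data[0] = 1
	readHeap("after load")

	// Drop the only reference, then force a collection rather than
	// waiting (~2 minutes in the experiment above) for the GC to run.
	data = nil
	runtime.GC()
	readHeap("after GC")
}
```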

Crucially, the memory usage/profile remains the same regardless of whether the file is loaded from disk via os or from the embed FS. Therefore we shouldn't need to distribute separate JSON files alongside the binary, which means we don't have to worry about distribution problems in various packaging systems, such as Homebrew, or APT/DNF on Linux.

Furthermore, I can confirm that breaking down the JSON would have a positive impact on peak memory usage. When I tried to store just one small part of the ~70MiB JSON into memdb, the whole JSON had to be parsed and allocated first. That memory was later garbage collected, but loading just the necessary data from a smaller JSON file means less pressure on the GC, lower peak memory usage, and likely also less CPU time spent repeatedly parsing JSON data.
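To illustrate why the single blob is costly: even deferring decoding of individual entries via json.RawMessage (a sketch with an invented blob structure below) still requires reading and holding the raw bytes of the whole file, which per-provider files would avoid:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Invented stand-in for the embedded schemas blob.
	blob := []byte(`{
		"registry.terraform.io/hashicorp/aws":    {"version": "4.0.0"},
		"registry.terraform.io/hashicorp/google": {"version": "4.0.0"}
	}`)

	// The outer object is parsed, but each provider's schema is kept
	// as raw, undecoded bytes.
	var schemas map[string]json.RawMessage
	if err := json.Unmarshal(blob, &schemas); err != nil {
		panic(err)
	}

	// Fully decode only the provider we actually need.
	var aws struct {
		Version string `json:"version"`
	}
	if err := json.Unmarshal(schemas["registry.terraform.io/hashicorp/aws"], &aws); err != nil {
		panic(err)
	}
	fmt.Println(aws.Version) // 4.0.0
}
```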


github-actions bot commented Nov 4, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 4, 2022