Skip to content

Load packages concurrently in filesystem indexers#1489

Merged
mrodm merged 15 commits intoelastic:mainfrom
mrodm:concurrent-load-packages-fsys
Nov 28, 2025
Merged

Load packages concurrently in filesystem indexers#1489
mrodm merged 15 commits intoelastic:mainfrom
mrodm:concurrent-load-packages-fsys

Conversation

@mrodm
Copy link
Copy Markdown
Contributor

@mrodm mrodm commented Nov 25, 2025

This PR updates the file system indexers to load all the packages found in the given paths in parallel using a worker pool.

Doing some local tests:

  • Before (1m 30secs):
{"log.level":"info","@timestamp":"2025-11-25T12:03:26.751Z","log.origin":{"function":"github.com/elastic/package-registry/packages.(*FileSystemIndexer).getPackagesFromFileSystem","file.name":"packages/packages.go","file.line":247},"message":"Searching packages in /packages/package-storage","service.name":"package-registry","service.version":"1.33.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-11-25T12:04:59.557Z","log.origin":{"function":"github.com/elastic/package-registry/packages.(*FileSystemIndexer).getPackagesFromFileSystem","file.name":"packages/packages.go","file.line":247},"message":"Searching packages in /packages/package-storage","service.name":"package-registry","service.version":"1.33.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-11-25T12:04:59.558Z","log.origin":{"function":"main.ensurePackagesAvailable","file.name":"package-registry/main.go","file.line":619},"message":"12427 local package manifests loaded.","service.name":"package-registry","service.version":"1.33.0","ecs.version":"1.6.0"}
  • After (21secs):
{"log.level":"info","@timestamp":"2025-11-25T12:12:37.866Z","log.origin":{"function":"github.com/elastic/package-registry/packages.(*FileSystemIndexer).getPackagesFromFileSystem","file.name":"packages/packages.go","file.line":263},"message":"Searching packages in filesystem","service.name":"package-registry","service.version":"1.33.1","indexer":"ZipFileSystemIndexer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-11-25T12:12:58.315Z","log.origin":{"function":"github.com/elastic/package-registry/packages.(*FileSystemIndexer).getPackagesFromFileSystem","file.name":"packages/packages.go","file.line":336},"message":"Searching packages in filesystem done","service.name":"package-registry","service.version":"1.33.1","indexer":"ZipFileSystemIndexer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-11-25T12:12:58.369Z","log.origin":{"function":"github.com/elastic/package-registry/packages.(*FileSystemIndexer).getPackagesFromFileSystem","file.name":"packages/packages.go","file.line":263},"message":"Searching packages in filesystem","service.name":"package-registry","service.version":"1.33.1","indexer":"FileSystemIndexer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-11-25T12:12:58.419Z","log.origin":{"function":"github.com/elastic/package-registry/packages.(*FileSystemIndexer).getPackagesFromFileSystem","file.name":"packages/packages.go","file.line":336},"message":"Searching packages in filesystem done","service.name":"package-registry","service.version":"1.33.1","indexer":"FileSystemIndexer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-11-25T12:12:58.420Z","log.origin":{"function":"main.ensurePackagesAvailable","file.name":"package-registry-load-packages-fsys/main.go","file.line":619},"message":"12427 local package manifests loaded.","service.name":"package-registry","service.version":"1.33.1","ecs.version":"1.6.0"}

Author's checklist

  • Add and run benchmark to load packages in file system indexers.
  • Add changelog entry
  • Compare responses returned by the service with the current (main) logic.

How to test this PR locally

# in one terminal
# run the current production image and keep it running to be able to copy the new package-registry binary
docker run --rm --name eprprod -it docker.elastic.co/package-registry/distribution:production

# in other terminal
# build the new package-registry binary with the changes in this PR
# copy that binary into the running container and commit that image to generate a new tag

mage build
docker cp ./package-registry eprprod:/package-registry

docker commit eprprod

# this new docker image is going to run the binary from the EPR
docker tag <sha/id> docker.elastic.co/package-registry/distribution:production-new

# stop the previous container running
docker run --rm -it --name eprnew docker.elastic.co/package-registry/distribution:production-new

@mrodm mrodm self-assigned this Nov 25, 2025
@prodsecmachine
Copy link
Copy Markdown

prodsecmachine commented Nov 25, 2025

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Copy link
Copy Markdown
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for trying this, this can be a great improvement.

position int
path string
}
numWorkers := runtime.GOMAXPROCS(0) / 2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try directly to use GOMAXPROCS instead of half of them?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only used half of them because I was being conservative. But, it should not be any problem on setting GOMAXPROCS instead.

Comment on lines +265 to +268
for range numWorkers {
wg.Add(1)
go func(logger *zap.Logger, fsBuilder FileSystemBuilder) {
defer wg.Done()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, consider moving the workers pool logic to different functions for clarity. Something like this is done in https://github.com/elastic/package-registry/pull/1335/files#diff-8ea6771aea0aa89d25c6b29227dbc22db28f067930f7b9fc9921134c66c744b5R50.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code to use the same functions as your link.
I added all these methods as an internal package in 01407b7

Copy link
Copy Markdown
Contributor

@teresaromero teresaromero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like an improvement !

}
numWorkers := runtime.GOMAXPROCS(0) / 2
mu := sync.Mutex{}
packagesFound := make(map[packageKey]struct{})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why using here an empty struct? could we use a bool? and chech its found (by key) and true?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I just interested in whether or not the key is present. I think it is not needed to use boolean. An advantage of usingstruct{} here is that its size is also zero according to this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i need a 💡 for reacting , thanks! learned something today

return nil, err
}
for _, path := range packagePaths {
jobChan <- packageJob{position: count, path: path}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the paths to keep the original positions? 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to keep the same ordering that it was now, to ensure there are no changes in the list generated.

@mrodm
Copy link
Copy Markdown
Contributor Author

mrodm commented Nov 26, 2025

Some benchmarks run with a small subset of packages from testdata folder using the benchmark added in c9eff98:

  • main vs loading in parallel using runtime.GOMAXPROCS(0)/2:
goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/packages
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
        │ sequential_load_packages.txt │ parallel_load_packages_half_cpus.txt │
        │            sec/op            │    sec/op     vs base                │
Init-16                    37.12m ± 7%    15.57m ± 3%  -58.06% (p=0.000 n=10)

        │ sequential_load_packages.txt │ parallel_load_packages_half_cpus.txt  │
        │             B/op             │     B/op       vs base                │
Init-16                   9.647Mi ± 0%   11.998Mi ± 0%  +24.37% (p=0.000 n=10)

        │ sequential_load_packages.txt │ parallel_load_packages_half_cpus.txt │
        │          allocs/op           │   allocs/op    vs base               │
Init-16                    197.6k ± 0%     199.2k ± 0%  +0.84% (p=0.000 n=10)
  • main vs loading in parallel using runtime.GOMAXPROCS(0):
goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/packages
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
        │ sequential_load_packages.txt │ parallel_load_packages_all_cpus.txt  │
        │            sec/op            │    sec/op     vs base                │
Init-16                    37.12m ± 7%   18.29m ± 40%  -50.72% (p=0.000 n=10)

        │ sequential_load_packages.txt │  parallel_load_packages_all_cpus.txt  │
        │             B/op             │     B/op       vs base                │
Init-16                   9.647Mi ± 0%   12.217Mi ± 1%  +26.65% (p=0.000 n=10)

        │ sequential_load_packages.txt │ parallel_load_packages_all_cpus.txt │
        │          allocs/op           │  allocs/op    vs base               │
Init-16                    197.6k ± 0%    199.4k ± 0%  +0.90% (p=0.000 n=10)

This shows that there is an improvement in speed (secs/op), but as it must read more packages in parallel, there is also an increase in memory related to the number of CPUs used.

@mrodm mrodm force-pushed the concurrent-load-packages-fsys branch from 5128360 to 01407b7 Compare November 26, 2025 16:23
@mrodm
Copy link
Copy Markdown
Contributor Author

mrodm commented Nov 26, 2025

Looking at this error: https://buildkite.com/elastic/package-registry/builds/1528#019ac11a-cfdc-4fb0-9eeb-024454995a6b/118-1448

It happens some time when running go test -count=1 ./... -v

@mrodm
Copy link
Copy Markdown
Contributor Author

mrodm commented Nov 26, 2025

After 9241261 , I re-run the benchmark and this is the result using all the CPUs available:

goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/packages
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
        │ sequential_load_packages.txt │ parallel_load_packages_all_cpus_duplicates.txt │
        │            sec/op            │         sec/op          vs base                │
Init-16                    37.12m ± 7%              16.05m ± 1%  -56.77% (p=0.000 n=10)

        │ sequential_load_packages.txt │ parallel_load_packages_all_cpus_duplicates.txt │
        │             B/op             │          B/op           vs base                │
Init-16                   9.647Mi ± 0%            12.177Mi ± 0%  +26.23% (p=0.000 n=10)

        │ sequential_load_packages.txt │ parallel_load_packages_all_cpus_duplicates.txt │
        │          allocs/op           │        allocs/op         vs base               │
Init-16                    197.6k ± 0%               199.3k ± 0%  +0.89% (p=0.000 n=10)

It follows the same pattern as in #1489 (comment)

Comment on lines -375 to -381
if _, found := packagesFound[key]; found {
i.logger.Debug("duplicated package",
zap.String("package.name", p.Name),
zap.String("package.version", p.Version),
zap.String("package.path", p.BasePath))
continue
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This process is now moved out of this loop to ensure the same ordering as before is kept.
Considering that the same package is defined in different paths, performing the load of package concurrently could lead to load first into the array the package from the second path but not the package from the first path.

The current implementation loads all the packages in the expected position (as it was before). And then once finished this loop, duplicated packages are removed following the order (keeping the first packages defined).

if err != nil {
return nil, fmt.Errorf("loading package failed (path: %s): %w", path, err)
}
taskPool := workers.NewTaskPool(runtime.GOMAXPROCS(0))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the number of workers is hardcoded to be GOMAXPROCS number.

Should it be added some flag to be able to override this setting ? Maybe we could keep it as is for now. WDYT?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from what i've read, this default has been "checked" at 1.25 making it more flexible, so it will take whatever is able to take, even in containers (previous 1.25 it took the machine limit, now it takes the cpu limit of the container) https://go.dev/blog/container-aware-gomaxprocs

given this, i would leave it as hardcoded GOMAXPROCS to have the max possible. perhaps we could monitor and change this in the future if it takes too many resources.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although registry is still at 1.24 so this number is taking the cpu available on the machine when running inside a container 🤔 so until we upgrade maybe we have to limit as an option :D

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
Binaries are built with the version set in .go-version and currently it is 1.25.3.

However our go.mod is still pointing to 1.24.

go 1.24.0

Just in case, I've added a new flag to be able to override this value.
By default, it will be runtime.GOMAXPROCS(0).

@mrodm mrodm changed the title [PoC] Load in parallel packages from filesystem Load packages concurrently in filesystem indexers Nov 27, 2025
@mrodm mrodm marked this pull request as ready for review November 27, 2025 13:15
@mrodm mrodm requested a review from a team as a code owner November 27, 2025 13:15
return nil, err
}

// Remove duplicates while preserving filesystem discovery order.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is it needed to preserve filesystem order? It could be quite random.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be re-phrased. At least, it needs to be kept the same ordering of appearance of packages depending on which path they appear.
Those packages defined in the first path set by the user have precedence if they also are defined in the following paths defined by the user.

There is also a test that checks that too:

"title": "Multi Version Second with the same version! This one should win, because it is first.",

From this test:

for _, test := range tests {
t.Run(test.PackageName+"/"+test.PackageVersion, func(t *testing.T) {
packageEndpoint := "/package/" + test.PackageName + "/" + test.PackageVersion + "/"
fileName := filepath.Join("package", test.PackageName, test.PackageVersion, "index.json")
runEndpoint(t, packageEndpoint, packageIndexRouterPath, fileName, packageIndexHandler)
})
}

Updated comment here e9ca9a6

flag.StringVar(&proxyTo, "proxy-to", "https://epr.elastic.co/", "Proxy-to endpoint")

flag.BoolVar(&packagePathsEnableWatcher, "package-paths-enable-watcher", false, "Enable file system watcher for package paths to automatically detect new packages.")
flag.IntVar(&packagePathsWorkers, "package-paths-workers", runtime.GOMAXPROCS(0), "Number of workers to use for reading packages concurrently from the configured paths. Default is the number of CPU cores returned by GOMAXPROCS.")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested using flag and environment variable

./package-registry -log-level debug -package-paths-workers=5
EPR_PACKAGE_PATHS_WORKERS=2 ./package-registry -log-level debug

@mrodm mrodm requested a review from jsoriano November 27, 2025 19:26
@mrodm mrodm force-pushed the concurrent-load-packages-fsys branch from d182ee8 to fcb290e Compare November 27, 2025 19:32
Comment on lines +407 to +408
logger.Debug("Using workers to read packages from package paths", zap.Int("workers", fsOptions.PathsWorkers))
logger.Debug("Watching package paths for changes", zap.Bool("enabled", fsOptions.EnablePathsWatcher))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to show in debug level the values used by the service. WDYT ?

Examples of logs shown:

{
  "log.level": "debug",
  "@timestamp": "2025-11-27T20:17:56.815+0100",
  "log.origin": {
    "function": "main.initIndexer",
    "file.name": "package-registry-load-packages-fsys/main.go",
    "file.line": 407
  },
  "message": "Using workers to read packages from package paths",
  "service.name": "package-registry",
  "service.version": "1.33.1",
  "workers": 16,
  "ecs.version": "1.6.0"
}
{
  "log.level": "debug",
  "@timestamp": "2025-11-27T20:17:56.815+0100",
  "log.origin": {
    "function": "main.initIndexer",
    "file.name": "package-registry-load-packages-fsys/main.go",
    "file.line": 408
  },
  "message": "Wathching package paths for changes",
  "service.name": "package-registry",
  "service.version": "1.33.1",
  "enabled": false,
  "ecs.version": "1.6.0"
}

@elasticmachine
Copy link
Copy Markdown

💚 Build Succeeded

History

cc @mrodm

@mrodm mrodm merged commit d659d7f into elastic:main Nov 28, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants