
Optimize RepoMappingManifestAction #20091

Closed
wants to merge 2 commits into from

Conversation

fmeum
Collaborator

@fmeum fmeum commented Nov 8, 2023

The following optimizations reduce the time spent writing the repo mapping manifest from >6s to <80ms for a test referencing each of ~4,000 repos created by a module extension:

  • The set of repository names appearing in runfiles paths is only constructed once rather than for each repo by moving the `build()` call out of a closure.
  • The relevant entries per repository are now sorted after filtering out those for repos that don't contribute runfiles.
  • Crucially, the relevant mapping entries per repository are now cached per instance of `RepositoryMapping#entries()`. Since extension repos all share the same instance due to interning, this reduces the complexity from quadratic to linear in the number of extension repos.
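As an illustration only, the three changes above can be sketched in Java. Everything in this sketch is hypothetical: the class `RepoMappingManifestSketch`, the method `manifestLines`, and the plain `Map<String, String>` standing in for `RepositoryMapping#entries()` are assumptions, and manifest lines are assumed to be comma-separated `source,apparent,target` triples. The sketch builds the set of runfiles repos once, filters each mapping before sorting it, and memoizes the result per mapping instance with an `IdentityHashMap`, so repos that share one interned mapping pay for the filter and sort only once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the three optimizations; names and types are illustrative
 * and do not mirror Bazel's actual RepoMappingManifestAction internals.
 */
final class RepoMappingManifestSketch {
  /** Counts how often the filter+sort step actually runs (for illustration only). */
  static int computations = 0;

  /**
   * @param runfilesRepoNames canonical repo names that appear in runfiles paths
   * @param mappings for each canonical source repo, its apparent-name -> canonical-name map;
   *     repos created by the same extension may share a single (interned) map instance
   * @return manifest lines of the assumed form "source,apparent,target"
   */
  static List<String> manifestLines(
      List<String> runfilesRepoNames, Map<String, Map<String, String>> mappings) {
    // (1) Build the set once, outside the per-repo loop, instead of once per repo.
    Set<String> contributing = new HashSet<>(runfilesRepoNames);

    // (3) Cache keyed by object identity: repos sharing one mapping instance reuse the result.
    Map<Map<String, String>, List<Map.Entry<String, String>>> cache = new IdentityHashMap<>();

    List<String> lines = new ArrayList<>();
    // Visit source repos in sorted order so the output is deterministic.
    for (Map.Entry<String, Map<String, String>> repo : new TreeMap<>(mappings).entrySet()) {
      List<Map.Entry<String, String>> relevant =
          cache.computeIfAbsent(repo.getValue(), entries -> {
            computations++;
            // (2) Filter first, then sort only the entries that survive the filter.
            List<Map.Entry<String, String>> kept = new ArrayList<>();
            for (Map.Entry<String, String> e : entries.entrySet()) {
              if (contributing.contains(e.getValue())) {
                kept.add(e);
              }
            }
            kept.sort(Map.Entry.comparingByKey());
            return kept;
          });
      for (Map.Entry<String, String> e : relevant) {
        lines.add(repo.getKey() + "," + e.getKey() + "," + e.getValue());
      }
    }
    return lines;
  }
}
```

Keying on identity rather than `equals()` is what keeps lookups cheap: hashing a mapping with thousands of entries by value on every lookup would reintroduce much of the cost the cache is meant to avoid. The cache is local to one call because its contents also depend on the set of contributing repos.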

@fmeum fmeum marked this pull request as ready for review November 8, 2023 08:51
@fmeum fmeum requested a review from a team as a code owner November 8, 2023 08:51
@fmeum fmeum requested review from sdtwigg and Wyverald and removed request for a team and sdtwigg November 8, 2023 08:51
@github-actions github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Configurability platforms, toolchains, cquery, select(), config transitions labels Nov 8, 2023
@fmeum
Collaborator Author

fmeum commented Nov 8, 2023

It turns out that the action itself is really fast now, but the action key computation is still slow. I will see whether that can be fixed without memory overhead.

@fmeum
Collaborator Author

fmeum commented Nov 8, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Nov 8, 2023
@fmeum
Collaborator Author

fmeum commented Nov 8, 2023

I pushed a commit that also caches the key computation. The first computation is actually still a bit slower than running the action, but subsequent evaluations are much faster (~1-2ms), leveraging both the nested set fingerprint cache as well as the new repo mapping fingerprint cache.
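The same per-instance idea can be applied to the key computation. The following is a hypothetical sketch, not Bazel's actual fingerprinting code: `MappingFingerprintCache` and `fingerprint` are illustrative names, and the sketch assumes a mapping is never mutated after it is first fingerprinted. It digests a mapping's entries with SHA-256 at most once per mapping instance and returns the cached bytes on later calls.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a per-instance fingerprint cache for repo mappings. */
final class MappingFingerprintCache {
  // Keyed by object identity: interned mappings shared by many repos are hashed once.
  private final Map<Map<String, String>, byte[]> cache = new IdentityHashMap<>();

  /** Number of cache misses, i.e. how often a digest was actually computed. */
  int misses = 0;

  /** Returns a SHA-256 digest of the mapping's entries, computed at most once per instance. */
  synchronized byte[] fingerprint(Map<String, String> mapping) {
    byte[] cached = cache.computeIfAbsent(mapping, m -> {
      misses++;
      try {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        // Hash entries in sorted order so equal mappings give equal digests
        // regardless of the underlying map's iteration order.
        for (Map.Entry<String, String> e : new TreeMap<>(m).entrySet()) {
          digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
          digest.update((byte) 0); // separator so "ab"+"c" differs from "a"+"bc"
          digest.update(e.getValue().getBytes(StandardCharsets.UTF_8));
          digest.update((byte) 0);
        }
        return digest.digest();
      } catch (NoSuchAlgorithmException ex) {
        throw new IllegalStateException("SHA-256 is required on every Java platform", ex);
      }
    });
    // Return a copy so callers cannot corrupt the cached digest.
    return cached.clone();
  }
}
```

Because this cache holds strong references, it keeps every fingerprinted mapping alive. Given the memory concern raised earlier in the thread, a real implementation would more likely use weak keys (for example, Caffeine's `weakKeys()`, which also compares keys by identity).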

Here is a profile of the reproduction repo (https://github.com/DavidZbarsky-at/repo-mapping-manifest-repro) with `//:slow_test` duplicated 20 times:
after.json.gz

@keertk
Member

keertk commented Nov 8, 2023

@bazel-io fork 7.0.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Nov 8, 2023
Member

@Wyverald Wyverald left a comment

awesome!!

@Wyverald Wyverald added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Nov 8, 2023
@fmeum fmeum requested a review from Wyverald November 8, 2023 20:08
@fmeum fmeum force-pushed the faster-repo-mapping-manifest branch from 7684474 to c6e3efb November 8, 2023 20:08
@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Nov 10, 2023
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Nov 10, 2023

Closes bazelbuild#20091.

PiperOrigin-RevId: 581128978
Change-Id: I946e7788b8538e84714cf25ece89a86edd0d6948
@fmeum fmeum deleted the faster-repo-mapping-manifest branch November 10, 2023 08:11
keertk pushed a commit that referenced this pull request Nov 10, 2023

Closes #20091.

Commit 305ab3b

PiperOrigin-RevId: 581128978
Change-Id: I946e7788b8538e84714cf25ece89a86edd0d6948

Co-authored-by: Fabian Meumertzheim <[email protected]>
@iancha1992
Member

The changes in this PR have been included in Bazel 7.0.0 RC5. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting `USE_BAZEL_VERSION=last_rc`.
Thanks!
