-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(cargo-package): match certain path prefix with pathspec #14997
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
`check_repo_state` checks the entire git repo status. This is usually fine if you have only a few packages in a workspace. For huge monorepos, it may hit performance issues. For example, on awslabs/aws-sdk-rust@2cbd34d the workspace has roughly 434 members to publish. `git ls-files` reported us 204379 files in this Git repository. That means git may need to check status of all files 434 times. That would be `204379 * 434 = 88,700,486` checks! Moreover, the current algorithm is finding the intersection of `PathSource::list_files` and `git status`. It is an `O(n^2)` check. Let's assume files are evenly distributed into each package, so roughly 470 files per package. If we're unlucky to have some dirty files, say 100 files. We will have to do `470 * 100 = 47,000` times of path comparisons. Even worse, because we `git status` everything in the repo, we'll have to it for all members, even when those dirty files are not part of the current package in question. So it becomes `470 * 100 * 434 = 20,398,000`! Instead of comparing with the status of the entire repository, this patch use the magic pathspec[1] to tell git only reports paths that match a certain path prefix. This wouldn't help the `O(n^2)` algorithm, but at least it won't check dirty files outside the current package. Also, we don't `git status` against entire git worktree/index anymore. [1]: https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec
rustbot
added
A-git
Area: anything dealing with git
Command-package
S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
labels
Jan 1, 2025
epage
reviewed
Jan 13, 2025
epage
reviewed
Jan 13, 2025
epage
approved these changes
Jan 13, 2025
bors
added a commit
to rust-lang-ci/rust
that referenced
this pull request
Jan 18, 2025
Update cargo 13 commits in 088d496082726091024f1689c124a0c3dccbd775..045bf21b36a2e1f3ed85e38278d1c3cc4305e134 2025-01-10 20:10:21 +0000 to 2025-01-17 14:59:36 +0000 - created a function for user defined aliases (rust-lang/cargo#15076) - took the functionality of the third party subcommand from the list_commands function (rust-lang/cargo#15075) - fix: wrong concat and field name (rust-lang/cargo#15074) - fix(publish): Report all unpublishable packages (rust-lang/cargo#15070) - docs(cargo-clippy): correct typo (rust-lang/cargo#15072) - docs(cargo-package): alwasy include the lockfile (rust-lang/cargo#15067) - docs(ref): Deprecate 'package.authors' (rust-lang/cargo#15068) - fix(build-std): parse as comma-separated list (rust-lang/cargo#15065) - Fix benchsuite issue with newer versions of git (rust-lang/cargo#15069) - Document that cargo automatically registers variables used in env! macro to trigger rebuilds (rust-lang/cargo#15062) - perf(cargo-package): match certain path prefix with pathspec (rust-lang/cargo#14997) - Clarify note in example (rust-lang/cargo#15054) - chore(deps): update msrv (3 versions) to v1.82 (rust-lang/cargo#15050)
bors
added a commit
to rust-lang-ci/rust
that referenced
this pull request
Jan 18, 2025
Update cargo 13 commits in 088d496082726091024f1689c124a0c3dccbd775..045bf21b36a2e1f3ed85e38278d1c3cc4305e134 2025-01-10 20:10:21 +0000 to 2025-01-17 14:59:36 +0000 - created a function for user defined aliases (rust-lang/cargo#15076) - took the functionality of the third party subcommand from the list_commands function (rust-lang/cargo#15075) - fix: wrong concat and field name (rust-lang/cargo#15074) - fix(publish): Report all unpublishable packages (rust-lang/cargo#15070) - docs(cargo-clippy): correct typo (rust-lang/cargo#15072) - docs(cargo-package): alwasy include the lockfile (rust-lang/cargo#15067) - docs(ref): Deprecate 'package.authors' (rust-lang/cargo#15068) - fix(build-std): parse as comma-separated list (rust-lang/cargo#15065) - Fix benchsuite issue with newer versions of git (rust-lang/cargo#15069) - Document that cargo automatically registers variables used in env! macro to trigger rebuilds (rust-lang/cargo#15062) - perf(cargo-package): match certain path prefix with pathspec (rust-lang/cargo#14997) - Clarify note in example (rust-lang/cargo#15054) - chore(deps): update msrv (3 versions) to v1.82 (rust-lang/cargo#15050)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-git
Area: anything dealing with git
Command-package
S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR try to resolve?
This revives #14962. See benchmark chart in #14962 (comment).
#14962 was closed because we found more bugs in
cargo package
, and #14962 could potentially make them even harder to fix. Two of them have been fixed so this is good to ship IMO with its own good.An improvement #14955.
check_repo_state
checks the entire git repo status. This is usually fine if you have only a few packages in a workspace.For huge monorepos, it may hit performance issues.
For example,
on awslabs/aws-sdk-rust@2cbd34d
the workspace has roughly 434 members to publish.
git ls-files
reported us 204379 files in this Git repository. That means git may need to check status of all files 434 times. That would be204379 * 434 = 88,700,486
checks!Moreover, the current algorithm is finding the intersection of
PathSource::list_files
andgit status
.It is an
O(n^2)
check.Let's assume files are evenly distributed into each package, so roughly 470 files per package.
If we're unlucky to have some dirty files, say 100 files. We will have to do
470 * 100 = 47,000
times of path comparisons.Even worse, because we
git status
everything in the repo, we'll have to it for all members,even when those dirty files are not part of the current package in question. So it becomes
470 * 100 * 434 = 20,398,000
!Solution
Instead of comparing with the status of the entire repository, this patch use the magic pathspec1 to tell git only reports paths that match a certain path prefix.
This wouldn't help the
O(n^2)
algorithm,but at least it won't check dirty files outside the current package. Also, we don't
git status
against entire git worktree/index anymore.How should we test and review this PR?
Run this command against awslabs/aws-sdk-rust@2cbd34d,
and see if it is getting better.
I've verified checksums of
.crate
files generated from master (d85d761) and this commit (3dabdcd). They are the same.Additional information
There are some other alternatives, like making
PathSource::list_files
additionally reports dirty files. While we already have rooms to do it, this approach should be the most straightforward one at this moment.Some other approaches like
--no-vcs
to skip vcs at allO(n^2)
algorithm