Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading of a git source is slow due to always checking if the submodule is updated #14603

Closed
epage opened this issue Sep 26, 2024 · 5 comments · Fixed by #14605
Closed

Loading of a git source is slow due to always checking if the submodule is updated #14603

epage opened this issue Sep 26, 2024 · 5 comments · Fixed by #14605
Labels
A-git Area: anything dealing with git C-enhancement Category: enhancement Performance Gotta go fast! S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review

Comments

@epage
Copy link
Contributor

epage commented Sep 26, 2024

When you have some git dependencies or patches, loading them takes a long time

For the traces from #14238, notice how slow the first resolve is compared to the second. A lot of the time is taken up checking each git dependencies submodules
image

Note: this slow down is most relevant for tooling like rust-analyzer or rust-defined completions when it may be called in interactive contexts without any compilation happening or "cargo script" where cargo build is performed every time you run your script.

Related: #14395

@epage epage added A-git Area: anything dealing with git C-enhancement Category: enhancement Performance Gotta go fast! S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review labels Sep 26, 2024
@epage
Copy link
Contributor Author

epage commented Sep 26, 2024

@epage
Copy link
Contributor Author

epage commented Sep 26, 2024

When we GitDatabase::copy_into a checkout, we skip it if the checkout is "fresh"

// If the existing checkout exists, and it is fresh, use it.
// A non-fresh checkout can happen if the checkout operation was
// interrupted. In that case, the checkout gets deleted and a new
// clone is created.
let checkout = match git2::Repository::open(dest)
.ok()
.map(|repo| GitCheckout::new(self, rev, repo))
.filter(|co| co.is_fresh())
{
Some(co) => co,
None => GitCheckout::clone_into(dest, self, rev, gctx)?,
};

We then unconditionally check for submodules
checkout.update_submodules(gctx)?;

submoduels should be as immutable as the checkout and we should be able to guard the whole process with the freshness check.

The main risk in doing that is if you switch back and forth between old cargo and Cargo with the submodules covered by the freshness check and the submodule update is interrupted. New cargo will see that the freshness check passxes and assumes the submodule is valid and goes on its way. We could do extra fancy things with the file used for this like in #14395 but it doesn't seem worth it.

@epage
Copy link
Contributor Author

epage commented Sep 26, 2024

CC @osiewicz

@osiewicz
Copy link
Contributor

osiewicz commented Sep 27, 2024

It seems like old cargo will "heal" itself from the corrupted checkout, as it'll remove it and refetch everything. Given that, I would agree that we could just proceed as is? Alternatively, could we bundle submodule freshness check with the checkout freshness check?

@epage
Copy link
Contributor Author

epage commented Sep 27, 2024

It seems like old cargo will "heal" itself from the corrupted checkout, as it'll remove it and refetch everything. Given that, I would agree that we could just proceed as is? Alternatively, could we bundle submodule freshness check with the checkout freshness check?

The corner case is

  1. With old cargo
  2. See lack of freshness file
  3. Fetch commit
  4. Checkout commit
  5. Write freshness file
  6. Perform submodule update and get interrupted
  7. With new cargo
  8. See freshness file and assume commit is fetched, checked out, and submodules are updated, bypassing all other steps

This kind of quick back and forth is likely when working directly with Cargo and an IDE running Cargo in the background.

That said, I think we would be fine with this. This will be limited to a specific set of cargo versions run in a specific order, with an interruption. Over time, the combination of versions will become less likely.

If we did something else about this, some potential mitigations

  • For a transition period, always do a submodule update but write the file after it
  • Store in the file whether the submodule update happened
  • Use a second file this would cause problems with old Cargo's reading the check out iiuc

osiewicz added a commit to osiewicz/cargo that referenced this issue Nov 11, 2024
epage added a commit to osiewicz/cargo that referenced this issue Nov 11, 2024
github-merge-queue bot pushed a commit that referenced this issue Nov 11, 2024
Fixes #14603

### What does this PR try to resolve?

As is, we unconditionally validate freshness of the submodules of a
checkout, even though we could assume that a fresh checkout has to have
up-to-date submodules as well.

### How should we test and review this PR?

N/A

### Additional information

N/A
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-git Area: anything dealing with git C-enhancement Category: enhancement Performance Gotta go fast! S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants