Reuse existing GIT checkouts if an update happens #4373

Closed
sdroege opened this issue Aug 7, 2017 · 5 comments
Labels
A-git (Area: anything dealing with git)

Comments

Contributor

sdroege commented Aug 7, 2017

Currently cargo seems to do a full clone again whenever a Git repository has been updated, wasting quite a bit of bandwidth. It would be good if it could pass --reference to git, pointing at the previous version.

This is currently a bit problematic if you're on mobile data :)
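
For illustration, the request above corresponds to command-line git's `git clone --reference <local repo> <url> <dest>`, which borrows objects from a repository already on disk instead of downloading them again. The sketch below only builds that command line; the paths are hypothetical, and it says nothing about how cargo itself fetches repositories (later comments note that cargo's git handling goes through libgit2, not the git CLI).

```python
from pathlib import Path


def clone_with_reference_cmd(url: str, reference: Path, dest: Path) -> list[str]:
    """Build a `git clone` command that reuses objects from a local repository."""
    return [
        "git", "clone",
        "--reference", str(reference),  # existing local clone to borrow objects from
        url,
        str(dest),
    ]


if __name__ == "__main__":
    cmd = clone_with_reference_cmd(
        "https://github.com/gtk-rs/gtk",
        Path("old-checkout"),  # hypothetical path to an earlier clone
        Path("new-checkout"),  # hypothetical destination directory
    )
    # Print the command rather than running it; pass `cmd` to
    # subprocess.run(cmd, check=True) to actually perform the clone.
    print(" ".join(cmd))
```

A clone made this way depends on the reference repository staying in place unless `--dissociate` is also passed.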

alexcrichton added the A-git (Area: anything dealing with git) label Aug 7, 2017
Contributor

ehuss commented Mar 23, 2020

I'm not sure what the context of this issue is (even in 2017). Git dependencies are fetched incrementally whenever they are updated.

Perhaps this issue was related to git submodules, which do re-clone each time. There is a more detailed issue tracking that in #7987, so I'm going to close this. If there is something else, feel free to re-open with more detail or file a new issue.

ehuss closed this as completed Mar 23, 2020
Contributor Author

sdroege commented Mar 23, 2020

I think this is still relevant, or something else is going on. When running cargo update on a crate that has git dependencies, it's very fast if the repo / target commit did not change. Otherwise it seems to clone the whole repo again: even if there were only small changes, it takes quite a while for the whole dependency to be updated.

Contributor

ehuss commented Mar 23, 2020

@sdroege is it a public repo that I can look at? Do you know roughly how large it is, and how much network traffic cargo update is using? How long does it typically take to update? Are there multiple copies of the repo in ~/.cargo/git/db/?

Can you run with the CARGO_LOG=cargo::sources::git::utils=trace environment variable set? There isn't much logging, but there are a few messages there that might illuminate what is going on.

It should do an incremental update, but perhaps there is something broken with how it is interacting with the repo? I'm also curious what kind of remote server it is.
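
A minimal sketch of collecting the diagnostics requested above: it runs `cargo update` with the `CARGO_LOG` filter from the comment and reports how long the update took. The helper names are hypothetical, and the sketch assumes that `cargo` is on `PATH` and that the log messages are written to stderr.

```python
import os
import subprocess
import time

# Log filter quoted in the comment above.
TRACE_FILTER = "cargo::sources::git::utils=trace"


def trace_env(base=None):
    """Return a copy of the environment with CARGO_LOG set to the trace filter."""
    env = dict(os.environ if base is None else base)
    env["CARGO_LOG"] = TRACE_FILTER
    return env


def run_traced_update(project_dir):
    """Run `cargo update` in project_dir; return (elapsed seconds, captured stderr)."""
    start = time.monotonic()
    proc = subprocess.run(
        ["cargo", "update"],
        cwd=project_dir,
        env=trace_env(),
        capture_output=True,
        text=True,
    )
    return time.monotonic() - start, proc.stderr


if __name__ == "__main__":
    elapsed, log = run_traced_update(".")
    print(f"cargo update took {elapsed:.1f}s")
    print(log)
```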

Contributor Author

sdroege commented Mar 23, 2020

> @sdroege is it a public repo that I can look at?

Just a short reply for now, will get back with more details later once I have more time. But a good example would be the gtk-rs examples repo here: https://github.com/gtk-rs/examples/ . Whenever there is an update in any of the git dependencies (there are many), it takes quite a while to do cargo update. When there are no changes it's quite fast.

And there are lots of checkouts then, e.g. in ~/.cargo/git/checkouts/gtk-b89af3a825b1a0bb/ (that's gtk-rs/gtk) I have right now:

067fdc0/  1567929/  2b7af18/  4942479/  5e088bf/  69514f9/  8b9e632/  91968e9/  be2215b/  d98bfa5/  e4783ec/
0c77f7d/  201acd9/  380ff03/  4ece335/  61b2f4f/  6af60e0/  8ed8005/  970ad70/  c2c61ee/  dec1bfc/  f3201e9/
0dd5f7e/  292268a/  3b218be/  54d7de7/  64658cf/  7b2dbba/  8fcea3f/  bb52e13/  c5214a4/  e21124f/

Each is a full clone of the repository at a specific commit.

Contributor

ehuss commented Mar 23, 2020

It clones a separate copy on disk for each checkout, but they are sourced from the same local repository in the db folder. The network traffic should only do an incremental update into that db folder, so it should be relatively quick. Given the initial description, I was assuming this was related to network downloads. The gtk examples repo takes less than a second for me to check out separate copies.
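
To see the layout described above on a given machine, a small sketch like the following can summarize it: one entry per repository under `git/db` (with its size on disk) and the list of per-commit directories under `git/checkouts`. The function names are hypothetical, and the directory layout is assumed from the paths quoted in this thread (`~/.cargo/git/db/` and `~/.cargo/git/checkouts/<repo>/<short hash>/`).

```python
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of the regular files under path (symlinks skipped)."""
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if not p.is_symlink() and p.is_file()
    )


def summarize_git_cache(cargo_home: Path) -> dict:
    """Map each db repository to its size and each checkout repo to its revisions."""
    summary = {"db": {}, "checkouts": {}}

    db = cargo_home / "git" / "db"
    if db.is_dir():
        for repo in sorted(db.iterdir()):
            if repo.is_dir():
                summary["db"][repo.name] = dir_size(repo)

    checkouts = cargo_home / "git" / "checkouts"
    if checkouts.is_dir():
        for repo in sorted(checkouts.iterdir()):
            if repo.is_dir():
                revs = sorted(d.name for d in repo.iterdir() if d.is_dir())
                summary["checkouts"][repo.name] = revs

    return summary


if __name__ == "__main__":
    result = summarize_git_cache(Path.home() / ".cargo")
    for name, size in result["db"].items():
        print(f"db/{name}: {size / 1e6:.1f} MB")
    for name, revs in result["checkouts"].items():
        print(f"checkouts/{name}: {len(revs)} checkouts")
```

A single `db` entry alongside many `checkouts` entries for the same repository would match the behavior described here: one fetched repository, many local working copies.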

I did do some tests with a roughly 450MB repo, and it was a little sluggish (about 12 seconds compared to command-line git taking about 5 seconds to locally clone). There is probably not much we can do about that directly, since it is managed by libgit2.

If you can reproduce problems where it is actually re-downloading the entire repo, or otherwise using excessive network, or takes an abnormally long amount of time to update, let me know!

Otherwise, #7150 is the meta-tracking issue for excessive cache usage. Cargo would probably need some tracking to know which projects are using various checkouts to know if it is ok to delete old ones.
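
As a rough illustration of the kind of tracking mentioned above (not how cargo does or would implement it), the sketch below collects the git revisions pinned in a set of `Cargo.lock` files and reports which checkout directory names match none of them. It assumes that checkout directories are named after a prefix of the commit hash, as the listing earlier in this thread suggests, and it deliberately ignores which repository a checkout belongs to. All function names are hypothetical.

```python
import re

# Git sources in Cargo.lock look like:
#   source = "git+https://github.com/gtk-rs/gtk#<40-character commit hash>"
GIT_SOURCE = re.compile(r'^source = "git\+[^"#]*#([0-9a-f]{40})"$', re.MULTILINE)


def locked_revisions(lock_text: str) -> set[str]:
    """Return the full commit hashes of all git sources in one Cargo.lock."""
    return set(GIT_SOURCE.findall(lock_text))


def unused_checkouts(checkout_names, lock_texts):
    """Return checkout names that are not a prefix of any locked revision."""
    in_use = set()
    for text in lock_texts:
        in_use |= locked_revisions(text)
    return sorted(
        name
        for name in checkout_names
        if not any(rev.startswith(name) for rev in in_use)
    )
```

The hard part, which this sketch leaves out, is knowing where every project's `Cargo.lock` lives; that is the missing tracking the comment refers to.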
