git submodules are not cached #7987
Comments
Hm, after looking into this more closely, I think I understand it better. Bare repos cannot include submodules. I don't understand why (it seems like it could just have a …). I wonder if worktrees might be an option. The git CLI docs tell you NOT to do that (support is "incomplete"), so it may not be possible, or it may be too risky. Are there any other ideas on how to share submodules across multiple checkouts?
I think that we may need to take management of submodules away from git. Ideally they'd use the same caching mechanism as the main git repos, meaning we'd have entries in the database for submodules too. We'd then manually check out submodules from the database into repositories, or do something like a "git clone using hardlinks from that other path", similar to how we check out git repos from the db today, which is in theory very fast. I think we probably rely too much on native git submodule management right now, which hinders the caching. Personally, I'd be hesitant to dip our toes too far into fancy features like worktrees.
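The "checkout using hardlinks" idea above can be illustrated with a minimal sketch that uses only the Rust standard library. It is not Cargo's implementation: `hardlink_tree` is a hypothetical helper, and the cache and checkout directories are whatever paths the caller passes in. It mirrors a cached directory tree into a new location by hard-linking each regular file, so the file contents are stored only once on disk.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recreate the directory tree at `src` under `dst`, hard-linking regular
/// files instead of copying their contents.
///
/// Hypothetical helper for illustration only; it is not part of Cargo.
/// Returns the number of regular files placed under `dst`.
fn hardlink_tree(src: &Path, dst: &Path) -> io::Result<u64> {
    fs::create_dir_all(dst)?;
    let mut files = 0;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        let target = dst.join(entry.file_name());
        if file_type.is_dir() {
            // Directories cannot be hard-linked, so recurse into them.
            files += hardlink_tree(&entry.path(), &target)?;
        } else if file_type.is_file() {
            // Fall back to a real copy if linking fails, for example when
            // `src` and `dst` live on different filesystems.
            if fs::hard_link(entry.path(), &target).is_err() {
                fs::copy(entry.path(), &target)?;
            }
            files += 1;
        }
        // Symlinks and other special files are skipped in this sketch.
    }
    Ok(files)
}
```

Calling `hardlink_tree(Path::new("cache/sub"), Path::new("checkout/sub"))` (both paths hypothetical) would populate the checkout without duplicating file contents. One trade-off is that hard-linked files share storage, so modifying a file in place in one checkout would also change it in the cache.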
TIL about
I'm currently struggling with what may be the worst case of this issue. The gltf crate repo has the glTF-Sample-Models repo as a submodule. Across different projects I have 4 different git dependencies on this crate, which means that I have 4 different checkouts of the submodule repo. Any fix for this issue would be greatly appreciated.
Having some kind of setting or environment variable to request a shallow clone might be a good stopgap solution. Not all repository hosts support shallow clones, but enough do for this to be useful (say, in CI). EDIT: I see #1171 talks about this for both top-level repositories and their submodules.
Update `ots` to the latest release and vendor ots and all dependencies. Cargo build will recursively clone submodules for `git` dependencies. `ots` itself and some of its dependencies have quite large git repos, e.g. `ots` has 80MB of test font files. Vendoring the sources reduces the required network bandwidth and disk space usage greatly. See also rust-lang/cargo#7987 for more details on the effects on disk usage (not super relevant for us, since we rarely update our dependencies). Ideally each of the C dependencies would have their own crate, but that can be done later.
Problem
If a package has a git dependency with a large submodule, any change to the git repo that updates the submodule causes the entire submodule repo to be re-downloaded from scratch, and an entire separate copy is retained. This can be very expensive for both network download time and disk space.
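To see how much disk space the retained copies take, a small sketch like the following can sum the size of each directory directly under a given root. It uses only the Rust standard library. The function names `dir_size` and `checkout_sizes` are hypothetical, and pointing the root at Cargo's git checkout directory is an assumption about your local setup rather than something the code knows about.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Total size in bytes of all regular files below `path`.
/// Symlinks are not followed, so linked trees are not counted twice via links.
fn dir_size(path: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(path)? {
        let entry = entry?;
        // `DirEntry::metadata` does not traverse symlinks.
        let meta = entry.metadata()?;
        if meta.is_dir() {
            total += dir_size(&entry.path())?;
        } else if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}

/// Size of every immediate subdirectory of `root`, sorted by directory name.
/// Hypothetical helper: `root` is supplied by the caller.
fn checkout_sizes(root: &Path) -> io::Result<Vec<(String, u64)>> {
    let mut sizes = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            let name = entry.file_name().to_string_lossy().into_owned();
            sizes.push((name, dir_size(&entry.path())?));
        }
    }
    sizes.sort();
    Ok(sizes)
}
```

Running `checkout_sizes` on the directory that holds the per-commit checkouts before and after each `cargo fetch` should show a new entry of roughly the full submodule size each time, if the behavior described above applies. Note that this reports logical file sizes, so hard-linked files would be counted once per link.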
Steps
1. Add the dependency `rocksdb = {git = "https://github.com/tikv/rust-rocksdb.git", rev="fe7be35ba191684c989effdc6ee8e39a3978e650"}`.
2. Run `cargo fetch`.
3. Change `rev` to `3cd18c44d160a3cdba586d6502d51b7cc67efc59`.
4. Run `cargo fetch`.
5. Change `rev` to `5adf5b847e13cea2a59a1b4921aa5bf38591d1a3`.
6. Run `cargo fetch`.
Possible Solution(s)
The repo in `git/db/…` should probably contain the submodule. Currently it appears that cargo checks out a fresh copy for every commit in `git/checkout/…`. I think this is because cargo is using `Submodule::open`. I wonder if using `Submodule::update` would be the solution?