Tracking Issue for garbage collection #12633
Comments
For when we get to From @bjorn3 at https://hachyderm.io/@bjorn3/111047792430714997
Quick scan of brew
One complaint that came up was " " |
From https://hachyderm.io/@[email protected]/111048319933010803
I'm assuming our global package cache is to help with CI, but we should probably explicitly document our priority use cases.
This comment was marked as resolved.
Add cache garbage collection

### What does this PR try to resolve?

This introduces a new garbage collection system which can track the last time files were used in cargo's global cache, and delete old, unused files either automatically or manually.

### How should we test and review this PR?

This is broken up into a large number of commits, and each commit should have a short overview of what it does. I am breaking some of these out into separate PRs as well (unfortunately GitHub doesn't really support stacked pull requests). I expect to reduce the size of this PR if those other PRs are accepted.

I would first review `unstable.md` to give you an idea of what the user side of this looks like. I would then skim over each commit message to give an overview of all the changes. The core change is the introduction of the `GlobalCacheTracker`, which is an interface to a sqlite database which is used for tracking the timestamps.

### Additional information

I think the interface for this will almost certainly change over time. This is just a stab to create a starting point where we can start testing and discussing what actual user flags should be exposed. This is also intended to start the process of getting experience using sqlite, and getting some testing in real-world environments to see how things might fail.

I'd like to ask for the review to not focus too much on bikeshedding flag names and options. I expect them to change, so this is by no means a concrete proposal for where it will end up. For example, the options are very granular, and I would like to have fewer options. However, it isn't clear how that might best work. The size-tracking options almost certainly need to change, but I do not know exactly what the use cases for size-tracking are, so that will need some discussion with people who are interested in that.

I decided to place the gc commands in cargo's `cargo clean` command because I would like to have a single place for users to go for deleting cache artifacts. It may be possible that they get moved to another command; however, introducing new subcommands is quite difficult (due to shadowing existing third-party commands). Other options might be `cargo gc`, `cargo maintenance`, `cargo cache`, etc., but there are existing extensions that those names would interfere with.

There are also more directions to go in the future. For example, we could add a `cargo clean info` subcommand which could be used for querying cache information (like the sizes and such). There is also the rest of the steps in the original proposal at https://hackmd.io/U_k79wk7SkCQ8_dJgIXwJg for rolling out sqlite support.

See #12633 for the tracking issue.
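To make the idea of last-use tracking concrete, here is a minimal sketch of age-based collection. It is not cargo's `GlobalCacheTracker`: the `LastUseTracker` type, its method names, and the use of an in-memory map instead of a sqlite database are all assumptions made for illustration, and timestamps are plain seconds supplied by the caller.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a last-use tracker.
/// (The real tracker described above is backed by a sqlite database.)
#[derive(Default)]
pub struct LastUseTracker {
    /// Cache entry name mapped to the last time it was used, in seconds.
    last_use: HashMap<String, u64>,
}

impl LastUseTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that `entry` was used at time `now`.
    /// An older timestamp never overwrites a newer one.
    pub fn mark_used(&mut self, entry: &str, now: u64) {
        let ts = self.last_use.entry(entry.to_string()).or_insert(now);
        if now > *ts {
            *ts = now;
        }
    }

    /// Whether `entry` is still tracked.
    pub fn contains(&self, entry: &str) -> bool {
        self.last_use.contains_key(entry)
    }

    /// Forget every entry that has not been used for more than `max_age`
    /// seconds, and return the removed names in sorted order.
    pub fn collect(&mut self, now: u64, max_age: u64) -> Vec<String> {
        let mut stale: Vec<String> = self
            .last_use
            .iter()
            .filter(|(_, ts)| now.saturating_sub(**ts) > max_age)
            .map(|(name, _)| name.clone())
            .collect();
        stale.sort();
        for name in &stale {
            self.last_use.remove(name);
        }
        stale
    }
}
```

A caller would invoke `mark_used` whenever a cache entry is touched and `collect` when cleaning. A real implementation would also delete the corresponding files on disk, which this sketch leaves out.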
In https://rust-lang.zulipchat.com/#narrow/stream/246057-t-cargo/topic/Stabilizing.20global.20cache.20tracking/near/422500781 I am proposing to stabilize just the recording of the cache data as a first step. This doesn't enable automatic or manual gc.
Stabilize global cache data tracking.

This stabilizes the global cache last-use data tracking. This does not stabilize automatic or manual gc.

Tracking issue: #12633

## Motivation

The intent is to start getting cargo to collect data so that when we do stabilize automatic gc, there will be a wider range of cargo versions that will be updating the data so the user is less likely to see cache misses due to an over-aggressive gc.

Additionally, this should give us more exposure and time to respond to any problems, such as filesystem issues.

## What is stabilized?

Cargo will now automatically create and update an SQLite database, located at `$CARGO_HOME/.global-cache`. This database tracks timestamps of the last time cargo touched an index, `.crate` file, extracted crate `src` directory, git clone, or git checkout. The schema for this database is [here](https://github.com/rust-lang/cargo/blob/a7e93479261432593cb70aea5099ed02dfd08cf5/src/cargo/core/global_cache_tracker.rs#L233-L307).

Cargo updates this file on any command that needs to touch any of those on-disk caches.

The testsuite for this feature is located in [`global_cache_tracker.rs`](https://github.com/rust-lang/cargo/blob/a7e93479261432593cb70aea5099ed02dfd08cf5/tests/testsuite/global_cache_tracker.rs).

## Stabilization risks

There are some risks to stabilizing, since it commits us to staying compatible with the current design.

The concerns I can think of with stabilizing:

- This commits us to using the database schema in the current design. The code is designed to support both backwards and forwards compatible extensions, so I think it should be fairly flexible. Worst case, if we need to make changes that are fundamentally incompatible, then we can switch to a different database filename or tracking approach.
- There are certain kinds of errors that are ignored if cargo fails to save the tracking data (see [`is_silent_error`](https://github.com/rust-lang/cargo/blob/64ccff290fe20e2aa7c04b9c71460a7fd962ea61/src/cargo/core/global_cache_tracker.rs#L1796-L1813)). The silent errors are only shown with `--verbose`. This should help deal with read-only filesystem mounts and other issues. Non-silent errors always show just a warning. I don't know if that will be sufficient to avoid problems.
- I did a fair bit of testing of performance, and there is a bench suite for this code, but we don't know if there will be pathological problems in the real world. It also incurs an overhead that all builds will have to pay for.
- I've done my best to ensure that this should be reliable when used on network or unusual filesystems, but I think those are still a high-risk category. SQLite should be configured to accommodate these cases, as well as the extensive locking code (which has already been enabled).

A call for public testing was announced in December at https://blog.rust-lang.org/2023/12/11/cargo-cache-cleaning.html. At this time, I don't see any issues in https://github.com/rust-lang/cargo/labels/Z-gc that should block this step.
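The error-handling policy described above (some save failures stay silent unless `--verbose` is passed, everything else becomes a warning) can be sketched roughly as follows. This is not the linked `is_silent_error` function: the `Report` enum, both function names, and the choice to treat only `PermissionDenied` as silent are assumptions made for illustration.

```rust
use std::io;

/// Hypothetical outcome of a failed attempt to save tracking data.
#[derive(Debug, PartialEq)]
pub enum Report {
    /// Nothing is shown to the user.
    Hidden,
    /// Shown only because verbose output was requested.
    Verbose(String),
    /// Always shown, but only as a warning, never a hard error.
    Warning(String),
}

/// Assumption for this sketch: only permission-denied failures count as
/// "silent". The real classification lives in cargo's source.
pub fn is_silent(err: &io::Error) -> bool {
    err.kind() == io::ErrorKind::PermissionDenied
}

/// Decide how a save failure is surfaced, following the policy in the text:
/// silent errors appear only in verbose mode, all others are warnings.
pub fn report_save_failure(err: &io::Error, verbose: bool) -> Report {
    let msg = format!("failed to save last-use data: {err}");
    if !is_silent(err) {
        Report::Warning(msg)
    } else if verbose {
        Report::Verbose(msg)
    } else {
        Report::Hidden
    }
}
```

The point of the split is that a build on a read-only cache directory should keep working without noise, while unexpected failures still surface without aborting the command.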
I would wager most Rust users associate the words "garbage collection" with memory rather than cached files that have gone stale. It's unfortunate that the term is being overloaded here.
It depends. Some of us are familiar with the |
The feature is in development and how we present it to the user is not yet decided. In #13060, we are exploring how to present it in the CLI, including looking at prior art from other tools.
Thanks for redirecting me to the naming discussion here, @epage. I think(?) one of the distinguishing characteristics of garbage collection implementations (whether they be for memory, git, nix, etc.) is that they remove things that are "unreachable" in some sense, and thus can be confidently disposed of as not used. That particular characteristic is specifically not true of this feature, as discussed in #13176. Having said that, in practice I'm skeptical that calling this feature "garbage collection" is actually going to confuse people. Nevertheless, it does seem like one of those "might as well be more accurate" kind of situations, so I'd suggest calling it "cache cleaning" or similar.
I have proposed to stabilize the automatic side of this feature in #14287.
Summary
Original proposal: https://hackmd.io/@rust-cargo-team/SJT-p_rL2
Nightly: garbage collection
Implementation: #12634
Documentation: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#gc
Issues: Z-gc
The `-Zgc` flag enables garbage collection for deleting old, unused files in cargo's cache.
Unresolved Issues
Future Extensions
No response
About tracking issues
Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.