
Stabilize automatic garbage collection. #14287


Open
wants to merge 1 commit into base: master

Conversation

ehuss
Contributor

@ehuss ehuss commented Jul 22, 2024

This proposes to stabilize automatic garbage collection of Cargo's global cache data in the cargo home directory.

What is being stabilized?

This PR stabilizes automatic garbage collection, which is triggered at most once per day by default. This automatic gc will delete old, unused files in cargo's home directory.

It will delete files that need to be downloaded from the network after 3 months, and files that can be generated without network access after 1 month. These thresholds are intended to balance reducing cargo's disk usage against deleting so often that cargo must do extra work to restore missing files.
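As a sketch of the two thresholds above (illustrative only, not Cargo's code; the function and constant names are made up), the policy amounts to a longer grace period for files that would need re-downloading:

```rust
use std::time::{Duration, SystemTime};

const SECONDS_PER_DAY: u64 = 24 * 60 * 60;

/// Illustrative staleness policy: files that must be fetched from the
/// network get ~3 months of grace; locally regenerable files get ~1 month.
fn is_stale(last_use: SystemTime, needs_network: bool) -> bool {
    let threshold_days = if needs_network { 90 } else { 30 };
    let threshold = Duration::from_secs(threshold_days * SECONDS_PER_DAY);
    SystemTime::now()
        .duration_since(last_use)
        .map(|age| age > threshold)
        .unwrap_or(false) // a timestamp in the future is not stale
}

fn main() {
    let four_months_ago =
        SystemTime::now() - Duration::from_secs(120 * SECONDS_PER_DAY);
    assert!(is_stale(four_months_ago, true)); // downloaded, past 3 months
    assert!(is_stale(four_months_ago, false)); // generated, past 1 month
    assert!(!is_stale(SystemTime::now(), true)); // just used: kept
}
```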

Tracking of the last-use data is stored in a sqlite database in the cargo home directory. Cargo updates timestamps in that database whenever it accesses a file in the cache. This part is already stabilized.

This PR also stabilizes the gc.auto.frequency configuration option. The primary use case is setting it to "never" to disable automatic gc, should the need arise.
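For concreteness, a minimal sketch of what disabling automatic gc in a cargo config file could look like, using the gc.auto.frequency name as spelled in this PR (the exact table and option names are part of what is under review here, and later comments discuss renaming):

```toml
# .cargo/config.toml (illustrative)
[gc.auto]
frequency = "never"   # disable automatic cache cleaning; the default is daily
```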

When gc is initiated and there are files to delete, a progress bar is shown while they are deleted; it disappears when cleaning finishes. If the user runs with the -v (verbose) option, cargo will also display which files it deletes.

If there is an error while cleaning, cargo will only display a warning, and otherwise continue.

What is not being stabilized?

The manual garbage collection option (via cargo clean gc) is not proposed to be stabilized at this time. That still needs some design work. This is tracked in #13060.

Additionally, there are several low-level config options currently implemented which define the thresholds for when it will delete files. I think these options are probably too low-level and specific. This is tracked in #13061.

Garbage collection of build artifacts is not yet implemented, and tracked in #13136.

Background

This feature is tracked in #12633 and was implemented in a variety of PRs, primarily #12634.

The tests for this feature are located in https://github.com/rust-lang/cargo/blob/master/tests/testsuite/global_cache_tracker.rs.

Cargo started tracking the last-use data on stable via #13492 in 1.78, which was released 2024-05-02. This PR proposes to stabilize automatic deletion in 1.82, which will be released on 2024-10-17.

Risks

Users who frequently use versions of Rust older than 1.78 will not have the last-use data tracking updated. If they infrequently use 1.78 or newer, and use the same cache files, then the last-use tracking will only be updated by the newer versions. If that time frame is more than 1 month (or 3 months for downloaded data), then cargo will delete files that the older versions are still using. This means the next time they run the older version, it will have to re-download or re-extract the files.

The effects of deleting cache data in environments where cargo's cache is modified by external tools is not fully known. For example, CI caching systems may save and restore cargo's cache. Similarly, things like Docker images that try to save the cache in a layer, or mount the cache in a read-only filesystem may have undesirable interactions.

The once-a-day performance hit might be noticeable to some people. I've been using this for several months, and almost never notice it. However, slower systems, or situations where there is a lot of data to delete might take a while (on the order of seconds hopefully).

@ehuss ehuss added T-cargo Team: Cargo Z-gc Nightly: garbage collection labels Jul 22, 2024
@rustbot
Collaborator

rustbot commented Jul 22, 2024

r? @weihanglo

rustbot has assigned @weihanglo.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-documenting-cargo-itself Area: Cargo's documentation S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 22, 2024
@ehuss
Contributor Author

ehuss commented Jul 22, 2024

@rfcbot fcp merge

@rfcbot
Collaborator

rfcbot commented Jul 22, 2024

Team member @ehuss has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period An FCP proposal has started, but not yet signed off. disposition-merge FCP with intent to merge labels Jul 22, 2024
@epage
Contributor

epage commented Jul 22, 2024

@rfcbot concern field-name

Something that came up on the tracking issue is whether people will get confused with the gc name (#12633 (comment)). I suspect that won't be a problem but I want to make sure we acknowledge it and agree it isn't a problem first.

I am also somewhat concerned about how to organize and name the config for if/when we get to GC within target directories. I wonder if this would affect the top-level table name (e.g. calling it global-cache). In other ways, only stabilizing auto gives us a lot of leeway in what it does.

(sorry, thought of this after the conversation about stabilizing this)

@cessen

cessen commented Jul 23, 2024

Something that came up on the tracking issue is whether people will get confused with the gc name

I agree that this isn't likely to be an issue in practice.

However, I also feel like we might as well be more accurate and call it something like "cache cleaning", which is what it actually is. As I mentioned in the other thread, garbage collection is usually related to some concept of reachability and resultant high confidence that something is no longer used. This doesn't reflect how this feature actually works, or ever reasonably could work.

And unless I'm missing something here (which is always possible), I don't think there's any cost to simply naming this feature more accurately, aside from taking a bit of time to decide on that name.

@elenakrittik

Sorry if this is the wrong place to ask, but will there be a way to disable this behaviour?

@Skgland

Skgland commented Jul 23, 2024

Sorry if this is the wrong place to ask, but will there be a way to disable this behaviour?

I think the PR description already answers this:

This PR also stabilizes the gc.auto.frequency configuration option. The primary use case is setting it to "never" to disable automatic gc, should the need arise.

@ehuss
Contributor Author

ehuss commented Jul 23, 2024

Sure, would be happy to rename it. I opened #14292 to track that suggestion; please express your thoughts over there.


* `"never"` --- Never deletes old files.
* `"always"` --- Checks to delete old files every time Cargo runs.
* An integer followed by "seconds", "minutes", "hours", "days", "weeks", or "months" --- Checks to delete old files at most once per the given time frame.
Member

“months” is an approximate number.

cargo/src/cargo/core/gc.rs

Lines 373 to 381 in ea14e86

let factor = match right {
"second" | "seconds" => 1,
"minute" | "minutes" => 60,
"hour" | "hours" => 60 * 60,
"day" | "days" => 24 * 60 * 60,
"week" | "weeks" => 7 * 24 * 60 * 60,
"month" | "months" => 2_629_746, // average is 30.436875 days
_ => return None,
};

I can foresee someone interpreting "months" as monthly and expecting cleanup to run on the same day of each month, which is not true, especially in February. I don't think this is something we couldn't change after stabilization. Just calling it out in case someone disagrees.
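To make the approximation concrete, here is a small standalone sketch (not Cargo's code) of what the 2_629_746-seconds-per-month factor above works out to for a "6 months" setting:

```rust
// Standalone sketch (not Cargo's code): the "months" factor above is an
// average month in seconds (30.436875 days * 86_400 s/day = 2_629_746 s).
fn main() {
    let month_secs: u64 = 2_629_746;
    let six_months_secs = 6 * month_secs;
    let days = six_months_secs as f64 / 86_400.0;
    // "6 months" comes out to 15_778_476 seconds, i.e. 182.62125 days.
    println!("6 months = {six_months_secs} s = {days} days");
    assert_eq!(six_months_secs, 15_778_476);
    assert!((days - 182.62125).abs() < 1e-9);
}
```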

@teohhanhui teohhanhui Aug 14, 2024

Why not just remove months? It'd be counter-intuitive to anyone trying to use it... The user can already specify something like 180 days for ~6 months, no?

Contributor

For myself, being able to say "6 months" is much easier than calculating out the number of days and reading the number of days.


@epage But would anyone expect it to be 182.62125 days? Principle of least surprise...

Contributor

There is an inherent approximation when giving a unit. The larger the unit, the larger the approximation. If you say "6 months", you shouldn't care whether that's 180, 186, 182, or 182.62125.

btw laughing emojis in a technical discussion like this come across as rude.

@teohhanhui teohhanhui Aug 14, 2024

btw laughing emojis in a technical discussion like this come across as rude.

Sigh... Here we go again. Intent does not carry across text (or emojis), so please don't jump to conclusions like that.

I was not even disagreeing with what you said. Just pointing out that it'd be surprising for the user, as the OP of this thread has already pointed out (a different surprising aspect).


@epage But would anyone expect it to be 182.62125 days? Principle of least surprise...

168 would be surprising; a bit over 180 not so much. The kind of surprise we're trying to avoid is purging data way earlier than expected.


For myself, being able to say "6 months" is much easier than calculating out the number of days and reading the number of days.

If the user goes to the trouble of customizing this in the config, I don't think having to calculate the number of days would be much of an extra hurdle. In effect, removing months would just simplify things with no real downside (and prevent future support questions where people are arguing over this again / trying to figure out what's going on with this approximation).

Contributor

"A month equals 30 days" would be easier for me to accept.

@ehuss ehuss force-pushed the stabilize-automatic-gc branch from 8dfac6f to c2af991 Compare November 8, 2024 15:00
@bors
Contributor

bors commented Nov 9, 2024

☔ The latest upstream changes (presumably #14388) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Contributor

bors commented Nov 25, 2024

☔ The latest upstream changes (presumably 4c39aaf) made this pull request unmergeable. Please resolve the merge conflicts.


@ehuss ehuss force-pushed the stabilize-automatic-gc branch 3 times, most recently from 65f219a to 332a16d Compare March 31, 2025 15:11
@epage
Contributor

epage commented Mar 31, 2025

@rfcbot resolve field-name

@rfcbot rfcbot added final-comment-period FCP — a period for last comments before action is taken and removed proposed-final-comment-period An FCP proposal has started, but not yet signed off. labels Mar 31, 2025
@rfcbot
Collaborator

rfcbot commented Mar 31, 2025

🔔 This is now entering its final comment period, as per the review above. 🔔

@ijackson
Contributor

ijackson commented Apr 8, 2025

I am very late to this party, but:

Has consideration been given to using the file access time (on Unix at least) to assist with avoiding deleting things that older cargos have accessed recently?

I looked for related search terms here and in #12633 and didn't find the answer.

@rfcbot rfcbot added finished-final-comment-period FCP complete to-announce and removed final-comment-period FCP — a period for last comments before action is taken labels Apr 10, 2025
@rfcbot
Collaborator

rfcbot commented Apr 10, 2025

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@weihanglo
Member

weihanglo commented Apr 10, 2025

@ijackson

Cargo started collecting last-use data in 1.78.0, which was almost one year ago. Personally I feel that is long enough for us to ship this feature in 1.88, which is targeted at 2025-05-09. Adding access-time assistance may complicate the mechanism a bit, like figuring out which file's atime Cargo should look at. Also, atime is generally not that reliable, I guess? There are mount options like noatime and relatime that change the behavior a lot, and mount(8) says:

Since Linux 2.6.30, the kernel defaults to the behavior
provided by this option (unless noatime was specified), and
the strictatime option is required to obtain traditional
semantics. In addition, since Linux 2.6.30, the file's last
access time is always updated if it is more than 1 day old.

Does that mean that by default atime can be up to one day stale? I am not an expert on file systems or the kernel, but it sounds like the recorded atime may lag the true last access by as much as a day. If that is true, atime is a bit too unreliable.

@ijackson
Contributor

@ijackson

Cargo started collecting last-use data in 1.78.0, which was almost one year ago. Personally I feel that is long enough for us to ship this feature in 1.88,

The difficulty is that, as I understand it, this feature is poorly compatible with continued use of older versions, that don't record last usage time. Those older versions might end up re-downloading things, which (depending on circumstances) could be a serious regression.

I think therefore that this is a breaking change.

I don't think it's good enough to say "it's been a year". I need to use Rust 1.31 for MSRV testing, for example, and simply breaking it (or making it behave much worse) is not OK.

Happily I think we can fix this feature, at least on Unix, if we use atimes.

Also, atime is generally not that reliable, I guess? There are mount options like noatime and relatime that change the behavior a lot, and mount(8) says:

Yes, relatime is the default nowadays. relatime is a clever algorithm which ensures that atimes can be used for their intended purpose without generating excessive disk traffic.

The algorithm ensures that the atime is never wrong by more than a day. (And it is never too recent.)

For a cleanup algorithm like cargo's, this is perfect. We should avoid deleting things whose atimes are more recent than the timestamp recorded in the database.

Those files are ones which some other program (probably, an old version of cargo) has accessed without making a database record.

I am not an expert on file systems or the kernel, but it sounds like the recorded atime may lag the true last access by as much as a day. If that is true, atime is a bit too unreliable.

atime is not unreliable. On normally-configured systems, it is excellent for this purpose. (Of course people can configure their system in weird ways, but I think we can reasonably tell them that the consequences are on them.)
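A minimal sketch of the secondary check being proposed here (illustrative only, not Cargo's implementation; the function name and demo file are made up), using only the Rust standard library:

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Returns true if the file's atime is newer than `max_age` ago.
/// Under the proposal, such a file would be kept even when the
/// last-use database says it is stale.
fn recently_accessed(path: &Path, max_age: Duration) -> io::Result<bool> {
    let atime = fs::metadata(path)?.accessed()?;
    // An atime in the future (clock skew) counts as "recent".
    let age = SystemTime::now()
        .duration_since(atime)
        .unwrap_or(Duration::ZERO);
    Ok(age < max_age)
}

fn main() -> io::Result<()> {
    // Demo on a freshly written temp file: its atime is "now",
    // so it falls well within a 30-day cutoff.
    let path = std::env::temp_dir().join("atime_demo.txt");
    fs::write(&path, b"demo")?;
    let cutoff = Duration::from_secs(30 * 24 * 60 * 60);
    assert!(recently_accessed(&path, cutoff)?);
    fs::remove_file(&path)?;
    Ok(())
}
```

Note that `std::fs::Metadata::accessed` returns an error on platforms or filesystems where access times are unavailable, which is one reason the database would remain the primary record.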

@ijackson
Contributor

For a cleanup algorithm like cargo's, this is perfect. We should avoid deleting things whose atimes are more recent than the timestamp recorded in the database.

I should clarify the effect of the relatime imprecision on such an algorithm. The effect is as follows. Suppose the cutoff for deleting old unused data is 30 days, meaning we don't delete things whose recorded-in-the-database last access, or whose atime, is <30d ago.

Then with relatime, files which were accessed by old cargo 29 days ago might have an atime which is 30 days ago, and might be deleted. That's OK - it's just a 1-day imprecision in the cutoff, which was a tuning parameter anyway.

Files only accessed more than 30 days ago will definitely have an atime at least 30 days old, so are fair game for deletion. Files accessed less than 29 days ago will have an atime of less than 30 days, so will be retained (even if the access was by old cargo and went unrecorded).

This would be a pretty good set of behaviours.

On the need for the database

On Unix with working atimes, the database of access times is not necessary. But, the database is necessary if atimes are completely disabled. And determining if atimes are working is nontrivial.

So we need both the database (for systems with noatime) and the atime (for systems which alternately run new cargo, and old, pre-database, versions of cargo).

With my suggestion, systems which alternate old and new cargo, and disable atime, don't work well. That seems fair to me, because noatime is not a reasonable configuration choice for a general-purpose computer whose role involves subsystems that perform caching.

@weihanglo
Member

Really appreciate the detailed reply!

…this feature is poorly compatible with continued use of older versions, that don't record last usage time. Those older versions might end up re-downloading things…

Just curious whether setting cache.auto-clean-frequency="never" config value or CARGO_CACHE_AUTO_CLEAN_FREQUENCY=never is a viable solution on your side, and how hard it is in general.

@ijackson
Contributor

Really appreciate the detailed reply!
...
Just curious whether setting cache.auto-clean-frequency="never" config value or CARGO_CACHE_AUTO_CLEAN_FREQUENCY=never is a viable solution on your side, and how hard it is in general.

That would restore the behaviour to the previous one. So in some sense it eliminates the regression. But it also means some other cleanup of old things is needed, and I don't believe there is any other sensible mechanism. (In practice a find | xargs rm rune might work I suppose.)

I'm getting a feeling that the cargo team feel the "switch cargo versions" use case is a minority one. I think this is a misconception. Most serious Rust developers who maintain or contribute to a variety of packages are going to be dealing with different MSRVs and/or different nightly versions (eg for cargo expand tests).

It would seem unbalanced, to me, to provide such poor support to those users, while having spent a great deal of effort (the explicit usage tracking) to be able to fully support filesystems mounted noatime, which I believe will be very rare.

@joshtriplett
Member

joshtriplett commented Apr 14, 2025

I'm getting a feeling that the cargo team feel the "switch cargo versions" use case is a minority one.

It absolutely is, and we have data to back that up: https://blog.rust-lang.org/2025/02/13/2024-State-Of-Rust-Survey-results.html and in particular https://blog.rust-lang.org/images/2025-02-13-rust-survey-2024/which-version-of-rust-do-you-use.png

In particular, using a version of stable older than a year old (1.75 was a year old at the time of the survey) was 2.0%.

That doesn't make it an invalid use case, just a demonstrably uncommon one.

You may be overestimating the number of people who actively provide support and testing for MSRVs that old. And, for that matter, even of those who do, how many test it locally rather than doing so in CI.

That said, I would say that if anything, the Cargo team devotes substantially more than 2% of its time and effort towards reliably supporting older versions of Cargo, and thinking about the impact of changes on older versions of Cargo.

atime is not unreliable.

I don't think it's reliable enough for us to rely on by default, for multiple reasons.

First, it's not reliably available or updated on all the systems Cargo runs on.

On Windows, atime does not seem to be reliably available. Many versions of Windows disabled access times by default, or (later) did so by default on large filesystems. And even on versions that do enable it by default on filesystems of all sizes, it sounds like it may be inconsistent and somewhat unreliable because it's stored in two different places and not kept in sync by different APIs that work with it.

On macOS, I haven't managed to find much reliable data (which is a problem in itself), but I've found at least some mentions that by default it doesn't update atime unless it's older than mtime. I haven't found any indication that it also has the relatime-like behavior of also updating if older than some threshold. That doesn't mean it doesn't have that behavior, but I haven't found any evidence one way or another.

And, to quote a comment in Cargo's caching implementation, "People tend to run cargo in all sorts of strange environments".

Leaving portability aside, the other failure mode of using atime, even as a secondary check (e.g. "if atime is less than N days old don't GC it even if the database says otherwise"), is that too many things may end up updating it. On systems other than Linux, it's common to have background indexing or scanning mechanisms; a system with those running might never do garbage collection if we pay attention to atime. Even on Linux, a recursive grep or other search would update atime.[1]

[1] Relatedly, one good reason to use noatime is that even with relatime, grepping a day-old source tree requires a hundred thousand write operations.

Failing to garbage collect because something touched the atime is not a trivial failure mode. It might seem like failure in that direction is always safer than failure in the direction of GCing too much, but that's not necessarily the case. We see a lot of people complaining about how much disk space Cargo takes; it's a very common complaint about working in Rust. If we do something by default that causes a substantial proportion of systems to fail to reliably GC, we'll have made the feature sufficiently less useful that we're likely to have failed to address the complaint people have.

EDIT: the next two paragraphs have a caveat, that if you've only ever handled a file with older Cargo it will never have a database entry, so it may be removed immediately. We may want to fix that to take mtime into account.

Leaving all that aside, the proposed behavior of Cargo is "It will delete files that need to be downloaded from the network after 3 months". So, to the best of my understanding, in order for this to cause a problem for people using older versions of Cargo, there would have to be items in Cargo's home directory that are exclusively accessed by an old version of Cargo, for 3 months, while never being accessed by any new version of Cargo. With new versions of Cargo that have the MSRV-aware resolver, that seems much more likely, if some older version of a crate were used only by older Cargo; however, for versions of Cargo older than 1.78 (where we introduced the access tracking), that seems somewhat less likely, unless you have crates you exclusively build using old Cargo and never test with any recent Cargo.

Even then, the total impact of this would be that after 3 months some files need re-downloading, and then the issue won't arise again for 3 more months. And note that GC does not happen automatically if you build with --offline or --frozen; it only occurs when running with network access assumed.

So, it's reasonable for us to consider:

  • What is the proportion of systems that would fail to GC correctly if we look at atimes, vs the proportion of systems that want atimes because of older versions of Cargo? To what degree would adding such a feature cause us to continue to get widespread complaints about Rust using substantial amounts of disk space?
  • If, hypothetically, we added a check for atimes, would we need to add and maintain a configuration option to ignore atimes, to deal with systems where it isn't desirable? I expect that we probably would.
  • Would we end up turning the check for atimes on or off by default?
    • We could, in theory, check atimes by default only on Linux, or only on non-macOS UNIX targets, but such inconsistencies tend to create problems of their own.
  • If it defaulted to off, is it worth having at all, versus having the option to disable GC entirely using cache.auto-clean-frequency="never"?

I would venture that the answers to the above questions are:

  • More systems would be negatively impacted by having atimes on than by having them off.
  • If we add this, I do think we'd need an option to turn it off.
  • If that option has the same value on all targets, I think it should be off by default.
    • Turning the option on by default on Linux would be worth considering carefully. However, such inconsistencies by target have a non-trivial cost. This would effectively mean that if you're running both a pre-1.78 version of Cargo and a new version of Cargo, you may want to disable GC, unless you're running on Linux; that kind of inconsistency leads to complexity for users.
  • If it defaults to off on all targets, I'm not sure it's worth having at all, since anyone who uses it could also just disable GC.

My current position is that this seems like more complexity than we should incur, compared to recommending that people who regularly need to run pre-1.78 versions of Cargo use cache.auto-clean-frequency = "never".

I would propose that we add a release note along these lines:

"When building, Cargo downloads and caches crates needed as dependencies. Historically, these downloaded files would never be cleaned up, leading to an unbounded amount of disk usage in Cargo's home directory. In this version, Cargo introduces a garbage collection mechanism to automatically clean up old files (e.g. .crate files). Cargo will remove files downloaded from the network if not accessed in 3 months, and files obtained from the local system if not accessed in 1 month. Note that this automatic garbage collection will not take place if running offline (using --offline or --frozen).

Cargo 1.78 and newer track the access information needed for this garbage collection. If you regularly use versions of Cargo older than 1.78, in addition to running current versions of Cargo, and you expect to have some crates accessed exclusively by the older versions of Cargo and don't want to re-download those crates every ~3 months, you may wish to set cache.auto-clean-frequency = "never"."

(We should also include a "For more information, see the cargo documentation for cache.auto-clean-frequency" link at the end.)

@ijackson
Contributor

ijackson commented Apr 15, 2025

First, an apology

An earlier version of this message started with some comments that, while well-intentioned, landed very badly, and seemed quite unpleasant to some folks. So, I would like to apologise. I've deleted/reworded things now, and I will definitely try to avoid any such situation in the future.

Introduction

Much of your message is an argument against enabling atime. I don't think that's relevant, since that's not up to cargo.

The remainder seems mostly to be arguments against relying solely on atime. But, no-one is proposing that.

Focusing on the suggested improvement

Let me try to narrow this down to my suggested improvement:

[A] failure mode of using atime, even as a secondary check (e.g. "if atime is less than N days old don't GC it even if the database says otherwise"), is that too many things may end up updating it. On systems other than Linux, it's common to have background indexing or scanning mechanisms; a system with those running might never do garbage collection if we pay attention to atime.

Then we could check the atime only on non-macOS Unix platforms, where we expect it to be reliable.

Even on Linux, a recursive grep or other search would update atime.

That is expected (and possibly intended). This does not mean that considering the atime is going to cause operational problems.

Behaviour of the current proposal without looking at atimes

Leaving all that aside, the proposed behavior of Cargo is "It will delete files that need to be downloaded from the network after 3 months". So, to the best of my understanding, in order for this to cause a problem for people using older versions of Cargo, there would have to be items in Cargo's home directory that are exclusively accessed by an old version of Cargo, for 3 months,

I think we may have a disconnect about what "after 3 months" means.

I take this to mean "files will be deleted unless the database entry records a last access in the last 3 months".

For files which are accessed only by old cargo, there will never be any database entry. Therefore this condition will be met. Therefore if we're not looking at file timestamps, they will be deleted immediately. So the same files would be re-downloaded on each switch between old and new cargo.

But maybe the current code looks at file modification times to find deletion candidates? In that case the behaviour is as you suggest. But it could be made almost Pareto-better on Linux by also considering the atime.

Files accessed only by old cargo

With new versions of Cargo that have the MSRV-aware resolver, that seems much more likely, if some older version of a crate were used only by older Cargo; however, for versions of Cargo older than 1.78 (where we introduced the access tracking), that seems somewhat less likely, unless you have crates you exclusively build using old Cargo and never test with any recent Cargo.

I like to do serious engineering. Serious engineering involves providing stability to my downstreams. That means testing my MSRV. It means that there are crates that are only used when I'm working on MSRV support.

(I am not surprised that only a small percentage of respondents to the Rust Survey report using older compilers. After all, only a small percentage of respondents are seriously and heavily involved in software engineering in Rust, maintaining high-quality libraries, etc.)

MSRV testing can (indeed, usually, must) be done without use of the MSRV-aware resolver, since the MSRV-aware resolver is very new and is not available in my MSRV. The basic approach is to use -Z minimal-versions, with a pinned nightly from nearly the MSRV. To avoid everyone who works on the project having to mess about with that, a Cargo.lock.minimal can be committed in-tree.

Note that while obviously MSRV testing happens in CI, it is not unusual for new work to cause MSRV violations. Developers will then want to build and test with the MSRV locally. So a usual workflow does involve switching back and forth between very old and very new versions.

Tradeoffs

We see a lot of people complaining about how much disk space Cargo takes

This is not surprising, since currently cargo doesn't ever delete anything. We can't infer anything from this about the likely reception of specific details of reclamation strategies.

If we do something by default that causes a substantial proportion of systems to fail to reliably GC

No-one is proposing this. I am quite prepared to believe that Windows and MacOS have automated systems that mean that atimes can often be far too recent. I care about operating systems that already work reliably - like the ones I use. I think cargo should work well, and reliably, there, even if it must operate in a degraded way on less reliable operating systems.

I think atime is either reliable, or disabled, on the vast majority of Linux systems. There, automated scanning systems can be expected to use O_NOATIME (which has existed for decades - there's even an rsync option to use it).

@mati865
Contributor

mati865 commented Apr 15, 2025

There are more good reasons not to rely on atime (at least by default, or without checking mount flags), such as CoW filesystems. The recommendations for Btrfs and ZFS are to disable atime, and there are distributions out there that already do that by default.

@joshtriplett
Member

I take this to mean "files will be deleted unless the database entry records a last access in the last 3 months".

For files which are accessed only by old cargo, there will never be any database entry. Therefore this condition will be met. Therefore if we're not looking at file timestamps, they will be deleted immediately. So the same files would be re-downloaded on each switch between old and new cargo.

When cargo "adopts" old otherwise-untracked files into the database, it does so using the current date. It will not remove them until 3 months after that. So no, it will not remove such files immediately.

We did consider the possibility of never "adopting" old files automatically, and we'll likely consider it further. However, that would have the downside of leaving many files around that for many people would never be used again (because they were used with old versions of cargo that are no longer installed).

I like to do serious engineering.

Please don't imply that everyone who has different values or priorities than yours is doing unserious engineering, or doesn't care about supporting their users or providing stability for their users.

As mentioned, we do care about people using old MSRVs. If there was no solution for that use case, that would be an issue. However, there is always the option of setting cache.auto-clean-frequency = "never", which changes the question to whether we need to do more than that. Adding more complexity and more failure modes to this solution (including the possibility of having behavior differ between targets, and the complexity of more configurability) would provide ever-diminishing value once common "old MSRV" configurations advance past the version of Cargo that started tracking last-used data; however, the maintenance costs would continue on long into the future after that.

Then we could check the atime only on non-MacOS Unix platforms, where we expect it to be reliable.

I did mention that possibility in my comment. Alongside mentions of the ongoing cost of inconsistency between targets. We're willing to incur those costs when necessary, especially if there's no other available solution, but as mentioned above there is a solution for old MSRVs, just not the solution one might pick if optimizing more for that case at the expense of other tradeoffs.

@obi1kenobi
Member

As another person who regularly wrangles a bunch of different Rust versions across several operating systems and targets, the proposed solution and workarounds sound entirely reasonable and adequate to me. I can't wait to use this!

As things stand, I personally would prefer to have this feature as-is sooner, rather than having further incremental improvements but getting the feature later. I'd feel doubly so if delaying the feature only produces more debate and no significant improvement :)

@epage
Contributor

epage commented Apr 22, 2025

We understand that this feature will be periodically disruptive to some workflows. In weighing things out, though, we've decided to move forward with this. We do think it's important that this gets called out in the release notes to raise awareness of this change.

Specifically, this will affect people that

  • Have an MSRV older than 1.78 (which will be over a year old)
  • Either they pin their compiler to their MSRV or they maintain a separate set of dependencies when verifying their MSRV
  • While also running the latest Cargo version in other situations

The impact of this change is that every 3 months they will need to re-download the index and .crate files, and re-decompress .crate files, for dependencies used exclusively by a pre-1.78 Cargo. This cleanup won't happen in --offline mode, but a fetch may be needed before going offline. While we've been tracking the caches since 1.78, we won't clean up immediately upon this feature being stabilized but 3 months later, as we only start tracking untracked files on cleanup. For infrequent MSRV verification on an old enough Cargo, the Git index will be used, which is generally slow already, likely enough to dwarf the time spent re-downloading deleted content.

The workarounds for this include

  • Turning off GC
  • Periodically running cargo +stable fetch on a project with pre-1.78-only dependencies
  • Switching to a workflow that doesn't run into this problem (e.g. tracking MSRV-compatible deps in your Cargo.lock)
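The first workaround is a one-line setting in cargo's configuration; a sketch using the cache.auto-clean-frequency key mentioned earlier in the thread:

```toml
# ~/.cargo/config.toml (or $CARGO_HOME/config.toml)
[cache]
auto-clean-frequency = "never"
```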

Potential solutions include

  • Tracking atime, but this comes with a lot of caveats and technical issues that make this a risky change for a problem that will diminish over time as MSRVs are raised.
  • Deferring tracking of untracked files. However, tracking these will be done at some point anyway, and likely not on a time frame that would satisfy someone with a 7-year-old MSRV.

We also need to balance this against users who would immediately benefit from cleaning up of untracked files that either

  • Have an MSRV that is 1.78+
  • Share dependencies between their MSRV verification and regular development

Previously, when we needed to weigh the trade-offs against different workflows, we found it better to do so in favor of people without an MSRV, or those who always develop with their MSRV's dependencies (source).

Labels
A-documenting-cargo-itself Area: Cargo's documentation disposition-merge FCP with intent to merge finished-final-comment-period FCP complete S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-cargo Team: Cargo to-announce Z-gc Nightly: garbage collection
Projects
Status: FCP blocked