Automatically purge target directories after reaching max size #346
Comments
Is there a way of determining the size of a folder that is faster than iterating over all the files and summing their sizes? If so, we can run cargo-sweep each time we hit the configured size; it keeps only artifacts newer than a timestamp. If not, I can add an argument to cargo-sweep to remove older artifacts until the folder is under a target size. Also, do you happen to know if the target folder supports "last access time"? If, like my computer, it does not, then the next step is on Cargo; if it does support it, then I can make progress with cargo-sweep.
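For context, the "iterate and sum" baseline that question refers to looks roughly like this (a minimal std-only sketch; `dir_size` is a hypothetical helper that ignores symlinks and counts hard-linked files once per link):

```rust
use std::{fs, io, path::Path};

/// Sum the apparent size of every file under `dir`, recursively.
/// O(number of files), which is why it can be slow on a huge target dir.
fn dir_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        total += if meta.is_dir() {
            dir_size(&entry.path())?
        } else {
            meta.len()
        };
    }
    Ok(total)
}
```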
The easiest way is to run whatever cleanup routine we choose once we reach, say, 90% of total disk usage on the partition. Querying the free space on a partition should be essentially instantaneous, I think.
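Indeed, querying free space is a single `statvfs`-style syscall rather than a directory walk. A rough sketch using the `fs2` crate's `total_space`/`available_space` helpers (the 90% threshold is illustrative):

```rust
use std::{io, path::Path};

/// True when the partition holding `path` is more than `threshold`
/// full (e.g. 0.9 for 90%). One syscall, no walking of the tree.
fn over_threshold(path: &Path, threshold: f64) -> io::Result<bool> {
    let total = fs2::total_space(path)? as f64;
    let free = fs2::available_space(path)? as f64;
    Ok((total - free) / total > threshold)
}
```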
This would be great!
It's really unreliable on the current machines, and AFAIK making it reliable will slow things down a lot.
touch some files when we use them

This is a small change to improve the ability of a third-party subcommand to clean up a target folder. I consider this part of the push to experiment with out-of-tree GC, as discussed in #6229.

How does it work?
--------
This updates the modification time of a file in each fingerprint folder, and the modification time of the intermediate outputs, every time Cargo checks that they are up to date. This allows a third-party subcommand to look at the modification time of the timestamp file to determine the last time a Cargo invocation required that file. This is far more reliable than the current practice of looking at the `accessed` time: `accessed` time is unavailable or disabled on many operating systems, and is routinely set by arbitrary other programs.

Is this enough to be useful?
--------
The current implementation of cargo-sweep on master will automatically use this data with no change to the code. With this PR, it will work even on systems that do not update `accessed` time. This also allows a crude script to clean some of the largest subfolders based on each file's modification time.

Is this worth adding, or should we just build `clean --outdated` into Cargo?
------
I would love to see a `clean --outdated` in Cargo! However, I think there is a lot of design work before we can make something good enough to deserve the Cargo team's stamp of approval, especially as an in-tree version will have to work with many use cases, some of which are yet to be designed (like distributed builds). Even just including cargo-sweep's existing functionality opens a full bikeshed about which arguments to take, and in what form (cargo-sweep takes a days argument, but maybe we should take minutes, or an ISO-standard time, or ...). This PR, or an equivalent, allows out-of-tree experimentation with all the different interfaces, and is basically required for any LRU-based system. (For example, [Crater](rust-lang/crater#346) wants a GC that cleans files in an LRU manner to keep a target folder below a target size. This is not a use case widely enough needed to be worth adding to Cargo, but it is one supported by this PR.)

What are the downsides?
----
1. There are legitimate performance concerns about writing so many small files during a NOP build.
2. There are legitimate concerns about unnecessary writes on read-only filesystems.
3. If we add this and it starts seeing widespread use, we may be de facto stabilizing the folder structure we use. (This is probably true of any system that allows out-of-tree experimentation.)
4. This may not be an efficient way to store the data. (It does have the advantage of not needing different Cargo versions to manipulate the same file. But if you have a better idea, please make a suggestion.)
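Mechanically, the "touch" is just an mtime update on a marker file, and a sweeper reads it back later. A minimal sketch with the `filetime` crate (the `stamp` path is a hypothetical marker inside a fingerprint directory, not Cargo's actual layout):

```rust
use filetime::{set_file_mtime, FileTime};
use std::{fs, io, path::Path};

/// Bump the marker's mtime on every up-to-date check, so that its
/// mtime records "the last time a build needed this unit".
fn touch(stamp: &Path) -> io::Result<()> {
    set_file_mtime(stamp, FileTime::now())
}

/// What a sweeper like cargo-sweep reads back.
fn last_used(stamp: &Path) -> io::Result<FileTime> {
    Ok(FileTime::from_last_modification_time(&fs::metadata(stamp)?))
}
```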
The LRU cleaning has a PR. The PR to have Cargo maintain the mtime was merged, but was then put behind a feature flag as it broke the playground. @ehuss reports that the feature as implemented slows down the playground by ~11 seconds, and a more limited version (just enough for cargo-sweep to work) slows it down by ~2 seconds. I think Crater is also using Docker and AWS in a similar way.
Yep, we're basically using the exact same setup.
When I've suggested this previously, it has been in combination with a proposal that we do some form of topological sorting of crates with their dependencies, to reduce the impact of auto-cleaning (chances are you'll have finished a bunch of crates that then don't need rebuilding). More interestingly, one could imagine selectively cleaning out dependencies when they're done (e.g. once you're done with serde version 1.0.9, you clean just that).
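A sketch of that selective cleanup, assuming Cargo's current (internal, unstable) target-dir layout where a unit's files in `.fingerprint/`, `build/`, and `deps/` share a `<name>-<hash>` stem. Note that matching on the name alone catches every version of the crate; mapping a specific version like 1.0.9 to its hash would need Cargo's metadata:

```rust
use std::{fs, io, path::Path};

/// Remove every entry under target/<profile>/{.fingerprint,build,deps}
/// whose file name contains `stem` (e.g. "serde-"). Relies on Cargo's
/// current `<name>-<hash>` naming, which is an internal detail.
fn clean_unit(profile_dir: &Path, stem: &str) -> io::Result<()> {
    for sub in [".fingerprint", "build", "deps"] {
        let dir = profile_dir.join(sub);
        if !dir.is_dir() {
            continue;
        }
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            if entry.file_name().to_string_lossy().contains(stem) {
                let path = entry.path();
                if path.is_dir() {
                    fs::remove_dir_all(&path)?;
                } else {
                    fs::remove_file(&path)?;
                }
            }
        }
    }
    Ok(())
}
```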
On Discord, @aidanhs said: "I can say that crater doesn't put the target directory in the image so there's no CoW stuff. In fact I think the crater image is entirely read-only and used for linking libraries." This suggests that a quick experiment to see how big the overhead is with the existing flag may be worth it, if someone has the time. I like the idea of a Crater graph-aware solution; I think it can build on #193. I can volunteer to help insofar as it involves adding to the "target folder GC" ecosystem. (I am mostly interested in helping with that; Crater is just an example at the moment.) One thought I had: if Crater is walking over a topological sort of the things to build, and knows it is done with all the things it built before "foo", then it can use cargo-sweep with the exact time it built "foo". Or even just delete all files with a creation time before "foo", I think.
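That "delete everything older than the moment we built foo" idea is a small recursive walk. A minimal std-only sketch (it compares mtime rather than creation time, since mtime is what the Cargo change above maintains and creation time isn't available on all filesystems):

```rust
use std::{fs, io, path::Path, time::SystemTime};

/// Delete every file under `dir` last modified before `cutoff`,
/// e.g. the instant the topologically-last crate "foo" was built.
fn sweep_older_than(dir: &Path, cutoff: SystemTime) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_dir() {
            sweep_older_than(&entry.path(), cutoff)?;
        } else if meta.modified()? < cutoff {
            fs::remove_file(entry.path())?;
        }
    }
    Ok(())
}
```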
discord: @pietroalbini at 12:21 PM
At the moment the target directories used by Crater don't have a size limit, so they grow to hundreds of gigabytes, forcing us to put 4 TB disks on the agents. We should implement a way to keep them within a configurable size.
@aidanhs suggested removing them after the configured size is reached. This will slow down the runs a lot, but if we keep the max size high enough we may only need to clear them once or twice during a run, which shouldn't hurt the speed too much.
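Putting the pieces together, the requested behaviour amounts to an LRU purge: once the target dir exceeds the configured max size, delete the least-recently-used files until it fits. A hypothetical std-only sketch (oldest mtime first; a real implementation would want the mtime data maintained by the Cargo PR discussed above):

```rust
use std::{fs, io, path::{Path, PathBuf}, time::SystemTime};

/// Collect (mtime, size, path) for every file under `dir`.
fn collect(dir: &Path, out: &mut Vec<(SystemTime, u64, PathBuf)>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_dir() {
            collect(&entry.path(), out)?;
        } else {
            out.push((meta.modified()?, meta.len(), entry.path()));
        }
    }
    Ok(())
}

/// Delete least-recently-modified files until the total is <= max_bytes.
fn purge_to_size(dir: &Path, max_bytes: u64) -> io::Result<()> {
    let mut files = Vec::new();
    collect(dir, &mut files)?;
    let mut total: u64 = files.iter().map(|f| f.1).sum();
    files.sort(); // oldest mtime first
    for (_, size, path) in files {
        if total <= max_bytes {
            break;
        }
        fs::remove_file(&path)?;
        total -= size;
    }
    Ok(())
}
```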