Add a garbage collection mechanism to the CLI #1217

Merged: 1 commit, Mar 21, 2024


charliermarsh
Member

Summary

Detects unused cache entries, which can come in a few forms:

  1. Directories that are outdated under our versioning scheme.
  2. Old source distribution builds (i.e., we have a more recent version).
  3. Old wheels (stored in archive-v0, but not symlinked to from anywhere else in the cache).

Closes #1059.
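
To make the third form concrete, here is a minimal sketch of how unreferenced archive entries could be detected. It is not the code in this PR. It assumes that a wheel counts as "in use" when some symlink under the cache root resolves to its archive directory; the function names are hypothetical.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect the canonical targets of every symlink under `dir`.
/// (Hypothetical helper, not part of puffin-cache.)
fn collect_symlink_targets(dir: &Path, targets: &mut HashSet<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // `DirEntry::file_type` does not follow symlinks, so links are
        // reported as links and we never recurse through them.
        let file_type = entry.file_type()?;
        if file_type.is_symlink() {
            // A dangling link fails to canonicalize; it references nothing.
            if let Ok(target) = fs::canonicalize(&path) {
                targets.insert(target);
            }
        } else if file_type.is_dir() {
            collect_symlink_targets(&path, targets)?;
        }
    }
    Ok(())
}

/// Return the entries of `archive` that no symlink under `cache_root`
/// resolves to, sorted by path.
fn unreferenced_archives(cache_root: &Path, archive: &Path) -> io::Result<Vec<PathBuf>> {
    let mut targets = HashSet::new();
    collect_symlink_targets(cache_root, &mut targets)?;

    let mut unused = Vec::new();
    for entry in fs::read_dir(archive)? {
        let path = fs::canonicalize(entry?.path())?;
        if !targets.contains(&path) {
            unused.push(path);
        }
    }
    unused.sort();
    Ok(unused)
}
```

The sketch only reports candidates. Keeping detection separate from deletion makes it easier to print a summary of what would be removed before touching the cache.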

@charliermarsh charliermarsh added the enhancement New feature or improvement to existing functionality label Feb 1, 2024
Member Author

This phase is pretty bad: it's tightly coupled to the cache, but awkwardly so, in that it doesn't read the manifest files that point to the latest entry. It just looks for the most recent directory and deletes all the others.
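
For reference, the "keep the most recent directory" heuristic described here looks roughly like the sketch below. It is an illustration, not the code under review: it assumes "most recent" means latest modification time, and both function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// List the subdirectories of `dir` along with their modification times.
fn list_builds(dir: &Path) -> io::Result<Vec<(PathBuf, SystemTime)>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let metadata = entry.metadata()?;
        if metadata.is_dir() {
            entries.push((entry.path(), metadata.modified()?));
        }
    }
    Ok(entries)
}

/// Given `(path, modified)` pairs for sibling build directories, return
/// every path except the most recently modified one.
fn stale_builds(mut entries: Vec<(PathBuf, SystemTime)>) -> Vec<PathBuf> {
    // Newest first; ties are broken by path so the result is deterministic.
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    entries.into_iter().skip(1).map(|(path, _)| path).collect()
}
```

The weakness is visible in the signature: nothing here consults a manifest, so if the newest directory is not the one the manifest points to, the live entry ends up in the stale list.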

Member

Is there a reason to not read the manifest files?

I think being tightly coupled here is okay, since this is the puffin-cache crate after all. :)

(I do think our cache representation is a little leaky, but that's a problem for another day.)

Member Author

One reason is that the manifest files and readers are all in puffin-distribution (which depends on the cache crate), but this code is in the cache crate itself :(

It might be necessary though. Removing the "wrong" directories here will break the cache, which is really bad.

@charliermarsh
Member Author

I'll also add some tests for this. Parts of it are really tedious to test, but it's probably important.

Member

@BurntSushi BurntSushi left a comment

Nice!

I can see how testing this would be annoying. Maybe one way is to just explicitly build cache directories and then run prune and assert the expected result. But then your tests are tightly coupled with the cache representation.
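
A rough sketch of that style of test, using only the standard library, might look like the following. Everything in it is an assumption made for illustration: the `prune` function is a stand-in that handles only outdated versioned directories, the `<name>-v<N>` layout is inferred from `archive-v0`, and the `CURRENT` table and the `wheels` bucket name are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical table of live buckets: (name, current version).
const CURRENT: &[(&str, u32)] = &[("archive", 0), ("wheels", 1)];

/// Split a directory name of the form `<name>-v<N>` into its parts.
fn parse_bucket(name: &str) -> Option<(&str, u32)> {
    let (bucket, version) = name.rsplit_once("-v")?;
    Some((bucket, version.parse().ok()?))
}

/// Stand-in for the real prune: remove top-level directories of known
/// buckets whose version is not current, and return what was removed.
fn prune(cache_root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut removed = Vec::new();
    for entry in fs::read_dir(cache_root)? {
        let entry = entry?;
        if !entry.file_type()?.is_dir() {
            continue;
        }
        let file_name = entry.file_name();
        let parsed = file_name.to_str().and_then(parse_bucket);
        if let Some((bucket, version)) = parsed {
            let is_known = CURRENT.iter().any(|&(b, _)| b == bucket);
            let is_current = CURRENT.iter().any(|&(b, v)| b == bucket && v == version);
            // Unknown directories are left alone rather than guessed at.
            if is_known && !is_current {
                fs::remove_dir_all(entry.path())?;
                removed.push(entry.path());
            }
        }
    }
    removed.sort();
    Ok(removed)
}

/// Create the given empty directories under a fresh temporary root.
fn build_cache(label: &str, dirs: &[&str]) -> io::Result<PathBuf> {
    let root = std::env::temp_dir().join(format!("cache-gc-{}-{}", label, std::process::id()));
    if root.exists() {
        fs::remove_dir_all(&root)?;
    }
    for dir in dirs {
        fs::create_dir_all(root.join(dir))?;
    }
    Ok(root)
}

#[test]
fn prune_removes_only_outdated_buckets() -> io::Result<()> {
    let root = build_cache("test", &["archive-v0", "wheels-v0", "wheels-v1", "unrelated"])?;
    let removed = prune(&root)?;
    assert_eq!(removed, vec![root.join("wheels-v0")]);
    assert!(root.join("archive-v0").is_dir());
    assert!(root.join("wheels-v1").is_dir());
    assert!(root.join("unrelated").is_dir());
    fs::remove_dir_all(&root)
}
```

As noted, the directory names in the test mirror the cache layout, so the test has to change whenever the layout does. Having `prune` return the removed paths at least keeps the assertions short.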

crates/puffin-cache/src/lib.rs (outdated, resolved)
crates/puffin-cache/src/lib.rs (outdated, resolved)
crates/puffin/src/commands/clean.rs (outdated, resolved)
@charliermarsh charliermarsh force-pushed the charlie/gc branch 5 times, most recently from 5707e88 to 61d2e4c Compare March 21, 2024 17:55
@charliermarsh
Member Author

I'm merging this for now without "Old source distribution builds (i.e., we have a more recent version)." That part is more complex and was delaying the PR, but what's here is already useful.

@charliermarsh charliermarsh added the cli Related to the command line interface label Mar 21, 2024
@charliermarsh
Member Author

I'm not sure why "CI / check system | python3.13 on windows (pull_request)" is now failing, but it shouldn't be related to this PR. I'll follow up separately.

@charliermarsh charliermarsh merged commit 0f96386 into main Mar 21, 2024
30 of 31 checks passed
@charliermarsh charliermarsh deleted the charlie/gc branch March 21, 2024 18:07
Linked issue: Add a garbage collection command for the cache
2 participants