More size-efficient pixi.lock (with care to be optimized for git packfile). #1509
Comments
Do you have an idea of how we could change the format to achieve this? I think one of the biggest issues is the presence of SHA hashes, because those compress terribly. We tried to minimize the places where they occur for that reason.
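To make the point about hashes concrete, here is a minimal sketch that compares how well zlib (the compressor git uses for objects) handles hex digests versus repetitive structured text. The package URLs are made up for illustration and are not taken from a real pixi.lock; the exact ratios will vary with the input.

```python
import hashlib
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# 1000 hex-encoded SHA-256 digests, one per line (stand-ins for lock-file hashes).
hashes = "\n".join(
    hashlib.sha256(str(i).encode()).hexdigest() for i in range(1000)
).encode()

# 1000 repetitive lines (hypothetical package URLs, for illustration only).
urls = "\n".join(
    f"https://conda.anaconda.org/conda-forge/linux-64/pkg-{i}.conda"
    for i in range(1000)
).encode()

hash_ratio = compression_ratio(hashes)
url_ratio = compression_ratio(urls)
print(f"hashes: {hash_ratio:.2f}  urls: {url_ratio:.2f}")
```

A hex digest carries 4 bits of entropy per 8-bit character, so even an ideal compressor cannot get much below half the original size, while the URL lines shrink to a small fraction.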
Not off the top of my head -- I definitely acknowledge it's a hard problem, so I was at least somewhat relieved to find that it still compressed reasonably well in the packfile. Looking at the file itself, it seems like there is still a lot of meta information that is redundant with the package management metadata from conda/PyPI as well. From an information theory perspective, does the lockfile need to have more than a table of: All the information about the kind of package, where to find it, its own transitive dependencies, etc. seems like it could be re-computed from the Maybe there need to be two files here? One is a strict minimal
There has been some talk in the Python packaging space along these lines, where the lock file would still need to be rendered into a suitable format. Could be food for thought :)
We are facing much the same issue when we intend to use
In In So the "environment" part of the I can clearly understand that in other cases you might need more information to have a fully reproducible and secure environment. Maybe there is a way to choose where to draw the line depending on the underlying use case.
If this is possible, it could be one of the trade-offs where less information is stored, at the small cost of recomputing some of it when CI is triggered.
As discussed on Discord, I'm also curious what we could gain by deduplicating some of the common prefixes.
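One way to picture prefix deduplication is the sketch below. It is only an illustration of the idea, not the pixi.lock format: the helper names `group_by_prefix` and `expand` and the example URLs are hypothetical. Whether this actually helps in the packfile is untested here, since zlib already handles repeated prefixes fairly well.

```python
from collections import defaultdict


def group_by_prefix(urls):
    """Group URLs by everything up to the last '/', keeping only the file names."""
    grouped = defaultdict(list)
    for url in urls:
        prefix, sep, name = url.rpartition("/")
        grouped[prefix + sep].append(name)
    return dict(grouped)


def expand(grouped):
    """Rebuild the full URL list from the grouped form."""
    return [prefix + name for prefix, names in grouped.items() for name in names]


# Hypothetical package URLs, for illustration only.
urls = [
    "https://conda.anaconda.org/conda-forge/linux-64/aaa-1.0-0.conda",
    "https://conda.anaconda.org/conda-forge/linux-64/bbb-2.1-0.conda",
    "https://conda.anaconda.org/conda-forge/noarch/ccc-0.3-0.conda",
]

grouped = group_by_prefix(urls)
restored = expand(grouped)
print(grouped)
```

The grouped form stores each channel and platform prefix once, and `expand` shows that the original list can be recovered without loss.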
BTW @jleibs, what commands are you using to compare sizes in the packfile?
By saving only one of those two hashes, we reduce the lock file size a bit. This should be especially noticeable in git repos, since hashes compress poorly. This PR is a contribution to improving prefix-dev/pixi#1509
Problem description
The pixi.lock file in our repository is now over 1 megabyte (https://github.com/rerun-io/rerun/blob/main/pixi.lock).
It still compresses reasonably well in git object storage within the packfile (taking up about 3MB of storage across history), but it is fast becoming a meaningful contributor to repository growth.
This is a tricky one to do something about, as we ultimately care more about the contribution to the delta-compressed packfile than about the actual size of the file on disk. Compression strategies that make the file smaller in a single checkout but harder to compress would still be a net negative.
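The trade-off can be illustrated with a rough sketch. It is not a measurement of git itself: `delta_cost` is a hypothetical proxy that deflates the new version with the old version as a preset dictionary, and the toy lock file is generated rather than taken from a real pixi.lock. It compares the cost of storing a small update when the file is kept as plain text versus when each version is compressed before being committed.

```python
import hashlib
import zlib


def make_lock(versions):
    """Build a toy lock file: one line per package with a SHA-256 of its version."""
    return "".join(
        f"pkg-{i}: sha256={hashlib.sha256(f'{i}-{v}'.encode()).hexdigest()}\n"
        for i, v in enumerate(versions)
    ).encode()


def delta_cost(old: bytes, new: bytes) -> int:
    """Proxy for a delta: size of `new` deflated with `old` as a preset dictionary."""
    comp = zlib.compressobj(
        9, zlib.DEFLATED, zlib.MAX_WBITS,
        zlib.DEF_MEM_LEVEL, zlib.Z_DEFAULT_STRATEGY, old,
    )
    return len(comp.compress(new) + comp.flush())


versions = [1] * 300
old = make_lock(versions)
for i in range(0, 300, 50):  # bump six of the 300 packages
    versions[i] = 2
new = make_lock(versions)

# Plain text: most of the new version is a copy of the old one.
plain_delta = delta_cost(old, new)
# Pre-compressed: the two byte streams share far less, so less can be reused.
packed_delta = delta_cost(zlib.compress(old, 9), zlib.compress(new, 9))
print(f"plain: {plain_delta} bytes  pre-compressed: {packed_delta} bytes")
```

In this toy setup the pre-compressed versions are smaller on disk, but the second version costs more to store on top of the first, which is the net-negative case described above.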