-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fsck: Mark commits with missing or deleted object partial #1533
Conversation
For this I'd initially thought of moving the guts of fsck into the library, much like we did in 5c94098 Adding new public API for commit parents feels somewhat special case to this...or, hm, maybe someone would want to know "what commits refer to this object" for space reduction purposes. |
old_parents = g_hash_table_lookup (inout_parents, key); | ||
if (old_parents == NULL) | ||
{ | ||
/* For the common case of a single pointer we skip using an array to save memory. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One worry I have with this is that there's also going to be a class of objects which are basically in every single commit. It feels like we may want to distinguish between client-side versus server-side repositories.
In the server side case, if I have say 100 commits of history, and I'm taking the same base component set across my builds (say bash
binaries) or whatever, then every one of those objects is going to point to all of the parents.
Throwing together some numbers, currently:
# ostree ls -R fedora-27:fedora/27/x86_64/workstation | wc -l
129280
Say we have server side 100 commits (not unreasonable?) and between them about 75% of the files are in common...that's about:
gjs> (((100000*0.75)*100)*8)/(1024*1024)
57.220458984375
57 megs of pointer data. Not a big deal I guess. I just wanted to work through it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, the flathub repo currently has 2822 refs, and 7530 commits. However, we don't share anywhere near 75% of files, because apps are different, and we don't keep long histories.
There are certain objects that are very common though, like the "default" dirmeta, which will be in essentially every commit... Not going to be a lot of these though.
src/libostree/ostree-repo-traverse.c
Outdated
g_autofree GVariant **new_parents = NULL; | ||
gsize len, i; | ||
|
||
len = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor but would you mind using declare-and-initialize for new code?
Yeah, for historical reasons the API with a separate hashtable is a bit ugly, but it seems like the backtracking could be generally useful even outside fsck. |
☔ The latest upstream changes (presumably eb506c7) made this pull request unmergeable. Please resolve the merge conflicts. |
This is a version of ostree_repo_traverse_commit_union that also remembers where the objects came from, by recording the parent relationships in a hashtable. This can be used to later find which commits each object was from, which we want to use in fsck.
This means we can later use various operations to heal the repository because ostree does not assume all objects are there. This the begining of a fix for ostreedev#345
bc6959f
to
b4da9c9
Compare
Rebased 🏄♂️ |
I also added a test to One thing I hit on is that this only works for |
☀️ Test successful - status-atomicjenkins |
@@ -30,6 +30,20 @@ ostree --repo=bare-repo init --mode=bare-user | |||
ostree --repo=bare-repo remote add origin --set=gpg-verify=false $(cat ${test_tmpdir}/httpd-address) | |||
log_timestamps ostree --repo=bare-repo pull --disable-static-deltas origin ${host_nonremoteref} | |||
|
|||
echo "ok pull" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to update some number-of-tests count when you add a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests aren't yet being parsed by something that understands TAP...someday later.
This means we can later use various operations to heal the repository because ostree does not assume all objects are there.
This the begining of a fix for #345