-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
irmin-pack: introduce chunked suffix abstraction #2115
Conversation
c931439
to
d5d13a0
Compare
Codecov Report
@@ Coverage Diff @@
## main #2115 +/- ##
==========================================
+ Coverage 64.77% 64.86% +0.08%
==========================================
Files 133 134 +1
Lines 15890 15946 +56
==========================================
+ Hits 10293 10343 +50
- Misses 5597 5603 +6
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, LGTM!
Ah, CI is failing because of |
d5d13a0
to
a602ce2
Compare
a602ce2
to
b70ae53
Compare
{ inventory } | ||
|
||
let appendable_ao t = (Inventory.appendable t.inventory).ao | ||
let end_poff t = appendable_ao t |> Ao.end_poff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is wrong - the end_poff
is used upstreamed as the end poff of the whole suffix abstraction, but here it returns only the end poff of the last chunk. I should be the sum over all chunks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bringing this up. I think this is worth some discussion.
The way I understand how the suffix's poff
is currently used in the control file is as a control mechanism to know/verify what data we have written to disk (as is seen in the append only file's consistency check). In this sense, the current code is correct. It tracks the physical offset of the appendable chunk for consistency checks. No other chunk can be changed so for other chunks their end_poff
is necessarily equivalent to their Io.size_of_path
(the awkward code in Ao_chunk.open_ro
that does exactly this).
Summing the values would give the chunked suffix's length and final offset but doesn't really correlate with a physical offset
While working on this I intentionally wanted to keep the existing code working with minimal extra changes but I do think we need to change this. Here is my proposal for a future PR:
- rename control file field to
suffix_consistency_poff
(or something else -- open to ideas!) - rename
end_poff
in chunked file toconsistency_poff
- move consistency checking up from append only to chunked suffix and remove
end_poff
from append only'sapiopen api (it would still be an available property)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dispatcher needs to know the end offset for the suffix file in order to verify that a read is within bounds. This does not correspond to a physical offset, indeed. It's "a virtual offset corresponding to the physical end offset, if the suffix was a single file". I'll call it suffix_end_off
.
We can either (I am reiterating what you say):
- keep
end_poff
as is, keep the control file as is (tracking theend_poff
of the last chunk), but then we need a function to compute thesuffix_end_off
- this solution works for me, I do find the concept ofsuffix_end_off
a bit weird. - replace the
end_poff
in the chunked file and the control file with thesuffix_end_off
- I'm not sure if this is what you are proposing for a future PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I just looked in the dispatcher and see its usage of Suffix.end_poff
-- this does need to be updated to be the suffix end offset. Before chunks, this makes sense as an end_poff
, but there is no "end poff" for a chunked suffix. I don't necessarily love adding another virutalized set of offsets, but the suffix's offset range will necessarily be virtualized for chunks since it doesn't map directly to a single range of physical offsets any longer. And it is neatly contained within Chunked_suffix
which helps reduce the scope of offsets to consider.
Thanks for pointing out dispatcher's use of the end offset for the suffix. I think the following proposal addresses everything:
- Add
end_off
toChunked_suffix
to represent the last offset of the suffix. This can be a sum of the chunks poffs but I don't think tracking it in the control file is needed. - Rename
Chunked_suffix.end_poff
toconsistency_poff
(or something else -- but I think "end poff" doesn't make sense in the context of a chunked suffix) and also the control file field to reflect the name change. This is the offset we need to track in the control file for consistency checks since we just need to check the appendable chunk's poff upon open. As a part of this, also move the consistency check from append only to chunked suffix (like mentioned previously).
I will make the first change in this PR since you rightly pointed out a correctness issue.
The second one can be a future PR since it is mostly a cosmetic name change, but it will also get rid of some of the awkward end_poff
code (like in Ao_chunk.open_ro
) which will be nice. Edit: on second glance, I see that we set let persisted_end_poff = end_poff
directly in append only so perhaps this needs more consideration (immediate thought is to use Io
for the non-appendable chunks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make the first change in this PR since you rightly pointed out a correctness issue.
it does not need to be in this PR necessarily, as there is nothing that breaks here. I included a proposal for this in #2118, you can either review it there or add your own here, as you wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I missed that. I'll just review in your PR since you have already made the change. 👍
c55a987
to
a4cfd81
Compare
@metanivek can you add a changelog entry to mention that the file disk format changed to chunk files? |
43929d1
to
8f2229f
Compare
This is ready for a final review before merging. I have resolved all of the comments that I have addressed in the PR. I will add the list of things to revisit to #2096 once this merges. |
a1204da
to
88b6888
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Ship it!
9026a9a
to
47f95af
Compare
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.5.0) CHANGES: ### Added - **irmin-pack** - Add `Irmin_pack_unix.Stats.Latest_gc` which is now the parameter of GC's `finished` callback (mirage/irmin#2089, @Ngoguey42) - Add `Gc.oldest_live_commit` which returns the key of the commit on which the latest gc was called on. (mirage/irmin#2110, @icristescu) - Add `split` to create a new suffix chunk. Subsequent writes will append to this chunk until `split` is called again. (mirage/irmin#2118, @icristescu) - Add `create_one_commit_store` to create a new store from the existing one, containing only one commit. (mirage/irmin#2125, @icristescu) ### Changed - **irmin-pack** - Upgraded on-disk format to version 4. (mirage/irmin#2110, @icristescu) - Detecting control file corruption with a checksum (mirage/irmin#2119, @art-w) - Change on-disk layout of the suffix from a single file to a multiple, chunked file design (mirage/irmin#2115, @metanivek) - Modify GC to work with new chunked suffix. See `examples/gc.ml` for a demonstration of how it works with the new `split` function. (mirage/irmin#2126, @metanivek) ### Fixed
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.5.0) CHANGES: ### Added - **irmin-pack** - Add `Irmin_pack_unix.Stats.Latest_gc` which is now the parameter of GC's `finished` callback (mirage/irmin#2089, @Ngoguey42) - Add `Gc.oldest_live_commit` which returns the key of the commit on which the latest gc was called on. (mirage/irmin#2110, @icristescu) - Add `split` to create a new suffix chunk. Subsequent writes will append to this chunk until `split` is called again. (mirage/irmin#2118, @icristescu) - Add `create_one_commit_store` to create a new store from the existing one, containing only one commit. (mirage/irmin#2125, @icristescu) ### Changed - **irmin-pack** - Upgraded on-disk format to version 4. (mirage/irmin#2110, @icristescu) - Detecting control file corruption with a checksum (mirage/irmin#2119, @art-w) - Change on-disk layout of the suffix from a single file to a multiple, chunked file design (mirage/irmin#2115, @metanivek) - Modify GC to work with new chunked suffix. See `examples/gc.ml` for a demonstration of how it works with the new `split` function. (mirage/irmin#2126, @metanivek) ### Fixed
This PR is part of #2096 and introduces a chunked suffix abstraction that works with our current single suffix world. The goal with this change is to introduce the abstraction as a step towards multiple chunks and modifying our GC to work with chunks.