New version of ocaml-git #395
Merged
It seems that the current state of this PR works perfectly with
And I think it's done, but this PR seems to be a good bedrock for the release of
- ENCODER / DECODER
  These interfaces are used to describe a non-blocking interface. From a new
  perspective, these interfaces are useless since `angstrom` and `encore` are
  sufficient to obtain a non-blocking encoder/decoder (see [Angstrom.state] and
  [Encore.Lavoisier.state])
- META
  A special interface (with an argument) to expose the format of a Git object
  from a /meta/ syntax (the module argument). However, since `encore.0.6`, this
  approach is not used anymore (a GADT is used instead)
- DESC
  A description of the produced format from a /meta/ decoder language such as
  `angstrom` or a /meta/ encoder language such as `encore.lavoisier`. Since
  `encore.0.6`, this interface is not used anymore. Instead, each Git object
  should provide a [val format : t Encore.t] from which an `angstrom` parser or
  a `lavoisier` encoder can be derived.
- INFLATE / DEFLATE
  Because Zlib is used by a middle layer of Git (e.g. the PACK file), it's
  unnecessary to /functorize/ `ocaml-git` over these interfaces.
- FILE / DIR / MAPPER / FS
  The first entry point of this PR: deleting the need for an implementation of a
  file-system. As a result, such POSIX-close interfaces must be removed!
- null: [= digest_string ""], this value is used to initialise some values such
  as a [Hash.t array] with an impossible value (Git should __never__ create a
  Git object with the hash [digest_string ""])
- length: a better name for [digest_size]
- feed: be able to feed a [Bigstringaf.t] value
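
As a rough illustration of `null` and `length`, here is a minimal sketch. It assumes the standard library's `Digest` (MD5) as a stand-in for the real hash implementation; the module name `Hash`, the `slots` array and `is_free` are hypothetical, and `feed` is left out because it needs `Bigstringaf`.

```ocaml
(* Sketch only: stdlib [Digest] (MD5) stands in for the real hash. *)
module Hash = struct
  type t = Digest.t

  let digest_string : string -> t = Digest.string

  (* The "impossible" value: the digest of the empty string. *)
  let null : t = digest_string ""

  (* Size of a digest in bytes (the role played by [digest_size]). *)
  let length = String.length null

  let equal : t -> t -> bool = String.equal
end

(* Pre-fill an array with [null], then recognise unused slots. *)
let slots : Hash.t array = Array.make 4 Hash.null
let is_free slot = Hash.equal slot Hash.null
```

With MD5 as the stand-in, `Hash.length` is 16 bytes; a SHA-1 implementation would give 20.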
204 commits later, I think it's done.
bb93091 to
2e518b0
Compare
The interface will include only:
- S.DIGEST
- S.BASE
The interface defines only one new type [hash], and [Make] constrains it. A type [t] exists outside the /functor/ and it is reused by the /functor/.
We define a type [t] which represents the Git Blob object. Independently of the hash implementation used, the object exists. Some functions are exposed to manipulate it (with documentation).
The interface will include only:
- S.DIGEST
- S.BASE
- Some functions to manipulate a tree
- a [format] value which describes the format of a Git Tree object
The interface defines only one new type [hash], and [Make] constrains it. A type [t] exists outside the /functor/ and it is reused by the /functor/.
We define a type [t] which represents the Git Tree object. Independently of the hash implementation used, the object exists (it is parameterized by ['hash]). Some functions are exposed to manipulate it.
The description of the Tree format is represented by an [encore]'s value [format] (and exposed by the interface). This patch is a translation from the /meta-syntax/ to [encore]'s combinators.
This module is a replacement for the old [Helper] module. It provides a way to calculate the hash of a Git object given by its OCaml representation. The way to calculate the hash is:
1) serialise the Git object
2) start with a /header/ ([kind length\000])
3) feed the context with the serialised Git object
A Git object can be big. Instead of serialising it entirely, we /stream/ it to limit the memory footprint. NOTE: the memory footprint is __not__ really limited (see [test/tree/test] to understand why) but, at least, the serialised value is cut into many /small/ [string]s.
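
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: `Toy_hash` is a trivial incremental hash standing in for the real digest (it is not SHA-1), and the names `header`, `digest` and `stream_of_list` are hypothetical, not the module's actual API.

```ocaml
(* Toy incremental hash, a stand-in for the real digest context. *)
module Toy_hash = struct
  type ctx = int

  let empty : ctx = 5381

  (* The state is a single integer, so feeding chunk by chunk gives
     the same result as feeding the concatenation. *)
  let feed (ctx : ctx) (s : string) : ctx =
    let h = ref ctx in
    String.iter (fun c -> h := ((!h * 33) + Char.code c) land max_int) s;
    !h

  let get (ctx : ctx) : string = Printf.sprintf "%x" ctx
end

let kind_to_string = function
  | `Blob -> "blob"
  | `Tree -> "tree"
  | `Commit -> "commit"
  | `Tag -> "tag"

(* The header: [kind length\000]. *)
let header ~kind ~length =
  Printf.sprintf "%s %d\000" (kind_to_string kind) length

(* [stream] yields the serialised object as small chunks, then [None]. *)
let digest ~kind ~length (stream : unit -> string option) =
  let rec go ctx =
    match stream () with
    | None -> Toy_hash.get ctx
    | Some chunk -> go (Toy_hash.feed ctx chunk)
  in
  go (Toy_hash.feed Toy_hash.empty (header ~kind ~length))

(* Helper for the example: turn a list of chunks into a stream. *)
let stream_of_list chunks =
  let rest = ref chunks in
  fun () ->
    match !rest with
    | [] -> None
    | x :: r ->
      rest := r;
      Some x

let one_shot = digest ~kind:`Blob ~length:11 (stream_of_list [ "hello world" ])
let chunked =
  digest ~kind:`Blob ~length:11 (stream_of_list [ "hello"; " "; "world" ])
```

Both calls produce the same value: the way the serialised object is cut into chunks does not change the hash, which is what makes streaming possible.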
NOTE we don't need temporary buffers anymore to calculate the hash of a Git object.
This was referenced Sep 2, 2020
dinosaure added a commit to dinosaure/opam-repository that referenced this pull request Jan 9, 2021
… git-unix (3.0.0)
CHANGES:
- Rewrite of `ocaml-git` (@dinosaure, mirage/ocaml-git#395)
- Delete useless constraints on digestif's signature (@dinosaure, mirage/ocaml-git#399)
- Add support of CoHTTP with UNIX and MirageOS (@ulugbekna, mirage/ocaml-git#400)
- Add progress reporting on fetch command (@ulugbekna, mirage/ocaml-git#405)
- Lint dependencies on packages (`git-cohttp-unix` and `git-cohttp-mirage`) and update to the last version of CoHTTP (@hannesm, mirage/ocaml-git#407)
- Fix internal `Cstruct_append` implementation (@dinosaure, mirage/ocaml-git#401)
- Implement shallow commit (@dinosaure, mirage/ocaml-git#402)
- Update to `conduit.3.0.0` (@dinosaure, mirage/ocaml-git#408) (deleted by the integration of `mimic`)
- Delete use of `ocurl` (@dinosaure, mirage/ocaml-git#410)
- Delete the useless **old** `git-mirage` package (@hannesm, mirage/ocaml-git#411)
- Fix about unresolved endpoint with `conduit.3.0.0` (@dinosaure, mirage/ocaml-git#412)
- Refactors fetch command (@ulugbekna, mirage/ocaml-git#404)
- Fix ephemerons about temporary devices (@dinosaure, mirage/ocaml-git#413)
- Implementation of `ogit-fetch` as an example (@ulugbekna, mirage/ocaml-git#406)
- Rename `nss` to `git-nss` (@dinosaure, mirage/ocaml-git#415)
- Refactors `git-nss` (@ulugbekna, mirage/ocaml-git#416)
- Update README.md (@ulugbekna, mirage/ocaml-git#417)
- Replace deprecated `Fmt` functions (@ulugbekna, mirage/ocaml-git#421)
- Delete physical equality (@ulugbekna, mirage/ocaml-git#422)
- Rename `prelude` argument by `uses_git_transport` (@ulugbekna, mirage/ocaml-git#423)
- Refactors Smart decoder (@ulugbekna, mirage/ocaml-git#424)
- Constraint to use `fmt.0.8.7` (@dinosaure, mirage/ocaml-git#425)
- Small refactors in `git-nss` (@dinosaure, mirage/ocaml-git#427)
- Delete `conduit.3.0.0` and replace it by `mimic` (@dinosaure, mirage/ocaml-git#428)
- Delete the useless `verify` function on `fetch` and `push` (@dinosaure, mirage/ocaml-git#429)
- Delete `pin-depends` on `awa` (@dinosaure, mirage/ocaml-git#431)
The new version of `ocaml-git`
The initial goal of this PR is to MirageOS-ize `ocaml-git`. Indeed, if you
look at the details, the current implementation of `ocaml-git` used by MirageOS
is the `Mem` implementation, which is a simple `Hashtbl.t` (and only needs a
hash algorithm and the caml-runtime).
This PR wants to provide another possible way to use `ocaml-git` with MirageOS.
So, the main problem is the implementation needed to `Make` a Git store, which
is currently too POSIX-compliant.
Side-effect
However, the PR takes the opportunity to update and fix bugs which are
intrinsic:
Underlying needed layout
Maybe 2 years ago, I started to think of the Git store as 2 spaces where:
- the first one should contain recent objects (possibly volatile)
  Also known as loose objects: these objects take advantage of the
  underlying file-system to store/search Git objects. The layout is close to a
  simple radix tree over the hex-representation of the used hash algorithm
  where:
- the second one should contain lifelong objects
  Also known as pack files, which contain several objects.
Let's talk about minor (loose) and major (pack) heaps. From that, what do these
spaces need for the `Unix` world and/or the MirageOS world?
For the minor heap, it should be simple as it needs:
Where `append`/`appendv` (atomic/non-atomic) create and fill the object `uid`
into `t` (which is a representation of the minor heap given by the user).
For the UNIX world, a machinery of several syscalls is needed (`stat`, `create`,
`write` and `close`) and for MirageOS, we are still able to use a simple
`Hashtbl.t` or something better (about memory consumption/performance). But the
real constraint to fit into both worlds is:
As we said, for the Unix world, Git considers the file-system as a radix tree
where paths (keys) are the hash of the Git object.
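
To make the `Hashtbl.t` option concrete, here is a minimal in-memory sketch of a minor heap. The signature below is hypothetical: only the names `append`, `appendv`, `uid` and `t` come from the text, while `create`, `exists`, `get` and every type are assumptions made for the example.

```ocaml
(* Hypothetical in-memory minor heap: one entry per object [uid]. *)
module Minor : sig
  type t
  type uid = string

  val create : unit -> t
  val exists : t -> uid -> bool

  (* Store the whole payload in one step. *)
  val append : t -> uid -> string -> unit

  (* Same as [append], but the payload is given as several chunks. *)
  val appendv : t -> uid -> string list -> unit
  val get : t -> uid -> string option
end = struct
  type uid = string
  type t = (uid, string) Hashtbl.t

  let create () = Hashtbl.create 0x100
  let exists t uid = Hashtbl.mem t uid

  (* Objects are addressed by their hash: writing twice is a no-op. *)
  let append t uid payload =
    if not (exists t uid) then Hashtbl.add t uid payload

  let appendv t uid chunks = append t uid (String.concat "" chunks)
  let get t uid = Hashtbl.find_opt t uid
end
```

A Unix implementation would replace the table with the `stat`/`create`/`write`/`close` sequence on the path derived from the `uid`.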
For the major heap, it is a bit more complex since we can have several PACK
files to store several objects. Then, the indexing of these objects is done by
an `*.idx` file.
So we can represent this space with:
With this interface, we assume that the creation of a PACK file (which contains
several Git objects) and the way to fill it do not need to be atomic (unlike
the minor heap).
This interface is close to POSIX (but less close than what we currently have).
However, we can consider this interface as an append-only interface. Again,
this interface can easily be implemented by a simple `Hashtbl.t` or something
better.
For the Unix world, we can take the opportunity to use `Unix.O_APPEND`.
With this new design, the `Store` implementation of Git can easily fit into
MirageOS without a huge requirement (unlike before, when a real file-system was
needed).
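
As a rough sketch of such an append-only space backed by a `Hashtbl.t`, each named file can be a growing buffer that supports only appending and reading a window. The module name `Major` and the functions `create`, `append` and `map` are assumptions for the example, not the interface of this PR.

```ocaml
(* Hypothetical append-only store: a named file is a growing buffer. *)
module Major = struct
  type t = (string, Buffer.t) Hashtbl.t

  let create () : t = Hashtbl.create 16

  (* Append [chunk] at the end of [name], creating it if needed. *)
  let append (t : t) name chunk =
    let buf =
      match Hashtbl.find_opt t name with
      | Some buf -> buf
      | None ->
        let buf = Buffer.create 0x1000 in
        Hashtbl.add t name buf;
        buf
    in
    Buffer.add_string buf chunk

  (* Read at most [len] bytes of [name] starting at [pos]. *)
  let map (t : t) name ~pos ~len =
    match Hashtbl.find_opt t name with
    | None -> None
    | Some buf ->
      let len = min len (Buffer.length buf - pos) in
      if pos < 0 || len < 0 then None else Some (Buffer.sub buf pos len)
end
```

Nothing here requires atomicity: a PACK file can be filled by several `append` calls, and readers only ever ask for a window of bytes.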
However, another space with some specific requirements exists. It's about the
way to store references in Git. In detail, this area is mutable (unlike
`Minor` and `Major`) and should ensure /atomicity/ when we want to test and
set a reference (similar to the CAS atomic operation).
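
A minimal sketch of a test-and-set over references, assuming an in-memory table and a single-threaded setting (a real store needs a lock or an atomic rename to guarantee atomicity). The function name `test_and_set` and its labelled arguments are hypothetical.

```ocaml
(* [test_and_set refs name ~test ~set] updates [name] only if its current
   value is [test]; [None] means "the reference does not exist". *)
let test_and_set (refs : (string, string) Hashtbl.t) name ~test ~set =
  let current = Hashtbl.find_opt refs name in
  if current = test then begin
    (match set with
     | Some value -> Hashtbl.replace refs name value
     | None -> Hashtbl.remove refs name);
    true
  end
  else false

let refs : (string, string) Hashtbl.t = Hashtbl.create 16

(* Create the reference only if it does not exist yet. *)
let created = test_and_set refs "refs/heads/main" ~test:None ~set:(Some "aaaa")

(* A second writer that still believes the reference is absent fails. *)
let raced = test_and_set refs "refs/heads/main" ~test:None ~set:(Some "bbbb")
```

The caller learns from the boolean whether its view of the reference was still current, and can retry with a fresh value if it was not.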
With all of these spaces, I think it's easier to localise an error and to trace
what a simple `Git.read`/`Git.write` really does over these spaces. `Git_unix`
provides these spaces according to the layout of a Git repository. And, even if
in reality these spaces work on a large common space (the file-system), we can
isolate them from each other if we want.
Newcomers
Carton
Maybe 2 or 3 years ago, the idea of extracting the design of the PACK file, to
make it usable by something other than Git, came up with the `carton` project.
After different iterations, the API was fixed one year ago and the integration
into `ocaml-git` was planned.
The main goal of this sub-project is:
it is the possibility to load a PACK file into a unikernel with
caravan and have another implementation of a read-only KV-store
for MirageOS.
By design, `carton` needs only the `map` syscall to read an object and the
`append` syscall to generate a PACK file. It takes the opportunity to test the
`type ('a, 's) io` more deeply (see limitation of such design, etc.) and it
seems clear that the result is good enough to:
`ocaml-git`
(not so) Minor updates
`carton` leads to the update of `decompress.1.0.0` and `duff.0.3` where:
- `decompress` fixed many bugs about the inflation/deflation
  and the process is faster than before. See these articles about `decompress`
- `duff` fixes the support of 32-bits to be able to use
  this library (and by transitivity `ocaml-git`) on some exotic architectures
Tests over the PACK file
Of course, due to the separation between Git's logic and the PACK file, we
are able to focus our tests on the format of the PACK file independently of Git
assumptions (format of Git objects, hash algorithm, layout of a Git repository).
Some fuzzers found in the official Git project were added to keep the same
assumptions and the update takes the opportunity to fix some bugs in the /PACK
engine/. All tests are available in the `test/carton/` directory.
The intrinsic possibility about `ocaml-git`
Due to the requirement of `carton` to be able to decode/encode a PACK file, the
new design on top of `carton` unlocks the ability to reduce the definition of
the Major heap to the signature given above.
Loose object
Because the question about the PACK file is now resolved by `carton`, we
can easily /formalise/ the way to extract a /loose/ object. Internally,
`ocaml-git` comes with a new sub-library `git.loose` which has 3 derivations:
- `git.loose-lwt`
- `git.loose-git`
- `git-unix.loose-unix`
This sub-library (like `carton`) unlocks the ability to shape this layout into
the Minor interface given above. Of course, it adds the ability, again, to test
this part of `ocaml-git` without Git assumptions, where the layout is only a
radix-tree of deflated objects.
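
To illustrate the radix-tree layout, here is a small sketch that maps the hex representation of a hash to a path. It assumes Git's usual fan-out, where the first two hex characters name a directory and the rest names the file; the function name `loose_path` is hypothetical.

```ocaml
(* Split a hex-encoded hash into "<2 chars>/<remaining chars>". *)
let loose_path hex =
  let len = String.length hex in
  if len < 3 then invalid_arg "loose_path: hash too short";
  String.sub hex 0 2 ^ "/" ^ String.sub hex 2 (len - 2)

(* Example with the SHA-1 of the empty blob. *)
let path = loose_path "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
```

The first level of the tree therefore has at most 256 entries, which keeps each directory small.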
Encore
A new release of `encore` is available where the API of this library
is better than before. The question of `encore` is: how to produce an encoder
and a decoder from a common description.
The new API takes advantage of GADTs to propose a DSL to describe a format.
From it, we are able to derive an `angstrom` parser or a `lavoisier`
encoder.
With this update, I did not get any regressions from the tests and the encoder
was simplified to focus on the initial goal of `encore`: ensure the
/isomorphism/ between the encoder and the decoder.
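
The idea of one description giving both directions can be shown with a toy GADT. This is a self-contained sketch and not the API of `encore`: the type `t`, its constructors and the functions `decode`, `encode` and `to_string` are all made up for the example.

```ocaml
(* A tiny format description: the type index is the decoded value. *)
type _ t =
  | Until : char -> string t (* a string ended by a delimiter *)
  | Int_until : char -> int t (* a decimal integer ended by a delimiter *)
  | Pair : 'a t * 'b t -> ('a * 'b) t

(* Decoder derived from the description: returns the value and the
   position just after it. *)
let rec decode : type a. a t -> string -> int -> (a * int) option =
 fun fmt input pos ->
  match fmt with
  | Until c -> (
    match String.index_from_opt input pos c with
    | None -> None
    | Some stop -> Some (String.sub input pos (stop - pos), stop + 1))
  | Int_until c -> (
    match decode (Until c) input pos with
    | None -> None
    | Some (digits, next) -> (
      match int_of_string_opt digits with
      | None -> None
      | Some n -> Some (n, next)))
  | Pair (a, b) -> (
    match decode a input pos with
    | None -> None
    | Some (x, pos) -> (
      match decode b input pos with
      | None -> None
      | Some (y, pos) -> Some ((x, y), pos)))

(* Encoder derived from the same description. *)
let rec encode : type a. a t -> Buffer.t -> a -> unit =
 fun fmt buf v ->
  match fmt with
  | Until c ->
    Buffer.add_string buf v;
    Buffer.add_char buf c
  | Int_until c ->
    Buffer.add_string buf (string_of_int v);
    Buffer.add_char buf c
  | Pair (a, b) ->
    let x, y = v in
    encode a buf x;
    encode b buf y

let to_string fmt v =
  let buf = Buffer.create 16 in
  encode fmt buf v;
  Buffer.contents buf

(* The object header [kind length\000], described once. *)
let header : (string * int) t = Pair (Until ' ', Int_until '\000')
```

Because both functions follow the same description, decoding what was encoded gives the value back, as long as the strings do not contain their delimiter.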
This update takes the opportunity to fix a bug in `ocaml-git` when we need
to extract a large object. A test was added to ensure
that we properly fix the problem.
Finally, the update of `encore` unlocks the ability to compile `ocaml-git` with
`js_of_ocaml` and fixes the issue about that.
Conduit
See conduit about that.
Not So Smart (nss)
Since version 2 of `ocaml-git`, I discovered several bugs in the way to
`push` or `pull` a Git repository. Even if `ocaml-git` works in most cases,
it appears that the negotiation engine does something wrong.
I decided to rewrite it and fix the problems in the negotiation engine.
Then, following the work from @hannesm, I decided to properly integrate a way
to use SSH (with `awa-ssh`). Of course, on this point, the new version of
`conduit` helps me to do what I want.
But the biggest change is to delete the duplication between the TCP, the HTTP
and the SSH implementations of the Smart protocol. Indeed, even if Git does the
same thing when it wants to `push`/`fetch`, some differing details exist and the
current version of `ocaml-git` already integrates some (incorrect) divergences
between the TCP and the HTTP implementations.
The goal was to restart from zero and focus on what the negotiation engine
really does, so that it can be used over any layered protocol. Thanks to
`colombe` for giving me the key to the right abstraction.
Transparent integration with `ocaml-git`
The Smart protocol wants to do only 2 things:
From these 2 tasks, the idea of the Git format, the layout of the store or more
generally the idea of a Git repository is outside the scope of the protocol.
`nss` wants to provide only a way to get or send a PACK file from a context. In
this way, the requirements to do the negotiation are limited to a few
operations:
Again, the notion of a Git object is outside the implementation of the PACK file
(`carton`), so `nss` does not need to know the format of a Git commit but only
the way to get the parents of a commit.
Then, from a set of commits, we should be able to create a PACK file (`push`).
The `fetch` operation is a bit more complex since we must analyse the
PACK file to produce an index of it. But, again, all of these operations are
available outside Git's notions and, of course, outside the Git scope.
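
As an illustration of how far a single `parents` function goes, here is a sketch that computes which commits a `push` would have to send. It is not the negotiation engine of this PR: the names `reachable` and `to_send`, the string identifiers and the small `graph` are assumptions for the example.

```ocaml
module S = Set.Make (String)

(* All commits reachable from [roots], using only [parents]. *)
let reachable ~parents roots =
  let rec go seen = function
    | [] -> seen
    | uid :: rest ->
      if S.mem uid seen then go seen rest
      else go (S.add uid seen) (parents uid @ rest)
  in
  go S.empty roots

(* Commits known locally that the remote side does not have. *)
let to_send ~parents ~local ~remote =
  S.diff (reachable ~parents local) (reachable ~parents remote)

(* A tiny history: a <- b <- c and a <- b <- d. *)
let graph = [ ("a", []); ("b", [ "a" ]); ("c", [ "b" ]); ("d", [ "b" ]) ]
let parents uid = try List.assoc uid graph with Not_found -> []

let missing = S.elements (to_send ~parents ~local:[ "c"; "d" ] ~remote:[ "a" ])
```

Nothing in this code knows how a commit is encoded; the caller supplies `parents`, which is the only Git-specific part.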
Regression
Of course, the first goal of `nss` was to fix negotiation bugs and delete the
duplication between the TCP, SSH and HTTP protocols. All previous regression
tests were added and pass, and all buggy situations such as this trouble were
added over all protocols (mostly to ensure a good behaviour of our negotiation
engine).
However, the negotiation engine of Git and `ocaml-git` is not well
defined/formalised. We can imagine another perspective, such as a version 3 of
the Smart protocol to be able to `fetch`/`push`. This is not the goal of this
PR, but it's definitely a cool and close goal for `ocaml-git`.
Performances
I did not run proper benchmarks, but the update to `decompress.1.0.0` alone
helps us with performance, of course. Then, the scheduling between the protocol
process, the reception of the PACK and the analysis of it (this is what you do
when you `git clone`) seems better. A macro benchmark tells us that this new
implementation is faster than before.
However, I did not have the time to benchmark all of that and I mostly rely on
the work done on `decompress` to say that it's faster than before.
Functor or not functor?
`carton`, `nss` and `loose` were made with the same design, without the logic
of the I/O scheduling. With this new view, functors are used sparingly and
mostly at the end of the development process to provide an easy-to-use API
over LWT or ASYNC.
More concretely, all types defined in these sub-libraries are outside the scope
of functors and their definitions don't depend on the I/O scheduling.
The Git core library follows this new design where the existence of the commit,
for example, does not depend on the application of a functor. The functor
only specialises the definition with the given new type.
At another layer, such as the Value module, we have fewer constraints
and it is easier for the compiler to infer a type equality even if we forget to
add a constraint.
In that case, all types provided by `git` and the functions to manipulate them
(without knowledge of the hash algorithm used) are defined outside the scope
of functors.
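
A minimal sketch of this design, assuming a simplified tree type: the type is defined once, parameterized by `'hash`, and the functor only fixes that parameter. The names `entry`, `tree`, `names`, `DIGEST`, `Make` and `Md5` are hypothetical, and the serialisation inside `digest` is a placeholder, not Git's tree format.

```ocaml
(* Defined once, outside any functor. *)
type 'hash entry = { name : string; node : 'hash }
type 'hash tree = 'hash entry list

(* Works for every hash algorithm: no functor needed. *)
let names (t : _ tree) = List.map (fun e -> e.name) t

module type DIGEST = sig
  type t

  val digest_string : string -> t
  val to_hex : t -> string
end

module type S = sig
  type hash
  type t = hash tree

  val digest : t -> hash
end

(* The functor only specialises ['hash tree] with the given hash. *)
module Make (H : DIGEST) : S with type hash = H.t = struct
  type hash = H.t
  type t = hash tree

  (* Placeholder serialisation, only here to have something to hash. *)
  let digest t =
    t
    |> List.map (fun e -> e.name ^ " " ^ H.to_hex e.node)
    |> String.concat "\n"
    |> H.digest_string
end

(* Stand-in hash from the standard library (MD5). *)
module Md5 = struct
  type t = Digest.t

  let digest_string = Digest.string
  let to_hex = Digest.to_hex
end

module Tree = Make (Md5)

let example : Tree.t = [ { name = "README"; node = Md5.digest_string "" } ]
let listing = names example
```

Since `Tree.t` is just `Md5.t tree`, the generic `names` applies to it directly and the compiler sees the type equality without any extra constraint.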
Conclusion
I think this PR adds a lot of possibilities for MirageOS and it is a real
step forward in performance and in compatibility with Git and its behaviour.
It paves the way for a better integration with MirageOS of course and opens
some possibilities such as:
The split clears the way to add other logic which is closer to Git
than the format of the PACK file, the loose file or the way to synchronise a Git
repository with a peer.
Finally, the Git core library is only about Git:
These pieces co-exist but can be used separately.