Versioning: Commit + Repo Datastructures #23

jbenet · 2015-07-02T07:31:33Z

Versioning has been a long time coming.

We need to construct the necessary data types before we start building any tooling around them.

The SYNTAX of the "merkledag DSL" is still TBD (#22), but for now I'm using Go-like syntax.

First, some types we need:

// Any is any merkledag Node

type Identity struct {
  Key SigningKey // link to a signing key

  Data struct {
    Name string // the "name" of the identity
  }
}

type Authorship struct {
  Author Identity

  Data struct {
    Date string // ISO timestamp in UTC?
  }
}

type Signature struct {
  Object Any        // link to the signed object
  Key    SigningKey // link to the signing key

  Data struct {
    Signature []byte // the signature bytes
  }
}

// generic type that terminates in a certain other leaf type
type Tree<LEAF_TYPE> struct {
  NAME Link<Tree | LEAF_TYPE>
  ...
}

Then, the versioning data types:

type Commit struct {
  Parents   []Commit     // "parent0" ... "parentN"
  Author    Authorship   // link to an Authorship
  Committer Authorship   // link to an Authorship
  Object    Any          // what we version ("tree" in git)

  Data struct {
    Comment string // describes the commit
  }
}

type VersionRepository struct {
  Refs Tree<Commit> // hierarchy of {branches, tags, heads, remotes, ... }
  Logs Tree<File>   // reflogs, etc... (maybe should be other than files...)
}
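As a rough illustration of how these types link together, here is a runnable Go sketch of `Commit` and `Authorship`. It assumes links are plain hex SHA-256 digests of a JSON encoding (the `Link` type and `hashOf` helper are hypothetical), not real merkledag links or CIDs, and it folds `Data.Comment` into a top-level field for brevity. What it shows: a child commit names its parent by hash, so any change to the parent also changes the child's ID.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Link is a hypothetical content address: the hex SHA-256 of an object's
// JSON encoding. Real IPFS links are CIDs over a merkledag/IPLD encoding.
type Link string

// Authorship mirrors the proposal: a link to an Identity plus a date.
type Authorship struct {
	Author Link   `json:"author"`
	Date   string `json:"date"` // ISO 8601 timestamp in UTC
}

// Commit mirrors the proposed Commit type, with every link stored as a hash.
type Commit struct {
	Parents   []Link     `json:"parents"` // "parent0" ... "parentN"
	Author    Authorship `json:"author"`
	Committer Authorship `json:"committer"`
	Object    Link       `json:"object"` // what we version ("tree" in git)
	Comment   string     `json:"comment"`
}

// hashOf encodes v as JSON and returns its SHA-256 as a Link.
// Struct fields are marshaled in declaration order, so equal values hash equally.
func hashOf(v any) Link {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(b)
	return Link(hex.EncodeToString(sum[:]))
}

func main() {
	me := Authorship{Author: hashOf("identity: alice"), Date: "2015-07-02T07:31:33Z"}
	root := Commit{Author: me, Committer: me, Object: hashOf("tree v1"), Comment: "initial"}
	rootID := hashOf(root)

	child := Commit{
		Parents:   []Link{rootID}, // the parent is referenced only by its hash
		Author:    me,
		Committer: me,
		Object:    hashOf("tree v2"),
		Comment:   "second",
	}
	fmt.Println("root: ", rootID)
	fmt.Println("child:", hashOf(child))
}
```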

chriscool · 2015-07-02T18:31:43Z

It seems to me that in Git, when you sign a commit, the signature is part of the commit, so you cannot remove the signature without changing the commit's SHA-1.

And commit trailers (like Signed-off-by) are very useful in Git and may deserve something special.

Also Tree and VersionRepository are defined but not used.
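For what it's worth, one way `Refs Tree<Commit>` could be used is resolving a ref path such as `heads/master` to a commit. The sketch below is hypothetical: the in-memory `Tree`/`Entry` types and the `Resolve` method stand in for `Tree<LEAF_TYPE>`, commit links are plain strings, and `Logs` is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Tree is a hypothetical in-memory stand-in for Tree<LEAF_TYPE>: each name
// maps either to a subtree or to a leaf. Here leaves are commit hashes.
type Tree struct {
	Entries map[string]*Entry
}

// Entry holds exactly one of Sub (a subtree) or Leaf (a commit hash).
type Entry struct {
	Sub  *Tree
	Leaf string // stand-in for Link<Commit>
}

// VersionRepository keeps only Refs; Logs is omitted from this sketch.
type VersionRepository struct {
	Refs *Tree
}

// Resolve walks a slash-separated ref path such as "heads/master" and
// returns the commit hash at its end.
func (r *VersionRepository) Resolve(path string) (string, error) {
	t := r.Refs
	parts := strings.Split(path, "/")
	for i, name := range parts {
		if t == nil {
			return "", fmt.Errorf("%s: not a tree", strings.Join(parts[:i], "/"))
		}
		e, ok := t.Entries[name]
		if !ok {
			return "", fmt.Errorf("%s: no such ref", path)
		}
		if i == len(parts)-1 {
			if e.Sub != nil {
				return "", fmt.Errorf("%s: is a tree, not a commit", path)
			}
			return e.Leaf, nil
		}
		t = e.Sub // descend; nil here means we hit a leaf mid-path
	}
	return "", fmt.Errorf("empty ref path")
}

func main() {
	repo := &VersionRepository{Refs: &Tree{Entries: map[string]*Entry{
		"heads": {Sub: &Tree{Entries: map[string]*Entry{
			"master": {Leaf: "commit-abc"},
		}}},
		"tags": {Sub: &Tree{Entries: map[string]*Entry{
			"v0.1": {Leaf: "commit-123"},
		}}},
	}}}
	id, err := repo.Resolve("heads/master")
	fmt.Println(id, err) // commit-abc <nil>
}
```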

coder5876 · 2015-07-09T04:24:48Z

Should we have email as part of Identity too, like Git does? Probably we don't want that, since the key is a stronger identifier.
In the example Commit struct there is no Signature; I guess this would be optional, depending on whether you want to sign the commit or not. Having the Signature in the commit would make the signature part of the commit, as @chriscool mentioned.

ion1 · 2015-09-22T18:59:25Z

Please consider adding a variant of a merge commit whose meaning is history rewriting.

The first parent will point to the new history. User interfaces are supposed to act as if it was the only parent unless the user requests otherwise. Where possible, the rewrite commit could be rendered as something like an unobtrusive collapsed bar between commit messages.

The hidden secondary parent will point to the history that was “rewritten”.

This would alleviate much of the need Git has for forced pushes in development branches to keep the commit history clean. It would also let one view what changed in the “rewrite” unlike with Git rewrites.

jbenet · 2015-09-23T10:41:00Z

@ion1 interesting idea!

ELLIOTTCABLE · 2016-04-01T06:29:57Z

@chriscool as a relevant note, I'd often argue against adding first-class tagging to a git-ish system.

(There's some relevant discussion on my own approach to avoiding that on top of Git itself, see ELLIOTTCABLE/.gitlabels.)

nothingmuch · 2016-05-25T15:08:38Z

Git's lack of support for multiple authors is an oft-cited limitation. I think a logical AND of authorships would be useful to include, instead of a post hoc way of embedding that in the identity, since that would require parsing, etc.
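A minimal sketch of that idea, assuming the single `Author Authorship` field becomes a list (the `Authors` field and `AuthorNames` method are hypothetical). Every co-author is an explicit, structured entry, so tools read them directly rather than parsing names or trailers.

```go
package main

import (
	"fmt"
	"strings"
)

// Identity and Authorship are simplified stand-ins for the proposed types.
type Identity struct{ Name string }

type Authorship struct {
	Author Identity
	Date   string
}

// Commit lists every author explicitly (a logical AND of authorships).
type Commit struct {
	Authors []Authorship
	Comment string
}

// AuthorNames returns the authors' names in order, with no parsing needed.
func (c Commit) AuthorNames() []string {
	names := make([]string, len(c.Authors))
	for i, a := range c.Authors {
		names[i] = a.Author.Name
	}
	return names
}

func main() {
	c := Commit{
		Authors: []Authorship{
			{Author: Identity{Name: "alice"}, Date: "2016-05-25T15:08:38Z"},
			{Author: Identity{Name: "bob"}, Date: "2016-05-25T15:08:38Z"},
		},
		Comment: "pair-programmed fix",
	}
	fmt.Println(strings.Join(c.AuthorNames(), " and ")) // alice and bob
}
```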

nothingmuch · 2016-05-25T15:13:05Z

@ion1, @ELLIOTTCABLE I think the most appealing way to address that is to have more than just a "parent" relationship between commits, which ties this into the debate about first class tagging and potentially also various trailers in the comments.

Since there's nothing preventing the Object field from being a commit, parallel histories could be related by decorating both of them from the outside with a third one, for example, but that's far from the only approach.

mcast · 2016-10-05T16:31:18Z

Do the data structures imply that the native Git objects would need to be translated when crossing the ipfs boundary?

I understand that the ipfs hashtree structure is different from the Git blobid, so the two aren't directly compatible. Is it necessary to generate a new id to store a git object (or pack, if the tools could find out what any of them might be called) in the DHT?

My concern is that if the data structures don't provide an exact isomorphism with the Git objects used in any given repo, there will be a lossy translation. It has to be lossless, doesn't it?

(On objectid->packid, serving something like a 302 Found or an extra returned header might help efficiency, then you only need DHT entries for the commits and the rest can come from a pack. Or maybe I need to read more about ipfs.)
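To make the losslessness concern concrete: in a SHA-1 Git repository, an object's ID is the SHA-1 of the header `"<type> <size>\x00"` followed by the exact content bytes. Any translation that does not reproduce those bytes exactly yields a different ID. A small sketch (`gitObjectID` is a hypothetical helper name):

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// gitObjectID computes Git's (SHA-1) object ID: SHA-1 over
// "<type> <size>\x00<content>".
func gitObjectID(kind string, content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "%s %d\x00", kind, len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Same value `echo "hello world" | git hash-object --stdin` prints.
	fmt.Println(gitObjectID("blob", []byte("hello world\n")))
	// Dropping one trailing newline yields an unrelated ID.
	fmt.Println(gitObjectID("blob", []byte("hello world")))
}
```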

ghost · 2016-10-05T17:11:25Z

@mcast with CID and IPLD, we'll be able to just reference the unchanged git objects/packs/blobs/trees.

kehao95 · 2018-01-08T04:40:40Z

Hi, it's been a while since the last update. Is there any update on this topic? Thanks for all your hard work. We would like to try IPFS in our product, but we need the versioning feature to be ready. Where can I track the status of this feature?

osarrouy · 2018-02-21T11:39:37Z

Hi everyone. Same question as @kehao95 here :)

Any way to track the status of this issue?

Stebalien · 2018-02-21T21:41:28Z

Unfortunately, no. We don't have native versioning.

We do now have git object support in IPLD: https://github.com/ipfs/go-ipfs/blob/master/docs/plugins.md, https://github.com/ipfs/go-ipld-git/. However, that has some limitations (no sharding, for one).

RubenKelevra · 2020-06-17T19:42:51Z

It would be nice if we could add a diff file to each commit. This would let us remove the pin on the sub-CID of the older version and keep only the diff pinned.

Creating patches/diffs of large binary files is known to be very resource-intensive, but zstd now supports creating diffs between two files of up to 2 GB, which is extremely space-efficient and fast.

A diff can only be applied in one direction, so creating patches backward makes the most sense. This way IPFS can reconstruct older versions on the fly from the patches when needed.

rht mentioned this issue Sep 15, 2015: git on IPFS (ipld-git) #45 (open)

cryptix mentioned this issue Sep 20, 2015: @domschiener's project #47 (open)

jbenet mentioned this issue Oct 28, 2015: IPFS as a backend to a web archiving ipfs-inactive/archives#28 (open)

This was referenced Nov 17, 2015:
ipscend screenshot ipfs-shipyard/ipscend#2 (closed)
git/versioning things ipfs/awesome-ipfs#21 (closed)
Git-style Versioning ipfs-inactive/faq#75 (closed)

dignifiedquire mentioned this issue Nov 30, 2015: Sprint Nov 30 ipfs/team-mgmt#60 (closed)

daviddias mentioned this issue Jan 2, 2016: Commit Data Structure ipfs/kubo#1188 (closed)

daviddias mentioned this issue Jan 9, 2016: Add Group add ipfs-inactive/http-api-spec#17 (merged)

daviddias changed the title ~~Commit + Repo Datastructures~~ Versioning: Commit + Repo Datastructures Feb 23, 2016

leerspace mentioned this issue Sep 14, 2017: How to use IPFS to keep track of history for every file? ipfs/kubo#4225 (open)

jbenet added the Candidate Dev RFP label Jul 9, 2018

daviddias mentioned this issue Nov 28, 2018: chore: adds meeting notes for 2018-11-27 bi-weekly sync ipfs-inactive/dynamic-data-and-capabilities#51 (merged)

daviddias mentioned this issue May 22, 2019: [Feature] Versioning #373 (closed)

RubenKelevra mentioned this issue Jun 17, 2020: File/folder versioning ipfs/kubo#7486 (closed)
