Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove usage of merkledag.Link.Node pointer outside of merkledag #1902

Closed
wants to merge 60 commits into from

Conversation

mildred
Copy link
Contributor

@mildred mildred commented Oct 26, 2015

This prepares for inclusion of IPLD where the Node pointer won't be there. In any case, we should probably move away from using the struct attributes and use functions to access them instead. This will make it easier to plug in IPLD in place of merkledag.

tv42 and others added 30 commits September 11, 2015 10:09
This used to lead to large refcount numbers, causing Flush to create a
lot of IPFS objects, and merkledag to consume tens of gigabytes of
RAM.

License: MIT
Signed-off-by: Jeromy <[email protected]>
OS X sed is documented as "-i SUFFIX", GNU sed as "-iSUFFIX". The one
consistent case seems to be "-iSUFFIX", where suffix cannot empty (or
OS X will parse the next argument as the suffix).

This used to leave around files named `refsout=` on Linux, and was
just confusing.

License: MIT
Signed-off-by: Jeromy <[email protected]>
These secondary copies were never actually queried, and didn't contain
the indirect refcounts so they couldn't become the authoritative
source anyway as is. New goal is to move pinning into IPFS objects.

A migration will be needed to remove the old data from the datastore.
This can happen at any time after this commit.

License: MIT
Signed-off-by: Jeromy <[email protected]>
Pinner had method GetManual that returned a ManualPinner, so every
Pinner had to implement ManualPinner anyway.

License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
Platform-dependent behavior is not nice, and negative refcounts are
not very useful.

License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>

sharness: Don't assume we know all things that can create garbage

License: MIT
Signed-off-by: Jeromy <[email protected]>
WARNING: No migration performed! That needs to come in a separate
commit, perhaps amended into this one.

This is the minimal rewrite, only changing the storage from
JSON(+extra keys) in Datastore to IPFS objects. All of the pinning
state is still loaded in memory, and written from scratch on Flush. To
do more would require API changes, e.g. adding error returns.

Set/Multiset is not cleanly separated into a library, yet, as it's API
is expected to change radically.

License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
There was doublewrapping with an unneeded msgio. given that we
use a stream muxer now, msgio is only needed by secureConn -- to
signal the boundaries of an encrypted / mac-ed ciphertext.

Side note: i think including the varint length in the clear is
actually a bad idea that can be exploited by an attacker. it should
be encrypted, too. (TODO)

License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
* ID service stream
* make the relay service use msmux
* fix nc tests

Note from jbenet: Maybe we should remove the old protocol/muxer
and see what breaks. It shouldn't be used by anything now.

License: MIT
Signed-off-by: Jeromy <[email protected]>
Signed-off-by: Juan Batiz-Benet <[email protected]>
The addition of a locking interface to the blockstore allows us to
perform atomic operations on the underlying datastore without having to
worry about different operations happening in the background, such as
garbage collection.

License: MIT
Signed-off-by: Jeromy <[email protected]>
This commit improves (fixes) the FetchGraph call for recursively
fetching every descendant node of a given merkledag node. This operation
should be the simplest way of ensuring that you have replicated a dag
locally.

This commit also implements a method in the merkledag package called
EnumerateChildren, this method is used to get a set of the keys of every
descendant node of the given node. All keys found are noted in the
passed in KeySet, which may in the future be implemented on disk to
avoid excessive memory consumption.

License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Juan Batiz-Benet <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>

dont GC blocks used by pinner

License: MIT
Signed-off-by: Jeromy <[email protected]>

comment GC algo

License: MIT
Signed-off-by: Jeromy <[email protected]>

add lock to blockstore to prevent GC from eating wanted blocks

License: MIT
Signed-off-by: Jeromy <[email protected]>

improve FetchGraph

License: MIT
Signed-off-by: Jeromy <[email protected]>

separate interfaces for blockstore and GCBlockstore

License: MIT
Signed-off-by: Jeromy <[email protected]>

reintroduce indirect pinning, add enumerateChildren dag method

License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
tv42 and others added 18 commits September 11, 2015 12:11
License: MIT
Signed-off-by: Tommi Virtanen <[email protected]>
To test it, set up an S3 bucket (in an AWS region that is not US
Standard, for read-after-write consistency), run `ipfs init`, then
edit `~/.ipfs/config` to say

      "Datastore": {
        "Type": "s3",
        "Region": "us-west-1",
        "Bucket": "mahbukkit",
        "ACL": "private"
      },

with the right values. Set `AWS_ACCESS_KEY_ID` and
`AWS_SECRET_ACCESS_KEY` in the environment and you should be able to
run `ipfs add` and `ipfs cat` and see the bucket be populated.

No automated tests exist, unfortunately. S3 is thorny to simulate.

License: MIT
Signed-off-by: Tommi Virtanen <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
Last argument was dropped in ffd4c3f

License: MIT
Signed-off-by: Tommi Virtanen <[email protected]>
Without this, all entries will have nlink==0, which confuses a bunch
of tools. Most dramatically, systemd-nspawn enters a busy loop in its
lock utility function.

License: MIT
Signed-off-by: Tommi Virtanen <[email protected]>
This used to cause files e.g. being edited with `vi` to become 0-size.

License: MIT
Signed-off-by: Tommi Virtanen <[email protected]>
Callers assume this is safe to call whenever, let's make it so.

License: MIT
Signed-off-by: Tommi Virtanen <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
License: MIT
Signed-off-by: Jeromy <[email protected]>
// GetNodeAndCache return the MDAG Node that the link points to and store a
// pointer to that node along with the link to speed up further retrivals. A
// timeout is to be specified to avoid taking too much time.
func (l *Link) GetNodeAndCache(ctx context.Context, serv DAGService, timeout time.Duration) (*Node, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout should be specified on the context using context.WithTimeout on the caller.

@mildred
Copy link
Contributor Author

mildred commented Oct 27, 2015

Note: We have two ways to go now:

  • Make go-ipld gold a pointer to the Node object. It won't be as straightforward, and that's why we must limit usage of that pointer to the merkledag package (and have other packages uses functions instead)

  • Remove the notion that a merkledag.Node can hold a pointer to its linked nodes, and refactor the API to reflect that.

    If we decide to refactor the API, the following functions will need to be modified:

    • DAGService.AddRecursive(Node)
    • DAGService.Remove(Node)
    • Node.AddNodeLink(string, Node)
    • Node.AddRawLink(string, Link)
    • Node.UpdateNodeLink(string, Link): Only the codehttp PUT handler uses it, and this is being refactored in PR fixing putHandler #1886
    • Link.GetNode(...) : has the ability to query the DAGService every time if the Node pointer is not there (may be slower, how much?)
    • Node.GetLinkedNode(..., string) : same as above

    A simple approach could be to introduce a NodeTree type that will hold a root node and a collection of Node dependencies. When constructing a DAG, we will use the NodeTree type instead of Node. AddRecursive(Node) will have to be renames AddTree(NodeTree).

@jbenet
Copy link
Member

jbenet commented Oct 28, 2015

let's do this on top of dev0.4.0 branch

@jbenet
Copy link
Member

jbenet commented Oct 28, 2015

see also discussion in ipld/go-ipld-deprecated#14

This prepares for inclusion of IPLD where the Node pointer won't be there.

License: MIT
Signed-off-by: Mildred Ki'Lya <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants