Discussion: Support for multiple blockstores. #3119

Open
kevina opened this issue Aug 24, 2016 · 3 comments
Labels
need/analysis Needs further analysis before proceeding need/community-input Needs input from the wider community

Comments

@kevina
Contributor

kevina commented Aug 24, 2016

The filestore is a datastore, but it is designed to handle only a subset of the blocks used in IPFS. The main datastore is therefore still needed, and some form of support for multiple blockstores or datastores is required so that the filestore and the main datastore can coexist. This is a required infrastructure change in order to land #2634.

The following describes how it is currently implemented. Please let me know if you agree with and understand the changes. Once there is general consensus I can separate out the non-filestore bits of this infrastructure change so we can work through the implementation details.

Sorry if it is a bit long.

@whyrusleeping please CC anyone else who should be involved.

Overview

There are several ways to support the "filestore". What I believe makes the most sense, and will be the easiest to implement, is to support a "cache" plus any number of additional "aux"
datastores with the following semantics (sketched in code after the list):

  • When looking up a block, the "cache" is tried first; if the block is not found, each "aux" datastore is tried in turn. The order of the "aux" datastores is explicitly set by the user.
  • Any operation that modifies the datastore acts only on the "cache".
  • The "aux" datastores are allowed to be read-only. When they are not, additional specialized API calls are required for adding data to or removing data from the "aux" datastores.
  • Each of these datastores is given a name and can be accessed by that name from the repo.
  • Duplicate data should be avoided where possible, but is not completely disallowed.

These rules imply that the garbage collector should only attempt to remove data from the "cache" and leave the other datastores alone.
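
To make the intended semantics concrete, here is a minimal sketch. The type and method names are illustrative only, not the actual implementation; the real code would use the go-ipfs Blockstore types.

// Minimal sketch of the lookup/write semantics above.  The Blockstore
// interface here is a simplified stand-in for the real go-ipfs type, and
// multiBlockstore is illustrative only.
package blockstore

import "errors"

var ErrNotFound = errors.New("blockstore: block not found")

type Key string
type Block []byte

type Blockstore interface {
    Get(k Key) (Block, error)
    Put(k Key, b Block) error
    DeleteBlock(k Key) error
}

// multiBlockstore tries the "cache" first, then each "aux" mount in the
// user-configured order.  Writes and deletes go to the cache only.
type multiBlockstore struct {
    cache Blockstore   // first mount, e.g. "/blocks"
    aux   []Blockstore // additional mounts, e.g. "/filestore"
}

func (m *multiBlockstore) Get(k Key) (Block, error) {
    if b, err := m.cache.Get(k); err == nil {
        return b, nil
    }
    for _, a := range m.aux {
        if b, err := a.Get(k); err == nil {
            return b, nil
        }
    }
    return nil, ErrNotFound
}

// Mutating operations never touch the aux mounts, which is why the garbage
// collector can restrict itself to the cache.
func (m *multiBlockstore) Put(k Key, b Block) error { return m.cache.Put(k, b) }
func (m *multiBlockstore) DeleteBlock(k Key) error  { return m.cache.DeleteBlock(k) }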

High level implementation details

The multiplexing can happen at either the datastore or the blockstore level. I originally implemented it at the datastore level but changed it to the blockstore level to better interact with caching. The filestore is still implemented as a datastore (for now).

In the fsrepo, normal blocks are mounted under the /blocks prefix (this is unchanged) and the filestore is mounted under the /filestore prefix (this is new). The fsrepo has been enhanced to be able to retrieve the underlying datastore based on its prefix. (This is required by the filestore.)

The top-level blockstore is now a multi-blockstore that works by checking a pre-configured set of prefixes in turn in order to find a matching key. Each mount is wrapped in its own blockstore with its
own caching semantics. The first mount, "/blocks", is considered the cache, and all Puts and Deletes go only to the cache. The MultiBlockstore interface is as follows:

type MultiBlockstore interface {
    GCBlockstore
    FirstMount() Blockstore // returns the first mount
    Mounts() []string       // lists the mounts
    Mount(prefix string) Blockstore  // returns a mount by name
    Locate(key key.Key) []LocateInfo // lists all locations of a block
}

The garbage collector uses FirstMount().AllKeysChan(ctx) to get the list of blocks to try to delete.
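
For illustration, a sweep over the proposed interface might look roughly like the following fragment. It assumes the surrounding go-ipfs packages (context, key) and the existing Blockstore methods AllKeysChan and DeleteBlock; the pinned set and error handling are assumptions, not part of the proposal.

// Sketch only: list candidate blocks from the first mount ("cache") and
// delete the unpinned ones.  Deletes are forwarded to the cache mount only;
// the aux mounts are left untouched.
func gcSweep(ctx context.Context, bs MultiBlockstore, pinned map[key.Key]struct{}) error {
    keys, err := bs.FirstMount().AllKeysChan(ctx)
    if err != nil {
        return err
    }
    for k := range keys {
        if _, ok := pinned[k]; ok {
            continue // pinned blocks are never collected
        }
        if err := bs.DeleteBlock(k); err != nil {
            return err
        }
    }
    return nil
}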

Any caching is currently done only on the first mount.

As an implementation detail, it is worth noting that files are added to or removed from the filestore directly, using a specialized interface that bypasses the normal Blockstore and Filestore interfaces. This was discussed with @whyrusleeping (#2634 (comment)).

Duplicate blocks (that is, blocks found under more than one mount) are not forbidden, as forbidding them would be impractical. The Locate() method can be used to discover which mounts a block is found under; it lists all of them, which can help eliminate the duplicates.
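
As an illustration of how Locate() could be used, the following sketch reports blocks stored under more than one mount. It assumes LocateInfo exposes the mount prefix as a Prefix field, which is not spelled out above and may differ in the actual code.

// Sketch: list every mount a block is found under so duplicates can be
// cleaned up by hand or by a dedup tool.  The Prefix field is an assumption.
func reportDuplicate(bs MultiBlockstore, k key.Key) {
    locs := bs.Locate(k)
    if len(locs) <= 1 {
        return // zero or one copy: nothing to do
    }
    for _, l := range locs {
        fmt.Printf("%s is stored under mount %s\n", k, l.Prefix)
    }
}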

Other uses

The two mounts /blocks and /filestore are currently hard coded; with some effort this could be made into a more general-purpose mechanism to support multiple blockstores.

One use case I can think of is a separate read-only datastore for permanent content, as an alternative to maintaining a large pin set, which currently has performance problems. The datastore could even live on a read-only filesystem to prevent any possibility of the data accidentally being deleted, whether by user error or by a software bug. Some additional design decisions will need to be made for this, so I am not proposing it right now, merely offering it as a possibility.

Another possibility is to support a local cache on a local filesystem with a larger datastore in the cloud.

@kevina kevina added the need/community-input Needs input from the wider community label Aug 24, 2016
@jbenet
Member

jbenet commented Aug 26, 2016

This is great! On my stack to review. I'm a bit behind (reviewing 0.4.3 to
ship and finishing ipld/cid)

@whyrusleeping whyrusleeping added the need/review Needs a review label Aug 31, 2016
@whyrusleeping whyrusleeping added the status/deferred Conscious decision to pause or backlog label Sep 14, 2016
kevina added a commit that referenced this issue Sep 25, 2016
Each datastore is mounted under a different mount point and a
multi-blockstore is used to check each mount point for the block.

The first mount checked by the multi-blockstore is considered the
"cache"; all others are considered read-only.  This implies that the
garbage collector only removes blocks from the first mount.

This change also factors out the pinlock from the blockstore into its
own structure.  Only the multi-datastore now implements the
GCBlockstore interface.  In the future this could be separated out
from the blockstore completely.

For now caching is only done on the first mount; in the future this
could be reworked.  The bloom filter is the most problematic, as the
read-only mounts are not necessarily immutable and can be changed by
methods outside of the blockstore.

Right now there is only one mount, but that will soon change once
support for the filestore is added.

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
@Kubuxu Kubuxu added status/in-progress In progress and removed status/deferred Conscious decision to pause or backlog labels Sep 25, 2016
kevina added a commit to ipfs-filestore/go-ipfs that referenced this issue Oct 15, 2016
kevina added a commit to ipfs-filestore/go-ipfs that referenced this issue Oct 15, 2016
@kevina kevina added this to the Filestore implementation milestone Oct 19, 2016
@whyrusleeping whyrusleeping added status/ready Ready to be worked and removed status/in-progress In progress labels Nov 2, 2016
kevina added a commit that referenced this issue Nov 3, 2016
@whyrusleeping
Member

whyrusleeping commented Nov 4, 2016

NOTE: This is not directly related to this issue, but rather meant as a private note for @kevina.

Alright, so for the purpose of the filestore integration, here's what I would like to see.

A separate blockstore implementation for the filestore.

This will handle both filestore reads and writes and normal blockstore reads and writes, based on the blocks it is given (if the block is a FilestoreBlock with PosInfo, then add it via the filestore).
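
A rough sketch of that dispatch follows. filestoreBlockstore, its fields, and filestore.PutRef are illustrative names only; FilestoreBlock and PosInfo are the types named above, but their exact shape is up to the filestore branch.

// Sketch of the write-path dispatch described above.
func (bs *filestoreBlockstore) Put(b blocks.Block) error {
    if fb, ok := b.(*FilestoreBlock); ok && fb.PosInfo != nil {
        // The block carries file-position info, so record a reference to
        // the original file in the filestore instead of copying the data.
        return bs.filestore.PutRef(fb)
    }
    // An ordinary block: store it in the normal blockstore.
    return bs.blocks.Put(b)
}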

ipfs add -P <args>

This will only operate if the filestore is enabled, and only if the daemon is not running. When run, instead of generating data blocks to put to disk, it should generate FilestoreBlocks with the appropriate PosInfo, but everything else should behave normally. This includes pinning, recursive directory construction, and hashes (adding to the filestore should generate the same hashes as adding without it).

reading

The filestoreblockstore should check both the normal blockstore and the filestore for the requested object, returning it as usual.
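
In sketch form (again with illustrative names, assuming the same filestoreBlockstore as above), the read path might be:

// Sketch: reads check the normal blockstore first and fall back to the
// filestore.  If the backing file was changed or deleted, the filestore
// returns an error, much like reading a corrupted block from blocks/.
func (bs *filestoreBlockstore) Get(k key.Key) (blocks.Block, error) {
    if b, err := bs.blocks.Get(k); err == nil {
        return b, nil
    }
    return bs.filestore.Get(k)
}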

garbage collection

Objects in the filestore should be treated no differently than objects in the blockstore. If a file backing the filestore is changed, deleted, or damaged, reads should fail in roughly the same way that reading a corrupted block in the blocks/ directory would fail.

With the above, we should have basic filestore functionality integrated into ipfs. The only required changes to the existing codebase for integration should be in core/commands/add.go, core/{builder|core}.go, and potentially a little plumbing in the importer path for the PosInfo.

@meyerzinn

meyerzinn commented Feb 14, 2017

Can we decouple garbage collection and datastores? I want to make a datastore that can be used by many nodes (Swift Object Store), thus giving the impression of a cluster--the only reason I have multiple nodes is for redundancy and chunking performance.

This could work well with Go plugins (https://tip.golang.org/pkg/plugin/), allowing others to offer BlockStores as plugins. @whyrusleeping @kevina

Also, I want to manually handle garbage collection; the model I want to use mandates a separate index, which means IPFS's GC can't work properly for my use-case.

@Stebalien Stebalien added status/deferred Conscious decision to pause or backlog needs refinement and removed status/ready Ready to be worked status/deferred Conscious decision to pause or backlog need/review Needs a review labels Dec 18, 2018
@hsanjuan hsanjuan added need/analysis Needs further analysis before proceeding and removed needs refinement labels Apr 16, 2020