This repository has been archived by the owner on Mar 12, 2020. It is now read-only.

[WIP] IPFS Cluster Integration #831

Open

sanderpick wants to merge 12 commits into master
Conversation

@sanderpick (Member) commented Jun 16, 2019

Adds the ability to sidecar IPFS Cluster on Textile.

Changelog

coming

Closes

Fixes

  • The --repo flag was no longer parsing ~ (missed in the kingpin refactor)

TODO

  • Implement IPFSConnector (ipfs/cluster.go)


@sanderpick (Member, Author) commented Jun 17, 2019

Ok, @hsanjuan - I've got all the pieces in place here. However, I'm running into an authorization error when calling SyncAll (https://github.com/textileio/go-textile/blob/sander/cluster/cluster/main_test.go#L188):

17:31:31.842 ERROR  p2p-gorpc: error handling RPC: client does not have permissions to this method, service name: Cluster, method name: SyncAllLocal server.go:176
17:31:31.859 DEBUG    cluster: rpc auth error client does not have permissions to this method, service name: Cluster, method name: SyncAllLocal cluster.go:1597

If I understand correctly, the RPC auth is based solely on the cluster secret, and the secrets match in the test-generated config files.

I'm putting this down for now but let me know if you have any quick ideas...

@sanderpick (Member, Author)

Something occurred to me at dinner... I think the intention here was to not use a secret, since the host needs to be on the main IPFS network. Instead, use the new trusted peers mechanism. That seems to work, but so far I haven't been able to get the pin state of the local daemon to be reflected in the cluster consensus. More later...

@hsanjuan (Collaborator)

> Something occurred to me at dinner... I think the intention here was to not use a secret, since the host needs to be on the main IPFS network. Instead, use the new trusted peers mechanism. That seems to work, but so far I haven't been able to get the pin state of the local daemon to be reflected in the cluster consensus. More later...

Yes, correct. You will need to set TrustedPeers to the peer IDs of your other cafes (suggestions to improve this for your use case are welcome). The cluster secret is not used at all, since you already have a host without a network protector.
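For reference, a minimal sketch (an editor's illustration, not code from this branch) of what trusting the other cafes at the config level could look like, assuming ipfs-cluster's crdt.Config exposes a TrustedPeers field as in the cluster codebase around this release; cafePeerIDs is a placeholder for the other cafes' peer IDs. The branch also trusts peers at runtime via cons.Trust, as seen later in the diff.

```go
// Hedged sketch: rely on trusted peers instead of a cluster secret.
// crdt.Config is ipfs-cluster's CRDT consensus config; cafePeerIDs
// ([]peer.ID) is a placeholder for the other cafes' peer IDs.
crdtCfg := &crdt.Config{}
if err := crdtCfg.Default(); err != nil {
	return err
}
crdtCfg.TrustedPeers = append(crdtCfg.TrustedPeers, cafePeerIDs...)
```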

Can I get write access here? I'll have a look at the code, and it may be useful now or in the future to commit some changes directly to the branch (simpler than a PR to the PR :P).

}

func (c *Connector) SetClient(client *rpc.Client) {
// noop
Collaborator:

Here and in Shutdown you will need to copy from https://github.com/ipfs/ipfs-cluster/blob/master/ipfsconn/ipfshttp/ipfshttp.go#L191

The main Cluster component will SetClient with an RPC client that allows this component to talk to any component in any peer in the Cluster. It is also the signal that the component can move forward with any internal tasks (the normal IPFSConnector implementation will wait for a bit and trigger an automatic ipfs swarm connect to every other daemon attached to every other known cluster peer).
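For reference, a rough sketch of that ipfshttp-style lifecycle (an assumption for illustration, not this branch's code); the rpcClient, rpcReady, shutdownLock, shutdown, and cancel fields are hypothetical:

```go
// Hedged sketch of the stock ipfshttp-style pattern. All fields used here
// (rpcClient, rpcReady, shutdownLock, shutdown, cancel) are assumptions.
func (c *Connector) SetClient(client *rpc.Client) {
	c.rpcClient = client
	c.rpcReady <- struct{}{} // signal background tasks that RPC is usable
}

func (c *Connector) Shutdown(ctx context.Context) error {
	c.shutdownLock.Lock()
	defer c.shutdownLock.Unlock()
	if c.shutdown {
		return nil // already shut down
	}
	c.cancel() // stop background goroutines
	c.shutdown = true
	return nil
}
```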

Collaborator:

Actually, since you don't use rpc at all, it's ok like this!

Member Author:

👌

return statusMap, nil
}

func (c *Connector) ConnectSwarms(ctx context.Context) error {
Collaborator:

For information: this will get triggered when a new cafe "bootstraps" (uses Cluster.Join(<peer maddr>)). It will be called in the cluster peer it bootstraps to. If textile never calls Join() to introduce new peers to the Cluster*, then it can be a noop.

*Join() is not strictly necessary with CRDTs, but it is a way of getting two libp2p hosts in the same cluster connected, which should 1) allow pubsub to start working (otherwise the peer may never receive pubsub messages because it doesn't know any of the other peers subscribed to them), and 2) potentially allow DHT discovery of other cluster peers and better connectivity.
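For illustration only, a hedged sketch of such a bootstrap from the joining cafe's side; bootstrapMaddr is a placeholder multiaddr string of an existing cluster peer, and ma stands for github.com/multiformats/go-multiaddr:

```go
// Hedged sketch: join an existing cluster peer so pubsub and DHT discovery
// can kick in. bootstrapMaddr is a placeholder such as
// "/ip4/1.2.3.4/tcp/4001/p2p/<peerID>".
maddr, err := ma.NewMultiaddr(bootstrapMaddr)
if err != nil {
	return err
}
if err := cluster.Join(ctx, maddr); err != nil {
	log.Errorf("cluster join to %s failed: %s", bootstrapMaddr, err)
}
```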

return nil
}

func (c *Connector) SwarmPeers(ctx context.Context) ([]peer.ID, error) {
Collaborator:

Only used by ConnectGraph (which generates a .dot file with cluster and ipfs daemon connections).

Member Author:

Ah, nice that's cool

}

func (c *Connector) RepoStat(ctx context.Context) (*api.IPFSRepoStat, error) {
stat, err := corerepo.RepoStat(ctx, c.node)
Collaborator:

To check: we call ipfs with size-only=true. https://github.com/ipfs/ipfs-cluster/blob/master/ipfsconn/ipfshttp/ipfshttp.go#L600

This makes the RepoStat call WAY faster (no full node count).
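A hedged sketch of a size-only variant for the embedded node, assuming the vendored go-ipfs exposes corerepo.RepoSize (the helper behind repo stat --size-only); worth verifying against the vendored version before adopting:

```go
// Hedged sketch: report only RepoSize/StorageMax and skip the full object
// count. corerepo.RepoSize is assumed to exist in the vendored go-ipfs.
func (c *Connector) RepoStat(ctx context.Context) (*api.IPFSRepoStat, error) {
	sizeStat, err := corerepo.RepoSize(ctx, c.node)
	if err != nil {
		return nil, err
	}
	return &api.IPFSRepoStat{
		RepoSize:   sizeStat.RepoSize,
		StorageMax: sizeStat.StorageMax,
	}, nil
}
```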

Member Author:

👌


return err
}

if listenAddr != "" {
Collaborator:

Interesting, I think we don't use it. This is another of the options (like the secret) which are only used to configure the libp2p host and not in Cluster per se.

However, the config will fail to validate if unset. I think it makes sense to set it to the real listen endpoint of the cafe (like here).
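A hedged sketch of one way to do that, reusing the embedded node's first swarm address; ipfsNode stands in for the cafe's *core.IpfsNode and is an assumption here:

```go
// Hedged sketch: copy an address the embedded ipfs host already listens on
// so the (otherwise unused) cluster ListenAddr still validates.
if addrs := ipfsNode.PeerHost.Addrs(); len(addrs) > 0 {
	cfgs.ClusterCfg.ListenAddr = addrs[0]
}
```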

Member Author:

Ok, cool. Should be easy enough to just copy the value used in the IPFS config.

numpinInfCfg *numpin.Config,
) (ipfscluster.Informer, ipfscluster.PinAllocator, error) {
switch name {
case "disk", "disk-freespace":
Collaborator:

Probably you want to default to this one. The others are not so useful, but it may be good to keep this switch as a placeholder for the future.
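For illustration, a hedged sketch of that default using ipfs-cluster's informer/disk and allocator/descendalloc packages (constructor names as in the cluster codebase around this release; verify against the vendored version):

```go
// Hedged sketch for the "disk", "disk-freespace" case: the disk informer
// reports free space and descendalloc favors the peers with the most of it.
informer, err := disk.NewInformer(diskInfCfg)
if err != nil {
	return nil, nil, err
}
return informer, descendalloc.NewAllocator(), nil
```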

Member Author:

👌

peerName string,
) (ipfscluster.PinTracker, error) {
switch name {
case "map":
Collaborator:

For the moment, defaulting to this one makes the most sense. We want to level up the stateless one.

Member Author:

👌

return err
}

ipfscluster.ReadyTimeout = raft.DefaultWaitForLeaderTimeout + 5*time.Second
Collaborator:

I think you can ignore this (e.g., set it to 5 seconds or leave the default). Raft is not involved anyway.

Member Author:

👌

@@ -436,6 +436,9 @@ Stacks may include:
initCafeOpen := initCmd.Flag("cafe-open", "Open the p2p cafe service for other peers").Bool()
initCafeURL := initCmd.Flag("cafe-url", "Specify a custom URL of this cafe, e.g., https://mycafe.com").Envar("CAFE_HOST_URL").String()
initCafeNeighborURL := initCmd.Flag("cafe-neighbor-url", "Specify the URL of a secondary cafe. Must return cafe info, e.g., via a Gateway: https://my-gateway.yolo.com/cafe, or a cafe API: https://my-cafe.yolo.com").Envar("CAFE_HOST_NEIGHBOR_URL").String()
initIpfsCluster := initCmd.Flag("cluster", "Treat the node as an IPFS Cluster peer").Bool()
initIpfsClusterBindMultiaddr := initCmd.Flag("cluster-bind-maddr", "Set the IPFS Cluster multiaddrs").Default("/ip4/0.0.0.0/tcp/9096").String()
Collaborator:

Technically, cluster won't bind to anything, as it reuses your already existing ipfs peer.

Member Author:

Ah, ok. That makes sense. I can remove this flag then.


// noop if no bootstraps
// if bootstrapping fails, consensus will never be ready
// and timeout. So this can happen in background and we
Collaborator:

This is not true for CRDT consensus; it will work regardless of bootstrap. It's an old comment in cluster, we will fix it.

Member Author:

👌

@sanderpick (Member, Author)

Thanks for the comments @hsanjuan! You should have write access now.

@hsanjuan (Collaborator) left a comment

I have added some more comments. I don't know if you want to keep the functions like SetupConsensus etc., since the user cannot really choose; just creating the right component directly simplifies things and reduces the amount of code.


// @todo handle maxDepth
func (c *Connector) Pin(ctx context.Context, cid icid.Cid, maxDepth int) error {
return c.api.Pin().Add(ctx, path.New(cid.String()))
Collaborator:

For the record: the original ipfs-cluster connector, by default, calls refs -r <cid> and then pin. The reason is that many refs -r calls can happen in parallel, while only one pin does. For the general case, this way works better.

On the other hand, refs -r uses a non-parallel dag walking approach while pin uses the async, faster version.

The original cluster-pin also uses the RPC to request an update of the freespace metrics every 10th pin.
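For comparison, a hedged sketch of that fetch-first approach against the embedded node, using go-merkledag's FetchGraph to walk the DAG before pinning; this is an illustration, not what this branch does, and the helper/import names should be checked against the vendored deps:

```go
// Hedged sketch: fetch (walk) the whole DAG first, then pin. Several fetches
// can run concurrently; the pin itself is serialized by ipfs.
// Assumed imports: "context", iface "github.com/ipfs/interface-go-ipfs-core",
// "github.com/ipfs/interface-go-ipfs-core/path",
// merkledag "github.com/ipfs/go-merkledag", icid "github.com/ipfs/go-cid".
func pinAfterFetch(ctx context.Context, api iface.CoreAPI, id icid.Cid) error {
	if err := merkledag.FetchGraph(ctx, id, api.Dag()); err != nil {
		return err
	}
	return api.Pin().Add(ctx, path.New(id.String()))
}
```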


cfgs.ClusterCfg.ListenAddr = addr
}

return cfgMgr.SaveJSON(ConfigPath(repoPath))
Collaborator:

I think you don't want your users to modify cluster configs; rather, you want to give them something that "just works" and let them adjust some parameters via go-textile flags? How does that work with the embedded ipfs daemon now, can they manually edit its configuration?

log.Errorf("bootstrap to %s failed: %s", bstrap, err)
} else {
for _, p := range cluster.Peers(ctx) {
err = cons.Trust(ctx, p.ID)
Collaborator:

Thinking about this. The bootstrap concept in cluster is part of the daemon binary (not of the cluster library itself), so while we can fix this there (ipfs-cluster/ipfs-cluster#834), it won't help you.

clusterCfg := &ipfscluster.Config{}
crdtCfg := &crdt.Config{}
maptrackerCfg := &maptracker.Config{}
statelessCfg := &stateless.Config{}
Collaborator:

I think these are not worth registering if not used; same for numpinInformer.

@sanderpick added the on-hold (waiting for an event to happen) label Nov 21, 2019
@sanderpick removed their assignment Jan 13, 2020
Labels
on-hold waiting for an event to happen
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cafe peers should optionally use IPFS Cluster
3 participants