Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Batch implement a NodeAdder interface #46

Merged
merged 2 commits into from
Oct 26, 2018
Merged

Make Batch implement a NodeAdder interface #46

merged 2 commits into from
Oct 26, 2018

Conversation

hsanjuan
Copy link
Contributor

Needed for: ipfs/go-unixfs#41

@hsanjuan hsanjuan self-assigned this Oct 22, 2018
hsanjuan added a commit to ipfs/kubo that referenced this pull request Oct 22, 2018
@ghost ghost added the status/in-progress In progress label Oct 22, 2018
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're implementing a DAGService, we need to behave like one. I can think of three ways to fix this:

  1. Always flush before we perform a get/delete.
  2. Remove queued blocks (complicated).
  3. Split out a new NodeWriter interface.
  4. Split out

batch.go Outdated
@@ -124,6 +126,39 @@ func (t *Batch) Add(nd Node) error {
return t.err
}

// Remove removes a node from the DAGService. This is a non-batching
// operation.
func (t *Batch) Remove(ctx context.Context, c cid.Cid) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to remove the node from the queue (or flush the queue) and then remove the node.

batch.go Outdated

// RemoveMany calls RemoveMany in the underlying DAGService.
func (t *Batch) RemoveMany(ctx context.Context, nodes []cid.Cid) error {
return t.ds.RemoveMany(ctx, nodes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, needs to remove from the queue.

batch.go Outdated

// Get calls Get from the underlying DAGService.
func (t *Batch) Get(ctx context.Context, c cid.Cid) (Node, error) {
return t.ds.Get(ctx, c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. Needs to get from the queue.

batch.go Outdated

// GetMany calls GetMany from the underlying DAGService.
func (t *Batch) GetMany(ctx context.Context, nodes []cid.Cid) <-chan *NodeOption {
return t.ds.GetMany(ctx, nodes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@Stebalien
Copy link
Member

Personally, I think separating out a NodeWriter interface is probably the right way to go.

@hsanjuan
Copy link
Contributor Author

@Stebalien you mean NodeWriter being an interface with a single Add method, and using NodeWriter directly in places which only Add() (importers) instead of a full featured DAGService (and Batch would implement it) ?

@Stebalien
Copy link
Member

type NodeWriter interface {
    Add(context.Context, Node) error
    AddMany(context.Context, []Node) error
}

(we'd also need Batch to implement AddMany but that shouldn't be difficult).

We don't have to do it this way, I just don't want batch.Add(ctx, x); batch.Get(ctx, x.Cid()) to fail.

(as names, NodeWriter is even worse than NodeGetter...)

@hsanjuan
Copy link
Contributor Author

So it's one more interface to the mix (NodeAdder ?) vs. having a suboptimal DAGService implementation for methods we don't use right now.

Before, doing Get on the DAGService while adding stuff with the batch-importers would cause the same problems (Get on a recently added node could fail it was still uncommitted).

I'm leaning towards just doing Commit before non-add operations and documenting that Batch does it this way and if someone ever have an interest in improving it can always be done, but I'm honestly not sure what's best (both work, both are easy to implement).

@Stebalien
Copy link
Member

having a suboptimal DAGService implementation for methods we don't use right now.

No, having a broken DAGService implementation that sometimes works and sometimes doesn't. It doesn't matter if we're not using it, the issue is that it looks like it should work but won't.

Before, doing Get on the DAGService while adding stuff with the batch-importers would cause the same problems (Get on a recently added node could fail it was still uncommitted).

Yes, but that makes perfect sense. The "batch" is effectively a transaction. However, if we're going to claim that Batch implements DAGService, it needs to actually implement DAGService.

I'm leaning towards just doing Commit before non-add operations and documenting that Batch does it this way and if someone ever have an interest in improving it can always be done, but I'm honestly not sure what's best (both work, both are easy to implement).

We can, of course, do that. However, I really don't see the issue adding a new interface. It gives us a clean divide so applications can pick what they want to use:

  • NodeGetter - read-only
  • NodeAdder - append-only
  • DAGService - RW dag service

@hsanjuan
Copy link
Contributor Author

No, having a broken DAGService...

I meant with the Commit before Get fix etc. It's suboptimal because it may make Gets really slow/defeat the purpose of batching.

However, I really don't see the issue adding a new interface

Ok i'll do that then

@Stebalien
Copy link
Member

I meant with the Commit before Get fix etc. It's suboptimal because it may make Gets really slow/defeat the purpose of batching.

Ah, got it. Really, either way is fine with me as long as whatever we do is correct. However, this is really like the io.Reader/io.Writer split so I think it'll be fine.

@hsanjuan hsanjuan changed the title Make Batch implement a DAGService Make Batch implement a NodeAdder interface Oct 25, 2018
batch.go Outdated
// commiting them as needed.
func (t *Batch) AddMany(ctx context.Context, nodes []Node) error {
for _, n := range nodes {
err := t.Add(ctx, n)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could potentially append all nodes directly but that might get us way over maxSize/maxNodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't really be an issue. Going over max nodes/size should just trigger a flush (we don't want to buffer too much in memory).

Maybe we should just:

  1. Make Add call AddMany`.
  2. Move the main logic to AddMany (and process all nodes at once).

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We can mess with performance later. We should probably also be obeying the context but that'll be a bit tricky.

@hsanjuan
Copy link
Contributor Author

Have a look to latest commit (cfe76e8) @Stebalien

I still need to update the gx tag before merging.

@Stebalien
Copy link
Member

@hsanjuan LGTM.

This adds a NodeAdder interface (which partially implements
a DAGService), and adjusts the Batch type to satisfy it.
@hsanjuan hsanjuan merged commit c96181f into master Oct 26, 2018
@ghost ghost removed the status/in-progress In progress label Oct 26, 2018
@hsanjuan hsanjuan deleted the fix/batchds branch October 26, 2018 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants