dedup keys in GetMany #4888
Conversation
Otherwise, GetMany on the children of a node with duplicate links may fail.

License: MIT
Signed-off-by: Steven Allen <[email protected]>
I can see this adding overhead. Can you elaborate on what you mean by "GetMany on the children of a node with duplicate links may fail"?
We check whether we get back the correct number of blocks and, if we don't, write an error to the channel. However, lower layers will (correctly) deduplicate requested keys, so we end up getting back fewer blocks than we requested and conclude something has gone wrong.
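The failure mode being described can be sketched as follows. `fetch` is a hypothetical stand-in for the lower block layer (the real code streams blocks over a channel), with strings in place of `*cid.Cid`:

```go
package main

import "fmt"

// fetch stands in for the lower block layer: it (correctly)
// deduplicates the requested keys, so duplicate inputs yield
// fewer results than were requested.
func fetch(keys []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, k := range keys {
		if !seen[k] {
			seen[k] = true
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{"a", "b", "a"} // a node with duplicate links
	got := fetch(keys)
	if len(got) != len(keys) {
		// This is the spurious error path: nothing actually failed,
		// we just asked for the same block twice.
		fmt.Printf("error: expected %d blocks, got %d\n", len(keys), len(got))
	}
}
```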
Okay, deduplicating at every layer seems like unnecessary overhead, and my thought is that maybe we should just not check at this level, but I am not familiar enough with the code to know for sure.
We could just say "make sure to deduplicate your own keys". Alternatively, we could avoid returning an error when we can't find all requested objects. Really, I think we should go with the second option (that's what the datastore does), but that will take some thought. Also note, this is definitely not the performance bottleneck (unfortunately).
@@ -201,7 +201,20 @@ func (n *dagService) GetMany(ctx context.Context, keys []*cid.Cid) <-chan *ipld.
	return getNodesFromBG(ctx, n.Blocks, keys)
}

func dedupKeys(keys []*cid.Cid) []*cid.Cid {
I think this can be done without copying the whole array into a map and back by running a search over the rest of the array and remembering which indexes are duplicates.
IDK if it would be faster.
The "remember which indexes are duplicates" part is exactly what we're doing when copying into the map. We could modify the array in place instead of creating a new one, but then we'd be modifying an argument. We could also sort and then dedup, but this is probably faster.
Good catch, but AFAIK the only user of …
A patch I was writing to try to parallelize fetching a bit.
You mean, replace it with an interface that fills in a promise array? This interface is useful when you don't care about the order (first come, first served). Doing that with the …