Write providers to disk to avoid memory leaks #2860
Conversation
I'm thinking it might be best to just write …
Force-pushed from d3a2034 to fd96e0c.
This also has the super cool advantage of persisting provider information across node reboots.
@Kubuxu @jbenet @kevina I'm converting keys and peer IDs to hex before passing them to the datastore. Before merging this, I think we should decide whether this is the way we want to do things moving forward, or whether we should attempt a fix at the go-datastore level (which would let us skip the hex encoding).
@whyrusleeping #2601 is causing some problems for me in the filestore. In particular, the filestore maintenance commands report the mangled hash, which can confuse users at best and also cause problems when trying to access the block outside of the filestore. I think we should work to fix this once and for all; I imagine any fix will likely introduce a repo change. Let's take this discussion over to #2601.
```diff
@@ -81,7 +81,7 @@ func NewProviderManager(ctx context.Context, local peer.ID, dstore ds.Datastore)
 const providersKeyPrefix = "/providers/"

 func mkProvKey(k key.Key) ds.Key {
-	return ds.NewKey(providersKeyPrefix + hex.EncodeToString([]byte(k)))
+	return ds.NewKey(providersKeyPrefix + base64.StdEncoding.EncodeToString([]byte(k)))
 }
```
@whyrusleeping I would use RawURLEncoding. (1) StdEncoding includes '/' in its alphabet (see https://tools.ietf.org/html/rfc4648); URLEncoding does not (hence its name). (2) The Raw form does not include the unnecessary '=' padding character.
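To make the difference concrete, here is a minimal sketch (not from the PR) showing both encodings on key bytes chosen to trigger the problematic characters:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// Hypothetical key bytes chosen so StdEncoding emits both '/' and '='.
	k := []byte{0xff, 0xef}

	// StdEncoding: '/' collides with the datastore's path separator,
	// and '=' padding adds noise to every key.
	fmt.Println(base64.StdEncoding.EncodeToString(k)) // "/+8="

	// RawURLEncoding: uses '-' and '_' instead of '+' and '/',
	// and omits the padding entirely.
	fmt.Println(base64.RawURLEncoding.EncodeToString(k)) // "_-8"
}
```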
SGTM, but I would like one more pair of eyes on it.
@whyrusleeping there is a failing test; should we worry about it?
@kevina no, it isn't connected.
@whyrusleeping
```go
	KeysOnly: true,
	Prefix:   providersKeyPrefix,
})
```
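For context, a sketch of how such a query is typically issued and consumed; the helper name and the pre-context shapes of the Query/Results API are assumptions, not code from this PR:

```go
package providers

import (
	ds "github.com/ipfs/go-datastore"
	dsq "github.com/ipfs/go-datastore/query"
)

const providersKeyPrefix = "/providers/"

// allProviderKeys is a hypothetical helper enumerating every key under
// the providers prefix without loading the stored values.
func allProviderKeys(dstore ds.Datastore) ([]string, error) {
	res, err := dstore.Query(dsq.Query{
		KeysOnly: true,
		Prefix:   providersKeyPrefix,
	})
	if err != nil {
		return nil, err
	}
	defer res.Close()

	var keys []string
	// Each result arrives over a channel fed by a separate goroutine,
	// which is the overhead discussed in the next comment.
	for r := range res.Next() {
		if r.Error != nil {
			return nil, r.Error
		}
		keys = append(keys, r.Key)
	}
	return keys, nil
}
```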
@whyrusleeping, using dstore.Query will work, but it is very slow. The Query mechanism has a lot of overhead: the data is sent to you over an unbuffered channel from a separate goroutine. In my own filestore code I was able to get a modest speedup (2 to 3x) by increasing the channel buffer size used by the query, but it was still slow. By querying the leveldb directly I was able to get a 10-12x speedup when doing a filestore ls. (See ipfs-filestore#10.) I do not know how badly it will hurt performance here, but it is something to keep in mind if you notice a slowdown after the code is deployed.
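For comparison, a rough sketch of what querying the leveldb directly might look like with goleveldb; the helper and the exposed *leveldb.DB handle are assumptions, since the datastore normally hides this layer:

```go
package providers

import (
	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/util"
)

// directProviderKeys iterates the raw leveldb handle, bypassing the
// query goroutine and channel entirely.
func directProviderKeys(db *leveldb.DB) ([]string, error) {
	iter := db.NewIterator(util.BytesPrefix([]byte("/providers/")), nil)
	defer iter.Release()

	var keys []string
	for iter.Next() {
		// string() copies the bytes, which matters because iter.Key()
		// is only valid until the next call to Next.
		keys = append(keys, string(iter.Key()))
	}
	return keys, iter.Error()
}
```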
@kevina even with KeysOnly set to true? Now that you mention this, I think I'll throw in some timers.
@whyrusleeping Yes, in fact that is how I was using Query.
@whyrusleeping would it be possible to improve the perf of that call? I depend on it in the bloom PR (via AllKeysChan); better perf there means a shorter bloom build time.
As a quick hack you can try increasing the channel buffer size here: https://github.com/ipfs/go-datastore/blob/master/query/query.go#L185.
A more long-term solution might be to rewrite the query interface to use a direct iterator rather than goroutines, though disk IO may nullify the benefit.
See https://ewencp.org/blog/golang-iterators/ for some performance comparisons.
Yeah, using a channel as an iterator sucks. If one of you wants to work on improving the perf of query, that would be great.
We could change the interface to not use a channel and instead return the next value directly. On top of that, we could provide a method that turns the direct query results into a channel-buffered stream for use cases that need it.
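A minimal sketch of that shape; the Results interface and AsChannel adapter here are hypothetical, not the actual go-datastore API:

```go
package query

// Entry mirrors a single query result.
type Entry struct {
	Key   string
	Value []byte
}

// Results is a hypothetical direct-iteration interface: NextSync hands
// back entries one at a time with no goroutine or channel in between.
type Results interface {
	NextSync() (Entry, bool, error) // entry, ok, error
	Close() error
}

// AsChannel adapts a direct iterator into a buffered channel for
// callers that still want channel semantics. Errors are dropped here
// for brevity; a real version would surface them.
func AsChannel(r Results, buf int) <-chan Entry {
	ch := make(chan Entry, buf)
	go func() {
		defer close(ch)
		for {
			e, ok, err := r.NextSync()
			if err != nil || !ok {
				return
			}
			ch <- e
		}
	}()
	return ch
}
```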
👍 for not using channels there
Actually, you can avoid using a query entirely if you store the records as ipfs objects properly, and only use leveldb to store key -> hash_of_providers_object, similar to how the pinset is stored.
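A rough sketch of that idea, with entirely hypothetical types (the real pinset code differs):

```go
package providers

// ProvidersObject is a hypothetical IPFS object holding the complete
// provider list for one key; it would live in the block store as a
// DAG node addressed by its hash.
type ProvidersObject struct {
	Providers []string // peer IDs
}

// DAG is a stand-in for the merkledag service: content in, hash out.
type DAG interface {
	Put(obj *ProvidersObject) (hash string, err error)
	Get(hash string) (*ProvidersObject, error)
}

// addProvider stores the updated object in the DAG and writes only a
// small key -> hash pointer into the key/value store (a plain map here
// stands in for leveldb), so no prefix query is ever needed.
func addProvider(dag DAG, kv map[string]string, key, peer string) error {
	obj := &ProvidersObject{}
	if h, ok := kv[key]; ok {
		prev, err := dag.Get(h)
		if err != nil {
			return err
		}
		obj = prev
	}
	obj.Providers = append(obj.Providers, peer)

	h, err := dag.Put(obj)
	if err != nil {
		return err
	}
	kv[key] = h // leveldb stores just the pointer
	return nil
}
```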
SGTM. As far as I can tell, I don't see any real problems.
Force-pushed from 86540b6 to 8f91069.
Choo Choo!
This reverts commit 5592144, reversing changes made to 3b2993d.
License: MIT
Signed-off-by: Lars Gierth <[email protected]>
This is a rough first hack on this. It works as intended, but I'm not happy with the actual way of putting the information on disk.
Currently, it puts the list of peers providing a given key into the datastore at /providers/<KEY>. This means that every time a provider is added, we marshal all the providers and write the whole list back to the datastore.
License: MIT
Signed-off-by: Jeromy [email protected]
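A minimal sketch of the write path described above, with a hypothetical Datastore stand-in and JSON chosen purely for illustration (the real code differs):

```go
package providers

import "encoding/json"

// Datastore is a hypothetical stand-in for the key/value store API.
type Datastore interface {
	Put(key string, value []byte) error
}

// writeProviders marshals the complete provider list for a key and
// writes it under /providers/<KEY>. Every addition rewrites the whole
// list, which is the inefficiency the description calls out.
func writeProviders(d Datastore, key string, peers []string) error {
	buf, err := json.Marshal(peers)
	if err != nil {
		return err
	}
	return d.Put("/providers/"+key, buf)
}
```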