-
Notifications
You must be signed in to change notification settings - Fork 813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hashes as hex strings or buffers? #533
Comments
All tests are passing. I'll let you guys be the judge, but this seems a bit better on memory and performance. |
Does not |
That's awesome that |
I think the consistency of everything being a buffer rather than depending on the case is a big win from the point of view of working with the API. Also makes things more consistent across modules. It seems like a slight extra annoyance to have to pass in the custom map to LRU, etc but this is something that most people won't be touching. I guess we might build in support for |
This can be closed, was added to master at b92839c |
Closed in b92839c |
This is a huge design error on my part. I originally used hex strings for some things because they were more efficient for JS maps/objects than constantly doing
buf.toString('hex')
. So, currently in bcoin, anything that needs to be constantly inserted into a hash table is a hex string, anything else is usually a buffer. Dealing with this on a case-by-case basis lead to really good performance in node and the browser, but as a result, the code is more confusing: e.g. "is that parameter supposed to be a hex-hash or a buffer-hash?"In the past, I tried to rectify this by using a binding to the C++ hash table. This custom hash table object would allow for Buffers as keys, but performance was so horribly bad (due to the overhead of calling out to C++) that I scrapped that idea entirely.
Anyway, it turns out I made a huge oversight:
buf.toString('binary')
is much more optimal than I thought...This is what gets called internally during
buf.toString('binary')
:https://github.com/nodejs/node/blob/3626944/src/string_bytes.cc#L665
This ends up calling:
https://github.com/nodejs/node/blob/3626944/src/string_bytes.cc#L167
Which ultimately calls:
https://github.com/nodejs/node/blob/3626944/deps/v8/src/heap/factory.cc#L472
It's a simple copy.
As for
buf.write(str, 'binary')
andBuffer.from(str, 'binary')
:https://github.com/nodejs/node/blob/3626944/src/string_bytes.cc#L321
Another simple copy of bytes. This means
buf.copy(out, 0)
andout.write(str, 'binary')
are roughly the same speed!So where to go from here? I suggest using Buffers for all hashes in bcoin. We use a "BufferMap" implementation that calls
toString('binary')
on every key before insertion into the Map. The overhead is very minimal.Concerns:
Lock
,MapLock
, andLRU
must allow for this newBufferMap
object.Validator
must offer a method to convert a hash to a Buffer.BDB
must return Buffers when parsing hashes in database keys.bio.read(buf, true)
orbr.readBytes(n, true)
).The second concern is the only unsolvable one I think. If anybody has any ideas please post them. I'm going to start rewriting things on a separate branch, using only Buffers, and see how it does in terms of performance and memory.
cc @tuxcanfly @nodar-chkuaselidze @boymanjor @pinheadmz
The text was updated successfully, but these errors were encountered: