-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
secp256k1: Add TinyGo support. #3223
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. This looks good other than a couple of inline nits I've identified.
For what it's worth, while this approach is fine as a method to immediately allow it to work with TinyGo, I think it would ultimately make more sense to modify the core window-based logic in the innards of ScalarBaseMultNonConst
to support a smaller window size in exchange for a bit of calculation so that it still works much more quickly on TinyGo than doing a round trip through the arbitrary point multiplication as this PR does. In that way, it would essentially allow a tradeoff between the memory usage and calculation speed. Moreover, it would allow it to avoid memory allocations which have GC implications.
For example, currently it has a window size of 256 with pure lookups which results in about 240KiB memory usage. If instead it went with a window size of something like 32 while calculating each window, it would only need about 1KiB memory usage and would still be quite a bit faster. A rough guess is that it would probably only be twice as slow using that approach versus the current method which is around 5x slower.
BenchmarkScalarBaseMultNonConstFast 64776 18540 ns/op 0 B/op 0 allocs/op
BenchmarkScalarBaseMultNonConstSlow 13072 91975 ns/op 64 B/op 2 allocs/op
The pre-computed table for speeding up ScalarBaseMultNonConst is several hundred kilobytes in the binary and even more when unpacked into working memory. Special-case ScalarBaseMultNonConst to fall back to ScalarMultNonConst when the 'tinygo' tag is specified, which is true when building a Go program with TinyGo.
Tweaking the window size is ideal, but I couldn't figure out a window size small enough to matter. See below.
fmt.Println("unsafe.Sizeof([32][256]JacobianPoint{})", unsafe.Sizeof([32][256]JacobianPoint{}))
fmt.Println("unsafe.Sizeof([32][32]JacobianPoint{})", unsafe.Sizeof([32][32]JacobianPoint{}))
fmt.Println("unsafe.Sizeof([32][1]JacobianPoint{})", unsafe.Sizeof([32][1]JacobianPoint{})) results in
which are larger than your 240KiB/1KiB numbers. What did I miss? |
You're right. I left off a factor of 4 there for the With a window size of |
For reference, I opend #3225 to make it so it remains zero allocation in the slow path (aka on TinyGo) as well to avoid GC implications. |
Thanks. I took another look at implementing this, but it seems to be quite some work. Am I right that the existing It also seems to me (and from casual glancing over the Bitcoin secp256k1 implementation) that pre-computing a small (say w=4 or 5) window is advantageous even for the general I suppose I either missed something, or the reason |
Yes, it would be quite a bit of work to implement. That's a big reason why I didn't have a big issue with the approach in this PR, but I figured it was worth mentioning.
Correct. GECC (Guide to Elliptic Curve Cryptography) section 3.30 provides some information and algorithms for window methods, but the width-w NAF algorithms it provides really aren't very optimized from what I recall. The existing binary NAF uses a significantly faster algorithm (#2695, ~93%) introduced by Prodinger, but it doesn't apply directly to width-w NAF. I mention that because it may or may not be the case that a small window that needs a much more expensive NAF calculation (as well as some other calculations for dealing with the endomorphism in Jacobian coords) will be all that much faster in practice. I suspect that it would be faster since point additions are relatively costly and using a window would cut down on those, but without actually implementing and testing it, it's hard to say with any certainty.
Well, the primary reason is just that I never got around to putting the effort into implementing and testing it given it's already extremely fast and I had reached the point of diminishing returns on optimizations. I believe there are also some other considerations and normalizations that would probably have to occur for a window approach due to the use of the endormophism along with Jacobian projective space, but it's been a few years since I wrote and optimized all of that code that I'd need to dig in again to verify all the math. On the topic of optimizations and diminishing returns, there are some others that would also likely result in some additional speedups in signature verification such as Shamir's trick (multiple point multiplication).
I've not looked at their implementation for any of this, but if there is a need to store additional information, a context value is a tried and true method for sure. Our existing Go code effectively more or less does that by housing the NAF state in a struct that is kept on the stack. |
The pre-computed table for speeding up
ScalarBaseMultNonConst
is several hundred kilobytes in the binary and even more when unpacked into working memory. Special-caseScalarBaseMultNonConst
to fall back toScalarMultNonConst
when the 'tinygo' tag is specified, which is true when building a Go program with TinyGo.