Chore: Remove one allocate per hash by using generics. #5829
algorandskiy merged 3 commits into algorand:master
The way we currently hash various objects is with:
```
// HashRep appends the correct hashid before the message to be hashed.
func HashRep(h Hashable) []byte {
	hashid, data := h.ToBeHashed()
	return append([]byte(hashid), data...)
}
```
This means that every caller generally has to allocate in order to
convert its argument, which might be a `BlockHeader`, for example,
into a Hashable. (This happens transparently, of course.)
However, by writing HashRep as:
```
func HashRep[H Hashable](h H) []byte {
	hashid, data := h.ToBeHashed()
	return append([]byte(hashid), data...)
}
```
We use generics to make a HashRep for each Hashable type, so a
`Hashable` interface value need not be created to call it. Thus we get one fewer
allocation most of the time.
For this PR, I did this by writing `HashRepFast` instead, so that I
could commit some benchmarks. They show one fewer allocation.
I had to create several copies of existing functions to make the
benchmarks work. In the real PR, I'll remove all that extra stuff.
Codecov Report
@@ Coverage Diff @@
## master #5829 +/- ##
==========================================
- Coverage 55.72% 55.71% -0.01%
==========================================
Files 476 476
Lines 67131 67130 -1
==========================================
- Hits 37408 37403 -5
- Misses 27203 27204 +1
- Partials 2520 2523 +3
algorandskiy left a comment
Could you post some benchmark results in the PR description?
Added.
zeldovich left a comment
Looks like a good idea!
Another opportunity to save on allocations would be to call protocol.EncodeMsgp() instead of protocol.Encode() in ToBeHashed(). The extra allocation is coming from the check that protocol.Encode() is doing through CanMarshalMsg() to check whether obj directly implements msgp.Marshaler, or whether its msgp.Marshaler methods are promoted from some embedded struct field.
Alternatively, we could just decide that we don't have any more dangling embedded fields whose parent structs haven't gone through msgp. This used to happen when I was first incrementally adding support for msgp, but by now, everything has been msgp'ed already. So, we could change protocol.Encode() to drop that CanMarshalMsg() check and save an extra allocation.
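The promoted-method case that this check guards against can be illustrated with a minimal sketch. Everything here is a simplified stand-in: `marshaler`, `inner`, `outer`, `encodeChecked`, and `encodeDirect` are hypothetical names, not the actual msgp or `protocol` package API, and the sketch does not try to reproduce where the real allocation occurs.

```go
package main

import "fmt"

// marshaler is a simplified stand-in for a msgp-style interface (assumption).
type marshaler interface {
	MarshalMsg(b []byte) []byte
	CanMarshalMsg(o interface{}) bool
}

// inner has its own marshaling methods.
type inner struct{ A byte }

func (i inner) MarshalMsg(b []byte) []byte { return append(b, i.A) }

// CanMarshalMsg reports whether o is really an inner, rather than a
// struct that merely embeds one.
func (inner) CanMarshalMsg(o interface{}) bool {
	_, ok := o.(inner)
	return ok
}

// outer embeds inner and defines no methods itself, so inner's methods
// are promoted and outer satisfies marshaler without encoding B.
type outer struct {
	inner
	B byte
}

// encodeChecked models a checked path: it trusts MarshalMsg only when
// the methods belong to the object's own type.
func encodeChecked(obj interface{}) ([]byte, bool) {
	m, ok := obj.(marshaler)
	if !ok || !m.CanMarshalMsg(obj) {
		return nil, false // a real encoder would fall back to another path here
	}
	return m.MarshalMsg(nil), true
}

// encodeDirect models an unchecked path: the caller vouches for the type.
func encodeDirect(m marshaler) []byte { return m.MarshalMsg(nil) }

func main() {
	_, ok := encodeChecked(inner{A: 1})
	fmt.Println(ok) // true: inner marshals itself

	_, ok = encodeChecked(outer{inner{A: 1}, 2})
	fmt.Println(ok) // false: the methods are promoted from inner

	// Without the check, the promoted method silently drops field B.
	fmt.Println(encodeDirect(outer{inner{A: 1}, 2})) // [1]
}
```

If every struct now has its own generated methods, as the comment suggests, the second case no longer arises and the check only adds cost.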
As an example, BenchmarkGenesisHash shows one fewer alloc and a small speedup.