[EN Performance] Optimize Ledger.Get() by making Forest.Read() use ~5x fewer allocs/op and run ~20% faster #2477

fxamacker
Member

Changes

Change Forest.Read() and Forest.ReadSingleValue() to return []ledger.Value without deep copying payload keys. This avoids 4 heap allocations per key.

This change doesn't affect Ledger.Get() (the caller) because it discards the payload keys.
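A rough sketch of the change in shape, using simplified stand-in types. Key, KeyPart, Payload, Value, readPayloads and readValues are illustrative assumptions rather than the flow-go code, and copying the value bytes is also an assumption made here for safety:

```go
package main

import "fmt"

// Simplified stand-ins for the ledger types (assumptions for illustration).
type KeyPart struct {
	Type  uint16
	Value []byte
}

type Key struct{ KeyParts []KeyPart }

type Value []byte

type Payload struct {
	Key   Key
	Value Value
}

// deepCopyKey allocates a new part slice plus new bytes for every part.
func deepCopyKey(k Key) Key {
	parts := make([]KeyPart, len(k.KeyParts))
	for i, p := range k.KeyParts {
		parts[i] = KeyPart{Type: p.Type, Value: append([]byte(nil), p.Value...)}
	}
	return Key{KeyParts: parts}
}

// readPayloads mimics the old shape: each result carries a deep-copied key.
func readPayloads(stored []*Payload) []*Payload {
	out := make([]*Payload, len(stored))
	for i, p := range stored {
		out[i] = &Payload{Key: deepCopyKey(p.Key), Value: append(Value(nil), p.Value...)}
	}
	return out
}

// readValues mimics the new shape: keys are never touched, only values are copied.
func readValues(stored []*Payload) []Value {
	out := make([]Value, len(stored))
	for i, p := range stored {
		out[i] = append(Value(nil), p.Value...)
	}
	return out
}

func main() {
	stored := []*Payload{{
		Key:   Key{KeyParts: []KeyPart{{Type: 0, Value: []byte("owner")}}},
		Value: Value("v1"),
	}}

	// A caller that only wants values throws the copied keys away.
	for _, p := range readPayloads(stored) {
		fmt.Printf("old: %s\n", p.Value)
	}
	for _, v := range readValues(stored) {
		fmt.Printf("new: %s\n", v)
	}
}
```

Both functions hand the caller the same values; the second one simply never builds the keys that the caller would discard.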

Closes #2475

Benchmark Comparisons (includes PR #2473 + this PR)

Bench comparison of master Ledger.Get() vs Ledger.GetSingleValue() reading a single value.

name          old time/op    new time/op    delta
LedgerGet1-4    6.77µs ± 1%    4.30µs ± 1%  -36.47%  (p=0.000 n=9+9)

name          old alloc/op   new alloc/op   delta
LedgerGet1-4    1.74kB ± 0%    0.69kB ± 0%  -60.55%  (p=0.000 n=10+10)

name          old allocs/op  new allocs/op  delta
LedgerGet1-4      21.0 ± 0%       3.0 ± 0%  -85.71%  (p=0.000 n=10+10)

Bench comparison of master Ledger.Get() vs new Ledger.Get() reading 100 values.

name            old time/op    new time/op    delta
LedgerGet100-4     519µs ± 0%     410µs ± 1%  -21.02%  (p=0.000 n=10+9)

name            old alloc/op   new alloc/op   delta
LedgerGet100-4     190kB ± 0%      95kB ± 0%  -50.04%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
LedgerGet100-4     1.52k ± 0%     0.32k ± 0%  -79.17%  (p=0.000 n=10+10)

Benchmarks used Go 1.17 on linux_amd64 (Haswell).

fxamacker added 4 commits May 24, 2022 18:20
Change Forest.Read to return []ledger.Value without deep copying
payload keys.  This avoids 4 heap allocations per key.

This change doesn't affect the caller (Ledger.Get) because it
discards the payload keys.

name        old time/op    new time/op    delta
TrieRead-4     524µs ± 1%     420µs ± 1%  -19.77%  (p=0.000 n=10+10)

name        old alloc/op   new alloc/op   delta
TrieRead-4     190kB ± 0%      95kB ± 0%  -50.04%  (p=0.000 n=10+10)

name        old allocs/op  new allocs/op  delta
TrieRead-4     1.52k ± 0%     0.32k ± 0%  -79.17%  (p=0.000 n=10+10)

Changed Forest.ReadSingleValue to return ledger.Value without deep
copying payload keys.  This avoids 4 heap allocations per key.

This change doesn't affect the caller (Ledger.GetSingleValue) because
it discards the payload key.
Contributor

@ramtinms ramtinms left a comment

Seems like a great optimization 👏

@ramtinms ramtinms requested a review from tarakby May 25, 2022 16:55
Contributor

@tarakby tarakby left a comment

Nice improvement 👏🏼
I didn't know we have been returning the full Payload all this time but what we needed was just Value!

I was thinking that we could maybe push the same optimization further:

  • func (f *Forest) Read calls func (mt *MTrie) UnsafeRead which returns []*Payload, it then takes only the Value part. I was wondering if we could make UnsafeRead return []Value instead. We won't be able to track totalPayloadSize anymore, we would be able to track totalValueSize instead (is that a problem?).
  • UnsafeRead seems to handle the case of len(paths)==1 separately to avoid the recursive overhead. Do you think there is still an advantage in handling that edge case inside Read?

Happy to hear your thoughts about these suggestions, I might have missed something :)

@fxamacker
Member Author

@tarakby Thanks for taking a look and great questions!

func (f *Forest) Read calls func (mt *MTrie) UnsafeRead which returns []*Payload, it then takes only the Value part. I was wondering if we could make UnsafeRead return []Value instead. We won't be able to track totalPayloadSize anymore, we would be able to track totalValueSize instead (is that a problem?).

The main optimization is to avoid payload key deep copy which happens in Forest.Read. For keys containing owner, controller, and key, there are 4 allocs per copy because of its data structure.
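To make the "4 allocs per copy" figure concrete, here is a minimal sketch that counts allocations for a deep copy of a three-part key. The Key and KeyPart types, the DeepCopy method, the sample part values and the deepCopyAllocs helper are simplified assumptions for illustration, not the flow-go definitions; testing.AllocsPerRun is the standard library function.

```go
package main

import (
	"fmt"
	"testing"
)

// Simplified stand-ins for the payload key types (assumptions for illustration).
type KeyPart struct {
	Type  uint16
	Value []byte
}

type Key struct{ KeyParts []KeyPart }

// DeepCopy allocates one slice for the parts and one byte slice per part.
func (k Key) DeepCopy() Key {
	parts := make([]KeyPart, len(k.KeyParts))
	for i, p := range k.KeyParts {
		v := make([]byte, len(p.Value))
		copy(v, p.Value)
		parts[i] = KeyPart{Type: p.Type, Value: v}
	}
	return Key{KeyParts: parts}
}

// sink is package-level so the copied key escapes to the heap.
var sink Key

// deepCopyAllocs reports the average number of heap allocations per DeepCopy call.
func deepCopyAllocs(k Key) float64 {
	return testing.AllocsPerRun(1000, func() { sink = k.DeepCopy() })
}

func main() {
	// Hypothetical owner, controller, and key parts.
	k := Key{KeyParts: []KeyPart{
		{Type: 0, Value: []byte("owner-address")},
		{Type: 1, Value: []byte("controller")},
		{Type: 2, Value: []byte("register-key")},
	}}
	fmt.Printf("allocs per deep copy: %.0f\n", deepCopyAllocs(k))
}
```

With this layout the count is one allocation for the slice of parts plus one per part, which matches the per-key cost described above for keys made of owner, controller, and key.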

It probably won't give us much speedup or reduce allocs if we make UnsafeRead return []Value instead of []*Payload because:

  • UnsafeRead() doesn't deep copy returned payloads, so there aren't any extra allocs to save even if it returns []Value.
  • Forest.Read() needs to create a new []ledger.Value to ensure values are in the same order as the received paths. So this alloc needs to happen even if UnsafeRead() returns []Value.
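The ordering point in the second bullet can be sketched as follows. Path, Value, trieRead and readInOrder are hypothetical names, the map-backed store stands in for the trie, and the in-place reordering done by trieRead is an assumption used to show why a fresh result slice is needed:

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical simplified types; real paths are fixed-size hashes.
type Path string
type Value []byte

// trieRead stands in for the trie lookup. It is assumed to reorder paths
// in place; values[i] always belongs to the reordered paths[i].
func trieRead(store map[Path]Value, paths []Path) []Value {
	sort.Slice(paths, func(i, j int) bool { return paths[i] < paths[j] })
	values := make([]Value, len(paths))
	for i, p := range paths {
		values[i] = store[p]
	}
	return values
}

// readInOrder deduplicates paths, reads each unique path once, and then
// rebuilds the results in the order the caller passed the paths in.
func readInOrder(store map[Path]Value, paths []Path) []Value {
	seen := make(map[Path]struct{}, len(paths))
	unique := make([]Path, 0, len(paths))
	for _, p := range paths {
		if _, ok := seen[p]; !ok {
			seen[p] = struct{}{}
			unique = append(unique, p)
		}
	}

	values := trieRead(store, unique) // may reorder unique, never paths

	byPath := make(map[Path]Value, len(unique))
	for i, p := range unique {
		byPath[p] = values[i]
	}

	// This slice is required no matter what element type the trie read returns.
	ordered := make([]Value, len(paths))
	for i, p := range paths {
		ordered[i] = byPath[p]
	}
	return ordered
}

func main() {
	store := map[Path]Value{"a": Value("A"), "b": Value("B"), "c": Value("C")}
	for _, v := range readInOrder(store, []Path{"c", "a", "c", "b"}) {
		fmt.Printf("%s ", v)
	}
	fmt.Println()
}
```

The final ordered slice is allocated in either design, which is why changing the inner return type alone would not remove it.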

If we only return []Value from MTrie.UnsafeRead, callers won't have access to the payload key anymore. The payload key could be useful for callers.

I don't know if it would be a problem to replace totalPayloadSize with totalValueSize in metrics reports. I can find out from @ramtinms if needed.

UnsafeRead seems to handle the case of len(paths)==1 separately to avoid the recursive overhead. Do you think there is still an advantage in handling that edge case inside Read?

Great question! Yes, because they optimize different edge cases.

In MTrie.UnsafeRead(), len(paths) == 1 optimizes the read when only one path is left to read in a subtree as a result of partitioning.

In Forest.Read(), len(paths) == 1 optimizes the read when there is only one path to batch read for the entire mtrie. It avoids the overhead of deduplicating paths and reconstructing values in order after the read.
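As a rough illustration of the subtree-level case, the sketch below reads from a toy binary trie with 8-bit paths. The node type, the bit, insert, readSingle and read functions, and the 8-bit height are all assumptions made for this example and do not mirror the MTrie code. A batch read partitions paths in place by the current bit and recurses, and once a subtree is left with a single path it switches to a plain loop:

```go
package main

import "fmt"

const height = 8 // toy trie: paths are 8 bits, read most significant bit first

type node struct {
	left, right *node
	value       []byte // set on leaves only
}

func bit(p uint8, depth int) uint8 { return (p >> (7 - depth)) & 1 }

func insert(n *node, p uint8, depth int, v []byte) *node {
	if n == nil {
		n = &node{}
	}
	if depth == height {
		n.value = v
		return n
	}
	if bit(p, depth) == 0 {
		n.left = insert(n.left, p, depth+1, v)
	} else {
		n.right = insert(n.right, p, depth+1, v)
	}
	return n
}

// readSingle is the fast path: walk down iteratively for one path.
func readSingle(n *node, p uint8, depth int) []byte {
	for ; n != nil && depth < height; depth++ {
		if bit(p, depth) == 0 {
			n = n.left
		} else {
			n = n.right
		}
	}
	if n == nil {
		return nil
	}
	return n.value
}

// read fills out[i] for paths[i]. It reorders paths in place while partitioning.
func read(n *node, paths []uint8, out [][]byte, depth int) {
	if len(paths) == 0 || n == nil {
		return // missing subtree: results stay nil
	}
	if len(paths) == 1 {
		out[0] = readSingle(n, paths[0], depth)
		return
	}
	if depth == height {
		for i := range paths {
			out[i] = n.value
		}
		return
	}
	// Partition: paths with a 0 bit at this depth move to the front.
	i := 0
	for j := range paths {
		if bit(paths[j], depth) == 0 {
			paths[i], paths[j] = paths[j], paths[i]
			i++
		}
	}
	read(n.left, paths[:i], out[:i], depth+1)
	read(n.right, paths[i:], out[i:], depth+1)
}

func main() {
	var root *node
	root = insert(root, 0x01, 0, []byte("a"))
	root = insert(root, 0x80, 0, []byte("b"))
	root = insert(root, 0x81, 0, []byte("c"))

	paths := []uint8{0x81, 0x01, 0x80, 0x02}
	out := make([][]byte, len(paths))
	read(root, paths, out, 0)
	for i, p := range paths {
		fmt.Printf("%#02x -> %q\n", p, out[i])
	}
}
```

Because partitioning reorders the paths, a batch caller in this sketch has to map results back to the original order afterwards, which is the extra work a top-level single-path shortcut can skip.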

Contributor

@tarakby tarakby left a comment

Thank you @fxamacker for the detailed reply!

UnsafeRead() doesn't deep copy returned payloads, so there aren't any extra allocs to save even if it returns []Value.

Ah I missed this point! I agree there isn't big advantage of returning values over payload pointers then.

In Forest.Read(), len(paths) == 1 optimizes the read when there is only one path to batch read for the entire mtrie. It avoids the overhead of deduplicating paths and reconstructing values in order after the read.

I agree it's still worth it to skip the deduplication and ordering 👌🏼

@fxamacker fxamacker merged commit a20ba3f into fxamacker/optimize-reading-single-register May 25, 2022
@fxamacker fxamacker deleted the fxamacker/optimize-ledger-read-allocs branch May 25, 2022 23:22
@fxamacker fxamacker added the Execution Cadence Execution Team label Jul 14, 2022
@fxamacker fxamacker changed the title from "Optimize Ledger.Get() by making Forest.Read() use ~5x fewer allocs/op and run ~20% faster" to "[EN Performance] Optimize Ledger.Get() by making Forest.Read() use ~5x fewer allocs/op and run ~20% faster" Aug 10, 2022
Labels
Execution Cadence Execution Team Performance