[EN Performance] Optimize Ledger.Get() by making Forest.Read() use ~5x fewer allocs/op and run ~20% faster #2477

fxamacker · 2022-05-25T00:45:23Z

Changes

Change Forest.Read() and Forest.ReadSingleValue() to return []ledger.Value without deep copying payload keys. This avoids 4 heap allocations per key.

This change doesn't affect Ledger.Get() (the caller) because it discards the payload keys.
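A minimal sketch of the API shape change described above, using simplified stand-in types rather than the real ledger package. The names `readPayloads`, `readValues`, and `valuesOf` are hypothetical, and the sketch assumes values are still copied on both paths so that only the key handling differs.

```go
package main

import "fmt"

// Simplified stand-ins for the ledger types; the real definitions differ.
type KeyPart struct {
	Type  uint16
	Value []byte
}

type Key struct {
	KeyParts []KeyPart
}

type Value []byte

type Payload struct {
	Key   Key
	Value Value
}

func copyBytes(b []byte) []byte {
	c := make([]byte, len(b))
	copy(c, b)
	return c
}

// DeepCopy copies the key-part slice and every key-part value.
func (k Key) DeepCopy() Key {
	parts := make([]KeyPart, len(k.KeyParts))
	for i, p := range k.KeyParts {
		parts[i] = KeyPart{Type: p.Type, Value: copyBytes(p.Value)}
	}
	return Key{KeyParts: parts}
}

// readPayloads mimics the old shape: every result carries a deep-copied key.
func readPayloads(stored []*Payload) []*Payload {
	out := make([]*Payload, len(stored))
	for i, p := range stored {
		out[i] = &Payload{Key: p.Key.DeepCopy(), Value: copyBytes(p.Value)}
	}
	return out
}

// valuesOf is what a caller that only needs values does: keep values, drop keys.
func valuesOf(payloads []*Payload) []Value {
	out := make([]Value, len(payloads))
	for i, p := range payloads {
		out[i] = p.Value
	}
	return out
}

// readValues mimics the new shape: keys are never copied.
func readValues(stored []*Payload) []Value {
	out := make([]Value, len(stored))
	for i, p := range stored {
		out[i] = copyBytes(p.Value)
	}
	return out
}

func main() {
	stored := []*Payload{{
		Key:   Key{KeyParts: []KeyPart{{Type: 0, Value: []byte("owner")}}},
		Value: Value("v1"),
	}}
	before := valuesOf(readPayloads(stored))
	after := readValues(stored)
	fmt.Println(string(before[0]) == string(after[0]))
}
```

Both paths hand the caller the same values; the second one simply skips the key copies that the caller would have thrown away.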

Closes #2475

Benchmark Comparisons (includes PR #2473 + this PR)

Bench comparison of master Ledger.Get() vs Ledger.GetSingleValue() reading a single value.

name          old time/op    new time/op    delta
LedgerGet1-4    6.77µs ± 1%    4.30µs ± 1%  -36.47%  (p=0.000 n=9+9)

name          old alloc/op   new alloc/op   delta
LedgerGet1-4    1.74kB ± 0%    0.69kB ± 0%  -60.55%  (p=0.000 n=10+10)

name          old allocs/op  new allocs/op  delta
LedgerGet1-4      21.0 ± 0%       3.0 ± 0%  -85.71%  (p=0.000 n=10+10)

Bench comparison of master Ledger.Get() vs new Ledger.Get() reading 100 values.

name            old time/op    new time/op    delta
LedgerGet100-4     519µs ± 0%     410µs ± 1%  -21.02%  (p=0.000 n=10+9)

name            old alloc/op   new alloc/op   delta
LedgerGet100-4     190kB ± 0%      95kB ± 0%  -50.04%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
LedgerGet100-4     1.52k ± 0%     0.32k ± 0%  -79.17%  (p=0.000 n=10+10)

Benchmarks used Go 1.17 on linux_amd64 (Haswell).
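For readers checking the "~5x fewer allocs/op" and "~20% faster" figures in the title against the tables, here is a small arithmetic sketch. The helper names are hypothetical, and the inputs are the rounded values printed above, so results can differ slightly from the delta column, which is computed from unrounded measurements.

```go
package main

import "fmt"

// percentDelta returns the relative change from before to after, in percent.
func percentDelta(before, after float64) float64 {
	return (after - before) / before * 100
}

// reductionFactor returns how many times smaller after is than before.
func reductionFactor(before, after float64) float64 {
	return before / after
}

func main() {
	// LedgerGet100: 519µs -> 410µs, 1.52k -> 0.32k allocs/op.
	fmt.Printf("time delta: %.1f%%\n", percentDelta(519, 410))
	fmt.Printf("allocs reduction: %.2fx\n", reductionFactor(1520, 320))

	// LedgerGet1: 21 -> 3 allocs/op.
	fmt.Printf("single-read allocs reduction: %.0fx\n", reductionFactor(21, 3))
}
```

With the rounded inputs, the 100-value read comes out at roughly 21% less time and 4.75x fewer allocations, which matches the approximate figures in the title.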

Change Forest.Read to return []ledger.Value without deep copying payload keys. This avoids 4 heap allocations per key. This change doesn't affect the caller (Ledger.Get) because it discards the payload keys.

name        old time/op    new time/op    delta
TrieRead-4     524µs ± 1%     420µs ± 1%  -19.77%  (p=0.000 n=10+10)

name        old alloc/op   new alloc/op   delta
TrieRead-4     190kB ± 0%      95kB ± 0%  -50.04%  (p=0.000 n=10+10)

name        old allocs/op  new allocs/op  delta
TrieRead-4     1.52k ± 0%     0.32k ± 0%  -79.17%  (p=0.000 n=10+10)

Changed Forest.ReadSingleValue to return ledger.Value without deep copying payload keys. This avoids 4 heap allocations per key. This change doesn't affect the caller (Ledger.GetSingleValue) because it discards the payload key.

ramtinms

Seems like a great optimization 👏

tarakby

Nice improvement 👏🏼
I didn't know we have been returning the full Payload all this time but what we needed was just Value!

I was thinking that we could maybe push the same optimization further:

func (f *Forest) Read calls func (mt *MTrie) UnsafeRead which returns []*Payload, it then takes only the Value part. I was wondering if we could make UnsafeRead return []Value instead. We won't be able to track totalPayloadSize anymore, we would be able to track totalValueSize instead (is that a problem?).
UnsafeRead seems to handle the case of len(paths)==1 separately to avoid the recursive overhead. Do you think there is still advantage in handling that edge case inside Read ?

Happy to hear your thoughts about these suggestions, I might have missed something :)

fxamacker · 2022-05-25T22:35:09Z

@tarakby Thanks for taking a look and great questions!

func (f *Forest) Read calls func (mt *MTrie) UnsafeRead which returns []*Payload, it then takes only the Value part. I was wondering if we could make UnsafeRead return []Value instead. We won't be able to track totalPayloadSize anymore, we would be able to track totalValueSize instead (is that a problem?).

The main optimization is to avoid payload key deep copy which happens in Forest.Read. For keys containing owner, controller, and key, there are 4 allocs per copy because of its data structure.
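To illustrate where the 4 allocs per copy can come from, here is a rough sketch that counts allocations with `testing.AllocsPerRun`. The `Key` and `KeyPart` types are simplified assumptions, not the real ledger definitions: a key is modeled as a slice of parts, each holding its own byte slice, so a deep copy needs one allocation for the slice of parts plus one per part.

```go
package main

import (
	"fmt"
	"testing"
)

// Simplified stand-ins for the ledger key types; the real definitions differ.
type KeyPart struct {
	Type  uint16
	Value []byte
}

type Key struct {
	KeyParts []KeyPart
}

// sink keeps each copy reachable so the allocations cannot be optimized away.
var sink Key

// deepCopyKey allocates a new slice of parts and a new byte slice per part.
func deepCopyKey(k Key) Key {
	parts := make([]KeyPart, len(k.KeyParts))
	for i, p := range k.KeyParts {
		v := make([]byte, len(p.Value))
		copy(v, p.Value)
		parts[i] = KeyPart{Type: p.Type, Value: v}
	}
	return Key{KeyParts: parts}
}

// keyCopyAllocs reports the average number of heap allocations per deep copy
// of a key made of three parts (owner, controller, key).
func keyCopyAllocs() float64 {
	k := Key{KeyParts: []KeyPart{
		{Type: 0, Value: []byte("owner-address")},
		{Type: 1, Value: []byte("controller")},
		{Type: 2, Value: []byte("register-key")},
	}}
	return testing.AllocsPerRun(1000, func() {
		sink = deepCopyKey(k)
	})
}

func main() {
	fmt.Println("allocs per key deep copy:", keyCopyAllocs())
}
```

Under these assumptions a three-part key costs 1 + 3 allocations per deep copy, which lines up with the per-key figure quoted in the PR description.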

It probably won't give us much speedup or reduce allocs if we make UnsafeRead return []Value instead of []*Payload because:

UnsafeRead() doesn't deep copy returned payloads, so there isn't extra allocs even if it returns []Value.
Forest.Read() needs to create new []ledger.Value to ensure values are in the same order as received paths. So this alloc needs to happen even if UnsafeRead() returns []Value.

If we only return []Value from MTrie.UnsafeRead, callers won't have access to payload key anymore. The payload key could be useful for callers.

I don't know if it would be a problem to replace totalPayloadSize with totalValueSize in metrics reports. I can find out from @ramtinms if needed.

UnsafeRead seems to handle the case of len(paths)==1 separately to avoid the recursive overhead. Do you think there is still advantage in handling that edge case inside Read ?

Great question! Yes because they optimize different edge cases.

In MTrie.UnsafeRead(), len(paths) == 1 optimizes read when there is only one path to read in subtree as result of partitioning.

In Forest.Read(), len(paths) == 1 optimizes read when there is only one path to batch read for entire mtrie. It avoids the overhead of deduplicating paths and reconstructing values in order after read.
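A rough sketch of the batch-read flow discussed above: a fast path for a single path, then deduplication, a trie read that may reorder its input, and reconstruction of values in the caller's order. Everything here is hypothetical (a map stands in for the trie, `Path` is a string, and `trieRead` sorts its input to imitate in-place partitioning), so treat it as an illustration rather than the actual Forest.Read code.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical stand-ins; real ledger paths are fixed-size byte arrays.
type Path string
type Value []byte

// trieRead imitates an unsafe batch read: it reorders paths in place and
// returns values in that reordered order. It expects unique paths.
func trieRead(store map[Path]Value, paths []Path) []Value {
	sort.Slice(paths, func(i, j int) bool { return paths[i] < paths[j] })
	out := make([]Value, len(paths))
	for i, p := range paths {
		out[i] = store[p]
	}
	return out
}

// read returns one value per requested path, in the order requested.
func read(store map[Path]Value, paths []Path) []Value {
	if len(paths) == 0 {
		return []Value{}
	}

	// Fast path: one path needs no deduplication and no reordering.
	if len(paths) == 1 {
		return []Value{store[paths[0]]}
	}

	// Deduplicate into a fresh slice so the caller's slice is never reordered.
	seen := make(map[Path]struct{}, len(paths))
	unique := make([]Path, 0, len(paths))
	for _, p := range paths {
		if _, ok := seen[p]; !ok {
			seen[p] = struct{}{}
			unique = append(unique, p)
		}
	}

	got := trieRead(store, unique) // unique may be reordered after this call

	// Map each (possibly reordered) path back to its value.
	byPath := make(map[Path]Value, len(unique))
	for i, p := range unique {
		byPath[p] = got[i]
	}

	// This output slice is needed to restore the requested order.
	out := make([]Value, len(paths))
	for i, p := range paths {
		out[i] = byPath[p]
	}
	return out
}

func main() {
	store := map[Path]Value{"a": Value("1"), "b": Value("2"), "c": Value("3")}
	for _, v := range read(store, []Path{"c", "a", "c", "b"}) {
		fmt.Print(string(v), " ")
	}
	fmt.Println()
}
```

The single-path branch skips both maps and the extra slice, which is the overhead the fast path in the outer read is meant to avoid.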

tarakby

Thank you @fxamacker for the detailed reply!

UnsafeRead() doesn't deep copy returned payloads, so there isn't extra allocs even if it returns []Value.

Ah I missed this point! I agree there isn't big advantage of returning values over payload pointers then.

In Forest.Read(), len(paths) == 1 optimizes read when there is only one path to batch read for entire mtrie. It avoids the overhead of deduplicating paths and reconstructing values in order after read

I agree it's still worth it to skip the deduplication and ordering 👌🏼

fxamacker added 4 commits May 24, 2022 18:20

Fix tests

f9b8076

Update Forest.Read() callers to use new API

ee5ab90

Speedup and reduce allocs/op in ledger single read

31d144c

fxamacker added the Performance label May 25, 2022

fxamacker self-assigned this May 25, 2022

fxamacker requested review from ramtinms, m4ksio and AlexHentschel as code owners May 25, 2022 00:45

ramtinms approved these changes May 25, 2022

ramtinms requested a review from tarakby May 25, 2022 16:55

tarakby reviewed May 25, 2022

tarakby approved these changes May 25, 2022

fxamacker merged commit a20ba3f into fxamacker/optimize-reading-single-register May 25, 2022

fxamacker deleted the fxamacker/optimize-ledger-read-allocs branch May 25, 2022 23:22

fxamacker mentioned this pull request May 27, 2022

[Execution] Forest.Read() can use less memory #2475

Closed

fxamacker added the Execution Cadence Execution Team label Jul 14, 2022

fxamacker changed the title ~~Optimize Ledger.Get() by making Forest.Read() use ~5x fewer allocs/op and run ~20% faster~~ [EN Performance] Optimize Ledger.Get() by making Forest.Read() use ~5x fewer allocs/op and run ~20% faster Aug 10, 2022
