db: more cache budget for BODIES and EXTRA columns #11548

ordian · 2020-03-05T15:27:16Z

It seems that our assumptions about DB column sizes did not match reality.

** Compaction Stats [col0] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   16.52 MB   0.5      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L3      1/0   28.62 MB   0.1      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L4      3/0   150.05 MB   0.8      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L5     35/0    1.04 GB   0.6      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L6    811/0   49.12 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
 Sum    851/0   50.35 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

** Compaction Stats [col2] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      2/2   315.72 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L2      1/1   13.96 MB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L3      1/0   30.18 MB   0.1      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L4     13/0   628.76 MB   0.2      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L5    121/0    6.31 GB   0.3      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L6   1025/0   63.35 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
 Sum   1163/3   70.31 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

** Compaction Stats [col3] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   330.66 KB   0.5      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L2      1/0    5.23 MB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L3      1/0   62.90 MB   0.2      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L4     15/0   772.35 MB   0.3      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L5    150/0    7.70 GB   0.3      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L6   1272/0   78.24 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
 Sum   1440/0   86.76 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

Changing the default memory distribution seems to help with #11494.

dvdplm · 2020-03-17T15:17:50Z

Help for reviewers:
col0 is COL_STATE, where all the 80M+ accounts and their balances/code are stored; 50.3 GB in the DB above.
col2 is COL_BODIES, which stores block bodies; 70.3 GB (this is indeed surprising to me).
col3 is COL_EXTRA, which stores block "details" and receipts and is also where we keep track of the current best/oldest known block; 86.7 GB. I suspect that this column should not be allowed to grow unbounded, and afaict it is never pruned, which makes no sense to me (bug?).

dvdplm

If I understand you correctly this PR stems from the observation that the column sizes do not match the cache memory allocated to them; I think you're saying "the three columns are roughly of the same size, they should have roughly the same amount of memory assigned"?
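To put numbers on that reading, here is a minimal sketch that splits a cache budget across the three columns in proportion to their on-disk size, using the `Sum` sizes from the stats above. The function name `proportional_budget` and the 1000 MiB total are hypothetical and chosen only for illustration; this is not the split the PR implements.

```rust
/// Split `total_budget_mib` across columns in proportion to their on-disk size.
/// Hypothetical helper for illustration only; not taken from the PR.
fn proportional_budget(sizes_gb: &[(&str, f64)], total_budget_mib: u64) -> Vec<(String, u64)> {
    let total: f64 = sizes_gb.iter().map(|(_, size)| size).sum();
    sizes_gb
        .iter()
        .map(|(name, size)| {
            let share = size / total;
            // Round to the nearest MiB; rounded parts may not add up exactly.
            (name.to_string(), (share * total_budget_mib as f64).round() as u64)
        })
        .collect()
}

fn main() {
    // Column sizes (GB) from the "Sum" rows of the first stats dump.
    let sizes = [("COL_STATE", 50.35), ("COL_BODIES", 70.31), ("COL_EXTRA", 86.76)];
    // 1000 MiB is an assumed total budget, not a value from the PR.
    for (name, mib) in proportional_budget(&sizes, 1000) {
        println!("{name}: {mib} MiB");
    }
}
```

A purely size-proportional split ignores access patterns, which is the concern raised in the questions that follow.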

It is a really good question but I have several hard-to-answer questions:

- Isn't it true that the read/write pattern to COL_STATE is much less regular than to the others? It is really tricky to make any kind of caching efficient when users control the kinds of queries by deploying Solidity code and making token transfers from addresses that can be anywhere in the DB. Allocating as much memory as possible to speed up seeks in COL_STATE still seems like a smart move.
- Why isn't COL_BODIES ever pruned? This might be the dumbest question ever, but when a node warp syncs it doesn't have all block bodies back to genesis, does it? We backfill ancient blocks, but what is the actual purpose of that: crucial for security or "nice to have"?
- I was under the illusion that COL_EXTRA was a column where we tossed random bits and pieces we didn't have a better place for, e.g. the best block etc. TIL that is not at all the case, but why do we need to store all transaction receipts forever? Can't we prune this? From a cursory glance at the code using COL_EXTRA it seems like we're mostly writing to it, and most reads seem to be for the "first", "best" and "ancient" keys; if it is indeed mostly appended to, spending cache on it is likely wasted? Maybe we need a COL_RECEIPT?

EDIT: There is no benchmarking data here – do you have any? What changes with the cache redistribution?

ordian · 2020-03-17T15:59:50Z

> I think you're saying "the three columns are roughly of the same size, they should have roughly the same amount of memory assigned"

The problem is that it's not just the memory assigned, it's the size of levels L0 and L1 in rocksdb, which affects the overall db (column) layout.

dvdplm · 2020-03-17T17:46:58Z

> The problem is that it's not just the memory assigned, it's the size of levels L0 and L1 in rocksdb, which affects the overall db (column) layout.

I don't know what you mean, ELI5 pls?

AtkinsChang · 2020-03-24T15:13:20Z

@ordian FYI, here are the stats of an overlayrecent database synced with this patch:

** Compaction Stats [col0] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.0      0.0     0.0      0.0       3.1      3.1       0.0   1.0      0.0     88.5     36.02             32.43        16    2.251       0      0
  L4     11/0   679.83 MB   1.0      7.7     3.1      4.6       6.2      1.5       0.0   2.0     87.6     69.7     90.54             64.85         8   11.317     69M  2006K
  L5     45/0    1.77 GB   1.0      3.7     1.3      2.4       3.4      1.0       0.0   2.5     75.7     69.3     50.11             39.12        21    2.386     30M   876K
  L6    867/0   52.10 GB   0.0     17.6     1.1     16.5      16.5      0.1       0.0  15.1     60.9     57.4    295.08            256.43        23   12.830     17M    13M
 Sum    923/0   54.54 GB   0.0     29.0     5.5     23.5      29.2      5.7       0.0   9.4     63.0     63.4    471.75            392.83        68    6.937    117M    16M
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

** Compaction Stats [col2] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0    1.97 MB   0.5      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0    114.7      0.02              0.00         1    0.017       0      0
  L2     25/0    1.55 GB   0.9      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L3      1/0   64.24 MB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L4     13/0   690.88 MB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L5    143/0    7.23 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L6   1193/0   72.86 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
 Sum   1376/0   82.38 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0    114.7      0.02              0.00         1    0.017       0      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

** Compaction Stats [col3] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0    2.53 MB   0.5      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     74.5      0.03              0.00         1    0.034       0      0
  L2     12/0   780.07 MB   0.5      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L3     23/0    1.45 GB   0.9      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L4     18/0   923.59 MB   0.1      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L5    173/0    9.01 GB   0.1      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L6   1466/0   90.62 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
 Sum   1693/0   102.74 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     74.5      0.03              0.00         1    0.034       0      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

@dvdplm this may help
https://github.com/facebook/rocksdb/blob/master/options/options.cc#L525

dvdplm · 2020-03-24T20:27:40Z

> @dvdplm this may help
> https://github.com/facebook/rocksdb/blob/master/options/options.cc#L525

Sort of, I've read that file many times but I still have a hard time getting an intuition for what config values are relevant to us. https://github.com/facebook/rocksdb/wiki/Leveled-Compaction is a good read too, but only somewhat related to this PR.

Do you have similar stats for a DB without this patch? Do you expect the level distribution to be different? Why?

ordian · 2020-03-24T20:33:38Z

What I meant is that it's not just the memory assigned. If you look at how we use the memory budget in
https://github.com/paritytech/parity-common/blob/939151e23b132110628739e8458e6cece1f1c8d0/kvdb-rocksdb/src/lib.rs#L208
we call optimize_level_style_compaction, which in turn sets the size of the L0 and L1 levels of the RocksDB column, which in turn affects the whole DB layout. So https://github.com/facebook/rocksdb/wiki/Leveled-Compaction is actually relevant here.

AtkinsChang · 2020-03-25T14:58:21Z

@dvdplm Sorry that I left only a few words without a proper description. I just wanted to explain why the patch changes the level distribution.
As Ordian said, the optimize_level_style_compaction function we use actually sets the file size of L0 to memory_budget / 2 and the total size of L1 to memory_budget.
If I understand correctly, it changes the distribution of all levels because it modifies max_bytes_for_level_base.
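As a rough illustration of why the budget ripples through every level, the sketch below derives level targets from a memory budget using the relations stated above (L0 file = memory_budget / 2, L1 total = memory_budget). It assumes static leveled compaction with RocksDB's default level multiplier of 10; the type and function names and the budget values are hypothetical, and the real column settings may differ.

```rust
/// Level sizing as described in this thread for `optimize_level_style_compaction`.
/// Names are hypothetical; this models the arithmetic only, not the RocksDB API.
#[derive(Debug, PartialEq)]
struct LevelLayout {
    /// Size of one L0 file: two merged memtables, i.e. `memory_budget / 2`.
    l0_file_bytes: u64,
    /// Target total size of L1 (`max_bytes_for_level_base`).
    l1_total_bytes: u64,
}

fn layout_for_budget(memory_budget: u64) -> LevelLayout {
    LevelLayout {
        l0_file_bytes: memory_budget / 2,
        l1_total_bytes: memory_budget,
    }
}

/// Target size of level `n >= 1`, assuming each level is `multiplier`
/// times larger than the previous one (static level sizing).
fn level_target_bytes(layout: &LevelLayout, level: u32, multiplier: u64) -> u64 {
    assert!(level >= 1, "L0 is bounded by file count, not by a byte target");
    layout.l1_total_bytes * multiplier.pow(level - 1)
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    // Hypothetical budgets, not the values used by the PR.
    for budget_mib in [64u64, 256] {
        let layout = layout_for_budget(budget_mib * MIB);
        println!("budget {} MiB: L0 file {} MiB", budget_mib, layout.l0_file_bytes / MIB);
        for level in 1..=4 {
            let target = level_target_bytes(&layout, level, 10);
            println!("  L{} target {} MiB", level, target / MIB);
        }
    }
}
```

Under these assumptions, every level target scales linearly with the budget, so changing a column's budget shifts where its data settles across levels.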

But I have another question. optimize_level_style_compaction aims to achieve:
2 memtables -> L0
2 L0 -> L1
So it turns off compression for L0 and L1 and tunes their sizes. But these settings seem to be overwritten. I cannot find the reason in upstream or in the official RocksDB wiki.

I don't have a non-patched state DB right now.

db: more cache budget for BODIES and EXTRA columns

d612e72

ordian added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. labels Mar 5, 2020

REVERTME: produce binaries for this branch

68ea084

dvdplm reviewed Mar 17, 2020

dvdplm added A1-onice 🌨 Pull request is reviewed well, but should not yet be merged. and removed A0-pleasereview 🤓 Pull request needs code review. labels Apr 15, 2020
