Performance Issue of Storage Read using `entries()` and `entriesAt()` #4959

stechu · 2022-06-19T21:27:55Z

Toolchain: NodeJS + Typescript

One of the core performance of MantaPay is fast syncing UTXOs from the ledger to users' wallets.
The core data-structure that we implements on chain is a storage double map:

#[pallet::storage]
	pub(super) type Shards<T: Config> =
		StorageDoubleMap<_, Identity, u8, Identity, u64, (Utxo, EncryptedNote), ValueQuery>;

(https://github.com/Manta-Network/Manta/blob/manta/pallets/manta-pay/src/lib.rs#L147)

We tried two methods of syncing Shards.

Implement our own RPC call the wraps iterating over the map entries.
Using api.query.mantaPay.shards.entries().

In theory, the second method should be much faster since it is able to download the entire storage tree. However, we find the first one has 3x performance advantage even downloading the entire storage map. Below is the benchmark data:

`Shards` Size	RPC Latency (ms)	RPC Throughput (MB/sec)	`entries()` latency (ms)	entries() Throughput (MB/sec)
4096	173	3.161127168	N.A.	N.A.
8192	304	3.495065789	955	1.079842932
16384	601	3.483777038	1832	1.125818777
32768	1024	timeout (says 30 sec)	3306	1.247731397

@jacogr @shawntabrizi would love to get your thoughts on why the performance disparity and whether we can improve query storage entries() performance.

The text was updated successfully, but these errors were encountered:

jacogr · 2022-06-20T03:24:22Z

How have you implemented the first RPC call? Is this on the Node side?

stechu · 2022-06-20T03:35:22Z

How have you implemented the first RPC call? Is this on the Node side?

Yes, we basically use iter_prefix_from iterate over the StorageDoubleMap
https://github.com/Manta-Network/Manta/blob/manta/pallets/manta-pay/src/rpc.rs#L30
https://github.com/Manta-Network/Manta/blob/manta/pallets/manta-pay/src/lib.rs#L487

jacogr · 2022-06-20T03:41:44Z

Ok, the RPC would be more effective. So what the .entries() does -

it first retrieves all the keys for the map
for these keys it retrieves all the values
it then zips the key/value results

With this there are some caveats -

for keys, we can retrieve at most 1000 per call (there is a node limit here), this means to retrieve maps >1k entries, multiple key calls are made in series (it needs to last key as a starting point for the next call)
for the values, we retrieve at most 250 entries at once, looping through. (This is once again to protect against sizing, the response can be at most 16MB at once - with larger amount of values we have bumped up against that limit before and the full call would fail - sadly this is not dynamic per map, but rather a global settings, we just don't know so the 250 limit seems to work in all cases, small and large maps)

So in the case of entries, it will at best make 2 RPC calls (assuming < 250 keys) and depending on the map size it will generally have to do multiples to retrieve all information. This is not a speedy operation. Taking 4096 keys -

5 key retrieval calls in series (4 x 1000 + 1 x 96)
17 value retrieval calls (once the keys are retrieved, 16 x 250 + 1 x 96)

The iteration for the keys are obviously the same as on the Node side, no matter which method - it still needs to loop through the trie to retrieve all keys, so that overhead would be comparable. Getting all the data however is less effective from an API perspective, since even with small maps <= 250 values it needs at least 2 calls for the keys and then values.

stechu · 2022-06-20T03:45:41Z

Is there a way to get the entire subtrie from the storage? I think around 3MB/s is ok for us, but definitely higher is desirable.

jacogr · 2022-06-20T03:48:32Z

You can only retrieve via keys. There used to be state_getPairs which is deprecated (and I believe marked unsafe) - since it is unlimited, it was unbounded and then could also overrun the 16MB response limit, since it literally had some serious performance issues on the Node itself.

stechu · 2022-06-20T03:55:54Z

You can only retrieve via keys. There used to be state_getPairs which is deprecated (and I believe marked unsafe) - since it is unlimited, it was unbounded and then could also overrun the 16MB response limit, since it literally had some serious performance issues on the Node itself.

Make sense! I guess I am asking is there a way we can do something more efficient on the node side. For example, leveraging the subtrie structure more efficiently.

jacogr · 2022-06-20T04:01:01Z

It is a bit out of my wheelhouse, but as far as I know it would still need to iterate through the trie to find the keys - there are child tries, e.g. as implemented in Polkadot crowdloans I'm not sure how these operate from an iteration perspective on the Node. (It certainly is much easier to drop it completely, but not sure if these are faster iteration-wise, my exposure to them is mostly on the RPCS which have similar key/values methods)

shawntabrizi · 2022-06-20T04:26:20Z

My 2 cents is that you shouldn't be abusing a node like this. It just won't scale as the substrate DB is never optimized for reading items like this. You should be using a different indexed database for these kind of queries, and keep this DB in sync with your chain.

jacogr · 2022-07-06T17:43:05Z

Closing as answered.

polkadot-js-bot · 2022-07-13T19:21:02Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue if you think you have a related problem or query.

stechu changed the title ~~Read Performance Issue of Storage Read using entries() and entriesAt()~~ Performance Issue of Storage Read using entries() and entriesAt() Jun 19, 2022

jacogr added Support Tracks issues or requests related to troubleshooting, answering questions, and user assistance. substrate labels Jun 20, 2022

jacogr closed this as completed Jul 6, 2022

polkadot-js locked as resolved and limited conversation to collaborators Jul 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance Issue of Storage Read using `entries()` and `entriesAt()` #4959

Performance Issue of Storage Read using `entries()` and `entriesAt()` #4959

stechu commented Jun 19, 2022

jacogr commented Jun 20, 2022 •

edited

Loading

stechu commented Jun 20, 2022

jacogr commented Jun 20, 2022 •

edited

Loading

stechu commented Jun 20, 2022

jacogr commented Jun 20, 2022 •

edited

Loading

stechu commented Jun 20, 2022 •

edited

Loading

jacogr commented Jun 20, 2022 •

edited

Loading

shawntabrizi commented Jun 20, 2022

jacogr commented Jul 6, 2022

polkadot-js-bot commented Jul 13, 2022

Performance Issue of Storage Read using entries() and entriesAt() #4959

Performance Issue of Storage Read using entries() and entriesAt() #4959

Comments

stechu commented Jun 19, 2022

jacogr commented Jun 20, 2022 • edited Loading

stechu commented Jun 20, 2022

jacogr commented Jun 20, 2022 • edited Loading

stechu commented Jun 20, 2022

jacogr commented Jun 20, 2022 • edited Loading

stechu commented Jun 20, 2022 • edited Loading

jacogr commented Jun 20, 2022 • edited Loading

shawntabrizi commented Jun 20, 2022

jacogr commented Jul 6, 2022

polkadot-js-bot commented Jul 13, 2022

Performance Issue of Storage Read using `entries()` and `entriesAt()` #4959

Performance Issue of Storage Read using `entries()` and `entriesAt()` #4959

jacogr commented Jun 20, 2022 •

edited

Loading

jacogr commented Jun 20, 2022 •

edited

Loading

jacogr commented Jun 20, 2022 •

edited

Loading

stechu commented Jun 20, 2022 •

edited

Loading

jacogr commented Jun 20, 2022 •

edited

Loading