try-runtime: dynamic storage query sizes #13923

liamaharon · 2023-04-14T09:56:11Z

Closes #13246
polkadot companion: paritytech/polkadot#7111

Changes

Created get_storage_data_dynamic_batch_size, allowing users to efficiently process a large number of JSON-RPC payloads without knowing the node's maximum payload size configuration. The function calls itself recursively, increasing batch_size when the node responds successfully and reducing it when the node returns an error.
```
async fn get_storage_data_dynamic_batch_size(
	client: &Arc<HttpClient>,
	payloads: Vec<(String, ArrayParams)>,
	batch_size: usize,
) -> Result<Vec<Option<StorageData>>, String>
```
Implemented get_storage_data_dynamic_batch_size for fetching storage keys.
Switched to using an HTTP provider for these huge requests. Using a WsClient, I continuously encountered unhandleable errors from within the jsonrpsee library, such as "961d622f: accumulated message length exceeds maximum", and the node would also randomly refuse to continue servicing the WsClient, returning incorrectly formatted errors (missing request IDs) that caused further internal errors inside the jsonrpsee library. I'm not sure whether this issue is in the jsonrpsee library or in the way the Substrate RPC is configured, or whether this is worth opening an issue to investigate. Let me know if I should.
Removed parallelisation from rpc_child_get_keys. It runs extremely fast on just one thread (2 seconds on a MacBook Pro), and the logic to run it across multiple threads was quite complex and was causing some difficult-to-debug issues. I've benchmarked this to confirm the performance impact is negligible; I do not think parallelising this piece of code is worth it at this time.
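
The dynamic batch sizing described in the first change can be illustrated with a minimal, synchronous sketch. This is not the code from this PR: `send_batch`, `fetch_with_dynamic_batch_size`, the `u32` payloads, and the grow/shrink steps are all hypothetical stand-ins, and the sketch uses a loop where the PR's function is described as recursive.

```rust
/// Hypothetical stand-in for a node: rejects any batch larger than `max_batch`.
fn send_batch(batch: &[u32], max_batch: usize) -> Result<Vec<u32>, String> {
    if batch.len() > max_batch {
        Err(format!("batch of {} exceeds node limit", batch.len()))
    } else {
        // Echo each payload doubled so callers can check response ordering.
        Ok(batch.iter().map(|p| p * 2).collect())
    }
}

/// Processes all payloads in order, growing the batch after a success and
/// halving it after a failure. The caller never needs to know `max_batch`;
/// it is only passed through to the mock node.
fn fetch_with_dynamic_batch_size(
    payloads: &[u32],
    batch_size: usize,
    max_batch: usize,
) -> Result<Vec<u32>, String> {
    let mut batch_size = batch_size.max(1);
    let mut results = Vec::with_capacity(payloads.len());
    let mut start = 0;
    while start < payloads.len() {
        let end = (start + batch_size).min(payloads.len());
        match send_batch(&payloads[start..end], max_batch) {
            Ok(mut responses) => {
                results.append(&mut responses);
                start = end;
                // Assumed growth step (about 10%); chosen for illustration only.
                batch_size += (batch_size / 10).max(1);
            }
            Err(e) => {
                // A single payload that still fails cannot be split further.
                if batch_size <= 1 {
                    return Err(e);
                }
                // Assumed shrink step (halving); chosen for illustration only.
                batch_size = (batch_size / 2).max(1);
            }
        }
    }
    Ok(results)
}
```

Starting from a batch size well above the mock node's limit, the sketch shrinks until a request succeeds and then probes upward again, so every payload is still processed exactly once and in order.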

Notes

I compared the speed of loading storage from a local node against master. Performance is identical (+/- <2% between runs)
I tested that this works when fetching live state from wss://rpc.polkadot.io:443. It took 76 minutes on my home internet; upload speed was by far the biggest bottleneck.

liamaharon · 2023-04-17T14:20:50Z

Switching this to a draft and implementing a dynamic batch size as described here #13246

ggwpez

Looks good on the whole.
Could you please also test the difference with the parity nodes? https://gist.github.com/ggwpez/81db110fe4390ed9a7622f5857dfc4ff
I don't expect there to be any - besides it being more fault tolerant.

ggwpez · 2023-04-19T08:45:01Z

utils/frame/remote-externalities/src/lib.rs

+			} else {
+				uri.clone()
+			};
+			let http_client = HttpClientBuilder::default().build(uri).map_err(|e| {


I compared the speed of loading storage from a local node against master. Performance is identical (+/- <2% between runs)

So why did we formerly use a WS client here, @niklasad1?
Is it supposed to be faster?

Yeah, the websocket client is slightly faster and this is used for instance by the staking miner v1 as well.

I'm still not convinced that throwing extra threads at this will be of any benefit at all. This entire thing could be simplified a lot by using FuturesUnordered instead of manually spawning threads to wait on async I/O.

Inserting data into storage is very CPU-heavy, so I think it's worth having a dedicated core working on that.

Otherwise I agree that there's likely little to no improvement from adding more threads to do network I/O. Switching to FuturesUnordered for the network I/O tasks seems like a great idea that we could implement as part of a future refactor.
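
For illustration, the "dedicated core for inserts" idea could look roughly like the following standard-library sketch. It is not taken from remote-externalities: `KeyValue`, `load_with_dedicated_inserter`, and the use of a `BTreeMap` as the storage are assumptions made only for this example, and the batches are passed in rather than fetched over the network.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical key/value pair as it might arrive from an RPC response.
type KeyValue = (Vec<u8>, Vec<u8>);

/// Feeds already-fetched batches to a single dedicated thread that owns the
/// map and performs every insert, then returns the populated map.
fn load_with_dedicated_inserter(batches: Vec<Vec<KeyValue>>) -> BTreeMap<Vec<u8>, Vec<u8>> {
    let (tx, rx) = mpsc::channel::<Vec<KeyValue>>();

    // The inserter thread does the CPU-heavy work and nothing else.
    let inserter = thread::spawn(move || {
        let mut storage = BTreeMap::new();
        // Iterating the receiver ends once every sender has been dropped.
        for batch in rx {
            for (key, value) in batch {
                storage.insert(key, value);
            }
        }
        storage
    });

    // The calling thread stands in for the network I/O side: it only
    // forwards batches and never touches the map.
    for batch in batches {
        tx.send(batch).expect("inserter thread should still be running");
    }
    // Close the channel so the inserter's loop terminates.
    drop(tx);

    inserter.join().expect("inserter thread panicked")
}
```

Because only one thread touches the map, no locking is needed, and the fetching side can later be changed (for example to FuturesUnordered) without affecting the insert path.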

Co-authored-by: Niklas Adolfsson <[email protected]>

liamaharon · 2023-04-20T06:00:46Z

Looks good on the whole. Could you please also test the difference with the parity nodes? https://gist.github.com/ggwpez/81db110fe4390ed9a7622f5857dfc4ff I don't expect there to be any - besides it being more fault tolerant.

Confirmed these are working

liamaharon · 2023-04-20T06:26:51Z

Hey @niklasad1, the polkadot staking-miner rpc expects that remote-externalities uses a WsClient: https://github.com/paritytech/polkadot/blob/5fd2bf8c9a0665b361eb26823030a5e0e65459b4/utils/staking-miner/src/rpc.rs#L108

Do you suggest that I refactor the staking-miner and open a companion PR, or try to modify this PR so that remote-externalities supports either a WsClient or HttpClient?

niklasad1 · 2023-04-20T07:41:26Z

Hey @niklasad1, the polkadot staking-miner rpc expects that remote-externalities uses a WsClient: https://github.com/paritytech/polkadot/blob/5fd2bf8c9a0665b361eb26823030a5e0e65459b4/utils/staking-miner/src/rpc.rs#L108

I think it's fine to use the HTTP client just for the batch requests in remote externalities. Just change https://github.com/paritytech/polkadot/blob/master/utils/staking-miner/src/main.rs#L315-#L323 to take a URL instead (it may be a ws:// URL, but you already added conversion of ws URLs to http URLs under the hood).
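
As a rough sketch of the ws-to-http conversion mentioned above (the helper name `ws_to_http_uri` is hypothetical, and the actual code in this PR may differ in detail):

```rust
/// Maps a WebSocket URI onto the equivalent HTTP URI.
/// Any URI without a ws:// or wss:// scheme is returned unchanged.
fn ws_to_http_uri(uri: &str) -> String {
    if let Some(rest) = uri.strip_prefix("wss://") {
        // Secure WebSocket maps to HTTPS.
        format!("https://{}", rest)
    } else if let Some(rest) = uri.strip_prefix("ws://") {
        // Plain WebSocket maps to plain HTTP.
        format!("http://{}", rest)
    } else {
        uri.to_string()
    }
}
```

With a helper like this, callers such as the staking miner could keep passing the same ws:// or wss:// URL they already use, and only the batch-request path would see the HTTP form.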

liamaharon · 2023-04-21T09:11:28Z

bot merge

paritytech-processbot · 2023-04-21T09:11:36Z

Waiting for commit status.

liamaharon · 2023-04-21T09:16:33Z

bot merge force

* improve batch rpc error message
* wip aimd storage data fetch
* complete aimd function refactor
* make batch_request function async
* improve function name
* fix load_child_remote issue
* slight efficiency improvement
* improve logs and variable name
* remove redundant comment
* improve comment
* address pr comments
* Update utils/frame/remote-externalities/src/lib.rs

  Co-authored-by: Niklas Adolfsson <[email protected]>
* simplify client handling
* fix type issue
* fix clippy issue
* try to trigger ci
* try to trigger ci

---------

Co-authored-by: Niklas Adolfsson <[email protected]>

improve batch rpc error message

6fd902e

liamaharon added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Apr 14, 2023

liamaharon requested a review from kianenigma April 14, 2023 09:56

liamaharon changed the title ~~try-runtime: gracefully handle batch rpc error panic~~ try-runtime: gracefully handle batch rpc error Apr 14, 2023

bkchr approved these changes Apr 14, 2023

liamaharon added 2 commits April 17, 2023 09:45

Merge branch 'master' of github.com:paritytech/substrate into liam-fi…

16dc3bc

…x-try-runtime-rpc-panic

wip aimd storage data fetch

944ff14

liamaharon changed the title ~~try-runtime: gracefully handle batch rpc error~~ try-runtime: retry and backoff Apr 17, 2023

liamaharon marked this pull request as draft April 17, 2023 14:20

liamaharon removed the request for review from kianenigma April 17, 2023 14:21

liamaharon added 3 commits April 18, 2023 12:07

complete aimd function refactor

b25099b

make batch_request function async

edfd252

improve function name

c4c30b0

liamaharon added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Apr 18, 2023

liamaharon added 2 commits April 18, 2023 16:54

fix load_child_remote issue

18ea06e

slight efficiency improvement

ed1297f

liamaharon changed the title ~~try-runtime: retry and backoff~~ try-runtime: retry and backoff storage queries Apr 18, 2023

improve logs and variable name

f9ff3ad

liamaharon requested review from bkchr and ggwpez April 18, 2023 14:10

liamaharon marked this pull request as ready for review April 18, 2023 14:11

liamaharon requested a review from kianenigma April 18, 2023 14:12

remove redundant comment

d762582

liamaharon changed the title ~~try-runtime: retry and backoff storage queries~~ try-runtime: dynamic storage query sizes Apr 18, 2023

liamaharon added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Apr 18, 2023

ggwpez approved these changes Apr 19, 2023
