Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

try-runtime: dynamic storage query sizes #13923

Merged
merged 21 commits into from
Apr 21, 2023

Conversation

liamaharon
Copy link
Contributor

@liamaharon liamaharon commented Apr 14, 2023

Closes #13246
polkadot companion: paritytech/polkadot#7111

Changes

  • Created get_storage_data_dynamic_batch_size, allowing users to efficiently process a large amount of jsonrpc payloads without knowing the node max payload size configuration. The function calls itself recursively, increasing batch_size when the node responds successfully and reducing it when the node returns an error.
    async fn get_storage_data_dynamic_batch_size(
    	client: &Arc<HttpClient>,
    	payloads: Vec<(String, ArrayParams)>,
    	batch_size: usize,
    ) -> Result<Vec<Option<StorageData>>, String>
  • Implemented get_storage_data_dynamic_batch_size for fetching storage keys.
  • Switched to using a Http provider for these huge requests. Using a WsClient I continuously encountered unhandleable errors from within the jsonrpsee library such as "961d622f: accumulated message length exceeds maximum" and the node would also randomly refuse to continue servicing WsClient, returning incorrectly formatted errors (missing request IDs) that were also causing more internal errors inside the jsonrpsee library. I'm not sure whether this issue is in the jsonrpsee library or the way the Substrate RPC is configured or worth this is worth opening an issue to investigate, let me know if I should.
  • Removed parallelisation from rpc_child_get_keys. It runs extremely fast on just one thread (2 seconds on a Macbook Pro), and the logic to run it across multiple threads was quite complex and causing some difficult to debug issues. I've benchmarked this to confirm negligible performance impact, I do not think parallelising this piece of code is worth it at this time.

Notes

  • I compared the speed of loading storage from a local node against master. Performance is identical (+/- <2% between runs)
  • I tested this works fetching live state from wss://rpc.polkadot.io:443. it took 76min on my home internet, upload speed was be far the biggest bottleneck.

@liamaharon liamaharon added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Apr 14, 2023
@liamaharon liamaharon changed the title try-runtime: gracefully handle batch rpc error panic try-runtime: gracefully handle batch rpc error Apr 14, 2023
utils/frame/remote-externalities/src/lib.rs Outdated Show resolved Hide resolved
@liamaharon liamaharon changed the title try-runtime: gracefully handle batch rpc error try-runtime: retry and backoff Apr 17, 2023
@liamaharon
Copy link
Contributor Author

liamaharon commented Apr 17, 2023

Switching this to a draft and implementing a dynamic batch size as described here #13246

@liamaharon liamaharon marked this pull request as draft April 17, 2023 14:20
@liamaharon liamaharon removed the request for review from kianenigma April 17, 2023 14:21
@liamaharon liamaharon added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Apr 18, 2023
@liamaharon liamaharon changed the title try-runtime: retry and backoff try-runtime: retry and backoff storage queries Apr 18, 2023
@liamaharon liamaharon requested review from bkchr and ggwpez April 18, 2023 14:10
@liamaharon liamaharon marked this pull request as ready for review April 18, 2023 14:11
@liamaharon liamaharon changed the title try-runtime: retry and backoff storage queries try-runtime: dynamic storage query sizes Apr 18, 2023
@liamaharon liamaharon added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Apr 18, 2023
Copy link
Member

@ggwpez ggwpez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on the whole.
Could you please also test the difference with the parity nodes? https://gist.github.com/ggwpez/81db110fe4390ed9a7622f5857dfc4ff
I dont expect there to be any - besides it being more fault tolerant.

utils/frame/remote-externalities/src/lib.rs Outdated Show resolved Hide resolved
} else {
uri.clone()
};
let http_client = HttpClientBuilder::default().build(uri).map_err(|e| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared the speed of loading storage from a local node against master. Performance is identical (+/- <2% between runs)

So why do we use a WS client here formerly @niklasad1?
Is it supposed to be faster?

Copy link
Member

@niklasad1 niklasad1 Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the websocket client is slightly faster and this is used for instance by the staking miner v1 as well.

I'm still not convinced that throwing extra threads on this will be of any benefit at all, this entire could be be simplified a lot by using FuturesUnordered instead of spawning manual threads for waiting on async I/O

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inserting data into storage is very cpu heavy so I think it's worth it having a dedicated core working on that.

Otherwise I agree that there's likely little to no improvement adding more threads to do network io. Switching to FuturesUnordered for the network io tasks seems like a great idea we could make as part of a future refactor.

utils/frame/remote-externalities/src/lib.rs Show resolved Hide resolved
utils/frame/remote-externalities/src/lib.rs Outdated Show resolved Hide resolved
utils/frame/remote-externalities/src/lib.rs Outdated Show resolved Hide resolved
utils/frame/remote-externalities/src/lib.rs Outdated Show resolved Hide resolved
utils/frame/remote-externalities/src/lib.rs Show resolved Hide resolved
@liamaharon
Copy link
Contributor Author

Looks good on the whole. Could you please also test the difference with the parity nodes? https://gist.github.com/ggwpez/81db110fe4390ed9a7622f5857dfc4ff I dont expect there to be any - besides it being more fault tolerant.

Confirmed these are working

@liamaharon
Copy link
Contributor Author

Hey @niklasad1, the polkadot staking-miner rpc expects that remote-externalities uses a WsClient: https://github.com/paritytech/polkadot/blob/5fd2bf8c9a0665b361eb26823030a5e0e65459b4/utils/staking-miner/src/rpc.rs#L108

Do you suggest that I refactor the staking-miner and open a companion PR, or try to modify this PR so that remote-externalities supports either a WsClient or HttpClient?

@niklasad1
Copy link
Member

niklasad1 commented Apr 20, 2023

Hey @niklasad1, the polkadot staking-miner rpc expects that remote-externalities uses a WsClient: https://github.com/paritytech/polkadot/blob/5fd2bf8c9a0665b361eb26823030a5e0e65459b4/utils/staking-miner/src/rpc.rs#L108

I think it's fine to use the HTTP client just for the "batch requests/remote externalities" just change https://github.com/paritytech/polkadot/blob/master/utils/staking-miner/src/main.rs#L315-#L323 to take an URL instead (in case it would be a ws:// but you already added converting ws to http URL under the hood)

@liamaharon liamaharon added A4-companion A PR that should be considered alongside another (usually more comprehensive and detailed) PR. and removed A4-companion A PR that should be considered alongside another (usually more comprehensive and detailed) PR. labels Apr 20, 2023
@liamaharon
Copy link
Contributor Author

bot merge

@paritytech-processbot
Copy link

Waiting for commit status.

@liamaharon
Copy link
Contributor Author

bot merge force

@paritytech-processbot paritytech-processbot bot merged commit 2059dca into master Apr 21, 2023
@paritytech-processbot paritytech-processbot bot deleted the liam-fix-try-runtime-rpc-panic branch April 21, 2023 09:16
nathanwhit pushed a commit to nathanwhit/substrate that referenced this pull request Jul 19, 2023
* improve batch rpc error message

* wip aimd storage data fetch

* complete aimd function refactor

* make batch_request function async

* improve function name

* fix load_child_remote issue

* slight efficiency improvement

* improve logs and variable name

* remove redundant comment

* improve comment

* address pr comments

* Update utils/frame/remote-externalities/src/lib.rs

Co-authored-by: Niklas Adolfsson <[email protected]>

* simplify client handling

* fix type issue

* fix clippy issue

* try to trigger ci

* try to trigger ci

---------

Co-authored-by: Niklas Adolfsson <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Retry and back-off for try-runtime batch request
4 participants