This repository was archived by the owner on Nov 15, 2023. It is now read-only.
Document benchmarking CLI #11246
Merged
Commits (7):
- d969e62 Decrese default repeats (ggwpez)
- 87bbfb0 Add benchmarking READMEs (ggwpez)
- 33a18d6 Update docs (ggwpez)
- 5ad37b7 Update docs (ggwpez)
- 2c47521 Update README (ggwpez)
- d601258 Merge remote-tracking branch 'origin/master' into oty-bench-readme
- e593603 Review fixes (ggwpez)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,46 @@ | ||
| License: Apache-2.0 | ||
| # The Benchmarking CLI | ||
|
|
||
| This crate contains commands to benchmark various aspects of Substrate and the hardware. | ||
| All commands are exposed by the Substrate node but can be exposed by any Substrate client. | ||
| The goal is to have a comprehensive suite of benchmarks that covers all aspects of Substrate and the hardware that it is running on. | ||
|
|
||
| Invoking the root benchmark command prints a help menu: | ||
| ```sh | ||
| $ cargo run --profile=production -- benchmark | ||
|
|
||
| Sub-commands concerned with benchmarking. | ||
|
|
||
| USAGE: | ||
| substrate benchmark <SUBCOMMAND> | ||
|
|
||
| OPTIONS: | ||
| -h, --help Print help information | ||
| -V, --version Print version information | ||
|
|
||
| SUBCOMMANDS: | ||
| block Benchmark the execution time of historic blocks | ||
| machine Command to benchmark the hardware. | ||
| overhead Benchmark the execution overhead per-block and per-extrinsic | ||
| pallet Benchmark the extrinsic weight of FRAME Pallets | ||
| storage Benchmark the storage speed of a chain snapshot | ||
| ``` | ||
|
|
||
| All examples use the `production` profile for correctness, which makes the compilation *very* slow; for testing you can use `--release`. | ||
| For the final results the `production` profile and reference hardware should be used, otherwise the results are not comparable. | ||
|
|
||
| The sub-commands are explained in depth here: | ||
| - [block] Compare the weight of a historic block to its actual resource usage | ||
| - [machine] Gauges the speed of the hardware | ||
| - [overhead] Creates weight files for the *Block*- and *Extrinsic*-base weights | ||
| - [pallet] Creates weight files for a Pallet | ||
| - [storage] Creates weight files for *Read* and *Write* storage operations | ||
|
|
||
| License: Apache-2.0 | ||
|
|
||
| <!-- LINKS --> | ||
|
|
||
| [pallet]: ../../../frame/benchmarking/README.md | ||
| [machine]: src/machine/README.md | ||
| [storage]: src/storage/README.md | ||
| [overhead]: src/overhead/README.md | ||
| [block]: src/block/README.md |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| # The `benchmark block` command | ||
|
|
||
| The whole benchmarking process in Substrate aims to predict the resource usage of an unexecuted block. | ||
| This command measures how accurate this prediction was by executing a block and comparing the predicted weight to its actual resource usage. | ||
| It can be used to measure the accuracy of the pallet benchmarking. | ||
|
|
||
| The following sections explain the command once for Polkadot and once for Substrate. | ||
|
|
||
| ## Polkadot # 1 | ||
| <sup>(Also works for Kusama, Westend and Rococo)</sup> | ||
|
|
||
|
|
||
| Suppose you either have a synced Polkadot node or downloaded a snapshot from [Polkachu]. | ||
| This example uses a pruned ParityDB snapshot from 2022-04-19 with the last block being 9939462. | ||
| For pruned snapshots you need to know the number of the last block (to be improved [here]). | ||
| Pruned snapshots normally store the last 256 blocks; archive nodes can use any block range. | ||
|
|
||
| In this example we will benchmark just the last 10 blocks: | ||
| ```sh | ||
| cargo run --profile=production -- benchmark block --from 9939453 --to 9939462 --db paritydb | ||
| ``` | ||
|
|
||
| Output: | ||
| ```pre | ||
| Block 9939453 with 2 tx used 4.57% of its weight ( 26,458,801 of 579,047,053 ns) | ||
| Block 9939454 with 3 tx used 4.80% of its weight ( 28,335,826 of 590,414,831 ns) | ||
| Block 9939455 with 2 tx used 4.76% of its weight ( 27,889,567 of 586,484,595 ns) | ||
| Block 9939456 with 2 tx used 4.65% of its weight ( 27,101,306 of 582,789,723 ns) | ||
| Block 9939457 with 2 tx used 4.62% of its weight ( 26,908,882 of 582,789,723 ns) | ||
| Block 9939458 with 2 tx used 4.78% of its weight ( 28,211,440 of 590,179,467 ns) | ||
| Block 9939459 with 4 tx used 4.78% of its weight ( 27,866,077 of 583,260,451 ns) | ||
| Block 9939460 with 3 tx used 4.72% of its weight ( 27,845,836 of 590,462,629 ns) | ||
| Block 9939461 with 2 tx used 4.58% of its weight ( 26,685,119 of 582,789,723 ns) | ||
| Block 9939462 with 2 tx used 4.60% of its weight ( 26,840,938 of 583,697,101 ns) | ||
| ``` | ||
|
|
||
| ### Output Interpretation | ||
|
|
||
| <sup>(Only results from reference hardware are relevant)</sup> | ||
|
|
||
| Each block is executed multiple times and the results are averaged. | ||
| The percent number is the interesting part and indicates how much weight was used as compared to how much was predicted. | ||
| The closer to 100% this is without exceeding 100%, the better. | ||
| If it exceeds 100%, the block is marked with "**OVER WEIGHT!**" to make it easier to spot. This is bad because it means the benchmarking under-estimated the weight. | ||
| An honest validator might then be unable to keep up with importing blocks, since users did not pay for enough weight. | ||
| If that happens, the validator could lag behind the chain and get slashed for missing deadlines. | ||
| It is therefore important to investigate any overweight blocks. | ||
|
|
||
| In this example you can see an unexpected result: less than 5% of the weight was used! | ||
| The measured blocks can be executed much faster than predicted. | ||
| This means that the benchmarking process massively over-estimated the execution time. | ||
| Since the results are off by so much, this is tracked as an issue in [polkadot#5192]. | ||
|
|
||
| The ideal range for these results would be 85-100%. | ||
|
|
||
| ## Polkadot # 2 | ||
|
|
||
| Let's take a more interesting example where the blocks use more of their predicted weight. | ||
| Every day when validators pay out rewards, the blocks are nearly full. | ||
| Using an archive node here is the easiest. | ||
|
|
||
| The Polkadot blocks TODO-TODO for example contain large batch transactions for staking payout. | ||
|
|
||
| ```sh | ||
| cargo run --profile=production -- benchmark block --from TODO --to TODO --db paritydb | ||
| ``` | ||
|
|
||
| ```pre | ||
| TODO | ||
| ``` | ||
|
|
||
| ## Substrate | ||
|
|
||
| It is also possible to try the procedure in Substrate, although it's a bit boring. | ||
|
|
||
| First you need to create some blocks with either a local or dev chain. | ||
| This example will use the standard development spec. | ||
| Pick a non-existent directory where the chain data will be stored, e.g. `/tmp/dev`. | ||
| ```sh | ||
| cargo run --profile=production -- --dev -d /tmp/dev | ||
| ``` | ||
| After a few seconds you should see that it has started to produce blocks: | ||
| ```pre | ||
| … | ||
| ✨ Imported #1 (0x801d…9189) | ||
| … | ||
| ``` | ||
| You can now kill the node with `Ctrl+C`. Then measure how long it takes to execute these blocks: | ||
| ```sh | ||
| cargo run --profile=production -- benchmark block --from 1 --to 1 --dev -d /tmp/dev --pruning archive | ||
| ``` | ||
| This will benchmark the first block. If you killed the node at a later point, you can measure multiple blocks. | ||
| ```pre | ||
| Block 1 with 1 tx used 72.04% of its weight ( 4,945,664 of 6,864,702 ns) | ||
| ``` | ||
|
|
||
| In this example the block used ~72% of its weight. | ||
| The benchmarking therefore over-estimated the effort to execute the block. | ||
| Since this block is empty, it's not very interesting. | ||
|
|
||
| ## Arguments | ||
|
|
||
| - `--from` Number of the first block to measure (inclusive). | ||
| - `--to` Number of the last block to measure (inclusive). | ||
| - `--repeat` How often each block should be measured. | ||
| - [`--db`] | ||
| - [`--pruning`] | ||
|
|
||
| License: Apache-2.0 | ||
|
|
||
| <!-- LINKS --> | ||
|
|
||
| [Polkachu]: https://polkachu.com/snapshots | ||
| [here]: https://github.com/paritytech/substrate/issues/11141 | ||
| [polkadot#5192]: https://github.com/paritytech/polkadot/issues/5192 | ||
|
|
||
| [`--db`]: ../shared/README.md#arguments | ||
| [`--pruning`]: ../shared/README.md#arguments |
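
The arithmetic behind the examples above can be sketched in a few lines of Python. This is only an illustration, not part of the CLI: the helper names `last_n_blocks` and `parse_block_line` are hypothetical, and the regular expression assumes the output format shown in the example listings. It derives the inclusive `--from`/`--to` range for the last N blocks of a pruned snapshot and recomputes the used percentage from the measured and predicted nanoseconds.

```python
import re

# Matches one result line of `benchmark block` as shown in the examples above.
# The exact format is an assumption taken from those listings.
LINE_RE = re.compile(
    r"Block\s+(\d+)\s+with\s+(\d+)\s+tx\s+used\s+([\d.]+)%\s+of\s+its\s+weight\s+"
    r"\(\s*([\d,]+)\s+of\s+([\d,]+)\s+ns\)"
)


def last_n_blocks(last_block: int, n: int) -> tuple:
    """Return the inclusive (--from, --to) range covering the last n blocks."""
    return last_block - n + 1, last_block


def parse_block_line(line: str) -> dict:
    """Extract the numbers from one output line and recompute the percentage."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    used_ns = int(match.group(4).replace(",", ""))
    predicted_ns = int(match.group(5).replace(",", ""))
    percent = 100.0 * used_ns / predicted_ns
    return {
        "block": int(match.group(1)),
        "tx": int(match.group(2)),
        "used_ns": used_ns,
        "predicted_ns": predicted_ns,
        "percent": percent,
        # More than 100% means the weight was under-estimated.
        "overweight": percent > 100.0,
    }


if __name__ == "__main__":
    print(last_n_blocks(9939462, 10))
    line = "Block 1 with 1 tx used 72.04% of its weight ( 4,945,664 of 6,864,702 ns)"
    result = parse_block_line(line)
    print(f"{result['percent']:.2f}% overweight={result['overweight']}")
```

Recomputing the percentage from the raw nanosecond values, rather than trusting the printed figure, makes it easy to aggregate many blocks or to apply a different threshold such as the 85-100% target range.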
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| # The `benchmark machine` command | ||
|
|
||
| Different Substrate chains can have different hardware requirements. | ||
| It is therefore important to be able to quickly gauge if a piece of hardware fits a chain's requirements. | ||
| The `benchmark machine` command achieves this by measuring key metrics and making them comparable. | ||
|
|
||
| Invoking the command looks like this: | ||
| ```sh | ||
| cargo run --profile=production -- benchmark machine --dev | ||
| ``` | ||
|
|
||
| ## Output | ||
|
|
||
| The output on reference hardware: | ||
|
|
||
| ```pre | ||
| +----------+----------------+---------------+--------------+-------------------+ | ||
| | Category | Function | Score | Minimum | Result | | ||
| +----------+----------------+---------------+--------------+-------------------+ | ||
| | CPU | BLAKE2-256 | 1023.00 MiB/s | 1.00 GiB/s | ✅ Pass ( 99.4 %) | | ||
| +----------+----------------+---------------+--------------+-------------------+ | ||
| | CPU | SR25519-Verify | 665.13 KiB/s | 666.00 KiB/s | ✅ Pass ( 99.9 %) | | ||
| +----------+----------------+---------------+--------------+-------------------+ | ||
| | Memory | Copy | 14.39 GiB/s | 14.32 GiB/s | ✅ Pass (100.4 %) | | ||
| +----------+----------------+---------------+--------------+-------------------+ | ||
| | Disk | Seq Write | 457.00 MiB/s | 450.00 MiB/s | ✅ Pass (101.6 %) | | ||
| +----------+----------------+---------------+--------------+-------------------+ | ||
| | Disk | Rnd Write | 190.00 MiB/s | 200.00 MiB/s | ✅ Pass ( 95.0 %) | | ||
| +----------+----------------+---------------+--------------+-------------------+ | ||
| ``` | ||
|
|
||
| The *score* is the average result of each benchmark. It always adheres to "higher is better". | ||
|
|
||
| The *category* indicates which part of the hardware was benchmarked: | ||
| - **CPU** Processor intensive task | ||
| - **Memory** RAM intensive task | ||
| - **Disk** Hard drive intensive task | ||
|
|
||
| The *function* is the concrete benchmark that was run: | ||
| - **BLAKE2-256** The throughput of the [Blake2-256] cryptographic hashing function with 32 KiB input. The [blake2_256 function] is used in many places in Substrate. The throughput of a hash function strongly depends on the input size, so a fixed input size is used to get comparable results. | ||
| - **SR25519-Verify** Sr25519 is a Schnorr signature scheme based on [Curve25519]. Signature verification is used by Substrate when verifying extrinsics and blocks. | ||
| - **Copy** The throughput of copying memory from one place in the RAM to another. | ||
| - **Seq Write** The throughput of writing data to the storage location sequentially. It is important to use the same disk that will later be used to store the chain data. | ||
| - **Rnd Write** The throughput of writing data to the storage location in a random order. This is normally much slower than the sequential write. | ||
|
|
||
| The *score* needs to reach the *minimum* in order to pass the benchmark. The *minimum* can be reduced with the `--tolerance` flag. | ||
|
|
||
| The *result* indicates whether a specific benchmark was passed by the machine or not. The percentage is the score relative to the required *minimum*. The `--tolerance` flag is taken into account for this decision. For example, a benchmark that passes with only 95% because the *tolerance* was set to 10% would look like this: `✅ Pass ( 95.0 %)`. | ||
|
|
||
| ## Interpretation | ||
|
|
||
| Ideally all results show a `Pass` and the program exits with code 0. Currently some of the benchmarks can fail even on reference hardware; they are still being improved to make them more deterministic. | ||
| Make sure to run nothing else on the machine when benchmarking it. | ||
| You can re-run them multiple times to get more reliable results. | ||
|
|
||
| ## Arguments | ||
|
|
||
| - `--tolerance` A percent number to reduce the *minimum* requirement. This should be used to ignore outliers of the benchmarks. The default value is 10%. | ||
| - `--verify-duration` How long the verification benchmark should run. | ||
| - `--disk-duration` How long the *read* and *write* benchmarks should run each. | ||
| - `--allow-fail` Always exit the program with code 0. | ||
| - `--chain` / `--dev` Specify the chain config to use. This will be used to compare the results with the requirements of the chain (WIP). | ||
| - [`--base-path`] | ||
|
|
||
| License: Apache-2.0 | ||
|
|
||
| <!-- LINKS --> | ||
| [Blake2-256]: https://www.blake2.net/ | ||
| [blake2_256 function]: https://crates.parity.io/sp_core/hashing/fn.blake2_256.html | ||
| [Curve25519]: https://en.wikipedia.org/wiki/Curve25519 | ||
| [`--base-path`]: ../shared/README.md#arguments |
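
The relationship between *score*, *minimum*, *tolerance* and *result* described above can be illustrated with a small Python sketch. It is not the CLI's implementation: the function names are hypothetical, and the comparison rule (the score must reach the minimum reduced by the tolerance percentage) is an assumption based on the description of the `--tolerance` flag. Both values must be given in the same unit.

```python
def relative_score(score: float, minimum: float) -> float:
    """Return the score as a percentage of the required minimum."""
    return 100.0 * score / minimum


def passes(score: float, minimum: float, tolerance_percent: float = 10.0) -> bool:
    """Assumed pass rule: the score must reach the minimum reduced by the tolerance."""
    required = minimum * (1.0 - tolerance_percent / 100.0)
    return score >= required


if __name__ == "__main__":
    # "Rnd Write" row from the table above: 190 MiB/s against a 200 MiB/s minimum.
    score, minimum = 190.0, 200.0
    verdict = "Pass" if passes(score, minimum) else "Fail"
    print(f"{verdict} ({relative_score(score, minimum):.1f} %)")
```

With the default tolerance of 10%, the 190 MiB/s score passes; under this assumed rule it would fail with a tolerance of 0%.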