paritytech · paritytech-processbot · May 25, 2022 · Apr 20, 2022
diff --git a/frame/benchmarking/README.md b/frame/benchmarking/README.md
@@ -43,7 +43,7 @@ The benchmarking framework comes with the following tools:
 * [A set of macros](./src/lib.rs) (`benchmarks!`, `add_benchmark!`, etc...) to make it easy to
   write, test, and add runtime benchmarks.
 * [A set of linear regression analysis functions](./src/analysis.rs) for processing benchmark data.
-* [A CLI extension](../../utils/frame/benchmarking-cli/) to make it easy to execute benchmarks on your
+* [A CLI extension](../../utils/frame/benchmarking-cli/README.md) to make it easy to execute benchmarks on your
   node.
 
 The end-to-end benchmarking pipeline is disabled by default when compiling a node. If you want to
@@ -150,9 +150,13 @@ feature flag:
 
 ```bash
 cd bin/node/cli
-cargo build --release --features runtime-benchmarks
+cargo build --profile=production --features runtime-benchmarks
 ```
 
+The production profile applies various compiler optimizations.  
+These optimizations slow down the compilation process *a lot*.  
+If you are just testing things out and don't need final numbers, use `--release` instead.
+
 ## Running Benchmarks
 
 Finally, once you have a node binary with benchmarks enabled, you need to execute your various
@@ -161,13 +165,13 @@ benchmarks.
 You can get a list of the available benchmarks by running:
 
 ```bash
-./target/release/substrate benchmark --chain dev --pallet "*" --extrinsic "*" --repeat 0
+./target/production/substrate benchmark pallet --chain dev --pallet "*" --extrinsic "*" --repeat 0
 ```
 
 Then you can run a benchmark like so:
 
 ```bash
-./target/release/substrate benchmark \
+./target/production/substrate benchmark pallet \
     --chain dev \                  # Configurable Chain Spec
     --execution=wasm \             # Always test with Wasm
     --wasm-execution=compiled \    # Always used `wasm-time`
@@ -200,7 +204,7 @@ used for joining all the arguments passed to the CLI.
 To get a full list of available options when running benchmarks, run:
 
 ```bash
-./target/release/substrate benchmark --help
+./target/production/substrate benchmark --help
 ```
 
 License: Apache-2.0
diff --git a/utils/frame/benchmarking-cli/README.md b/utils/frame/benchmarking-cli/README.md
@@ -1 +1,46 @@
-License: Apache-2.0
+# The Benchmarking CLI
+
+This crate contains commands to benchmark various aspects of Substrate and the hardware.  
+All commands are exposed by the Substrate node but can be exposed by any Substrate client.  
+The goal is to have a comprehensive suite of benchmarks that cover all aspects of Substrate and the hardware that it is running on.
+
+Invoking the root benchmark command prints a help menu:  
+```sh
+$ cargo run --profile=production -- benchmark
+
+Sub-commands concerned with benchmarking.
+
+USAGE:
+    substrate benchmark <SUBCOMMAND>
+
+OPTIONS:
+    -h, --help       Print help information
+    -V, --version    Print version information
+
+SUBCOMMANDS:
+    block       Benchmark the execution time of historic blocks
+    machine     Command to benchmark the hardware.
+    overhead    Benchmark the execution overhead per-block and per-extrinsic
+    pallet      Benchmark the extrinsic weight of FRAME Pallets
+    storage     Benchmark the storage speed of a chain snapshot
+```
+
+All examples use the `production` profile for correctness which makes the compilation *very* slow; for testing you can use `--release`.  
+For the final results the `production` profile and reference hardware should be used, otherwise the results are not comparable.
+
+The sub-commands are explained in depth here:  
+- [block] Compares the weight of a historic block to its actual resource usage
+- [machine] Gauges the speed of the hardware
+- [overhead] Creates weight files for the *Block*- and *Extrinsic*-base weights
+- [pallet] Creates weight files for a Pallet
+- [storage] Creates weight files for *Read* and *Write* storage operations
+
+License: Apache-2.0
+
+<!-- LINKS -->
+
+[pallet]: ../../../frame/benchmarking/README.md
+[machine]: src/machine/README.md
+[storage]: src/storage/README.md
+[overhead]: src/overhead/README.md
+[block]: src/block/README.md
diff --git a/utils/frame/benchmarking-cli/src/block/README.md b/utils/frame/benchmarking-cli/src/block/README.md
@@ -0,0 +1,118 @@
+# The `benchmark block` command
+
+The whole benchmarking process in Substrate aims to predict the resource usage of an unexecuted block.  
+This command measures how accurate this prediction was by executing a block and comparing the predicted weight to its actual resource usage.  
+It can be used to measure the accuracy of the pallet benchmarking.
+
+The following sections explain the command once for Polkadot and once for Substrate.  
+
+## Polkadot # 1
+<sup>(Also works for Kusama, Westend and Rococo)</sup>
+
+
+Suppose you either have a synced Polkadot node or downloaded a snapshot from [Polkachu].  
+This example uses a pruned ParityDB snapshot from 2022-04-19 with the last block being 9939462.  
+For pruned snapshots you need to know the number of the last block (to be improved [here]).  
+Pruned snapshots normally store the last 256 blocks; archive nodes can use any block range.  
+
+In this example we will benchmark just the last 10 blocks:  
+```sh
+cargo run --profile=production -- benchmark block --from 9939453 --to 9939462 --db paritydb
+```
+
+Output:
+```pre
+Block 9939453 with     2 tx used   4.57% of its weight (    26,458,801 of    579,047,053 ns)    
+Block 9939454 with     3 tx used   4.80% of its weight (    28,335,826 of    590,414,831 ns)    
+Block 9939455 with     2 tx used   4.76% of its weight (    27,889,567 of    586,484,595 ns)    
+Block 9939456 with     2 tx used   4.65% of its weight (    27,101,306 of    582,789,723 ns)    
+Block 9939457 with     2 tx used   4.62% of its weight (    26,908,882 of    582,789,723 ns)    
+Block 9939458 with     2 tx used   4.78% of its weight (    28,211,440 of    590,179,467 ns)    
+Block 9939459 with     4 tx used   4.78% of its weight (    27,866,077 of    583,260,451 ns)    
+Block 9939460 with     3 tx used   4.72% of its weight (    27,845,836 of    590,462,629 ns)    
+Block 9939461 with     2 tx used   4.58% of its weight (    26,685,119 of    582,789,723 ns)    
+Block 9939462 with     2 tx used   4.60% of its weight (    26,840,938 of    583,697,101 ns)    
+```
+
+### Output Interpretation
+
+<sup>(Only results from reference hardware are relevant)</sup>
+
+Each block is executed multiple times and the results are averaged.  
+The percent number is the interesting part and indicates how much weight was used as compared to how much was predicted.  
+The closer to 100% this is without exceeding 100%, the better.  
+If it exceeds 100%, the block is marked with "**OVER WEIGHT!**" to make it easier to spot. This is not good, since it means that the benchmarking under-estimated the weight.  
+This would mean that an honest validator would possibly not be able to keep up with importing blocks since users did not pay for enough weight.  
+If that happens the validator could lag behind the chain and get slashed for missing deadlines.  
+It is therefore important to investigate any overweight blocks.  
+
+In this example you can see an unexpected result: less than 5% of the weight was used!  
+The measured blocks can be executed much faster than predicted.  
+This means that the benchmarking process massively over-estimated the execution time.  
+Since the predictions are off by so much, this is tracked as an issue in [polkadot#5192].  
+
+The ideal range for these results would be 85-100%.
+
+## Polkadot # 2
+
+Let's take a more interesting example where the blocks use more of their predicted weight.  
+Every day when validators pay out rewards, the blocks are nearly full.  
+Using an archive node here is the easiest.  
+
+The Polkadot blocks TODO-TODO for example contain large batch transactions for staking payout.  
+
+```sh
+cargo run --profile=production -- benchmark block --from TODO --to TODO --db paritydb
+```
+
+```pre
+TODO
+```
+
+## Substrate
+
+It is also possible to try the procedure in Substrate, although it's a bit boring.  
+
+First you need to create some blocks with either a local or dev chain.  
+This example will use the standard development spec.  
+Pick a non-existent directory where the chain data will be stored, e.g. `/tmp/dev`.
+```sh
+cargo run --profile=production -- --dev -d /tmp/dev
+```
+You should see after some seconds that it started to produce blocks:  
+```pre
+…
+✨ Imported #1 (0x801d…9189)
+…
+```
+You can now kill the node with `Ctrl+C`. Then measure how long it takes to execute these blocks:  
+```sh
+cargo run --profile=production -- benchmark block --from 1 --to 1 --dev -d /tmp/dev --pruning archive
+```
+This will benchmark the first block. If you killed the node at a later point, you can measure multiple blocks.
+```pre
+Block 1 with     1 tx used  72.04% of its weight (     4,945,664 of      6,864,702 ns)
+```
+
+In this example the block used ~72% of its weight.  
+The benchmarking therefore over-estimated the effort to execute the block.  
+Since this block is empty, it's not very interesting.
+
+## Arguments
+
+- `--from` Number of the first block to measure (inclusive).
+- `--to` Number of the last block to measure (inclusive).
+- `--repeat` How often each block should be measured.
+- [`--db`]
+- [`--pruning`]
+
+License: Apache-2.0
+
+<!-- LINKS -->
+
+[Polkachu]: https://polkachu.com/snapshots
+[here]: https://github.com/paritytech/substrate/issues/11141
+[polkadot#5192]: https://github.com/paritytech/polkadot/issues/5192
+
+[`--db`]: ../shared/README.md#arguments
+[`--pruning`]: ../shared/README.md#arguments
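
Note on the `benchmark block` README above: the percentage in each output line is the measured execution time divided by the predicted weight, both in nanoseconds. As a minimal sketch, the figures can be reproduced with a small shell function. `weight_used_percent` is a hypothetical helper written for illustration only; it is not part of the Substrate CLI.

```sh
# Hypothetical helper (not part of the Substrate CLI): reproduces the
# "used X% of its weight" figure from a `benchmark block` output line.
weight_used_percent() {
    # $1 = measured execution time in ns, $2 = predicted weight in ns
    awk -v used="$1" -v predicted="$2" \
        'BEGIN { printf "%.2f\n", used / predicted * 100 }'
}

# Values copied from the outputs above: Polkadot block 9939453 and
# block 1 of the Substrate development chain.
weight_used_percent 26458801 579047053
weight_used_percent 4945664 6864702
```

The two calls print `4.57` and `72.04`, which matches the corresponding output lines. A value above 100 would correspond to a block that the README describes as over weight.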
diff --git a/utils/frame/benchmarking-cli/src/machine/README.md b/utils/frame/benchmarking-cli/src/machine/README.md
@@ -0,0 +1,71 @@
+# The `benchmark machine` command
+
+Different Substrate chains can have different hardware requirements.  
+It is therefore important to be able to quickly gauge if a piece of hardware fits a chain's requirements.  
+The `benchmark machine` command achieves this by measuring key metrics and making them comparable.  
+
+Invoking the command looks like this:  
+```sh
+cargo run --profile=production -- benchmark machine --dev
+```
+
+## Output
+
+The output on reference hardware:  
+
+```pre
++----------+----------------+---------------+--------------+-------------------+
+| Category | Function       | Score         | Minimum      | Result            |
++----------+----------------+---------------+--------------+-------------------+
+| CPU      | BLAKE2-256     | 1023.00 MiB/s | 1.00 GiB/s   | ✅ Pass ( 99.4 %) |
++----------+----------------+---------------+--------------+-------------------+
+| CPU      | SR25519-Verify | 665.13 KiB/s  | 666.00 KiB/s | ✅ Pass ( 99.9 %) |
++----------+----------------+---------------+--------------+-------------------+
+| Memory   | Copy           | 14.39 GiB/s   | 14.32 GiB/s  | ✅ Pass (100.4 %) |
++----------+----------------+---------------+--------------+-------------------+
+| Disk     | Seq Write      | 457.00 MiB/s  | 450.00 MiB/s | ✅ Pass (101.6 %) |
++----------+----------------+---------------+--------------+-------------------+
+| Disk     | Rnd Write      | 190.00 MiB/s  | 200.00 MiB/s | ✅ Pass ( 95.0 %) |
++----------+----------------+---------------+--------------+-------------------+
+```
+
+The *score* is the average result of each benchmark. It always adheres to "higher is better".  
+
+The *category* indicates which part of the hardware was benchmarked:  
+- **CPU** Processor intensive task
+- **Memory** RAM intensive task
+- **Disk** Hard drive intensive task
+
+The *function* is the concrete benchmark that was run:  
+- **BLAKE2-256** The throughput of the [Blake2-256] cryptographic hashing function with 32 KiB input. The [blake2_256 function] is used in many places in Substrate. The throughput of a hash function strongly depends on the input size, therefore we settled on a fixed input size for comparable results.
+- **SR25519-Verify** Sr25519 is a Schnorr signature scheme over the Ristretto group of [Curve25519]. Signature verification is used by Substrate when verifying extrinsics and blocks.
+- **Copy** The throughput of copying memory from one place in the RAM to another.
+- **Seq Write** The throughput of writing data to the storage location sequentially. It is important to use the same disk that will later be used to store the chain data.
+- **Rnd Write** The throughput of writing data to the storage location in a random order. This is normally much slower than the sequential write.
+
+The *score* needs to reach the *minimum* in order to pass the benchmark. This can be reduced with the `--tolerance` flag.
+
+The *result* indicates whether the machine passed a specific benchmark. The percentage is the score relative to the required *minimum*. The `--tolerance` flag is taken into account for this decision. For example, a benchmark that passes with only 95% because the *tolerance* was set to 10% would look like this: `✅ Pass ( 95.0 %)`.
+
+## Interpretation
+
+Ideally all results show a `Pass` and the program exits with code 0. Currently some of the benchmarks can fail even on reference hardware; they are still being improved to make them more deterministic.  
+Make sure to run nothing else on the machine when benchmarking it.  
+You can re-run them multiple times to get more reliable results.
+
+## Arguments
+
+- `--tolerance` A percent number to reduce the *minimum* requirement. This should be used to ignore outliers of the benchmarks. The default value is 10%.
+- `--verify-duration` How long the verification benchmark should run.
+- `--disk-duration` How long the *read* and *write* benchmarks should run each.
+- `--allow-fail` Always exit the program with code 0.
+- `--chain` / `--dev` Specify the chain config to use. This will be used to compare the results with the requirements of the chain (WIP).
+- [`--base-path`]
+
+License: Apache-2.0
+
+<!-- LINKS -->
+[Blake2-256]: https://www.blake2.net/
+[blake2_256 function]: https://crates.parity.io/sp_core/hashing/fn.blake2_256.html
+[Curve25519]: https://en.wikipedia.org/wiki/Curve25519
+[`--base-path`]: ../shared/README.md#arguments
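
Note on the `benchmark machine` README above: the pass/fail decision can be sketched as a comparison of the relative score against `100 - tolerance` percent. The exact check performed by the CLI is not shown in this README, so the formula below is an assumption derived from the description of `--tolerance`. `passes_with_tolerance` is a hypothetical helper and not part of the Substrate CLI.

```sh
# Hypothetical helper (not part of the Substrate CLI). Assumes that the
# tolerance lowers the required minimum by the given percentage.
passes_with_tolerance() {
    # $1 = score, $2 = minimum (same unit as the score), $3 = tolerance in percent
    awk -v score="$1" -v minimum="$2" -v tol="$3" 'BEGIN {
        relative = score / minimum * 100            # score relative to the minimum
        verdict = (relative >= 100 - tol) ? "pass" : "fail"
        printf "%s (%.1f %%)\n", verdict, relative
    }'
}

# "Rnd Write" row from the table above: 190 MiB/s against a 200 MiB/s minimum.
passes_with_tolerance 190 200 10   # default tolerance of 10%
passes_with_tolerance 190 200 0    # no tolerance
```

Under this assumption the first call prints `pass (95.0 %)` and the second prints `fail (95.0 %)`, which illustrates why a score below the minimum can still be reported as a pass.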