This repository was archived by the owner on Nov 15, 2023. It is now read-only.
Merged
14 changes: 9 additions & 5 deletions frame/benchmarking/README.md
@@ -43,7 +43,7 @@ The benchmarking framework comes with the following tools:
* [A set of macros](./src/lib.rs) (`benchmarks!`, `add_benchmark!`, etc...) to make it easy to
write, test, and add runtime benchmarks.
* [A set of linear regression analysis functions](./src/analysis.rs) for processing benchmark data.
-* [A CLI extension](../../utils/frame/benchmarking-cli/) to make it easy to execute benchmarks on your
+* [A CLI extension](../../utils/frame/benchmarking-cli/README.md) to make it easy to execute benchmarks on your
node.

The end-to-end benchmarking pipeline is disabled by default when compiling a node. If you want to
@@ -150,9 +150,13 @@ feature flag:

```bash
cd bin/node/cli
-cargo build --release --features runtime-benchmarks
+cargo build --profile=production --features runtime-benchmarks
```

The production profile applies various compiler optimizations.
These optimizations slow down the compilation process *a lot*.
If you are just testing things out and don't need final numbers, use `--release` instead.

## Running Benchmarks

Finally, once you have a node binary with benchmarks enabled, you need to execute your various
@@ -161,13 +165,13 @@ benchmarks.
You can get a list of the available benchmarks by running:

```bash
-./target/release/substrate benchmark --chain dev --pallet "*" --extrinsic "*" --repeat 0
+./target/production/substrate benchmark pallet --chain dev --pallet "*" --extrinsic "*" --repeat 0
```

Then you can run a benchmark like so:

```bash
-./target/release/substrate benchmark \
+./target/production/substrate benchmark pallet \
--chain dev \ # Configurable Chain Spec
--execution=wasm \ # Always test with Wasm
--wasm-execution=compiled \ # Always use `wasmtime`
@@ -200,7 +204,7 @@ used for joining all the arguments passed to the CLI.
To get a full list of available options when running benchmarks, run:

```bash
-./target/release/substrate benchmark --help
+./target/production/substrate benchmark --help
```

License: Apache-2.0
47 changes: 46 additions & 1 deletion utils/frame/benchmarking-cli/README.md
@@ -1 +1,46 @@
-License: Apache-2.0
# The Benchmarking CLI

This crate contains commands to benchmark various aspects of Substrate and the hardware.
All commands are exposed by the Substrate node but can be exposed by any Substrate client.
The goal is to have a comprehensive suite of benchmarks that covers all aspects of Substrate and the hardware that it is running on.

Invoking the root benchmark command prints a help menu:
```sh
$ cargo run --profile=production -- benchmark

Sub-commands concerned with benchmarking.

USAGE:
substrate benchmark <SUBCOMMAND>

OPTIONS:
-h, --help Print help information
-V, --version Print version information

SUBCOMMANDS:
block Benchmark the execution time of historic blocks
machine Command to benchmark the hardware.
overhead Benchmark the execution overhead per-block and per-extrinsic
pallet Benchmark the extrinsic weight of FRAME Pallets
storage Benchmark the storage speed of a chain snapshot
```

All examples use the `production` profile for correctness, which makes compilation *very* slow; for testing you can use `--release`.
For final results, use the `production` profile and reference hardware; otherwise the results are not comparable.

The sub-commands are explained in depth here:
- [block] Compare the weight of a historic block to its actual resource usage
- [machine] Gauges the speed of the hardware
- [overhead] Creates weight files for the *Block*- and *Extrinsic*-base weights
- [pallet] Creates weight files for a Pallet
- [storage] Creates weight files for *Read* and *Write* storage operations

License: Apache-2.0

<!-- LINKS -->

[pallet]: ../../../frame/benchmarking/README.md
[machine]: src/machine/README.md
[storage]: src/storage/README.md
[overhead]: src/overhead/README.md
[block]: src/block/README.md
118 changes: 118 additions & 0 deletions utils/frame/benchmarking-cli/src/block/README.md
@@ -0,0 +1,118 @@
# The `benchmark block` command

The whole benchmarking process in Substrate aims to predict the resource usage of an unexecuted block.
This command measures how accurate this prediction was by executing a block and comparing the predicted weight to its actual resource usage.
It can be used to measure the accuracy of the pallet benchmarking.

The following sections walk through the command, once for Polkadot and once for Substrate.

## Polkadot # 1
<sup>(Also works for Kusama, Westend and Rococo)</sup>


Suppose you either have a synced Polkadot node or downloaded a snapshot from [Polkachu].
This example uses a pruned ParityDB snapshot from 2022-04-19 whose last block is 9939462.
For pruned snapshots you need to know the number of the last block (to be improved [here]).
Pruned snapshots normally store the last 256 blocks; archive nodes can use any block range.
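
Since `--from` and `--to` are both inclusive, the range covering the last N blocks of a snapshot starts at `last - N + 1`. The following is a minimal sketch of that arithmetic; the helper name `block_range` is made up for illustration and is not part of the CLI.

```sh
#!/bin/sh
# Sketch: derive an inclusive --from/--to range covering the last COUNT blocks.
# `block_range` is a hypothetical helper, not something the benchmarking CLI provides.

# block_range LAST COUNT -> prints "--from <first> --to <last>"
block_range() {
    echo "--from $(( $1 - $2 + 1 )) --to $1"
}

# Last block of the snapshot used in this example, last 10 blocks.
block_range 9939462 10
```

For the snapshot above this prints `--from 9939453 --to 9939462`.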

In this example we will benchmark just the last 10 blocks:
```sh
cargo run --profile=production -- benchmark block --from 9939453 --to 9939462 --db paritydb
```

Output:
```pre
Block 9939453 with 2 tx used 4.57% of its weight ( 26,458,801 of 579,047,053 ns)
Block 9939454 with 3 tx used 4.80% of its weight ( 28,335,826 of 590,414,831 ns)
Block 9939455 with 2 tx used 4.76% of its weight ( 27,889,567 of 586,484,595 ns)
Block 9939456 with 2 tx used 4.65% of its weight ( 27,101,306 of 582,789,723 ns)
Block 9939457 with 2 tx used 4.62% of its weight ( 26,908,882 of 582,789,723 ns)
Block 9939458 with 2 tx used 4.78% of its weight ( 28,211,440 of 590,179,467 ns)
Block 9939459 with 4 tx used 4.78% of its weight ( 27,866,077 of 583,260,451 ns)
Block 9939460 with 3 tx used 4.72% of its weight ( 27,845,836 of 590,462,629 ns)
Block 9939461 with 2 tx used 4.58% of its weight ( 26,685,119 of 582,789,723 ns)
Block 9939462 with 2 tx used 4.60% of its weight ( 26,840,938 of 583,697,101 ns)
```

### Output Interpretation

<sup>(Only results from reference hardware are relevant)</sup>

Each block is executed multiple times and the results are averaged.
The percentage is the interesting part: it indicates how much weight was actually used compared to how much was predicted.
The closer to 100% this is without exceeding 100%, the better.
If it exceeds 100%, the block is marked with "**OVER WEIGHT!**" to make it easier to spot. This is bad, since it means that the benchmarking under-estimated the weight.
This would mean that an honest validator would possibly not be able to keep up with importing blocks since users did not pay for enough weight.
If that happens the validator could lag behind the chain and get slashed for missing deadlines.
It is therefore important to investigate any overweight blocks.
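
One way to look for such blocks in saved output is to filter on the percentage itself. The sketch below assumes that every result line contains `Block <number> ... used <percent>% ...` as in the sample output; real log lines may carry extra prefixes, which the loop tolerates. The function name `overweight_blocks` and the overweight sample line are invented for illustration.

```sh
#!/bin/sh
# Sketch: print the numbers of all blocks that used more than 100% of their weight.
# Assumes the "Block <n> ... used <p>% ..." line layout from the sample output.

overweight_blocks() {
    awk '{
        num = ""
        for (i = 1; i < NF; i++) {
            if ($i == "Block") { num = $(i + 1) }
            if ($i == "used") {
                pct = $(i + 1)
                sub(/%$/, "", pct)          # strip the trailing percent sign
                if (pct + 0 > 100) { print num }
            }
        }
    }'
}

# Two lines from the sample output plus one made-up overweight line.
overweight_blocks <<'EOF'
Block 9939453 with 2 tx used 4.57% of its weight ( 26,458,801 of 579,047,053 ns)
Block 9939454 with 3 tx used 4.80% of its weight ( 28,335,826 of 590,414,831 ns)
Block 42 with 7 tx used 104.20% of its weight ( 625,200,000 of 600,000,000 ns)
EOF
```

Only the made-up block `42` is printed, since the two real sample lines stay far below 100%.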

In this example you can see an unexpected result: less than 5% of the weight was used!
The measured blocks can be executed much faster than predicted.
This means that the benchmarking process massively over-estimated the execution time.
Since the estimates are off by so much, this is tracked as an issue in [polkadot#5192].

The ideal range for these results would be 85-100%.
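
The percentage can be recomputed from the two nanosecond figures in each output line as `measured / predicted * 100`. The sketch below does that and sorts the result into the ranges discussed above; the function names and the labels `overweight`, `ideal` and `over-estimated` are illustrative and are not printed by the command itself.

```sh
#!/bin/sh
# Sketch: recompute "used % of its weight" and classify it.
# The 85-100% band is the ideal range named in the text; the labels are made up.

# used_percent MEASURED_NS PREDICTED_NS -> percentage with two decimals
used_percent() {
    awk -v m="$1" -v p="$2" 'BEGIN { printf "%.2f\n", m / p * 100 }'
}

# classify PERCENT -> "overweight", "ideal" or "over-estimated"
classify() {
    awk -v pct="$1" 'BEGIN {
        if (pct > 100) { print "overweight" }
        else if (pct >= 85) { print "ideal" }
        else { print "over-estimated" }
    }'
}

pct=$(used_percent 26458801 579047053)   # block 9939453 from the output above
echo "$pct% -> $(classify "$pct")"
```

For block 9939453 this prints `4.57% -> over-estimated`, matching the percentage in the sample output.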

## Polkadot # 2

Let's take a more interesting example where the blocks use more of their predicted weight.
Every day when validators pay out rewards, the blocks are nearly full.
Using an archive node here is the easiest.

The Polkadot blocks TODO-TODO for example contain large batch transactions for staking payout.

```sh
cargo run --profile=production -- benchmark block --from TODO --to TODO --db paritydb
```

```pre
TODO
```

## Substrate

It is also possible to try the procedure in Substrate, although it's a bit boring.

First you need to create some blocks with either a local or dev chain.
This example will use the standard development spec.
Pick a non-existent directory where the chain data will be stored, e.g. `/tmp/dev`.
```sh
cargo run --profile=production -- --dev -d /tmp/dev
```
After a few seconds you should see that it has started to produce blocks:
```pre
✨ Imported #1 (0x801d…9189)
```
You can now kill the node with `Ctrl+C`. Then measure how long it takes to execute these blocks:
```sh
cargo run --profile=production -- benchmark block --from 1 --to 1 --dev -d /tmp/dev --pruning archive
```
This will benchmark the first block. If you killed the node at a later point, you can measure multiple blocks.
```pre
Block 1 with 1 tx used 72.04% of its weight ( 4,945,664 of 6,864,702 ns)
```

In this example the block used ~72% of its weight.
The benchmarking therefore over-estimated the effort to execute the block.
Since this block is empty, it's not very interesting.

## Arguments

- `--from` Number of the first block to measure (inclusive).
- `--to` Number of the last block to measure (inclusive).
- `--repeat` How often each block should be measured.
- [`--db`]
- [`--pruning`]

License: Apache-2.0

<!-- LINKS -->

[Polkachu]: https://polkachu.com/snapshots
[here]: https://github.com/paritytech/substrate/issues/11141
[polkadot#5192]: https://github.com/paritytech/polkadot/issues/5192

[`--db`]: ../shared/README.md#arguments
[`--pruning`]: ../shared/README.md#arguments
71 changes: 71 additions & 0 deletions utils/frame/benchmarking-cli/src/machine/README.md
@@ -0,0 +1,71 @@
# The `benchmark machine` command

Different Substrate chains can have different hardware requirements.
It is therefore important to be able to quickly gauge whether a piece of hardware fits a chain's requirements.
The `benchmark machine` command achieves this by measuring key metrics and making them comparable.

Invoking the command looks like this:
```sh
cargo run --profile=production -- benchmark machine --dev
```

## Output

The output on reference hardware:

```pre
+----------+----------------+---------------+--------------+-------------------+
| Category | Function | Score | Minimum | Result |
+----------+----------------+---------------+--------------+-------------------+
| CPU | BLAKE2-256 | 1023.00 MiB/s | 1.00 GiB/s | ✅ Pass ( 99.4 %) |
+----------+----------------+---------------+--------------+-------------------+
| CPU | SR25519-Verify | 665.13 KiB/s | 666.00 KiB/s | ✅ Pass ( 99.9 %) |
+----------+----------------+---------------+--------------+-------------------+
| Memory | Copy | 14.39 GiB/s | 14.32 GiB/s | ✅ Pass (100.4 %) |
+----------+----------------+---------------+--------------+-------------------+
| Disk | Seq Write | 457.00 MiB/s | 450.00 MiB/s | ✅ Pass (101.6 %) |
+----------+----------------+---------------+--------------+-------------------+
| Disk | Rnd Write | 190.00 MiB/s | 200.00 MiB/s | ✅ Pass ( 95.0 %) |
+----------+----------------+---------------+--------------+-------------------+
```

The *score* is the average result of each benchmark. It always adheres to "higher is better".

The *category* indicates which part of the hardware was benchmarked:
- **CPU** Processor intensive task
- **Memory** RAM intensive task
- **Disk** Hard drive intensive task

The *function* is the concrete benchmark that was run:
- **BLAKE2-256** The throughput of the [Blake2-256] cryptographic hashing function with 32 KiB input. The [blake2_256 function] is used in many places in Substrate. The throughput of a hash function strongly depends on the input size, so a fixed input size is used to get comparable results.
- **SR25519-Verify** Sr25519 is a Schnorr signature scheme based on [Curve25519]. Signature verification is used by Substrate when verifying extrinsics and blocks.
- **Copy** The throughput of copying memory from one place in the RAM to another.
- **Seq Write** The throughput of writing data to the storage location sequentially. It is important to benchmark the same disk that will later be used to store the chain data.
- **Rnd Write** The throughput of writing data to the storage location in a random order. This is normally much slower than the sequential write.

The *score* needs to reach the *minimum* in order to pass the benchmark. The *minimum* can be lowered with the `--tolerance` flag.

The *result* indicates whether the machine passed a specific benchmark. The percentage is the score relative to the required *minimum*. The `--tolerance` flag is taken into account for this decision. For example, a benchmark that passes with only 95% because the *tolerance* was set to 10% would look like this: `✅ Pass ( 95.0 %)`.
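
The pass decision described above can be sketched as follows. This assumes that the tolerance simply lowers the required percentage from 100% to `100 - tolerance`, and that score and minimum are given in the same unit; the helper name `check`, the `Fail` label and the simplified output format are assumptions, not the command's actual implementation.

```sh
#!/bin/sh
# Sketch: decide pass/fail from score, minimum and tolerance (in percent).
# Assumption: a benchmark passes when score/minimum*100 >= 100 - tolerance.

# check SCORE MINIMUM TOLERANCE_PERCENT -> "Pass (<rel> %)" or "Fail (<rel> %)"
check() {
    awk -v s="$1" -v m="$2" -v t="$3" 'BEGIN {
        rel = s / m * 100
        verdict = (rel >= 100 - t) ? "Pass" : "Fail"
        printf "%s (%.1f %%)\n", verdict, rel
    }'
}

check 190 200 10   # "Rnd Write" row with the default tolerance of 10%
check 190 200 0    # the same score without any tolerance
```

With the default tolerance the *Rnd Write* row passes at 95.0 %, while the same score would fail if the tolerance were 0.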

## Interpretation

Ideally all results show a `Pass` and the program exits with code 0. Currently some of the benchmarks can fail even on reference hardware; they are still being improved to make them more deterministic.
Make sure to run nothing else on the machine when benchmarking it.
You can re-run them multiple times to get more reliable results.

## Arguments

- `--tolerance` A percent number to reduce the *minimum* requirement. This should be used to ignore outliers of the benchmarks. The default value is 10%.
- `--verify-duration` How long the verification benchmark should run.
- `--disk-duration` How long the *read* and *write* benchmarks should run each.
- `--allow-fail` Always exit the program with code 0.
- `--chain` / `--dev` Specify the chain config to use. This will be used to compare the results with the requirements of the chain (WIP).
- [`--base-path`]

License: Apache-2.0

<!-- LINKS -->
[Blake2-256]: https://www.blake2.net/
[blake2_256 function]: https://crates.parity.io/sp_core/hashing/fn.blake2_256.html
[Curve25519]: https://en.wikipedia.org/wiki/Curve25519
[`--base-path`]: ../shared/README.md#arguments