Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

frame-omni-bencher fix for --repeat > 1 #5083

Draft
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

bkontur
Copy link
Contributor

@bkontur bkontur commented Jul 19, 2024

Summary

I encountered a problem with the fellows' repository when I tried to regenerate weights for runtime using --steps 50 --repeat 20 and frame-omni-bencher here. I faced a strange issue with several benchmarks, and after some debugging, I discovered that e.g. pallet_xcm::SafeXcmVersion was removed during the second run. To address this, I had to apply a hotfix which sets the version every time, because the SafeXcmVersion from genesis was ignored in subsequent runs, works only for the first run.

Testing

I added several temporary CI pipelines with --repeat=2 setup for frame-omni-bencher.
Locally, see quick-benchmarks-omni-repeat-2-leave-intent and quick-benchmarks-omni-repeat-2-leave-intent how to run and simulate.

From local debugging, I see this strange behavior:

pallet_collator_selection::leave_intent and --steps=2 --repeat=1 works OK:

[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:48:36Z ERROR bender::leave_intent] eligible_collators: 2
Pallet: "pallet_collator_selection", Extrinsic: "leave_intent", Lowest values: [], Highest values: [], Steps: 2, Repeat: 1

pallet_collator_selection::leave_intent and --steps=2 --repeat=2 fails:

[2024-07-19T15:45:59Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:45:59Z ERROR bender::leave_intent] eligible_collators: 2
[2024-07-19T15:45:59Z ERROR bender::leave_intent] eligible_collators: 0
The following 1 benchmarks failed:
- pallet_collator_selection::leave_intent
Error: Input("1 benchmarks failed")

Solution?

I suspect that genesis state is not re-initialized correctly after for second repeat - see suddenly eligible_collators: 0, but I will continue with debugging.

I think it could be related to the:

let (genesis_storage, genesis_changes) =
			self.genesis_storage::<Hasher, ExtraHostFunctions>(&chain_spec)?;

and

        /// Produce a genesis storage and genesis changes.
	///
	/// It would be easier to only return one type, but there is no easy way to convert them.
	// TODO: Re-write `BenchmarkingState` to not be such a clusterfuck and only accept
	// `OverlayedChanges` instead of a mix between `OverlayedChanges` and `State`. But this can only
	// be done once we deprecated and removed the legacy interface :(
	fn genesis_storage<H: Hash, F: HostFunctions>(

@bkontur
Copy link
Contributor Author

bkontur commented Jul 19, 2024

cc: @ggwpez I will investigate later, but do you have any hints or thoughts?

@ggwpez
Copy link
Member

ggwpez commented Jul 19, 2024

Could be that somewhere a line like this is missing where it should reset the state:

let mut changes = genesis_changes.clone();

One way to check this would be to print the storage root hashes in every benchmark and assert that they are the same.
sp_io::storage::root or something can be used for that.

@ggwpez
Copy link
Member

ggwpez commented Jul 19, 2024

It reproduces locally, i will take a look.

.gitlab/pipeline/test.yml Outdated Show resolved Hide resolved
.gitlab/pipeline/test.yml Outdated Show resolved Hide resolved
@bkontur bkontur force-pushed the bko-frame-omni-bencher-fix branch from cc826f3 to 00403dc Compare July 19, 2024 22:36
@bkontur
Copy link
Contributor Author

bkontur commented Jul 19, 2024

It reproduces locally, i will take a look.

@ggwpez looks like your fix works, e.g. this passed: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6758635 or https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6758774

@bkontur
Copy link
Contributor Author

bkontur commented Jul 22, 2024

@ggwpez testing with fellows:

frame-omni-bencher from actual master with --repeat 2:

./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/bridge-hub-kusama-runtime/bridge_hub_kusama_runtime.wasm --all --steps 2 --repeat 2
./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/bridge-hub-polkadot-runtime/bridge_hub_polkadot_runtime.wasm --all --steps 2 --repeat 2
./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/coretime-kusama-runtime/coretime_kusama_runtime.wasm --all --steps 2 --repeat 2

...

Failed on the first one:


The following 27 benchmarks failed:
- pallet_collator_selection::leave_intent
- pallet_xcm::force_subscribe_version_notify
- pallet_xcm::force_unsubscribe_version_notify
- pallet_xcm::send
- pallet_xcm::teleport_assets
- pallet_xcm::transfer_assets
- pallet_xcm_benchmarks::fungible::deposit_reserve_asset
- pallet_xcm_benchmarks::fungible::initiate_reserve_withdraw
- pallet_xcm_benchmarks::fungible::initiate_teleport
- pallet_xcm_benchmarks::fungible::transfer_reserve_asset
- pallet_xcm_benchmarks::generic::query_pallet
- pallet_xcm_benchmarks::generic::report_error
- pallet_xcm_benchmarks::generic::report_holding
- pallet_xcm_benchmarks::generic::report_transact_status
- pallet_xcm_benchmarks::generic::subscribe_version
- pallet_xcm_benchmarks::generic::unsubscribe_version
- snowbridge_pallet_inbound_queue::submit
- snowbridge_pallet_system::create_agent
- snowbridge_pallet_system::create_channel
- snowbridge_pallet_system::force_transfer_native_from_agent
- snowbridge_pallet_system::force_update_channel
- snowbridge_pallet_system::set_operating_mode
- snowbridge_pallet_system::set_pricing_parameters
- snowbridge_pallet_system::set_token_transfer_fees
- snowbridge_pallet_system::transfer_native_from_agent
- snowbridge_pallet_system::update_channel
- snowbridge_pallet_system::upgrade
Error: Input("27 benchmarks failed")

frame-omni-bencher from actual bko-frame-omni-bencher-fix with --repeat 2:

./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/bridge-hub-kusama-runtime/bridge_hub_kusama_runtime.wasm --all --steps 2 --repeat 2
./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/bridge-hub-polkadot-runtime/bridge_hub_polkadot_runtime.wasm --all --steps 2 --repeat 2
./frame-omni-bencher v1 benchmark pallet --runtime  ./runtimes/target/debug/wbuild/coretime-kusama-runtime/coretime_kusama_runtime.wasm --all --steps 2 --repeat 2


...

Passed all.

fellowship-merge-bot bot pushed a commit to polkadot-fellows/runtimes that referenced this pull request Jul 24, 2024
…pec-builder` stuff to `get_preset` (#379)

This PR introduces a new CI pipeline for checking and running runtime
benchmarks as part of the test pipelines. This means we will now know if
any changes break the benchmarks.

The pipeline is based on downloading `frame-omni-bencher` (building it
in-place is too expensive for every matrix run) and running it with a
minimal setup: `--steps 2 --repeat 1`.

Another part of this PR splits the default genesis setups from
`chain-spec-generator` and moves them to the
`sp_genesis_builder::GenesisBuilder::get_preset` runtime API
implementation for every runtime, which is required by
`frame-omni-bencher`.


Closes: #197
Relates to: #298
Relates to: #324

<!-- Remember that you can run `/merge` to enable auto-merge in the PR
-->

<!-- Remember to modify the changelog. If you don't need to modify it,
you can check the following box.
Instead, if you have already modified it, simply delete the following
line. -->

- [X] Does not require a CHANGELOG entry

## Future works

When [this issue](paritytech/polkadot-sdk#5083)
is fixed, I will rewrite
[weight-generation.md](https://github.com/polkadot-fellows/runtimes/blob/main/docs/weight-generation.md).
The new version will be significantly simplified, with instructions to
simply download `frame-omni-bencher`, build the runtime WASM, and run
it—no other steps will be necessary.

```
2024-07-18 14:45:51 Using the chain spec instead of the runtime to generate the genesis state is deprecated. Please remove the `--chain`/`--dev`/`--local` argument, point `--runtime` to your runtime blob and set `--genesis-builder=runtime`. This warning may become a hard error any time after December 2024.    
```

---------

Co-authored-by: Bastian Köcher <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Aug 7, 2024
- Part of paritytech/ci_cd#1006
- Closes: paritytech/ci_cd#1010
- Related: #4405

- Possibly affecting how frame-omni-bencher works on different runtimes:
#5083

Currently works in parallel with gitlab short benchmarks. 
Triggered only by adding `GHA-migration` label to assure smooth
transition (kind of feature-flag).
Later when tested on random PRs we'll remove the gitlab and turn on by
default these tests

---------

Co-authored-by: Oliver Tale-Yazdi <[email protected]>
@bkontur bkontur force-pushed the bko-frame-omni-bencher-fix branch from fea3954 to e0dc270 Compare August 12, 2024 10:56
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6967866

gui1117 pushed a commit to gui1117/polkadot-sdk that referenced this pull request Aug 15, 2024
- Part of paritytech/ci_cd#1006
- Closes: paritytech/ci_cd#1010
- Related: paritytech#4405

- Possibly affecting how frame-omni-bencher works on different runtimes:
paritytech#5083

Currently works in parallel with gitlab short benchmarks. 
Triggered only by adding `GHA-migration` label to assure smooth
transition (kind of feature-flag).
Later when tested on random PRs we'll remove the gitlab and turn on by
default these tests

---------

Co-authored-by: Oliver Tale-Yazdi <[email protected]>
bkontur pushed a commit that referenced this pull request Aug 15, 2024
Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Fix changes_to_storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Better error messages

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

genesis-preset flag

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Bench runner fix genesis storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

clippy
@bkontur
Copy link
Contributor Author

bkontur commented Aug 16, 2024

@ggwpez I ran into a different issue and I think everything leads to the benchmarking code path that uses OverlayedChanges with connection to the StateMachine<BenchmarkingState e.g. here.

Originally, I found the issue when doing --repeat 2 (see description), when second iteration had different genesis which was missing data from OverlayedChanges.

But now, I have a different issue, when it looks like the storage is at some point flushed or something is removed.
Let's take it step by step:

  1. With this PR#5327 I am refactoring testnet runtimes to support get_preset feature, so basically just moving chain-spec genesis generation to the runtime.
    So the only result in this PR is that, polkadot-parachain generates chain spec with genesis default from runtime file genesis_config_presets.rs, so really no functional or whatever change. And the short-benchmarks with chain-spec works for BridgeHubRococo.
    Everything is ok so far and I have correct get_preset in the BridgeHubRococo runtime.
  2. So now I enabled new short-benchmarks CI that uses frame-omni-bencher (requires get_preset) in the PR#4405 based on 5327
  3. But the short benchmarks with frame-omni-bencher for BridgeHubRococo fails e.g. here see full log here, error:
    2024-08-13T09:01:50.3787830Z assertion failed: T::is_message_successfully_dispatched(setup.last_nonce())
    
  4. So I cherry-picked your fix for --repeat 1 from this PR to the PR#4405 based on 5327
  5. And now the short benchmarks with frame-omni-bencher passed for BridgeHubRococo here with full log here.

Based on what I tried and see, I assume that there is some "bug" somewhere when using StateMachine with BenchmarkingState and OverlayedChanges (maybe, something at some point clears/flushes the storage or something like that).
So even if we go with your working fix - changes_to_storage, there will be still some bug underhood.

I don't have time to investigate it further now, but when I come back from vacation in two weeks, I will continue.
If we want to do a hotfix till that time, these two commits are the actual hotfix:
eb04569
0628716

dharjeezy pushed a commit to dharjeezy/polkadot-sdk that referenced this pull request Aug 28, 2024
- Part of https://github.com/paritytech/ci_cd/issues/1006
- Closes: https://github.com/paritytech/ci_cd/issues/1010
- Related: paritytech#4405

- Possibly affecting how frame-omni-bencher works on different runtimes:
paritytech#5083

Currently works in parallel with gitlab short benchmarks. 
Triggered only by adding `GHA-migration` label to assure smooth
transition (kind of feature-flag).
Later when tested on random PRs we'll remove the gitlab and turn on by
default these tests

---------

Co-authored-by: Oliver Tale-Yazdi <[email protected]>
bkontur pushed a commit that referenced this pull request Aug 30, 2024
Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Fix changes_to_storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Better error messages

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

genesis-preset flag

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Bench runner fix genesis storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

clippy
bkontur pushed a commit that referenced this pull request Sep 1, 2024
Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Fix changes_to_storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Better error messages

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

genesis-preset flag

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Bench runner fix genesis storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

clippy
bkontur pushed a commit that referenced this pull request Sep 1, 2024
Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Fix changes_to_storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Better error messages

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

genesis-preset flag

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Bench runner fix genesis storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

clippy
bkontur pushed a commit that referenced this pull request Sep 1, 2024
Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Fix changes_to_storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Better error messages

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

genesis-preset flag

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Bench runner fix genesis storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

clippy
bkontur pushed a commit that referenced this pull request Sep 2, 2024
Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Fix changes_to_storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Better error messages

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

genesis-preset flag

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Bench runner fix genesis storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

clippy
bkontur pushed a commit that referenced this pull request Sep 2, 2024
Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Fix changes_to_storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Better error messages

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

genesis-preset flag

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Bench runner fix genesis storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

clippy
bkontur pushed a commit that referenced this pull request Sep 2, 2024
Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Fix changes_to_storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Better error messages

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

genesis-preset flag

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Bench runner fix genesis storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

clippy
bkontur pushed a commit that referenced this pull request Sep 9, 2024
Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Fix changes_to_storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Better error messages

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

genesis-preset flag

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

Bench runner fix genesis storage

Signed-off-by: Oliver Tale-Yazdi <[email protected]>

clippy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T12-benchmarks This PR/Issue is related to benchmarking and weights.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants