Fix bench-e2e single mode and keep results #1693

Merged
16 commits merged into master from fix-bench-standalone
Oct 10, 2024

ch1bo
Collaborator

@ch1bo ch1bo commented Oct 8, 2024

This fixes two issues with the bench-e2e binary / benchmark:

  • Running in single mode was not working because of a FeeTooSmallUTxO error
  • The results.csv is written into a temporary directory and removed, which makes plotting impossible.

I was in the mood for some refactoring, so this also contains various other changes I made while working on the code and tidying up a bit.

The refactoring separated hydra node and payment keys further, which requires the datasets to be re-generated. I took the liberty of generating them with --scaling-factor 10, which results in 300 transactions per client. That should be long enough to identify regressions, with a hopefully 10x shorter benchmark time in CI.

Another benefit of this separation is that it naturally reduced the assumptions of the demo mode: instead of seeding the hydra node cardano keys, it re-uses seed-devnet.sh. Consequently, the workload and the container setup in our network test workflow are more loosely coupled.

I'm not 100% happy that the bench now requires the --output-directory to be empty, which in turn means the whole state is captured as an artifact of our CI. It would be better to always make the state directory a /tmp path that is retained in case of errors (or configurable with --state-directory). But that can go into another PR, another time.
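
The alternative described above (a temporary state directory that is kept only when something fails, or an explicitly chosen one) can be sketched as follows. This is a Python illustration of the idea, not the bench-e2e implementation; the `state_directory` helper and the `bench-e2e-` prefix are hypothetical names, and `--state-directory` is only a proposed option at this point.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def state_directory(requested=None):
    """Yield a directory to hold benchmark state.

    If `requested` is given (think of a hypothetical --state-directory),
    it is used and never deleted. Otherwise a fresh temporary directory
    is created, removed on success and retained when the body raises.
    """
    if requested is not None:
        path = Path(requested)
        path.mkdir(parents=True, exist_ok=True)
        yield path
        return

    path = Path(tempfile.mkdtemp(prefix="bench-e2e-"))
    try:
        yield path
    except BaseException:
        # Keep the state around so the failure can be inspected.
        print(f"Benchmark failed, state retained in {path}")
        raise
    else:
        shutil.rmtree(path)
```

With this shape, results such as results.csv would have to be copied out of the state directory before the block exits, or written to a separate output location, so that plotting still works after a successful run.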


  • CHANGELOG updated
  • Documentation updated (README)
  • Haddocks updated
  • No new TODOs introduced or explained hereafter
    • Two XXX notes of what to improve further

@ch1bo ch1bo force-pushed the fix-bench-standalone branch 3 times, most recently from 506062b to 9eb745d Compare October 8, 2024 18:09
@ch1bo ch1bo self-assigned this Oct 8, 2024
@ch1bo ch1bo requested a review from a team October 8, 2024 18:11
@ch1bo ch1bo added the red bin label Oct 8, 2024

github-actions bot commented Oct 8, 2024

Transaction costs

Sizes and execution budgets for Hydra protocol transactions. Note that unlisted parameters currently use arbitrary values, and results are neither fully deterministic nor fully comparable to previous runs.

Metadata
Generated at 2024-10-10 10:31:51.922255033 UTC
Max. memory units 14000000
Max. CPU units 10000000000
Max. tx size (bytes) 16384

Script summary

| Name | Hash | Size (Bytes) |
| --- | --- | --- |
| νInitial | b512161ccb0652d7e9a0b540e4a3c808f73d6558a4bcabf374d85880 | 3969 |
| νCommit | ea444d37d226e71eef73ac78d149750da977feb588900135bf9e8221 | 692 |
| νHead | 2253ddd95837c7aacc8635a971caaea743434152dd8dd2849bdf4162 | 10797 |
| μHead | 4d648ca239040b0e87901835aa11423e7aa3bd947ce6befe7db1bae8* | 4508 |
| νDeposit | 1a011f23b139a6426767026bde10319546485d553219a5848cdac4e5 | 2993 |

* The minting policy hash is only usable for comparison. As the script is parameterized, the actual script is unique per head.

Init transaction costs

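| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 5096 | 5.79 | 2.29 | 0.44 |
| 2 | 5297 | 7.09 | 2.80 | 0.46 |
| 3 | 5499 | 8.73 | 3.46 | 0.49 |
| 5 | 5901 | 11.26 | 4.45 | 0.53 |
| 10 | 6907 | 18.11 | 7.16 | 0.65 |
| 57 | 16355 | 82.91 | 32.79 | 1.78 |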
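
The percentage columns can be turned back into absolute execution units, assuming that "% max Mem" and "% max CPU" are relative to the "Max. memory units" and "Max. CPU units" listed in the metadata. A minimal Python sketch; `percent_to_units` is a hypothetical helper, and because the table values are rounded to two decimals the results are only approximate.

```python
# Limits taken from the Metadata section of this report.
MAX_MEM_UNITS = 14_000_000
MAX_CPU_UNITS = 10_000_000_000


def percent_to_units(percent, maximum):
    """Convert a '% max' table value into approximate absolute units."""
    return round(percent / 100 * maximum)


# Init transaction with 1 party: 5.79 % max Mem, 2.29 % max CPU.
init_mem = percent_to_units(5.79, MAX_MEM_UNITS)
init_cpu = percent_to_units(2.29, MAX_CPU_UNITS)
print(init_mem, init_cpu)  # prints: 810600 229000000
```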

Commit transaction costs

This uses ada-only outputs for better comparability.

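| UTxO | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 567 | 10.84 | 4.26 | 0.29 |
| 2 | 758 | 14.31 | 5.80 | 0.34 |
| 3 | 944 | 17.92 | 7.39 | 0.39 |
| 5 | 1323 | 25.56 | 10.73 | 0.49 |
| 10 | 2257 | 47.11 | 19.97 | 0.77 |
| 19 | 3947 | 94.71 | 39.81 | 1.38 |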

CollectCom transaction costs

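| Parties | UTxO (bytes) | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- | --- |
| 1 | 57 | 560 | 20.58 | 7.85 | 0.40 |
| 2 | 113 | 671 | 28.02 | 10.67 | 0.48 |
| 3 | 171 | 782 | 37.34 | 14.18 | 0.59 |
| 4 | 227 | 893 | 47.04 | 17.86 | 0.70 |
| 5 | 282 | 1009 | 56.92 | 21.60 | 0.81 |
| 6 | 340 | 1116 | 67.08 | 25.44 | 0.92 |
| 7 | 396 | 1227 | 67.51 | 25.68 | 0.93 |
| 8 | 448 | 1338 | 80.43 | 30.57 | 1.08 |
| 9 | 504 | 1449 | 80.47 | 30.65 | 1.09 |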

Cost of Decrement Transaction

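| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 650 | 18.40 | 8.07 | 0.39 |
| 2 | 731 | 18.50 | 8.83 | 0.40 |
| 3 | 858 | 19.95 | 10.17 | 0.42 |
| 5 | 1179 | 23.56 | 13.12 | 0.49 |
| 10 | 1989 | 32.96 | 20.62 | 0.65 |
| 47 | 7736 | 96.50 | 73.89 | 1.80 |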

Close transaction costs

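| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 672 | 20.87 | 9.34 | 0.42 |
| 2 | 846 | 22.67 | 11.10 | 0.45 |
| 3 | 915 | 23.70 | 12.18 | 0.47 |
| 5 | 1199 | 26.64 | 15.09 | 0.53 |
| 10 | 1887 | 34.14 | 22.55 | 0.67 |
| 50 | 8222 | 99.29 | 86.57 | 1.94 |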

Contest transaction costs

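| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 691 | 26.76 | 11.48 | 0.48 |
| 2 | 765 | 28.12 | 12.68 | 0.50 |
| 3 | 968 | 30.31 | 14.59 | 0.54 |
| 5 | 1204 | 33.94 | 17.72 | 0.61 |
| 10 | 2027 | 43.82 | 26.44 | 0.78 |
| 39 | 6521 | 99.67 | 75.65 | 1.79 |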

Abort transaction costs

There is some variation due to the random mixture of initial and already committed outputs.

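| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 4971 | 15.30 | 6.55 | 0.54 |
| 2 | 5053 | 21.29 | 9.07 | 0.61 |
| 3 | 5180 | 25.54 | 10.89 | 0.66 |
| 4 | 5288 | 32.14 | 13.76 | 0.74 |
| 5 | 5680 | 44.07 | 19.41 | 0.90 |
| 6 | 5625 | 49.48 | 21.48 | 0.95 |
| 7 | 5852 | 56.62 | 24.78 | 1.04 |
| 8 | 6131 | 64.89 | 28.64 | 1.15 |
| 9 | 6024 | 65.89 | 28.54 | 1.15 |
| 10 | 6293 | 76.21 | 33.31 | 1.28 |
| 11 | 6475 | 87.40 | 38.25 | 1.42 |
| 12 | 6408 | 91.57 | 39.93 | 1.46 |
| 13 | 6565 | 96.51 | 41.91 | 1.52 |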

FanOut transaction costs

Involves spending head output and burning head tokens. Uses ada-only UTxO for better comparability.

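| Parties | UTxO | UTxO (bytes) | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | 0 | 0 | 5090 | 10.38 | 4.35 | 0.49 |
| 10 | 1 | 57 | 5123 | 10.96 | 4.81 | 0.50 |
| 10 | 5 | 285 | 5260 | 15.83 | 7.79 | 0.57 |
| 10 | 10 | 570 | 5430 | 21.87 | 11.49 | 0.65 |
| 10 | 20 | 1138 | 5768 | 33.16 | 18.55 | 0.81 |
| 10 | 30 | 1710 | 6111 | 45.44 | 26.05 | 0.97 |
| 10 | 40 | 2279 | 6451 | 56.74 | 33.12 | 1.13 |
| 10 | 50 | 2848 | 6789 | 68.63 | 40.45 | 1.30 |
| 10 | 76 | 4321 | 7664 | 99.27 | 59.39 | 1.72 |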

End-to-end benchmark results

This page is intended to collect the latest end-to-end benchmark results produced by Hydra's continuous integration (CI) system from the latest master code.

Please note that these results are approximate as they are currently produced from limited cloud VMs and not controlled hardware. Rather than focusing on the absolute results, the emphasis should be on relative results, such as how the timings for a scenario evolve as the code changes.
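
Reading the results relatively, as suggested above, could look like the following Python sketch. The function names and the 20% threshold are hypothetical, and the numbers in the usage lines are made-up inputs for illustration, not measurements from any run.

```python
def relative_change(previous_ms, current_ms):
    """Percentage change of a timing relative to an earlier run."""
    if previous_ms <= 0:
        raise ValueError("previous timing must be positive")
    return (current_ms - previous_ms) / previous_ms * 100


def looks_like_regression(previous_ms, current_ms, threshold_percent=20.0):
    """Flag a slowdown larger than an (assumed) tolerance for noisy VMs."""
    return relative_change(previous_ms, current_ms) > threshold_percent


# Made-up example: a scenario that went from 4.0 ms to 5.0 ms on average.
print(relative_change(4.0, 5.0))        # prints: 25.0
print(looks_like_regression(4.0, 5.0))  # prints: True
```

The threshold would need tuning against the run-to-run variation of the CI machines, since the absolute timings are only approximate.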

Generated at 2024-10-10 10:34:24.332196246 UTC

Baseline Scenario

Number of nodes 1
Number of txs 300
Avg. Confirmation Time (ms) 5.551930950
P99 12.216956419999997ms
P95 7.627868650000001ms
P50 5.3302375ms
Number of Invalid txs 0

Three local nodes

Number of nodes 3
Number of txs 900
Avg. Confirmation Time (ms) 24.358177392
P99 48.859335549999486ms
P95 32.6859822ms
P50 22.443132ms
Number of Invalid txs 0
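
For readers who want to recompute such summaries from raw per-transaction confirmation times, here is a minimal Python sketch. How bench-e2e computes its percentiles is not stated in this report; linear interpolation between ranks is an assumption, and `percentile` and `summarize` are hypothetical helper names.

```python
from statistics import mean


def percentile(samples, p):
    """p-th percentile (0..100), linearly interpolated between ranks."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize(confirmation_times_ms):
    """Average and P50/P95/P99 of confirmation times in milliseconds."""
    return {
        "avg": mean(confirmation_times_ms),
        "p50": percentile(confirmation_times_ms, 50),
        "p95": percentile(confirmation_times_ms, 95),
        "p99": percentile(confirmation_times_ms, 99),
    }


# Made-up samples, only to show the call shape.
summary = summarize([4.0, 5.0, 6.0, 7.0, 8.0])
```

Other percentile definitions (for example nearest-rank) give slightly different values on small samples, so numbers recomputed this way may not match the report exactly.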


github-actions bot commented Oct 8, 2024

Test Results

544 tests  ±0   538 ✅ ±0   26m 34s ⏱️ +4s
162 suites ±0     6 💤 ±0 
  7 files   ±0     0 ❌ ±0 

Results for commit ec21ac0. ± Comparison against base commit dff6655.

♻️ This comment has been updated with latest results.

Contributor

@noonio noonio left a comment

Minor comments; happy to merge if all the tests pass!

@noonio
Contributor

noonio commented Oct 9, 2024

@noonio noonio force-pushed the fix-bench-standalone branch 2 times, most recently from 0b81249 to 1dbd899 Compare October 9, 2024 12:29
ch1bo and others added 15 commits October 10, 2024 11:33
This is not ideal, but a lot simpler than doing proper fee calculation.
It's unclear why fee calculation was removed before; it is needed when
running benchmark scenarios.
This is redundant and can be achieved by using the 'datasets'
subcommand.
Before it was written to a random temporary directory, which makes it
annoying to generate datasets with this mode.
The hydra-cluster benchmarks now only use a single directory to store
the whole state, which is temporary unless a specific output-directory
is requested.
This reduces some code duplication without much loss of
expressiveness (which key we use does not matter).
Same transaction style (single respending txs), but a deliberately shorter
sequence of transactions (3000 -> 300) to have shorter benchmark
run-times, while the sequence should be long enough to identify regressions.

Generated with invocations:

cabal run bench-e2e -- single --cluster-size 1 --scaling-factor 10

and

cabal run bench-e2e -- single --cluster-size 3 --scaling-factor 10

Plus some manual amending of the JSON to contain a "title".
As before, the bench-e2e does not assume the hydra node keys to be
seeded. This ties the way bench-e2e binary (which hard-codes Alice, Bob
and Carol) to the configurable list of --hydra-client to connect to.
This further decouples the bench-e2e binary, which just produces load and
provides statistics, from how the hydra-nodes are run.

Now the only assumption is that the
'hydra-cluster/config/credentials/faucet.sk' owns funds on the given
network.
@ch1bo ch1bo added this pull request to the merge queue Oct 10, 2024
@ch1bo ch1bo removed this pull request from the merge queue due to a manual request Oct 10, 2024
@ch1bo ch1bo enabled auto-merge October 10, 2024 10:25
@ch1bo ch1bo added this pull request to the merge queue Oct 10, 2024
Merged via the queue into master with commit 321167e Oct 10, 2024
25 of 28 checks passed
@ch1bo ch1bo deleted the fix-bench-standalone branch October 10, 2024 10:57