perf(daemon): add sync benchmark harness#396
Runs the daemon's SyncMachine against an integration-test event simulator, captures per-span timings via an in-memory OTel exporter, and writes aggregated stats to JSON for branch-to-branch comparison. Smoke-tested against VOIDED_TOKEN_AUTHORITY (66 events): captures the handlers called out in #395 (handleVertexAccepted, getTransactionById, addOrUpdateTx, addUtxos, getAddressWalletInfo, etc). Current simulator scenarios are too small for stable absolute numbers — a larger scenario is needed before drawing conclusions, but the harness itself is stack-complete. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 Walkthrough
Added benchmarking infrastructure for the daemon's SyncMachine. Includes a new CLI script that simulates event processing, captures OpenTelemetry metrics, and outputs performance statistics. Added supporting configuration changes and a git ignore rule.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/daemon/src/scripts/bench-sync.ts (1)
211-230: Benchmark timing includes `cleanDatabase` + xstate `interpret()` setup.

`performance.now()` is sampled at line 216 after `cleanDatabase(mysql)` and `exporter.reset()`, which is fine, and `interpret(SyncMachine)` on line 215 is also outside the timed section — good. However, `transitionUntilEvent` typically includes the WebSocket connect/handshake against the simulator, which is exactly the "fullnode connect warmup" the file-header doc warns dominates short scenarios. The warmup-run mechanism helps, but any per-run reconnect cost is still inside `totalMs`.

Consider (follow-up PR, not blocking) splitting `totalMs` into `connectMs` (time to receive the first event) and `processMs` (first → last event) so the "interesting" per-event work can be compared independently of connect overhead. This would address the run-to-run variance the PR description mentions.
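The suggested split could look like the following sketch. The `runScenario` callback and `onFirstEvent` hook are hypothetical stand-ins for the harness's actual connect + event loop, not its real API:

```typescript
import { performance } from 'node:perf_hooks';

interface RunTimings { connectMs: number; processMs: number; totalMs: number; }

// runScenario stands in for the connect/handshake plus event-processing loop;
// it must invoke onFirstEvent once, when the first event arrives.
async function timeRun(
  runScenario: (onFirstEvent: () => void) => Promise<void>,
): Promise<RunTimings> {
  const start = performance.now();
  let firstEventAt = start;
  await runScenario(() => { firstEventAt = performance.now(); });
  const end = performance.now();
  return {
    connectMs: firstEventAt - start,  // connect/handshake warmup
    processMs: end - firstEventAt,    // first event → last event
    totalMs: end - start,
  };
}
```

With this split, branch-to-branch comparisons can look at `processMs` alone, leaving the reconnect cost out of the interesting number.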
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/daemon/src/scripts/bench-sync.ts`:
- Around line 241-253: The percentile picker in summarize currently uses
Math.floor(q * sorted.length) which makes high quantiles map to the max
(off-by-one); change the pick implementation inside summarize to use the
nearest-rank formula: index = Math.min(sorted.length - 1, Math.max(0,
Math.ceil(q * sorted.length) - 1)); return sorted[index] for p50/p95 so
percentiles (pick) are computed correctly (update references to pick used for
p50 and p95).
- Around line 61-83: The arg parser currently assigns raw v to opts and uses
parseInt(v,10) without validation; update the loop that handles args to (1)
check that v is not undefined before assigning to opts.scenario / opts.label /
opts.out and if missing call printHelp() and exit(1) with an error message, (2)
parse numeric flags via const n = Number(v) or parseInt(v,10) and validate
Number.isInteger(n) && n > 0 for opts.runs and opts.warmup (reject NaN,
non-integer or <=0, log an error, call printHelp(), exit(1)), and (3) explicitly
reject opts.runs === 0 (treat as invalid) so the script does not produce empty
results; adjust references to opts, parseInt/Number, printHelp() and the switch
handling in the same function accordingly.
- Around line 303-313: Wrap the body of main() in a try/finally so cleanup
always runs: ensure runOnce (and any other awaited work) is executed inside the
try block and move the existing cleanup calls — the mysql release call ((mysql
as any).release()) and await provider.shutdown() — into the finally block so
they execute whether runOnce succeeds or throws; update main() to rethrow or
allow errors to propagate so the existing .then rejection handler still receives
the error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d0bd8585-96ab-41b1-86f0-f9c206cfbf4f
📒 Files selected for processing (3)
- .gitignore
- packages/daemon/package.json
- packages/daemon/src/scripts/bench-sync.ts
```ts
for (let i = 0; i < args.length; i++) {
  const a = args[i];
  const v = args[i + 1];
  switch (a) {
    case '--scenario': opts.scenario = v; i++; break;
    case '--runs': opts.runs = parseInt(v, 10); i++; break;
    case '--warmup': opts.warmup = parseInt(v, 10); i++; break;
    case '--label': opts.label = v; i++; break;
    case '--out': opts.out = v; i++; break;
    case '--help':
    case '-h':
      printHelp();
      process.exit(0);
      break;
    default:
      console.error(`Unknown arg: ${a}`);
      printHelp();
      process.exit(1);
  }
}
if (!opts.out) opts.out = `bench-results-${opts.label}.json`;
return opts;
}
```
Validate CLI argument values (NaN / missing).
Two small gaps in the arg parser:

- `parseInt(v, 10)` on lines 66-67 silently returns `NaN` for non-numeric input. `--runs abc` results in `for (let i = 0; i < NaN; i++)` executing zero iterations, yet the script still writes an "empty" results file and exits 0, which is misleading for a benchmarking tool.
- If a flag is the last argument (e.g. `yarn bench:sync --scenario`), `v` is `undefined` and gets assigned to `opts.scenario` / `opts.label` / `opts.out` without detection. `--scenario undefined` then falls through to the `SCENARIOS[opts.scenario]` check, but flags like `--label` would silently produce `bench-results-undefined.json`.
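The NaN failure mode is easy to demonstrate in isolation; a minimal sketch:

```typescript
// NaN compares false against everything, so a NaN loop bound runs zero
// iterations without throwing — which is how '--runs abc' would produce
// an empty results file and a clean exit code.
const runs = parseInt('abc', 10);
const iterations: number[] = [];
for (let i = 0; i < runs; i++) iterations.push(i);

console.log(Number.isNaN(runs));  // true
console.log(iterations.length);   // 0
```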
🛠️ Suggested fix

```diff
 for (let i = 0; i < args.length; i++) {
   const a = args[i];
   const v = args[i + 1];
+  const requireValue = () => {
+    if (v === undefined) {
+      console.error(`Missing value for ${a}`);
+      process.exit(1);
+    }
+    return v;
+  };
+  const requireInt = () => {
+    const n = parseInt(requireValue(), 10);
+    if (!Number.isFinite(n) || n < 0) {
+      console.error(`Invalid integer for ${a}: ${v}`);
+      process.exit(1);
+    }
+    return n;
+  };
   switch (a) {
-    case '--scenario': opts.scenario = v; i++; break;
-    case '--runs': opts.runs = parseInt(v, 10); i++; break;
-    case '--warmup': opts.warmup = parseInt(v, 10); i++; break;
-    case '--label': opts.label = v; i++; break;
-    case '--out': opts.out = v; i++; break;
+    case '--scenario': opts.scenario = requireValue(); i++; break;
+    case '--runs': opts.runs = requireInt(); i++; break;
+    case '--warmup': opts.warmup = requireInt(); i++; break;
+    case '--label': opts.label = requireValue(); i++; break;
+    case '--out': opts.out = requireValue(); i++; break;
```

Also consider rejecting `opts.runs === 0` (no measured runs) explicitly so the results file isn't produced with `n: 0` / `null` summaries.
```ts
function summarize(values: number[]): Summary | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const pick = (q: number) => sorted[Math.min(sorted.length - 1, Math.floor(q * sorted.length))];
  return {
    n: values.length,
    min: sorted[0],
    p50: pick(0.5),
    p95: pick(0.95),
    max: sorted[sorted.length - 1],
    mean: values.reduce((a, b) => a + b, 0) / values.length,
  };
}
```
Percentile formula uses Math.floor(q * n) — off-by-one at the top end.
For n = 20 and q = 0.95, Math.floor(0.95 * 20) = 19, which picks sorted[19] (the max). That's not a p95, that's the max. More standard is the nearest-rank method using Math.ceil(q * n) - 1 clamped to [0, n-1], i.e. sorted[Math.min(n - 1, Math.ceil(q * n) - 1)].
For the current typical opts.runs = 5 this produces pick(0.95) = sorted[4] = max, which is still the max — so p95 is effectively an alias of max for small samples either way. Not a correctness bug for today's usage, but worth fixing now so the output is meaningful once larger scenarios land (which the file header explicitly anticipates).
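A quick standalone illustration of the two index formulas (a sketch, not the harness code):

```typescript
// Nearest-rank percentile: index = ceil(q * n) - 1, clamped to [0, n - 1].
function pickNearestRank(sorted: number[], q: number): number {
  const n = sorted.length;
  return sorted[Math.min(n - 1, Math.max(0, Math.ceil(q * n) - 1))];
}

const sorted = Array.from({ length: 20 }, (_, i) => i + 1); // 1..20
// Floor formula:  Math.floor(0.95 * 20) = 19 → sorted[19] = 20 (the max).
// Nearest rank:   Math.ceil(0.95 * 20) - 1 = 18 → sorted[18] = 19 (a real p95).
console.log(pickNearestRank(sorted, 0.95)); // 19
console.log(pickNearestRank(sorted, 0.5));  // 10
```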
🛠️ Suggested fix

```diff
-  const pick = (q: number) => sorted[Math.min(sorted.length - 1, Math.floor(q * sorted.length))];
+  const pick = (q: number) =>
+    sorted[Math.min(sorted.length - 1, Math.max(0, Math.ceil(q * sorted.length) - 1))];
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
function summarize(values: number[]): Summary | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const pick = (q: number) =>
    sorted[Math.min(sorted.length - 1, Math.max(0, Math.ceil(q * sorted.length) - 1))];
  return {
    n: values.length,
    min: sorted[0],
    p50: pick(0.5),
    p95: pick(0.95),
    max: sorted[sorted.length - 1],
    mean: values.reduce((a, b) => a + b, 0) / values.length,
  };
}
```
```ts
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  if ('release' in (mysql as any)) (mysql as any).release();
  await provider.shutdown();
}

main().then(
  () => process.exit(0),
  (err) => {
    console.error(err);
    process.exit(1);
  },
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
ast-grep --pattern 'export $_ getDbConnection = $$$'
rg -nP --type=ts 'getDbConnection' -C3 packages/daemon/src/db
```

Repository: HathorNetwork/hathor-wallet-service
Length of output: 483
🏁 Script executed:

```bash
#!/bin/bash
# Check the type definition of MysqlConnection
rg -nP --type=ts 'type MysqlConnection|interface MysqlConnection' packages/daemon/src/db
# Look at the full getDbConnection implementation
cat -n packages/daemon/src/db/index.ts | head -80 | tail -40
# Check the actual bench-sync.ts code around lines 303-313
cat -n packages/daemon/src/scripts/bench-sync.ts | sed -n '290,320p'
```

Repository: HathorNetwork/hathor-wallet-service
Length of output: 1889
🏁 Script executed:

```bash
#!/bin/bash
# Find MysqlConnection type definition
rg -nP --type=ts 'MysqlConnection' packages/daemon/src/types
# Check the main() function in full
cat -n packages/daemon/src/scripts/bench-sync.ts | sed -n '250,315p'
# Check imports to understand which mysql library is being used
head -30 packages/daemon/src/scripts/bench-sync.ts
```

Repository: HathorNetwork/hathor-wallet-service
Length of output: 3705
🏁 Script executed:

```bash
#!/bin/bash
# Find the imports at the top of bench-sync.ts to see what mysql library is used
head -50 packages/daemon/src/scripts/bench-sync.ts
# Check the type imports/definitions
rg -nP --type=ts 'import.*MysqlConnection|type MysqlConnection|interface MysqlConnection' packages/daemon/src
```

Repository: HathorNetwork/hathor-wallet-service
Length of output: 2194
🏁 Script executed:

```bash
#!/bin/bash
# Check runOnce function to see if it can throw
rg -nP --type=ts 'async function runOnce|runOnce\(' packages/daemon/src/scripts/bench-sync.ts -A 2
# Look for error handling patterns in the codebase for similar scripts
rg -nP --type=ts 'try.*finally|\.then\(' packages/daemon/src/scripts/ -A 3
# Check if there are other places where provider.shutdown() is called
rg -nP --type=ts 'provider\.shutdown' packages/daemon/src
```

Repository: HathorNetwork/hathor-wallet-service
Length of output: 696
🏁 Script executed:

```bash
#!/bin/bash
# Verify what methods the mysql2 connection object actually has
# Let's check the package.json to see mysql2 version
rg -nP '"mysql2"' packages/daemon/package.json
# Check if there are other examples of release() or end() being called
rg -nP '\.release\(\)|\.end\(\)' packages/daemon/src --type=ts -B 2
```

Repository: HathorNetwork/hathor-wallet-service
Length of output: 2225
🏁 Script executed:

```bash
#!/bin/bash
# Check how mysql connection is used in runOnce and cleanDatabase
rg -nP --type=ts 'async function cleanDatabase|cleanDatabase\(' packages/daemon/src/scripts/bench-sync.ts -A 5
# Check what methods Connection from mysql2 actually exposes by looking at how it's used
rg -nP --type=ts 'mysql\.(destroy|release|end)\(' packages/daemon/src --type=ts -B 2
```

Repository: HathorNetwork/hathor-wallet-service
Length of output: 2417
🏁 Script executed:

```bash
#!/bin/bash
# Find where provider is instantiated/initialized
rg -nP --type=ts 'const provider|provider =' packages/daemon/src/scripts/bench-sync.ts -B 2 -A 2
# Check what provider.shutdown() does and whether it needs error handling
rg -nP --type=ts 'provider\.shutdown|\.shutdown\(\)' packages/daemon/src/scripts/bench-sync.ts -B 5
```

Repository: HathorNetwork/hathor-wallet-service
Length of output: 522
🏁 Script executed:

```bash
#!/bin/bash
# Double-check mysql2/promise Connection API to be thorough
# Look at how release() is actually used in tests or elsewhere
rg -nP --type=ts 'release\(\)' packages/daemon --type=ts -B 3 -A 1
# Check if there are any tests that verify connection cleanup
fd -t f '.*test.*' packages/daemon/src/scripts/ --exec basename {}
```

Repository: HathorNetwork/hathor-wallet-service
Length of output: 1994
Wrap main() body in try/finally to ensure cleanup on errors.
If runOnce throws mid-benchmark, the rejection handler (line 310) exits without calling cleanup: mysql.release() and provider.shutdown() are skipped, potentially leaving the DB connection unreleased. The try/finally pattern ensures cleanup runs regardless of whether the function succeeds or errors.
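The acquire/work/release shape being recommended, reduced to a generic sketch (the names here are illustrative, not the script's actual API):

```typescript
// Guarantee release() runs whether work() resolves or throws;
// a thrown error still propagates to the caller's rejection handler.
async function withCleanup<T>(
  acquire: () => Promise<{ value: T; release: () => Promise<void> }>,
  work: (value: T) => Promise<void>,
): Promise<void> {
  const { value, release } = await acquire();
  try {
    await work(value);
  } finally {
    await release();
  }
}
```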
🛠️ Suggested fix

```diff
 async function main() {
   const mysql = await getDbConnection();
-
-  const totalMsRuns: number[] = [];
+  try {
+    const totalMsRuns: number[] = [];
     // ... existing body ...
-  writeFileSync(opts.out, JSON.stringify(output, null, 2));
-  console.log(`Wrote ${opts.out}`);
-
-  // eslint-disable-next-line @typescript-eslint/no-explicit-any
-  if ('release' in (mysql as any)) (mysql as any).release();
-  await provider.shutdown();
+    writeFileSync(opts.out, JSON.stringify(output, null, 2));
+    console.log(`Wrote ${opts.out}`);
+  } finally {
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    if ('release' in (mysql as any)) (mysql as any).release();
+    await provider.shutdown();
+  }
 }
```
Third piece of the benchmarking infrastructure. On PRs that touch packages/daemon, runs the bench against both master and the PR branch in parallel, then posts (or updates) a sticky PR comment with the comparator output.

Key design choices (rationale in #396 review thread):
- Matrix strategy — two runners in parallel, each with its own MySQL + simulator containers, no cross-run state bleed.
- Bench scripts from the PR head are overlaid onto the baseline checkout (via refs/pull/N/head so fork PRs work too), so the measurement tool stays constant across the comparison and the baseline works even before the harness has landed on master.
- No exit gating. continue-on-error on every bench step — CI runner variance is too high for a hard threshold to mean anything at the run counts we can afford.
- Report also emitted to the job summary, so fork PRs (where GITHUB_TOKEN is read-only) still surface results.
- concurrency.cancel-in-progress: true to avoid stacking stale runs when a PR is pushed to repeatedly.
- Starts at 5 runs × 1 warmup per side; dial up once the scenario grows past its current 66-event ceiling.

Also adds a warning comment at the top of bench-sync.ts flagging the overlay constraint: any symbol this script references must also exist on master.

Depends on #396 (harness) and #397 (comparator). Targets #397 so the three PRs can be reviewed as a stack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Motivation
Issue #395 lists per-event sync optimizations we want to execute, and we need a scientific way to measure whether each PR actually improves sync performance before shipping.
This PR lands the first piece of that infrastructure: a benchmark harness that runs the daemon's `SyncMachine` against an integration-test event simulator, captures per-span timings via an in-memory OTel exporter, and writes aggregated stats to JSON. Future PRs will add a stats comparator and a CI workflow that posts a comparison as an informational PR comment (no hard gating — CI runner variance is too high to reliably catch the <20% regressions we care about).

Acceptance Criteria
- `yarn bench:sync --scenario <name> --runs <n> --warmup <n> --label <s>` runs against a simulator container and a test MySQL, producing `bench-results-<label>.json`
- `totalMs` and `perSpan` summaries (n/min/p50/p95/max/mean) across measured runs

How to run
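For orientation, the results file plausibly has a shape like the following. This is an illustration inferred from the field names mentioned in this PR (`totalMs`, `perSpan`, n/min/p50/p95/max/mean), not the actual schema, and the numbers are placeholders:

```typescript
interface Summary { n: number; min: number; p50: number; p95: number; max: number; mean: number; }

// Hypothetical shape: one totalMs summary plus a per-span map keyed by handler name.
interface BenchResults {
  label: string;
  totalMs: Summary | null;
  perSpan: Record<string, Summary | null>;
}

const example: BenchResults = {
  label: 'baseline',
  totalMs: { n: 3, min: 1200, p50: 1250, p95: 1300, max: 1300, mean: 1250 },
  perSpan: {
    handleVertexAccepted: { n: 66, min: 0.2, p50: 0.5, p95: 1.4, max: 3.1, mean: 0.6 },
  },
};
console.log(Object.keys(example.perSpan)); // [ 'handleVertexAccepted' ]
```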
Smoke test (VOIDED_TOKEN_AUTHORITY, 66 events, 3 runs + 1 warmup)
Captured spans match the handlers called out in #395:
`handleVertexAccepted`, `getTransactionById`, `addOrUpdateTx`, `addUtxos`, `getAddressWalletInfo`, `handleTxFirstBlock`, `clearTxProposalForVoidedTx`, `getTxOutputsFromTx`.

Known limitations
- … `totalMs`. Absolute numbers aren't actionable until a scenario in the thousands-of-events range exists — which @andreabadesso mentioned he could add separately. The harness itself is stack-complete; only the input needs scaling.
- … `jest.mock` pattern in `__tests__/integration/balances.test.ts`.
- Uses `BasicTracerProvider` directly (not `NodeSDK`) because `.register()` is synchronous — this guarantees the global tracer provider is in place before any `trace.getTracer()` call from the daemon modules imported afterwards resolves its first span.
- `OTEL_SDK_DISABLED=true` is set so the daemon's built-in OTLP exporter (`src/tracing.ts`) doesn't also initialize.

Checklist
- … `master`, confirm this code is production-ready and can be included in future releases as soon as it gets merged
- … `@opentelemetry/sdk-trace-base`, which is already a transitive dep)

🤖 Generated with Claude Code