13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,19 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht

## [Unreleased]

## [1.27.1] - 2026-05-24

### Performance

- **In-memory deep run search is no longer quadratic.** `InMemoryFlowRunStore`
deep search (`deepSearch: true`) scanned the global step dictionary per
candidate run — O(runs × total_steps) — so latency and allocation grew
quadratically with run history. It now enumerates each run's steps via the
existing `_stepKeysByRun` index (O(runs × steps_in_run)). At 10,000 stored
runs a deep search drops from ~25 s / 4.6 GB to ~24 ms / 2.6 MB. Added a
BenchmarkDotNet case (`tests/benchmarks/.../RunSearchBenchmarks.cs`)
characterising the quick vs deep tiers and the before/after.

## [1.27.0] - 2026-05-24

### Changed — RUN search performance + dependency roll-up
2 changes: 1 addition & 1 deletion Directory.Build.props
@@ -5,7 +5,7 @@
<RepositoryUrl>https://github.com/hoangsnowy/FlowOrchestrator</RepositoryUrl>
<RepositoryType>git</RepositoryType>
<PackageProjectUrl>https://github.com/hoangsnowy/FlowOrchestrator</PackageProjectUrl>
-<VersionPrefix>1.27.0</VersionPrefix>
+<VersionPrefix>1.27.1</VersionPrefix>
<PackageReadmeFile>README.md</PackageReadmeFile>
<PackageIcon>icon.png</PackageIcon>
</PropertyGroup>
154 changes: 154 additions & 0 deletions docs/benchmarks/run-search-quick-vs-deep-2026-05-24.md
@@ -0,0 +1,154 @@
# RUN search — quick vs deep tier cost, 2026-05-24

Quantifies the cost gap between the two RUN-search tiers added in v1.27.0
on `IFlowRunStore.GetRunsPageAsync(..., bool deepSearch, ...)`, measured
against the in-process `InMemoryFlowRunStore`.

- **Quick** (`deepSearch: false`): matches the search term only against run
identity columns (id, flow name, trigger key, status, background job id),
then short-circuits. Work is O(runs).
- **Deep** (`deepSearch: true`): additionally scans the step rows for every
run that survives the identity filter.

Benchmark: `tests/benchmarks/FlowOrchestrator.Benchmarks/RunSearchBenchmarks.cs`.

## Why the deep path is expensive in the in-memory store

`InMemoryFlowRunStore.MatchesRunSearch`
(`src/FlowOrchestrator.InMemory/InMemoryFlowRunStore.cs:678`) implements the
deep branch as:

```csharp
return _steps.Values.Any(s =>
    s.RunId == run.Id
    && (ContainsIgnoreCase(s.StepKey, search)
        || ContainsIgnoreCase(s.ErrorMessage, search)
        || ContainsIgnoreCase(s.OutputJson, search)));
```

`_steps` is the **global** step dictionary across all runs in history. The
predicate filters by `s.RunId == run.Id` *inside* the scan, so each candidate
run walks the entire global step keyspace — O(total_steps). That predicate runs
once per run surviving the identity filter (`ApplyRunsFilter` →
`MatchesRunSearch`, line 672-673), so the whole search is
**O(runs × total_steps)**. With a fixed steps-per-run, total_steps grows with
run count, making the deep path scale **quadratically** with history size.
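
As a rough worked estimate of that bound for this benchmark: assume every run reaches the deep branch (true here, because the needle appears in no identity column) and count each step row visited as one comparison. The symbols $R$ (runs), $k$ (steps per run, 6 in this benchmark), $S$ (total steps) and $C_{\text{deep}}$ (comparisons per search) are introduced only for this illustration.

```latex
\begin{aligned}
S &= kR \\
C_{\text{deep}} &\le R \cdot S = kR^{2} \\
R = 10^{3}: \quad C_{\text{deep}} &\le 6 \cdot (10^{3})^{2} = 6 \times 10^{6} \\
R = 10^{4}: \quad C_{\text{deep}} &\le 6 \cdot (10^{4})^{2} = 6 \times 10^{8} \\
\frac{kR^{2}\big|_{R=10^{4}}}{kR^{2}\big|_{R=10^{3}}} &= 10^{2} = 100
\end{aligned}
```

This is an upper bound rather than an exact count, because `Any` stops at the first matching row. The point is the shape: a 10× increase in run count raises the bound by 100×.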

The quick path returns `false` immediately after the identity-column checks
(`InMemoryFlowRunStore.cs:689-690`), so it never touches `_steps`.

## Setup

- Each run has 6 completed steps, each driven through the Started → Dispatched → Claimed → Completed lifecycle.
- Every step's `OutputJson` is a ~300-byte JSON blob; the search needle
(`needle-7f3a`) is planted in exactly **one** step's output per run and in
**no** identity column. So the quick path is a true negative (0 matches, no
step scan) and the deep path does the full representative scan and matches.
- `take: 20` (the dashboard default page size).
- `TotalRuns` sweeps {1,000, 10,000}; total steps in store = 6 × TotalRuns.
- In-process emit toolchain, ShortRun (3 warmup / 3 iterations / 1 launch) —
the deep/10,000 cell is ~25 s per op, so a full job is infeasible.
- BenchmarkDotNet v0.15.8, .NET 10.0.6, Intel Core Ultra 7 255H, 16 cores.

## Results — before fix (original quadratic deep branch)

Figures below are from a flag-free reproduction run; a prior `--job short` run
agreed within run-to-run noise (e.g. quick/1,000: 61.15 µs vs 62.12 µs; deep
allocations identical to the byte). Allocations are deterministic and the
strongest signal; the deep/10,000 time has wide variance (n=3 short iterations
over a ~25 s op) but the order of magnitude and the quadratic shape are
unambiguous.

| TotalRuns | Quick mean | Quick alloc | Deep mean | Deep alloc | Deep ÷ Quick (time) | Deep ÷ Quick (alloc) |
|---:|---:|---:|---:|---:|---:|---:|
| 1,000 | 62.12 µs | 133.44 KB | 91.09 ms | 47,185 KB (≈46 MB) | **1,466×** | **354×** |
| 10,000 | 1,060.7 µs | 1,328.75 KB | 24,966 ms (≈25.0 s) | 4,691,243 KB (≈4.6 GB) | **23,537×** | **3,530×** |

## Fix — per-run step-key index

The deep branch now enumerates the run's own steps via the existing
`_stepKeysByRun` secondary index and direct-looks-up each in `_steps`, instead
of scanning the global `_steps` dictionary
(`src/FlowOrchestrator.InMemory/InMemoryFlowRunStore.cs`, `MatchesRunSearch`):

```csharp
if (!_stepKeysByRun.TryGetValue(run.Id, out var stepKeys))
    return false;
foreach (var stepKey in stepKeys.Keys)
{
    if (_steps.TryGetValue((run.Id, stepKey), out var s)
        && (ContainsIgnoreCase(s.StepKey, search)
            || ContainsIgnoreCase(s.ErrorMessage, search)
            || ContainsIgnoreCase(s.OutputJson, search)))
    {
        return true;
    }
}
return false;
```

This makes the per-run cost O(steps_in_run) instead of O(total_steps), so the
whole deep search is O(runs × steps_in_run) — linear in history size, same
asymptotic shape as the quick tier (deep just does a small constant of extra
work per run). It mirrors the index `GetRunDetailAsync` already uses.
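
To make the difference concrete, here is a minimal standalone sketch that counts how many step rows each strategy visits. It is not the store's code: the local dictionaries `steps` and `stepKeysByRun`, the helpers `MatchGlobal` and `MatchIndexed`, and the small sizes are all hypothetical stand-ins chosen for illustration, and the real `_stepKeysByRun` may have a different value type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the store's fields (shapes are assumed):
// a global step map keyed by (runId, stepKey) and a per-run index of step keys.
var steps = new Dictionary<(Guid RunId, string StepKey), string>();
var stepKeysByRun = new Dictionary<Guid, List<string>>();

const int Runs = 200;
const int StepsPerRun = 6;
var runIds = new List<Guid>();

for (var r = 0; r < Runs; r++)
{
    var runId = Guid.NewGuid();
    runIds.Add(runId);
    stepKeysByRun[runId] = new List<string>();
    for (var s = 0; s < StepsPerRun; s++)
    {
        var key = $"step_{s}";
        // Only the last step of each run carries the needle.
        steps[(runId, key)] = s == StepsPerRun - 1 ? "needle-7f3a" : "ok";
        stepKeysByRun[runId].Add(key);
    }
}

long visitedGlobal = 0;
long visitedIndexed = 0;

// Original shape: filter the global map by run id inside the scan.
bool MatchGlobal(Guid runId, string search) =>
    steps.Any(kv =>
    {
        visitedGlobal++;
        return kv.Key.RunId == runId
            && kv.Value.Contains(search, StringComparison.OrdinalIgnoreCase);
    });

// Fixed shape: enumerate only this run's keys, then look each one up directly.
bool MatchIndexed(Guid runId, string search)
{
    if (!stepKeysByRun.TryGetValue(runId, out var keys))
        return false;
    foreach (var key in keys)
    {
        visitedIndexed++;
        if (steps.TryGetValue((runId, key), out var output)
            && output.Contains(search, StringComparison.OrdinalIgnoreCase))
            return true;
    }
    return false;
}

var globalHits = runIds.Count(id => MatchGlobal(id, "needle-7f3a"));
var indexedHits = runIds.Count(id => MatchIndexed(id, "needle-7f3a"));

Console.WriteLine($"hits: global={globalHits}, indexed={indexedHits}");
Console.WriteLine($"rows visited: global={visitedGlobal}, indexed={visitedIndexed}");
```

Both strategies find the same runs. The indexed variant visits runs × steps_in_run rows (1,200 for these sizes), while the global variant's count depends on dictionary enumeration order and grows with the square of the run count.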

## Results — after fix (O(runs × steps_in_run))

| TotalRuns | Quick mean | Quick alloc | Deep mean | Deep alloc | Deep ÷ Quick (time) | Deep ÷ Quick (alloc) |
|---:|---:|---:|---:|---:|---:|---:|
| 1,000 | 56.44 µs | 102.19 KB | 1.46 ms | 262.83 KB | ~26× | ~2.6× |
| 10,000 | 1,532.5 µs | 1,016.25 KB | 24.17 ms | 2,622.36 KB | ~16× | ~2.6× |

Deep 1,000 → 10,000 now scales ~17× in time (24.17 ms / 1.46 ms; was 274×), which is roughly linear rather than quadratic.
At 10,000 runs the deep search dropped from **~24,966 ms → ~24 ms (~1,030×)** and
**~4.6 GB → ~2.6 MB allocations (~1,800×)**. (Deep times at n=3 short iterations
have wide CI; the order of magnitude and the now-linear scaling are the signal.)
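
The ratios quoted above can be rechecked from the table figures with a few lines of arithmetic. This is only a convenience sketch: the constant and variable names are made up for illustration, and the inputs are the deep-tier means and allocations copied from the two result tables.

```csharp
using System;

// Deep-tier figures copied from the before/after result tables in this document.
const double DeepBeforeMs1k = 91.09, DeepBeforeMs10k = 24_966;
const double DeepAfterMs1k = 1.46, DeepAfterMs10k = 24.17;
const double DeepBeforeKb10k = 4_691_243, DeepAfterKb10k = 2_622.36;

// Growth in deep-search time when the run count goes from 1,000 to 10,000.
var growthBefore = DeepBeforeMs10k / DeepBeforeMs1k;
var growthAfter = DeepAfterMs10k / DeepAfterMs1k;

// Improvement at 10,000 runs: before-fix figure divided by after-fix figure.
var timeSpeedup = DeepBeforeMs10k / DeepAfterMs10k;
var allocReduction = DeepBeforeKb10k / DeepAfterKb10k;

Console.WriteLine($"time growth 1k -> 10k: before {growthBefore:F1}x, after {growthAfter:F1}x");
Console.WriteLine($"at 10k runs: {timeSpeedup:F0}x faster, {allocReduction:F0}x less allocation");
```

Given the n=3 short iterations, the time ratios are indicative only; the allocation ratio is the more stable figure.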

## Headline

The two tiers are not a constant-factor difference — they are different
complexity classes. Scaling `TotalRuns` from 1,000 to 10,000 (10×):

- **Quick** grows ~17× in time (1,061 µs / 62 µs) — linear-ish, dominated by
LINQ `Where`/`OrderByDescending`/`ToList` over the run set, and ~10× in
allocation (133 KB → 1.33 MB), matching the run-count scaling.
- **Deep** grows ~274× in time (24,966 ms / 91 ms) and ~99× in allocation
(46 MB → 4.6 GB) — quadratic, exactly the O(runs × total_steps) blow-up
predicted above (10× runs × 10× total_steps ≈ 100×).

At 10,000 runs in store, a single deep search takes **~25 seconds** and
allocates **4.6 GB** (driving sustained Gen2 collections), versus **~1 ms /
1.3 MB** for quick. Choosing the quick tier when the caller does not need
step-body matching is a **~23,500× latency win** and a **~3,500× allocation
reduction** at that scale.

This is the data backing the v1.27.0 design decision to default the dashboard
run list to the quick tier and gate deep search behind an explicit opt-in.

The numbers above are for the **original** quadratic implementation; the index fix
documented in "Fix — per-run step-key index" collapses the in-memory deep path
to linear (24,966 ms → ~24 ms at 10,000 runs). Quick remains the right default
for typeahead, but deep is no longer a multi-second, multi-GB cliff on a store
with history.

> Note: the in-memory store is the most extreme case because the deep predicate
> rescans the *global* `_steps` per candidate run. The SQL Server / PostgreSQL
> stores push the step match into a single SQL statement (EXISTS / join), so
> their deep-vs-quick ratio is smaller — but the asymptotic shape (deep adds a
> per-run step scan the quick path skips) is the same. This benchmark
> characterises the in-memory runtime; a Testcontainers-backed SQL benchmark is
> intentionally out of scope to keep the suite dependency-free and CI-runnable.

## Reproducing

```bash
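# from the repo root, with current HEAD
# (assumes a Release build of the benchmark project already exists, e.g. via `dotnet build -c Release`)
cd tests/benchmarks/FlowOrchestrator.Benchmarks/bin/Release/net10.0
./FlowOrchestrator.Benchmarks.exe --filter "*RunSearchBenchmarks*"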
```

No `--job` / `--inProcess` flags are needed — the benchmark pins the in-process
emit toolchain and a ShortRun job via `[Config(typeof(RunSearchConfig))]`. The
in-process toolchain is required because the repo's `.claude/worktrees/` copies
of this project otherwise make BenchmarkDotNet's default toolchain fail on a
duplicate-project-name discovery error.
13 changes: 9 additions & 4 deletions src/FlowOrchestrator.InMemory/InMemoryFlowRunStore.cs
@@ -689,10 +689,15 @@ private bool MatchesRunSearch(FlowRunRecord run, string search, bool deepSearch)
         if (!deepSearch)
             return false;
 
-        // Deep search also scans the current step rows (incl. OutputJson). Attempt history is
-        // intentionally not searched — it duplicates the current step row and is the dominant cost.
-        return _steps.Values.Any(s =>
-            s.RunId == run.Id
+        // Deep search also scans this run's current step rows (incl. OutputJson). Enumerate via
+        // the per-run step-key index (O(steps_in_run)) and direct-look-up each step, instead of
+        // scanning the global _steps dictionary (O(total_steps) per run — quadratic over run
+        // history). Attempt history is intentionally not searched — it duplicates the current row.
+        if (!_stepKeysByRun.TryGetValue(run.Id, out var stepKeys))
+            return false;
+
+        return stepKeys.Keys.Any(stepKey =>
+            _steps.TryGetValue((run.Id, stepKey), out var s)
             && (ContainsIgnoreCase(s.StepKey, search)
                 || ContainsIgnoreCase(s.ErrorMessage, search)
                 || ContainsIgnoreCase(s.OutputJson, search)));
169 changes: 169 additions & 0 deletions tests/benchmarks/FlowOrchestrator.Benchmarks/RunSearchBenchmarks.cs
@@ -0,0 +1,169 @@
using System.Globalization;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Toolchains.InProcess.Emit;
using FlowOrchestrator.InMemory;

namespace FlowOrchestrator.Benchmarks;

/// <summary>
/// Quantifies the cost difference between the tiered RUN-search modes added in
/// v1.27.0 on <see cref="InMemoryFlowRunStore.GetRunsPageAsync(System.Guid?, string?, int, int, string?, bool, System.DateTimeOffset?, System.DateTimeOffset?)"/>.
/// <para>
/// The <c>quick</c> path (<c>deepSearch: false</c>) matches the search term only
/// against the run identity columns (id, flow name, trigger key, status, job id)
/// and short-circuits — its work is O(runs).
/// </para>
/// <para>
/// The <c>deep</c> path (<c>deepSearch: true</c>) additionally scans the step rows
/// for every run that survives the identity filter. Before the v1.27.1 fix, the
/// in-memory store implemented that inner scan as
/// <c>_steps.Values.Any(s =&gt; s.RunId == run.Id &amp;&amp; ...)</c>
/// (see <c>InMemoryFlowRunStore.MatchesRunSearch</c>), which walked the whole global
/// step keyspace per candidate run, so the cost was O(runs × total_steps) and
/// grew quadratically as run history accumulated. The store now enumerates each
/// run's steps via the <c>_stepKeysByRun</c> index, which is O(runs × steps_in_run).
/// </para>
/// <para>
/// The search term is chosen to appear <b>only</b> inside step <c>OutputJson</c>,
/// never in any identity column, so the quick path matches zero rows (the cheap
/// path) while the deep path performs the full step scan and returns matches (the
/// representative path). Both calls request a single page (<c>take: 20</c>),
/// matching the dashboard's default page size.
/// </para>
/// </summary>
/// <remarks>
/// Uses the in-process emit toolchain (<see cref="RunSearchConfig"/>) rather than
/// the default out-of-process job. The repo keeps stale git worktrees under
/// <c>.claude/worktrees/</c> that each contain a copy of this benchmark project;
/// BenchmarkDotNet's default toolchain discovers the duplicate <c>.csproj</c> by
/// assembly name and refuses to build the boilerplate. Running in-process skips
/// the generated project entirely. The subject is a pure in-memory store, so
/// in-process execution does not perturb the measurement.
/// <para>
/// A reduced job (3 warmup / 3 iterations) is used because the original deep path at
/// <c>TotalRuns=10_000</c> was O(runs × total_steps) ≈ 6×10⁸ comparisons per
/// invocation and took tens of seconds per op, so a full job would have run for hours.
/// </para>
/// </remarks>
[MemoryDiagnoser]
[Config(typeof(RunSearchConfig))]
public class RunSearchBenchmarks
{
    /// <summary>
    /// In-process, reduced-iteration configuration for <see cref="RunSearchBenchmarks"/>.
    /// </summary>
    private sealed class RunSearchConfig : ManualConfig
    {
        /// <summary>Initialises the config with the in-process emit toolchain and a short job.</summary>
        public RunSearchConfig()
        {
            AddJob(Job.Default
                .WithToolchain(InProcessEmitToolchain.Instance)
                .WithWarmupCount(3)
                .WithIterationCount(3)
                .WithLaunchCount(1));
        }
    }

    private const int StepsPerRun = 6;
    private const int PageSize = 20;

    /// <summary>
    /// A token embedded in exactly one step's output JSON per run, and in no
    /// identity column. Matching it forces the deep path to do real work while
    /// the quick path provably returns nothing.
    /// </summary>
    private const string DeepOnlyNeedle = "needle-7f3a";

    /// <summary>Total runs seeded into the store before each measurement.</summary>
    [Params(1_000, 10_000)]
    public int TotalRuns { get; set; }

    /// <summary>
    /// Selects the search tier under test: <see langword="false"/> = quick
    /// (identity-only), <see langword="true"/> = deep (identity + step scan).
    /// </summary>
    [Params(false, true)]
    public bool DeepSearch { get; set; }

    private InMemoryFlowRunStore _store = null!;

    /// <summary>
    /// Seeds <see cref="TotalRuns"/> completed runs, each with
    /// <see cref="StepsPerRun"/> steps carrying a non-trivial JSON output blob.
    /// The needle is planted in one step per run so the deep scan has to walk to
    /// it; identity columns never contain the needle so the quick path is a true
    /// negative.
    /// </summary>
    [GlobalSetup]
    public async Task Setup()
    {
        _store = new InMemoryFlowRunStore();

        for (var i = 0; i < TotalRuns; i++)
        {
            var runId = Guid.NewGuid();
            await _store.StartRunAsync(
                flowId: Guid.Empty,
                flowName: "BenchFlow",
                runId: runId,
                triggerKey: "manual",
                triggerData: null,
                jobId: null);

            for (var s = 0; s < StepsPerRun; s++)
            {
                var stepKey = $"step_{s}";
                await _store.RecordStepStartAsync(runId, stepKey, "noop", inputJson: null, jobId: null);
                await _store.TryRecordDispatchAsync(runId, stepKey);
                await _store.TryClaimStepAsync(runId, stepKey);

                // Only the last step of each run carries the needle, so the deep
                // scan must enumerate past the earlier steps before it can match.
                var carriesNeedle = s == StepsPerRun - 1;
                await _store.RecordStepCompleteAsync(
                    runId, stepKey,
                    status: "Succeeded",
                    outputJson: BuildOutputJson(i, s, carriesNeedle),
                    errorMessage: null);
            }
        }
    }

    /// <summary>
    /// Runs a single search page against the store using the tier selected by
    /// <see cref="DeepSearch"/>. The <c>total</c> component of the result tuple
    /// is returned so the JIT cannot elide the call.
    /// </summary>
    [Benchmark(Description = "GetRunsPageAsync(search, take:20)")]
    public async Task<int> Search()
    {
        var (_, total) = await _store.GetRunsPageAsync(
            flowId: null,
            status: null,
            skip: 0,
            take: PageSize,
            search: DeepOnlyNeedle,
            deepSearch: DeepSearch);
        return total;
    }

    /// <summary>
    /// Builds a realistic, non-trivial output payload for a step. The needle is
    /// embedded as a field value only when <paramref name="carriesNeedle"/> is set
    /// so it lives exclusively in step output, never in a run identity column.
    /// </summary>
    /// <param name="runOrdinal">Sequential index of the run being seeded.</param>
    /// <param name="stepOrdinal">Index of the step within the run.</param>
    /// <param name="carriesNeedle">When <see langword="true"/>, plants the deep-only search token.</param>
    /// <returns>A JSON object string of a few hundred bytes.</returns>
    private static string BuildOutputJson(int runOrdinal, int stepOrdinal, bool carriesNeedle)
    {
        var correlation = carriesNeedle ? DeepOnlyNeedle : "ok";
        // Hand-built JSON (no serializer dependency) shaped like a typical step
        // output: a status block, a few scalar fields, and a small nested object.
        return string.Create(CultureInfo.InvariantCulture, $$"""
            {"status":"completed","stepOrdinal":{{stepOrdinal}},"runOrdinal":{{runOrdinal}},"correlation":"{{correlation}}","httpStatus":200,"durationMs":{{42 + stepOrdinal}},"payload":{"itemsProcessed":{{100 + runOrdinal % 50}},"warnings":0,"region":"westus2","retryable":false},"timestamp":"2026-05-24T12:00:00Z"}
            """);
    }
}