diff --git a/CHANGELOG.md b/CHANGELOG.md
index cf26e39..0f8914c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,19 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht
## [Unreleased]
+## [1.27.1] - 2026-05-24
+
+### Performance
+
+- **In-memory deep run search is no longer quadratic.** `InMemoryFlowRunStore`
+ deep search (`deepSearch: true`) scanned the global step dictionary per
+ candidate run — O(runs × total_steps) — so latency and allocation grew
+ quadratically with run history. It now enumerates each run's steps via the
+ existing `_stepKeysByRun` index (O(runs × steps_in_run)). At 10,000 stored
+ runs a deep search drops from ~25 s / 4.6 GB to ~24 ms / 2.6 MB. Added a
+ BenchmarkDotNet case (`tests/benchmarks/.../RunSearchBenchmarks.cs`)
+ characterising the quick vs deep tiers and the before/after.
+
## [1.27.0] - 2026-05-24
### Changed — RUN search performance + dependency roll-up
diff --git a/Directory.Build.props b/Directory.Build.props
index 1ff4ccd..9553bed 100644
--- a/Directory.Build.props
+++ b/Directory.Build.props
@@ -5,7 +5,7 @@
https://github.com/hoangsnowy/FlowOrchestrator
git
https://github.com/hoangsnowy/FlowOrchestrator
- 1.27.0
+ 1.27.1
README.md
icon.png
diff --git a/docs/benchmarks/run-search-quick-vs-deep-2026-05-24.md b/docs/benchmarks/run-search-quick-vs-deep-2026-05-24.md
new file mode 100644
index 0000000..825c7a7
--- /dev/null
+++ b/docs/benchmarks/run-search-quick-vs-deep-2026-05-24.md
@@ -0,0 +1,154 @@
+# RUN search — quick vs deep tier cost, 2026-05-24
+
+Quantifies the cost gap between the two RUN-search tiers added in v1.27.0
+on `IFlowRunStore.GetRunsPageAsync(..., bool deepSearch, ...)`, measured
+against the in-process `InMemoryFlowRunStore`.
+
+- **Quick** (`deepSearch: false`): matches the search term only against run
+ identity columns (id, flow name, trigger key, status, background job id),
+ then short-circuits. Work is O(runs).
+- **Deep** (`deepSearch: true`): additionally scans the step rows for every
+ run that survives the identity filter.
+
+Benchmark: `tests/benchmarks/FlowOrchestrator.Benchmarks/RunSearchBenchmarks.cs`.
+
+## Why the deep path is expensive in the in-memory store
+
+`InMemoryFlowRunStore.MatchesRunSearch`
+(`src/FlowOrchestrator.InMemory/InMemoryFlowRunStore.cs:678`) implements the
+deep branch as:
+
+```csharp
+return _steps.Values.Any(s =>
+ s.RunId == run.Id
+ && (ContainsIgnoreCase(s.StepKey, search)
+ || ContainsIgnoreCase(s.ErrorMessage, search)
+ || ContainsIgnoreCase(s.OutputJson, search)));
+```
+
+`_steps` is the **global** step dictionary across all runs in history. The
+predicate filters by `s.RunId == run.Id` *inside* the scan, so each candidate
+run walks the entire global step keyspace — O(total_steps). That predicate runs
+once per run surviving the identity filter (`ApplyRunsFilter` →
+`MatchesRunSearch`, line 672-673), so the whole search is
+**O(runs × total_steps)**. With a fixed steps-per-run, total_steps grows with
+run count, making the deep path scale **quadratically** with history size.
+
+The quick path returns `false` immediately after the identity-column checks
+(`InMemoryFlowRunStore.cs:689-690`), so it never touches `_steps`.
+
+## Setup
+
+- Each run has 6 completed steps (Started → Dispatched → Claimed → Completed).
+- Every step's `OutputJson` is a ~300-byte JSON blob; the search needle
+ (`needle-7f3a`) is planted in exactly **one** step's output per run and in
+ **no** identity column. So the quick path is a true negative (0 matches, no
+ step scan) and the deep path does the full representative scan and matches.
+- `take: 20` (the dashboard default page size).
+- `TotalRuns` sweeps {1,000, 10,000}; total steps in store = 6 × TotalRuns.
+- In-process emit toolchain, ShortRun (3 warmup / 3 iterations / 1 launch) —
+ the deep/10,000 cell is ~25 s per op, so a full job is infeasible.
+- BenchmarkDotNet v0.15.8, .NET 10.0.6, Intel Core Ultra 7 255H, 16 cores.
+
+## Results — before fix (original quadratic deep branch)
+
+Figures below are from a flag-free reproduction run; a prior `--job short` run
+agreed within run-to-run noise (e.g. quick/1,000: 61.15 µs vs 62.12 µs; deep
+allocations identical to the byte). Allocations are deterministic and the
+strongest signal; the deep/10,000 time has wide variance (n=3 short iterations
+over a ~25 s op) but the order of magnitude and the quadratic shape are
+unambiguous.
+
+| TotalRuns | Quick mean | Quick alloc | Deep mean | Deep alloc | Deep ÷ Quick (time) | Deep ÷ Quick (alloc) |
+|---:|---:|---:|---:|---:|---:|---:|
+| 1,000 | 62.12 µs | 133.44 KB | 91.09 ms | 47,185 KB (≈46 MB) | **1,466×** | **354×** |
+| 10,000 | 1,060.7 µs | 1,328.75 KB | 24,966 ms (≈25.0 s) | 4,691,243 KB (≈4.6 GB) | **23,537×** | **3,530×** |
+
+## Fix — per-run step-key index
+
+The deep branch now enumerates the run's own steps via the existing
+`_stepKeysByRun` secondary index and looks each one up directly in `_steps`,
+instead of scanning the global `_steps` dictionary. The loop below expands the `stepKeys.Keys.Any(...)`
+call in `MatchesRunSearch` (`src/FlowOrchestrator.InMemory/InMemoryFlowRunStore.cs`):
+
+```csharp
+if (!_stepKeysByRun.TryGetValue(run.Id, out var stepKeys))
+ return false;
+foreach (var stepKey in stepKeys.Keys)
+{
+ if (_steps.TryGetValue((run.Id, stepKey), out var s)
+ && (ContainsIgnoreCase(s.StepKey, search)
+ || ContainsIgnoreCase(s.ErrorMessage, search)
+ || ContainsIgnoreCase(s.OutputJson, search)))
+ {
+ return true;
+ }
+}
+return false;
+```
+
+This makes the per-run cost O(steps_in_run) instead of O(total_steps), so the
+whole deep search is O(runs × steps_in_run): linear in history size, with the same
+asymptotic shape as the quick tier (deep just does a small constant amount of extra
+work per run). It mirrors the index `GetRunDetailAsync` already uses.
+
+## Results — after fix (O(runs × steps_in_run))
+
+| TotalRuns | Quick mean | Quick alloc | Deep mean | Deep alloc | Deep ÷ Quick (time) | Deep ÷ Quick (alloc) |
+|---:|---:|---:|---:|---:|---:|---:|
+| 1,000 | 56.44 µs | 102.19 KB | 1.46 ms | 262.83 KB | ~26× | ~2.6× |
+| 10,000 | 1,532.5 µs | 1,016.25 KB | 24.17 ms | 2,622.36 KB | ~16× | ~2.6× |
+
+Deep 1,000 → 10,000 now scales ~17× in time (24.17 ms / 1.46 ms; was 274×): near-linear, not quadratic.
+At 10,000 runs the deep search dropped from **~24,966 ms → ~24 ms (~1,040×)** and
+**~4.6 GB → ~2.6 MB allocations (~1,800×)**. (Deep times at n=3 short iterations
+have wide CI; the order of magnitude and the now-linear scaling are the signal.)
+
+## Headline
+
+Before the fix, the two tiers did not differ by a constant factor: they were in
+different complexity classes. Scaling `TotalRuns` from 1,000 to 10,000 (10×):
+
+- **Quick** grows ~17× in time (1,061 µs / 62 µs) — linear-ish, dominated by
+ LINQ `Where`/`OrderByDescending`/`ToList` over the run set, and ~10× in
+ allocation (133 KB → 1.33 MB), matching the run-count scaling.
+- **Deep** grows ~274× in time (24,966 ms / 91 ms) and ~99× in allocation
+ (46 MB → 4.6 GB) — quadratic, exactly the O(runs × total_steps) blow-up
+ predicted above (10× runs × 10× total_steps ≈ 100×).
+
+At 10,000 runs in store, a single deep search takes **~25 seconds** and
+allocates **4.6 GB** (driving sustained Gen2 collections), versus **~1 ms /
+1.3 MB** for quick. Choosing the quick tier when the caller does not need
+step-body matching is a **~23,500× latency win** and a **~3,500× allocation
+reduction** at that scale.
+
+This is the data backing the v1.27.0 design decision to default the dashboard
+run list to the quick tier and gate deep search behind an explicit opt-in.
+
+The numbers above come from the **original** quadratic implementation; the index fix
+documented in "Fix — per-run step-key index" collapses the in-memory deep path
+to linear (24,966 ms → ~24 ms at 10,000 runs). Quick remains the right default
+for typeahead, but deep is no longer a multi-second, multi-GB cliff on a store
+with history.
+
+> Note: before the fix, the in-memory store was the most extreme case because the
+> deep predicate rescanned the *global* `_steps` per candidate run. The SQL Server /
+> PostgreSQL stores push the step match into a single SQL statement (EXISTS / join),
+> so their deep-vs-quick ratio is smaller, but the asymptotic shape (deep adds a
+> per-run step scan the quick path skips) is the same. This benchmark
+> characterises the in-memory runtime; a Testcontainers-backed SQL benchmark is
+> intentionally out of scope to keep the suite dependency-free and CI-runnable.
+
+## Reproducing
+
+```bash
+# from the repo root, with current HEAD
+cd tests/benchmarks/FlowOrchestrator.Benchmarks/bin/Release/net10.0
+./FlowOrchestrator.Benchmarks.exe --filter "*RunSearchBenchmarks*"
+```
+
+No `--job` / `--inProcess` flags are needed — the benchmark pins the in-process
+emit toolchain and a ShortRun job via `[Config(typeof(RunSearchConfig))]`. The
+in-process toolchain is required because the repo's `.claude/worktrees/` copies
+of this project otherwise make BenchmarkDotNet's default toolchain fail on a
+duplicate-project-name discovery error.
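The complexity argument in the benchmark note can be checked with a toy model. The sketch below is a minimal, hypothetical miniature of the store: a plain list plus a per-run position index, with names that are illustrative and not taken from `InMemoryFlowRunStore`. It counts how many step rows each strategy visits, assuming steps are scanned in insertion order and the needle sits in the last step of each run, as in the benchmark setup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical miniature of the store; sizes and names are illustrative only.
const int runs = 100;
const int stepsPerRun = 6;
const string needle = "needle";

// Global step table in insertion order, plus a per-run index of positions into it.
var steps = new List<(int RunId, string Output)>();
var stepsByRun = new Dictionary<int, List<int>>();
for (var r = 0; r < runs; r++)
{
    stepsByRun[r] = new List<int>();
    for (var s = 0; s < stepsPerRun; s++)
    {
        // Only the last step of each run carries the needle.
        stepsByRun[r].Add(steps.Count);
        steps.Add((r, s == stepsPerRun - 1 ? needle : "ok"));
    }
}

long globalVisits = 0, indexedVisits = 0;
int globalMatches = 0, indexedMatches = 0;

for (var r = 0; r < runs; r++)
{
    // Old shape: filter by run id inside a scan of every step in the store.
    if (steps.Any(s => { globalVisits++; return s.RunId == r && s.Output.Contains(needle); }))
        globalMatches++;

    // New shape: enumerate only this run's steps through the index.
    if (stepsByRun.TryGetValue(r, out var own)
        && own.Any(i => { indexedVisits++; return steps[i].Output.Contains(needle); }))
        indexedMatches++;
}

Console.WriteLine($"matches: global={globalMatches}, indexed={indexedMatches}"); // matches: global=100, indexed=100
Console.WriteLine($"visits: global={globalVisits}, indexed={indexedVisits}");   // visits: global=30300, indexed=600
```

With 100 runs of 6 steps, the global scan visits 30,300 rows against 600 for the indexed lookup. The first figure grows with the square of the run count, the second linearly. A real dictionary makes no promise about enumeration order, so exact visit counts in the actual store will differ from this model.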
diff --git a/src/FlowOrchestrator.InMemory/InMemoryFlowRunStore.cs b/src/FlowOrchestrator.InMemory/InMemoryFlowRunStore.cs
index 1fef533..94b1b84 100644
--- a/src/FlowOrchestrator.InMemory/InMemoryFlowRunStore.cs
+++ b/src/FlowOrchestrator.InMemory/InMemoryFlowRunStore.cs
@@ -689,10 +689,15 @@ private bool MatchesRunSearch(FlowRunRecord run, string search, bool deepSearch)
if (!deepSearch)
return false;
- // Deep search also scans the current step rows (incl. OutputJson). Attempt history is
- // intentionally not searched — it duplicates the current step row and is the dominant cost.
- return _steps.Values.Any(s =>
- s.RunId == run.Id
+ // Deep search also scans this run's current step rows (incl. OutputJson). Enumerate via
+ // the per-run step-key index (O(steps_in_run)) and direct-look-up each step, instead of
+ // scanning the global _steps dictionary (O(total_steps) per run — quadratic over run
+ // history). Attempt history is intentionally not searched — it duplicates the current row.
+ if (!_stepKeysByRun.TryGetValue(run.Id, out var stepKeys))
+ return false;
+
+ return stepKeys.Keys.Any(stepKey =>
+ _steps.TryGetValue((run.Id, stepKey), out var s)
&& (ContainsIgnoreCase(s.StepKey, search)
|| ContainsIgnoreCase(s.ErrorMessage, search)
|| ContainsIgnoreCase(s.OutputJson, search)));
diff --git a/tests/benchmarks/FlowOrchestrator.Benchmarks/RunSearchBenchmarks.cs b/tests/benchmarks/FlowOrchestrator.Benchmarks/RunSearchBenchmarks.cs
new file mode 100644
index 0000000..edad3e3
--- /dev/null
+++ b/tests/benchmarks/FlowOrchestrator.Benchmarks/RunSearchBenchmarks.cs
@@ -0,0 +1,169 @@
+using System.Globalization;
+using BenchmarkDotNet.Attributes;
+using BenchmarkDotNet.Configs;
+using BenchmarkDotNet.Jobs;
+using BenchmarkDotNet.Toolchains.InProcess.Emit;
+using FlowOrchestrator.InMemory;
+
+namespace FlowOrchestrator.Benchmarks;
+
+/// <summary>
+/// Quantifies the cost difference between the tiered RUN-search modes added in
+/// v1.27.0 on <c>IFlowRunStore.GetRunsPageAsync</c>.
+/// <para>
+/// The quick path (<c>deepSearch: false</c>) matches the search term only
+/// against the run identity columns (id, flow name, trigger key, status, job id)
+/// and short-circuits, so its work is O(runs).
+/// </para>
+/// <para>
+/// The deep path (<c>deepSearch: true</c>) additionally scans the step rows
+/// for every run that survives the identity filter. Before v1.27.1 the in-memory
+/// store ran that inner scan as <c>_steps.Values.Any(s =&gt; s.RunId == run.Id &amp;&amp; ...)</c>
+/// (see <c>InMemoryFlowRunStore.MatchesRunSearch</c>), walking the whole global
+/// step keyspace per candidate run at O(runs × total_steps); it now enumerates
+/// each run's steps via <c>_stepKeysByRun</c>, which is O(runs × steps_in_run).
+/// </para>
+/// <para>
+/// The search term is chosen to appear only inside step <c>OutputJson</c>,
+/// never in any identity column, so the quick path matches zero rows (the cheap
+/// path) while the deep path performs the full step scan and returns matches (the
+/// representative path). Both calls request a single page (<c>take: 20</c>),
+/// matching the dashboard's default page size.
+/// </para>
+/// </summary>
+/// <remarks>
+/// Uses the in-process emit toolchain (<see cref="InProcessEmitToolchain"/>) rather than
+/// the default out-of-process job. The repo keeps stale git worktrees under
+/// <c>.claude/worktrees/</c> that each contain a copy of this benchmark project;
+/// BenchmarkDotNet's default toolchain discovers the duplicate <c>.csproj</c> by
+/// assembly name and refuses to build the boilerplate. Running in-process skips
+/// the generated project entirely. The subject is a pure in-memory store, so
+/// in-process execution does not perturb the measurement.
+/// <para>
+/// A reduced job (3 warmup / 3 iterations) is used because the original deep path at
+/// <c>TotalRuns=10_000</c> was O(runs × total_steps) ≈ 6×10⁸ comparisons per
+/// invocation and took tens of seconds per op, so a full job would have run for hours.
+/// </para>
+/// </remarks>
+[MemoryDiagnoser]
+[Config(typeof(RunSearchConfig))]
+public class RunSearchBenchmarks
+{
+ /// <summary>
+ /// In-process, reduced-iteration configuration for <see cref="RunSearchBenchmarks"/>.
+ /// </summary>
+ private sealed class RunSearchConfig : ManualConfig
+ {
+ /// <summary>Initialises the config with the in-process emit toolchain and a short job.</summary>
+ public RunSearchConfig()
+ {
+ AddJob(Job.Default
+ .WithToolchain(InProcessEmitToolchain.Instance)
+ .WithWarmupCount(3)
+ .WithIterationCount(3)
+ .WithLaunchCount(1));
+ }
+ }
+
+ private const int StepsPerRun = 6;
+ private const int PageSize = 20;
+
+ /// <summary>
+ /// A token embedded in exactly one step's output JSON per run, and in no
+ /// identity column. Matching it forces the deep path to do real work while
+ /// the quick path provably returns nothing.
+ /// </summary>
+ private const string DeepOnlyNeedle = "needle-7f3a";
+
+ /// <summary>Total runs seeded into the store before each measurement.</summary>
+ [Params(1_000, 10_000)]
+ public int TotalRuns { get; set; }
+
+ /// <summary>
+ /// Selects the search tier under test: <see langword="false"/> = quick
+ /// (identity-only), <see langword="true"/> = deep (identity + step scan).
+ /// </summary>
+ [Params(false, true)]
+ public bool DeepSearch { get; set; }
+
+ private InMemoryFlowRunStore _store = null!;
+
+ /// <summary>
+ /// Seeds <see cref="TotalRuns"/> completed runs, each with
+ /// <see cref="StepsPerRun"/> steps carrying a non-trivial JSON output blob.
+ /// The needle is planted in one step per run so the deep scan has to walk to
+ /// it; identity columns never contain the needle so the quick path is a true
+ /// negative.
+ /// </summary>
+ [GlobalSetup]
+ public async Task Setup()
+ {
+ _store = new InMemoryFlowRunStore();
+
+ for (var i = 0; i < TotalRuns; i++)
+ {
+ var runId = Guid.NewGuid();
+ await _store.StartRunAsync(
+ flowId: Guid.Empty,
+ flowName: "BenchFlow",
+ runId: runId,
+ triggerKey: "manual",
+ triggerData: null,
+ jobId: null);
+
+ for (var s = 0; s < StepsPerRun; s++)
+ {
+ var stepKey = $"step_{s}";
+ await _store.RecordStepStartAsync(runId, stepKey, "noop", inputJson: null, jobId: null);
+ await _store.TryRecordDispatchAsync(runId, stepKey);
+ await _store.TryClaimStepAsync(runId, stepKey);
+
+ // Only the last step of each run carries the needle, so the deep
+ // scan must enumerate past the earlier steps before it can match.
+ var carriesNeedle = s == StepsPerRun - 1;
+ await _store.RecordStepCompleteAsync(
+ runId, stepKey,
+ status: "Succeeded",
+ outputJson: BuildOutputJson(i, s, carriesNeedle),
+ errorMessage: null);
+ }
+ }
+ }
+
+ /// <summary>
+ /// Runs a single search page against the store using the tier selected by
+ /// <see cref="DeepSearch"/>. The total match count is returned so the JIT cannot
+ /// elide the call.
+ /// </summary>
+ [Benchmark(Description = "GetRunsPageAsync(search, take:20)")]
+ public async Task<int> Search()
+ {
+ var (_, total) = await _store.GetRunsPageAsync(
+ flowId: null,
+ status: null,
+ skip: 0,
+ take: PageSize,
+ search: DeepOnlyNeedle,
+ deepSearch: DeepSearch);
+ return total;
+ }
+
+ /// <summary>
+ /// Builds a realistic, non-trivial output payload for a step. The needle is
+ /// embedded as a field value only when <paramref name="carriesNeedle"/> is set
+ /// so it lives exclusively in step output, never in a run identity column.
+ /// </summary>
+ /// <param name="runOrdinal">Sequential index of the run being seeded.</param>
+ /// <param name="stepOrdinal">Index of the step within the run.</param>
+ /// <param name="carriesNeedle">When <see langword="true"/>, plants the deep-only search token.</param>
+ /// <returns>A JSON object string of a few hundred bytes.</returns>
+ private static string BuildOutputJson(int runOrdinal, int stepOrdinal, bool carriesNeedle)
+ {
+ var correlation = carriesNeedle ? DeepOnlyNeedle : "ok";
+ // Hand-built JSON (no serializer dependency) shaped like a typical step
+ // output: a status block, a few scalar fields, and a small nested object.
+ return string.Create(CultureInfo.InvariantCulture, $$"""
+ {"status":"completed","stepOrdinal":{{stepOrdinal}},"runOrdinal":{{runOrdinal}},"correlation":"{{correlation}}","httpStatus":200,"durationMs":{{42 + stepOrdinal}},"payload":{"itemsProcessed":{{100 + runOrdinal % 50}},"warnings":0,"region":"westus2","retryable":false},"timestamp":"2026-05-24T12:00:00Z"}
+ """);
+ }
+}