Skip to content

feat(search): per-phase timing instrumentation for the query pipeline#953

Merged
magyargergo merged 2 commits into
abhigyanpatwari:mainfrom
azizur100389:feat/search-phase-timer
Apr 18, 2026
Merged

feat(search): per-phase timing instrumentation for the query pipeline#953
magyargergo merged 2 commits into
abhigyanpatwari:mainfrom
azizur100389:feat/search-phase-timer

Conversation

@azizur100389

Copy link
Copy Markdown
Contributor

Closes #553.

Problem

The eval harness already measures search-pipeline latency per phase, but the product `query()` tool has no timing visibility. That leaves production latency opaque:

  • Is BM25 the tail, or vector search?
  • How much `Promise.all` overlap do concurrent searches actually save?
  • Does `symbol_lookup` dominate when per-symbol Cypher round-trips pile up?

None of this is answerable from the outside today, which blocks the latency-quality Pareto work tracked in the parent RFC (#546 / #553).

What changed

New `PhaseTimer` class at gitnexus/src/core/search/phase-timer.ts:

const t = new PhaseTimer();
t.start('merge'); ...; t.stop();                 // sequential phases
await Promise.all([
  t.time('bm25', bm25Search(...)),               // concurrent phases
  t.time('vector', semanticSearch(...)),
]);
t.mark('wall', performance.now() - start);       // pre-measured durations
t.summary();  // -> { bm25, vector, merge, wall, ... } rounded to 0.1ms

Deviation from issue spec

The issue's original spec was sequential-only (`start` auto-stops the previous phase). That doesn't work for BM25 + vector inside `Promise.all` — the second `start` would auto-stop the first and only one phase would get timed. I added `mark(phase, durationMs)` + `time(phase, promise)` variants that record wall time without touching the current phase slot, so both concurrent searches get independently timed. The sequential `start`/`stop` API is unchanged and is still used for the other four phases.

`query()` instrumentation — seven phase markers in gitnexus/src/mcp/local/local-backend.ts:

Phase Measured with What it covers
`bm25` `timer.time` (concurrent) BM25 FTS search
`vector` `timer.time` (concurrent) Semantic/embedding search
`merge` `timer.start/stop` Inline RRF reciprocal-rank fusion
`symbol_lookup` `timer.start/stop` Per-symbol process + cohesion + content Cypher round-trips
`ranking` `timer.start/stop` In-memory process priority sort
`formatting` `timer.start/stop` Response object construction + dedup
`wall` `timer.mark` End-to-end wall time; compare vs sum(phases) to see `Promise.all` overlap savings

`logQueryTiming()` helper next to `logQueryError()`, same plain-`console` pattern (repo has no structured logger). Emits:

```
GitNexus [query:timing] query="..." totalMs=402 phases={"bm25":42,"vector":315,"merge":2,"symbol_lookup":18,"ranking":5,"formatting":2,"wall":320}
```

Greppable prefix, JSON-parseable payload, no new dependencies.

Response shape — `timing: Record<string, number>` added as a top-level field alongside `processes` / `process_symbols` / `definitions` / `warning`. Strict superset; existing fields unchanged.

Backward compat

  • Response shape: strict superset. Existing callers ignore the new `timing` field. Existing integration test asserted field presence only — no regression.
  • No new dependencies. `performance.now()` is a Node builtin.
  • No behavioural change to search results. PhaseTimer is passive — observes phase boundaries, never alters what they return.
  • Log output is additive. New `GitNexus [query:timing] …` lines on stdout alongside existing error lines on stderr; trivial to filter.
  • Rollback: pure code revert, no side effects, no data migration.

Scope declined for v1

  • In-process histogram aggregation — the structured log line is enough for external tooling to aggregate.
  • Pareto curve generation — issue asks to enable it, not generate it.
  • Sub-phases of `symbol_lookup` (process vs cohesion vs content) — the issue lists them under one bucket; can split later if demand surfaces.

Test plan

  • `npx vitest run test/unit/phase-timer.test.ts` → 6 pass (new — sequential / Promise.all / mark additivity / NaN rejection / auto-stop)
  • `npx vitest run test/unit/calltool-dispatch.test.ts` → 65 pass (no regression on mocked dispatch)
  • `npx vitest run test/integration/local-backend-calltool.test.ts` → 18 pass (includes 3 new `timing` assertions on the existing `query` test)
  • `npm run test:unit` → 3777 pass (same 4 pre-existing env failures: `skip-git-cli.test.ts` needs built `dist/`, `git-utils.test.ts` Windows worktree tmpdir)
  • `npx tsc --noEmit` → clean
  • Pre-commit hook (lint-staged + eslint + prettier + tsc) → passed

The eval harness already measures search-pipeline latency per phase,
but the *product* query() tool has no timing visibility. That leaves
production latency opaque:

 - Is BM25 the tail, or vector search?
 - How much Promise.all overlap do concurrent searches actually save?
 - Does symbol_lookup dominate when per-symbol Cypher round-trips pile up?

None of this is answerable from the outside, which blocks the
latency-quality Pareto work tracked in abhigyanpatwari#546 / abhigyanpatwari#553.

Changes:

* New PhaseTimer class at src/core/search/phase-timer.ts.
  Supports three APIs:
    - start(phase) / stop() for sequential phases (per issue spec)
    - mark(phase, durationMs) for pre-measured durations
    - time(phase, promise) to wrap a promise inside Promise.all

  The issue's original spec was sequential-only, which doesn't work
  for BM25 + vector inside Promise.all — the second start() would
  auto-stop the first and only one phase would get timed. The mark()
  and time() variants resolve that without changing the sequential
  API for the other phases.

* local-backend.ts query() instrumented across seven phase markers:
    bm25, vector   (concurrent via timer.time inside Promise.all)
    merge          (RRF reciprocal-rank-fusion)
    symbol_lookup  (per-symbol process + cohesion + content Cypher)
    ranking        (in-memory priority sort)
    formatting     (response object construction + dedup)
    wall           (end-to-end; separate mark so callers can compare
                   sum(phases) vs wall and see Promise.all savings)

* logQueryTiming() helper next to logQueryError(), same console-based
  pattern (repo has no structured logger). Emits
    GitNexus [query:timing] query="..." totalMs=N phases={...}
  to stdout — greppable prefix, JSON-parseable payload, no new deps.

* timing: Record<string, number> added as a top-level field on the
  query() response. Strict superset of the previous shape — existing
  tests only assert field presence, so no regression. Other MCP tools
  use the same top-level-metadata convention (status, row_count,
  warning) rather than a nested _meta wrapper.

Tests:

 - 6 new unit tests for PhaseTimer covering start/stop, implicit
   stop-on-start, additive mark(), Promise.all-safe time(),
   negative/NaN rejection, and totalMs auto-stop.
 - 3 new assertions on the existing query integration test verifying
   timing.wall is a non-negative number and at least one of
   bm25/vector fired.

Verification:
  npx vitest run test/unit/phase-timer.test.ts       -> 6 pass
  npx vitest run test/unit/calltool-dispatch.test.ts -> 65 pass
  npx vitest run test/integration/local-backend-calltool.test.ts -> 18 pass
  npm run test:unit                                   -> 3777 pass
    (4 pre-existing env failures unchanged: skip-git-cli needs
     built dist/, git-utils tmpdir on Windows worktree)
  npx tsc --noEmit                                    -> clean

Scope declined for v1:

 - In-process histogram aggregation — the log line is enough for
   external tooling
 - Pareto curve generation — issue asks to enable it, not generate it
 - Sub-phases of symbol_lookup (process vs cohesion vs content) —
   issue lists them under one bucket; can split later if demand surfaces

Closes abhigyanpatwari#553
@vercel

vercel Bot commented Apr 18, 2026

Copy link
Copy Markdown

@azizur100389 is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented Apr 18, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
6612 6513 0 98 248s

✅ All 6513 tests passed

98 test(s) skipped — expand for details
  • withTestLbugDB(java-class-impact) > Bug 2 fix: context() on Class shows non-empty incoming > contrast: context() on a Method still works (regression check)
  • withTestLbugDB(java-class-impact) > Bug 2 fix: context() on Class shows non-empty incoming > RankPermissionHandler: incoming shows caller and importers
  • Swift MethodExtractor > isTypeDeclaration > recognizes class_declaration
  • Swift MethodExtractor > isTypeDeclaration > recognizes protocol_declaration
  • Swift MethodExtractor > isTypeDeclaration > rejects import_declaration
  • Swift MethodExtractor > visibility > extracts public method
  • Swift MethodExtractor > visibility > extracts private method
  • Swift MethodExtractor > visibility > defaults to internal when no modifier
  • Swift MethodExtractor > protocol methods > marks protocol method as abstract
  • Swift MethodExtractor > static and class methods > detects static func as isStatic
  • Swift MethodExtractor > static and class methods > detects class func as isStatic
  • Swift MethodExtractor > parameters > extracts parameters with types and default values
  • Swift MethodExtractor > return type > extracts return type from -> annotation
  • Swift MethodExtractor > annotations > extracts @objc attribute
  • Swift MethodExtractor > isFinal > detects final func
  • Swift MethodExtractor > isFinal > is false when not final
  • Swift MethodExtractor > isAsync > detects async func
  • Swift MethodExtractor > isOverride > detects override method
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift explicit init inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference does not bind plain functions
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
  • Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
  • Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
  • Swift self resolution > detects User and Repo classes, each with a save function
  • Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
  • Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
  • Swift parent resolution > emits EXTENDS edge: User → BaseModel
  • Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
  • Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
  • Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
  • Swift return type inference > detects User class and getUser function
  • Swift return type inference > detects save function on User (Swift class methods are Function nodes)
  • Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
  • Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
  • Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
  • Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()
  • Swift implicit imports (cross-file visibility) > detects UserService class in Models.swift
  • Swift implicit imports (cross-file visibility) > resolves UserService() constructor call across files (no explicit import)
  • Swift implicit imports (cross-file visibility) > resolves service.fetchUser() member call across files
  • Swift implicit imports (cross-file visibility) > creates IMPORTS edges between files in the same module
  • Swift extension deduplication > detects Product class
  • Swift extension deduplication > resolves Product() constructor despite extension creating duplicate class node
  • Swift extension deduplication > resolves product.save() to Product.swift (primary definition)
  • Swift constructor call fallback (no new keyword) > resolves OCRService() as constructor call across files
  • Swift constructor call fallback (no new keyword) > resolves ocr.recognize() member call via constructor-inferred type
  • Swift export visibility (internal vs private) > resolves PublicService() constructor across files
  • Swift export visibility (internal vs private) > resolves internalHelper() across files (internal = module-scoped)
  • Swift if let / guard let binding resolution > detects User and Repo classes
  • Swift if let / guard let binding resolution > resolves user.save() inside if-let to User#save
  • Swift if let / guard let binding resolution > resolves repo.save() inside guard-let to Repo#save
  • Swift if let / guard let binding resolution > user.save() in if-let does NOT resolve to Repo#save
  • Swift await / try expression unwrapping > resolves user.save() via await fetchUser() return type
  • Swift await / try expression unwrapping > resolves repo.save() via try parseRepo() return type
  • Swift await / try expression unwrapping > detects fetchUser and parseRepo as functions
  • Swift for-in loop element type inference > detects User and Repo classes
  • Swift for-in loop element type inference > creates implicit import edges between files
  • Swift field-type resolution > detects classes and their properties
  • Swift field-type resolution > emits HAS_PROPERTY edges from class to field
  • Swift field-type resolution > resolves field-chain call user.address.save() → Address#save
  • Swift field-type resolution > emits ACCESSES edges for field reads in chains
  • Swift field-type resolution > populates field metadata (visibility, declaredType) on Property nodes
  • Swift call-result binding > resolves call-result-bound method call user.save() → User#save
  • Swift call-result binding > getUser() is present as a defined function
  • Swift call-result binding > emits processUser -> getUser CALLS edge for let-assigned free function call
  • Swift method enrichment > detects Animal protocol and Dog class
  • Swift method enrichment > emits IMPLEMENTS edge Dog -> Animal
  • Swift method enrichment > emits HAS_METHOD edges for Dog methods
  • Swift method enrichment > marks protocol Animal.speak as isAbstract
  • Swift method enrichment > marks Dog.speak as NOT isAbstract
  • Swift method enrichment > marks breathe as isFinal
  • Swift method enrichment > marks classify as isStatic
  • Swift method enrichment > captures @objc annotation on breathe
  • Swift method enrichment > populates parameterTypes for classify(_ name: String)
  • Swift method enrichment > records parameterCount for classify
  • Swift method enrichment > records returnType for speak
  • Swift method enrichment > resolves dog.speak() CALLS edge
  • Swift method enrichment > resolves Dog.classify("dog") CALLS edge
  • Swift abstract dispatch > detects Repository protocol and SqlRepository class
  • Swift abstract dispatch > emits IMPLEMENTS edge SqlRepository -> Repository
  • Swift abstract dispatch > emits HAS_METHOD edges for Repository.find and Repository.save
  • Swift abstract dispatch > emits HAS_METHOD edges for SqlRepository.find and SqlRepository.save
  • Swift abstract dispatch > marks base Repository.find as isAbstract
  • Swift abstract dispatch > marks base Repository.save as isAbstract
  • Swift abstract dispatch > marks concrete SqlRepository.find as NOT isAbstract
  • Swift abstract dispatch > resolves repo.find(id: 42) CALLS edge
  • Swift abstract dispatch > resolves repo.save(entity: user) CALLS edge
  • Swift abstract dispatch > populates parameterTypes for Repository.find
  • Swift abstract dispatch > populates parameterTypes for Repository.save
  • Swift abstract dispatch > records returnType for SqlRepository.find
  • Swift abstract dispatch > emits METHOD_IMPLEMENTS edges from SqlRepository methods → Repository protocol methods
  • Swift overloaded method disambiguation > detects 2 distinct find Method nodes on SqlRepository
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edges for both find overloads
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edge for save
  • Swift overloaded method disambiguation > emits exactly 3 METHOD_IMPLEMENTS edges total
  • Swift Child extends Parent — inherited method resolution (SM-9) > detects Parent and Child classes
  • Swift Child extends Parent — inherited method resolution (SM-9) > resolves c.parentMethod() to Parent.parentMethod via first-wins MRO walk

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 73.25% 17893/24424 73.22% 📈 +0.0 🟢 ██████████████░░░░░░
Branches 62.25% 11350/18232 62.24% 📈 +0.0 🟢 ████████████░░░░░░░░
Functions 77.85% 1695/2177 77.82% 📈 +0.0 🟢 ███████████████░░░░░
Lines 75.82% 16224/21396 75.78% 📈 +0.0 🟢 ███████████████░░░░░

📋 View full run · Generated by CI

…ontract

CI (abhigyanpatwari#953) failed the `query: JSON appears on stdout, not stderr`
e2e test in test/integration/cli-e2e.test.ts with:

  SyntaxError: Unexpected token 'G', "GitNexus [..." is not valid JSON

Root cause: my initial logQueryTiming() in 63fbdc4 used console.log,
which writes to stdout. The MCP stdio transport uses stdout
exclusively for JSON-RPC responses (abhigyanpatwari#324), and the CLI e2e test
guards that contract by asserting stdout parses as JSON on every
tool invocation. The "GitNexus [query:timing] ..." line was
interleaving with the response JSON and breaking the parse.

Fix: route logQueryTiming through console.error instead. stderr is
the correct channel for human-readable diagnostics and it is what
the sibling logQueryError already uses for the same reason. The log
line format is otherwise unchanged -- still greppable, still
JSON-parseable payload.

Verification (local, with dist built):
  npx vitest run test/integration/cli-e2e.test.ts -t "query: JSON"
    -> now passes (was failing across ubuntu/windows/macos in CI)
  npx tsc --noEmit                                  -> clean
  Two unrelated pre-existing failures on non-git
  directory handling remain (same on upstream/main).

Closes the CI regression introduced in 63fbdc4.
@magyargergo

Copy link
Copy Markdown
Collaborator

@claude Act as a senior reviewer for GitNexus. Your job is to determine whether this PR is production-ready for this repo, not to give a generic code review.

You are reviewing a PR in the GitNexus monorepo:

  • gitnexus/ → CLI + MCP
  • gitnexus-web/
  • gitnexus-shared/

Your task has 2 phases, in this exact order:

PHASE 1 — DEFINE THE BAR
Before reviewing the diff, establish a concise repo-specific definition of “production-ready” for GitNexus, based only on the repo docs and the affected area.
Keep this definition practical and reviewable. Do not invent standards that are not grounded in the repo.

PHASE 2 — REVIEW THE PR AGAINST THAT BAR
Review the actual diff only after defining the bar.
Stay tightly scoped to the changed code and its direct consequences.


CONTEXT TO LOAD FIRST
Read these before reviewing:

  • AGENTS.md
  • GUARDRAILS.md
  • CONTRIBUTING.md
  • TESTING.md
  • ARCHITECTURE.md

Additional context:


PRIMARY OBJECTIVE
Decide whether this PR is safe, correct, maintainable, and operationally acceptable to merge into production for GitNexus.

Do not optimize for completeness at the expense of signal.
Do not pad the review.
Do not propose unrelated refactors.
Do not restate the PR description unless needed for verification.


REVIEW RULES

  • Every finding must be grounded in specific evidence from the diff or directly relevant surrounding code.
  • Every finding must include path:line.
  • If you make a behavioral claim, cite the code that proves it.
  • If you make a performance claim, explain the mechanism.
  • If something cannot be verified from the diff alone, explicitly say so.
  • Distinguish clearly between:
    • verified issue
    • plausible risk
    • unverified concern
  • Avoid vague wording like “might be better” or “could be improved” unless you explain exactly why.
  • Keep the review focused on this PR’s scope only.

For each finding, assign one severity:

  • BLOCKING → must be fixed before merge
  • NON-BLOCKING → valid issue, but merge may still be acceptable
  • NIT → stylistic/minor, not merge-relevant

REPO-SPECIFIC REVIEW CHECKLIST
Use these exact headings.

1. Correctness & functional completeness

Check:

  • Does the implementation actually satisfy the PR claim?
    • ManifestExtractor is truly invoked
    • config.links produces non-zero cross-links where expected
  • Resolver contracts are preserved:
    • resolveSymbol remains exact-match
    • label-scoped Cypher remains correct per contract type
    • flag any regression toward fuzzy or unscoped matching
  • Graph schema integrity is preserved:
    • no silent changes to node labels
    • no silent changes to edge types
    • no silent changes to ID generation (generateId)
  • Call out any missing wiring, partial integration, dead branch, or mismatch between tests and runtime behavior

2. Code clarity & clean code

Check:

  • naming quality
  • local cohesion
  • dead code
  • unnecessary abstraction
  • hidden control flow
  • confusing indirection
  • adherence to repo conventions:
    • direct imports from gitnexus-shared
    • no barrel re-export regressions
    • no // removed comments
    • no unused re-exports
  • no drive-by refactors outside stated scope per CONTRIBUTING.md and GUARDRAILS.md § Scope

3. Test coverage & change safety

Evaluate against TESTING.md:

  • Are there unit tests under gitnexus/test/unit/ covering the newly wired path?
  • Is there a regression guard for 0-link → N-link behavior?
  • Are assertions meaningful rather than tautological?
  • Are fixtures realistic for manifest inputs?
  • If memoization/cache was introduced, is there a test proving hit/miss behavior and correctness?
  • Is there evidence the expected validation path would pass for staged gitnexus/ files?
    • tsc --noEmit
    • vitest run --project default
      If not verifiable, say exactly what is missing.

4. Performance

Inspect for:

  • hot-path overhead in ingestion/group sync
  • excess allocations per manifest entry
  • redundant Cypher round-trips
  • missed batching or missed parallelism (Promise.all) where it materially matters
  • O(n²) or repeated lookup patterns on large repos
  • memoization tradeoffs:
    • correctness
    • invalidation
    • bounded vs unbounded memory growth
      Do not speculate casually; explain the mechanism and likely impact.

5. Operational risk

Check:

  • Windows/cross-platform safety:
    • stream lifecycle
    • FD/file handle lifecycle
    • path separator assumptions
    • anything resembling prior ENOTEMPTY-style lifecycle regressions
  • LadybugDB single-writer invariant is preserved
  • Embeddings preservation:
    • no silent breakage of --embeddings
    • .gitnexus/meta.json.stats.embeddings not silently zeroed by changed paths
  • MCP contracts remain compatible:
    • group_*
    • query
    • context
    • impact
    • detect_changes
    • rename
    • cypher
      Flag any schema or contract break without migration note
  • staleness behavior still triggers correctly (gitnexus/src/mcp/staleness.ts)
  • rollback safety:
    • can this PR be reverted safely without re-analyze?
    • if not, explain why

6. Maintainability

Check:

  • Does the change respect Pipeline Phase DAG boundaries?
  • Is cross-phase coupling being introduced?
  • Are manifest-extractor entry points discoverable from group/sync.ts?
  • Would the next contributor know where to extend this?
  • Is shared logic placed appropriately:
    • in gitnexus-shared/
    • or gitnexus/src/core/**/utils.ts
    • and not duplicated between CLI and web bridge (server/api.ts)?

NON-NEGOTIABLES TO VERIFY
From GUARDRAILS.md / CONTRIBUTING.md:

  • no secrets, tokens, or machine-specific paths committed
  • no blind find-and-replace renames
  • if a symbol rename occurred, rename MCP tool or equivalent careful review was used
  • upstream impact was considered for modified shared symbols
  • no HIGH/CRITICAL impact left unaddressed
  • PR title follows conventional commit style + [area] Short description
  • PolyForm Noncommercial license headers/metadata untouched

OUTPUT FORMAT

Production-ready bar for this PR

Give a concise repo-specific definition in 5–10 bullets.

Findings

For each finding, use this structure exactly:

[BLOCKING|NON-BLOCKING|NIT] Short title

  • Evidence: path:line, path:line
  • Why it matters: ...
  • Recommendation: ...
  • Confidence: High / Medium / Low

Only include findings that are real and supported.
If there are no findings in a section, say:

  • No issues found in this area based on the diff and reviewed context.

What I could not fully verify

List any behavior you could not prove from the diff, tests, or available context.

Final verdict

Choose exactly one:

  • ✅ Ready to merge — no blocking issues.
  • ⚠️ Merge after addressing blocking items.
  • ❌ Not ready — fundamental concerns.

If verdict is not ready, list the blocking items in priority order.


IMPORTANT REVIEW BEHAVIOR

  • Be strict on correctness and contracts.
  • Be conservative about operational risk.
  • Be explicit about uncertainty.
  • Do not suggest unrelated refactors.
  • Do not turn nits into blockers.
  • Do not reward intent; review actual implementation.

@github-actions

github-actions Bot commented Apr 18, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 4m 46s —— View job


PR Review: feat(search): per-phase timing instrumentation for the query pipeline

Tasks:

  • Initialize review
  • Read repo docs: AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md
  • Read changed files: phase-timer.ts, local-backend.ts, test files
  • Define production-ready bar (Phase 1)
  • Review PR against that bar (Phase 2)
  • Post final verdict

Production-ready bar for this PR

Grounded in AGENTS.md, GUARDRAILS.md, CONTRIBUTING.md, TESTING.md, ARCHITECTURE.md for a change in gitnexus/src/core/search/ (new utility class) + gitnexus/src/mcp/local/local-backend.ts (MCP query() handler):

  • TypeScript compiles cleanly (tsc --noEmit), unit + integration tests pass in CI
  • MCP query tool response contract is strictly additive — no existing fields removed or type-changed
  • Log output is routed to stderr, not stdout, to preserve the MCP stdio JSON-RPC transport (ARCHITECTURE.md § MCP; #324)
  • No new runtime dependencies; performance.now() is a Node.js ≥16 global (project requires ≥20)
  • Change is scoped to stated purpose — no drive-by refactors (GUARDRAILS.md § Scope, CONTRIBUTING.md)
  • No secrets, tokens, or machine-specific paths committed
  • PR title follows conventional-commit format (feat(scope): subject)
  • PhaseTimer is passive — never mutates the data flowing through query(), only observes phase boundaries
  • Rollback is a pure code revert: no data migration, no schema change, no side effects

Findings

1. Correctness & functional completeness

No issues found with correctness of the phase wiring or data flow. All seven phases from the PR description are instrumented as claimed. Verified:

  • bm25 / vector via timer.time() inside Promise.all — correct choice; avoids the single-slot conflict that start/stop would have ✅
  • merge, symbol_lookup, ranking, formatting via start/stop — called unconditionally, so they always appear in summary()
  • wall via timer.mark(...) before summary() — correctly recorded last ✅
  • timing added to return object as a strict superset (local-backend.ts:799) ✅

NIT — totalMs is redundant in the log line

  • Evidence: local-backend.ts:175const totalMs = phases.wall ?? Object.values(phases).reduce(...). Since wall is unconditionally marked before summary(), phases.wall is always defined, so totalMs always equals phases.wall. The log emits totalMs=X phases={...,wall:X}, making totalMs a duplicate of wall in the output.
  • Why it matters: Log consumers expecting totalMs to represent the arithmetic sum of phases (to compute overlap savings) will get wall instead. The reduce fallback path is dead.
  • Recommendation: Either (a) compute totalMs as Object.values(phases).reduce(...) excluding wall, or (b) remove totalMs from the log line and document that phases.wall is the end-to-end figure. Not merge-blocking.
  • Confidence: High

NIT — time() misses the synchronous preamble of each async helper

  • Evidence: local-backend.ts:573-575 — JS evaluates this.bm25Search(...) (argument) before calling timer.time(...), so the async function's synchronous pre-await lines execute before t0 = performance.now() is captured in phase-timer.ts:74.
  • Why it matters: The timer slightly under-reports both phases. In practice the preamble is negligible (a handful of lines before the first DB I/O), so this does not materially affect the Pareto analysis the feature is designed for.
  • Recommendation: No code change needed; document the caveat in the time() JSDoc if future maintainers extend the helper to phases with heavier synchronous setup.
  • Confidence: High

2. Code clarity & clean code

No issues found in this area based on the diff and reviewed context.

  • PhaseTimer lives in src/core/search/ — correct placement per ARCHITECTURE.md (Search ranking concern)
  • logQueryTiming follows the exact pattern of logQueryError directly above it (same prefix scheme, same stderr, same file)
  • No dead code from the PR (the totalMs() method is exercised in phase-timer.test.ts:63)
  • No // removed comments, no unused re-exports, no barrel regressions

3. Test coverage & change safety

NON-BLOCKING — Integration test does not independently assert all 7 phases are present

  • Evidence: test/integration/local-backend-calltool.test.ts:108-114 — asserts timing defined, wall is a number, and bm25 ?? vector >= 0. The sequential phases (merge, symbol_lookup, ranking, formatting) are never asserted.
  • Why it matters: If a future refactor silently drops a phase timer call (e.g., wraps the merge block in an early-return path), the integration test would not catch it. The unit tests cover PhaseTimer mechanics exhaustively, but the integration-level wiring of all 7 phases has no assertion.
  • Recommendation: Add expect(typeof result.timing.merge).toBe('number') etc. for the four sequential phases — one line each. Merge-acceptable without this.
  • Confidence: High

NON-BLOCKING — bm25 ?? vector assertion doesn't independently verify both

  • Evidence: local-backend-calltool.test.ts:114expect(result.timing.bm25 ?? result.timing.vector).toBeGreaterThanOrEqual(0). Because bm25 is always defined (it always fires), the ?? vector branch is never exercised by this assertion.
  • Why it matters: If vector were accidentally omitted, this assertion would not catch it.
  • Recommendation: Use expect(result.timing.bm25).toBeGreaterThanOrEqual(0) and expect(result.timing.vector).toBeGreaterThanOrEqual(0) as separate assertions.
  • Confidence: High

Unit tests for PhaseTimer itself are solid: 6 cases covering all API surface including concurrent usage, additive marking, NaN/negative rejection, and auto-stop behaviour. ✅

CI confirms: tsc --noEmit clean, 6513 tests pass across 3 platforms. ✅

4. Performance

No issues found in this area based on the diff and reviewed context.

PhaseTimer overhead per query() call: ~7 performance.now() calls + ~7 Map.set/get operations. Negligible vs the sequential Cypher round-trips in symbol_lookup that dominate the pipeline (up to ~100 DB round-trips for searchLimit=50 results). The pre-existing sequential-per-symbol Cypher pattern is outside scope of this PR; notably, this timer is precisely the tool that will expose it in production.

5. Operational risk

NIT — PR description says "stdout" but code correctly uses stderr

  • Evidence: local-backend.ts:177 uses console.error; PR body says "New GitNexus [query:timing] … lines on stdout". The code is correct (commit f605a26 fixed an initial console.log). The description was not updated after the fix.
  • Why it matters: No runtime impact. Misleads future readers of the PR description into thinking the log lands on stdout, which would corrupt the MCP stdio transport if replicated.
  • Recommendation: Update the PR description for accuracy before merge (cosmetic only).
  • Confidence: High

All other operational risk vectors: ✅

  • Windows/cross-platform: performance.now() and Map are cross-platform. No path separators, no file handles, no stream lifecycle.
  • LadybugDB single-writer: Not touched.
  • Embeddings: Not touched.
  • MCP contracts: query response is a strict superset. group_*, context, impact, detect_changes, rename, cypher are not touched.
  • Staleness (staleness.ts): Not touched.
  • Rollback: Pure code revert, zero side effects.

6. Maintainability

No issues found in this area based on the diff and reviewed context.

  • PhaseTimer is self-contained with clear JSDoc covering all three usage patterns. A future contributor extending the pipeline can drop in timer.start/stop or timer.time with no additional plumbing.
  • No cross-phase DAG coupling: PhaseTimer is local to query() and never touches PipelinePhase or pipeline-phase DAG types.
  • Correctly placed in src/core/search/ (not gitnexus-shared/ — search-pipeline-specific, not a shared type), not duplicated in server/api.ts.

Non-negotiables check:

  • No secrets, tokens, or machine-specific paths ✅
  • No blind find-and-replace renames ✅
  • No symbol renames (additive only) ✅
  • No HIGH/CRITICAL impact unaddressed ✅
  • PR title feat(search): ... — conventional-commit compliant ✅
  • PolyForm license untouched ✅

What I could not fully verify

  1. Whether downstream consumers outside this repo (e.g. eval harness, external integrations parsing query responses) access the response by field and could be affected by the new timing key. Grep found no hits in gitnexus-web/ for process_symbols/definitions as typed consumers; the web UI does not appear to consume the MCP query tool response directly. Low confidence this is a risk.
  2. Accurate bm25/vector phase reporting in production — whether the ~microsecond synchronous preamble miss in timer.time() is perceptible in flame-graph-style aggregation. Almost certainly not, but unverifiable without profiling data.
  3. The logQueryTiming log line's totalMs vs sum(phases) divergence in practice — specifically, whether any log-scraping pipeline built on the documented format will misinterpret totalMs as the sum rather than wall.

Final verdict

Ready to merge — no blocking issues.

The implementation is correct, CI is green across 3 platforms, the MCP contract is strictly preserved, and the PhaseTimer class is a clean, testable abstraction that matches repo conventions. The two nits on test assertions (phases not fully enumerated in the integration test) and the totalMs/wall redundancy are acceptable for an additive observability feature. The PR description's "stdout" claim is stale and should be corrected, but does not affect the code.

@magyargergo

Copy link
Copy Markdown
Collaborator

Nice work! Thanks for you PR!

@magyargergo magyargergo merged commit ac14861 into abhigyanpatwari:main Apr 18, 2026
12 of 13 checks passed
github714801013 pushed a commit to github714801013/GitNexus that referenced this pull request Apr 28, 2026
…abhigyanpatwari#953)

* feat(search): per-phase timing instrumentation for the query pipeline

The eval harness already measures search-pipeline latency per phase,
but the *product* query() tool has no timing visibility. That leaves
production latency opaque:

 - Is BM25 the tail, or vector search?
 - How much Promise.all overlap do concurrent searches actually save?
 - Does symbol_lookup dominate when per-symbol Cypher round-trips pile up?

None of this is answerable from the outside, which blocks the
latency-quality Pareto work tracked in abhigyanpatwari#546 / abhigyanpatwari#553.

Changes:

* New PhaseTimer class at src/core/search/phase-timer.ts.
  Supports three APIs:
    - start(phase) / stop() for sequential phases (per issue spec)
    - mark(phase, durationMs) for pre-measured durations
    - time(phase, promise) to wrap a promise inside Promise.all

  The issue's original spec was sequential-only, which doesn't work
  for BM25 + vector inside Promise.all — the second start() would
  auto-stop the first and only one phase would get timed. The mark()
  and time() variants resolve that without changing the sequential
  API for the other phases.

* local-backend.ts query() instrumented across seven phase markers:
    bm25, vector   (concurrent via timer.time inside Promise.all)
    merge          (RRF reciprocal-rank-fusion)
    symbol_lookup  (per-symbol process + cohesion + content Cypher)
    ranking        (in-memory priority sort)
    formatting     (response object construction + dedup)
    wall           (end-to-end; separate mark so callers can compare
                   sum(phases) vs wall and see Promise.all savings)

* logQueryTiming() helper next to logQueryError(), same console-based
  pattern (repo has no structured logger). Emits
    GitNexus [query:timing] query="..." totalMs=N phases={...}
  to stdout — greppable prefix, JSON-parseable payload, no new deps.

* timing: Record<string, number> added as a top-level field on the
  query() response. Strict superset of the previous shape — existing
  tests only assert field presence, so no regression. Other MCP tools
  use the same top-level-metadata convention (status, row_count,
  warning) rather than a nested _meta wrapper.

Tests:

 - 6 new unit tests for PhaseTimer covering start/stop, implicit
   stop-on-start, additive mark(), Promise.all-safe time(),
   negative/NaN rejection, and totalMs auto-stop.
 - 3 new assertions on the existing query integration test verifying
   timing.wall is a non-negative number and at least one of
   bm25/vector fired.

Verification:
  npx vitest run test/unit/phase-timer.test.ts       -> 6 pass
  npx vitest run test/unit/calltool-dispatch.test.ts -> 65 pass
  npx vitest run test/integration/local-backend-calltool.test.ts -> 18 pass
  npm run test:unit                                   -> 3777 pass
    (4 pre-existing env failures unchanged: skip-git-cli needs
     built dist/, git-utils tmpdir on Windows worktree)
  npx tsc --noEmit                                    -> clean

Scope declined for v1:

 - In-process histogram aggregation — the log line is enough for
   external tooling
 - Pareto curve generation — issue asks to enable it, not generate it
 - Sub-phases of symbol_lookup (process vs cohesion vs content) —
   issue lists them under one bucket; can split later if demand surfaces

Closes abhigyanpatwari#553

* fix(search): route query:timing log to stderr to preserve stdio MCP contract

CI (abhigyanpatwari#953) failed the `query: JSON appears on stdout, not stderr`
e2e test in test/integration/cli-e2e.test.ts with:

  SyntaxError: Unexpected token 'G', "GitNexus [..." is not valid JSON

Root cause: my initial logQueryTiming() in 63fbdc4 used console.log,
which writes to stdout. The MCP stdio transport uses stdout
exclusively for JSON-RPC responses (abhigyanpatwari#324), and the CLI e2e test
guards that contract by asserting stdout parses as JSON on every
tool invocation. The "GitNexus [query:timing] ..." line was
interleaving with the response JSON and breaking the parse.

Fix: route logQueryTiming through console.error instead. stderr is
the correct channel for human-readable diagnostics and it is what
the sibling logQueryError already uses for the same reason. The log
line format is otherwise unchanged -- still greppable, still
JSON-parseable payload.

Verification (local, with dist built):
  npx vitest run test/integration/cli-e2e.test.ts -t "query: JSON"
    -> now passes (was failing across ubuntu/windows/macos in CI)
  npx tsc --noEmit                                  -> clean
  Two unrelated pre-existing failures on non-git
  directory handling remain (same on upstream/main).

Closes the CI regression introduced in 63fbdc4.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[P1] Add Per-Phase Timing Instrumentation to Search Pipeline

2 participants