feat: multi-branch indexing and branch-scoped querying (#2106) #2137

Merged
magyargergo merged 23 commits into abhigyanpatwari:main from magyargergo:feat/2106-multi-branch-indexing
Jun 10, 2026

@magyargergo magyargergo commented Jun 10, 2026

Collaborator

Summary

Closes #2106. Makes GitNexus indexing branch-aware. Today, re-analyzing after a
git checkout silently destroys the previous branch's graph (the single
.gitnexus/lbug is wiped and rebuilt). This PR gives each branch its own index
while keeping single-branch behavior backward compatible.

  • Per-branch indexing without overwrite — a non-primary checked-out branch
    is indexed into <repo>/.gitnexus/branches/<slug>/ (its own LadybugDB). The
    primary (first-indexed) branch keeps the flat .gitnexus/{lbug,meta.json}
    layout. git checkout feature && gitnexus analyze no longer clobbers main's index.
  • Branch-scoped querying — an optional branch parameter on the MCP tools
    (query, cypher, context, impact, detect_changes, rename,
    route_map, tool_map, shape_check, api_impact) and a --branch flag on
    the CLI (analyze, query, context, impact, cypher, detect-changes).
    It swaps the resolved lbugPath; the connection pool already keys by
    lbugPath, so per-branch DBs isolate with zero pool-layer changes.
  • Clear separation + visibility — the registry gains additive optional
    branch? / branches?[] on the one entry per path; list and status
    surface branch indexes; list_repos advertises a branches sub-field.

Design notes

  • Headline behavior change: the no-flag analyze path is now branch-aware.
    A single-branch user who never switches branches between analyses sees zero
    change (flat layout, no branch field for detached HEAD / non-git). A user
    who switches branches now gets separate, non-destructive indexes, which is
    exactly what #2106 (Support multi-branch indexing and cross-branch search)
    asks for.
  • Minimal schema change: additive-optional registry/meta fields only; no
    node-PK change (each branch has its own DB) and no
    INCREMENTAL_SCHEMA_VERSION bump. Existing indexes need no re-analyze.
  • Distinct from --default-branch: the existing --default-branch flag is
    the cosmetic base_ref feature (generated AGENTS.md/CLAUDE.md). The new
    --branch selects the index slot and does not read the .gitnexusrc branch
    key.
  • Detached HEAD / CI maps to the flat index (never a branch literally named HEAD).
  • Branch slugs are sanitizeRepoName(ref) + a short sha256 of the raw ref,
    so feature/x and feature_x can never collide; path-traversal is contained
    (verified against adversarial inputs).
  • Guard: analyze --branch X is refused when X is not the checked-out
    branch — analyze indexes the working tree, so labeling tree Y as branch X
    would corrupt X's index.
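To illustrate the slug rule above, here is a rough sketch of "sanitized name plus a short sha256 of the raw ref". The `sanitize` function is a stand-in (the real `sanitizeRepoName` rules are not shown in this PR), and both the 8-character hash length and the `-` separator are assumptions.

```typescript
import { createHash } from "node:crypto";

// Stand-in for sanitizeRepoName: collapse anything outside a safe set to "_"
// and neutralize leading dots. The real rules may differ.
function sanitize(ref: string): string {
  return ref.replace(/[^A-Za-z0-9._-]+/g, "_").replace(/^\.+/, "_");
}

// Hypothetical slug builder: readable prefix + hash of the *raw* ref.
function branchSlugSketch(ref: string, hashLen = 8): string {
  const digest = createHash("sha256").update(ref, "utf8").digest("hex");
  return `${sanitize(ref)}-${digest.slice(0, hashLen)}`;
}
```

Hashing the raw ref rather than the sanitized one is what separates `feature/x` from `feature_x`: both sanitize to the same prefix, but their digests differ. Since the sanitized part contains no path separators, the slug stays a single directory name.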

Scope deferred to follow-ups (intentional)

  • Cross-branch search / diff (marked optional in the issue) — the layout
    makes this a fan-out via the existing GroupService RRF.
  • HTTP REST (/api) branch parity — MCP + CLI ship here; REST is a fast-follow.
  • Branch lifecycle / GC (clean --branch, orphan reaping, per-repo cap).
  • Cross-branch shared-cache prune union (correctness-safe; perf optimization).

Test plan

  • New unit tests: getCurrentBranch; branchSlug (collision + traversal
    containment); resolveBranchPlacement (all precedence branches); registry
    branch-nesting; tool-schema branch param; MCP resolveRepo/callTool branch
    routing (incl. legacy-entry + un-indexed-branch error); CLI --branch help +
    i18n; list/status branch rendering (incl. detached HEAD + stale).
  • New integration test (multi-branch-analyze.test.ts): full two-branch
    analyze proving the feature index coexists with an untouched primary, and the
    registry nests the branch.
  • tsc clean; gitnexus/CHANGELOG.md untouched (release-owned).

Known limitation

~/.gitnexus/registry.json is global: an older GitNexus binary that
upserts a path entry will drop the branches[] it doesn't understand. New
fields are otherwise ignored by old binaries (additive).

Residual Review Findings

The pre-merge multi-persona review (correctness/adversarial/security/api-contract/
reliability/maintainability/testing/project-standards) surfaced these deferred,
non-blocking items (the actionable P1s were fixed in-PR):

  • P2 — Primary inversion. Indexing a non-default branch before the default
    one makes that branch own the flat slot; a later main analyze is routed to a
    sub-dir. Deliberate (first-indexed claims primary; origin/HEAD is unreliable
    in CI/clones) and non-destructive — both indexes coexist. Follow-up: optionally
    let a default-branch analyze reclaim the flat slot.
  • P2 — Cross-branch cache churn. The shared content-addressed parse-cache/
    and durable parsedfile store are pruned to the current run's keys, so
    alternating analyze between two large divergent branches re-parses on each
    switch. Correctness-safe (content-addressed); follow-up: union live keys across
    branches or per-branch shards.
  • P3 — Branch registry growth / orphan reaping. branches[] and
    branches/<slug>/ dirs are never pruned when a branch is deleted. Follow-up:
    gitnexus clean --branch / TTL / cap (the deferred branch-GC work).
  • P3 — Auto-detected branch label normalization. An auto-detected git ref
    that validateBranchName would reject (exotic refs) is stored verbatim, so a
    later validated --branch lookup may not round-trip. Follow-up: route the
    auto-detected label through sanitizeDetectedBranch.
  • P2 — repo-manager.ts size. Pre-existing >1k-line file; the new branch
    primitives could later extract to branch-index.ts.
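For the cache-churn item, a sketch of what "union live keys across branches" could look like at prune time. The function name and the array-based shapes are hypothetical; this thread does not show how cache keys are stored. Treating a branch with no recorded key list as "retain everything" is an assumption about what a retention-biased prune would do.

```typescript
// Hypothetical prune helper: returns the cached keys that are safe to delete.
function keysToPrune(
  cachedKeys: readonly string[],
  currentRunKeys: readonly string[],
  otherBranchKeys: ReadonlyArray<readonly string[] | undefined>,
): string[] {
  // If any branch has no recorded key list, err on the side of keeping data.
  if (otherBranchKeys.some((keys) => keys === undefined)) return [];
  // Live set = this run's keys plus every other branch's recorded keys.
  const live = new Set<string>(currentRunKeys);
  for (const keys of otherBranchKeys) {
    if (keys) keys.forEach((k) => live.add(k));
  }
  return cachedKeys.filter((k) => !live.has(k));
}
```

With content-addressed keys, keeping an extra entry only costs disk space, so a conservative prune cannot affect correctness.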

🤖 Generated with Claude Code


✅ Residual review findings — all resolved (follow-up commits)

The deferred residuals (from the autofix review + Codex tri-review) were planned
(docs/plans/2026-06-10-002-…), pressure-tested by a deepening pass, and fixed
one commit per finding:

  • R5 only trust a non-empty-string flatMeta.branch (corrupt-meta misroute guard)
  • R8 warn when the default branch isn't the primary (warn-only — the deepening rejected the risky live-DB relocation for a P3)
  • R4 resolve --branch <primary> on a legacy unstamped flat index
  • R7 gitnexus clean --branch <name> (resolves via the recorded branches[] summary)
  • R3 evict orphaned branch pools on unregister/clean --branch
  • R6 persist per-branch cacheKeys + union them at prune (fail-safe-toward-retention) — a branch switch no longer evicts another branch's shards
  • R1 normalize the auto-detected branch via sanitizeDetectedBranch
  • R2 skip the AGENTS.md base_ref refresh for a non-primary branch fast path
  • R9 atomic writeRegistry (tmp+rename) + re-read-before-write to narrow the registry race
  • R10 extract the branch primitives to src/storage/branch-index.ts

The deepening caught a non-implementable R6 design (chunk keys aren't in meta.json) and a disproportionate R8 (live-DB move) before any code was written. All changes additive; INCREMENTAL_SCHEMA_VERSION unchanged; single-branch behavior backward compatible; gitnexus/CHANGELOG.md untouched.
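As an illustration of R9, the tmp+rename and re-read-before-write pattern might look roughly like this. `updateRegistryAtomic` and the `Registry` shape are hypothetical (the real `writeRegistry` and registry schema are not shown here), and falling back to an empty registry on a missing or unreadable file is an assumption.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type Registry = Record<string, unknown>; // placeholder shape

// Hypothetical helper: re-read, apply the update, then swap the file in place.
function updateRegistryAtomic(file: string, update: (current: Registry) => Registry): void {
  // Re-read just before writing so another writer's recent changes are kept.
  let current: Registry = {};
  try {
    current = JSON.parse(fs.readFileSync(file, "utf8")) as Registry;
  } catch {
    // Missing or unreadable file: start from an empty registry (assumption).
  }
  // Write to a sibling temp file, then rename over the target. A rename within
  // one directory replaces the target in a single step on POSIX filesystems.
  const tmp = path.join(path.dirname(file), `.${path.basename(file)}.${process.pid}.tmp`);
  fs.writeFileSync(tmp, JSON.stringify(update(current), null, 2), "utf8");
  fs.renameSync(tmp, file);
}
```

This narrows the race window rather than closing it: two processes can still interleave between the read and the rename, which is consistent with the "narrow the registry race" wording above.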

- guard analyze against --branch != checked-out branch (prevents writing one
  branch's working tree into another branch's index slot)
- fix branch-handle pool reinit thrash (track observed indexedAt by lbugPath,
  since applyBranchScope returns fresh handles)
- remove dead resolveRefToCommit helper (staleness uses HEAD vs branch meta)
- RepoListing.branches -> Omit<BranchSummary,'stats'> for type cohesion
- add tests: branchSlug traversal containment, --branch mismatch reject,
  callTool branch threading, legacy-entry branch routing, status detached/stale
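The first guard in this commit can be pictured with a small sketch. The function name and error text are hypothetical, and allowing `--branch` on a detached HEAD is an assumption inferred from the review notes rather than something this thread states outright.

```typescript
// Hypothetical guard: refuse to label the working tree with another branch's name.
function assertBranchMatchesCheckout(
  requested: string | undefined, // value of --branch, if given
  checkedOut: string | null,     // null = detached HEAD
): void {
  if (requested === undefined) return; // no flag: branch is auto-detected
  if (checkedOut !== null && checkedOut !== requested) {
    throw new Error(
      `--branch ${requested} does not match the checked-out branch ${checkedOut}; ` +
        `analyze indexes the working tree`,
    );
  }
}
```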

vercel Bot commented Jun 10, 2026

@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

- P1 data-loss: a detached-HEAD re-analyze (CI's actions/checkout default) no
  longer strips the primary's meta.branch stamp; preserve it so a later branch
  analyze cannot claim & overwrite the flat/primary index. +cascade integration test
- P2: capture validateBranchName's trimmed return for --branch so a
  whitespace-padded value no longer false-rejects on-branch or ghosts an index
- F1: on a lost/rebuilt registry, a branch run reconstructs the primary
  top-level entry from the flat meta, not the feature branch's meta

@magyargergo magyargergo left a comment

Collaborator Author

Tri-review digest (3 methods)

Methods: GitNexus swarm + Compound-Engineering personas + Codex. Engine breakdown: 6 Claude lanes (risk, test/CI, security-boundary, correctness, adversarial, performance) + Codex — the one independent engine, live. Reviewed the post-fix diff (after main was merged in). The 3 actionable findings below were found AND fixed in this session (commit 62bb93e0) with passing tests — this is a record + residuals, not open blockers.

🔴 P1 — data-loss on detached-HEAD re-analyze (FIXED)

A detached-HEAD re-analyze (exactly what CI's actions/checkout does by default) stripped the primary's meta.branch stamp; a later branch analyze then saw an "unowned" flat slot and overwrote the primary index.

  • Trigger: index main → index feature → detached re-analyze (stamp wiped) → index feature → main's graph destroyed.
  • Found by the adversarial lane, corroborated by Codex on the same meta.branch stamping (Codex+Claude = the strong signal).
  • Fix: preserve an existing stamp when the label is null — branch: branchLabel ?? existingMeta?.branch — plus a cascade integration test that reproduces and pins it. [reproduced]
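The one-line fix quoted above can be seen in context with a small sketch. `IndexMeta` and `nextMeta` are hypothetical names; only the `branchLabel ?? existingMeta?.branch` expression and the `branch` / `indexedAt` fields come from this thread.

```typescript
interface IndexMeta {
  branch?: string;   // stamp marking which branch owns this index
  indexedAt: string;
}

// Hypothetical meta builder for a re-analyze of an existing index.
function nextMeta(
  existingMeta: IndexMeta | undefined,
  branchLabel: string | null, // null on detached HEAD
  indexedAt: string,
): IndexMeta {
  // Before the fix, a null label would have dropped the stamp entirely.
  const branch = branchLabel ?? existingMeta?.branch;
  return branch === undefined ? { indexedAt } : { branch, indexedAt };
}
```

With the stamp preserved, a detached-HEAD run leaves the flat slot marked as owned, so a later branch analyze has no reason to treat it as free.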

🟠 P2 — --branch whitespace (FIXED)

validateBranchName trims but its return was discarded; a padded --branch " feature" false-rejected while on-branch and ghosted an index when detached. Found by correctness + adversarial (consistent across Claude personas). Fix: forward the trimmed return. [code-read]
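A short sketch of the discarded-return bug and its fix. `validateBranchNameSketch` is a stand-in with a simplified rule set loosely based on git's ref-name restrictions; the real `validateBranchName` is not shown in this thread, and `isOnRequestedBranch` is a hypothetical caller.

```typescript
// Stand-in validator: trims, rejects obviously invalid names, returns the clean value.
function validateBranchNameSketch(raw: string): string {
  const name = raw.trim();
  if (name.length === 0 || /[\s~^:?*\[\\]/.test(name) || name.includes("..")) {
    throw new Error(`invalid branch name: ${JSON.stringify(raw)}`);
  }
  return name;
}

// Hypothetical caller showing the fix: use the returned value, not the raw flag.
function isOnRequestedBranch(rawFlag: string, checkedOut: string): boolean {
  // Buggy shape: validateBranchNameSketch(rawFlag); return rawFlag === checkedOut;
  const branch = validateBranchNameSketch(rawFlag);
  return branch === checkedOut;
}
```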

🟠 P2 — registry-loss seed (FIXED)

If registry.json is lost/rebuilt, a branch run seeded the top-level entry from the feature branch's meta, so --branch <primary> could not resolve. Codex (HIGH) + correctness. Fix: reconstruct the primary top-level from the flat meta when no entry exists. [code-read]

✅ Validated correct (credit to the change)

The reviews actively refuted several suspected issues:

  • Path-traversal via branch names is contained: branchSlug = sanitizeRepoName(ref) + sha256 of the raw ref; security verified ~19 traversal payloads + a containment test.
  • The normal two-branch sequence does not overwrite the primary (integration test).
  • The pool reinit/staleness fix is correct: lastObservedIndexedAt stays in lockstep with initializedRepos; real rebuilds are still detected.
  • applyBranchScope uses exact === (no substring/prefix mis-resolution); getCurrentBranch has no injection.
  • Test/CI lane rated the coverage "superior to a typical feature PR."

🔵 Deferred residuals (documented, non-blocking)

  • P3 — AGENTS.md fast-path (Codex): a non-primary branch's already-up-to-date analyze can still touch AGENTS.md via refreshBaseRefLine (the gate lives in runFullAnalysis, not the CLI fast path — src/cli/analyze.ts). Needs runFullAnalysis to surface the placement.
  • Primary-inversion (first-indexed non-default branch claims the flat slot); cross-branch shared-cache prune churn (perf only — content-addressed → correct); branches[] unbounded growth / no clean --branch; route the auto-detected branch label through sanitizeDetectedBranch; branch-pool eviction on clean/unregister (bounded, correctness-safe). All noted in the PR description.

CI

Running on the pushed branch; the Vercel "fail" is fork deploy-auth, not code. Locally: tsc clean, full touched-area unit suite + 2 integration tests green.


Independence note: two of three methods are Claude under different persona prompts (correlated priors); only Codex is a different engine. The strong signals are Codex+Claude agreement — persona-only agreement is "consistent," not independent confirmation.

Automated multi-tool digest — verify before acting.


github-actions Bot commented Jun 10, 2026

Contributor

CI Report

All checks passed

Pipeline Status

| Stage | Status | Details |
| --- | --- | --- |
| ✅ Typecheck | success | tsc --noEmit |
| ✅ Tests | success | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
| --- | --- | --- | --- | --- |
| 10967 | 10951 | 0 | 16 | 549s |

✅ All 10951 tests passed

16 test(s) skipped:
  • COBOL pipeline benchmark > scales with file count
  • C++ ADL emit benchmark > emit phase scales sub-quadratically with co-scaled files and sites
  • C++ pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • Go structural interface detection benchmark > scales linearly with interface × struct count
  • Go structural interface detection split-phase benchmark > separates index-build and detection time
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • Vue pipeline benchmark > scales with component count
  • run.cjs direct-exec entrypoint (#1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 75.22% 35771/47549 N/A% 🟢 ███████████████░░░░░
Branches 63.02% 22142/35133 N/A% 🟢 ████████████░░░░░░░░
Functions 80.96% 3859/4766 N/A% 🟢 ████████████████░░░░
Lines 79.03% 32340/40921 N/A% 🟢 ███████████████░░░░░

📋 View full run · Generated by CI

@magyargergo magyargergo merged commit 7eaeb0a into abhigyanpatwari:main Jun 10, 2026
27 of 28 checks passed
@magyargergo magyargergo deleted the feat/2106-multi-branch-indexing branch June 10, 2026 09:24

Development

Successfully merging this pull request may close these issues.

Support multi-branch indexing and cross-branch search
