Skip to content

feat(group): bridge.lbug storage + contract matching expansion (1/4 of #606 split)#795

Merged
magyargergo merged 2 commits into
abhigyanpatwari:mainfrom
ivkond:feat/group-bridge-storage
Apr 11, 2026
Merged

feat(group): bridge.lbug storage + contract matching expansion (1/4 of #606 split)#795
magyargergo merged 2 commits into
abhigyanpatwari:mainfrom
ivkond:feat/group-bridge-storage

Conversation

@ivkond

@ivkond ivkond commented Apr 11, 2026

Copy link
Copy Markdown
Contributor

Part 1 of 4 in the split of #606 per @magyargergo's request.

Closes #791.

What changed

Adds the LadybugDB-backed bridge storage infrastructure and extends the contract matching algorithm with wildcard support. All changes are additive — storage.ts, sync.ts, service.ts, cli/group.ts, and mcp/tools.ts are left on their current upstream `main` versions and will migrate to the new bridge in follow-up PRs.

New (844 LOC prod):

  • `gitnexus/src/core/group/bridge-db.ts` — atomic write-to-temp with `retryRename` for Windows EBUSY/EPERM, per-item write tolerance via `WriteBridgeReport`, `findContractNode` with three-tier symbol lookup
  • `gitnexus/src/core/group/bridge-schema.ts` — schema DDL
  • `gitnexus/src/core/group/normalization.ts` — contract ID canonicalization + `dedupeContracts` / `dedupeCrossLinks` helpers

Modified (+137 LOC prod):

  • `matching.ts` — adds `runWildcardMatch` for `grpc::Service/*` wildcard consumers, `buildProviderIndex`, canonical gRPC ID handling
  • `types.ts` — `MatchType` gains `'wildcard'`; new `BridgeHandle` and `BridgeMeta` interfaces

New tests (658 LOC):

  • `bridge-db.test.ts`, `bridge-db-edge.test.ts` — core write/read round trip, `WriteBridgeReport` shape, dropped-links counter, `retryRename` behavior on EBUSY/ENOENT/EPERM/EACCES, edge cases

Modified tests (+225 LOC):

  • `matching.test.ts` — wildcard consumer matching, gRPC canonical ID handling, same-service guard

Self-review fixes folded in

Carried forward from the original #606 self-review (see commit body for full list):

  • `writeBridge` try/finally handle lifecycle + `handleClosed` sentinel
  • `openBridgeDbReadOnly` partial-handle cleanup
  • `writeBridgeMeta` uses `retryRename`
  • `retryRename` unit tests (was zero coverage)
  • Per-item try/catch around every CREATE loop
  • Dropped cross-link counter

How to verify

```bash
cd gitnexus
npx tsc --noEmit
npx vitest run test/unit/group/bridge-db.test.ts --pool=forks
npx vitest run test/unit/group/bridge-db-edge.test.ts --pool=forks
npx vitest run test/unit/group/matching.test.ts --pool=forks
```

Pre-commit hook should run clean.

Risk / rollback

Low. All new code sits under `src/core/group/` in new files plus a minimal `+16/-1` diff to `types.ts` and `+136/-0` diff to `matching.ts` (both purely additive). No existing callers reference the new APIs — the PRs that wire them in come later in the split chain. Rollback = `git revert`; no state introduced, no schema migration triggered.

Scope discipline (per `GUARDRAILS.md`)

Dependencies

🤖 Co-authored with Claude Code — findings and fixes written by me, Claude ran the test/edit loop. I reviewed and signed off on every change before pushing.

Part 1 of 4 in the split of #606 (ticket: #791, closes #790 with a
revised plan per @magyargergo's request).

## What changed

Adds the LadybugDB-backed bridge storage infrastructure and extends
the contract matching algorithm with wildcard support. All changes are
additive: storage.ts, sync.ts, service.ts, cli/group.ts, mcp/tools.ts
are left on their upstream main versions and will migrate to the new
bridge in follow-up PRs (#792, #793, #794).

### Files

**New (844 LOC prod):**
- `gitnexus/src/core/group/bridge-db.ts` — atomic write-to-temp with
  `retryRename` for Windows EBUSY/EPERM, per-item write tolerance via
  `WriteBridgeReport`, `findContractNode` with three-tier symbol
  lookup (uid → filePath+name → filePath)
- `gitnexus/src/core/group/bridge-schema.ts` — schema DDL
- `gitnexus/src/core/group/normalization.ts` — contract ID
  canonicalization + `dedupeContracts` / `dedupeCrossLinks` helpers
  used by both matching and bridge write

**Modified (+137 LOC prod):**
- `gitnexus/src/core/group/matching.ts` — adds `runWildcardMatch` for
  `grpc::Service/*` wildcard consumers, `buildProviderIndex` helper,
  and canonical gRPC ID handling in `normalizeContractId`
- `gitnexus/src/core/group/types.ts` — `MatchType` gains `'wildcard'`;
  new `BridgeHandle` and `BridgeMeta` interfaces

**New tests (658 LOC):**
- `gitnexus/test/unit/group/bridge-db.test.ts` — core write/read round
  trip, `WriteBridgeReport` shape, dropped-links counter, retryRename
  behavior on EBUSY/ENOENT/EPERM/EACCES
- `gitnexus/test/unit/group/bridge-db-edge.test.ts` — edge cases
  (malformed meta, missing contract nodes, concurrent access)

**Modified tests (+225 LOC):**
- `gitnexus/test/unit/group/matching.test.ts` — wildcard consumer
  matching, gRPC canonical ID handling, same-service guard

### Self-review fixes folded in

Carried forward from the original #606 self-review:
- `writeBridge` try/finally handle lifecycle + `handleClosed` sentinel
- `openBridgeDbReadOnly` partial-handle cleanup
- `writeBridgeMeta` uses `retryRename` for Windows consistency
- `retryRename` unit tests (was zero coverage)
- Per-item try/catch around every CREATE loop so one malformed contract
  doesn't abort the whole write
- Dropped cross-link counter (`linksDroppedMissingNode`)

### Why now

magyargergo asked for the #606 PR to be split so we can iterate with
confidence (#606 (comment)).
This is the foundational layer — pure infra, no user-facing surface,
no callers of the new APIs in this PR. Later PRs wire it in.

### How to verify

- `cd gitnexus && npx tsc --noEmit`
- `cd gitnexus && npx vitest run test/unit/group/bridge-db.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/bridge-db-edge.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/matching.test.ts --pool=forks`
- Pre-commit hook runs clean

### Risk / rollback

**Low.** All new code sits under `src/core/group/` in new files plus a
minimal `+16/-1` diff to `types.ts` and a `+136/-0` diff to `matching.ts`
(both purely additive). No existing callers reference the new APIs
(bridge-db, openBridgeOrFallback, runWildcardMatch) — the PRs that wire
them in come later in the split chain. Rollback = `git revert` of the
merge commit; no state introduced, no schema migration triggered.

### Scope discipline (per GUARDRAILS.md)

- Only the 8 files listed above are touched; no drive-by refactors
- No CI/release/security config changes
- No secrets, tokens, or machine-specific paths
- Content is lifted from the #606 branch which already passed CI 11/11
  green on `d15b8cb` (before the split)

### Dependencies

- **Base:** `main` (no dependencies on other split PRs)
- **Blocks:** extractor expansion (#792), sync pipeline (#793),
  cross-impact feature (#794)
- **Related ticket:** #791

Co-authored-by: Claude <noreply@anthropic.com>
@vercel

vercel Bot commented Apr 11, 2026

Copy link
Copy Markdown

@ivkond is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@magyargergo

Copy link
Copy Markdown
Collaborator

@claude is this pr ready for production? Is this following clean code principles? Are there any performance bottlenecks? Is the code structured in a way that will unlock easy contributions in the future?

@github-actions

github-actions Bot commented Apr 11, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 3m 4s —— View job


PR Review — feat(group): bridge.lbug storage + contract matching expansion

  • Gather context and understand request
  • Read changed files
  • Analyze production readiness
  • Analyze clean code principles
  • Identify performance bottlenecks
  • Evaluate contribution-friendliness

Overall Verdict

Conditionally ready for production. The architectural foundations are solid — atomic swap, backup recovery, per-item fault tolerance, and schema versioning are all correctly implemented. There are no blocking bugs, but two medium-severity issues should be fixed before this ships, and a handful of minor items worth tracking.


Production Readiness

Strengths:

  • ✅ Atomic write-to-temp-then-rename pattern prevents partial bridge state
  • .bak recovery path in openBridgeDbReadOnly handles interrupted swaps
  • try/finally + handleClosed sentinel correctly prevents resource leaks even on mid-write exceptions
  • WriteBridgeReport surfaces partial-success state without aborting the sync
  • retryRename with exponential backoff correctly handles Windows EBUSY/EPERM/EACCES
  • ✅ Schema version gate prevents incompatible reads with a clean fallback
  • dedupeContracts / dedupeCrossLinks prevents duplicate inserts before DB write

Medium issue — openBridgeDbReadOnly bak recovery uses bare fsp.rename instead of retryRename (bridge-db.ts:416):

// Current — inconsistent with rest of file on Windows
await fsp.rename(bakPath, dbPath);

// Should be
await retryRename(bakPath, dbPath);

The very scenario this recovery path exists for (interrupted write on Windows) is also the scenario where the rename of the .bak file is most likely to get an EBUSY. Fix this →

Medium issue — ensureBridgeSchema detects existing-table errors by message substring (bridge-db.ts:34):

if (!msg.includes('already exists')) throw err;

This is fragile: a LadybugDB version bump that rephrases its error message would silently break idempotent schema creation (or worse, swallow real errors). If the DDL syntax supports IF NOT EXISTS, that's the clean fix. If not, at minimum the matched string should be a named constant or the LadybugDB error code (if one exists) should be used. Fix this →

Minor — queryBridge could panic if LadybugDB returns an empty array (bridge-db.ts:52):

const result = Array.isArray(queryResult) ? queryResult[0] : queryResult;
// queryResult[0] is `undefined` if array is empty → .getAll() throws

Worth adding a guard: const result = Array.isArray(queryResult) ? (queryResult[0] ?? queryResult) : queryResult; — or confirming with a LadybugDB test that the driver never returns an empty top-level array.


Clean Code

Strengths:

  • ✅ Module boundaries are coherent and purposeful (bridge-db = I/O, bridge-schema = DDL, normalization = dedup logic, matching = algorithms)
  • ✅ Function names are descriptive and idiomatic
  • ✅ The contractRichness scoring approach in normalization.ts is elegant
  • ✅ Comments on gRPC canonicalization (matching.ts:42–51) are genuinely helpful — rare and appreciated
  • MAX_SAMPLE_ERRORS constant prevents unbounded report growth

Issue — findContractNode is a 35-line async closure buried inside writeBridge (bridge-db.ts:278-313):

This is the most significant clean code concern. A 200+ line function containing a 35-line inner function that does its own multi-step DB lookup is hard to read and impossible to test in isolation. Extracting it as a module-private function findContractNode(handle, repo, role, symbolUid, filePath, symbolName) would make the write loop much cleaner and unlock targeted unit tests. Fix this →

Minor — magic numbers in contractRichness (normalization.ts:19-26):
The weights (3, 2, 2, 2, 1, 1) carry implicit policy about what makes a contract "richer" but have no names or comments explaining the rationale. A brief comment or named constants would help contributors understand the priority ordering without reverse-engineering it.

Minor — BridgeHandle._db and ._conn are typed unknown but cast everywhere:
The underscore convention signals "internal" but TypeScript doesn't enforce it — any consumer can cast and use the raw DB handle. Consider a module-boundary approach (only bridge-db.ts imports from @ladybugdb/core, everyone else uses BridgeHandle), which the code already does in practice. The typing is fine for now.


Performance Bottlenecks

Medium — N+1 query pattern in findContractNode (bridge-db.ts:315-363):

For each cross-link, findContractNode is called twice (from + to), and each call issues up to 3 sequential DB round-trips (UID → filePath+symbolName → filePath-only). In the worst case with M cross-links: up to 6M database queries. For the current scale (hundreds of contracts) this is likely fine, but at thousands of cross-links across large groups it will be the bottleneck.

A caching layer (Map<string, string> keyed on lookup params, built once before the link loop) would reduce this to O(contracts) queries total instead of O(links × 3). This is worth a TODO comment if not fixing now. Fix this →

Minor — sequential per-item inserts without batching:
Each contract and snapshot is inserted with a separate queryBridge call (N round-trips for N contracts). If LadybugDB supports UNWIND or multi-row inserts, batching in groups of e.g. 100 would substantially speed up large syncs. Worth a // TODO: batch when LadybugDB supports UNWIND comment.

Minor — queryBridge creates a new prepared statement per parameterized call without explicit close:
Prepared statements may hold resources depending on LadybugDB internals. The current pattern (prepare → execute → discard) is common but worth verifying against the LadybugDB docs that statements are GC'd correctly.


Contribution-Friendliness

Strengths:

  • ✅ Clear module boundaries make it easy to extend without touching unrelated code
  • ✅ Adding a new contract type matcher = add a case in normalizeContractId + a match in findMatchingKeys
  • WriteBridgeReport is extensible — new failure categories = new counter field
  • BRIDGE_SCHEMA_VERSION establishes the right pattern for future migrations
  • ✅ Tests are real integration tests with temp dirs, not brittle mocks (good model for contributors)

Issue — makeContract test helper is copy-pasted between bridge-db.test.ts and bridge-db-edge.test.ts (test files, lines 120–131 and 13–24):
Any change to the default contract shape requires updating two files. Extracting to a shared test/unit/group/fixtures.ts would fix this and serve future test files in the split chain. Fix this →

Minor — findContractNode is untestable in isolation (see clean code note above). Contributors who want to improve the three-tier symbol lookup logic have no unit test surface for it.

Minor — no migration guide in BRIDGE_SCHEMA_VERSION (bridge-schema.ts:6):
The version gate returns null on mismatch (triggering re-sync), which is a valid V1 strategy. But the next contributor who bumps this constant won't know that — a brief comment explaining the contract ("bumping this forces a full re-sync; add a migration path in a future PR if the data volume warrants it") would prevent confusion.


Summary Table

Area Status Blockers
Production readiness ✅ Near-ready fsp.renameretryRename in bak recovery path
Clean code ✅ Good Extract findContractNode out of writeBridge
Performance ⚠️ Adequate for current scale N+1 queries in findContractNode — needs cache or TODO for growth
Contribution-friendliness ✅ Good Deduplicate makeContract test fixture

The two medium issues (bak recovery rename + findContractNode extraction) are the only items I'd want addressed before merge. The rest can land as follow-up issues.

@github-actions

github-actions Bot commented Apr 11, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
6039 5942 0 97 221s

✅ All 5942 tests passed

97 test(s) skipped — expand for details
  • Swift MethodExtractor > isTypeDeclaration > recognizes class_declaration
  • Swift MethodExtractor > isTypeDeclaration > recognizes protocol_declaration
  • Swift MethodExtractor > isTypeDeclaration > rejects import_declaration
  • Swift MethodExtractor > visibility > extracts public method
  • Swift MethodExtractor > visibility > extracts private method
  • Swift MethodExtractor > visibility > defaults to internal when no modifier
  • Swift MethodExtractor > protocol methods > marks protocol method as abstract
  • Swift MethodExtractor > static and class methods > detects static func as isStatic
  • Swift MethodExtractor > static and class methods > detects class func as isStatic
  • Swift MethodExtractor > parameters > extracts parameters with types and default values
  • Swift MethodExtractor > return type > extracts return type from -> annotation
  • Swift MethodExtractor > annotations > extracts @objc attribute
  • Swift MethodExtractor > isFinal > detects final func
  • Swift MethodExtractor > isFinal > is false when not final
  • Swift MethodExtractor > isAsync > detects async func
  • Swift MethodExtractor > isOverride > detects override method
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift explicit init inference uses lookupClassByName
  • buildTypeEnv > constructor inference (Tier 1 fallback) > lookupClassByName regression coverage > Swift lookupClassByName regression coverage > Swift cross-file constructor inference does not bind plain functions
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
  • Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
  • Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
  • Swift self resolution > detects User and Repo classes, each with a save function
  • Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
  • Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
  • Swift parent resolution > emits EXTENDS edge: User → BaseModel
  • Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
  • Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
  • Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
  • Swift return type inference > detects User class and getUser function
  • Swift return type inference > detects save function on User (Swift class methods are Function nodes)
  • Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
  • Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
  • Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
  • Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()
  • Swift implicit imports (cross-file visibility) > detects UserService class in Models.swift
  • Swift implicit imports (cross-file visibility) > resolves UserService() constructor call across files (no explicit import)
  • Swift implicit imports (cross-file visibility) > resolves service.fetchUser() member call across files
  • Swift implicit imports (cross-file visibility) > creates IMPORTS edges between files in the same module
  • Swift extension deduplication > detects Product class
  • Swift extension deduplication > resolves Product() constructor despite extension creating duplicate class node
  • Swift extension deduplication > resolves product.save() to Product.swift (primary definition)
  • Swift constructor call fallback (no new keyword) > resolves OCRService() as constructor call across files
  • Swift constructor call fallback (no new keyword) > resolves ocr.recognize() member call via constructor-inferred type
  • Swift export visibility (internal vs private) > resolves PublicService() constructor across files
  • Swift export visibility (internal vs private) > resolves internalHelper() across files (internal = module-scoped)
  • Swift if let / guard let binding resolution > detects User and Repo classes
  • Swift if let / guard let binding resolution > resolves user.save() inside if-let to User#save
  • Swift if let / guard let binding resolution > resolves repo.save() inside guard-let to Repo#save
  • Swift if let / guard let binding resolution > user.save() in if-let does NOT resolve to Repo#save
  • Swift await / try expression unwrapping > resolves user.save() via await fetchUser() return type
  • Swift await / try expression unwrapping > resolves repo.save() via try parseRepo() return type
  • Swift await / try expression unwrapping > detects fetchUser and parseRepo as functions
  • Swift for-in loop element type inference > detects User and Repo classes
  • Swift for-in loop element type inference > creates implicit import edges between files
  • Swift field-type resolution > detects classes and their properties
  • Swift field-type resolution > emits HAS_PROPERTY edges from class to field
  • Swift field-type resolution > resolves field-chain call user.address.save() → Address#save
  • Swift field-type resolution > emits ACCESSES edges for field reads in chains
  • Swift field-type resolution > populates field metadata (visibility, declaredType) on Property nodes
  • Swift call-result binding > resolves call-result-bound method call user.save() → User#save
  • Swift call-result binding > getUser() is present as a defined function
  • Swift call-result binding > emits processUser -> getUser CALLS edge for let-assigned free function call
  • Swift method enrichment > detects Animal protocol and Dog class
  • Swift method enrichment > emits IMPLEMENTS edge Dog -> Animal
  • Swift method enrichment > emits HAS_METHOD edges for Dog methods
  • Swift method enrichment > marks protocol Animal.speak as isAbstract
  • Swift method enrichment > marks Dog.speak as NOT isAbstract
  • Swift method enrichment > marks breathe as isFinal
  • Swift method enrichment > marks classify as isStatic
  • Swift method enrichment > captures @objc annotation on breathe
  • Swift method enrichment > populates parameterTypes for classify(_ name: String)
  • Swift method enrichment > records parameterCount for classify
  • Swift method enrichment > records returnType for speak
  • Swift method enrichment > resolves dog.speak() CALLS edge
  • Swift method enrichment > resolves Dog.classify("dog") CALLS edge
  • Swift abstract dispatch > detects Repository protocol and SqlRepository class
  • Swift abstract dispatch > emits IMPLEMENTS edge SqlRepository -> Repository
  • Swift abstract dispatch > emits HAS_METHOD edges for Repository.find and Repository.save
  • Swift abstract dispatch > emits HAS_METHOD edges for SqlRepository.find and SqlRepository.save
  • Swift abstract dispatch > marks base Repository.find as isAbstract
  • Swift abstract dispatch > marks base Repository.save as isAbstract
  • Swift abstract dispatch > marks concrete SqlRepository.find as NOT isAbstract
  • Swift abstract dispatch > resolves repo.find(id: 42) CALLS edge
  • Swift abstract dispatch > resolves repo.save(entity: user) CALLS edge
  • Swift abstract dispatch > populates parameterTypes for Repository.find
  • Swift abstract dispatch > populates parameterTypes for Repository.save
  • Swift abstract dispatch > records returnType for SqlRepository.find
  • Swift abstract dispatch > emits METHOD_IMPLEMENTS edges from SqlRepository methods → Repository protocol methods
  • Swift overloaded method disambiguation > detects 2 distinct find Method nodes on SqlRepository
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edges for both find overloads
  • Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edge for save
  • Swift overloaded method disambiguation > emits exactly 3 METHOD_IMPLEMENTS edges total
  • Swift Child extends Parent — inherited method resolution (SM-9) > detects Parent and Child classes
  • Swift Child extends Parent — inherited method resolution (SM-9) > resolves c.parentMethod() to Parent.parentMethod via first-wins MRO walk

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 72.35% 15809/21849 72.18% 📈 +0.2 🟢 ██████████████░░░░░░
Branches 61.4% 10252/16695 61.31% 📈 +0.1 🟢 ████████████░░░░░░░░
Functions 77.27% 1472/1905 76.99% 📈 +0.3 🟢 ███████████████░░░░░
Lines 74.52% 14325/19223 74.35% 📈 +0.2 🟢 ██████████████░░░░░░

📋 View full run · Generated by CI

@magyargergo

Copy link
Copy Markdown
Collaborator

@ivkond could you please look at the findings above?

Addresses the findings from the automated review on PR #795
(#795 (comment)
— posted by @magyargergo / claude-code Action run).

### Medium severity (reviewer flagged as blockers)

- **bridge-db.ts `openBridgeDbReadOnly` bak recovery** — the `.bak`
  recovery path used bare `fsp.rename(bakPath, dbPath)`, which is
  exactly the scenario most likely to hit Windows EBUSY/EPERM (an
  interrupted writer still holding the handle for a few ms). Switched
  to `retryRename` for consistency with the rest of the file's
  Windows-safe rename path.
- **bridge-db.ts `ensureBridgeSchema` error detection** — the inline
  `msg.includes('already exists')` substring match has been lifted
  into a named constant `LBUG_ALREADY_EXISTS_MSG` with a comment
  documenting the coupling to LadybugDB's error message wording and
  why we can't use `IF NOT EXISTS` (LadybugDB DDL doesn't support it)
  or typed errors (LadybugDB's JS driver doesn't expose error codes).
  Also tightened the `catch (err: any)` to `catch (err: unknown)`.
- **bridge-db.ts `findContractNode` — extracted out of writeBridge**
  — the 35-line async closure living inside `writeBridge` has been
  lifted to three module-level functions: `createContractLookupIndex`,
  `indexContract`, and `findContractNode`. `findContractNode` is now
  a pure synchronous function taking a prebuilt index instead of
  doing its own DB queries. The `writeBridge` cross-link loop is now
  ~25 lines instead of ~100.
- **bridge-db.ts `findContractNode` — N+1 query elimination** — the
  old inner-closure version issued up to 6 DB round-trips per
  cross-link (2 endpoints × up to 3 tiers of fallback queries). For a
  group with 1000 cross-links, that's up to 6000 DB queries just to
  resolve endpoints. The new version consults an in-memory
  `ContractLookupIndex` built incrementally as contracts are inserted
  (`indexContract` called AFTER each successful insert so failed
  inserts don't poison the index). Cross-link resolution is now
  O(1) per link instead of O(3) DB queries per link, with zero DB
  round-trips during the cross-link loop.

### Minor severity

- **bridge-db.ts `queryBridge` empty-array guard** — if LadybugDB
  ever returns an empty `QueryResult[]` at the top level (shouldn't
  happen with single-statement calls, but driver contract isn't
  explicit), the old code would call `.getAll()` on `undefined` and
  crash with a confusing stack. Added an `unwrapQueryResult` helper
  that throws an explicit `'empty QueryResult array'` error instead,
  making a potential driver regression visible immediately.
- **normalization.ts `contractRichness` weights** — added a
  block-level comment documenting the weight ordering (+3 for
  symbolUid, +2 for each symbol-identifying field, +1 for service
  tag or non-manifest origin) and explicitly noting that the
  absolute numbers don't matter, only the relative ordering. Matches
  the "comment for contributors" suggestion in the review.
- **bridge-schema.ts `BRIDGE_SCHEMA_VERSION` migration comment** —
  added a 4-point contract explaining what bumping the constant
  means ("discard and re-sync" strategy for V1, no in-place
  migration yet, new migration logic should live in a separate
  `bridge-migrations.ts` module when it becomes necessary).
- **test/unit/group/fixtures.ts** — extracted the `makeContract`
  helper previously copy-pasted between `bridge-db.test.ts` and
  `bridge-db-edge.test.ts` into a shared fixtures module. Both test
  files now import from `./fixtures.js`. Kept the scope minimal:
  fixtures is NOT a general-purpose factory module, just the shared
  baseline contract builder.

### New tests

Added 9 pure-function unit tests for the now-extracted
`findContractNode` in `bridge-db.test.ts`:
  - returns null on empty index
  - tier 1 (symbolUid) match, including repo-scope and role-scope
    isolation
  - tier 2 (filePath + symbolName) fallback when symbolUid is empty
    or mismatches
  - tier 3 (filePath only) when exactly one contract lives in the
    file, and refusal when multiple do
  - priority ordering when multiple tiers could resolve

These are fully isolated — no DB, no temp directories, no native
LadybugDB binding — so they run in &lt;10ms total and are
immediately trustworthy as a regression safety net.

### Deliberately deferred (reviewer marked as "fine for now")

- `BridgeHandle._db` / `._conn` typing to `unknown` with casts in
  `bridge-db.ts` — reviewer's note: "The typing is fine for now."
- Batch inserts via `UNWIND` — needs LadybugDB support confirmation,
  tracked as a follow-up; the per-item pattern remains.
- `queryBridge` prepared-statement lifecycle — the current pattern
  (prepare → execute → GC) relies on LadybugDB's internals, worth
  verifying against their docs in a separate audit.

### Scope discipline (per `GUARDRAILS.md`)

- Only files touched by this PR (`bridge-db.ts`, `bridge-schema.ts`,
  `normalization.ts`, both bridge test files, new `fixtures.ts`) —
  no drive-by refactors
- No CI/release/security config changes
- No secrets

### Test + typecheck status

- `npx tsc --noEmit` clean
- `bridge-db.test.ts`: added 9 `findContractNode` tests, all pass in
  isolation. The full-file run still hits the pre-existing native
  LadybugDB cleanup segfault that flakes the reported count — same
  as every prior commit on this branch, not a regression.
- `bridge-db-edge.test.ts`: 4/4 pass
- `matching.test.ts`: 28/28 pass
- `types.test.ts`: 5/5 pass
- `retryRename` tests (4/4) and `findContractNode` tests (9/9)
  verified in isolation via `-t` filter

Co-authored-by: Claude <noreply@anthropic.com>
@ivkond

ivkond commented Apr 11, 2026

Copy link
Copy Markdown
Contributor Author

@magyargergo addressed all findings in `0e9fd78`.

Medium:

  • `openBridgeDbReadOnly` bak recovery now uses `retryRename` for Windows EBUSY consistency.
  • `ensureBridgeSchema` substring check lifted into a named `LBUG_ALREADY_EXISTS_MSG` constant (LadybugDB DDL doesn't support `IF NOT EXISTS` and its JS driver doesn't expose error codes).
  • `findContractNode` extracted out of `writeBridge` into module-level pure functions.
  • N+1 eliminated: an in-memory `ContractLookupIndex` is built incrementally during contract insert, so cross-link resolution is now O(1) per link with zero DB round-trips (was up to 6).

Minor: `queryBridge` empty-array guard, `contractRichness` weights documented, `BRIDGE_SCHEMA_VERSION` migration contract comment, `makeContract` extracted to shared `test/unit/group/fixtures.ts`.

New tests: 9 pure-function unit tests for `findContractNode` covering all three tiers + repo/role scope isolation + priority ordering. No DB, no temp dirs, <10ms total.

Deferred (per reviewer's "fine for now"): `BridgeHandle` typing, batch inserts via `UNWIND`, prepared-statement lifecycle audit.

Typecheck clean, pre-commit clean, CI run in progress.

@magyargergo

Copy link
Copy Markdown
Collaborator

@claude are we backward compatible with these changes?

@github-actions

github-actions Bot commented Apr 11, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 1m 24s —— View job


Backward Compatibility Analysis

  • Gather context
  • Read modified files (types.ts, matching.ts)
  • Inspect existing callers (sync.ts, cli/group.ts)
  • Provide compatibility verdict

Short answer: Yes, backward compatible — with one intentional behavioral nuance worth knowing.


What changed in the two modified files

types.tsMatchType gains 'wildcard'

- export type MatchType = 'exact' | 'manifest' | 'bm25' | 'embedding';
+ export type MatchType = 'exact' | 'manifest' | 'wildcard' | 'bm25' | 'embedding';

Additive — adding a new member to a union type doesn't break existing code that reads or stores matchType as a string. The only risk would be exhaustive switch statements with a TypeScript never guard on the default branch. A search across the codebase finds no such exhaustive switch on MatchTypecli/group.ts:178 filters with l.matchType === 'exact' (a simple equality check), which is unaffected. 'wildcard' links simply won't appear in that count until follow-up PRs wire them in, which is correct.

The two new interfaces BridgeHandle and BridgeMeta are purely additive.

matching.tsrunExactMatch signature + behavior

Signature change:

- export function runExactMatch(contracts: StoredContract[]): MatchResult
+ export function runExactMatch(contracts: StoredContract[], providerIndex?: Map<...>): MatchResult

The new parameter is optional. The sole existing caller at sync.ts:161 calls runExactMatch(autoContracts) — this continues to compile and run identically. ✅

Behavioral change — gRPC wildcard consumers:

The new code excludes grpc::Service/* wildcard consumers from the exact-match loop and pushes them directly to unmatched. The question is whether this is a regression. Tracing through the old code: a wildcard consumer normalized to e.g. grpc::userservice/* would be looked up via findMatchingKeys against provider keys like grpc::userservice/Loginno exact match is possible, so the wildcard consumer ended up in unmatched anyway. The end result is identical; the new code just short-circuits it explicitly rather than doing a no-op pass.

Effective behavioral change: none. Wildcard consumers were already silently unmatched in the old code.


One intentional nuance to be aware of

sync.ts:161–162 currently discards unmatched entirely:

const { matched, unmatched } = runExactMatch(autoContracts);
const crossLinks: CrossLink[] = matched; // unmatched is dropped

Now that gRPC wildcard consumers are explicitly routed through unmatched, they will be silently excluded from the registry output until the follow-up PR (#792/#793) wires in runWildcardMatch. This is the stated intent of the split, but it means: if any user today has grpc::Service/* style consumers, they will produce zero cross-links for that contract type until the follow-up lands. This is not a regression (it didn't work before either), but it is a gap window in the split chain.


All new files — zero breakage risk

bridge-db.ts, bridge-schema.ts, normalization.ts, and fixtures.ts are brand-new files with no existing callers. They contribute zero backward compatibility risk.


Summary

Area Verdict
runExactMatch signature ✅ Backward compatible (optional param)
runExactMatch behavior ✅ Functionally equivalent for all existing inputs
MatchType union expansion ✅ Additive, no exhaustive switches to break
ContractRegistry / .group.json schema ✅ No format change
New files / new exports ✅ Purely additive
gRPC wildcard consumers during split window ⚠️ Silently unmatched — same as before, but explicitly intentional until #792/#793 land

This PR is backward compatible. The only thing worth tracking is the split-window gap for wildcard consumers, which is expected behavior per the PR description.

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one!

@magyargergo magyargergo merged commit 1ff324c into abhigyanpatwari:main Apr 11, 2026
12 of 13 checks passed
@ivkond ivkond deleted the feat/group-bridge-storage branch April 11, 2026 18:47
magyargergo pushed a commit that referenced this pull request Apr 13, 2026
…lit) (#796)

* feat(group): extractor expansion + manifest extractor

Part 2 of 4 in the split of #606 (ticket: #792). Follows #795
(bridge.lbug storage foundation, already merged), but this PR has no
code-level dependency on #795 — it only imports types and the
ContractExtractor interface that existed on upstream main before
either PR. It could have been reviewed in parallel with #795.

## What changed

Expands the 3 existing contract extractors with substantially more
language/framework coverage, and adds a new `manifest-extractor`
that resolves `group.yaml`-declared cross-links against the per-repo
graph via exact-name lookups.

### New file (228 LOC)

- `gitnexus/src/core/group/extractors/manifest-extractor.ts` —
  exact graph lookup for `group.yaml`-declared cross-links. HTTP
  paths are canonicalized before Route.name matching; gRPC is
  resolved by service/method name (NO `.proto`-filename fallback);
  topic and lib use exact-name match. Falls back to a synthetic
  `manifest::<repo>::<contractId>` uid when the graph has no
  matching symbol, so cross-impact traversal still has a stable
  anchor for the contract.

### Modified extractors (+958 LOC prod)

- `extractors/grpc-extractor.ts` (+522) — `.proto` parser with
  comment and string-literal sanitization (braces inside strings no
  longer truncate service bodies); package/service/method canonical
  IDs; server/client detection across Go (`grpc.NewServer`,
  `RegisterXxxServer`, `XxxGrpc.XxxImplBase`), Java (`@GrpcService`,
  `BlockingStub`), Python (`servicer_to_server`, `XxxStub`), and
  TypeScript/Node (`@GrpcMethod`, `ClientGrpc`, `loadPackageDefinition`).
- `extractors/http-route-extractor.ts` (+174) — Go gin/echo/stdlib
  `HandleFunc`, NestJS `@Controller`+`@Get`/etc, Python FastAPI
  decorators, Java Spring `@RequestMapping`/`@GetMapping`,
  restTemplate / WebClient / OkHttp consumers.
- `extractors/topic-extractor.ts` (+98) — sarama `ProducerMessage{}`
  struct literal detection (replaces a constructor-anchored regex
  that missed topics inside producer loops), kafka-go Writer/Reader,
  Python NATS (`await nc.subscribe`/`await nc.publish`), JetStream
  helpers.

### Modified and new tests (+1264 LOC)

- `grpc-extractor.test.ts` (+539) — full coverage of the new proto
  parser (strings-with-braces regression, comments-with-braces
  regression), per-language server/client detection
- `http-route-extractor.test.ts` (+240) — per-framework route
  extraction + normalization edge cases
- `topic-extractor.test.ts` (+177) — the sarama in-loop regression,
  JetStream, Python NATS, kafka-go Writer/Reader
- `manifest-extractor.test.ts` (+308 NEW) — HTTP path normalization,
  gRPC exact lookup with proto-fallback regression, lib and topic
  exact matching, synthetic-uid fallback behavior

### Self-review fixes folded in

Carried forward from the #606 self-review (commit `d15b8cb`):

- **HIGH #1** — `manifest-extractor.resolveSymbol` was too fuzzy.
  Previously used `CONTAINS` on route/name fields plus an
  unconditional `filePath ENDS WITH '.proto'` fallback for gRPC.
  Consequences: `/orders` matched `/suborders`, and any repo with
  any `.proto` file returned a random proto symbol for a gRPC
  manifest entry. Replaced with exact equality + deterministic
  `ORDER BY` + synthetic-uid fallback for unresolved manifests.
  Regression tests included.
- **MED #3** — gRPC proto parser brace-depth counting now sanitizes
  strings and comments first (`stripProtoCommentsAndStrings`). A
  valid proto with `option deprecated_reason = "use NewService {
  instead"` used to have its service body closed early by the `"{"`
  inside the literal, silently dropping methods after the offending
  string. Regression tests for both string-with-brace and
  comment-with-brace cases.
- **MED #4** — sarama Kafka regex changed from
  `sarama.NewSyncProducer[\s\S]{0,300}?Topic:` (anchored on
  constructor, caught only first topic in a loop) to
  `sarama.ProducerMessage{...Topic:}` (matches every struct literal
  directly). Regression test with a for-loop that constructs
  multiple `ProducerMessage`s.
- **MED #7** — `manifest-extractor.resolveSymbol` no longer has a
  silent `catch { /* fall through */ }`. Errors from the graph
  executor are logged via `console.warn` with link type, contract
  name, repo key, and error message before falling through to the
  synthetic-uid path.

## Why

Reviewer focus here is pure regex / parser correctness — no
storage, no Cypher queries, no algorithmic changes to the cross-link
algorithm. Separating this from the bridge foundation PR (#795)
meant reviewers could stay in a single mental mode (parsing logic)
instead of context-switching between DDL, Cypher, and regex.

## How to verify

- `cd gitnexus && npx tsc --noEmit`
- `cd gitnexus && npx vitest run test/unit/group/grpc-extractor.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/http-route-extractor.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/topic-extractor.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/manifest-extractor.test.ts --pool=forks`

Local pre-push: typecheck clean, all 99 extractor unit tests pass
(grpc 43, http 18, topic 30, manifest 8).

## Risk / rollback

**Low.** Extractors have no user-facing surface in this PR — they
produce `ExtractedContract[]` that is consumed by `sync.ts` in the
next split (#793). No existing behavior changes for users who don't
run a `group sync`. Rollback = `git revert` of the merge commit;
the modifications to `grpc-extractor.ts` / `http-route-extractor.ts`
/ `topic-extractor.ts` revert to the pre-PR versions that still
work (they're subsets of the new functionality).

## Scope discipline (per GUARDRAILS.md)

- Only the 8 files above are touched; no drive-by refactors
- No CI/release/security config changes
- No secrets or machine-specific paths
- Content lifted from #606 (CI 11/11 green on `d15b8cb`)

## Dependencies

- **Base:** `main` (upstream already includes #795 as `1ff324c`)
- **Blocks:** sync pipeline (#793) and the cross-impact feature (#794)
- **Tracker issue:** #792
- **Parent PR:** #606

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): migrate topic-extractor from regex to tree-sitter queries

Addresses @magyargergo's feedback on #796 that regex-based lookups
should use tree-sitter nodes instead, and that the top-level
extractors must NOT carry language dependencies. This is phase 1 of
a multi-step migration — topic-extractor first because its patterns
are the most uniform (16 "call/annotation with first-arg string
literal" variants), which makes it a clean proof of the approach
before grpc-extractor and http-route-extractor get the same treatment.

## Architecture: language-agnostic orchestrator + per-language plugins

The top-level extractor is a thin orchestrator that never imports a
tree-sitter grammar or a query string. Per-language knowledge lives
in a new `topic-patterns/` folder with one file per language plus a
registry that maps file extensions to compiled plugins:

```
src/core/group/extractors/
├── tree-sitter-scanner.ts         # shared, language-agnostic scanning utilities
├── topic-extractor.ts              # thin orchestrator (no grammar imports)
└── topic-patterns/
    ├── types.ts                    # TopicMeta, Broker
    ├── index.ts                    # registry: extension → compiled provider
    ├── java.ts                     # tree-sitter-java + JAVA_TOPIC_PROVIDER
    ├── go.ts                       # tree-sitter-go + GO_TOPIC_PROVIDER
    ├── python.ts                   # tree-sitter-python + PYTHON_TOPIC_PROVIDER
    └── node.ts                     # tree-sitter-javascript + tree-sitter-typescript
                                    # → JAVASCRIPT_/TYPESCRIPT_/TSX_TOPIC_PROVIDER
```

**Shared scanner (`tree-sitter-scanner.ts`)** — defines
`PatternSpec<TMeta>`, `LanguagePatterns<TMeta>`, `CompiledPatterns<TMeta>`
and the `scanFile(parser, plugin, content)` helper. Plugins compile their
queries eagerly at module load via `compilePatterns()`, so a broken
pattern fails loudly at import time instead of silently at scan time.
`unquoteLiteral()` handles single/double/template quotes, Python
triple-quoted strings, and Go raw backtick strings.

**Per-language plugins** own:
- the tree-sitter grammar import (this is the ONLY place in
  `src/core/group/` where tree-sitter grammars are imported),
- the query S-expressions,
- the `TopicMeta` payload (role, broker, confidence, symbolName) that
  the orchestrator receives back on every match.

Each plugin uses a `@value` capture name to bind the topic literal node.
The JavaScript and TypeScript grammars share AST node names for every
construct we query, so `node.ts` defines the pattern sources once and
compiles them against `JavaScript`, `TypeScript.typescript`, and
`TypeScript.tsx` — exporting three providers because `Parser.Query`
objects are NOT portable across grammar instances.

**Registry (`topic-patterns/index.ts`)** — maps `.java` → Java provider,
`.go` → Go, `.py` → Python, `.js`/`.jsx` → JS, `.ts` → TS, `.tsx` → TSX.
Also exports `TOPIC_SCAN_GLOB` so adding a new language is a single
file-level edit (drop `topic-patterns/<lang>.ts`, import + register it
here — zero edits required in `topic-extractor.ts`).

**Orchestrator (`topic-extractor.ts`)** — ~110 lines, no grammar or
query imports. Per file: `getProviderForFile(rel)` → `scanFile(parser,
provider, content)` → `unquoteLiteral(valueText)` → `makeContract(...)`.
Reuses one `Parser` instance across files; the scanner calls
`setLanguage` per plugin.

## Why this is better than regex

1. **Comments and strings are respected for free.** The old regex
   would match `// kafkaTemplate.send("fake.topic")` as a real
   producer; tree-sitter never visits comments or string literals as
   code nodes, so false positives from commented-out code are
   eliminated.
2. **Struct/object literal patterns are structural, not textual.**
   `sarama.ProducerMessage{Topic: "..."}` no longer needs a 300-char
   lookahead (which was a known cross-match bug partly mitigated by a
   loop regression test in the self-review). The new query matches a
   specific `composite_literal` with a specific `qualified_type` and
   `keyed_element` — exactly one struct literal per match.
3. **No order-of-operations fragility.** Regex for
   `channel.publish` vs `channel.consume` was independent and
   file-wide; the AST scopes matches to the specific `call_expression`.
4. **Language-agnostic extension.** Adding Ruby, Rust, or C# topic
   detection later means dropping one file in `topic-patterns/` — no
   changes to shared scanner or orchestrator, and no tree-sitter
   imports leak into top-level code.

## Per-file fault tolerance

- Malformed files that tree-sitter can't parse are silently skipped
  (`parser.parse` is wrapped by `scanFile`). The ingestion pipeline
  already logs unparseable files at index time.
- A syntactically invalid query is caught at `compilePatterns` time,
  not scan time — broken plugins fail loudly at import.
- Per-pattern `matches()` failures are swallowed so one broken query
  in a plugin doesn't block the rest.

## Tests

All 30 existing `topic-extractor.test.ts` tests pass **without any
changes to the test file** — they were written as input/output contract
tests (given this source file, expect these `ExtractedContract` objects)
and that contract is unchanged. Regression coverage includes:

- Kafka: Java `@KafkaListener` + `kafkaTemplate.send`; Node
  `producer.send` + `consumer.subscribe`; Go sarama producer/consumer
  (sync and async); kafka-go Writer/Reader; Python `KafkaConsumer` +
  `producer.send/produce`
- RabbitMQ: Java `@RabbitListener` + `rabbitTemplate.convertAndSend`;
  Node `channel.consume/publish/sendToQueue`; Python `basic_consume/
  basic_publish` with keyword args
- NATS: Go and Node `nc.Subscribe/Publish`; Go and Node JetStream
  `js.Subscribe/Publish`; Python `await nc.subscribe/publish`

Including the regression test for the sarama `ProducerMessage`
in-loop case — the AST-based query captures every literal in the
file independently, not just the first one after `NewSyncProducer`.

## Neighbor regression check

- `topic-extractor.test.ts` — 30/30 pass (rewritten extractor)
- `http-route-extractor.test.ts` — 18/18 pass (untouched)
- `grpc-extractor.test.ts` — 43/43 pass (untouched)
- `manifest-extractor.test.ts` — 8/8 pass (untouched)
- Full `npx tsc --noEmit` clean

## Scope discipline (per GUARDRAILS.md)

- Only files under `src/core/group/extractors/` are touched; no
  changes to other extractors, tests, MCP surface, or pipeline.ts.
- No CI/release/security config changes, no secrets.
- New tree-sitter imports all reference grammars that are already
  installed as dependencies (`tree-sitter`, `tree-sitter-javascript`,
  `tree-sitter-typescript`, `tree-sitter-python`, `tree-sitter-java`,
  `tree-sitter-go` — all in `package.json` for the existing pipeline).

## Phase 2 / phase 3 plan

- **Phase 2 (next commit):** rewrite `http-route-extractor.ts`
  Strategy B (regex fallback) on the same plugin pattern. Graph-assisted
  Strategy A stays as-is (already uses pipeline-built tree-sitter data
  via `HANDLES_ROUTE` Cypher queries).
- **Phase 3 (commit after):** rewrite `grpc-extractor.ts` for Java /
  Go / Python / TypeScript detection. `.proto` files are the one
  outstanding question — there is no `tree-sitter-proto` grammar
  installed; the in-tree string-sanitizing parser stays as a pragmatic
  exception with a comment, alternative being to add
  `tree-sitter-proto` as a dep (open for the maintainer).

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): migrate http-route-extractor Strategy B to tree-sitter plugins

Phase 2 of the extractor refactor requested by @magyargergo on #796.
Same architecture as the phase 1 topic-extractor rewrite: a thin,
language-agnostic orchestrator plus per-language plugins that own
tree-sitter grammars and query sources. The top-level extractor file
no longer imports any tree-sitter grammar or query string.

## Architecture

```
src/core/group/extractors/
├── tree-sitter-scanner.ts          # shared, language-agnostic primitives
├── http-route-extractor.ts         # thin orchestrator (no grammar imports)
└── http-patterns/
    ├── types.ts                    # HttpDetection, HttpLanguagePlugin, HttpRole
    ├── index.ts                    # registry: ext → plugin + HTTP_SCAN_GLOB
    ├── java.ts                     # tree-sitter-java: Spring + RestTemplate/WebClient/OkHttp
    ├── go.ts                       # tree-sitter-go: gin/echo/HandleFunc + http/resty consumers
    ├── python.ts                   # tree-sitter-python: FastAPI + requests
    ├── php.ts                      # tree-sitter-php: Laravel Route::get/...
    └── node.ts                     # tree-sitter-javascript + tree-sitter-typescript:
                                    #   NestJS controllers, Express, fetch, axios
```

**Shared scanner (`tree-sitter-scanner.ts`)** — generalised from phase 1:
- `ScanMatch<TMeta>.captures` is now a full `CaptureMap` (every named
  capture the query binds, not just a single `@value`). Topic extractor
  updated to read `match.captures.value` accordingly.
- New `runCompiledPatterns(plugin, tree)` helper lets plugins run
  multiple query bundles against the same pre-parsed tree. This is
  needed for HTTP plugins that combine a class-prefix query with a
  method-route query (Spring, NestJS).
- `scanFile` becomes a thin wrapper over `parser.parse + runCompiledPatterns`.

**HTTP plugin shape** — unlike topic plugins, HTTP plugins expose a
`scan(tree)` function rather than a flat pattern list. This reflects
HTTP's more complex extraction: each detection needs method + path +
handler name, and framework patterns like Spring `@RequestMapping` /
NestJS `@Controller` require cross-referencing a class-level prefix
with method-level annotations. Plugins internally use
`compilePatterns` + `runCompiledPatterns` and walk the AST to resolve
the class/method relationships.

**Per-framework coverage:**

- **Java (`java.ts`)**
  - Spring: `@RequestMapping("/api/v2")` class prefix + `@(Get|Post|Put|
    Delete|Patch)Mapping("/sub")` method routes, joined via the
    enclosing `class_declaration` node id.
  - `RestTemplate.getForObject/postForEntity/put/delete/patchForObject` →
    method derived from API name.
  - `WebClient.method(HttpMethod.X, "/path")` → method from
    `HttpMethod.X` capture.
  - `new Request.Builder().url("/path")` → OkHttp consumer.

- **Go (`go.ts`)**
  - gin / echo / chi frameworks: `\w+.GET("/path", handler)` captures
    upper-case verb + handler identifier.
  - `net/http.HandleFunc("/path", handler)` → provider (default GET).
  - `http.Get/Post/Head` consumer, `http.NewRequest("METHOD", ...)`,
    resty `client.R().Get/Post/...`.

- **Python (`python.ts`)**
  - `@app.get("/path")` FastAPI decorators.
  - `requests.get/post/...` and `requests.request("METHOD", "url")`.

- **PHP (`php.ts`)**
  - Laravel `Route::get/post/.../patch('/path', ...)` via
    `scoped_call_expression`. Uses `PHP.php_only` to match the
    existing ingestion pipeline's grammar selection.

- **Node (`node.ts`) — JS + TS + TSX**
  - Pattern sources defined once, compiled against three grammar
    variants (`JavaScript`, `TypeScript.typescript`, `TypeScript.tsx`)
    because `Parser.Query` objects are not portable across grammars.
    Exports three plugins sharing the same `scan` logic.
  - NestJS: `@Controller('prefix')` decorators are siblings of the
    class in `export_statement` / `program`; `@Get(':id')` decorators
    are siblings of the method in `class_body`. The plugin walks
    decorator → next named sibling to find the decorated class /
    method, then combines the class prefix with the method path.
    Only emits NestJS detections when the enclosing class has a real
    `@Controller` decorator — prevents false positives from generic
    classes that happen to use `@Get` from another library.
  - Express: `(router|app).<verb>('/path', ...)`.
  - `fetch(url)` (default GET) + `fetch(url, { method: 'X' })`
    (uses two queries + a SyntaxNode-id dedupe set so URL literals
    aren't double-emitted by the options variant).
  - `axios.get/post/...`.

## Orchestrator changes

`http-route-extractor.ts` drops every `scanXxxProviders` / `scanXxxConsumers`
regex method and replaces them with a single source-scan loop that
delegates to `getPluginForFile(rel).scan(tree)`. The orchestrator
still owns:

- **Path normalization** (`normalizeHttpPath`, `normalizeConsumerPath`)
  — language-agnostic string processing shared by both strategies.
- **Graph-assisted Strategy A** (`HANDLES_ROUTE` / `FETCHES` / `CONTAINS`
  Cypher queries) — unchanged in spirit. The only regex helpers it
  used (`inferMethodFromFileScan`, `pickJavaHandlerName`) are now
  replaced by a lookup against the plugin's detections for the same
  file: for each route row, find the detection whose normalized path
  matches, and pull the HTTP method + handler name from it.
- **Per-file parse cache** — the orchestrator parses each relevant
  file at most once per `extract()` call. Both the graph-assisted
  enrichment loop and the source-scan fallback share the same
  `cachedDetections` map, so we never run the plugin twice for the
  same file.

## Why this is better than the regex version

1. **Comments and strings for free.** The old regex would match
   `// router.get('/fake')` as a real Express route; tree-sitter
   never visits string/comment nodes.
2. **Structural controller-prefix.** Spring and NestJS class-prefix
   joining is now scoped to the enclosing class via `class_declaration`
   node ids, eliminating file-wide state that broke when a file had
   multiple controllers.
3. **Precise NestJS disambiguation.** The plugin only emits a NestJS
   detection when the enclosing class has a real `@Controller`
   decorator — the old regex would fire on any `@Get(...)` in the
   file regardless of surrounding context.
4. **Language-agnostic extension.** Adding Ruby / Rust / Kotlin HTTP
   detection later means dropping one file in `http-patterns/` — no
   changes to the shared scanner, the orchestrator, or the Strategy A
   Cypher queries.

## Tests

- `http-route-extractor.test.ts` — **18/18 pass** (tests unchanged;
  they're contract-style input/output tests and the contract shape is
  unchanged). Covers Spring class prefix, Express, gin/echo, stdlib
  HandleFunc, NestJS, Laravel, FastAPI for providers and
  fetch/axios/python-requests/rest-template/webClient/okhttp/go-stdlib/
  resty for consumers, plus graph-first Strategy A for both.
- `topic-extractor.test.ts` — **30/30 pass** after the `captures.value`
  API migration.
- `grpc-extractor.test.ts` — 43/43 pass (untouched; phase 3).
- `manifest-extractor.test.ts` — 8/8 pass (untouched).
- `service.test.ts`, `sync.test.ts`, `storage.test.ts` — 41/41 pass.
- `npx tsc -p tsconfig.json --noEmit` clean.

## Scope discipline (per GUARDRAILS.md)

- Only files under `src/core/group/extractors/` are touched.
- No changes to pipeline.ts, MCP surface, ingestion, or tests.
- No CI / release / security / secrets changes.
- Tree-sitter grammars imported by plugins (`tree-sitter-java`,
  `tree-sitter-go`, `tree-sitter-python`, `tree-sitter-php`,
  `tree-sitter-javascript`, `tree-sitter-typescript`) are all already
  in `package.json` for the existing ingestion pipeline.

## Phase 3 plan

- **grpc-extractor** gets the same treatment: plugin-per-language under
  `grpc-patterns/` for Java / Go / Python / TS detection. `.proto`
  files remain an open question — no `tree-sitter-proto` grammar is
  installed, so the in-tree string-sanitizing parser from PR #796's
  self-review stays as a pragmatic exception unless the maintainer
  wants us to add `tree-sitter-proto` as a new dep.

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): migrate grpc-extractor source scans to tree-sitter plugins

Phase 3 (final) of the extractor refactor requested by @magyargergo on
#796. Same architecture as phase 1 (topic) and phase 2 (http): thin
language-agnostic orchestrator + per-language plugins that own
tree-sitter grammars and query sources. With this commit the top-level
extractors under `src/core/group/extractors/` import ZERO tree-sitter
grammars and ZERO query strings — every grammar import lives in a
`*-patterns/<lang>.ts` plugin file, and the orchestrators go through
the registry indirection.

## Architecture

```
src/core/group/extractors/
├── tree-sitter-scanner.ts         # shared primitives (unchanged)
├── grpc-extractor.ts               # orchestrator (only `.proto` parser left)
└── grpc-patterns/
    ├── types.ts                    # GrpcDetection, GrpcLanguagePlugin, GrpcRole
    ├── index.ts                    # registry: ext → plugin + GRPC_SCAN_GLOB
    ├── go.ts                       # tree-sitter-go: RegisterXxxServer, Unimplemented, NewXxxClient
    ├── java.ts                     # tree-sitter-java: @GrpcService + XxxImplBase + newBlockingStub
    ├── python.ts                   # tree-sitter-python: add_XxxServicer_to_server + XxxStub
    └── node.ts                     # tree-sitter-javascript + tree-sitter-typescript:
                                    #   @GrpcMethod, @GrpcClient field type,
                                    #   .getService<X>('Svc'), new XxxServiceClient,
                                    #   loadPackageDefinition dynamic constructors
```

## Per-language coverage

**Go (`go.ts`)**
- Provider: `\w+.RegisterXxxServer(...)` via `call_expression →
  selector_expression → field_identifier` + JS regex filter
  `^Register(\w+)Server$`.
- Provider: `pb.UnimplementedXxxServer` embedded in a struct via
  `struct_type → field_declaration_list → field_declaration →
  qualified_type → type_identifier` + JS filter.
- Consumer: `\w+.NewXxxClient(...)` via the same call_expression
  query + JS filter `^New(\w+)Client$`.

**Java (`java.ts`)**
- Provider: `class X extends YyyGrpc.YyyImplBase` — two queries
  handle the scoped and plain forms. `scoped_type_identifier`'s
  children are positional (no `scope:`/`name:` fields), so the
  query matches the two `type_identifier` children by position.
- `#match? @inner "ImplBase$"` restricts matches at query time.
- Whether the class has `@GrpcService` or not controls only the
  `source` metadata label — the plugin walks the class_declaration's
  `modifiers` child in JS to detect the marker_annotation.
- Consumer: `YyyGrpc.newStub(ch)` / `newBlockingStub(ch)` via a
  `method_invocation` query with `#match? @method
  "^new(Blocking)?Stub$"`, service name extracted via
  `^(\w+)Grpc$` on the object identifier.

**Python (`python.ts`)**
- Single call-expression query covers both bare identifier and
  `obj.method` attribute forms:
  `(call function: [(identifier) @fn (attribute attribute: (identifier) @fn)])`.
- Plugin filters `@fn.text` against two JS regexes:
  `^add_(\w+)Servicer_to_server$` (provider) and `^(\w+)Stub$`
  (consumer), with a reserved-names ignore list for the Stub case
  (Mock / Test / Fake / Stub).

**Node — JavaScript + TypeScript + TSX (`node.ts`)**
- Pattern sources defined once, compiled three times (one per grammar)
  because `Parser.Query` objects are not portable across grammars.
  Exports three `GrpcLanguagePlugin`s sharing the same `scan`.
- `@GrpcMethod('Service', 'Method')`: decorator query captures the
  two string literals. Confidence is hard-coded 0.8 regardless of
  proto map resolution (matches the original regex version's
  behaviour).
- `@GrpcClient(...) field: XxxServiceClient`: decorator query
  captures the decorator node, plugin walks up to find the enclosing
  `public_field_definition` (decorators on fields are CHILDREN of
  the field definition in tree-sitter-typescript, not siblings) and
  reads its first `type_annotation → type_identifier`, then runs the
  `^(\w+Service)Client$` JS filter.
- `client.getService<X>('AuthService')`: call-expression query on
  `member_expression.property = "getService"` + string literal arg.
- `new XxxServiceClient(...)`: `new_expression` with a bare
  identifier constructor, filtered by `^(\w+Service)Client$` so
  generic `new AuthClient(...)` (missing the `Service` infix) does
  NOT falsely register as a consumer. Preserves the regression test
  `test_extract_ts_non_service_client_constructor_is_ignored`.
- `loadPackageDefinition` dynamic loader: gated on
  `tree.rootNode.text.includes('loadPackageDefinition')`. When set,
  `new foo.bar.Xxx(...)` qualified constructors with a capitalised
  property name register as consumers.

## Orchestrator changes

`grpc-extractor.ts` loses every `scanGoProviders` / `scanJavaProviders`
/ ... helper and replaces them with a single source-scan loop that:

1. Parses each file with the plugin's grammar (one shared `Parser`
   instance across all files, `setLanguage` called per plugin).
2. Calls `plugin.scan(tree)` to get `GrpcDetection[]`.
3. Converts each detection to an `ExtractedContract` via the private
   `detectionToContract` helper, which:
   - Looks the short service name up in the proto map (filled by
     the `.proto` parser).
   - Picks confidence = `confidenceWithProto` if resolved, else
     `confidenceWithoutProto`.
   - Builds a method-level contract id (`grpc::pkg.Svc/Method`) when
     the detection carries a `methodName` (TS `@GrpcMethod` only),
     otherwise a service-level id (`grpc::pkg.Svc/*`).

Everything else — the `.proto` parser, `buildProtoContext`,
`buildProtoMap`, `resolveProtoConflict`, `serviceContractId`,
`stripProtoCommentsAndStrings`, `extractServiceBlocks`, the dedupe
function — stays exactly as before. The `.proto` parser is kept as a
pragmatic exception to the "no regex in extractors" rule because no
`tree-sitter-proto` grammar is installed in the repo; a comment at the
top of the file explains this and flags the maintainer option of
adding `tree-sitter-proto` as a dependency.

## Why this is better than the regex version

1. **Comments and strings are respected for free.** Matched node types
   are only code constructs, never text inside comments or string
   literals.
2. **No false positives on partial names.** The old `(\w+?)Grpc`-style
   regexes would cross-match unrelated identifiers; structural queries
   restrict matches to the exact AST shape (`scoped_type_identifier →
   type_identifier` pairs, `method_invocation → identifier` etc.).
3. **NestJS `@GrpcClient` is structural, not regex-based.** The old
   regex required a specific textual layout
   (`@GrpcClient(...) private readonly foo!: XxxServiceClient`); the
   plugin now walks the AST, so modifier order / optional modifiers /
   multi-line formatting don't break it.
4. **Language-agnostic extension.** Adding Kotlin / Rust / C# gRPC
   detection later is a one-file edit in `grpc-patterns/index.ts` —
   no touches to the shared scanner, the orchestrator, or the proto
   parser.

## Tests

- `grpc-extractor.test.ts` — **43/43 pass** (tests unchanged; the
  contract shape is identical). Covers .proto parsing (including the
  brace-inside-string regression), Go provider/consumer,
  Java @GrpcService / plain ImplBase provider + newBlockingStub
  consumer, Python servicer + stub, TS @GrpcMethod + @GrpcClient +
  .getService + new XxxServiceClient + loadPackageDefinition + the
  `AuthClient` vs `AuthServiceClient` discrimination, dedupe across
  multiple patterns in one file, proto-aware confidence, and the
  inherited-package resolution for split proto definitions.
- `topic-extractor.test.ts` — 30/30 pass.
- `http-route-extractor.test.ts` — 18/18 pass.
- `manifest-extractor.test.ts` — 8/8 pass.
- `service.test.ts`, `sync.test.ts`, `storage.test.ts` — 41/41 pass.
- `npx tsc -p tsconfig.json --noEmit` clean.

## Scope discipline (per GUARDRAILS.md)

- Only files under `src/core/group/extractors/` are touched.
- No pipeline.ts, MCP surface, ingestion, CI / release / security, or
  test changes.
- New tree-sitter grammar imports (`tree-sitter-go`, `tree-sitter-java`,
  `tree-sitter-python`, `tree-sitter-javascript`, `tree-sitter-typescript`)
  are all already installed for the ingestion pipeline.

## End of phase series

This commit completes the three-phase extractor refactor:
  - **Phase 1** (`ea06d11`): topic-extractor → `topic-patterns/`
  - **Phase 2** (`b6015f6`): http-route-extractor → `http-patterns/`
  - **Phase 3** (this commit): grpc-extractor → `grpc-patterns/`

Every remaining regex-based extractor helper under the `src/core/group/
extractors/` directory is either (a) language-agnostic string
processing (path normalization, dedupe keys) or (b) the `.proto`
parser, which is documented as an explicit exception.

Co-authored-by: Claude <noreply@anthropic.com>

* feat(group): add tree-sitter-proto for .proto file parsing

Addresses @magyargergo's suggestion on #796 to replace the manual
string-sanitizing .proto parser with a tree-sitter grammar.

- **Vendored `tree-sitter-proto`** in `vendor/tree-sitter-proto/`.
  Grammar source from [coder3101/tree-sitter-proto](https://github.com/coder3101/tree-sitter-proto)
  (latest `grammar.js`), parser.c regenerated with `tree-sitter-cli
  0.24` to produce ABI version 14 — compatible with the project's
  `tree-sitter 0.25` runtime (which supports ABI ≤ 14). Added as
  `optionalDependency` with `file:./vendor/tree-sitter-proto`.

- **New `grpc-patterns/proto.ts` plugin** — uses the same
  `compilePatterns` + `runCompiledPatterns` infrastructure as every
  other plugin. Two queries:
  - `(package (full_ident) @pkg)` — package declaration
  - `(service (service_name) @service_name (rpc (rpc_name) @rpc_name))`
    — one match per (service, rpc) pair

- **Graceful fallback** — `tree-sitter-proto` is an optional
  dependency. If it fails to install (platform incompatibility) or
  fails the runtime smoke-test (`setLanguage` + `parse` on a trivial
  proto), `PROTO_GRPC_PLUGIN` stays `null` and the orchestrator
  uses the existing manual parser. The smoke-test catches the
  `SyntaxNode` TDZ error that occurs in vitest's fork-based test
  runner.

- **Orchestrator updated** — when `hasProtoPlugin` is true, `.proto`
  files are handled by the plugin loop (they're included in
  `GRPC_SCAN_GLOB`), and the manual `parseProtoFile` loop is
  skipped. `buildProtoContext` still runs to build the proto map
  for cross-referencing source-file detections.

1. **No manual comment/string stripping.** The old parser needed
   `stripProtoCommentsAndStrings` (110 lines) to avoid counting
   braces inside comments and string literals. tree-sitter handles
   this natively.
2. **No brace-depth tracking.** `extractServiceBlocks` used a manual
   depth counter to find service boundaries. tree-sitter's AST gives
   us `service` → `service_name` + `rpc` → `rpc_name` directly.
3. **Performance.** tree-sitter's C-based parser is faster than
   character-by-character JS scanning + regex on large proto files.

- `grpc-extractor.test.ts` — **43/43 pass** (unchanged)
- All other extractor tests — 99/99 pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

* chore: add .gitignore for vendored tree-sitter-proto build artifacts

https://claude.ai/code/session_01SFUCxgKMMQ8EgRHYw91xPU

* fix: correct .gitignore paths for vendored tree-sitter-proto

Patterns should be relative to the .gitignore file's directory.

https://claude.ai/code/session_01SFUCxgKMMQ8EgRHYw91xPU

* refactor(group): address Copilot review feedback on #796

Six fixes suggested by the Copilot AI review:

1. **`normalizeHttpPath` root-path edge case** — stripping trailing
   slashes on the input `/` produced an empty string, yielding
   malformed contract ids like `http::GET::`. Now preserves `/` for
   the root handler/fetch case.

2. **Dedupe `scanFiles` call** — `extract()` was globbing the
   source-scan file list twice (once for the provider fallback, once
   for the consumer fallback). Moved to a single lazy call that
   memoizes the result for the rest of the method.

3. **HTTP `scanFiles` now ignores `**/vendor/**`** — every other
   extractor's glob already ignored vendored sources; the HTTP one
   didn't. Fixed for consistency.

4. **`loadPackageDefinition` check is now structural** — was calling
   `tree.rootNode.text.includes('loadPackageDefinition')` which forces
   materialization of the entire file text from the parse tree
   (expensive on large files). Replaced with a dedicated compiled
   query on `(call_expression function: [(identifier) | (member_expression)])`
   so the check stays in the AST domain.

5. **`grpc-extractor.ts` header docstring updated** — still claimed
   ".proto parsing is not tree-sitter-based because no grammar is
   installed". Now describes the actual behaviour: tree-sitter when
   `tree-sitter-proto` is available (optionalDependency), manual
   fallback otherwise.

6. **Eliminated the double proto file parse on the fallback path** —
   `buildProtoContext` already globs + parses every `.proto` file to
   build `servicesByName`. On the `!hasProtoPlugin` branch the
   extractor was globbing + parsing again via the now-removed
   `parseProtoFile` helper. The fallback branch now iterates the map
   that `buildProtoContext` already produced to emit provider
   contracts directly — single pass per proto file.

## Tests

- `topic-extractor.test.ts` — 30/30 pass
- `http-route-extractor.test.ts` — 18/18 pass
- `grpc-extractor.test.ts` — 43/43 pass
- `manifest-extractor.test.ts` — 8/8 pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): address Claude review feedback (bugs + dedup + hygiene) on #796

Follows up `2f28bfc` with the remaining items from the Claude AI review:

## Bugs

**Bug 2 — Label-unaware Cypher queries in `resolveSymbol`.**
The manifest-extractor's lookup queries were `MATCH (n) WHERE n.name = $x`
with no label filter, so a topic/service/package name could silently match
any node type (File, Variable, Import, Folder, …). Added label filters:
- `topic` → `(n:Function|Method|Class|Interface)` (topics are best-effort
  symbol-name matches against listener/publisher symbols)
- `grpc` method → `(n:Function|Method)`
- `grpc` service → `(n:Class|Interface)`
- `lib` → `(n:Package|Module)`

All 8 manifest-extractor tests still pass (mock executor is
label-agnostic, but the production LadybugDB graph now gets correctly
scoped queries).

**Bug 8 — Tautological `!handlerName` condition.**
`http-route-extractor.ts:extractProvidersGraph` had
`let handlerName = null; if (!method || !handlerName) { ... }` — the
`!handlerName` clause was always true since there was no intervening
assignment. Simplified to always run the plugin-scan lookup (we need
the handler name even when `methodFromRouteReason` already resolved
the method).

## Clean code / dedup

**Design 7 — `readSafe` was copy-pasted in all three orchestrators.**
Extracted to `extractors/fs-utils.ts` as the single source of truth
for the path-traversal guard. Dropped the three local copies and the
now-unused `fs`/`path` imports from topic-extractor.

**Style 10 — Language-specific `_test.go` skip in the topic orchestrator.**
Was `if (rel.endsWith('_test.go')) continue;` inside the language-
agnostic extraction loop. Pushed into the glob's ignore list
(`'**/*_test.go'`) alongside the existing `node_modules`, `vendor`,
`dist`, `build` entries, with a comment explaining that other
languages' test file conventions either live in separate directories
(Python `tests/`, Java `src/test/`) or are already covered by the
existing ignores.

## Already addressed in `2f28bfc` (mentioned again in Claude review)

- Bug 3: `normalizeHttpPath('/')` returns `''` — fixed
- Bug 4: double glob + double parse of `.proto` — fixed
- Bug 5: `scanFiles` called twice in HTTP — fixed
- Bug 6: missing `**/vendor/**` in HTTP glob — fixed
- Design 9 partially: `tree.rootNode.text.includes('loadPackageDefinition')`
  replaced with a dedicated structural query

## Deferred

- Bug 1 (`http::*::path` vs `http::GET::path` matching) — out of scope;
  sync.ts matching logic lands in #793, manifest extractor already
  emits correct synthetic uids for unresolved HTTP contracts.
- Design 9 full (change plugin `scan(tree)` → `scan(tree, source)`) —
  the only real use case (`loadPackageDefinition` gate) is already
  fixed via a structural query, so the interface change would be
  cosmetic churn without a concrete consumer.

## Tests

- `topic-extractor.test.ts` — 30/30 pass
- `http-route-extractor.test.ts` — 18/18 pass
- `grpc-extractor.test.ts` — 43/43 pass
- `manifest-extractor.test.ts` — 8/8 pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

* docs+fix(group): address remaining Claude review items + add pipeline flow chart

## Fixes

**Remaining 🔴 — HTTP contract id wildcard format.** Documented the
`http::*::<path>` format as an intentional wildcard for manifest links
that omit the HTTP method, alongside the explicit-method form
(`GET::/path` → `http::GET::/path`). The docblock on `buildContractId`
now states both forms, notes that wildcard-aware matching is the
responsibility of the sync / cross-impact layer (#793), and
recommends the explicit-method form whenever the author knows the
method (it round-trips through exact equality without needing
wildcard logic downstream). Tests unchanged — the wildcard format is
what they've always asserted.

**Minor 1 — stale comment at `manifest-extractor.ts:124-126`.** The
comment claimed "creates a contract with an empty symbolUid/ref" but
the code switched to `manifestSymbolUid(repo, contractId)` a few
commits back. Updated to describe the actual synthetic-uid fallback
semantics and the cross-impact path that relies on both sides of the
join deriving the same uid.

**Minor 2 — exhaustiveness guard on `buildContractId`.** The
`switch(type)` covered all five current `ContractType` variants but
silently returned `undefined` if a new variant was added. Added a
`default: const _exhaustive: never = type; throw new Error(...)`
clause so the build fails loudly on an unhandled variant.

**Minor 3 — `tree.rootNode.text` in `grpc-patterns/node.ts`.** Already
fixed in `2f28bfc` via a dedicated structural query
(`LOAD_PACKAGE_DEFINITION_SPEC`). No action needed.

## New: pipeline flow chart (per @magyargergo's request)

Added `src/core/group/PIPELINE.md` with four Mermaid diagrams:
1. **High-level overview** — `group.yaml` → extractors + manifest →
   contract matching → `bridge.lbug` → `runGroupImpact`.
2. **Per-repo extractor two-strategy shape** — graph-assisted
   Strategy A vs. source-scan Strategy B.
3. **Plugin architecture** — orchestrator → registry →
   per-language `*-patterns/<lang>.ts` → `tree-sitter-scanner.ts` →
   `ExtractedContract`.
4. **Manifest extraction** — label-scoped `resolveSymbol` with the
   synthetic-uid fallback.
5. **Cross-impact query (#606)** — local impact → bridge join →
   cross-repo fan-out.

Each diagram is annotated with which PRs own which stage (this PR:
extractors + manifest; #795: bridge storage; #606: cross-impact
runtime) and points at the concrete files/functions involved.

## Tests

- 99/99 extractor tests pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
github714801013 pushed a commit to github714801013/GitNexus that referenced this pull request Apr 28, 2026
…abhigyanpatwari#606 split) (abhigyanpatwari#795)

* feat(group): bridge.lbug storage + contract matching expansion

Part 1 of 4 in the split of abhigyanpatwari#606 (ticket: abhigyanpatwari#791, closes abhigyanpatwari#790 with a
revised plan per @magyargergo's request).

## What changed

Adds the LadybugDB-backed bridge storage infrastructure and extends
the contract matching algorithm with wildcard support. All changes are
additive: storage.ts, sync.ts, service.ts, cli/group.ts, mcp/tools.ts
are left on their upstream main versions and will migrate to the new
bridge in follow-up PRs (abhigyanpatwari#792, abhigyanpatwari#793, abhigyanpatwari#794).

### Files

**New (844 LOC prod):**
- `gitnexus/src/core/group/bridge-db.ts` — atomic write-to-temp with
  `retryRename` for Windows EBUSY/EPERM, per-item write tolerance via
  `WriteBridgeReport`, `findContractNode` with three-tier symbol
  lookup (uid → filePath+name → filePath)
- `gitnexus/src/core/group/bridge-schema.ts` — schema DDL
- `gitnexus/src/core/group/normalization.ts` — contract ID
  canonicalization + `dedupeContracts` / `dedupeCrossLinks` helpers
  used by both matching and bridge write

**Modified (+137 LOC prod):**
- `gitnexus/src/core/group/matching.ts` — adds `runWildcardMatch` for
  `grpc::Service/*` wildcard consumers, `buildProviderIndex` helper,
  and canonical gRPC ID handling in `normalizeContractId`
- `gitnexus/src/core/group/types.ts` — `MatchType` gains `'wildcard'`;
  new `BridgeHandle` and `BridgeMeta` interfaces

**New tests (658 LOC):**
- `gitnexus/test/unit/group/bridge-db.test.ts` — core write/read round
  trip, `WriteBridgeReport` shape, dropped-links counter, retryRename
  behavior on EBUSY/ENOENT/EPERM/EACCES
- `gitnexus/test/unit/group/bridge-db-edge.test.ts` — edge cases
  (malformed meta, missing contract nodes, concurrent access)

**Modified tests (+225 LOC):**
- `gitnexus/test/unit/group/matching.test.ts` — wildcard consumer
  matching, gRPC canonical ID handling, same-service guard

### Self-review fixes folded in

Carried forward from the original abhigyanpatwari#606 self-review:
- `writeBridge` try/finally handle lifecycle + `handleClosed` sentinel
- `openBridgeDbReadOnly` partial-handle cleanup
- `writeBridgeMeta` uses `retryRename` for Windows consistency
- `retryRename` unit tests (was zero coverage)
- Per-item try/catch around every CREATE loop so one malformed contract
  doesn't abort the whole write
- Dropped cross-link counter (`linksDroppedMissingNode`)

### Why now

magyargergo asked for the abhigyanpatwari#606 PR to be split so we can iterate with
confidence (abhigyanpatwari#606 (comment)).
This is the foundational layer — pure infra, no user-facing surface,
no callers of the new APIs in this PR. Later PRs wire it in.

### How to verify

- `cd gitnexus && npx tsc --noEmit`
- `cd gitnexus && npx vitest run test/unit/group/bridge-db.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/bridge-db-edge.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/matching.test.ts --pool=forks`
- Pre-commit hook runs clean

### Risk / rollback

**Low.** All new code sits under `src/core/group/` in new files plus a
minimal `+16/-1` diff to `types.ts` and a `+136/-0` diff to `matching.ts`
(both purely additive). No existing callers reference the new APIs
(bridge-db, openBridgeOrFallback, runWildcardMatch) — the PRs that wire
them in come later in the split chain. Rollback = `git revert` of the
merge commit; no state introduced, no schema migration triggered.

### Scope discipline (per GUARDRAILS.md)

- Only the 8 files listed above are touched; no drive-by refactors
- No CI/release/security config changes
- No secrets, tokens, or machine-specific paths
- Content is lifted from the abhigyanpatwari#606 branch which already passed CI 11/11
  green on `d15b8cb` (before the split)

### Dependencies

- **Base:** `main` (no dependencies on other split PRs)
- **Blocks:** extractor expansion (abhigyanpatwari#792), sync pipeline (abhigyanpatwari#793),
  cross-impact feature (abhigyanpatwari#794)
- **Related ticket:** abhigyanpatwari#791

Co-authored-by: Claude <noreply@anthropic.com>

* fix(group): address @claude review on abhigyanpatwari#795

Addresses the findings from the automated review on PR abhigyanpatwari#795
(abhigyanpatwari#795 (comment)
— posted by @magyargergo / claude-code Action run).

### Medium severity (reviewer flagged as blockers)

- **bridge-db.ts `openBridgeDbReadOnly` bak recovery** — the `.bak`
  recovery path used bare `fsp.rename(bakPath, dbPath)`, which is
  exactly the scenario most likely to hit Windows EBUSY/EPERM (an
  interrupted writer still holding the handle for a few ms). Switched
  to `retryRename` for consistency with the rest of the file's
  Windows-safe rename path.
- **bridge-db.ts `ensureBridgeSchema` error detection** — the inline
  `msg.includes('already exists')` substring match has been lifted
  into a named constant `LBUG_ALREADY_EXISTS_MSG` with a comment
  documenting the coupling to LadybugDB's error message wording and
  why we can't use `IF NOT EXISTS` (LadybugDB DDL doesn't support it)
  or typed errors (LadybugDB's JS driver doesn't expose error codes).
  Also tightened the `catch (err: any)` to `catch (err: unknown)`.
- **bridge-db.ts `findContractNode` — extracted out of writeBridge**
  — the 35-line async closure living inside `writeBridge` has been
  lifted to three module-level functions: `createContractLookupIndex`,
  `indexContract`, and `findContractNode`. `findContractNode` is now
  a pure synchronous function taking a prebuilt index instead of
  doing its own DB queries. The `writeBridge` cross-link loop is now
  ~25 lines instead of ~100.
- **bridge-db.ts `findContractNode` — N+1 query elimination** — the
  old inner-closure version issued up to 6 DB round-trips per
  cross-link (2 endpoints × up to 3 tiers of fallback queries). For a
  group with 1000 cross-links, that's up to 6000 DB queries just to
  resolve endpoints. The new version consults an in-memory
  `ContractLookupIndex` built incrementally as contracts are inserted
  (`indexContract` called AFTER each successful insert so failed
  inserts don't poison the index). Cross-link resolution is now
  O(1) per link instead of O(3) DB queries per link, with zero DB
  round-trips during the cross-link loop.

### Minor severity

- **bridge-db.ts `queryBridge` empty-array guard** — if LadybugDB
  ever returns an empty `QueryResult[]` at the top level (shouldn't
  happen with single-statement calls, but driver contract isn't
  explicit), the old code would call `.getAll()` on `undefined` and
  crash with a confusing stack. Added an `unwrapQueryResult` helper
  that throws an explicit `'empty QueryResult array'` error instead,
  making a potential driver regression visible immediately.
- **normalization.ts `contractRichness` weights** — added a
  block-level comment documenting the weight ordering (+3 for
  symbolUid, +2 for each symbol-identifying field, +1 for service
  tag or non-manifest origin) and explicitly noting that the
  absolute numbers don't matter, only the relative ordering. Matches
  the "comment for contributors" suggestion in the review.
- **bridge-schema.ts `BRIDGE_SCHEMA_VERSION` migration comment** —
  added a 4-point contract explaining what bumping the constant
  means ("discard and re-sync" strategy for V1, no in-place
  migration yet, new migration logic should live in a separate
  `bridge-migrations.ts` module when it becomes necessary).
- **test/unit/group/fixtures.ts** — extracted the `makeContract`
  helper previously copy-pasted between `bridge-db.test.ts` and
  `bridge-db-edge.test.ts` into a shared fixtures module. Both test
  files now import from `./fixtures.js`. Kept the scope minimal:
  fixtures is NOT a general-purpose factory module, just the shared
  baseline contract builder.

### New tests

Added 9 pure-function unit tests for the now-extracted
`findContractNode` in `bridge-db.test.ts`:
  - returns null on empty index
  - tier 1 (symbolUid) match, including repo-scope and role-scope
    isolation
  - tier 2 (filePath + symbolName) fallback when symbolUid is empty
    or mismatches
  - tier 3 (filePath only) when exactly one contract lives in the
    file, and refusal when multiple do
  - priority ordering when multiple tiers could resolve

These are fully isolated — no DB, no temp directories, no native
LadybugDB binding — so they run in &lt;10ms total and are
immediately trustworthy as a regression safety net.

### Deliberately deferred (reviewer marked as "fine for now")

- `BridgeHandle._db` / `._conn` typing to `unknown` with casts in
  `bridge-db.ts` — reviewer's note: "The typing is fine for now."
- Batch inserts via `UNWIND` — needs LadybugDB support confirmation,
  tracked as a follow-up; the per-item pattern remains.
- `queryBridge` prepared-statement lifecycle — the current pattern
  (prepare → execute → GC) relies on LadybugDB's internals, worth
  verifying against their docs in a separate audit.

### Scope discipline (per `GUARDRAILS.md`)

- Only files touched by this PR (`bridge-db.ts`, `bridge-schema.ts`,
  `normalization.ts`, both bridge test files, new `fixtures.ts`) —
  no drive-by refactors
- No CI/release/security config changes
- No secrets

### Test + typecheck status

- `npx tsc --noEmit` clean
- `bridge-db.test.ts`: added 9 `findContractNode` tests, all pass in
  isolation. The full-file run still hits the pre-existing native
  LadybugDB cleanup segfault that flakes the reported count — same
  as every prior commit on this branch, not a regression.
- `bridge-db-edge.test.ts`: 4/4 pass
- `matching.test.ts`: 28/28 pass
- `types.test.ts`: 5/5 pass
- `retryRename` tests (4/4) and `findContractNode` tests (9/9)
  verified in isolation via `-t` filter

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
github714801013 pushed a commit to github714801013/GitNexus that referenced this pull request Apr 28, 2026
…npatwari#606 split) (abhigyanpatwari#796)

* feat(group): extractor expansion + manifest extractor

Part 2 of 4 in the split of abhigyanpatwari#606 (ticket: abhigyanpatwari#792). Follows abhigyanpatwari#795
(bridge.lbug storage foundation, already merged), but this PR has no
code-level dependency on abhigyanpatwari#795 — it only imports types and the
ContractExtractor interface that existed on upstream main before
either PR. It could have been reviewed in parallel with abhigyanpatwari#795.

## What changed

Expands the 3 existing contract extractors with substantially more
language/framework coverage, and adds a new `manifest-extractor`
that resolves `group.yaml`-declared cross-links against the per-repo
graph via exact-name lookups.

### New file (228 LOC)

- `gitnexus/src/core/group/extractors/manifest-extractor.ts` —
  exact graph lookup for `group.yaml`-declared cross-links. HTTP
  paths are canonicalized before Route.name matching; gRPC is
  resolved by service/method name (NO `.proto`-filename fallback);
  topic and lib use exact-name match. Falls back to a synthetic
  `manifest::<repo>::<contractId>` uid when the graph has no
  matching symbol, so cross-impact traversal still has a stable
  anchor for the contract.

### Modified extractors (+958 LOC prod)

- `extractors/grpc-extractor.ts` (+522) — `.proto` parser with
  comment and string-literal sanitization (braces inside strings no
  longer truncate service bodies); package/service/method canonical
  IDs; server/client detection across Go (`grpc.NewServer`,
  `RegisterXxxServer`, `XxxGrpc.XxxImplBase`), Java (`@GrpcService`,
  `BlockingStub`), Python (`servicer_to_server`, `XxxStub`), and
  TypeScript/Node (`@GrpcMethod`, `ClientGrpc`, `loadPackageDefinition`).
- `extractors/http-route-extractor.ts` (+174) — Go gin/echo/stdlib
  `HandleFunc`, NestJS `@Controller`+`@Get`/etc, Python FastAPI
  decorators, Java Spring `@RequestMapping`/`@GetMapping`,
  restTemplate / WebClient / OkHttp consumers.
- `extractors/topic-extractor.ts` (+98) — sarama `ProducerMessage{}`
  struct literal detection (replaces a constructor-anchored regex
  that missed topics inside producer loops), kafka-go Writer/Reader,
  Python NATS (`await nc.subscribe`/`await nc.publish`), JetStream
  helpers.

### Modified and new tests (+1264 LOC)

- `grpc-extractor.test.ts` (+539) — full coverage of the new proto
  parser (strings-with-braces regression, comments-with-braces
  regression), per-language server/client detection
- `http-route-extractor.test.ts` (+240) — per-framework route
  extraction + normalization edge cases
- `topic-extractor.test.ts` (+177) — the sarama in-loop regression,
  JetStream, Python NATS, kafka-go Writer/Reader
- `manifest-extractor.test.ts` (+308 NEW) — HTTP path normalization,
  gRPC exact lookup with proto-fallback regression, lib and topic
  exact matching, synthetic-uid fallback behavior

### Self-review fixes folded in

Carried forward from the abhigyanpatwari#606 self-review (commit `d15b8cb`):

- **HIGH abhigyanpatwari#1** — `manifest-extractor.resolveSymbol` was too fuzzy.
  Previously used `CONTAINS` on route/name fields plus an
  unconditional `filePath ENDS WITH '.proto'` fallback for gRPC.
  Consequences: `/orders` matched `/suborders`, and any repo with
  any `.proto` file returned a random proto symbol for a gRPC
  manifest entry. Replaced with exact equality + deterministic
  `ORDER BY` + synthetic-uid fallback for unresolved manifests.
  Regression tests included.
- **MED abhigyanpatwari#3** — gRPC proto parser brace-depth counting now sanitizes
  strings and comments first (`stripProtoCommentsAndStrings`). A
  valid proto with `option deprecated_reason = "use NewService {
  instead"` used to have its service body closed early by the `"{"`
  inside the literal, silently dropping methods after the offending
  string. Regression tests for both string-with-brace and
  comment-with-brace cases.
- **MED abhigyanpatwari#4** — sarama Kafka regex changed from
  `sarama.NewSyncProducer[\s\S]{0,300}?Topic:` (anchored on
  constructor, caught only first topic in a loop) to
  `sarama.ProducerMessage{...Topic:}` (matches every struct literal
  directly). Regression test with a for-loop that constructs
  multiple `ProducerMessage`s.
- **MED abhigyanpatwari#7** — `manifest-extractor.resolveSymbol` no longer has a
  silent `catch { /* fall through */ }`. Errors from the graph
  executor are logged via `console.warn` with link type, contract
  name, repo key, and error message before falling through to the
  synthetic-uid path.

## Why

Reviewer focus here is pure regex / parser correctness — no
storage, no Cypher queries, no algorithmic changes to the cross-link
algorithm. Separating this from the bridge foundation PR (abhigyanpatwari#795)
meant reviewers could stay in a single mental mode (parsing logic)
instead of context-switching between DDL, Cypher, and regex.

## How to verify

- `cd gitnexus && npx tsc --noEmit`
- `cd gitnexus && npx vitest run test/unit/group/grpc-extractor.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/http-route-extractor.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/topic-extractor.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/manifest-extractor.test.ts --pool=forks`

Local pre-push: typecheck clean, all 99 extractor unit tests pass
(grpc 43, http 18, topic 30, manifest 8).

## Risk / rollback

**Low.** Extractors have no user-facing surface in this PR — they
produce `ExtractedContract[]` that is consumed by `sync.ts` in the
next split (abhigyanpatwari#793). No existing behavior changes for users who don't
run a `group sync`. Rollback = `git revert` of the merge commit;
the modifications to `grpc-extractor.ts` / `http-route-extractor.ts`
/ `topic-extractor.ts` revert to the pre-PR versions that still
work (they're subsets of the new functionality).

## Scope discipline (per GUARDRAILS.md)

- Only the 8 files above are touched; no drive-by refactors
- No CI/release/security config changes
- No secrets or machine-specific paths
- Content lifted from abhigyanpatwari#606 (CI 11/11 green on `d15b8cb`)

## Dependencies

- **Base:** `main` (upstream already includes abhigyanpatwari#795 as `f6fb87f`)
- **Blocks:** sync pipeline (abhigyanpatwari#793) and the cross-impact feature (abhigyanpatwari#794)
- **Tracker issue:** abhigyanpatwari#792
- **Parent PR:** abhigyanpatwari#606

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): migrate topic-extractor from regex to tree-sitter queries

Addresses @magyargergo's feedback on abhigyanpatwari#796 that regex-based lookups
should use tree-sitter nodes instead, and that the top-level
extractors must NOT carry language dependencies. This is phase 1 of
a multi-step migration — topic-extractor first because its patterns
are the most uniform (16 "call/annotation with first-arg string
literal" variants), which makes it a clean proof of the approach
before grpc-extractor and http-route-extractor get the same treatment.

## Architecture: language-agnostic orchestrator + per-language plugins

The top-level extractor is a thin orchestrator that never imports a
tree-sitter grammar or a query string. Per-language knowledge lives
in a new `topic-patterns/` folder with one file per language plus a
registry that maps file extensions to compiled plugins:

```
src/core/group/extractors/
├── tree-sitter-scanner.ts         # shared, language-agnostic scanning utilities
├── topic-extractor.ts              # thin orchestrator (no grammar imports)
└── topic-patterns/
    ├── types.ts                    # TopicMeta, Broker
    ├── index.ts                    # registry: extension → compiled provider
    ├── java.ts                     # tree-sitter-java + JAVA_TOPIC_PROVIDER
    ├── go.ts                       # tree-sitter-go + GO_TOPIC_PROVIDER
    ├── python.ts                   # tree-sitter-python + PYTHON_TOPIC_PROVIDER
    └── node.ts                     # tree-sitter-javascript + tree-sitter-typescript
                                    # → JAVASCRIPT_/TYPESCRIPT_/TSX_TOPIC_PROVIDER
```

**Shared scanner (`tree-sitter-scanner.ts`)** — defines
`PatternSpec<TMeta>`, `LanguagePatterns<TMeta>`, `CompiledPatterns<TMeta>`
and the `scanFile(parser, plugin, content)` helper. Plugins compile their
queries eagerly at module load via `compilePatterns()`, so a broken
pattern fails loudly at import time instead of silently at scan time.
`unquoteLiteral()` handles single/double/template quotes, Python
triple-quoted strings, and Go raw backtick strings.

**Per-language plugins** own:
- the tree-sitter grammar import (this is the ONLY place in
  `src/core/group/` where tree-sitter grammars are imported),
- the query S-expressions,
- the `TopicMeta` payload (role, broker, confidence, symbolName) that
  the orchestrator receives back on every match.

Each plugin uses a `@value` capture name to bind the topic literal node.
The JavaScript and TypeScript grammars share AST node names for every
construct we query, so `node.ts` defines the pattern sources once and
compiles them against `JavaScript`, `TypeScript.typescript`, and
`TypeScript.tsx` — exporting three providers because `Parser.Query`
objects are NOT portable across grammar instances.

**Registry (`topic-patterns/index.ts`)** — maps `.java` → Java provider,
`.go` → Go, `.py` → Python, `.js`/`.jsx` → JS, `.ts` → TS, `.tsx` → TSX.
Also exports `TOPIC_SCAN_GLOB` so adding a new language is a single
file-level edit (drop `topic-patterns/<lang>.ts`, import + register it
here — zero edits required in `topic-extractor.ts`).

**Orchestrator (`topic-extractor.ts`)** — ~110 lines, no grammar or
query imports. Per file: `getProviderForFile(rel)` → `scanFile(parser,
provider, content)` → `unquoteLiteral(valueText)` → `makeContract(...)`.
Reuses one `Parser` instance across files; the scanner calls
`setLanguage` per plugin.

## Why this is better than regex

1. **Comments and strings are respected for free.** The old regex
   would match `// kafkaTemplate.send("fake.topic")` as a real
   producer; tree-sitter never visits comments or string literals as
   code nodes, so false positives from commented-out code are
   eliminated.
2. **Struct/object literal patterns are structural, not textual.**
   `sarama.ProducerMessage{Topic: "..."}` no longer needs a 300-char
   lookahead (which was a known cross-match bug partly mitigated by a
   loop regression test in the self-review). The new query matches a
   specific `composite_literal` with a specific `qualified_type` and
   `keyed_element` — exactly one struct literal per match.
3. **No order-of-operations fragility.** Regex for
   `channel.publish` vs `channel.consume` was independent and
   file-wide; the AST scopes matches to the specific `call_expression`.
4. **Language-agnostic extension.** Adding Ruby, Rust, or C# topic
   detection later means dropping one file in `topic-patterns/` — no
   changes to shared scanner or orchestrator, and no tree-sitter
   imports leak into top-level code.

## Per-file fault tolerance

- Malformed files that tree-sitter can't parse are silently skipped
  (`parser.parse` is wrapped by `scanFile`). The ingestion pipeline
  already logs unparseable files at index time.
- A syntactically invalid query is caught at `compilePatterns` time,
  not scan time — broken plugins fail loudly at import.
- Per-pattern `matches()` failures are swallowed so one broken query
  in a plugin doesn't block the rest.

## Tests

All 30 existing `topic-extractor.test.ts` tests pass **without any
changes to the test file** — they were written as input/output contract
tests (given this source file, expect these `ExtractedContract` objects)
and that contract is unchanged. Regression coverage includes:

- Kafka: Java `@KafkaListener` + `kafkaTemplate.send`; Node
  `producer.send` + `consumer.subscribe`; Go sarama producer/consumer
  (sync and async); kafka-go Writer/Reader; Python `KafkaConsumer` +
  `producer.send/produce`
- RabbitMQ: Java `@RabbitListener` + `rabbitTemplate.convertAndSend`;
  Node `channel.consume/publish/sendToQueue`; Python `basic_consume/
  basic_publish` with keyword args
- NATS: Go and Node `nc.Subscribe/Publish`; Go and Node JetStream
  `js.Subscribe/Publish`; Python `await nc.subscribe/publish`

Including the regression test for the sarama `ProducerMessage`
in-loop case — the AST-based query captures every literal in the
file independently, not just the first one after `NewSyncProducer`.

## Neighbor regression check

- `topic-extractor.test.ts` — 30/30 pass (rewritten extractor)
- `http-route-extractor.test.ts` — 18/18 pass (untouched)
- `grpc-extractor.test.ts` — 43/43 pass (untouched)
- `manifest-extractor.test.ts` — 8/8 pass (untouched)
- Full `npx tsc --noEmit` clean

## Scope discipline (per GUARDRAILS.md)

- Only files under `src/core/group/extractors/` are touched; no
  changes to other extractors, tests, MCP surface, or pipeline.ts.
- No CI/release/security config changes, no secrets.
- New tree-sitter imports all reference grammars that are already
  installed as dependencies (`tree-sitter`, `tree-sitter-javascript`,
  `tree-sitter-typescript`, `tree-sitter-python`, `tree-sitter-java`,
  `tree-sitter-go` — all in `package.json` for the existing pipeline).

## Phase 2 / phase 3 plan

- **Phase 2 (next commit):** rewrite `http-route-extractor.ts`
  Strategy B (regex fallback) on the same plugin pattern. Graph-assisted
  Strategy A stays as-is (already uses pipeline-built tree-sitter data
  via `HANDLES_ROUTE` Cypher queries).
- **Phase 3 (commit after):** rewrite `grpc-extractor.ts` for Java /
  Go / Python / TypeScript detection. `.proto` files are the one
  outstanding question — there is no `tree-sitter-proto` grammar
  installed; the in-tree string-sanitizing parser stays as a pragmatic
  exception with a comment, alternative being to add
  `tree-sitter-proto` as a dep (open for the maintainer).

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): migrate http-route-extractor Strategy B to tree-sitter plugins

Phase 2 of the extractor refactor requested by @magyargergo on abhigyanpatwari#796.
Same architecture as the phase 1 topic-extractor rewrite: a thin,
language-agnostic orchestrator plus per-language plugins that own
tree-sitter grammars and query sources. The top-level extractor file
no longer imports any tree-sitter grammar or query string.

## Architecture

```
src/core/group/extractors/
├── tree-sitter-scanner.ts          # shared, language-agnostic primitives
├── http-route-extractor.ts         # thin orchestrator (no grammar imports)
└── http-patterns/
    ├── types.ts                    # HttpDetection, HttpLanguagePlugin, HttpRole
    ├── index.ts                    # registry: ext → plugin + HTTP_SCAN_GLOB
    ├── java.ts                     # tree-sitter-java: Spring + RestTemplate/WebClient/OkHttp
    ├── go.ts                       # tree-sitter-go: gin/echo/HandleFunc + http/resty consumers
    ├── python.ts                   # tree-sitter-python: FastAPI + requests
    ├── php.ts                      # tree-sitter-php: Laravel Route::get/...
    └── node.ts                     # tree-sitter-javascript + tree-sitter-typescript:
                                    #   NestJS controllers, Express, fetch, axios
```

**Shared scanner (`tree-sitter-scanner.ts`)** — generalised from phase 1:
- `ScanMatch<TMeta>.captures` is now a full `CaptureMap` (every named
  capture the query binds, not just a single `@value`). Topic extractor
  updated to read `match.captures.value` accordingly.
- New `runCompiledPatterns(plugin, tree)` helper lets plugins run
  multiple query bundles against the same pre-parsed tree. This is
  needed for HTTP plugins that combine a class-prefix query with a
  method-route query (Spring, NestJS).
- `scanFile` becomes a thin wrapper over `parser.parse + runCompiledPatterns`.

**HTTP plugin shape** — unlike topic plugins, HTTP plugins expose a
`scan(tree)` function rather than a flat pattern list. This reflects
HTTP's more complex extraction: each detection needs method + path +
handler name, and framework patterns like Spring `@RequestMapping` /
NestJS `@Controller` require cross-referencing a class-level prefix
with method-level annotations. Plugins internally use
`compilePatterns` + `runCompiledPatterns` and walk the AST to resolve
the class/method relationships.

**Per-framework coverage:**

- **Java (`java.ts`)**
  - Spring: `@RequestMapping("/api/v2")` class prefix + `@(Get|Post|Put|
    Delete|Patch)Mapping("/sub")` method routes, joined via the
    enclosing `class_declaration` node id.
  - `RestTemplate.getForObject/postForEntity/put/delete/patchForObject` →
    method derived from API name.
  - `WebClient.method(HttpMethod.X, "/path")` → method from
    `HttpMethod.X` capture.
  - `new Request.Builder().url("/path")` → OkHttp consumer.

- **Go (`go.ts`)**
  - gin / echo / chi frameworks: `\w+.GET("/path", handler)` captures
    upper-case verb + handler identifier.
  - `net/http.HandleFunc("/path", handler)` → provider (default GET).
  - `http.Get/Post/Head` consumer, `http.NewRequest("METHOD", ...)`,
    resty `client.R().Get/Post/...`.

- **Python (`python.ts`)**
  - `@app.get("/path")` FastAPI decorators.
  - `requests.get/post/...` and `requests.request("METHOD", "url")`.

- **PHP (`php.ts`)**
  - Laravel `Route::get/post/.../patch('/path', ...)` via
    `scoped_call_expression`. Uses `PHP.php_only` to match the
    existing ingestion pipeline's grammar selection.

- **Node (`node.ts`) — JS + TS + TSX**
  - Pattern sources defined once, compiled against three grammar
    variants (`JavaScript`, `TypeScript.typescript`, `TypeScript.tsx`)
    because `Parser.Query` objects are not portable across grammars.
    Exports three plugins sharing the same `scan` logic.
  - NestJS: `@Controller('prefix')` decorators are siblings of the
    class in `export_statement` / `program`; `@Get(':id')` decorators
    are siblings of the method in `class_body`. The plugin walks
    decorator → next named sibling to find the decorated class /
    method, then combines the class prefix with the method path.
    Only emits NestJS detections when the enclosing class has a real
    `@Controller` decorator — prevents false positives from generic
    classes that happen to use `@Get` from another library.
  - Express: `(router|app).<verb>('/path', ...)`.
  - `fetch(url)` (default GET) + `fetch(url, { method: 'X' })`
    (uses two queries + a SyntaxNode-id dedupe set so URL literals
    aren't double-emitted by the options variant).
  - `axios.get/post/...`.

## Orchestrator changes

`http-route-extractor.ts` drops every `scanXxxProviders` / `scanXxxConsumers`
regex method and replaces them with a single source-scan loop that
delegates to `getPluginForFile(rel).scan(tree)`. The orchestrator
still owns:

- **Path normalization** (`normalizeHttpPath`, `normalizeConsumerPath`)
  — language-agnostic string processing shared by both strategies.
- **Graph-assisted Strategy A** (`HANDLES_ROUTE` / `FETCHES` / `CONTAINS`
  Cypher queries) — unchanged in spirit. The only regex helpers it
  used (`inferMethodFromFileScan`, `pickJavaHandlerName`) are now
  replaced by a lookup against the plugin's detections for the same
  file: for each route row, find the detection whose normalized path
  matches, and pull the HTTP method + handler name from it.
- **Per-file parse cache** — the orchestrator parses each relevant
  file at most once per `extract()` call. Both the graph-assisted
  enrichment loop and the source-scan fallback share the same
  `cachedDetections` map, so we never run the plugin twice for the
  same file.

## Why this is better than the regex version

1. **Comments and strings for free.** The old regex would match
   `// router.get('/fake')` as a real Express route; tree-sitter
   never visits string/comment nodes.
2. **Structural controller-prefix.** Spring and NestJS class-prefix
   joining is now scoped to the enclosing class via `class_declaration`
   node ids, eliminating file-wide state that broke when a file had
   multiple controllers.
3. **Precise NestJS disambiguation.** The plugin only emits a NestJS
   detection when the enclosing class has a real `@Controller`
   decorator — the old regex would fire on any `@Get(...)` in the
   file regardless of surrounding context.
4. **Language-agnostic extension.** Adding Ruby / Rust / Kotlin HTTP
   detection later means dropping one file in `http-patterns/` — no
   changes to the shared scanner, the orchestrator, or the Strategy A
   Cypher queries.

## Tests

- `http-route-extractor.test.ts` — **18/18 pass** (tests unchanged;
  they're contract-style input/output tests and the contract shape is
  unchanged). Covers Spring class prefix, Express, gin/echo, stdlib
  HandleFunc, NestJS, Laravel, FastAPI for providers and
  fetch/axios/python-requests/rest-template/webClient/okhttp/go-stdlib/
  resty for consumers, plus graph-first Strategy A for both.
- `topic-extractor.test.ts` — **30/30 pass** after the `captures.value`
  API migration.
- `grpc-extractor.test.ts` — 43/43 pass (untouched; phase 3).
- `manifest-extractor.test.ts` — 8/8 pass (untouched).
- `service.test.ts`, `sync.test.ts`, `storage.test.ts` — 41/41 pass.
- `npx tsc -p tsconfig.json --noEmit` clean.

## Scope discipline (per GUARDRAILS.md)

- Only files under `src/core/group/extractors/` are touched.
- No changes to pipeline.ts, MCP surface, ingestion, or tests.
- No CI / release / security / secrets changes.
- Tree-sitter grammars imported by plugins (`tree-sitter-java`,
  `tree-sitter-go`, `tree-sitter-python`, `tree-sitter-php`,
  `tree-sitter-javascript`, `tree-sitter-typescript`) are all already
  in `package.json` for the existing ingestion pipeline.

## Phase 3 plan

- **grpc-extractor** gets the same treatment: plugin-per-language under
  `grpc-patterns/` for Java / Go / Python / TS detection. `.proto`
  files remain an open question — no `tree-sitter-proto` grammar is
  installed, so the in-tree string-sanitizing parser from PR abhigyanpatwari#796's
  self-review stays as a pragmatic exception unless the maintainer
  wants us to add `tree-sitter-proto` as a new dep.

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): migrate grpc-extractor source scans to tree-sitter plugins

Phase 3 (final) of the extractor refactor requested by @magyargergo on
abhigyanpatwari#796. Same architecture as phase 1 (topic) and phase 2 (http): thin
language-agnostic orchestrator + per-language plugins that own
tree-sitter grammars and query sources. With this commit the top-level
extractors under `src/core/group/extractors/` import ZERO tree-sitter
grammars and ZERO query strings — every grammar import lives in a
`*-patterns/<lang>.ts` plugin file, and the orchestrators go through
the registry indirection.

## Architecture

```
src/core/group/extractors/
├── tree-sitter-scanner.ts         # shared primitives (unchanged)
├── grpc-extractor.ts               # orchestrator (only `.proto` parser left)
└── grpc-patterns/
    ├── types.ts                    # GrpcDetection, GrpcLanguagePlugin, GrpcRole
    ├── index.ts                    # registry: ext → plugin + GRPC_SCAN_GLOB
    ├── go.ts                       # tree-sitter-go: RegisterXxxServer, Unimplemented, NewXxxClient
    ├── java.ts                     # tree-sitter-java: @GrpcService + XxxImplBase + newBlockingStub
    ├── python.ts                   # tree-sitter-python: add_XxxServicer_to_server + XxxStub
    └── node.ts                     # tree-sitter-javascript + tree-sitter-typescript:
                                    #   @GrpcMethod, @GrpcClient field type,
                                    #   .getService<X>('Svc'), new XxxServiceClient,
                                    #   loadPackageDefinition dynamic constructors
```

## Per-language coverage

**Go (`go.ts`)**
- Provider: `\w+.RegisterXxxServer(...)` via `call_expression →
  selector_expression → field_identifier` + JS regex filter
  `^Register(\w+)Server$`.
- Provider: `pb.UnimplementedXxxServer` embedded in a struct via
  `struct_type → field_declaration_list → field_declaration →
  qualified_type → type_identifier` + JS filter.
- Consumer: `\w+.NewXxxClient(...)` via the same call_expression
  query + JS filter `^New(\w+)Client$`.

**Java (`java.ts`)**
- Provider: `class X extends YyyGrpc.YyyImplBase` — two queries
  handle the scoped and plain forms. `scoped_type_identifier`'s
  children are positional (no `scope:`/`name:` fields), so the
  query matches the two `type_identifier` children by position.
- `#match? @inner "ImplBase$"` restricts matches at query time.
- Whether the class has `@GrpcService` or not controls only the
  `source` metadata label — the plugin walks the class_declaration's
  `modifiers` child in JS to detect the marker_annotation.
- Consumer: `YyyGrpc.newStub(ch)` / `newBlockingStub(ch)` via a
  `method_invocation` query with `#match? @method
  "^new(Blocking)?Stub$"`, service name extracted via
  `^(\w+)Grpc$` on the object identifier.

**Python (`python.ts`)**
- Single call-expression query covers both bare identifier and
  `obj.method` attribute forms:
  `(call function: [(identifier) @fn (attribute attribute: (identifier) @fn)])`.
- Plugin filters `@fn.text` against two JS regexes:
  `^add_(\w+)Servicer_to_server$` (provider) and `^(\w+)Stub$`
  (consumer), with a reserved-names ignore list for the Stub case
  (Mock / Test / Fake / Stub).

**Node — JavaScript + TypeScript + TSX (`node.ts`)**
- Pattern sources defined once, compiled three times (one per grammar)
  because `Parser.Query` objects are not portable across grammars.
  Exports three `GrpcLanguagePlugin`s sharing the same `scan`.
- `@GrpcMethod('Service', 'Method')`: decorator query captures the
  two string literals. Confidence is hard-coded 0.8 regardless of
  proto map resolution (matches the original regex version's
  behaviour).
- `@GrpcClient(...) field: XxxServiceClient`: decorator query
  captures the decorator node, plugin walks up to find the enclosing
  `public_field_definition` (decorators on fields are CHILDREN of
  the field definition in tree-sitter-typescript, not siblings) and
  reads its first `type_annotation → type_identifier`, then runs the
  `^(\w+Service)Client$` JS filter.
- `client.getService<X>('AuthService')`: call-expression query on
  `member_expression.property = "getService"` + string literal arg.
- `new XxxServiceClient(...)`: `new_expression` with a bare
  identifier constructor, filtered by `^(\w+Service)Client$` so
  generic `new AuthClient(...)` (missing the `Service` infix) does
  NOT falsely register as a consumer. Preserves the regression test
  `test_extract_ts_non_service_client_constructor_is_ignored`.
- `loadPackageDefinition` dynamic loader: gated on
  `tree.rootNode.text.includes('loadPackageDefinition')`. When set,
  `new foo.bar.Xxx(...)` qualified constructors with a capitalised
  property name register as consumers.

## Orchestrator changes

`grpc-extractor.ts` loses every `scanGoProviders` / `scanJavaProviders`
/ ... helper and replaces them with a single source-scan loop that:

1. Parses each file with the plugin's grammar (one shared `Parser`
   instance across all files, `setLanguage` called per plugin).
2. Calls `plugin.scan(tree)` to get `GrpcDetection[]`.
3. Converts each detection to an `ExtractedContract` via the private
   `detectionToContract` helper, which:
   - Looks the short service name up in the proto map (filled by
     the `.proto` parser).
   - Picks confidence = `confidenceWithProto` if resolved, else
     `confidenceWithoutProto`.
   - Builds a method-level contract id (`grpc::pkg.Svc/Method`) when
     the detection carries a `methodName` (TS `@GrpcMethod` only),
     otherwise a service-level id (`grpc::pkg.Svc/*`).

Everything else — the `.proto` parser, `buildProtoContext`,
`buildProtoMap`, `resolveProtoConflict`, `serviceContractId`,
`stripProtoCommentsAndStrings`, `extractServiceBlocks`, the dedupe
function — stays exactly as before. The `.proto` parser is kept as a
pragmatic exception to the "no regex in extractors" rule because no
`tree-sitter-proto` grammar is installed in the repo; a comment at the
top of the file explains this and flags the maintainer option of
adding `tree-sitter-proto` as a dependency.

## Why this is better than the regex version

1. **Comments and strings are respected for free.** Matched node types
   are only code constructs, never text inside comments or string
   literals.
2. **No false positives on partial names.** The old `(\w+?)Grpc`-style
   regexes would cross-match unrelated identifiers; structural queries
   restrict matches to the exact AST shape (`scoped_type_identifier →
   type_identifier` pairs, `method_invocation → identifier` etc.).
3. **NestJS `@GrpcClient` is structural, not regex-based.** The old
   regex required a specific textual layout
   (`@GrpcClient(...) private readonly foo!: XxxServiceClient`); the
   plugin now walks the AST, so modifier order / optional modifiers /
   multi-line formatting don't break it.
4. **Language-agnostic extension.** Adding Kotlin / Rust / C# gRPC
   detection later is a one-file edit in `grpc-patterns/index.ts` —
   no touches to the shared scanner, the orchestrator, or the proto
   parser.

## Tests

- `grpc-extractor.test.ts` — **43/43 pass** (tests unchanged; the
  contract shape is identical). Covers .proto parsing (including the
  brace-inside-string regression), Go provider/consumer,
  Java @GrpcService / plain ImplBase provider + newBlockingStub
  consumer, Python servicer + stub, TS @GrpcMethod + @GrpcClient +
  .getService + new XxxServiceClient + loadPackageDefinition + the
  `AuthClient` vs `AuthServiceClient` discrimination, dedupe across
  multiple patterns in one file, proto-aware confidence, and the
  inherited-package resolution for split proto definitions.
- `topic-extractor.test.ts` — 30/30 pass.
- `http-route-extractor.test.ts` — 18/18 pass.
- `manifest-extractor.test.ts` — 8/8 pass.
- `service.test.ts`, `sync.test.ts`, `storage.test.ts` — 41/41 pass.
- `npx tsc -p tsconfig.json --noEmit` clean.

## Scope discipline (per GUARDRAILS.md)

- Only files under `src/core/group/extractors/` are touched.
- No pipeline.ts, MCP surface, ingestion, CI / release / security, or
  test changes.
- New tree-sitter grammar imports (`tree-sitter-go`, `tree-sitter-java`,
  `tree-sitter-python`, `tree-sitter-javascript`, `tree-sitter-typescript`)
  are all already installed for the ingestion pipeline.

## End of phase series

This commit completes the three-phase extractor refactor:
  - **Phase 1** (`ea06d11`): topic-extractor → `topic-patterns/`
  - **Phase 2** (`b6015f6`): http-route-extractor → `http-patterns/`
  - **Phase 3** (this commit): grpc-extractor → `grpc-patterns/`

Every remaining regex-based extractor helper under the `src/core/group/
extractors/` directory is either (a) language-agnostic string
processing (path normalization, dedupe keys) or (b) the `.proto`
parser, which is documented as an explicit exception.

Co-authored-by: Claude <noreply@anthropic.com>

* feat(group): add tree-sitter-proto for .proto file parsing

Addresses @magyargergo's suggestion on abhigyanpatwari#796 to replace the manual
string-sanitizing .proto parser with a tree-sitter grammar.

- **Vendored `tree-sitter-proto`** in `vendor/tree-sitter-proto/`.
  Grammar source from [coder3101/tree-sitter-proto](https://github.com/coder3101/tree-sitter-proto)
  (latest `grammar.js`), parser.c regenerated with `tree-sitter-cli
  0.24` to produce ABI version 14 — compatible with the project's
  `tree-sitter 0.25` runtime (which supports ABI ≤ 14). Added as
  `optionalDependency` with `file:./vendor/tree-sitter-proto`.

- **New `grpc-patterns/proto.ts` plugin** — uses the same
  `compilePatterns` + `runCompiledPatterns` infrastructure as every
  other plugin. Two queries:
  - `(package (full_ident) @pkg)` — package declaration
  - `(service (service_name) @service_name (rpc (rpc_name) @rpc_name))`
    — one match per (service, rpc) pair

- **Graceful fallback** — `tree-sitter-proto` is an optional
  dependency. If it fails to install (platform incompatibility) or
  fails the runtime smoke-test (`setLanguage` + `parse` on a trivial
  proto), `PROTO_GRPC_PLUGIN` stays `null` and the orchestrator
  uses the existing manual parser. The smoke-test catches the
  `SyntaxNode` TDZ error that occurs in vitest's fork-based test
  runner.

- **Orchestrator updated** — when `hasProtoPlugin` is true, `.proto`
  files are handled by the plugin loop (they're included in
  `GRPC_SCAN_GLOB`), and the manual `parseProtoFile` loop is
  skipped. `buildProtoContext` still runs to build the proto map
  for cross-referencing source-file detections.

1. **No manual comment/string stripping.** The old parser needed
   `stripProtoCommentsAndStrings` (110 lines) to avoid counting
   braces inside comments and string literals. tree-sitter handles
   this natively.
2. **No brace-depth tracking.** `extractServiceBlocks` used a manual
   depth counter to find service boundaries. tree-sitter's AST gives
   us `service` → `service_name` + `rpc` → `rpc_name` directly.
3. **Performance.** tree-sitter's C-based parser is faster than
   character-by-character JS scanning + regex on large proto files.

- `grpc-extractor.test.ts` — **43/43 pass** (unchanged)
- All other extractor tests — 99/99 pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

* chore: add .gitignore for vendored tree-sitter-proto build artifacts

https://claude.ai/code/session_01SFUCxgKMMQ8EgRHYw91xPU

* fix: correct .gitignore paths for vendored tree-sitter-proto

Patterns should be relative to the .gitignore file's directory.

https://claude.ai/code/session_01SFUCxgKMMQ8EgRHYw91xPU

* refactor(group): address Copilot review feedback on abhigyanpatwari#796

Six fixes suggested by the Copilot AI review:

1. **`normalizeHttpPath` root-path edge case** — stripping trailing
   slashes on the input `/` produced an empty string, yielding
   malformed contract ids like `http::GET::`. Now preserves `/` for
   the root handler/fetch case.

2. **Dedupe `scanFiles` call** — `extract()` was globbing the
   source-scan file list twice (once for the provider fallback, once
   for the consumer fallback). Moved to a single lazy call that
   memoizes the result for the rest of the method.

3. **HTTP `scanFiles` now ignores `**/vendor/**`** — every other
   extractor's glob already ignored vendored sources; the HTTP one
   didn't. Fixed for consistency.

4. **`loadPackageDefinition` check is now structural** — was calling
   `tree.rootNode.text.includes('loadPackageDefinition')` which forces
   materialization of the entire file text from the parse tree
   (expensive on large files). Replaced with a dedicated compiled
   query on `(call_expression function: [(identifier) | (member_expression)])`
   so the check stays in the AST domain.

5. **`grpc-extractor.ts` header docstring updated** — still claimed
   ".proto parsing is not tree-sitter-based because no grammar is
   installed". Now describes the actual behaviour: tree-sitter when
   `tree-sitter-proto` is available (optionalDependency), manual
   fallback otherwise.

6. **Eliminated the double proto file parse on the fallback path** —
   `buildProtoContext` already globs + parses every `.proto` file to
   build `servicesByName`. On the `!hasProtoPlugin` branch the
   extractor was globbing + parsing again via the now-removed
   `parseProtoFile` helper. The fallback branch now iterates the map
   that `buildProtoContext` already produced to emit provider
   contracts directly — single pass per proto file.

## Tests

- `topic-extractor.test.ts` — 30/30 pass
- `http-route-extractor.test.ts` — 18/18 pass
- `grpc-extractor.test.ts` — 43/43 pass
- `manifest-extractor.test.ts` — 8/8 pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): address Claude review feedback (bugs + dedup + hygiene) on abhigyanpatwari#796

Follows up `2f28bfc` with the remaining items from the Claude AI review:

## Bugs

**Bug 2 — Label-unaware Cypher queries in `resolveSymbol`.**
The manifest-extractor's lookup queries were `MATCH (n) WHERE n.name = $x`
with no label filter, so a topic/service/package name could silently match
any node type (File, Variable, Import, Folder, …). Added label filters:
- `topic` → `(n:Function|Method|Class|Interface)` (topics are best-effort
  symbol-name matches against listener/publisher symbols)
- `grpc` method → `(n:Function|Method)`
- `grpc` service → `(n:Class|Interface)`
- `lib` → `(n:Package|Module)`

All 8 manifest-extractor tests still pass (mock executor is
label-agnostic, but the production LadybugDB graph now gets correctly
scoped queries).

**Bug 8 — Tautological `!handlerName` condition.**
`http-route-extractor.ts:extractProvidersGraph` had
`let handlerName = null; if (!method || !handlerName) { ... }` — the
`!handlerName` clause was always true since there was no intervening
assignment. Simplified to always run the plugin-scan lookup (we need
the handler name even when `methodFromRouteReason` already resolved
the method).

## Clean code / dedup

**Design 7 — `readSafe` was copy-pasted in all three orchestrators.**
Extracted to `extractors/fs-utils.ts` as the single source of truth
for the path-traversal guard. Dropped the three local copies and the
now-unused `fs`/`path` imports from topic-extractor.

**Style 10 — Language-specific `_test.go` skip in the topic orchestrator.**
Was `if (rel.endsWith('_test.go')) continue;` inside the language-
agnostic extraction loop. Pushed into the glob's ignore list
(`'**/*_test.go'`) alongside the existing `node_modules`, `vendor`,
`dist`, `build` entries, with a comment explaining that other
languages' test file conventions either live in separate directories
(Python `tests/`, Java `src/test/`) or are already covered by the
existing ignores.

## Already addressed in `2f28bfc` (mentioned again in Claude review)

- Bug 3: `normalizeHttpPath('/')` returns `''` — fixed
- Bug 4: double glob + double parse of `.proto` — fixed
- Bug 5: `scanFiles` called twice in HTTP — fixed
- Bug 6: missing `**/vendor/**` in HTTP glob — fixed
- Design 9 partially: `tree.rootNode.text.includes('loadPackageDefinition')`
  replaced with a dedicated structural query

## Deferred

- Bug 1 (`http::*::path` vs `http::GET::path` matching) — out of scope;
  sync.ts matching logic lands in abhigyanpatwari#793, manifest extractor already
  emits correct synthetic uids for unresolved HTTP contracts.
- Design 9 full (change plugin `scan(tree)` → `scan(tree, source)`) —
  the only real use case (`loadPackageDefinition` gate) is already
  fixed via a structural query, so the interface change would be
  cosmetic churn without a concrete consumer.

## Tests

- `topic-extractor.test.ts` — 30/30 pass
- `http-route-extractor.test.ts` — 18/18 pass
- `grpc-extractor.test.ts` — 43/43 pass
- `manifest-extractor.test.ts` — 8/8 pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

* docs+fix(group): address remaining Claude review items + add pipeline flow chart

## Fixes

**Remaining 🔴 — HTTP contract id wildcard format.** Documented the
`http::*::<path>` format as an intentional wildcard for manifest links
that omit the HTTP method, alongside the explicit-method form
(`GET::/path` → `http::GET::/path`). The docblock on `buildContractId`
now states both forms, notes that wildcard-aware matching is the
responsibility of the sync / cross-impact layer (abhigyanpatwari#793), and
recommends the explicit-method form whenever the author knows the
method (it round-trips through exact equality without needing
wildcard logic downstream). Tests unchanged — the wildcard format is
what they've always asserted.

**Minor 1 — stale comment at `manifest-extractor.ts:124-126`.** The
comment claimed "creates a contract with an empty symbolUid/ref" but
the code switched to `manifestSymbolUid(repo, contractId)` a few
commits back. Updated to describe the actual synthetic-uid fallback
semantics and the cross-impact path that relies on both sides of the
join deriving the same uid.

**Minor 2 — exhaustiveness guard on `buildContractId`.** The
`switch(type)` covered all five current `ContractType` variants but
silently returned `undefined` if a new variant was added. Added a
`default: const _exhaustive: never = type; throw new Error(...)`
clause so the build fails loudly on an unhandled variant.

**Minor 3 — `tree.rootNode.text` in `grpc-patterns/node.ts`.** Already
fixed in `2f28bfc` via a dedicated structural query
(`LOAD_PACKAGE_DEFINITION_SPEC`). No action needed.

## New: pipeline flow chart (per @magyargergo's request)

Added `src/core/group/PIPELINE.md` with four Mermaid diagrams:
1. **High-level overview** — `group.yaml` → extractors + manifest →
   contract matching → `bridge.lbug` → `runGroupImpact`.
2. **Per-repo extractor two-strategy shape** — graph-assisted
   Strategy A vs. source-scan Strategy B.
3. **Plugin architecture** — orchestrator → registry →
   per-language `*-patterns/<lang>.ts` → `tree-sitter-scanner.ts` →
   `ExtractedContract`.
4. **Manifest extraction** — label-scoped `resolveSymbol` with the
   synthetic-uid fallback.
5. **Cross-impact query (abhigyanpatwari#606)** — local impact → bridge join →
   cross-repo fan-out.

Each diagram is annotated with which PRs own which stage (this PR:
extractors + manifest; abhigyanpatwari#795: bridge storage; abhigyanpatwari#606: cross-impact
runtime) and points at the concrete files/functions involved.

## Tests

- 99/99 extractor tests pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[group] Bridge storage and contract matching foundation (split 2/5 of #606)

2 participants