Skip to content

updated mahalanobis threshold to be multi-dim aware#7

Merged
zander-raycraft merged 1 commit into
mainfrom
gh/issue-pr-filter
Mar 22, 2026
Merged

updated mahalanobis threshold to be multi-dim aware#7
zander-raycraft merged 1 commit into
mainfrom
gh/issue-pr-filter

Conversation

@zander-raycraft

Copy link
Copy Markdown
Owner

No description provided.

@zander-raycraft zander-raycraft marked this pull request as ready for review March 22, 2026 01:07
@zander-raycraft zander-raycraft merged commit f409fb7 into main Mar 22, 2026
4 of 5 checks passed
@github-actions

Copy link
Copy Markdown

PR description is too short

This PR's description is 0 characters, but the minimum is 50.

Please update the PR description to explain:

  • What this PR changes
  • Why the change is needed

Use the PR template as a guide. This check will re-run when you edit the description.

@github-actions

Copy link
Copy Markdown

CI Report

All checks passedcc2b13e

Pipeline

Stage Status Ubuntu Windows macOS
Typecheck success
Tests success

Tests

Metric Value
Total 3600
Passed 3580
Skipped 20
Files 1028
Duration 1m 55s

✅ All 3580 tests passed across 1028 files

20 test(s) skipped
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
  • Python match/case as-pattern type binding > resolves u.save() to User#save via match/case as-pattern binding
  • Python match/case as-pattern type binding > does NOT resolve u.save() to Repo#save (negative disambiguation)
  • Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
  • Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
  • Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
  • Swift self resolution > detects User and Repo classes, each with a save function
  • Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
  • Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
  • Swift parent resolution > emits EXTENDS edge: User → BaseModel
  • Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
  • Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
  • Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
  • Swift return type inference > detects User class and getUser function
  • Swift return type inference > detects save function on User (Swift class methods are Function nodes)
  • Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
  • Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
  • Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
  • Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()

Coverage

Metric Coverage Covered Base (main) Delta
Statements 68.96% 9079/13164 68.97%
Branches 60.08% 6159/10250 60.1%
Functions 71.47% 802/1122 71.56% 📉 -0.1%
Lines 71.22% 8094/11364 71.22%

📋 Full run · Coverage from Ubuntu · Generated by CI

zander-raycraft pushed a commit that referenced this pull request May 7, 2026
…atwari#756)

* Initial plan

* feat(SM-13): extract resolveFreeCall from resolveCallTarget

Extract the free-function call resolution path into a dedicated
`resolveFreeCall(calledName, filePath, ctx)` function that uses
`lookupExact` + import-scoped resolution via `ctx.resolve()`.

- Free function calls (foo()) now route through `resolveFreeCall`
- Swift/Kotlin implicit constructors (User()) delegate to
  `resolveStaticCall` within `resolveFreeCall`
- `resolveCallTarget` dispatches `callForm === 'free'` early,
  removing the inline freeFormHasClassTarget logic
- S0 block simplified to only handle `callForm === 'constructor'`
- Global (Tier 3) fallthrough preserved via ctx.resolve() until Phase 5
- 9 new unit tests for resolveFreeCall
- All 163 unit tests pass, all 1199 integration resolver tests pass

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/c5f2e73a-259a-438c-b5c8-286b82e3c215

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* chore: revert unrelated package-lock.json change

Agent-Logs-Url: https://github.com/abhigyanpatwari/GitNexus/sessions/c5f2e73a-259a-438c-b5c8-286b82e3c215

Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>

* fix(SM-13): address PR abhigyanpatwari#756 review findings on resolveFreeCall

Addresses all 7 findings from the PR abhigyanpatwari#756 review comment.

Code (R1, finding #1)
- Replace the literal `'Class' | 'Struct' | 'Record'` check in
  `hasClassTarget` with `INSTANTIABLE_CLASS_TYPES.has(c.type)`. Converts
  an invariant that was previously comment-enforced ("keep this list
  aligned with INSTANTIABLE_CLASS_TYPES") into one enforced structurally.
  Any future extension of the set propagates here automatically. The
  narrower Swift extension dedup block below still uses literal
  `'Class' | 'Struct'` by design — Swift extensions only produce Class
  duplicates in practice, Record is deliberately excluded there, and
  the inline comment now documents that asymmetry.

Tests (+12 regression scenarios)

Finding #2 — language coverage
- Go free function (doStuff())
- Python free function (def helper(): ... helper())
- Rust free function outside any impl block
- Java statically-imported function
- JavaScript module-level function
Each exercises `_resolveCallTargetForTesting` with `callForm='free'`
and the language-specific file extension. `resolveFreeCall` has no
file-extension branching, so these guard the dispatch chain per
language without assuming extractor-specific symbol shapes.

Finding #3 — argCount threading
- 2-arg overload selected when argCount=2
- 0-arg overload selected when argCount=0

Finding #5 — Tier 3 (global) resolution
- Function globally visible but not imported. Asserts exact
  `TIER_CONFIDENCE.global === 0.5` and `reason === 'global'` to catch
  silent drift if the tier table is ever refactored.

Finding #6 — preComputedArgTypes worker path
- String overload matched via preComputedArgTypes=['String']
- Int overload matched via preComputedArgTypes=['int'] (lowercase,
  mirroring the parse-worker's inferred-literal shape; stored 'Int' is
  normalized via normalizeJvmTypeName at comparison time)

Finding #7 — Enum null-route documentation
- Enum-only free call asserts `toBeNull()` with an explanatory comment
  linking to the INSTANTIABLE_CLASS_TYPES rationale. NOT marked skipped
  — current behavior is intentional, not broken.

Finding #4 — Swift extension dedup guard
- Two same-name Class entries at different path lengths; exercises the
  full dispatch chain:
    1. filterCallableCandidates with 'free' strips Class → length 0
    2. hasClassTarget triggers resolveStaticCall
    3. Homonym ambiguity null-routes per SM-12 round-1 contract
    4. Constructor-form retry repopulates with both Classes
    5. Dedup block sorts by filePath.length → shortest path wins

Verification
- `tsc --noEmit` clean
- 3064 unit tests pass (+12)
- 1766 integration tests pass
- Zero regressions

Plan: docs/plans/2026-04-09-003-fix-sm13-resolve-free-call-review-findings-plan.md
Review: abhigyanpatwari#756 (comment)

* refactor(SM-13): extract dedupSwiftExtensionCandidates shared helper

Follow-up to the PR abhigyanpatwari#756 review fix. SM-13 duplicated the Swift
extension same-name collision dedup block between `resolveCallTarget`
and `resolveFreeCall` — two copies of identical 15-line logic with the
same heuristic (`filePath.length` sort, Class/Struct-only, `length > 1`
guard). Extract a single shared helper so the two sites cannot drift.

Changes
- New `dedupSwiftExtensionCandidates(candidates, tier)` helper defined
  alongside `tryOverloadDisambiguation`, with JSDoc documenting:
  - The Swift extension scenario it addresses
  - Why it is intentionally narrower than INSTANTIABLE_CLASS_TYPES
    (Class/Struct only, not Record — C#/Kotlin records don't exhibit
    the multi-file definition pattern, widening risks accidental
    dedup of legitimately distinct record types)
  - The return-null-on-no-match contract so callers can fall through
- `resolveCallTarget` tail dedup (was lines 1593-1610): replaced with
  a single `dedupSwiftExtensionCandidates` call
- `resolveFreeCall` tail dedup (was lines 1994-2012): same replacement
- Net line count: -32 insertions, -9 deletions in the consumer sites,
  +36 for the shared helper + JSDoc

Verification
- `tsc --noEmit` clean
- 3064 unit tests pass (including the R7 Swift dedup guard test added
  in the previous commit that exercises the full free-form retry
  chain through this helper)
- 1766 integration tests pass
- Zero regressions

Follows-up on: abhigyanpatwari#756

* docs(SM-13): address PR abhigyanpatwari#756 final review — comment cleanup only

Three documentation-only findings from the approval review. No
behavior change, no new tests, no code path modifications.

Finding #1 — stale line-number comment
- The comment inside `resolveFreeCall` at the `hasClassTarget` site
  referenced "lines ~1994-2008" for the Swift extension dedup block.
  Those lines were the inlined pre-SM-13 version; the block has since
  been extracted to `dedupSwiftExtensionCandidates`. Replaced the line
  reference with the helper name so future readers don't chase dead
  line numbers.

Finding #2 — fuzzy-widening asymmetry undocumented
- `resolveFreeCall` intentionally has no `widenCache` parameter and no
  D2 fuzzy-widening pass (unlike `resolveCallTarget`'s member-call
  path). Added an explicit "Asymmetry vs `resolveCallTarget`" paragraph
  to the JSDoc so a caller comparing the two signatures knows the
  skipped pass is deliberate and tied to Phase 5.

Finding #3 — constructor-form retry reasons undocumented
- `resolveStaticCall` can return null for three distinct reasons
  (empty instantiable pool, homonym ambiguity, ownerless Constructor
  nodes). The retry below it unconditionally re-filters with
  `'constructor'` form, which is correct for all three but not
  obvious. Added a structured three-case comment enumerating each
  reason and linking (a) to the SM-12 null-route contract, (b) to
  the R7 dedup test, and (c) to the currently-uncovered ownerless-
  Constructor path (noted as a future test candidate).

Verification
- `tsc --noEmit` clean
- 175 `resolveFreeCall` + `resolveStaticCall` + sibling tests pass
  (sanity check — no behavior change expected)
- No regressions

Follows-up on: abhigyanpatwari#756 (comment)

* test(SM-13): cover ownerless-Constructor retry + PHP free function

Two low-severity test gaps from PR abhigyanpatwari#756 review comment 4215739052 —
previously addressed doc-only, now have concrete test coverage.

Finding #3 low — ownerless-Constructor retry path (previously comment-only)
- The retry after resolveStaticCall returns null handles three distinct
  null-return reasons. Cases (a) and (b) were already tested (Interface/
  Trait null-route from SM-12, Swift shadowing dedup from R7). Case (c) —
  resolveStaticCall step-4 bailout when the tiered pool contains
  ownerless Constructor nodes — was only covered by a comment.
- New test: Class + ownerless Constructor in tiered pool, callForm='free'.
  Exercises the full chain:
    1. resolveStaticCall step 3 walks classCandidates via
       lookupMethodByOwner — ownerless Constructor not in methodByOwner,
       nothing found.
    2. Step 4 detects Constructor in tiered pool, bails with null.
    3. resolveFreeCall retry re-runs filterCallableCandidates with
       'constructor' form, which prefers Constructor over Class per
       CONSTRUCTOR_TARGET_TYPES ordering.
    4. Single survivor returned.
- Asserts the Constructor node (not the Class) is the resolved target.

Low — PHP free function coverage gap
- The language coverage table in the same review flagged PHP free
  functions (top-level `function helper()` outside any class) as
  uncovered. Added a test mirroring the existing Go/Python/Rust/Java/
  JS language tests — exercises the `.php` dispatch path for free
  calls. Ruby and C/C++ remain uncovered; deferred to a future round
  since those languages also have other gaps in the broader test file.

Verification
- `tsc --noEmit` clean
- 3066 unit tests pass (+2 new regression tests)
- 1766 integration tests pass
- Zero regressions

Follows-up on: abhigyanpatwari#756 (comment)

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: magyargergo <11230420+magyargergo@users.noreply.github.com>
Co-authored-by: Gergo Magyar <gergomagyar@icloud.com>
zander-raycraft pushed a commit that referenced this pull request May 7, 2026
…npatwari#606 split) (abhigyanpatwari#796)

* feat(group): extractor expansion + manifest extractor

Part 2 of 4 in the split of abhigyanpatwari#606 (ticket: abhigyanpatwari#792). Follows abhigyanpatwari#795
(bridge.lbug storage foundation, already merged), but this PR has no
code-level dependency on abhigyanpatwari#795 — it only imports types and the
ContractExtractor interface that existed on upstream main before
either PR. It could have been reviewed in parallel with abhigyanpatwari#795.

## What changed

Expands the 3 existing contract extractors with substantially more
language/framework coverage, and adds a new `manifest-extractor`
that resolves `group.yaml`-declared cross-links against the per-repo
graph via exact-name lookups.

### New file (228 LOC)

- `gitnexus/src/core/group/extractors/manifest-extractor.ts` —
  exact graph lookup for `group.yaml`-declared cross-links. HTTP
  paths are canonicalized before Route.name matching; gRPC is
  resolved by service/method name (NO `.proto`-filename fallback);
  topic and lib use exact-name match. Falls back to a synthetic
  `manifest::<repo>::<contractId>` uid when the graph has no
  matching symbol, so cross-impact traversal still has a stable
  anchor for the contract.

### Modified extractors (+958 LOC prod)

- `extractors/grpc-extractor.ts` (+522) — `.proto` parser with
  comment and string-literal sanitization (braces inside strings no
  longer truncate service bodies); package/service/method canonical
  IDs; server/client detection across Go (`grpc.NewServer`,
  `RegisterXxxServer`, `XxxGrpc.XxxImplBase`), Java (`@GrpcService`,
  `BlockingStub`), Python (`servicer_to_server`, `XxxStub`), and
  TypeScript/Node (`@GrpcMethod`, `ClientGrpc`, `loadPackageDefinition`).
- `extractors/http-route-extractor.ts` (+174) — Go gin/echo/stdlib
  `HandleFunc`, NestJS `@Controller`+`@Get`/etc, Python FastAPI
  decorators, Java Spring `@RequestMapping`/`@GetMapping`,
  restTemplate / WebClient / OkHttp consumers.
- `extractors/topic-extractor.ts` (+98) — sarama `ProducerMessage{}`
  struct literal detection (replaces a constructor-anchored regex
  that missed topics inside producer loops), kafka-go Writer/Reader,
  Python NATS (`await nc.subscribe`/`await nc.publish`), JetStream
  helpers.

### Modified and new tests (+1264 LOC)

- `grpc-extractor.test.ts` (+539) — full coverage of the new proto
  parser (strings-with-braces regression, comments-with-braces
  regression), per-language server/client detection
- `http-route-extractor.test.ts` (+240) — per-framework route
  extraction + normalization edge cases
- `topic-extractor.test.ts` (+177) — the sarama in-loop regression,
  JetStream, Python NATS, kafka-go Writer/Reader
- `manifest-extractor.test.ts` (+308 NEW) — HTTP path normalization,
  gRPC exact lookup with proto-fallback regression, lib and topic
  exact matching, synthetic-uid fallback behavior

### Self-review fixes folded in

Carried forward from the abhigyanpatwari#606 self-review (commit `d15b8cb`):

- **HIGH #1** — `manifest-extractor.resolveSymbol` was too fuzzy.
  Previously used `CONTAINS` on route/name fields plus an
  unconditional `filePath ENDS WITH '.proto'` fallback for gRPC.
  Consequences: `/orders` matched `/suborders`, and any repo with
  any `.proto` file returned a random proto symbol for a gRPC
  manifest entry. Replaced with exact equality + deterministic
  `ORDER BY` + synthetic-uid fallback for unresolved manifests.
  Regression tests included.
- **MED #3** — gRPC proto parser brace-depth counting now sanitizes
  strings and comments first (`stripProtoCommentsAndStrings`). A
  valid proto with `option deprecated_reason = "use NewService {
  instead"` used to have its service body closed early by the `"{"`
  inside the literal, silently dropping methods after the offending
  string. Regression tests for both string-with-brace and
  comment-with-brace cases.
- **MED #4** — sarama Kafka regex changed from
  `sarama.NewSyncProducer[\s\S]{0,300}?Topic:` (anchored on
  constructor, caught only first topic in a loop) to
  `sarama.ProducerMessage{...Topic:}` (matches every struct literal
  directly). Regression test with a for-loop that constructs
  multiple `ProducerMessage`s.
- **MED #7** — `manifest-extractor.resolveSymbol` no longer has a
  silent `catch { /* fall through */ }`. Errors from the graph
  executor are logged via `console.warn` with link type, contract
  name, repo key, and error message before falling through to the
  synthetic-uid path.

## Why

Reviewer focus here is pure regex / parser correctness — no
storage, no Cypher queries, no algorithmic changes to the cross-link
algorithm. Separating this from the bridge foundation PR (abhigyanpatwari#795)
meant reviewers could stay in a single mental mode (parsing logic)
instead of context-switching between DDL, Cypher, and regex.

## How to verify

- `cd gitnexus && npx tsc --noEmit`
- `cd gitnexus && npx vitest run test/unit/group/grpc-extractor.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/http-route-extractor.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/topic-extractor.test.ts --pool=forks`
- `cd gitnexus && npx vitest run test/unit/group/manifest-extractor.test.ts --pool=forks`

Local pre-push: typecheck clean, all 99 extractor unit tests pass
(grpc 43, http 18, topic 30, manifest 8).

## Risk / rollback

**Low.** Extractors have no user-facing surface in this PR — they
produce `ExtractedContract[]` that is consumed by `sync.ts` in the
next split (abhigyanpatwari#793). No existing behavior changes for users who don't
run a `group sync`. Rollback = `git revert` of the merge commit;
the modifications to `grpc-extractor.ts` / `http-route-extractor.ts`
/ `topic-extractor.ts` revert to the pre-PR versions that still
work (they're subsets of the new functionality).

## Scope discipline (per GUARDRAILS.md)

- Only the 8 files above are touched; no drive-by refactors
- No CI/release/security config changes
- No secrets or machine-specific paths
- Content lifted from abhigyanpatwari#606 (CI 11/11 green on `d15b8cb`)

## Dependencies

- **Base:** `main` (upstream already includes abhigyanpatwari#795 as `1ff324c`)
- **Blocks:** sync pipeline (abhigyanpatwari#793) and the cross-impact feature (abhigyanpatwari#794)
- **Tracker issue:** abhigyanpatwari#792
- **Parent PR:** abhigyanpatwari#606

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): migrate topic-extractor from regex to tree-sitter queries

Addresses @magyargergo's feedback on abhigyanpatwari#796 that regex-based lookups
should use tree-sitter nodes instead, and that the top-level
extractors must NOT carry language dependencies. This is phase 1 of
a multi-step migration — topic-extractor first because its patterns
are the most uniform (16 "call/annotation with first-arg string
literal" variants), which makes it a clean proof of the approach
before grpc-extractor and http-route-extractor get the same treatment.

## Architecture: language-agnostic orchestrator + per-language plugins

The top-level extractor is a thin orchestrator that never imports a
tree-sitter grammar or a query string. Per-language knowledge lives
in a new `topic-patterns/` folder with one file per language plus a
registry that maps file extensions to compiled plugins:

```
src/core/group/extractors/
├── tree-sitter-scanner.ts         # shared, language-agnostic scanning utilities
├── topic-extractor.ts              # thin orchestrator (no grammar imports)
└── topic-patterns/
    ├── types.ts                    # TopicMeta, Broker
    ├── index.ts                    # registry: extension → compiled provider
    ├── java.ts                     # tree-sitter-java + JAVA_TOPIC_PROVIDER
    ├── go.ts                       # tree-sitter-go + GO_TOPIC_PROVIDER
    ├── python.ts                   # tree-sitter-python + PYTHON_TOPIC_PROVIDER
    └── node.ts                     # tree-sitter-javascript + tree-sitter-typescript
                                    # → JAVASCRIPT_/TYPESCRIPT_/TSX_TOPIC_PROVIDER
```

**Shared scanner (`tree-sitter-scanner.ts`)** — defines
`PatternSpec<TMeta>`, `LanguagePatterns<TMeta>`, `CompiledPatterns<TMeta>`
and the `scanFile(parser, plugin, content)` helper. Plugins compile their
queries eagerly at module load via `compilePatterns()`, so a broken
pattern fails loudly at import time instead of silently at scan time.
`unquoteLiteral()` handles single/double/template quotes, Python
triple-quoted strings, and Go raw backtick strings.

**Per-language plugins** own:
- the tree-sitter grammar import (this is the ONLY place in
  `src/core/group/` where tree-sitter grammars are imported),
- the query S-expressions,
- the `TopicMeta` payload (role, broker, confidence, symbolName) that
  the orchestrator receives back on every match.

Each plugin uses a `@value` capture name to bind the topic literal node.
The JavaScript and TypeScript grammars share AST node names for every
construct we query, so `node.ts` defines the pattern sources once and
compiles them against `JavaScript`, `TypeScript.typescript`, and
`TypeScript.tsx` — exporting three providers because `Parser.Query`
objects are NOT portable across grammar instances.

**Registry (`topic-patterns/index.ts`)** — maps `.java` → Java provider,
`.go` → Go, `.py` → Python, `.js`/`.jsx` → JS, `.ts` → TS, `.tsx` → TSX.
Also exports `TOPIC_SCAN_GLOB` so adding a new language is a single
file-level edit (drop `topic-patterns/<lang>.ts`, import + register it
here — zero edits required in `topic-extractor.ts`).

**Orchestrator (`topic-extractor.ts`)** — ~110 lines, no grammar or
query imports. Per file: `getProviderForFile(rel)` → `scanFile(parser,
provider, content)` → `unquoteLiteral(valueText)` → `makeContract(...)`.
Reuses one `Parser` instance across files; the scanner calls
`setLanguage` per plugin.

## Why this is better than regex

1. **Comments and strings are respected for free.** The old regex
   would match `// kafkaTemplate.send("fake.topic")` as a real
   producer; tree-sitter never visits comments or string literals as
   code nodes, so false positives from commented-out code are
   eliminated.
2. **Struct/object literal patterns are structural, not textual.**
   `sarama.ProducerMessage{Topic: "..."}` no longer needs a 300-char
   lookahead (which was a known cross-match bug partly mitigated by a
   loop regression test in the self-review). The new query matches a
   specific `composite_literal` with a specific `qualified_type` and
   `keyed_element` — exactly one struct literal per match.
3. **No order-of-operations fragility.** Regex for
   `channel.publish` vs `channel.consume` was independent and
   file-wide; the AST scopes matches to the specific `call_expression`.
4. **Language-agnostic extension.** Adding Ruby, Rust, or C# topic
   detection later means dropping one file in `topic-patterns/` — no
   changes to shared scanner or orchestrator, and no tree-sitter
   imports leak into top-level code.

## Per-file fault tolerance

- Malformed files that tree-sitter can't parse are silently skipped
  (`parser.parse` is wrapped by `scanFile`). The ingestion pipeline
  already logs unparseable files at index time.
- A syntactically invalid query is caught at `compilePatterns` time,
  not scan time — broken plugins fail loudly at import.
- Per-pattern `matches()` failures are swallowed so one broken query
  in a plugin doesn't block the rest.

## Tests

All 30 existing `topic-extractor.test.ts` tests pass **without any
changes to the test file** — they were written as input/output contract
tests (given this source file, expect these `ExtractedContract` objects)
and that contract is unchanged. Regression coverage includes:

- Kafka: Java `@KafkaListener` + `kafkaTemplate.send`; Node
  `producer.send` + `consumer.subscribe`; Go sarama producer/consumer
  (sync and async); kafka-go Writer/Reader; Python `KafkaConsumer` +
  `producer.send/produce`
- RabbitMQ: Java `@RabbitListener` + `rabbitTemplate.convertAndSend`;
  Node `channel.consume/publish/sendToQueue`; Python `basic_consume/
  basic_publish` with keyword args
- NATS: Go and Node `nc.Subscribe/Publish`; Go and Node JetStream
  `js.Subscribe/Publish`; Python `await nc.subscribe/publish`

Including the regression test for the sarama `ProducerMessage`
in-loop case — the AST-based query captures every literal in the
file independently, not just the first one after `NewSyncProducer`.

## Neighbor regression check

- `topic-extractor.test.ts` — 30/30 pass (rewritten extractor)
- `http-route-extractor.test.ts` — 18/18 pass (untouched)
- `grpc-extractor.test.ts` — 43/43 pass (untouched)
- `manifest-extractor.test.ts` — 8/8 pass (untouched)
- Full `npx tsc --noEmit` clean

## Scope discipline (per GUARDRAILS.md)

- Only files under `src/core/group/extractors/` are touched; no
  changes to other extractors, tests, MCP surface, or pipeline.ts.
- No CI/release/security config changes, no secrets.
- New tree-sitter imports all reference grammars that are already
  installed as dependencies (`tree-sitter`, `tree-sitter-javascript`,
  `tree-sitter-typescript`, `tree-sitter-python`, `tree-sitter-java`,
  `tree-sitter-go` — all in `package.json` for the existing pipeline).

## Phase 2 / phase 3 plan

- **Phase 2 (next commit):** rewrite `http-route-extractor.ts`
  Strategy B (regex fallback) on the same plugin pattern. Graph-assisted
  Strategy A stays as-is (already uses pipeline-built tree-sitter data
  via `HANDLES_ROUTE` Cypher queries).
- **Phase 3 (commit after):** rewrite `grpc-extractor.ts` for Java /
  Go / Python / TypeScript detection. `.proto` files are the one
  outstanding question — there is no `tree-sitter-proto` grammar
  installed; the in-tree string-sanitizing parser stays as a pragmatic
  exception with a comment, alternative being to add
  `tree-sitter-proto` as a dep (open for the maintainer).

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): migrate http-route-extractor Strategy B to tree-sitter plugins

Phase 2 of the extractor refactor requested by @magyargergo on abhigyanpatwari#796.
Same architecture as the phase 1 topic-extractor rewrite: a thin,
language-agnostic orchestrator plus per-language plugins that own
tree-sitter grammars and query sources. The top-level extractor file
no longer imports any tree-sitter grammar or query string.

## Architecture

```
src/core/group/extractors/
├── tree-sitter-scanner.ts          # shared, language-agnostic primitives
├── http-route-extractor.ts         # thin orchestrator (no grammar imports)
└── http-patterns/
    ├── types.ts                    # HttpDetection, HttpLanguagePlugin, HttpRole
    ├── index.ts                    # registry: ext → plugin + HTTP_SCAN_GLOB
    ├── java.ts                     # tree-sitter-java: Spring + RestTemplate/WebClient/OkHttp
    ├── go.ts                       # tree-sitter-go: gin/echo/HandleFunc + http/resty consumers
    ├── python.ts                   # tree-sitter-python: FastAPI + requests
    ├── php.ts                      # tree-sitter-php: Laravel Route::get/...
    └── node.ts                     # tree-sitter-javascript + tree-sitter-typescript:
                                    #   NestJS controllers, Express, fetch, axios
```

**Shared scanner (`tree-sitter-scanner.ts`)** — generalised from phase 1:
- `ScanMatch<TMeta>.captures` is now a full `CaptureMap` (every named
  capture the query binds, not just a single `@value`). Topic extractor
  updated to read `match.captures.value` accordingly.
- New `runCompiledPatterns(plugin, tree)` helper lets plugins run
  multiple query bundles against the same pre-parsed tree. This is
  needed for HTTP plugins that combine a class-prefix query with a
  method-route query (Spring, NestJS).
- `scanFile` becomes a thin wrapper over `parser.parse + runCompiledPatterns`.

**HTTP plugin shape** — unlike topic plugins, HTTP plugins expose a
`scan(tree)` function rather than a flat pattern list. This reflects
HTTP's more complex extraction: each detection needs method + path +
handler name, and framework patterns like Spring `@RequestMapping` /
NestJS `@Controller` require cross-referencing a class-level prefix
with method-level annotations. Plugins internally use
`compilePatterns` + `runCompiledPatterns` and walk the AST to resolve
the class/method relationships.

**Per-framework coverage:**

- **Java (`java.ts`)**
  - Spring: `@RequestMapping("/api/v2")` class prefix + `@(Get|Post|Put|
    Delete|Patch)Mapping("/sub")` method routes, joined via the
    enclosing `class_declaration` node id.
  - `RestTemplate.getForObject/postForEntity/put/delete/patchForObject` →
    method derived from API name.
  - `WebClient.method(HttpMethod.X, "/path")` → method from
    `HttpMethod.X` capture.
  - `new Request.Builder().url("/path")` → OkHttp consumer.

- **Go (`go.ts`)**
  - gin / echo / chi frameworks: `\w+.GET("/path", handler)` captures
    upper-case verb + handler identifier.
  - `net/http.HandleFunc("/path", handler)` → provider (default GET).
  - `http.Get/Post/Head` consumer, `http.NewRequest("METHOD", ...)`,
    resty `client.R().Get/Post/...`.

- **Python (`python.ts`)**
  - `@app.get("/path")` FastAPI decorators.
  - `requests.get/post/...` and `requests.request("METHOD", "url")`.

- **PHP (`php.ts`)**
  - Laravel `Route::get/post/.../patch('/path', ...)` via
    `scoped_call_expression`. Uses `PHP.php_only` to match the
    existing ingestion pipeline's grammar selection.

- **Node (`node.ts`) — JS + TS + TSX**
  - Pattern sources defined once, compiled against three grammar
    variants (`JavaScript`, `TypeScript.typescript`, `TypeScript.tsx`)
    because `Parser.Query` objects are not portable across grammars.
    Exports three plugins sharing the same `scan` logic.
  - NestJS: `@Controller('prefix')` decorators are siblings of the
    class in `export_statement` / `program`; `@Get(':id')` decorators
    are siblings of the method in `class_body`. The plugin walks
    decorator → next named sibling to find the decorated class /
    method, then combines the class prefix with the method path.
    Only emits NestJS detections when the enclosing class has a real
    `@Controller` decorator — prevents false positives from generic
    classes that happen to use `@Get` from another library.
  - Express: `(router|app).<verb>('/path', ...)`.
  - `fetch(url)` (default GET) + `fetch(url, { method: 'X' })`
    (uses two queries + a SyntaxNode-id dedupe set so URL literals
    aren't double-emitted by the options variant).
  - `axios.get/post/...`.

## Orchestrator changes

`http-route-extractor.ts` drops every `scanXxxProviders` / `scanXxxConsumers`
regex method and replaces them with a single source-scan loop that
delegates to `getPluginForFile(rel).scan(tree)`. The orchestrator
still owns:

- **Path normalization** (`normalizeHttpPath`, `normalizeConsumerPath`)
  — language-agnostic string processing shared by both strategies.
- **Graph-assisted Strategy A** (`HANDLES_ROUTE` / `FETCHES` / `CONTAINS`
  Cypher queries) — unchanged in spirit. The only regex helpers it
  used (`inferMethodFromFileScan`, `pickJavaHandlerName`) are now
  replaced by a lookup against the plugin's detections for the same
  file: for each route row, find the detection whose normalized path
  matches, and pull the HTTP method + handler name from it.
- **Per-file parse cache** — the orchestrator parses each relevant
  file at most once per `extract()` call. Both the graph-assisted
  enrichment loop and the source-scan fallback share the same
  `cachedDetections` map, so we never run the plugin twice for the
  same file.

## Why this is better than the regex version

1. **Comments and strings for free.** The old regex would match
   `// router.get('/fake')` as a real Express route; tree-sitter
   never visits string/comment nodes.
2. **Structural controller-prefix.** Spring and NestJS class-prefix
   joining is now scoped to the enclosing class via `class_declaration`
   node ids, eliminating file-wide state that broke when a file had
   multiple controllers.
3. **Precise NestJS disambiguation.** The plugin only emits a NestJS
   detection when the enclosing class has a real `@Controller`
   decorator — the old regex would fire on any `@Get(...)` in the
   file regardless of surrounding context.
4. **Language-agnostic extension.** Adding Ruby / Rust / Kotlin HTTP
   detection later means dropping one file in `http-patterns/` — no
   changes to the shared scanner, the orchestrator, or the Strategy A
   Cypher queries.

## Tests

- `http-route-extractor.test.ts` — **18/18 pass** (tests unchanged;
  they're contract-style input/output tests and the contract shape is
  unchanged). Covers Spring class prefix, Express, gin/echo, stdlib
  HandleFunc, NestJS, Laravel, FastAPI for providers and
  fetch/axios/python-requests/rest-template/webClient/okhttp/go-stdlib/
  resty for consumers, plus graph-first Strategy A for both.
- `topic-extractor.test.ts` — **30/30 pass** after the `captures.value`
  API migration.
- `grpc-extractor.test.ts` — 43/43 pass (untouched; phase 3).
- `manifest-extractor.test.ts` — 8/8 pass (untouched).
- `service.test.ts`, `sync.test.ts`, `storage.test.ts` — 41/41 pass.
- `npx tsc -p tsconfig.json --noEmit` clean.

## Scope discipline (per GUARDRAILS.md)

- Only files under `src/core/group/extractors/` are touched.
- No changes to pipeline.ts, MCP surface, ingestion, or tests.
- No CI / release / security / secrets changes.
- Tree-sitter grammars imported by plugins (`tree-sitter-java`,
  `tree-sitter-go`, `tree-sitter-python`, `tree-sitter-php`,
  `tree-sitter-javascript`, `tree-sitter-typescript`) are all already
  in `package.json` for the existing ingestion pipeline.

## Phase 3 plan

- **grpc-extractor** gets the same treatment: plugin-per-language under
  `grpc-patterns/` for Java / Go / Python / TS detection. `.proto`
  files remain an open question — no `tree-sitter-proto` grammar is
  installed, so the in-tree string-sanitizing parser from PR abhigyanpatwari#796's
  self-review stays as a pragmatic exception unless the maintainer
  wants us to add `tree-sitter-proto` as a new dep.

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): migrate grpc-extractor source scans to tree-sitter plugins

Phase 3 (final) of the extractor refactor requested by @magyargergo on
abhigyanpatwari#796. Same architecture as phase 1 (topic) and phase 2 (http): thin
language-agnostic orchestrator + per-language plugins that own
tree-sitter grammars and query sources. With this commit the top-level
extractors under `src/core/group/extractors/` import ZERO tree-sitter
grammars and ZERO query strings — every grammar import lives in a
`*-patterns/<lang>.ts` plugin file, and the orchestrators go through
the registry indirection.

## Architecture

```
src/core/group/extractors/
├── tree-sitter-scanner.ts         # shared primitives (unchanged)
├── grpc-extractor.ts               # orchestrator (only `.proto` parser left)
└── grpc-patterns/
    ├── types.ts                    # GrpcDetection, GrpcLanguagePlugin, GrpcRole
    ├── index.ts                    # registry: ext → plugin + GRPC_SCAN_GLOB
    ├── go.ts                       # tree-sitter-go: RegisterXxxServer, Unimplemented, NewXxxClient
    ├── java.ts                     # tree-sitter-java: @GrpcService + XxxImplBase + newBlockingStub
    ├── python.ts                   # tree-sitter-python: add_XxxServicer_to_server + XxxStub
    └── node.ts                     # tree-sitter-javascript + tree-sitter-typescript:
                                    #   @GrpcMethod, @GrpcClient field type,
                                    #   .getService<X>('Svc'), new XxxServiceClient,
                                    #   loadPackageDefinition dynamic constructors
```

## Per-language coverage

**Go (`go.ts`)**
- Provider: `\w+.RegisterXxxServer(...)` via `call_expression →
  selector_expression → field_identifier` + JS regex filter
  `^Register(\w+)Server$`.
- Provider: `pb.UnimplementedXxxServer` embedded in a struct via
  `struct_type → field_declaration_list → field_declaration →
  qualified_type → type_identifier` + JS filter.
- Consumer: `\w+.NewXxxClient(...)` via the same call_expression
  query + JS filter `^New(\w+)Client$`.

**Java (`java.ts`)**
- Provider: `class X extends YyyGrpc.YyyImplBase` — two queries
  handle the scoped and plain forms. `scoped_type_identifier`'s
  children are positional (no `scope:`/`name:` fields), so the
  query matches the two `type_identifier` children by position.
- `#match? @inner "ImplBase$"` restricts matches at query time.
- Whether the class has `@GrpcService` or not controls only the
  `source` metadata label — the plugin walks the class_declaration's
  `modifiers` child in JS to detect the marker_annotation.
- Consumer: `YyyGrpc.newStub(ch)` / `newBlockingStub(ch)` via a
  `method_invocation` query with `#match? @method
  "^new(Blocking)?Stub$"`, service name extracted via
  `^(\w+)Grpc$` on the object identifier.

**Python (`python.ts`)**
- Single call-expression query covers both bare identifier and
  `obj.method` attribute forms:
  `(call function: [(identifier) @fn (attribute attribute: (identifier) @fn)])`.
- Plugin filters `@fn.text` against two JS regexes:
  `^add_(\w+)Servicer_to_server$` (provider) and `^(\w+)Stub$`
  (consumer), with a reserved-names ignore list for the Stub case
  (Mock / Test / Fake / Stub).

**Node — JavaScript + TypeScript + TSX (`node.ts`)**
- Pattern sources defined once, compiled three times (one per grammar)
  because `Parser.Query` objects are not portable across grammars.
  Exports three `GrpcLanguagePlugin`s sharing the same `scan`.
- `@GrpcMethod('Service', 'Method')`: decorator query captures the
  two string literals. Confidence is hard-coded 0.8 regardless of
  proto map resolution (matches the original regex version's
  behaviour).
- `@GrpcClient(...) field: XxxServiceClient`: decorator query
  captures the decorator node, plugin walks up to find the enclosing
  `public_field_definition` (decorators on fields are CHILDREN of
  the field definition in tree-sitter-typescript, not siblings) and
  reads its first `type_annotation → type_identifier`, then runs the
  `^(\w+Service)Client$` JS filter.
- `client.getService<X>('AuthService')`: call-expression query on
  `member_expression.property = "getService"` + string literal arg.
- `new XxxServiceClient(...)`: `new_expression` with a bare
  identifier constructor, filtered by `^(\w+Service)Client$` so
  generic `new AuthClient(...)` (missing the `Service` infix) does
  NOT falsely register as a consumer. Preserves the regression test
  `test_extract_ts_non_service_client_constructor_is_ignored`.
- `loadPackageDefinition` dynamic loader: gated on
  `tree.rootNode.text.includes('loadPackageDefinition')`. When set,
  `new foo.bar.Xxx(...)` qualified constructors with a capitalised
  property name register as consumers.

## Orchestrator changes

`grpc-extractor.ts` loses every `scanGoProviders` / `scanJavaProviders`
/ ... helper and replaces them with a single source-scan loop that:

1. Parses each file with the plugin's grammar (one shared `Parser`
   instance across all files, `setLanguage` called per plugin).
2. Calls `plugin.scan(tree)` to get `GrpcDetection[]`.
3. Converts each detection to an `ExtractedContract` via the private
   `detectionToContract` helper, which:
   - Looks the short service name up in the proto map (filled by
     the `.proto` parser).
   - Picks confidence = `confidenceWithProto` if resolved, else
     `confidenceWithoutProto`.
   - Builds a method-level contract id (`grpc::pkg.Svc/Method`) when
     the detection carries a `methodName` (TS `@GrpcMethod` only),
     otherwise a service-level id (`grpc::pkg.Svc/*`).

Everything else — the `.proto` parser, `buildProtoContext`,
`buildProtoMap`, `resolveProtoConflict`, `serviceContractId`,
`stripProtoCommentsAndStrings`, `extractServiceBlocks`, the dedupe
function — stays exactly as before. The `.proto` parser is kept as a
pragmatic exception to the "no regex in extractors" rule because no
`tree-sitter-proto` grammar is installed in the repo; a comment at the
top of the file explains this and flags the maintainer option of
adding `tree-sitter-proto` as a dependency.

## Why this is better than the regex version

1. **Comments and strings are respected for free.** Matched node types
   are only code constructs, never text inside comments or string
   literals.
2. **No false positives on partial names.** The old `(\w+?)Grpc`-style
   regexes would cross-match unrelated identifiers; structural queries
   restrict matches to the exact AST shape (`scoped_type_identifier →
   type_identifier` pairs, `method_invocation → identifier` etc.).
3. **NestJS `@GrpcClient` is structural, not regex-based.** The old
   regex required a specific textual layout
   (`@GrpcClient(...) private readonly foo!: XxxServiceClient`); the
   plugin now walks the AST, so modifier order / optional modifiers /
   multi-line formatting don't break it.
4. **Language-agnostic extension.** Adding Kotlin / Rust / C# gRPC
   detection later is a one-file edit in `grpc-patterns/index.ts` —
   no touches to the shared scanner, the orchestrator, or the proto
   parser.

## Tests

- `grpc-extractor.test.ts` — **43/43 pass** (tests unchanged; the
  contract shape is identical). Covers .proto parsing (including the
  brace-inside-string regression), Go provider/consumer,
  Java @GrpcService / plain ImplBase provider + newBlockingStub
  consumer, Python servicer + stub, TS @GrpcMethod + @GrpcClient +
  .getService + new XxxServiceClient + loadPackageDefinition + the
  `AuthClient` vs `AuthServiceClient` discrimination, dedupe across
  multiple patterns in one file, proto-aware confidence, and the
  inherited-package resolution for split proto definitions.
- `topic-extractor.test.ts` — 30/30 pass.
- `http-route-extractor.test.ts` — 18/18 pass.
- `manifest-extractor.test.ts` — 8/8 pass.
- `service.test.ts`, `sync.test.ts`, `storage.test.ts` — 41/41 pass.
- `npx tsc -p tsconfig.json --noEmit` clean.

## Scope discipline (per GUARDRAILS.md)

- Only files under `src/core/group/extractors/` are touched.
- No pipeline.ts, MCP surface, ingestion, CI / release / security, or
  test changes.
- New tree-sitter grammar imports (`tree-sitter-go`, `tree-sitter-java`,
  `tree-sitter-python`, `tree-sitter-javascript`, `tree-sitter-typescript`)
  are all already installed for the ingestion pipeline.

## End of phase series

This commit completes the three-phase extractor refactor:
  - **Phase 1** (`ea06d11`): topic-extractor → `topic-patterns/`
  - **Phase 2** (`b6015f6`): http-route-extractor → `http-patterns/`
  - **Phase 3** (this commit): grpc-extractor → `grpc-patterns/`

Every remaining regex-based extractor helper under the `src/core/group/
extractors/` directory is either (a) language-agnostic string
processing (path normalization, dedupe keys) or (b) the `.proto`
parser, which is documented as an explicit exception.

Co-authored-by: Claude <noreply@anthropic.com>

* feat(group): add tree-sitter-proto for .proto file parsing

Addresses @magyargergo's suggestion on abhigyanpatwari#796 to replace the manual
string-sanitizing .proto parser with a tree-sitter grammar.

- **Vendored `tree-sitter-proto`** in `vendor/tree-sitter-proto/`.
  Grammar source from [coder3101/tree-sitter-proto](https://github.com/coder3101/tree-sitter-proto)
  (latest `grammar.js`), parser.c regenerated with `tree-sitter-cli
  0.24` to produce ABI version 14 — compatible with the project's
  `tree-sitter 0.25` runtime (which supports ABI ≤ 14). Added as
  `optionalDependency` with `file:./vendor/tree-sitter-proto`.

- **New `grpc-patterns/proto.ts` plugin** — uses the same
  `compilePatterns` + `runCompiledPatterns` infrastructure as every
  other plugin. Two queries:
  - `(package (full_ident) @pkg)` — package declaration
  - `(service (service_name) @service_name (rpc (rpc_name) @rpc_name))`
    — one match per (service, rpc) pair

- **Graceful fallback** — `tree-sitter-proto` is an optional
  dependency. If it fails to install (platform incompatibility) or
  fails the runtime smoke-test (`setLanguage` + `parse` on a trivial
  proto), `PROTO_GRPC_PLUGIN` stays `null` and the orchestrator
  uses the existing manual parser. The smoke-test catches the
  `SyntaxNode` TDZ error that occurs in vitest's fork-based test
  runner.

- **Orchestrator updated** — when `hasProtoPlugin` is true, `.proto`
  files are handled by the plugin loop (they're included in
  `GRPC_SCAN_GLOB`), and the manual `parseProtoFile` loop is
  skipped. `buildProtoContext` still runs to build the proto map
  for cross-referencing source-file detections.

1. **No manual comment/string stripping.** The old parser needed
   `stripProtoCommentsAndStrings` (110 lines) to avoid counting
   braces inside comments and string literals. tree-sitter handles
   this natively.
2. **No brace-depth tracking.** `extractServiceBlocks` used a manual
   depth counter to find service boundaries. tree-sitter's AST gives
   us `service` → `service_name` + `rpc` → `rpc_name` directly.
3. **Performance.** tree-sitter's C-based parser is faster than
   character-by-character JS scanning + regex on large proto files.

- `grpc-extractor.test.ts` — **43/43 pass** (unchanged)
- All other extractor tests — 99/99 pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

* chore: add .gitignore for vendored tree-sitter-proto build artifacts

https://claude.ai/code/session_01SFUCxgKMMQ8EgRHYw91xPU

* fix: correct .gitignore paths for vendored tree-sitter-proto

Patterns should be relative to the .gitignore file's directory.

https://claude.ai/code/session_01SFUCxgKMMQ8EgRHYw91xPU

* refactor(group): address Copilot review feedback on abhigyanpatwari#796

Six fixes suggested by the Copilot AI review:

1. **`normalizeHttpPath` root-path edge case** — stripping trailing
   slashes on the input `/` produced an empty string, yielding
   malformed contract ids like `http::GET::`. Now preserves `/` for
   the root handler/fetch case.

2. **Dedupe `scanFiles` call** — `extract()` was globbing the
   source-scan file list twice (once for the provider fallback, once
   for the consumer fallback). Moved to a single lazy call that
   memoizes the result for the rest of the method.

3. **HTTP `scanFiles` now ignores `**/vendor/**`** — every other
   extractor's glob already ignored vendored sources; the HTTP one
   didn't. Fixed for consistency.

4. **`loadPackageDefinition` check is now structural** — was calling
   `tree.rootNode.text.includes('loadPackageDefinition')` which forces
   materialization of the entire file text from the parse tree
   (expensive on large files). Replaced with a dedicated compiled
   query on `(call_expression function: [(identifier) | (member_expression)])`
   so the check stays in the AST domain.

5. **`grpc-extractor.ts` header docstring updated** — still claimed
   ".proto parsing is not tree-sitter-based because no grammar is
   installed". Now describes the actual behaviour: tree-sitter when
   `tree-sitter-proto` is available (optionalDependency), manual
   fallback otherwise.

6. **Eliminated the double proto file parse on the fallback path** —
   `buildProtoContext` already globs + parses every `.proto` file to
   build `servicesByName`. On the `!hasProtoPlugin` branch the
   extractor was globbing + parsing again via the now-removed
   `parseProtoFile` helper. The fallback branch now iterates the map
   that `buildProtoContext` already produced to emit provider
   contracts directly — single pass per proto file.

## Tests

- `topic-extractor.test.ts` — 30/30 pass
- `http-route-extractor.test.ts` — 18/18 pass
- `grpc-extractor.test.ts` — 43/43 pass
- `manifest-extractor.test.ts` — 8/8 pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

* refactor(group): address Claude review feedback (bugs + dedup + hygiene) on abhigyanpatwari#796

Follows up `2f28bfc` with the remaining items from the Claude AI review:

## Bugs

**Bug 2 — Label-unaware Cypher queries in `resolveSymbol`.**
The manifest-extractor's lookup queries were `MATCH (n) WHERE n.name = $x`
with no label filter, so a topic/service/package name could silently match
any node type (File, Variable, Import, Folder, …). Added label filters:
- `topic` → `(n:Function|Method|Class|Interface)` (topics are best-effort
  symbol-name matches against listener/publisher symbols)
- `grpc` method → `(n:Function|Method)`
- `grpc` service → `(n:Class|Interface)`
- `lib` → `(n:Package|Module)`

All 8 manifest-extractor tests still pass (mock executor is
label-agnostic, but the production LadybugDB graph now gets correctly
scoped queries).

**Bug 8 — Tautological `!handlerName` condition.**
`http-route-extractor.ts:extractProvidersGraph` had
`let handlerName = null; if (!method || !handlerName) { ... }` — the
`!handlerName` clause was always true since there was no intervening
assignment. Simplified to always run the plugin-scan lookup (we need
the handler name even when `methodFromRouteReason` already resolved
the method).

## Clean code / dedup

**Design 7 — `readSafe` was copy-pasted in all three orchestrators.**
Extracted to `extractors/fs-utils.ts` as the single source of truth
for the path-traversal guard. Dropped the three local copies and the
now-unused `fs`/`path` imports from topic-extractor.

**Style 10 — Language-specific `_test.go` skip in the topic orchestrator.**
Was `if (rel.endsWith('_test.go')) continue;` inside the language-
agnostic extraction loop. Pushed into the glob's ignore list
(`'**/*_test.go'`) alongside the existing `node_modules`, `vendor`,
`dist`, `build` entries, with a comment explaining that other
languages' test file conventions either live in separate directories
(Python `tests/`, Java `src/test/`) or are already covered by the
existing ignores.

## Already addressed in `2f28bfc` (mentioned again in Claude review)

- Bug 3: `normalizeHttpPath('/')` returns `''` — fixed
- Bug 4: double glob + double parse of `.proto` — fixed
- Bug 5: `scanFiles` called twice in HTTP — fixed
- Bug 6: missing `**/vendor/**` in HTTP glob — fixed
- Design 9 partially: `tree.rootNode.text.includes('loadPackageDefinition')`
  replaced with a dedicated structural query

## Deferred

- Bug 1 (`http::*::path` vs `http::GET::path` matching) — out of scope;
  sync.ts matching logic lands in abhigyanpatwari#793, manifest extractor already
  emits correct synthetic uids for unresolved HTTP contracts.
- Design 9 full (change plugin `scan(tree)` → `scan(tree, source)`) —
  the only real use case (`loadPackageDefinition` gate) is already
  fixed via a structural query, so the interface change would be
  cosmetic churn without a concrete consumer.

## Tests

- `topic-extractor.test.ts` — 30/30 pass
- `http-route-extractor.test.ts` — 18/18 pass
- `grpc-extractor.test.ts` — 43/43 pass
- `manifest-extractor.test.ts` — 8/8 pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

* docs+fix(group): address remaining Claude review items + add pipeline flow chart

## Fixes

**Remaining 🔴 — HTTP contract id wildcard format.** Documented the
`http::*::<path>` format as an intentional wildcard for manifest links
that omit the HTTP method, alongside the explicit-method form
(`GET::/path` → `http::GET::/path`). The docblock on `buildContractId`
now states both forms, notes that wildcard-aware matching is the
responsibility of the sync / cross-impact layer (abhigyanpatwari#793), and
recommends the explicit-method form whenever the author knows the
method (it round-trips through exact equality without needing
wildcard logic downstream). Tests unchanged — the wildcard format is
what they've always asserted.

**Minor 1 — stale comment at `manifest-extractor.ts:124-126`.** The
comment claimed "creates a contract with an empty symbolUid/ref" but
the code switched to `manifestSymbolUid(repo, contractId)` a few
commits back. Updated to describe the actual synthetic-uid fallback
semantics and the cross-impact path that relies on both sides of the
join deriving the same uid.

**Minor 2 — exhaustiveness guard on `buildContractId`.** The
`switch(type)` covered all five current `ContractType` variants but
silently returned `undefined` if a new variant was added. Added a
`default: const _exhaustive: never = type; throw new Error(...)`
clause so the build fails loudly on an unhandled variant.

**Minor 3 — `tree.rootNode.text` in `grpc-patterns/node.ts`.** Already
fixed in `2f28bfc` via a dedicated structural query
(`LOAD_PACKAGE_DEFINITION_SPEC`). No action needed.

## New: pipeline flow chart (per @magyargergo's request)

Added `src/core/group/PIPELINE.md` with four Mermaid diagrams:
1. **High-level overview** — `group.yaml` → extractors + manifest →
   contract matching → `bridge.lbug` → `runGroupImpact`.
2. **Per-repo extractor two-strategy shape** — graph-assisted
   Strategy A vs. source-scan Strategy B.
3. **Plugin architecture** — orchestrator → registry →
   per-language `*-patterns/<lang>.ts` → `tree-sitter-scanner.ts` →
   `ExtractedContract`.
4. **Manifest extraction** — label-scoped `resolveSymbol` with the
   synthetic-uid fallback.
5. **Cross-impact query (abhigyanpatwari#606)** — local impact → bridge join →
   cross-repo fan-out.

Each diagram is annotated with which PRs own which stage (this PR:
extractors + manifest; abhigyanpatwari#795: bridge storage; abhigyanpatwari#606: cross-impact
runtime) and points at the concrete files/functions involved.

## Tests

- 99/99 extractor tests pass
- `npx tsc -p tsconfig.json --noEmit` clean

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant