28 changes: 28 additions & 0 deletions gitnexus/package-lock.json

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions gitnexus/package.json
@@ -87,6 +87,7 @@
"optionalDependencies": {
"tree-sitter-dart": "github:UserNobody14/tree-sitter-dart#80e23c07b64494f7e21090bb3450223ef0b192f4",
"tree-sitter-kotlin": "^0.3.8",
"tree-sitter-proto": "file:./vendor/tree-sitter-proto",
"tree-sitter-swift": "^0.6.0"
},
"devDependencies": {
139 changes: 139 additions & 0 deletions gitnexus/src/core/group/PIPELINE.md
@@ -0,0 +1,139 @@
# Group Analysis Pipeline

Flow charts of the cross-repo contract extraction and matching pipeline.
They cover what runs **inside this PR** (extractors + manifest) and
the downstream handoff to the bridge storage (PR #795) and the
cross-impact query (PR #606).

## High-level overview

```mermaid
flowchart TD
A[group.yaml] --> B[GroupConfig parser]
B --> C{For each repo<br/>in group}
C --> D[Per-repo LadybugDB<br/>indexed by main pipeline]

D --> E1[TopicExtractor]
D --> E2[HttpRouteExtractor]
D --> E3[GrpcExtractor]

E1 --> F[ExtractedContract array<br/>per repo]
E2 --> F
E3 --> F

B --> M[ManifestExtractor]
M --> G[Manifest contracts<br/>+ cross-links]

F --> H[Contract matching<br/>exact + wildcard]
G --> H

H --> I[(bridge.lbug<br/>#795)]

I --> J[runGroupImpact<br/>#606]
J --> K[CrossRepoImpact]
```

## Per-repo extractor pipeline

Each extractor under `src/core/group/extractors/` follows the same
two-strategy shape:

```mermaid
flowchart TD
R[RepoHandle + CypherExecutor<br/>for this repo] --> S{Graph-assisted<br/>Strategy A<br/>available?}

S -->|yes| A1[Cypher query against<br/>per-repo LadybugDB]
A1 --> A2{non-empty<br/>result?}
A2 -->|yes| OUT[ExtractedContract array]
A2 -->|no| B1

S -->|no| B1[Source-scan Strategy B]
B1 --> B2[glob repo source files]
B2 --> B3{ext in registry?}
B3 -->|yes| B4[Per-language plugin<br/>scan parsed tree]
B3 -->|no| SKIP[skip file]
B4 --> OUT

SKIP --> B2
```

**Strategy A** (graph-assisted) runs Cypher over edges already produced
by the main ingestion pipeline:
- HTTP: `HANDLES_ROUTE` / `FETCHES` edges of the form `(File)-[]->(Route)`
- topic: none (the pipeline doesn't yet produce topic nodes, so Strategy B only)
- gRPC: none (Strategy B + proto map only)

**Strategy B** (source-scan) is 100% tree-sitter based after this PR.
Each `*-patterns/<lang>.ts` plugin owns its grammar + S-expression
queries; the top-level orchestrator imports neither.
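
A minimal sketch of the A-then-B fallback described above. The
`ExtractedContract` fields, the `CypherExecutor` / `SourceScanner`
signatures, and the `extractContracts` name are assumptions made for
illustration (the PR's real types are not shown here), and everything is
synchronous for brevity:

```typescript
// Hypothetical shapes; not the PR's actual type definitions.
interface ExtractedContract {
  contractId: string;
  role: 'provider' | 'consumer';
  filePath: string;
}

// Assumed: runs a Cypher query against the per-repo graph.
type CypherExecutor = (query: string) => ExtractedContract[];
// Assumed: globs the repo and runs the per-language plugins.
type SourceScanner = (repoPath: string) => ExtractedContract[];

function extractContracts(
  repoPath: string,
  graphQuery: string | null, // null when no Strategy A exists (topic, gRPC)
  runCypher: CypherExecutor,
  scanSource: SourceScanner,
): ExtractedContract[] {
  if (graphQuery !== null) {
    const fromGraph = runCypher(graphQuery);
    // A non-empty graph result wins; an empty one falls through to B.
    if (fromGraph.length > 0) return fromGraph;
  }
  return scanSource(repoPath);
}
```

Passing `null` as the query models the topic and gRPC extractors, which
go straight to the source scan.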

## Plugin architecture

```mermaid
flowchart LR
O[Orchestrator<br/>topic|http|grpc-extractor.ts] --> REG[REGISTRY<br/>*-patterns/index.ts]
REG --> P1[java.ts<br/>tree-sitter-java]
REG --> P2[go.ts<br/>tree-sitter-go]
REG --> P3[python.ts<br/>tree-sitter-python]
REG --> P4[node.ts<br/>JS + TS + TSX]
REG --> P5[php.ts<br/>tree-sitter-php<br/>HTTP only]
REG --> P6[proto.ts<br/>tree-sitter-proto<br/>gRPC only, optional]

P1 --> SCAN[tree-sitter-scanner.ts<br/>compilePatterns + runCompiledPatterns]
P2 --> SCAN
P3 --> SCAN
P4 --> SCAN
P5 --> SCAN
P6 --> SCAN

SCAN --> DET[Detection objects<br/>TopicMeta / HttpDetection / GrpcDetection]
DET --> O
O --> CT[ExtractedContract array]
```

The orchestrator never imports a grammar. To add a new language or
framework, drop one file into `*-patterns/` and register it in
`index.ts`. No orchestrator edits are required.
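
The registry lookup can be sketched roughly as follows. The
`LanguagePlugin` and `Detection` shapes, `pluginFor`, and `scanFile` are
hypothetical names, and the stand-in plugin uses a regular expression
only so the example stays self-contained; the real plugins run
tree-sitter queries instead:

```typescript
// Hypothetical shapes for illustration only.
interface Detection {
  name: string;
  line: number;
}

interface LanguagePlugin {
  extensions: readonly string[];
  scan(source: string): Detection[];
}

// Stand-in plugin: a regex replaces the tree-sitter query a real plugin owns.
const goPlugin: LanguagePlugin = {
  extensions: ['.go'],
  scan(source: string): Detection[] {
    const out: Detection[] = [];
    source.split('\n').forEach((text, i) => {
      const topic = /Subscribe\("([^"]+)"\)/.exec(text)?.[1];
      if (topic !== undefined) out.push({ name: topic, line: i + 1 });
    });
    return out;
  },
};

// Adding a language means appending one entry here; nothing else changes.
const REGISTRY: readonly LanguagePlugin[] = [goPlugin];

function pluginFor(filePath: string): LanguagePlugin | undefined {
  const dot = filePath.lastIndexOf('.');
  if (dot < 0) return undefined;
  const ext = filePath.slice(dot);
  return REGISTRY.find((p) => p.extensions.indexOf(ext) !== -1);
}

// Orchestrator side: no grammar imports, files without a plugin are skipped.
function scanFile(filePath: string, source: string): Detection[] {
  const plugin = pluginFor(filePath);
  return plugin ? plugin.scan(source) : [];
}
```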

## Manifest extraction

```mermaid
flowchart TD
Y[group.yaml links] --> ME[ManifestExtractor]
ME --> LOOP{for each link}
LOOP --> RES[resolveSymbol<br/>label-scoped Cypher]
RES --> OK{found?}
OK -->|yes| REF[real symbol uid + ref]
OK -->|no| SYN[synthetic uid<br/>manifest::repo::cid]

REF --> EMIT[emit provider + consumer<br/>Contract objects<br/>+ CrossLink]
SYN --> EMIT

EMIT --> BRIDGE[(bridge.lbug<br/>#795)]
```

Label-scoped queries in `resolveSymbol` keep accidental cross-matches
out:
- `topic` → `(n:Function|Method|Class|Interface)`
- `grpc` method → `(n:Function|Method)`, service → `(n:Class|Interface)`
- `lib` → `(n:Package|Module)`
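
One way the label scoping could be expressed, as a sketch only: the
`LinkKind` names, the `buildResolveQuery` helper, and the `n.name` /
`n.id` properties are assumptions, not the actual `resolveSymbol`
implementation. The label sets are the ones listed above:

```typescript
// Hypothetical kind names; the label sets come from the list above.
type LinkKind = 'topic' | 'grpc-method' | 'grpc-service' | 'lib';

const LABELS: Record<LinkKind, readonly string[]> = {
  topic: ['Function', 'Method', 'Class', 'Interface'],
  'grpc-method': ['Function', 'Method'],
  'grpc-service': ['Class', 'Interface'],
  lib: ['Package', 'Module'],
};

// Builds a query restricted to the labels allowed for this kind.
// `n.name`, `n.id`, and the `$name` parameter are assumed for illustration.
function buildResolveQuery(kind: LinkKind): string {
  const labels = LABELS[kind].join('|');
  return `MATCH (n:${labels}) WHERE n.name = $name RETURN n.id AS uid LIMIT 1`;
}
```

Because the label set is part of the pattern, a `lib` link whose name
happens to match a function is never resolved to that function.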

## Cross-impact query (PR #606)

```mermaid
flowchart TD
U[User changes symbol S<br/>in repo R] --> LI[Local impact engine<br/>per-repo uid expansion]
LI --> IDS[Affected uid set]

IDS --> BR[Bridge query<br/>MATCH Contract WHERE uid IN ids]
BR --> CL[CrossLink traversal]
CL --> OTHER[Matching contract in<br/>other repo]

OTHER --> FE[Fan-out impact<br/>to consuming repo]
FE --> OUT[CrossRepoImpact<br/>per affected repo]
```

The bridge stores every extracted contract keyed by `symbolUid`.
Manifest-sourced contracts use the synthetic uid form so both sides
of the `(local impact) ↔ (bridge query)` join derive the same uid
without coordinating through any shared state.
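
A rough sketch of why the synthetic uid needs no shared state: both
sides can recompute it from the repo name and contract id. This assumes
`manifest` is a literal prefix and `repo` / `cid` are placeholders in the
`manifest::repo::cid` form; `BridgeContract`, `syntheticUid`, and
`contractsForAffected` are hypothetical names, and an in-memory array
stands in for the bridge store:

```typescript
// Assumed reading of the `manifest::repo::cid` form shown in the chart.
function syntheticUid(repo: string, contractId: string): string {
  return `manifest::${repo}::${contractId}`;
}

// Hypothetical row shape for a contract stored in the bridge.
interface BridgeContract {
  symbolUid: string;
  repo: string;
  contractId: string;
  role: 'provider' | 'consumer';
}

// In-memory stand-in for "MATCH Contract WHERE uid IN ids".
function contractsForAffected(
  bridge: readonly BridgeContract[],
  affectedUids: ReadonlySet<string>,
): BridgeContract[] {
  return bridge.filter((c) => affectedUids.has(c.symbolUid));
}
```

The manifest writer calls `syntheticUid` when it emits a contract, and
the impact side calls it again when building the affected uid set, so
the join key matches by construction.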
23 changes: 23 additions & 0 deletions gitnexus/src/core/group/extractors/fs-utils.ts
@@ -0,0 +1,23 @@
import * as fs from 'node:fs';
import * as path from 'node:path';

/**
 * Safely read a file inside a repo, rejecting any path that escapes
 * `repoPath` via `..` traversal or absolute segments. Returns `null` if
 * the path is outside the repo or the file can't be read.
 *
 * Used by every source-scan extractor under this directory. Kept as a
 * single shared implementation so the path-traversal guard (security-
 * sensitive) lives in exactly one place.
 */
export function readSafe(repoPath: string, rel: string): string | null {
  const abs = path.resolve(repoPath, rel);
  const base = path.resolve(repoPath);
  const relToBase = path.relative(base, abs);
  if (relToBase.startsWith('..') || path.isAbsolute(relToBase)) return null;
  try {
    return fs.readFileSync(abs, 'utf-8');
  } catch {
    return null;
  }
}