diff --git a/docs/code-indexing/cobol/README.md b/docs/code-indexing/cobol/README.md
new file mode 100644
index 0000000000..c96eb4626f
--- /dev/null
+++ b/docs/code-indexing/cobol/README.md
@@ -0,0 +1,100 @@
+# COBOL Code Indexing
+
+GitNexus indexes COBOL codebases using a **regex-only extraction** strategy, bypassing tree-sitter entirely. This document explains why, how the pipeline works, and links to detailed sub-documents.
+
+## Why Regex-Only?
+
+The tree-sitter-cobol grammar (v0.0.1) has three critical limitations that make it unusable for production indexing:
+
+| Issue | Impact | Severity |
+|-------|--------|----------|
+| External scanner hangs on ~5% of files | No timeout mechanism exists for the C scanner; the process blocks indefinitely | **Blocking** |
+| Only ~15% of paragraph headers detected | Most procedure-division paragraphs are invisible to the grammar | High |
+| Patch markers in cols 1-6 cause parse errors | Enterprise COBOL uses non-standard sequence area content (e.g., `mzADD`, `estero`, `#FIX`) | High |
+
+Because a hang inside the external scanner cannot be interrupted (a parser-level timeout such as `setTimeoutMicros` does not preempt code that is already running inside the C scanner), using tree-sitter-cobol would stall the indexing pipeline on a non-trivial fraction of real-world files.
+
+The regex-only approach provides:
+
+- **Speed**: ~1ms per file average extraction time
+- **Reliability**: zero hangs, zero crashes across 13,000+ files
+- **Coverage**: captures all critical symbols -- program name, paragraphs, sections, CALL, PERFORM, COPY, data items (levels 01-49, 66, 77, and 88), file declarations, FD entries, EXEC SQL/CICS blocks, ENTRY points, and MOVE statements
+
+## Architecture
+
+```mermaid
+flowchart TD
+ A[Repository Scan] --> B{File Detection}
+ B -->|Extension match| C[COBOL file]
+ B -->|GITNEXUS_COBOL_DIRS match| C
+ B -->|No match| Z[Skip]
+
+ C --> D{Copybook?}
+ D -->|Yes| E[Add to Copybook Map]
+ D -->|No| F[Source Program]
+
+ E --> G[COPY Expansion Engine]
+ F --> G
+
+ G -->|Inline copybook content| H[Expanded Source]
+ H --> I[Patch Marker Cleanup]
+ I --> J[Regex State Machine]
+
+ J --> K[Extracted Symbols]
+ K --> L[Graph Model Builder]
+ L --> M[Knowledge Graph]
+
+ subgraph "Per-Chunk Processing"
+ G
+ H
+ I
+ J
+ K
+ L
+ end
+
+ subgraph "Post-Processing"
+ M --> N[Community Detection]
+ M --> O[Process Detection]
+ M --> P[Contract Detection]
+ end
+
+ style J fill:#e8f5e9,stroke:#2e7d32
+ style G fill:#e3f2fd,stroke:#1565c0
+```
+
+## COBOL vs Tree-Sitter Languages
+
+| Feature | COBOL (Regex) | Tree-Sitter Languages |
+|---------|--------------|----------------------|
+| Parser | Single-pass regex state machine | tree-sitter grammar + queries |
+| Speed | ~1ms/file | ~5ms/file |
+| AST available | No | Yes |
+| COPY expansion | Yes (pre-processing step) | N/A |
+| Deep indexing | Data items, SQL, CICS, FD, ENTRY | Type annotations, generics, etc. |
+| Call extraction | PERFORM (intra-file) + CALL (cross-program) | AST-based call site detection |
+| Import extraction | COPY statements | `import`/`require`/`use`/`#include` |
+| Coverage | All critical symbols | Language-dependent query coverage |
+| Failure mode | Never hangs | External scanner can hang (COBOL only) |
+
+## Sub-Documents
+
+| Document | Description |
+|----------|-------------|
+| [File Detection](./file-detection.md) | Extension mapping, `GITNEXUS_COBOL_DIRS`, copybook classification |
+| [COPY Expansion](./copy-expansion.md) | Copybook inlining, REPLACING transformations, cycle detection |
+| [Regex Extraction](./regex-extraction.md) | State machine, regex patterns, line processing |
+| [Deep Indexing](./deep-indexing.md) | Data items, EXEC SQL/CICS, file declarations, FD, ENTRY, MOVE |
+| [Graph Model](./graph-model.md) | COBOL-specific node types, edge types, full annotated example |
+| [Performance](./performance.md) | Benchmarks, worker pool tuning, caps, troubleshooting |
+
+## Key Source Files
+
+| File | Purpose |
+|------|---------|
+| `gitnexus/src/core/ingestion/cobol-preprocessor.ts` | Patch marker cleanup + regex extraction engine |
+| `gitnexus/src/core/ingestion/cobol-copy-expander.ts` | COPY statement expansion with REPLACING |
+| `gitnexus/src/core/ingestion/utils.ts` | `getLanguageFromPath`, `getLanguageFromFilename` |
+| `gitnexus/src/core/ingestion/pipeline.ts` | `isCobolCopybook`, `expandCobolCopies`, `detectCrossProgamContracts` |
+| `gitnexus/src/core/ingestion/workers/parse-worker.ts` | `processCobolRegexOnly` -- graph model builder |
+| `gitnexus/src/core/ingestion/workers/worker-pool.ts` | Configurable sub-batch size for COBOL |
diff --git a/docs/code-indexing/cobol/copy-expansion.md b/docs/code-indexing/cobol/copy-expansion.md
new file mode 100644
index 0000000000..7c6aaa2a3e
--- /dev/null
+++ b/docs/code-indexing/cobol/copy-expansion.md
@@ -0,0 +1,157 @@
+# COBOL COPY Expansion
+
+The COPY statement is COBOL's include mechanism -- analogous to `#include` in C or `import` in modern languages. GitNexus expands COPY statements **before** regex extraction so that symbols defined inside copybooks (data items, paragraphs, etc.) are visible in the program's extracted graph.
+
+## Supported Syntax
+
+### Basic COPY
+
+```cobol
+COPY CPSESP.
+COPY "WORKGRID.CPY".
+```
+
+Inlines the content of the named copybook, replacing the COPY line(s).
+
+### COPY with REPLACING
+
+```cobol
+COPY CPSESP REPLACING "ANAZI-KEY" BY "LK-KEY".
+COPY CPSESP REPLACING LEADING "ESP-" BY "LK-ESP-"
+ LEADING "KPSESPL" BY "LK-KPSESPL".
+COPY LINKAGE REPLACING TRAILING "-IN" BY "-OUT".
+```
+
+Three REPLACING types are supported:
+
+| Type | Syntax | Behavior | Example |
+| ------------ | ------------------------------------ | --------------------------------------- | -------------------------------- |
+| **EXACT** | `REPLACING "OLD" BY "NEW"` | Replace exact identifier matches | `ANAZI-KEY` becomes `LK-KEY` |
+| **LEADING** | `REPLACING LEADING "PFX-" BY "NEW-"` | Replace prefix on all COBOL identifiers | `ESP-NAME` becomes `LK-ESP-NAME` |
+| **TRAILING** | `REPLACING TRAILING "-IN" BY "-OUT"` | Replace suffix on all COBOL identifiers | `DATA-IN` becomes `DATA-OUT` |
+
+Multiple REPLACING clauses can appear in a single COPY statement. They are applied in order to each COBOL identifier in the copybook content.
+
+### Multi-Line COPY
+
+COPY statements can span multiple lines (standard COBOL continuation rules apply):
+
+```cobol
+ COPY CPSESP REPLACING
+ - LEADING "ESP-" BY "LK-ESP-"
+ - LEADING "KPSESPL" BY "LK-KPSESPL".
+```
+
+Continuation lines (indicator `-` in column 7) are merged before COPY statement scanning.
+
+## Expansion Flow
+
+```mermaid
+sequenceDiagram
+ participant Pipeline
+ participant Expander as COPY Expander
+ participant Resolver
+ participant Reader
+
+ Pipeline->>Pipeline: Identify all COBOL files
+ Pipeline->>Pipeline: Classify copybooks vs programs
+ Pipeline->>Reader: Read all copybook content upfront
+ Reader-->>Pipeline: Copybook content map (name -> content)
+
+ loop For each source file in chunk
+ Pipeline->>Expander: expandCopies(content, filePath, resolveFile, readFile)
+ Expander->>Expander: Merge continuation lines
+ Expander->>Expander: Detect COPY statements via regex
+
+ loop For each COPY statement (reverse order)
+ Expander->>Resolver: resolveFile(copyTarget)
+ Resolver-->>Expander: Copybook key or null
+
+ alt Resolved successfully
+ Expander->>Reader: readFile(resolvedKey)
+ Reader-->>Expander: Copybook content
+
+ Expander->>Expander: Apply REPLACING transformations
+ Expander->>Expander: Recurse for nested COPYs (depth + 1)
+ Expander->>Expander: Splice expanded content into output
+ else Not resolved
+ Expander->>Expander: Keep original COPY line
+ end
+ end
+
+ Expander-->>Pipeline: Expanded content + resolution metadata
+ Pipeline->>Pipeline: Replace file content with expanded content
+ end
+```
+
+The return type `CopyExpansionResult` contains `expandedContent` and `copyResolutions`. The `expansionDepth` field has been removed from the return type (it was unused by callers).
+
+COPY statement line numbers in `CopyResolution` are 1-based (consistent with the preprocessor's line numbering). The splice operation that replaces COPY lines with expanded content adjusts for 0-based array indexing internally.
+
+## Cycle Detection
+
+Circular COPY references (e.g., copybook A includes copybook B which includes copybook A) are detected and handled:
+
+1. Each expansion chain maintains a `visited` set of resolved copybook paths
+2. If a copybook path is already in the visited set, the expansion is skipped
+3. A `warnedCircular` set (internal to `expandCopies()`, not a parameter) deduplicates warning messages within a single file expansion
+
+Known circular copybooks in PROJECT-NAME: `ANAZI`, `ANDIP`, `QDIPE` (self-referential includes).
+
+## Max Depth
+
+Nested COPY expansion is limited to **10 levels** (`DEFAULT_MAX_DEPTH`). If a COPY chain exceeds this depth, a warning is logged and the remaining COPY statements are left unexpanded.
+
+## Max Total Expansions
+
+A breadth amplification guard caps the total number of COPY expansions across all branches within a single file to **500** (`MAX_TOTAL_EXPANSIONS`). This prevents exponential blowup from diamond-shaped COPY graphs where N copybooks each include N other copybooks. Once the limit is reached, further COPY statements in that file are left unexpanded and a single warning is logged.
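
The three guards described above (visited set, depth limit, total expansion cap) can be sketched together. This is a minimal illustration and not the code in `cobol-copy-expander.ts`: `expandSketch`, `ExpandState`, `warnOnce`, the `books` map, and the one-statement-per-line COPY regex are assumptions made for the example. Only the limits 10 and 500 come from this document, and keeping the original COPY line when a cycle is detected is likewise an assumption.

```typescript
// Hypothetical sketch of the COPY expansion guards; not the real expander.
const DEFAULT_MAX_DEPTH = 10;     // limit stated in this document
const MAX_TOTAL_EXPANSIONS = 500; // limit stated in this document

interface ExpandState {
  total: number;      // expansions performed so far for this file
  warnings: string[]; // collected instead of logged, deduplicated
}

// Simplified assumption: a COPY statement occupies one whole line, no REPLACING.
const COPY_LINE = /^\s*COPY\s+([A-Z0-9-]+)\s*\.\s*$/i;

function warnOnce(state: ExpandState, message: string): void {
  if (state.warnings.indexOf(message) === -1) state.warnings.push(message);
}

function expandSketch(
  content: string,
  books: Map<string, string>,
  state: ExpandState,
  visited: string[] = [],
  depth: number = 0
): string {
  const out: string[] = [];
  const lines = content.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    const m = COPY_LINE.exec(line);
    const name = m ? m[1].toUpperCase() : null;
    const body = name === null ? undefined : books.get(name);
    if (name === null || body === undefined) {
      out.push(line); // not a COPY line, or unresolved: keep the line
    } else if (visited.indexOf(name) !== -1) {
      warnOnce(state, "circular COPY skipped: " + name);
      out.push(line); // assumption: the COPY line is left unexpanded
    } else if (depth >= DEFAULT_MAX_DEPTH) {
      warnOnce(state, "max COPY depth reached");
      out.push(line);
    } else if (state.total >= MAX_TOTAL_EXPANSIONS) {
      warnOnce(state, "max total COPY expansions reached");
      out.push(line);
    } else {
      state.total++;
      // Each branch gets its own copy of the visited chain.
      out.push(expandSketch(body, books, state, visited.concat(name), depth + 1));
    }
  }
  return out.join("\n");
}
```

Passing `visited.concat(name)` rather than mutating a shared set keeps the cycle check scoped to one expansion chain, so the same copybook can still be included from two sibling branches.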
+
+## REPLACING Application Detail
+
+The REPLACING engine works by scanning all COBOL identifiers (matching `\b[A-Z][A-Z0-9-]*\b`) in the copybook content and applying each replacement rule:
+
+```
+Original copybook content:
+ 05 ESP-NAME PIC X(30).
+ 05 ESP-CODE PIC X(10).
+ 05 KPSESPL-FLAG PIC X(01).
+
+After REPLACING LEADING "ESP-" BY "LK-ESP-" LEADING "KPSESPL" BY "LK-KPSESPL":
+ 05 LK-ESP-NAME PIC X(30).
+ 05 LK-ESP-CODE PIC X(10).
+ 05 LK-KPSESPL-FLAG PIC X(01).
+```
+
+For LEADING replacements, the engine checks if each identifier starts with the `from` prefix (case-insensitive) and replaces only the prefix portion, preserving the rest of the identifier.
+
+For TRAILING replacements, the same logic applies to suffixes.
+
+For EXACT replacements, only identifiers that match the `from` value exactly (case-insensitive) are replaced.
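
The three rule types can be illustrated with a short sketch. It is not the real `applyReplacing()`: the names `ReplacingRule`, `applyRule`, and `applyReplacingSketch` are hypothetical, the `i` flag on the identifier pattern is added here to model case-insensitive matching, and stopping at the first rule that matches an identifier is an assumption about how "applied in order" behaves.

```typescript
// Hypothetical sketch of REPLACING; names and first-match-wins are assumptions.
type ReplacingKind = "EXACT" | "LEADING" | "TRAILING";

interface ReplacingRule {
  kind: ReplacingKind;
  from: string;
  to: string;
}

// Identifier pattern quoted in this document, with `g` and `i` flags added.
const IDENTIFIER = /\b[A-Z][A-Z0-9-]*\b/gi;

// Returns the rewritten identifier, or null when the rule does not apply.
function applyRule(id: string, rule: ReplacingRule): string | null {
  const upper = id.toUpperCase();
  const from = rule.from.toUpperCase();
  if (rule.kind === "EXACT") {
    return upper === from ? rule.to : null;
  }
  if (rule.kind === "LEADING") {
    // Replace only the prefix, keep the rest of the identifier.
    return upper.slice(0, from.length) === from
      ? rule.to + id.slice(from.length)
      : null;
  }
  // TRAILING: replace only the suffix.
  const start = upper.length - from.length;
  return start >= 0 && upper.slice(start) === from
    ? id.slice(0, start) + rule.to
    : null;
}

function applyReplacingSketch(content: string, rules: ReplacingRule[]): string {
  return content.replace(IDENTIFIER, (id: string): string => {
    for (let i = 0; i < rules.length; i++) {
      const rewritten = applyRule(id, rules[i]);
      if (rewritten !== null) return rewritten;
    }
    return id; // no rule matched: identifier is left unchanged
  });
}
```

Because `String.prototype.replace` does not rescan text it has already substituted, a replacement such as `LK-ESP-` is never matched again by the `ESP-` rule.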
+
+## Copybook Resolution
+
+The resolver tries multiple strategies to match a COPY target name to a copybook file:
+
+1. **Exact match**: `COPY CPSESP` resolves to copybook named `CPSESP`
+2. **Strip extension**: `COPY WORKGRID.CPY` strips `.CPY` and resolves to `WORKGRID`
+3. **Add extension**: `COPY CPSESP` tries `CPSESP.CPY` and `CPSESP.COPY`
+
+If no match is found, the COPY statement is left in place (unexpanded) and a resolution record with `resolvedPath: null` is created.
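
As a rough sketch of the three strategies, assuming the copybook map keys are available as an array of upper-cased names (the function name `resolveCopyTargetSketch` and the `known` parameter are hypothetical, not the pipeline's actual resolver signature):

```typescript
// Hypothetical resolver sketch; `known` holds upper-cased copybook keys.
function resolveCopyTargetSketch(target: string, known: string[]): string | null {
  // Drop optional surrounding quotes, then normalize case.
  const name = target.replace(/^["']|["']$/g, "").toUpperCase();
  const candidates = [
    name,                         // 1. exact match
    name.replace(/\.[^.]+$/, ""), // 2. strip extension
    name + ".CPY",                // 3. add extension
    name + ".COPY",
  ];
  for (let i = 0; i < candidates.length; i++) {
    if (known.indexOf(candidates[i]) !== -1) return candidates[i];
  }
  return null; // unresolved: the caller keeps the COPY line in place
}
```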
+
+## Pipeline Integration
+
+The expansion runs **per chunk**, after file content is read but before dispatch to worker threads:
+
+1. All copybook files are read upfront (they are typically small, collectively under 100MB)
+2. Per chunk, the copybook map is merged with chunk content (in case a chunk contains copybooks)
+3. Only programs (not copybooks themselves) undergo expansion
+4. The expanded content replaces the original content in-place before worker dispatch
+
+## Inline Comment Handling
+
+The copy expander's `stripInlineComment()` helper is quote-aware: pipe characters (`|`) inside single- or double-quoted strings are preserved rather than treated as the start of an inline comment. The preprocessor uses the same quote-aware logic.
+
+## Source Files
+
+- `gitnexus/src/core/ingestion/cobol-copy-expander.ts` -- `expandCopies()`, `parseReplacingClause()`, `applyReplacing()`
+- `gitnexus/src/core/ingestion/pipeline.ts` -- `expandCobolCopies()`, copybook map construction, chunk integration
diff --git a/docs/code-indexing/cobol/deep-indexing.md b/docs/code-indexing/cobol/deep-indexing.md
new file mode 100644
index 0000000000..f283767820
--- /dev/null
+++ b/docs/code-indexing/cobol/deep-indexing.md
@@ -0,0 +1,312 @@
+# COBOL Deep Indexing
+
+Beyond basic symbol extraction (program name, paragraphs, CALL, PERFORM, COPY), GitNexus performs deep indexing of COBOL-specific constructs: data items, EXEC SQL/CICS blocks, file declarations, FD entries, ENTRY points, and MOVE statements.
+
+## Data Items
+
+### Level Numbers
+
+| Level Range | Meaning | Graph Node Type |
+|-------------|---------|-----------------|
+| 01 | Record (group item) | `Record` |
+| 02-49 | Elementary/group items | `Property` |
+| 66 | RENAMES | `Property` |
+| 77 | Independent item | `Property` |
+| 88 | Condition name | `Const` |
+
+FILLER items are skipped (no useful name for the graph).
+
+### Clauses Parsed
+
+The `parseDataItemClauses()` function extracts these clauses from the trailing text of a data item declaration:
+
+| Clause | Pattern | Example |
+|--------|---------|---------|
+| `PIC` / `PICTURE` | `\bPIC(?:TURE)?\s+(?:IS\s+)?(\S+)` | `PIC X(30)`, `PICTURE IS 9(5)V99` |
+| `USAGE` | `\bUSAGE\s+(?:IS\s+)?(COMP\|BINARY\|...)` | `USAGE IS COMP-3`, `BINARY` |
+| `REDEFINES` | `\bREDEFINES\s+([A-Z][A-Z0-9-]+)` | `REDEFINES WK-DATE-NUM` |
+| `OCCURS` | `\bOCCURS\s+(\d+)` | `OCCURS 12 TIMES` |
+
+Standalone COMP variants (without the `USAGE` keyword) are also detected: `COMP`, `COMP-1` through `COMP-6`, `COMP-X`, `BINARY`, `PACKED-DECIMAL`.
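
The sketch below applies the `PIC`, `REDEFINES`, and `OCCURS` patterns from the table to the trailing text of a declaration. It is illustrative only: `parseClausesSketch` and `DataItemClauses` are hypothetical names, `USAGE` is left out because its pattern is abbreviated above, and stripping the statement-terminating period from the captured picture string is an assumption.

```typescript
// Hypothetical sketch using the patterns listed in the table above.
interface DataItemClauses {
  pic?: string;
  redefines?: string;
  occurs?: number;
}

function parseClausesSketch(trailing: string): DataItemClauses {
  const clauses: DataItemClauses = {};

  const pic = /\bPIC(?:TURE)?\s+(?:IS\s+)?(\S+)/i.exec(trailing);
  // \S+ also swallows a trailing period, so it is removed here (assumption).
  if (pic) clauses.pic = pic[1].replace(/\.$/, "");

  const redefines = /\bREDEFINES\s+([A-Z][A-Z0-9-]+)/i.exec(trailing);
  if (redefines) clauses.redefines = redefines[1].toUpperCase();

  const occurs = /\bOCCURS\s+(\d+)/i.exec(trailing);
  if (occurs) clauses.occurs = parseInt(occurs[1], 10);

  return clauses;
}
```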
+
+### Data Hierarchy
+
+Data items form a hierarchical structure based on level numbers. The extractor uses a **stack algorithm**:
+
+```
+Processing order:
+ 01 WK-RECORD -> push {01, WK-RECORD} -> parent: Module
+ 05 WK-NAME -> push {05, WK-NAME} -> parent: WK-RECORD (01 < 05)
+ 10 WK-FIRST -> push {10, WK-FIRST} -> parent: WK-NAME (05 < 10)
+ 10 WK-LAST -> pop WK-FIRST, push -> parent: WK-NAME (05 < 10)
+ 05 WK-CODE -> pop WK-LAST, WK-NAME -> parent: WK-RECORD (01 < 05)
+ 88 WK-ACTIVE -> (88 handled separately) -> parent: WK-CODE
+```
+
+The stack maintains items where each entry's level is strictly less than the next. When a new item arrives with a level <= the top of stack, items are popped until the stack top has a smaller level. A `CONTAINS` edge is created from the stack top to the new item.
+
+For 88-level condition names, the parent is the immediately preceding non-88 data item (found by scanning backwards).
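
A minimal sketch of the stack algorithm, using the processing order shown above as input. The names `DataItem`, `ContainsEdge`, and `buildHierarchySketch` are hypothetical, and the sketch ignores any special handling the real extractor may apply to levels 66 and 77.

```typescript
// Hypothetical sketch of the level-number stack; not the extractor's code.
interface DataItem {
  level: number;
  name: string;
}

interface ContainsEdge {
  parent: string;
  child: string;
}

function buildHierarchySketch(items: DataItem[], moduleName: string): ContainsEdge[] {
  const edges: ContainsEdge[] = [];
  const stack: DataItem[] = [];
  let lastNon88: DataItem | null = null;

  for (let i = 0; i < items.length; i++) {
    const item = items[i];
    if (item.level === 88) {
      // Condition names attach to the immediately preceding non-88 item.
      if (lastNon88 !== null) {
        edges.push({ parent: lastNon88.name, child: item.name });
      }
      continue;
    }
    // Pop until the stack top has a strictly smaller level.
    while (stack.length > 0 && stack[stack.length - 1].level >= item.level) {
      stack.pop();
    }
    const parent = stack.length > 0 ? stack[stack.length - 1].name : moduleName;
    edges.push({ parent: parent, child: item.name });
    stack.push(item);
    lastNon88 = item;
  }
  return edges;
}
```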
+
+### Annotated Example
+
+```cobol
+ 01 WK-EMPLOYEE.
+ 05 WK-EMP-ID PIC 9(6).
+ 05 WK-EMP-NAME PIC X(30).
+ 05 WK-EMP-STATUS PIC X(01).
+ 88 WK-ACTIVE VALUE "A".
+ 88 WK-INACTIVE VALUE "I".
+ 05 WK-SALARY PIC 9(7)V99 COMP-3.
+ 05 WK-DEPT PIC X(04) OCCURS 3 TIMES.
+```
+
+Produces:
+- `Record` node: `WK-EMPLOYEE` (level 01, section: working-storage)
+- `Property` nodes: `WK-EMP-ID`, `WK-EMP-NAME`, `WK-EMP-STATUS`, `WK-SALARY`, `WK-DEPT`
+- `Const` nodes: `WK-ACTIVE` (values: `A`), `WK-INACTIVE` (values: `I`)
+- `CONTAINS` edges: `WK-EMPLOYEE -> WK-EMP-ID`, `WK-EMPLOYEE -> WK-EMP-NAME`, etc.
+- `CONTAINS` edges: `WK-EMP-STATUS -> WK-ACTIVE`, `WK-EMP-STATUS -> WK-INACTIVE`
+
+### Data Item Cap
+
+A maximum of **500 data items per file** (`MAX_DATA_ITEMS_PER_FILE`) are processed. Some COBOL programs (especially after COPY expansion) can have 10,000+ data items, which would cause graph bloat and push the V8 relationship Map past its 16.7M entry limit across thousands of files.
+
+The cap applies after extraction: the first 500 items in source order are kept. Since 01-level records appear first, critical top-level structure is preserved.
+
+## EXEC SQL
+
+EXEC SQL blocks are accumulated across lines between `EXEC SQL` and `END-EXEC`, then parsed as a unit.
+
+### Operation Classification
+
+The first SQL keyword determines the operation:
+
+| First Keyword | Operation |
+|---------------|-----------|
+| `SELECT` | SELECT |
+| `INSERT` | INSERT |
+| `UPDATE` | UPDATE |
+| `DELETE` | DELETE |
+| `DECLARE` | DECLARE |
+| `OPEN` | OPEN |
+| `CLOSE` | CLOSE |
+| `FETCH` | FETCH |
+| *(anything else)* | OTHER |
+
+### Table Extraction
+
+Tables are extracted from SQL clauses:
+
+| Clause Pattern | Example |
+|----------------|---------|
+| `FROM <table>` | `SELECT * FROM EMPLOYEES` |
+| `INSERT INTO <table>` | `INSERT INTO EMPLOYEES` |
+| `UPDATE <table>` | `UPDATE EMPLOYEES SET ...` |
+| `JOIN <table>` | `LEFT JOIN DEPARTMENTS ON ...` |
+
+Note: The `INTO` pattern is restricted to `INSERT INTO` to avoid false positives from `FETCH ... INTO :host-var` and `SELECT ... INTO :host-var` statements, where `INTO` introduces host variables rather than table names.
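
The clause rules above, including the restriction of `INTO` to `INSERT INTO`, can be sketched with a single combined pattern. This is an assumption-laden illustration: `extractSqlTablesSketch` is a hypothetical name, the table-name character class is a guess, and the naive pattern does not handle cases such as `FOR UPDATE OF`.

```typescript
// Hypothetical sketch of table extraction; the real patterns may differ.
function extractSqlTablesSketch(sql: string): string[] {
  // A bare INTO is deliberately absent, so host variables are not captured.
  const pattern = /\b(?:FROM|INSERT\s+INTO|UPDATE|JOIN)\s+([A-Z][A-Z0-9_]*)/gi;
  const tables: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(sql)) !== null) {
    const table = m[1].toUpperCase();
    // Deduplicate: one entry per table.
    if (tables.indexOf(table) === -1) tables.push(table);
  }
  return tables;
}
```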
+
+### Cursor Detection
+
+```cobol
+ EXEC SQL
+ DECLARE C-EMPLOYEES CURSOR FOR
+ SELECT EMP-ID, EMP-NAME FROM EMPLOYEES
+ WHERE DEPT = :WK-DEPT
+ END-EXEC
+```
+
+Extracts: cursor `C-EMPLOYEES`, table `EMPLOYEES`, host variable `WK-DEPT`.
+
+### Host Variables
+
+Host variables are COBOL variables referenced in SQL with a `:` prefix. The colon is stripped:
+
+```sql
+WHERE EMP-ID = :WK-EMP-ID AND DEPT = :WK-DEPT
+```
+
+Extracts: `WK-EMP-ID`, `WK-DEPT`.
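
A minimal sketch of the colon-prefix rule, assuming host variable names follow the COBOL identifier shape used elsewhere in these documents (`extractHostVariablesSketch` is a hypothetical name):

```typescript
// Hypothetical sketch: collect :NAME references and strip the colon.
function extractHostVariablesSketch(sql: string): string[] {
  const pattern = /:([A-Z][A-Z0-9-]*)/gi;
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(sql)) !== null) {
    const variable = m[1].toUpperCase();
    if (found.indexOf(variable) === -1) found.push(variable);
  }
  return found;
}
```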
+
+### Graph Output
+
+- `CodeElement` node per table, with description `sql-table op:{OP}`
+- `CodeElement` node per cursor, with description `sql-cursor`
+- `ACCESSES` edge from Module to each CodeElement
+- Deduplication: if the same table appears in multiple SQL blocks, only one node is created
+
+## EXEC CICS
+
+EXEC CICS blocks are accumulated and parsed similarly to SQL blocks.
+
+### Command Detection
+
+Two-word commands are detected first (matched against the block start):
+
+```
+SEND MAP, RECEIVE MAP, SEND TEXT, SEND CONTROL, READ NEXT, READ PREV
+```
+
+If no two-word command matches, the first word is used (e.g., `LINK`, `XCTL`, `RETURN`, `READ`, `WRITE`).
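
The two-step lookup can be sketched as follows. `detectCicsCommandSketch` is a hypothetical name, and the whitespace normalization and removal of the `EXEC CICS` / `END-EXEC` delimiters are assumptions about how the block is prepared.

```typescript
// Hypothetical sketch of CICS command detection.
const TWO_WORD_COMMANDS = [
  "SEND MAP", "RECEIVE MAP", "SEND TEXT", "SEND CONTROL", "READ NEXT", "READ PREV",
];

function detectCicsCommandSketch(block: string): string {
  // Collapse whitespace and remove the block delimiters.
  const body = block
    .replace(/\s+/g, " ")
    .trim()
    .toUpperCase()
    .replace(/^EXEC CICS ?/, "")
    .replace(/ ?END-EXEC\.?$/, "");

  // Two-word commands are tried first; the lookahead stops
  // "SEND MAP" from matching a longer word such as "SEND MAPSET".
  for (let i = 0; i < TWO_WORD_COMMANDS.length; i++) {
    const cmd = TWO_WORD_COMMANDS[i];
    if (new RegExp("^" + cmd + "(?![A-Z0-9-])").test(body)) return cmd;
  }

  // Otherwise fall back to the first word of the block.
  const first = /^[A-Z][A-Z0-9-]*/.exec(body);
  return first ? first[0] : "";
}
```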
+
+### Extraction
+
+| Element | Pattern | Example |
+|---------|---------|---------|
+| MAP name | `MAP('name')` or `MAP("name")` | `EXEC CICS SEND MAP('EMPMENU')` |
+| PROGRAM name | `PROGRAM('name')` or `PROGRAM("name")` | `EXEC CICS LINK PROGRAM('BGTABUP')` |
+| TRANSID | `TRANSID('name')` or `TRANSID("name")` | `EXEC CICS START TRANSID('EMP1')` |
+
+### Graph Output
+
+- MAP: `CodeElement` node with description `cics-map cmd:{CMD}` + `ACCESSES` edge from Module
+- PROGRAM: `CALLS` edge (cross-program call via CICS LINK/XCTL)
+- TRANSID: `CodeElement` node with description `cics-transid cmd:{CMD}` + `ACCESSES` edge from Module
+
+### Annotated Example
+
+```cobol
+ EXEC CICS
+ SEND MAP('EMPMENU')
+ MAPSET('EMPSET')
+ FROM(WK-MAP-DATA)
+ ERASE
+ END-EXEC
+```
+
+Produces:
+- `CodeElement` node: `EMPMENU` (description: `cics-map cmd:SEND MAP`)
+- `ACCESSES` edge: Module -> `EMPMENU`
+
+## File Declarations
+
+SELECT statements in the INPUT-OUTPUT SECTION are accumulated across multiple lines (until a period terminator) and parsed for:
+
+| Clause | Pattern | Example |
+|--------|---------|---------|
+| SELECT | `SELECT <file-name>` | `SELECT MASTER-FILE` |
+| ASSIGN | `ASSIGN TO <target>` | `ASSIGN TO "MASTER.DAT"` |
+| ORGANIZATION | `ORGANIZATION IS <type>` | `ORGANIZATION IS INDEXED` |
+| ACCESS | `ACCESS MODE IS <mode>` | `ACCESS MODE IS DYNAMIC` |
+| RECORD KEY | `RECORD KEY IS <field>` | `RECORD KEY IS WK-EMP-ID` |
+| FILE STATUS | `FILE STATUS IS <field>` | `FILE STATUS IS WK-FILE-STATUS` |
+
+### Graph Output
+
+- `CodeElement` node with description containing all parsed clauses (e.g., `select org:INDEXED access:DYNAMIC key:WK-EMP-ID status:WK-FILE-STATUS assign:MASTER.DAT`)
+- `RECORD_KEY_OF` edge: from Property node to CodeElement (confidence 0.8)
+- `FILE_STATUS_OF` edge: from Property node to CodeElement (confidence 0.8)
+
+## FD Entries
+
+FD (File Description) entries associate a file name with its record layout:
+
+```cobol
+ FD MASTER-FILE.
+ 01 MASTER-RECORD.
+ 05 MR-EMP-ID PIC 9(6).
+ 05 MR-EMP-NAME PIC X(30).
+```
+
+The extractor tracks `pendingFdName` state: when an `FD` line is seen, the next 01-level data item becomes its record.
+
+### Graph Output
+
+- `CodeElement` node with description `fd record:{recordName}`
+- `CONTAINS` edge: FD CodeElement -> Record node
+- `CONTAINS` edge: SELECT CodeElement -> FD CodeElement (linking file declaration to file description)
+
+## ENTRY Points
+
+The `ENTRY` statement defines an additional entry point into a COBOL program, alongside the main program entry:
+
+```cobol
+ ENTRY "SUBPROG" USING WK-PARAM-1 WK-PARAM-2.
+```
+
+### Graph Output
+
+- `Constructor` node with description `entry params:{param1},{param2}` (or just `entry` if no parameters)
+- `CONTAINS` edge: Module -> Constructor
+- Symbol table entry (so the entry point is discoverable by name)
+
+## PROCEDURE DIVISION USING
+
+```cobol
+ PROCEDURE DIVISION USING WK-INPUT-REC WK-OUTPUT-REC.
+```
+
+The USING clause identifies parameters received by the program from its caller.
+
+### Graph Output
+
+- `RECEIVES` edge: Module -> Property (for each parameter name, confidence 0.8)
+
+## MOVE Statements
+
+MOVE statements produce `ACCESSES` edges in the graph:
+
+```cobol
+ MOVE WK-NAME TO OUT-NAME.
+ MOVE CORRESPONDING WK-INPUT TO WK-OUTPUT.
+ MOVE CORR WK-IN TO WK-OUT.
+```
+
+### Extraction Details
+
+- Source and target identifiers are captured
+- `CORRESPONDING` and its abbreviation `CORR` are both recognized (bulk field-by-field move)
+- Figurative constants (SPACES, ZEROS, LOW-VALUES, HIGH-VALUES, QUOTES, ALL) are skipped
+- The enclosing paragraph (`caller`) is tracked for context
+
+### MOVE CORRESPONDING / CORR Edge Reasons
+
+MOVE CORRESPONDING (and CORR) produces distinct edge reasons to differentiate from simple MOVE:
+
+| Edge | Reason (simple MOVE) | Reason (CORRESPONDING/CORR) |
+|------|---------------------|-----------------------------|
+| Read (source) | `cobol-move-read` | `cobol-move-corresponding-read` |
+| Write (target) | `cobol-move-write` | `cobol-move-corresponding-write` |
+
+This distinction allows queries to find bulk field-by-field moves separately from simple variable assignments.
+
+## GO TO DEPENDING ON
+
+The `GO TO` statement with multiple targets and a `DEPENDING ON` clause is a computed branch:
+
+```cobol
+ GO TO PARA-1 PARA-2 PARA-3
+ DEPENDING ON WK-SELECTOR.
+```
+
+All target paragraph names are extracted and emitted as separate `gotos` entries. Each target produces a `CALLS` edge in the graph (same semantics as PERFORM). The `DEPENDING ON` variable is not currently tracked as a data-flow dependency.
+
+## SORT INPUT/OUTPUT PROCEDURE
+
+SORT and MERGE statements can specify procedural entry points instead of file-based I/O:
+
+```cobol
+ SORT SORT-FILE ON ASCENDING KEY SORT-KEY
+ INPUT PROCEDURE IS PREPARE-INPUT
+ OUTPUT PROCEDURE IS FORMAT-OUTPUT.
+```
+
+`INPUT PROCEDURE IS` and `OUTPUT PROCEDURE IS` targets are extracted as control-flow targets (same as PERFORM). They produce `performs` entries and corresponding `CALLS` edges in the graph.
+
+## Fixed-Format Literal Continuation
+
+In fixed-format COBOL, string literals can span multiple lines using the continuation indicator (`-` in column 7). When a continuation line starts with a quote character, the extractor joins it with the predecessor by removing the trailing quote from the previous line and the opening quote from the continuation:
+
+```
+Line N: MOVE "THIS IS A LONG STRI
+Line N+1 (cont): - "NG VALUE" TO WK-FIELD.
+Merged: MOVE "THIS IS A LONG STRING VALUE" TO WK-FIELD.
+```
+
+The trailing `"` on line N and the opening `"` on line N+1 are both removed, producing a seamless literal. If no matching quote is found on the predecessor line, the continuation is appended as-is.
+
+## Source Files
+
+- `gitnexus/src/core/ingestion/cobol-preprocessor.ts` -- All extraction logic, clause parsers, EXEC block parsers
+- `gitnexus/src/core/ingestion/workers/parse-worker.ts` -- `processCobolRegexOnly()`, graph node/edge emission
+- `gitnexus/src/core/ingestion/parsing-processor.ts` -- Sequential fallback with same `MAX_DATA_ITEMS_PER_FILE` cap
diff --git a/docs/code-indexing/cobol/file-detection.md b/docs/code-indexing/cobol/file-detection.md
new file mode 100644
index 0000000000..60b60918a7
--- /dev/null
+++ b/docs/code-indexing/cobol/file-detection.md
@@ -0,0 +1,126 @@
+# COBOL File Detection
+
+GitNexus detects COBOL files through two mechanisms: extension-based mapping and directory-based override for extensionless files. This document covers both, plus the copybook/program classification logic.
+
+## Extension Mapping
+
+### Program Extensions
+
+| Extension | Type |
+|-----------|------|
+| `.cbl` | COBOL program |
+| `.cob` | COBOL program |
+| `.cobol` | COBOL program |
+
+### Copybook Extensions
+
+| Extension | Type | Notes |
+|-----------|------|-------|
+| `.cpy` | Copybook | Standard |
+| `.copy` | Copybook | Standard |
+| `.gnm` / `.GNM` | Copybook | Enterprise (GnuCOBOL naming) |
+| `.fd` / `.FD` | Copybook | File Description fragment |
+| `.wrk` / `.WRK` | Copybook | Working-Storage fragment |
+| `.sel` / `.SEL` | Copybook | SELECT clause fragment |
+| `.open` / `.OPEN` | Copybook | File OPEN fragment |
+| `.close` / `.CLOSE` | Copybook | File CLOSE fragment |
+| `.ini` / `.INI` | Copybook | Initialization fragment |
+| `.def` / `.DEF` | Copybook | Definition fragment |
+
+All extension matching is case-sensitive in `getLanguageFromFilename` (the extensions above are matched as written, including uppercase variants like `.GNM`).
+
+## Extensionless File Detection: `GITNEXUS_COBOL_DIRS`
+
+Many enterprise COBOL repositories use extensionless files -- the filename alone identifies the program (e.g., `s/BGTABFL` is the source for program `BGTABFL`). GitNexus handles this via the `GITNEXUS_COBOL_DIRS` environment variable.
+
+### Configuration
+
+Set `GITNEXUS_COBOL_DIRS` to a comma-separated list of directory names:
+
+```bash
+# Files in s/, c/, and wfproc/ directories (at any depth) are treated as COBOL
+export GITNEXUS_COBOL_DIRS=s,c,wfproc
+```
+
+The matching is **case-insensitive** and checks all path segments:
+
+- `/repo/s/BGTABFL` -- matches segment `s` -- COBOL
+- `/repo/src/c/CPSESP` -- matches segment `c` -- COBOL
+- `/repo/wfproc/WF001` -- matches segment `wfproc` -- COBOL
+- `/repo/docs/README` -- no matching segment -- skipped
+
+### Decision Tree
+
+```mermaid
+flowchart TD
+ A[getLanguageFromPath] --> B[getLanguageFromFilename]
+ B --> C{Known extension?}
+ C -->|Yes .cbl/.cob/.cobol/.cpy/...| D[Return COBOL]
+ C -->|Yes .ts/.py/.java/...| E[Return other language]
+ C -->|No match| F{Has extension?}
+
+ F -->|"Has dot in basename"| G[Return null]
+ F -->|"No dot = extensionless"| H{GITNEXUS_COBOL_DIRS set?}
+
+ H -->|No| G
+ H -->|Yes| I{Any path segment matches a configured dir?}
+
+ I -->|Yes| D
+ I -->|No| G
+
+ style D fill:#e8f5e9,stroke:#2e7d32
+ style G fill:#ffebee,stroke:#c62828
+```
+
+### Implementation Detail
+
+The `GITNEXUS_COBOL_DIRS` value is parsed once (on first call) and cached in a `Set`:
+
+```typescript
+// From gitnexus/src/core/ingestion/utils.ts
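+// Module-level cache (declaration added here so the excerpt is self-contained)
+let _cobolDirs: Set<string> | undefined;
+
+const getCobolDirs = (): Set<string> => {
+  // Parse the environment variable only on the first call
+  if (_cobolDirs) return _cobolDirs;
+  const raw = process.env.GITNEXUS_COBOL_DIRS;
+  _cobolDirs = raw
+    ? new Set(raw.split(',').map(d => d.trim().toLowerCase()))
+    : new Set<string>();
+  return _cobolDirs;
+};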
+```
+
+The path segment check splits the full path on `/` and tests each segment against the cached set.
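
The extensionless branch of the decision tree can be sketched as a pure function. This is not the code in `utils.ts`: `isExtensionlessCobolSketch` is a hypothetical name, the raw variable value is passed in as a parameter instead of being read from `process.env`, and the sketch assumes known extensions have already been ruled out by the caller.

```typescript
// Hypothetical sketch of the extensionless check from the decision tree.
function isExtensionlessCobolSketch(filePath: string, rawDirs: string | undefined): boolean {
  if (!rawDirs) return false; // variable not set

  const dirs = rawDirs
    .split(",")
    .map(function (d) { return d.trim().toLowerCase(); })
    .filter(function (d) { return d.length > 0; }); // ignore empty entries

  const segments = filePath.split("/");
  const basename = segments[segments.length - 1];
  if (basename.indexOf(".") !== -1) return false; // has a dot: not extensionless

  // Case-insensitive match against every path segment.
  return segments.some(function (s) {
    return dirs.indexOf(s.toLowerCase()) !== -1;
  });
}
```

Filtering out empty entries matters because an absolute path splits into a leading empty segment, which would otherwise match a stray comma in the configured list.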
+
+## Copybook vs Program Classification
+
+After a file is identified as COBOL, it must be classified as either a **program** (to be parsed for symbols) or a **copybook** (to be loaded into the copybook map for COPY expansion).
+
+### Classification Rules
+
+A COBOL file is classified as a **copybook** if ANY of these conditions is true:
+
+1. It has a recognized copybook extension (`.cpy`, `.copy`, `.gnm`, `.fd`, `.wrk`, `.sel`, `.open`, `.close`, `.ini`, `.def`)
+2. It is an extensionless file whose path contains a directory segment matching one of: `c`, `copy`, `copybooks`, `copylib`, `cpy`
+
+A file is classified as a **program** if:
+
+1. It has a program extension (`.cbl`, `.cob`, `.cobol`), OR
+2. It is extensionless and does NOT match any copybook directory pattern
+
+### Copybook Name Resolution
+
+Copybook names are derived from the filename:
+
+- Strip the extension (if any)
+- Convert to uppercase
+
+Examples:
+- `c/CPSESP` -- name: `CPSESP`
+- `copy/workgrid.cpy` -- name: `WORKGRID`
+- `c/ANAZI.GNM` -- name: `ANAZI`
+
+This name is used to resolve `COPY CPSESP.` statements during expansion.
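
The two naming steps can be sketched as follows. `copybookNameSketch` is a hypothetical stand-in, not the `getCopybookName()` function in `pipeline.ts`, and it assumes `/` as the path separator.

```typescript
// Hypothetical sketch: derive a copybook name from a file path.
function copybookNameSketch(filePath: string): string {
  const segments = filePath.split("/");
  const base = segments[segments.length - 1];
  const dot = base.lastIndexOf(".");
  // Strip the extension, if any, then upper-case the remainder.
  const stem = dot > 0 ? base.slice(0, dot) : base;
  return stem.toUpperCase();
}
```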
+
+## Source Files
+
+- `gitnexus/src/core/ingestion/utils.ts` -- `getLanguageFromPath()`, `getLanguageFromFilename()`, `getCobolDirs()`
+- `gitnexus/src/core/ingestion/pipeline.ts` -- `isCobolCopybook()`, `getCopybookName()`, `COPYBOOK_EXTENSIONS`, `COBOL_PROGRAM_EXTENSIONS`
diff --git a/docs/code-indexing/cobol/graph-model.md b/docs/code-indexing/cobol/graph-model.md
new file mode 100644
index 0000000000..de82c0723d
--- /dev/null
+++ b/docs/code-indexing/cobol/graph-model.md
@@ -0,0 +1,193 @@
+# COBOL Graph Model
+
+This document describes the graph nodes and edges that GitNexus creates for COBOL codebases. The COBOL graph model is richer than most tree-sitter languages because it captures domain-specific constructs: file declarations, FD entries, data hierarchies, SQL tables, CICS maps, and cross-program contracts.
+
+## Entity-Relationship Diagram
+
+```mermaid
+erDiagram
+ File ||--o{ Module : DEFINES
+ File ||--o{ Function : DEFINES
+ File ||--o{ Namespace : DEFINES
+ File ||--o{ Record : DEFINES
+ File ||--o{ Property : DEFINES
+ File ||--o{ Const : DEFINES
+ File ||--o{ CodeElement : DEFINES
+ File ||--o{ Constructor : DEFINES
+ File }o--o{ File : IMPORTS
+
+ Module ||--o{ Record : CONTAINS
+ Module ||--o{ Constructor : CONTAINS
+ Module }o--o{ CodeElement : ACCESSES
+ Module }o--o{ Module : CALLS
+ Module }o--o{ Module : CONTRACTS
+ Module }o--o{ Property : RECEIVES
+
+ Record ||--o{ Property : CONTAINS
+ Record ||--o{ Const : CONTAINS
+ Record }o--o{ Record : REDEFINES
+
+ Property ||--o{ Property : CONTAINS
+ Property ||--o{ Const : CONTAINS
+ Property }o--o{ Property : REDEFINES
+ Property }o--o{ CodeElement : RECORD_KEY_OF
+ Property }o--o{ CodeElement : FILE_STATUS_OF
+
+ CodeElement ||--o{ CodeElement : CONTAINS
+ CodeElement ||--o{ Record : CONTAINS
+
+ Function }o--o{ Function : CALLS
+```
+
+## Node Types
+
+| Node Type | COBOL Concept | Created From | Example |
+|-----------|--------------|--------------|---------|
+| `Module` | PROGRAM-ID | `PROGRAM-ID. BGTABFL` | Name: `BGTABFL`, description may include author and date |
+| `Function` | Paragraph | `PROCESS-RECORD.` at column 8 | Name: `PROCESS-RECORD` |
+| `Namespace` | Procedure section | `MAIN-LOGIC SECTION.` at column 8 | Name: `MAIN-LOGIC` |
+| `Record` | 01-level data item | `01 WK-EMPLOYEE.` | Description: `level:01 section:working-storage` |
+| `Property` | 02-49/66/77 data item | `05 WK-NAME PIC X(30).` | Description: `level:05 pic:X(30) section:working-storage` |
+| `Const` | 88-level condition | `88 WK-ACTIVE VALUE "A".` | Description: `level:88 values:A` |
+| `CodeElement` | SELECT, FD, SQL table, CICS map, cursor, transid | Various | Description varies by subtype |
+| `Constructor` | ENTRY point | `ENTRY "SUBPROG" USING WK-DATA` | Description: `entry params:WK-DATA` |
+
+### CodeElement Subtypes
+
+CodeElement is used for multiple COBOL constructs, distinguished by their description prefix:
+
+| Subtype | ID Pattern | Description Format | Example |
+|---------|-----------|-------------------|---------|
+| File SELECT | `CodeElement:{path}:SELECT:{name}` | `select org:INDEXED access:DYNAMIC ...` | `SELECT MASTER-FILE` |
+| FD entry | `CodeElement:{path}:FD:{name}` | `fd record:{recordName}` | `FD MASTER-FILE` |
+| SQL table | `CodeElement:{path}:sql-table:{name}` | `sql-table op:SELECT` | Table `EMPLOYEES` |
+| SQL cursor | `CodeElement:{path}:sql-cursor:{name}` | `sql-cursor` | Cursor `C-EMPLOYEES` |
+| CICS map | `CodeElement:{path}:cics-map:{name}` | `cics-map cmd:SEND MAP` | Map `EMPMENU` |
+| CICS transid | `CodeElement:{path}:cics-transid:{name}` | `cics-transid cmd:START` | Transid `EMP1` |
+
+## Edge Types
+
+| Edge Type | Source | Target | Created By | Confidence | Example |
+|-----------|--------|--------|-----------|------------|---------|
+| `DEFINES` | File | any node | File defines its symbols | 1.0 | File -> Module `BGTABFL` |
+| `CALLS` | Function | Function | `PERFORM X [THRU Y]` | (via call-processor) | `PROCESS-RECORD` -> `CALC-TAX` |
+| `CALLS` | Module | Module | `CALL "BGTABUP"` | (via call-processor) | `BGTABFL` -> `BGTABUP` |
+| `CALLS` | Module | Module | `EXEC CICS LINK PROGRAM('X')` | (via call-processor) | `BGTABFL` -> `BGTABUP` |
+| `IMPORTS` | File | File | `COPY copybook` | (via import-processor) | Source file -> Copybook file |
+| `CONTAINS` | Module | Record | Data hierarchy root | 1.0 | `BGTABFL` -> `WK-EMPLOYEE` |
+| `CONTAINS` | Record | Property | Data hierarchy | 1.0 | `WK-EMPLOYEE` -> `WK-NAME` |
+| `CONTAINS` | Property | Property | Nested data items | 1.0 | `WK-ADDRESS` -> `WK-CITY` |
+| `CONTAINS` | Record/Property | Const | 88-level parent | 1.0 | `WK-STATUS` -> `WK-ACTIVE` |
+| `CONTAINS` | CodeElement (FD) | Record | FD record link | 1.0 | `FD:MASTER-FILE` -> `MASTER-RECORD` |
+| `CONTAINS` | CodeElement (SELECT) | CodeElement (FD) | SELECT-FD link | 0.9 | `SELECT:MASTER-FILE` -> `FD:MASTER-FILE` |
+| `CONTAINS` | Module | Constructor | ENTRY in module | 1.0 | `BGTABFL` -> `SUBPROG` |
+| `REDEFINES` | Record | Record | `01 X REDEFINES Y` | 1.0 | `WK-DATE-NUM` -> `WK-DATE-ALPHA` |
+| `REDEFINES` | Property | Property | `05 X REDEFINES Y` | 1.0 | `WK-CODE-NUM` -> `WK-CODE-ALPHA` |
+| `RECORD_KEY_OF` | Property | CodeElement (SELECT) | `RECORD KEY IS field` | 0.8 | `WK-EMP-ID` -> `SELECT:MASTER-FILE` |
+| `FILE_STATUS_OF` | Property | CodeElement (SELECT) | `FILE STATUS IS field` | 0.8 | `WK-FS` -> `SELECT:MASTER-FILE` |
+| `ACCESSES` | Module | CodeElement | EXEC SQL/CICS | 0.9 | `BGTABFL` -> `sql-table:EMPLOYEES` |
+| `RECEIVES` | Module | Property | `PROCEDURE USING` | 0.8 | `BGTABFL` -> `WK-INPUT-REC` |
+| `CONTRACTS` | Module | Module | Shared copybook detection | 0.9 | `BGTABFL` -> `BGTABUP` (via `CPSESP`) |
+
+## Full Annotated Example
+
+Given this COBOL program:
+
+```cobol
+       IDENTIFICATION DIVISION.
+       PROGRAM-ID. EMPMAINT.
+       AUTHOR. Development Team.
+
+       ENVIRONMENT DIVISION.
+       INPUT-OUTPUT SECTION.
+       FILE-CONTROL.
+           SELECT EMP-FILE
+               ASSIGN TO "EMPLOYEE.DAT"
+               ORGANIZATION IS INDEXED
+               ACCESS MODE IS DYNAMIC
+               RECORD KEY IS EMP-ID
+               FILE STATUS IS WS-FILE-STATUS.
+
+       DATA DIVISION.
+       FILE SECTION.
+       FD EMP-FILE.
+       01 EMP-RECORD.
+           05 EMP-ID PIC 9(6).
+           05 EMP-NAME PIC X(30).
+
+       WORKING-STORAGE SECTION.
+       01 WS-FLAGS.
+           05 WS-FILE-STATUS PIC X(02).
+           05 WS-EOF-FLAG PIC X(01).
+               88 WS-EOF VALUE "Y".
+
+       LINKAGE SECTION.
+       01 LK-SEARCH-KEY PIC 9(6).
+
+       PROCEDURE DIVISION USING LK-SEARCH-KEY.
+       MAIN-LOGIC SECTION.
+       MAIN-START.
+           PERFORM OPEN-FILE
+           PERFORM PROCESS-RECORDS
+           PERFORM CLOSE-FILE
+           STOP RUN.
+
+       OPEN-FILE.
+           OPEN I-O EMP-FILE.
+
+       PROCESS-RECORDS.
+           MOVE LK-SEARCH-KEY TO EMP-ID
+           EXEC SQL
+               SELECT EMP_SALARY INTO :WS-SALARY
+               FROM EMPLOYEES
+               WHERE EMP_ID = :EMP-ID
+           END-EXEC
+           CALL "EMPREPORT".
+
+       CLOSE-FILE.
+           CLOSE EMP-FILE.
+```
+
+The graph produced contains:
+
+**Nodes:**
+- `Module`: EMPMAINT (description: `author:Development Team`)
+- `Namespace`: MAIN-LOGIC
+- `Function`: MAIN-START, OPEN-FILE, PROCESS-RECORDS, CLOSE-FILE
+- `Record`: EMP-RECORD, WS-FLAGS, LK-SEARCH-KEY
+- `Property`: EMP-ID, EMP-NAME, WS-FILE-STATUS, WS-EOF-FLAG
+- `Const`: WS-EOF (values: Y)
+- `CodeElement`: SELECT:EMP-FILE, FD:EMP-FILE, sql-table:EMPLOYEES
+- (COPY imports, if any, would produce File IMPORTS edges)
+
+**Edges:**
+- `DEFINES`: File -> all nodes
+- `CONTAINS`: EMPMAINT -> EMP-RECORD, EMPMAINT -> WS-FLAGS, EMPMAINT -> LK-SEARCH-KEY
+- `CONTAINS`: EMP-RECORD -> EMP-ID, EMP-RECORD -> EMP-NAME
+- `CONTAINS`: WS-FLAGS -> WS-FILE-STATUS, WS-FLAGS -> WS-EOF-FLAG
+- `CONTAINS`: WS-EOF-FLAG -> WS-EOF
+- `CONTAINS`: FD:EMP-FILE -> EMP-RECORD
+- `CONTAINS`: SELECT:EMP-FILE -> FD:EMP-FILE
+- `CALLS`: MAIN-START -> OPEN-FILE, MAIN-START -> PROCESS-RECORDS, MAIN-START -> CLOSE-FILE
+- `CALLS`: EMPMAINT -> EMPREPORT (external CALL)
+- `ACCESSES`: EMPMAINT -> sql-table:EMPLOYEES
+- `RECEIVES`: EMPMAINT -> LK-SEARCH-KEY (PROCEDURE USING)
+- `RECORD_KEY_OF`: EMP-ID -> SELECT:EMP-FILE
+- `FILE_STATUS_OF`: WS-FILE-STATUS -> SELECT:EMP-FILE
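+
+The data-hierarchy `CONTAINS` edges listed above follow from COBOL level numbers. A minimal sketch of one way to derive them with a stack of open ancestors is shown here; the type and function names are assumptions for illustration, not the `parse-worker.ts` API, and 66-level RENAMES items are not handled.
+
+```typescript
+// Illustrative only: shapes and names are hypothetical.
+interface DataItem { level: number; name: string; }
+interface ContainsEdge { parent: string; child: string; }
+
+function buildContainsEdges(programName: string, items: DataItem[]): ContainsEdge[] {
+  const edges: ContainsEdge[] = [];
+  const stack: DataItem[] = []; // open ancestors, outermost first
+  for (const item of items) {
+    if (item.level === 88) {
+      // An 88-level condition attaches to the most recent data item.
+      if (stack.length > 0) {
+        edges.push({ parent: stack[stack.length - 1].name, child: item.name });
+      }
+      continue;
+    }
+    // 01 and 77 items are roots; otherwise close ancestors at the same or deeper level.
+    const isRoot = item.level === 1 || item.level === 77;
+    while (stack.length > 0 && (isRoot || stack[stack.length - 1].level >= item.level)) {
+      stack.pop();
+    }
+    const parent = stack.length > 0 ? stack[stack.length - 1].name : programName;
+    edges.push({ parent: parent, child: item.name });
+    stack.push(item);
+  }
+  return edges;
+}
+
+// Data items of the EMPMAINT example, in source order.
+const containsEdges = buildContainsEdges("EMPMAINT", [
+  { level: 1, name: "EMP-RECORD" },
+  { level: 5, name: "EMP-ID" },
+  { level: 5, name: "EMP-NAME" },
+  { level: 1, name: "WS-FLAGS" },
+  { level: 5, name: "WS-FILE-STATUS" },
+  { level: 5, name: "WS-EOF-FLAG" },
+  { level: 88, name: "WS-EOF" },
+  { level: 1, name: "LK-SEARCH-KEY" },
+]);
+```
+
+For this input the sketch yields the same eight module, record, and 88-level `CONTAINS` pairs as the edge list above; the FD and SELECT links come from separate logic.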
+
+## How COBOL Differs from Tree-Sitter Languages
+
+| Aspect | COBOL | Tree-Sitter Languages |
+|--------|-------|----------------------|
+| Node variety | 8 types (Module, Function, Namespace, Record, Property, Const, CodeElement, Constructor) | Typically 4-6 (Function, Class, Method, Interface, Module, Const) |
+| Domain edges | RECORD_KEY_OF, FILE_STATUS_OF, ACCESSES, RECEIVES, CONTRACTS, REDEFINES | Primarily CALLS, IMPORTS, EXTENDS, IMPLEMENTS |
+| Data hierarchy | Deep CONTAINS chains (01 -> 05 -> 10 -> 88) | Flat class members |
+| Cross-program calls | CALL "name" + CICS LINK PROGRAM | Import-based resolution |
+| Contract detection | Shared COPY copybook between caller/callee | Not applicable |
+| Metadata | AUTHOR, DATE-WRITTEN on Module | JSDoc/docstring (not indexed) |
+
+## Source Files
+
+- `gitnexus/src/core/ingestion/workers/parse-worker.ts` -- `processCobolRegexOnly()`, node/edge emission logic
+- `gitnexus/src/core/ingestion/pipeline.ts` -- `detectCrossProgamContracts()` for CONTRACTS edges
+- `gitnexus/src/core/ingestion/cobol-preprocessor.ts` -- `CobolRegexResults` interface (all extracted data)
diff --git a/docs/code-indexing/cobol/performance.md b/docs/code-indexing/cobol/performance.md
new file mode 100644
index 0000000000..b0f69e7019
--- /dev/null
+++ b/docs/code-indexing/cobol/performance.md
@@ -0,0 +1,261 @@
+# COBOL Performance and Tuning
+
+This document covers real-world benchmarks, worker pool configuration, memory management, known limitations, and troubleshooting for COBOL indexing.
+
+## PROJECT-NAME Benchmark
+
+The PROJECT-NAME project is a large Italian payroll system written in COBOL. It serves as the primary benchmark for COBOL indexing performance.
+
+### Input
+
+| Metric | Value |
+| --------------------------- | ---------------------------------------------------------------------------- |
+| Paths scanned | 14,217 |
+| Parseable files | 13,129 |
+| Total source size | 224 MB |
+| Chunks | 12 (at 20 MB budget) |
+| Copybooks loaded | 2,976 |
+| Copybooks used in expansion | 2,955 |
+| Key directories | `s/` (7773 programs), `c/` (3036 copybooks), `wfproc/` (1973 workflow files) |
+
+### Output
+
+| Metric | Value |
+| ---------------------- | ------ |
+| Graph nodes | 2.79M |
+| Graph edges | 5.67M |
+| Clusters (communities) | 16,679 |
+| Execution flows | 300 |
+
+### Timing
+
+| Phase | Duration |
+| ------------------------------- | ----------------- |
+| Total | ~251s |
+| KuzuDB write | 132s |
+| Full-text search indexing | 6.7s |
+| Regex extraction (avg per file) | ~1ms |
+| COPY expansion + deep indexing | Remainder (~112s) |
+
+### Indexing Command
+
+```bash
+cd /path/to/PROJECT-NAME
+GITNEXUS_COBOL_DIRS=s,c,wfproc GITNEXUS_VERBOSE=1 node --max-old-space-size=8192 \
+ /path/to/gitnexus/dist/cli/index.js analyze --force
+```
+
+## Open-Source Benchmarks
+
+### CardDemo (AWS)
+
+| Metric | Value |
+| ------ | ----- |
+| Graph nodes | 12,323 |
+| Graph edges | 8,893 |
+| Total time | 7.4s |
+
+### ACAS
+
+| Metric | Value |
+| ------ | ----- |
+| Graph nodes | 14,016 |
+| Graph edges | 15,452 |
+| Total time | 9.3s |
+
+### Micro-Benchmark (Single-File Extraction)
+
+| Metric | Value |
+| ------ | ----- |
+| Per-iteration | 0.65ms |
+| Throughput | ~382K lines/sec |
+
+## Worker Pool Tuning
+
+### Sub-Batch Size
+
+The worker pool splits each worker's chunk into sub-batches to bound peak memory per `postMessage` serialization. COBOL repos use a smaller sub-batch size than the default:
+
+| Parameter | Default | COBOL Mode |
+| --------------------- | ----------- | ------------------- |
+| Sub-batch size | 1,500 files | 200 files |
+| Per sub-batch timeout | 120s | 120s (configurable) |
+
+**Why 200?** COBOL regex extraction + preprocessing takes ~1ms per file on average, but with COPY expansion and deep indexing the effective time is ~150ms per file. At the default sub-batch size of 1,500, that would be ~225s per sub-batch, exceeding the 120s timeout. At 200 files, a sub-batch takes ~30s, which leaves headroom for unusually large files.
+
+COBOL mode is activated automatically when `GITNEXUS_COBOL_DIRS` is set:
+
+```typescript
+// From pipeline.ts
+const cobolSubBatch = process.env.GITNEXUS_COBOL_DIRS ? 200 : undefined;
+workerPool = createWorkerPool(workerUrl, undefined, cobolSubBatch);
+```
+
+### Worker Count
+
+Workers default to `min(8, cpus - 1)`. For COBOL repos, this is usually sufficient since regex extraction is CPU-bound but fast. The bottleneck is typically the KuzuDB write phase, not extraction.
+
+### Timeout Configuration
+
+| Environment Variable | Default | Purpose |
+| ------------------------------------ | --------------- | --------------------------------------------------- |
+| `GITNEXUS_WORKER_TIMEOUT_MS` | 120,000 (2 min) | Per sub-batch processing timeout |
+| `GITNEXUS_WORKER_STARTUP_TIMEOUT_MS` | 60,000 (1 min) | Worker initialization timeout (tree-sitter loading) |
+
+For COBOL-only repos, worker startup is faster because tree-sitter native modules are loaded lazily (skipped entirely if only COBOL files are present).
+
+## Data Item Cap
+
+### Configuration
+
+```typescript
+const MAX_DATA_ITEMS_PER_FILE = 500;
+```
+
+This constant appears in both `parse-worker.ts` (worker path) and `parsing-processor.ts` (sequential fallback).
+
+### Rationale
+
+Some COBOL programs, especially after COPY expansion, can have 10,000+ data items. At that scale:
+
+- The in-memory relationship Map (for CONTAINS, REDEFINES, etc.) approaches the V8 16.7M entry limit across thousands of files
+- KuzuDB write time increases linearly with edge count
+- Most deeply nested items (level 20+) are rarely queried individually
+
+### Impact
+
+The cap keeps the first 500 data items in source order and drops the rest. Because a 01-level record is always declared before its subordinate items, what survives is:
+
+- Every 01-level record definition declared before the cutoff
+- The 02-49 level items under those records, up to the cutoff
+- 88-level conditions associated with those early items
+
+Records declared after the 500th item are not indexed.
+
+To increase the cap for specific needs, modify the `MAX_DATA_ITEMS_PER_FILE` constant in both files.
+
+## Memory Management
+
+### COPY Expansion Breadth Guard
+
+A per-file `MAX_TOTAL_EXPANSIONS = 500` limit prevents exponential blowup from diamond-shaped COPY graphs (e.g., N copybooks each containing N COPY statements). Once the limit is reached, further COPY statements in that file are left unexpanded. See [copy-expansion.md](copy-expansion.md) for details.
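+
+The guard can be pictured with the following sketch, which expands unquoted `COPY name.` statements recursively, leaves a statement untouched when the copybook is unknown, already being expanded (circular), or when the per-file budget is spent. This is a simplified illustration under those assumptions; `expandCopies` and its regex are hypothetical, and the actual engine is described in copy-expansion.md.
+
+```typescript
+// Hypothetical sketch, not the real COPY expansion engine.
+const MAX_TOTAL_EXPANSIONS = 500;
+
+function expandCopies(
+  source: string,
+  copybooks: Record<string, string>,
+  limit: number = MAX_TOTAL_EXPANSIONS
+): string {
+  let expansions = 0; // counted per file, across all nesting levels
+  const expand = (text: string, active: string[]): string =>
+    text.replace(/\bCOPY\s+([A-Z][A-Z0-9-]*)\s*\./gi, (stmt: string, name: string) => {
+      const key = name.toUpperCase();
+      const body = copybooks[key];
+      const circular = active.indexOf(key) !== -1;
+      // Leave the statement unexpanded if unknown, circular, or over budget.
+      if (body === undefined || circular || expansions >= limit) return stmt;
+      expansions++;
+      return expand(body, active.concat([key]));
+    });
+  return expand(source, []);
+}
+
+const books: Record<string, string> = { A: "COPY B. COPY B.", B: "X", C: "COPY C." };
+const fullyExpanded = expandCopies("COPY A.", books);
+const budgetLimited = expandCopies("COPY A.", books, 2);
+const selfReferential = expandCopies("COPY C.", books);
+```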
+
+### COPY Expansion Memory
+
+All copybook content is loaded upfront into a Map before chunk processing begins. For PROJECT-NAME:
+
+- 2,976 copybooks, typically under 100MB total
+- The Map is shared (read-only) across chunk iterations
+- Per-chunk, the copybook map is merged with chunk file content (in case a chunk contains copybooks not in the pre-loaded set)
+- After all chunks are processed, the copybook map is freed (`cobolCopybookContents = undefined`)
+
+### Chunk Budget
+
+Source files are grouped into chunks of max 20MB (`CHUNK_BYTE_BUDGET`). Each chunk's lifecycle:
+
+1. Read file content into memory
+2. Expand COPY statements (mutates content in-place)
+3. Dispatch to workers for extraction
+4. Workers return serialized results
+5. Merge results into graph
+6. Chunk content goes out of scope (GC reclaims)
+
+This ensures only ~20MB of source + ~200-400MB of working memory (ASTs, extracted records, serialization) is active at any time.
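+
+A greedy grouping like the one below is one plausible way to form such chunks. It is a sketch only: `SourceFile` and `groupIntoChunks` are hypothetical names, and treating 20MB as `20 * 1024 * 1024` bytes is an assumption about `CHUNK_BYTE_BUDGET`.
+
+```typescript
+// Hypothetical sketch of byte-budgeted chunking.
+interface SourceFile { path: string; sizeBytes: number; }
+
+const CHUNK_BYTE_BUDGET = 20 * 1024 * 1024; // assumed interpretation of "20MB"
+
+function groupIntoChunks(files: SourceFile[], budget: number = CHUNK_BYTE_BUDGET): SourceFile[][] {
+  const chunks: SourceFile[][] = [];
+  let current: SourceFile[] = [];
+  let currentBytes = 0;
+  for (const file of files) {
+    // Start a new chunk when adding this file would exceed the budget.
+    // A single oversized file still gets a chunk of its own.
+    if (current.length > 0 && currentBytes + file.sizeBytes > budget) {
+      chunks.push(current);
+      current = [];
+      currentBytes = 0;
+    }
+    current.push(file);
+    currentBytes += file.sizeBytes;
+  }
+  if (current.length > 0) chunks.push(current);
+  return chunks;
+}
+
+// Tiny demonstration with a budget of 10 bytes.
+const demoChunks = groupIntoChunks(
+  [
+    { path: "a.cbl", sizeBytes: 6 },
+    { path: "b.cbl", sizeBytes: 5 },
+    { path: "c.cbl", sizeBytes: 4 },
+    { path: "d.cbl", sizeBytes: 12 },
+  ],
+  10
+);
+```
+
+Under this scheme, 224 MB of source at a 20 MB budget needs at least 12 chunks, which is consistent with the benchmark table.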
+
+### Shared Warning Deduplication
+
+The `warnedCircular` set (used by the COPY expansion engine) is shared across all files in a chunk. This prevents the same circular copybook warning (e.g., `ANAZI includes itself`) from being logged thousands of times.
+
+## Known Limitations
+
+| Limitation | Impact | Workaround |
+| ---------------------------------------- | --------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
+| tree-sitter-cobol hangs on ~5% of files | Cannot use tree-sitter for COBOL | Regex-only extraction (current approach) |
+| Data item cap (500/file) | May miss deeply nested items in large programs | Increase `MAX_DATA_ITEMS_PER_FILE` in source |
+| Circular copybooks (ANAZI, ANDIP, QDIPE) | Self-referential includes cannot be expanded | Detected and skipped with warning |
+| wfproc/ files may not be pure COBOL | Workflow files may produce extraction noise | Exclude `wfproc` from `GITNEXUS_COBOL_DIRS` if problematic |
+| No MOVE DATA_FLOW edges yet | Data flow between variables not in graph | Reserved for future release |
+| Continuation line handling | Some complex multi-line continuations (especially in string literals spanning 3+ lines) may not merge correctly | Known edge case; affects <0.1% of lines |
+| Single-line EXEC blocks | `EXEC SQL SELECT ... END-EXEC` on one line is handled, but pathological nesting is not | Extremely rare in practice |
+| Extension case sensitivity | `.GNM` and `.gnm` are matched differently | Use the exact case from the codebase |
+
+## Troubleshooting
+
+### "COPY expansion failed"
+
+```
+[pipeline] COPY expansion failed for s/BGTABFL: Cannot read properties of null
+```
+
+**Cause:** A copybook referenced by a COPY statement cannot be found.
+
+**Fix:**
+
+1. Verify `GITNEXUS_COBOL_DIRS` includes the directory containing copybooks (typically `c`)
+2. Check that copybook filenames match the COPY target (case-insensitive, after stripping extensions)
+3. Ensure copybook files are not in `.gitignore`
+
+### Worker sub-batch timeout
+
+```
+Worker 3 sub-batch timed out after 120s (chunk: 200 items)
+```
+
+**Cause:** A sub-batch took longer than the timeout. This typically happens when one file is extremely large (50,000+ lines after COPY expansion).
+
+**Fix:** Increase the timeout:
+
+```bash
+GITNEXUS_WORKER_TIMEOUT_MS=300000 gitnexus analyze
+```
+
+### Memory errors (heap out of memory)
+
+```
+FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
+```
+
+**Fix:** Increase Node.js heap size:
+
+```bash
+node --max-old-space-size=16384 /path/to/gitnexus/dist/cli/index.js analyze
+```
+
+For very large repos (>500MB source), consider `--max-old-space-size=32768`.
+
+### Concurrent analyze corruption
+
+**Rule:** Only ONE `gitnexus analyze` process should run at a time per repository. Concurrent writes to KuzuDB corrupt the database.
+
+If corruption occurs:
+
+```bash
+# Remove the KuzuDB directory and re-index
+rm -rf .gitnexus/kuzu
+gitnexus analyze --force
+```
+
+### Slow KuzuDB write phase
+
+The KuzuDB write phase (132s for PROJECT-NAME) is the bottleneck for large COBOL repos. This is proportional to the number of nodes and edges being written. Reducing `MAX_DATA_ITEMS_PER_FILE` or excluding non-essential directories from `GITNEXUS_COBOL_DIRS` can help.
+
+### Verbose output
+
+Enable verbose logging to see per-phase timing and statistics:
+
+```bash
+GITNEXUS_VERBOSE=1 gitnexus analyze
+```
+
+This outputs:
+
+- Scan statistics (paths, parseable files, chunk count)
+- Worker pool configuration (worker count, sub-batch size)
+- COPY expansion statistics (copybooks loaded, files expanded)
+- Community and process detection results
+- Contract detection results
+
+## Source Files
+
+- `gitnexus/src/core/ingestion/workers/worker-pool.ts` -- `DEFAULT_SUB_BATCH_SIZE`, `SUB_BATCH_TIMEOUT_MS`, `WORKER_STARTUP_TIMEOUT_MS`
+- `gitnexus/src/core/ingestion/pipeline.ts` -- `CHUNK_BYTE_BUDGET`, COBOL sub-batch configuration, chunk lifecycle
+- `gitnexus/src/core/ingestion/workers/parse-worker.ts` -- `MAX_DATA_ITEMS_PER_FILE`, `processCobolRegexOnly()`
+- `gitnexus/src/core/ingestion/parsing-processor.ts` -- Sequential fallback `MAX_DATA_ITEMS_PER_FILE`
diff --git a/docs/code-indexing/cobol/regex-extraction.md b/docs/code-indexing/cobol/regex-extraction.md
new file mode 100644
index 0000000000..9f37c10c93
--- /dev/null
+++ b/docs/code-indexing/cobol/regex-extraction.md
@@ -0,0 +1,206 @@
+# COBOL Regex Extraction
+
+The `extractCobolSymbolsWithRegex()` function in `cobol-preprocessor.ts` performs single-pass, state-machine-driven extraction of all COBOL symbols. This document describes the state machine, line processing flow, and every regex pattern used.
+
+## State Machine: Division Tracking
+
+The extractor tracks which COBOL division is currently being processed. Division transitions are detected by the `RE_DIVISION` pattern.
+
+```mermaid
+stateDiagram-v2
+ [*] --> null : Start of file
+ null --> identification : IDENTIFICATION DIVISION
+ identification --> environment : ENVIRONMENT DIVISION
+ environment --> data : DATA DIVISION
+ data --> procedure : PROCEDURE DIVISION
+
+ note right of identification
+ Extracts: PROGRAM-ID, AUTHOR, DATE-WRITTEN
+ end note
+ note right of environment
+ Extracts: SELECT ... ASSIGN ... (file declarations)
+ end note
+ note right of data
+ Extracts: FD entries, data items (01-77, 88), COPY
+ end note
+ note right of procedure
+ Extracts: paragraphs, sections, PERFORM, CALL,
+ ENTRY, MOVE, EXEC SQL/CICS
+ end note
+```
+
+## State Machine: Data Section Tracking
+
+Within the DATA DIVISION, a secondary state machine tracks the current section to tag data items with their origin.
+
+```mermaid
+stateDiagram-v2
+ [*] --> unknown : DATA DIVISION entered
+ unknown --> working_storage : WORKING-STORAGE SECTION
+ unknown --> linkage : LINKAGE SECTION
+ unknown --> file : FILE SECTION
+ unknown --> local_storage : LOCAL-STORAGE SECTION
+ working_storage --> linkage : LINKAGE SECTION
+ working_storage --> file : FILE SECTION
+ linkage --> working_storage : WORKING-STORAGE SECTION
+ file --> working_storage : WORKING-STORAGE SECTION
+ file --> linkage : LINKAGE SECTION
+ local_storage --> working_storage : WORKING-STORAGE SECTION
+```
+
+Within the ENVIRONMENT DIVISION, `currentEnvSection` tracks whether the extractor is in the `INPUT-OUTPUT` or `CONFIGURATION` section. SELECT statement accumulation only occurs in `INPUT-OUTPUT`.
+
+## Line Processing Flow
+
+Each raw source line goes through this pipeline:
+
+```
+Raw line
+ |
+ v
+Length < 7? ---------> Skip (flush pending if any)
+ |
+ v
+Indicator col 7
+ |
+ +-- '*' or '/' -----> Comment: skip entirely
+ |
+ +-- '-' ------------> Continuation: append to pending line
+ |
+ +-- other ----------> Normal: flush pending, strip inline comments (|),
+ buffer as new pending logical line
+```
+
+After all lines are processed, the final pending line is flushed, along with any accumulated SELECT statement, SORT/MERGE accumulator, and any open EXEC block (truncated file without `END-EXEC`).
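+
+The flow above can be sketched as a small function that turns raw fixed-format lines into logical lines. This is a simplified illustration: `assembleLogicalLines` is a hypothetical name, inline comment stripping and the SELECT/SORT/EXEC accumulators are omitted, and continued string literals are not treated specially.
+
+```typescript
+// Hypothetical sketch of logical-line assembly; the real logic lives in
+// extractCobolSymbolsWithRegex() and handles more cases.
+function assembleLogicalLines(rawLines: string[]): string[] {
+  const logical: string[] = [];
+  let pending: string | null = null;
+  const flush = (): void => {
+    if (pending !== null) logical.push(pending);
+    pending = null;
+  };
+  for (const raw of rawLines) {
+    if (raw.length < 7) { flush(); continue; }             // too short to hold an indicator
+    const indicator = raw.charAt(6);                       // column 7
+    if (indicator === "*" || indicator === "/") continue;  // comment line
+    const body = raw.slice(7);
+    if (indicator === "-") {
+      // Continuation: join directly onto the pending line (simplified).
+      const head: string = pending === null ? "" : pending.replace(/\s+$/, "");
+      pending = head + body.replace(/^\s+/, "");
+    } else {
+      flush();
+      pending = body;
+    }
+  }
+  flush(); // final pending line
+  return logical;
+}
+
+const logicalLines = assembleLogicalLines([
+  "000100* a comment line",
+  "000200     MOVE WK-AMT TO WK-TO",
+  "000300-    TAL.",
+  "",
+  "000400     STOP RUN.",
+]);
+```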
+
+### Inline Comment Stripping
+
+Enterprise COBOL (particularly Italian dialect) uses the pipe character `|` as an inline comment marker. The `stripInlineComment()` helper is **quote-aware**: it tracks whether the scan position is inside a single- or double-quoted string and only treats `|` as a comment marker when outside quotes. Pipe characters inside string literals are preserved.
+
+Free-format `*>` inline comment stripping uses the same quote-aware approach: the scanner walks character by character, toggling quote state, and only recognizes `*>` as a comment marker when not inside a quoted string.
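+
+A quote-aware scan of this kind might look like the sketch below. It is an illustration of the idea for the `|` marker only; the function name carries a `Sketch` suffix to make clear it is not the actual `stripInlineComment()` implementation.
+
+```typescript
+// Illustrative sketch; the real helper may differ in detail.
+function stripInlineCommentSketch(line: string): string {
+  let quote: string | null = null; // the quote character currently open, if any
+  for (let i = 0; i < line.length; i++) {
+    const ch = line.charAt(i);
+    if (quote !== null) {
+      if (ch === quote) quote = null;   // closing quote
+    } else if (ch === '"' || ch === "'") {
+      quote = ch;                       // opening quote
+    } else if (ch === "|") {
+      return line.slice(0, i);          // comment marker outside quotes
+    }
+  }
+  return line;
+}
+
+const strippedLine = stripInlineCommentSketch('MOVE "A|B" TO WK-X | legacy note');
+const untouchedLine = stripInlineCommentSketch("MOVE 'X|Y' TO WK-Z.");
+```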
+
+### Patch Marker Handling
+
+The `preprocessCobolSource()` function (run before extraction in the worker) replaces non-standard content in columns 1-6. Standard COBOL expects spaces or digit sequence numbers in this area. If any letter or `#` character is found, the entire sequence area is replaced with 6 spaces:
+
+```
+Before: mzADD  MOVE WK-AMT TO WK-TOTAL
+After:         MOVE WK-AMT TO WK-TOTAL
+```
+
+This preserves exact line count for position mapping.
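+
+A minimal sketch of the per-line rule follows. `blankPatchMarker` is a hypothetical stand-in for the sequence-area step inside `preprocessCobolSource()`, which may do more than this.
+
+```typescript
+// Hypothetical sketch of the sequence-area cleanup for one line.
+function blankPatchMarker(line: string): string {
+  const seqArea = line.slice(0, 6); // columns 1-6
+  // A letter or '#' means the area holds a patch marker, not a sequence number.
+  if (/[A-Za-z#]/.test(seqArea)) {
+    return "      " + line.slice(6);
+  }
+  return line;
+}
+
+const patchedLine = blankPatchMarker("mzADD  MOVE WK-AMT TO WK-TOTAL");
+const numberedLine = blankPatchMarker("000100 MOVE WK-AMT TO WK-TOTAL");
+```
+
+Because the replacement is done line by line and never adds or removes lines, line numbers in the cleaned source still map to the original file.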
+
+## Regex Pattern Reference
+
+All patterns are compiled once as module-level constants and reused across calls.
+
+### Division and Section Detection
+
+| Constant | Pattern | Purpose | Example Match |
+|----------|---------|---------|---------------|
+| `RE_DIVISION` | `\b(IDENTIFICATION\|ENVIRONMENT\|DATA\|PROCEDURE)\s+DIVISION\b` | Division boundary | `PROCEDURE DIVISION` |
+| `RE_SECTION` | `\b(WORKING-STORAGE\|LINKAGE\|FILE\|LOCAL-STORAGE\|INPUT-OUTPUT\|CONFIGURATION)\s+SECTION\b` | Section boundary | `WORKING-STORAGE SECTION` |
+
+### IDENTIFICATION DIVISION
+
+| Constant | Pattern | Purpose | Example Match |
+|----------|---------|---------|---------------|
+| `RE_PROGRAM_ID` | `\bPROGRAM-ID\.\s*([A-Z][A-Z0-9-]*)` | Program name | `PROGRAM-ID. BGTABFL` |
+| `RE_AUTHOR` | `^\s+AUTHOR\.\s*(.+)` | Author metadata | `AUTHOR. D. Smith` |
+| `RE_DATE_WRITTEN` | `^\s+DATE-WRITTEN\.\s*(.+)` | Date metadata | `DATE-WRITTEN. 2024-01-15` |
+
+### ENVIRONMENT DIVISION
+
+| Constant | Pattern | Purpose | Example Match |
+|----------|---------|---------|---------------|
+| `RE_SELECT_START` | `\bSELECT\s+(?:OPTIONAL\s+)?([A-Z][A-Z0-9-]+)` | File SELECT start (with optional `SELECT OPTIONAL` support) | `SELECT MASTER-FILE`, `SELECT OPTIONAL TRANS-FILE` |
+
+SELECT statements are accumulated across multiple lines until a period terminator is found, then parsed for ASSIGN, ORGANIZATION, ACCESS, RECORD KEY, and FILE STATUS clauses.
+
+### DATA DIVISION
+
+| Constant | Pattern | Purpose | Example Match |
+|----------|---------|---------|---------------|
+| `RE_FD` | `^\s+FD\s+([A-Z][A-Z0-9-]+)` | File description | `FD MASTER-FILE` |
+| `RE_DATA_ITEM` | `^\s+(\d{1,2})\s+([A-Z][A-Z0-9-]+)\s*(.*)` | Data item (01-77) | `05 WK-NAME PIC X(30)` |
+| `RE_ANONYMOUS_REDEFINES` | `^\s+(\d{1,2})\s+REDEFINES\s+([A-Z][A-Z0-9-]+)` | Anonymous REDEFINES | `01 REDEFINES WK-REC` |
+| `RE_88_LEVEL` | `^\s+88\s+([A-Z][A-Z0-9-]+)\s+VALUES?\s+(?:ARE\s+)?(.+)` | Condition name | `88 WK-ACTIVE VALUE "Y"` |
+
+The trailing clauses of `RE_DATA_ITEM` are parsed by `parseDataItemClauses()` for PIC, USAGE, OCCURS, and REDEFINES.
+
+### PROCEDURE DIVISION
+
+| Constant | Pattern | Purpose | Example Match |
+|----------|---------|---------|---------------|
+| `RE_PROC_SECTION` | `^       ([A-Z][A-Z0-9-]+)\s+SECTION\.\s*$` | Procedure section header | `       MAIN-LOGIC SECTION.` |
+| `RE_PROC_PARAGRAPH` | `^       ([A-Z][A-Z0-9-]+)\.\s*$` | Paragraph header | `       PROCESS-RECORD.` |
+| `RE_PERFORM` | `\bPERFORM\s+([A-Z][A-Z0-9-]+)(?:\s+THRU\s+([A-Z][A-Z0-9-]+))?` | PERFORM call | `PERFORM CALC-TAX THRU CALC-TAX-EXIT` |
+| `RE_PROC_USING` | `\bPROCEDURE\s+DIVISION\s+USING\s+([\s\S]*?)(?:\.\|$)` | USING parameters | `PROCEDURE DIVISION USING WK-PARAM` |
+| `RE_ENTRY` | `\bENTRY\s+"([^"]+)"(?:\s+USING\s+([\s\S]*?))?(?:\.\|$)` | ENTRY point | `ENTRY "SUBPROG" USING WK-DATA` |
+| `RE_MOVE` | `\bMOVE\s+((?:CORRESPONDING\|CORR)\s+)?([A-Z][A-Z0-9-]+)\s+TO\s+(.+)` | MOVE statement (supports CORR abbreviation and multi-target) | `MOVE WK-NAME TO OUT-NAME`, `MOVE CORR WK-IN TO WK-OUT` |
+
+The USING parameter list (`RE_PROC_USING`) is split on `\bRETURNING\b` before tokenization -- any RETURNING clause and everything after it is excluded from the parameter list (`.split(/\bRETURNING\b/i)[0]`).
+
+Note: `RE_PROC_SECTION` and `RE_PROC_PARAGRAPH` require exactly 7 spaces of leading indentation (COBOL area A starting at column 8). This is the standard COBOL paragraph indentation.
+
+### All-Division Patterns
+
+These patterns are checked regardless of current division:
+
+| Constant | Pattern | Purpose | Example Match |
+|----------|---------|---------|---------------|
+| `RE_CALL` | `\bCALL\s+"([^"]+)"` | External program call | `CALL "BGTABUP"` |
+| `RE_COPY_UNQUOTED` | `\bCOPY\s+([A-Z][A-Z0-9-]+)(?:\s\|\.)` | COPY (unquoted) | `COPY CPSESP.` |
+| `RE_COPY_QUOTED` | `\bCOPY\s+"([^"]+)"(?:\s\|\.)` | COPY (quoted) | `COPY "WORKGRID.CPY".` |
+
+### SORT/MERGE Support
+
+| Constant | Purpose |
+|----------|---------|
+| `SORT_CLAUSE_NOISE` | Set of SORT/MERGE clause keywords filtered from USING/GIVING file lists: `ON`, `ASCENDING`, `DESCENDING`, `KEY`, `WITH`, `DUPLICATES`, `IN`, `ORDER`, `COLLATING`, `SEQUENCE`, `IS`, `THROUGH`, `THRU`, `INPUT`, `OUTPUT`, `PROCEDURE` |
+
+SORT and MERGE statements are accumulated across multiple lines (like SELECT) until a period terminator is found, then parsed for USING/GIVING file lists and INPUT/OUTPUT PROCEDURE targets. The `flushSort()` helper encapsulates the flush-and-parse logic, mirroring the existing `flushSelect()` pattern. Both helpers are called at EOF to handle truncated files.
+
+### GO TO Multi-Target
+
+`RE_GOTO` captures all paragraph names in a `GO TO` statement, including the multi-target form `GO TO p1 p2 p3 DEPENDING ON x`. The captured group contains all target names (space-separated), which are split into individual targets. Each target produces a separate `gotos` entry.
+
+### PROGRAM-ID Detection
+
+PROGRAM-ID is detected regardless of the current division state. This handles sibling programs that appear after `END PROGRAM` and omit the `IDENTIFICATION DIVISION` header -- the extractor will still capture the PROGRAM-ID and push a new program boundary.
+
+### EXEC Block Patterns
+
+| Constant | Pattern | Purpose | Example Match |
+|----------|---------|---------|---------------|
+| `RE_EXEC_SQL_START` | `\bEXEC\s+SQL\b` | Start of EXEC SQL block | `EXEC SQL` |
+| `RE_EXEC_CICS_START` | `\bEXEC\s+CICS\b` | Start of EXEC CICS block | `EXEC CICS` |
+| `RE_END_EXEC` | `\bEND-EXEC\b` | End of EXEC block | `END-EXEC` |
+
+EXEC blocks accumulate all lines between `EXEC SQL/CICS` and `END-EXEC`, then delegate to `parseExecSqlBlock()` or `parseExecCicsBlock()` for detailed extraction.
+
+## Excluded Paragraph Names
+
+The following names are excluded from paragraph detection to avoid false positives from division/section headers:
+
+```
+DECLARATIVES, END, PROCEDURE, IDENTIFICATION,
+ENVIRONMENT, DATA, WORKING-STORAGE, LINKAGE,
+FILE, LOCAL-STORAGE, COMMUNICATION, REPORT,
+SCREEN, INPUT-OUTPUT, CONFIGURATION
+```
+
+Additionally, paragraph candidates containing `DIVISION` or `SECTION` as substrings are excluded.
+
+## MOVE Skip List (Figurative Constants)
+
+MOVE statements where the source is a figurative constant are skipped:
+
+```
+SPACES, ZEROS, ZEROES, LOW-VALUES, LOW-VALUE,
+HIGH-VALUES, HIGH-VALUE, QUOTES, QUOTE, ALL
+```
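+
+Combining the `RE_MOVE` pattern from the reference above with this skip list, a MOVE line could be parsed roughly as follows. The `i` flag, the `parseMove` name, the result shape, and the whitespace/comma target splitting are assumptions made for this sketch.
+
+```typescript
+// Sketch only: pattern text is taken from the table, everything else is hypothetical.
+const RE_MOVE = /\bMOVE\s+((?:CORRESPONDING|CORR)\s+)?([A-Z][A-Z0-9-]+)\s+TO\s+(.+)/i;
+const MOVE_SKIP = [
+  "SPACES", "ZEROS", "ZEROES", "LOW-VALUES", "LOW-VALUE",
+  "HIGH-VALUES", "HIGH-VALUE", "QUOTES", "QUOTE", "ALL",
+];
+
+interface MoveInfo { source: string; targets: string[]; corresponding: boolean; }
+
+function parseMove(line: string): MoveInfo | null {
+  const m = RE_MOVE.exec(line);
+  if (m === null) return null;
+  const source = m[2].toUpperCase();
+  if (MOVE_SKIP.indexOf(source) !== -1) return null; // figurative constant: skip
+  const targets = m[3]
+    .replace(/\.\s*$/, "")          // drop the terminating period
+    .split(/[\s,]+/)
+    .filter((t) => t.length > 0);
+  return { source: source, targets: targets, corresponding: Boolean(m[1]) };
+}
+
+const skippedMove = parseMove("MOVE SPACES TO WK-NAME");
+const multiTargetMove = parseMove("MOVE WK-NAME TO OUT-NAME OUT-COPY.");
+const corrMove = parseMove("MOVE CORR WK-IN TO WK-OUT");
+```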
+
+## Source Files
+
+- `gitnexus/src/core/ingestion/cobol-preprocessor.ts` -- `preprocessCobolSource()`, `extractCobolSymbolsWithRegex()`, all regex constants
diff --git a/docs/plans/2026-03-26-feat-cobol-full-language-coverage-plan.md b/docs/plans/2026-03-26-feat-cobol-full-language-coverage-plan.md
new file mode 100644
index 0000000000..b1a2e880ca
--- /dev/null
+++ b/docs/plans/2026-03-26-feat-cobol-full-language-coverage-plan.md
@@ -0,0 +1,326 @@
+---
+title: "feat: Complete COBOL language feature coverage for maximum knowledge graph value"
+type: feat
+status: active
+date: 2026-03-26
+origin: Feature audit from v3-integration-architect agent (session 8642401e)
+---
+
+## Enhancement Summary
+
+**Deepened on:** 2026-03-26
+**Research agents used:** COBOL expert (Phase 1+2), graph value analyst, codebase explorer
+**Sections enhanced:** Phase 1 (5 features), Phase 2 (4 features), graph value ranking
+
+### Key Improvements from Research
+1. **CALL USING** is the #1 highest-value edge type (9.2/10) — fixes ~40% of missing caller references
+2. **EXEC DLI** requires dual-interface support (EXEC DLI + CBLTDLI CALL) for full IMS coverage
+3. **DECLARATIVES** is lowest-risk Phase 2 item — existing section/paragraph detection already captures structure
+4. **SET TO TRUE** accounts for 80-90% of all SET statements — prioritize this form
+5. **INSPECT** needs multi-line accumulator (like SORT) — can span 5+ continuation lines
+6. **Graph value ranking**: cobol-call-using (9.2) > cobol-error-handler (9.0) > dli-gu (8.2) > cobol-string (6.2)
+
+### New Edge Cases Discovered
+- CALL USING supports mixed modes: `USING BY REFERENCE WS-A BY CONTENT WS-B BY VALUE WS-C`
+- CALL USING `ADDRESS OF` and `OMITTED` must be filtered from parameter lists
+- EXEC DLI can have multiple SEGMENT levels in hierarchical retrieval (use matchAll)
+- DECLARATIVES can have multiple USE sections (one per file + catch-all for INPUT/OUTPUT/I-O/EXTEND)
+- INSPECT TALLYING can have multiple counters in a single statement
+- STRING/UNSTRING can span multiple lines (need accumulator pattern)
+
+---
+
+# Complete COBOL Language Feature Coverage
+
+## Overview
+
+Implement the remaining 25 unhandled COBOL language features and fix 10 partial features to achieve ~95% coverage (up from 71.9%). The goal is to build the richest possible knowledge graph from COBOL codebases, enabling a future `modernize` MCP command (out of scope for this plan) that would use the graph to assist with COBOL-to-modern-language migration.
+
+## Problem Statement
+
+The COBOL processor currently handles 64 of 89 applicable language features (71.9%): 54 fully and 10 only partially. The 25 unhandled features represent real data loss in the knowledge graph:
+- **Cross-program data flow** is invisible (CALL ... USING parameters not extracted)
+- **IMS/DB programs** produce empty graphs (EXEC DLI not recognized)
+- **String transformation logic** is invisible (STRING/UNSTRING/INSPECT not tracked)
+- **SQL copybook dependencies** are missing (EXEC SQL INCLUDE not mapped)
+- **Error handling flows** are lost (DECLARATIVES/USE AFTER not captured)
+
+## Proposed Solution
+
+Implement features in 4 phases, ordered by graph value density (edges created per LOC of implementation). Each phase is independently shippable and testable.
+
+## Technical Approach
+
+### Phase 1: High-Value Data Flow Edges (~150 LOC, ~8 new edge types)
+
+The highest-ROI features: they create new ACCESSES and IMPORTS edges that directly improve impact analysis.
+
+**Critical research finding**: Multi-line statement accumulation is the dominant challenge. CALL USING, STRING/UNSTRING, and multi-line data item clauses all span multiple lines in production COBOL. The free-format path processes each line independently — these features need statement accumulators (like SORT/SELECT) or the free-format path needs multi-line awareness. Estimated LOC increased from 110 to 150 to account for accumulator infrastructure.
+
+#### 1.1 EXEC SQL INCLUDE -> IMPORTS edges
+- **File:** `cobol-preprocessor.ts` (parseExecSqlBlock)
+- **What:** Detect `INCLUDE` as the operation, extract member name, emit as a `copies[]` entry
+- **Graph:** IMPORTS edge from File to included copybook/SQLCA with reason `sql-include`
+- **Tests:** Unit test for `EXEC SQL INCLUDE SQLCA END-EXEC` and `EXEC SQL INCLUDE CUSTCOPY END-EXEC`
+
+**Research insights (EXEC SQL INCLUDE):**
+- DB2 member names can contain underscores: `EXEC SQL INCLUDE CUST_TBL_DCL END-EXEC` — regex must use `[A-Z][A-Z0-9_-]+`
+- Quoted literal form: `EXEC SQL INCLUDE 'DBRMLIB.MEMBER' END-EXEC` (z/OS PDS qualified name)
+- SQLCA/SQLDA are DB2 builtins — won't resolve to repo files. Emit unresolved IMPORTS edge (still valuable)
+- No REPLACING support on EXEC SQL INCLUDE (unlike COPY)
+- Add `INCLUDE` to `OP_MAP` in `parseExecSqlBlock`; extract member via `RE_SQL_INCLUDE = /^INCLUDE\s+(?:'([^']+)'|"([^"]+)"|([A-Z][A-Z0-9_-]+))/i`
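+
+To make the proposed pattern concrete, the sketch below applies `RE_SQL_INCLUDE` as written above to the body of an EXEC SQL block. The wrapper `extractSqlIncludeMember` and the assumption that it receives the text between `EXEC SQL` and `END-EXEC` are hypothetical.
+
+```typescript
+// RE_SQL_INCLUDE is quoted from the plan; the wrapper is a hypothetical illustration.
+const RE_SQL_INCLUDE = /^INCLUDE\s+(?:'([^']+)'|"([^"]+)"|([A-Z][A-Z0-9_-]+))/i;
+
+function extractSqlIncludeMember(execBody: string): string | null {
+  // `execBody` is assumed to be the text between EXEC SQL and END-EXEC.
+  const m = RE_SQL_INCLUDE.exec(execBody.trim());
+  if (m === null) return null;
+  // Exactly one of the three alternatives captures the member name.
+  return m[1] || m[2] || m[3] || null;
+}
+
+const builtinMember = extractSqlIncludeMember("INCLUDE SQLCA");
+const underscoreMember = extractSqlIncludeMember("  INCLUDE CUST_TBL_DCL ");
+const quotedMember = extractSqlIncludeMember("INCLUDE 'DBRMLIB.MEMBER'");
+const notAnInclude = extractSqlIncludeMember("SELECT EMP_SALARY FROM EMPLOYEES");
+```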
+
+#### 1.2 CALL ... USING parameter extraction -> ACCESSES edges (Graph value: 9.2/10)
+- **File:** `cobol-preprocessor.ts` (processLogicalLine CALL section)
+- **What:** After capturing CALL target, scan for USING clause. Extract parameter names (reuse USING_KEYWORDS filter). Store as `calls[].parameters: string[]`
+- **Interface:** Add `parameters?: string[]` to calls array type in CobolRegexResults
+- **File:** `cobol-processor.ts` (CALL edge block)
+- **Graph:** For each USING parameter, create ACCESSES edge from caller to data item Property node with reason `cobol-call-using`
+- **Tests:** `CALL 'AUDITLOG' USING CUST-ID WS-AMOUNT` -> 2 ACCESSES edges
+
+**Research insights (CALL USING forms):**
+- Mixed modes: `CALL 'PGM' USING BY REFERENCE WS-A BY CONTENT WS-B BY VALUE WS-C`
+- Pointer passing: `CALL 'PGM' USING ADDRESS OF WS-A`
+- Placeholder: `CALL 'PGM' USING OMITTED WS-B`
+- Filter keywords: add `ADDRESS`, `OMITTED`, `LENGTH` to USING_KEYWORDS (already has BY/VALUE/REFERENCE/CONTENT)
+- **Impact tool enhancement:** CALL-USING edges enable BFS traversal through parameter data flow — single most impactful edge type for COBOL impact analysis
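
A rough sketch of the USING scan, assuming the whole CALL statement has already been joined into one string. The keyword set below is an illustrative copy (the real `USING_KEYWORDS` lives in the preprocessor), `OF` and the terminator list are assumptions added for this example, and `extractCallUsingParams` is a hypothetical name.

```typescript
// Illustrative copy of the filter set; not the actual preprocessor constant.
const USING_KEYWORDS = new Set([
  'BY', 'VALUE', 'REFERENCE', 'CONTENT', // already filtered per the plan
  'ADDRESS', 'OMITTED', 'LENGTH',        // additions proposed in the plan
  'OF',                                  // assumed: needed for ADDRESS OF / LENGTH OF
]);

// Assumed list of tokens that end the USING clause.
const USING_TERMINATORS = new Set(['RETURNING', 'ON', 'NOT', 'END-CALL']);

/** Hypothetical helper: returns the data names passed in a CALL ... USING clause. */
function extractCallUsingParams(statement: string): string[] {
  const tokens = statement
    .replace(/'[^']*'|"[^"]*"/g, ' ') // drop literals so their contents are never tokenized
    .toUpperCase()
    .split(/[\s,]+/)
    .map(t => t.replace(/\.$/, '')); // strip a statement-ending period
  const start = tokens.indexOf('USING');
  if (start === -1) return [];

  const params: string[] = [];
  for (const token of tokens.slice(start + 1)) {
    if (USING_TERMINATORS.has(token)) break;
    if (USING_KEYWORDS.has(token)) continue;
    // Keep plain data names only; numbers and subscripted names are skipped in this sketch.
    if (/^[A-Z][A-Z0-9-]*$/.test(token)) params.push(token);
  }
  return params;
}
```

Tokenizing first and comparing whole tokens avoids `\b`-based keyword matching, which would misfire on hyphenated names such as `WS-RETURNING-CODE`.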
+
+#### 1.3 STRING/UNSTRING data flow -> ACCESSES edges
+- **File:** `cobol-preprocessor.ts` (new section in extractProcedure)
+- **What:** Accumulate multi-line STRING/UNSTRING until period or END-STRING/END-UNSTRING. Extract sources and INTO targets.
+- **Interface:** Add `strings: Array<{ sources: string[]; target: string; type: 'string' | 'unstring'; line: number; caller: string | null }>` to CobolRegexResults
+- **Graph:** read-ACCESSES on sources, write-ACCESSES on INTO target with reason `cobol-string-read` / `cobol-string-write`
+- **Tests:** 2 unit tests + integration test assertions
+
+**Research insights (STRING/UNSTRING):**
+- **Needs statement accumulator**: STRING/UNSTRING routinely span multiple lines in production
+- Terminate accumulation at: period, END-STRING/END-UNSTRING, or start of next COBOL verb
+- STRING sources: identifiers before each `DELIMITED BY`. Filter: STRING, DELIMITED, BY, SIZE, ALL, INTO, WITH, POINTER, ON, OVERFLOW, NOT, END-STRING
+- UNSTRING: source is first identifier after UNSTRING; INTO targets are identifiers after INTO. Filter: DELIMITER, IN, COUNT, TALLYING, OR
+- WITH POINTER field is both read AND written (starting position updated)
+- TALLYING IN / COUNT IN fields are write targets
+- Literal sources (`'text'`) must be filtered — quote-aware tokenization needed
+- **Edge case**: STRING terminated by next verb, not period — existing fixture has `STRING ... DISPLAY` without period between them
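
The quote-aware tokenization and the "terminated by the next verb" case can be illustrated with a small state machine. This is a sketch for STRING only, under the assumption that the accumulated statement arrives as one string; `parseStringStatement`, `StringFlow`, and the figurative-constant list are hypothetical, and delimiter identifiers are deliberately ignored.

```typescript
interface StringFlow {
  sources: string[];
  target: string | null;
  pointer: string | null;
}

const COBOL_NAME = /^[A-Z][A-Z0-9-]*$/;
// Assumed list of figurative constants that must not be reported as data items.
const FIGURATIVES = new Set(['SPACE', 'SPACES', 'ZERO', 'ZEROS', 'ZEROES',
  'LOW-VALUE', 'LOW-VALUES', 'HIGH-VALUE', 'HIGH-VALUES', 'QUOTE', 'QUOTES']);

/** Hypothetical helper: extracts sources, INTO target, and POINTER field of one STRING statement. */
function parseStringStatement(statement: string): StringFlow {
  // Quoted literals stay single tokens, so their contents are never read as names.
  const tokens = statement.match(/'[^']*'|"[^"]*"|[^\s,]+/g) ?? [];
  const flow: StringFlow = { sources: [], target: null, pointer: null };
  let mode: 'source' | 'delimiter' | 'into' | 'tail' | 'pointer' = 'source';

  for (const raw of tokens) {
    const token = raw.replace(/\.$/, '').toUpperCase();
    const isName = COBOL_NAME.test(token) && !FIGURATIVES.has(token);
    switch (mode) {
      case 'source':
        if (token === 'STRING') break;
        if (token === 'DELIMITED') mode = 'delimiter';
        else if (token === 'INTO') mode = 'into';
        else if (isName) flow.sources.push(token);
        break;
      case 'delimiter':
        // BY is optional noise; SIZE, a literal, or a name closes the clause.
        if (token !== 'BY') mode = 'source';
        break;
      case 'into':
        if (isName) { flow.target = token; mode = 'tail'; }
        break;
      case 'tail':
        if (token === 'WITH') break;
        if (token === 'POINTER') { mode = 'pointer'; break; }
        return flow; // ON OVERFLOW, END-STRING, a period, or the next verb ends the statement
      case 'pointer':
        if (isName) flow.pointer = token;
        return flow;
    }
  }
  return flow;
}
```

Because anything other than `WITH POINTER` after the INTO target ends parsing, a following `DISPLAY` without a period needs no verb list.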
+
+#### 1.4 OCCURS DEPENDING ON -> ACCESSES edge
+- **File:** `cobol-preprocessor.ts` (parseDataItemClauses)
+- **What:** Extend OCCURS regex to capture DEPENDING ON field, KEY fields, and INDEXED BY names
+- **Interface:** Add `dependingOn?: string`, `occursMax?: number`, `occursKeys?: Array<{direction: string; fields: string[]}>`, `indexedBy?: string[]` to data items
+- **Graph:** ACCESSES edge from table item to controlling field with reason `cobol-depends-on`
+- **Tests:** `05 WS-TABLE OCCURS 100 DEPENDING ON WS-COUNT` -> edge
+
+**Research insights (OCCURS):**
+- IBM allows `OCCURS 0 TO n DEPENDING ON` (zero minimum) and `OCCURS UNBOUNDED DEPENDING ON` (V6.4)
+- Subscripted controlling fields: `DEPENDING ON WS-COUNT(WS-IDX)` — strip subscripts before storing
+- **Pre-existing gap**: Multi-line data item clauses without continuation indicator are NOT captured. `05 WS-TABLE\n OCCURS 100\n DEPENDING ON WS-COUNT.` — the current RE_DATA_ITEM only gets the first line, `rest` is empty. Fixing properly requires a data item accumulator (like SELECT). **Defer full fix to Phase 3; implement same-line capture now.**
+- KEY IS fields: `ASCENDING KEY IS WS-KEY-1 WS-KEY-2` — capture for SEARCH ALL resolution
+- INDEXED BY: `INDEXED BY IDX-1 IDX-2` — capture for SET/SEARCH context
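
For the same-line capture, one possible regex is sketched below. It covers only the maximum and the DEPENDING ON field (KEY and INDEXED BY are left out), and `RE_OCCURS`, `OccursInfo`, and `parseOccurs` are hypothetical names rather than existing preprocessor symbols.

```typescript
interface OccursInfo {
  occursMax?: number;
  dependingOn?: string;
}

// Accepts "OCCURS n", "OCCURS m TO n", and "OCCURS UNBOUNDED", with optional TIMES and ON.
const RE_OCCURS =
  /(?:^|\s)OCCURS\s+(?:\d+\s+TO\s+)?(\d+|UNBOUNDED)(?:\s+TIMES)?(?:\s+DEPENDING\s+(?:ON\s+)?([A-Z][A-Z0-9-]*))?/i;

/** Hypothetical helper: parses the OCCURS clause from the clause text of a data item. */
function parseOccurs(rest: string): OccursInfo | null {
  const m = RE_OCCURS.exec(rest);
  if (!m) return null;
  const info: OccursInfo = {};
  if (/^\d+$/.test(m[1])) info.occursMax = Number(m[1]); // UNBOUNDED leaves the maximum unset
  // The name pattern stops at "(", which strips any subscript on the controlling field.
  if (m[2]) info.dependingOn = m[2].toUpperCase();
  return info;
}
```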
+
+#### 1.5 VALUE clause for standard data items
+- **File:** `cobol-preprocessor.ts` (parseDataItemClauses)
+- **What:** Extract VALUE using a pragmatic function that handles quoted strings, numerics, figurative constants, hex/national literals
+- **Interface:** Already exists as `values?: string[]` on data items (currently only populated for 88-level)
+- **Graph:** Stored in Property node description (no new edges)
+- **Tests:** `01 WS-STATUS PIC X VALUE 'A'` -> values: ['A']
+
+**Research insights (VALUE forms):**
+- Hex literals: `VALUE X'F1F2F3F4'`, National: `VALUE N'text'`, DBCS: `VALUE G'text'`
+- Figurative constants: SPACES, ZEROS, ZEROES, LOW-VALUES, HIGH-VALUES, QUOTES, NULL, NULLS
+- ALL literal: `VALUE ALL '*'`
+- Numeric with sign/decimal: `VALUE -123.45`, `VALUE +1`
+- `VALUE IS` optional — both `VALUE 'A'` and `VALUE IS 'A'` valid
+- **Decimal vs period ambiguity**: `VALUE 100.` — is `.` decimal or terminator? `parseDataItemClauses` already strips trailing period, so this is handled
+- IBM V6.4: floating-point `VALUE 1.0E5` — extend numeric regex if needed
+- Implementation: use a pragmatic `extractValue(rest)` function, not a single complex regex
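
One way the pragmatic `extractValue(rest)` could look is sketched here. The return convention is an assumption made for the example: plain quoted literals return their inner text (matching the `values: ['A']` test above), while `ALL` literals and X/N/G-prefixed literals are returned as written.

```typescript
// Figurative constants listed in the plan, plus singular forms (assumed).
const VALUE_FIGURATIVES = new Set(['SPACE', 'SPACES', 'ZERO', 'ZEROS', 'ZEROES',
  'LOW-VALUE', 'LOW-VALUES', 'HIGH-VALUE', 'HIGH-VALUES', 'QUOTE', 'QUOTES', 'NULL', 'NULLS']);

/** Sketch: returns the VALUE of a data item clause string, or null if there is none. */
function extractValue(rest: string): string | null {
  const clause = /(?:^|\s)VALUES?\s+(?:(?:IS|ARE)\s+)?(.+)$/i.exec(rest);
  if (!clause) return null;
  const tail = clause[1].trim();

  // Plain quoted literal: return the text between the quotes.
  const plain = /^(?:'([^']*)'|"([^"]*)")/.exec(tail);
  if (plain) return plain[1] ?? plain[2] ?? '';

  // ALL literal or hex/national/DBCS literal: keep it exactly as written.
  const special = /^(?:ALL\s+)?[XNG]?(?:'[^']*'|"[^"]*")/i.exec(tail);
  if (special) return special[0];

  // Signed numeric, optional decimals and exponent; a trailing period is not consumed.
  const numeric = /^[+-]?\d+(?:\.\d+)?(?:E[+-]?\d+)?/i.exec(tail);
  if (numeric) return numeric[0];

  const word = tail.split(/[\s.]+/)[0].toUpperCase();
  return VALUE_FIGURATIVES.has(word) ? word : null;
}
```

Checking the forms one at a time keeps each pattern small, which is the point of preferring a function over a single complex regex.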
+
+### Phase 2: EXEC DLI + DECLARATIVES (~90 LOC, ~4 new edge types)
+
+IMS/DB support and error handling flows.
+
+#### 2.1 EXEC DLI (IMS/DB) -> ACCESSES edges (Graph value: 8.2/10)
+- **File:** `cobol-preprocessor.ts` (processLogicalLine — add RE_EXEC_DLI_START check alongside SQL/CICS)
+- **What:** Accumulate EXEC DLI blocks like EXEC SQL. Parse DLI verbs (GU, GN, GNP, GHU, GHN, GHNP, ISRT, DLET, REPL, CHKP, SCHD, TERM). Extract segment name, PCB number, INTO/FROM areas, WHERE fields, PSB name.
+- **Interface:** Add `execDliBlocks: Array<{ line: number; verb: string; pcbNumber?: number; segmentName?: string; intoField?: string; fromField?: string; whereField?: string; psbName?: string }>` to CobolRegexResults
+- **Graph:** CodeElement node + ACCESSES edge to `:` Record node with reason `dli-{verb}`; ACCESSES edges to INTO/FROM data areas; PSB ACCESSES for SCHD
+- **Tests:** `EXEC DLI GU USING PCB(1) SEGMENT(CUSTOMER) INTO(WS-CUST) END-EXEC`
+
+**Research insights (dual IMS interface):**
+- **EXEC DLI**: Embedded command interface for CICS-DL/I programs only
+- **CBLTDLI CALL**: Batch interface via `CALL 'CBLTDLI' USING function-code PCB io-area SSA1..SSA15`
+- CBLTDLI is already captured as a CALL to 'CBLTDLI' — enrich with USING parameter semantics later
+- Multiple SEGMENT levels in hierarchical retrieval — use `matchAll` on segment regex
+- DLI verbs: GU (most common), GN, GNP, GHU, GHN, GHNP, ISRT, REPL, DLET, CHKP, SCHD, TERM, ROLL, ROLB
+- **Edge case**: DLET/REPL have no SEGMENT clause (operate on current position)
+- **Recommended order**: Implement AFTER DECLARATIVES and SET (lower risk, higher frequency)
+
+#### 2.2 DECLARATIVES / USE AFTER STANDARD EXCEPTION (Graph value: 9.0/10)
+- **File:** `cobol-preprocessor.ts` (processLogicalLine — detect DECLARATIVES keyword, track USE AFTER blocks)
+- **What:** When `DECLARATIVES.` is encountered, switch to declaratives mode. Extract USE statements binding sections to files/modes.
+- **Interface:** Add `declaratives: Array<{ sectionName: string; useType: 'error' | 'debug' | 'label' | 'reporting'; target: string; line: number }>` to CobolRegexResults
+- **Graph:** ACCESSES edge from declarative Namespace to file Record with reason `cobol-declarative-error-handler`
+- **Tests:** Unit test with DECLARATIVES section, integration test for error flow
+
+**Research insights (DECLARATIVES syntax):**
+- `USE AFTER STANDARD {EXCEPTION|ERROR} ON {file-name|INPUT|OUTPUT|I-O|EXTEND}`
+- EXCEPTION and ERROR are synonymous; STANDARD is optional in IBM dialects
+- Multiple USE sections allowed (one per file + catch-all for I/O modes)
+- `END DECLARATIVES.` must NOT reset PROCEDURE DIVISION state
+- `DECLARATIVES` is already in EXCLUDED_PARA_NAMES — no false paragraph risk
+- Existing section/paragraph detection already captures structural elements — just need USE binding
+- **Lowest risk Phase 2 item** — implement first
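
The USE binding could be extracted roughly as below. `RE_USE`, `UseBinding`, and `parseUseStatement` are hypothetical names; treating `GLOBAL`, `PROCEDURE`, and `ON` as optional words is an assumption of this sketch that goes slightly beyond the syntax line quoted above.

```typescript
const IO_MODES = new Set(['INPUT', 'OUTPUT', 'I-O', 'EXTEND']);

// EXCEPTION and ERROR are synonyms; STANDARD is optional.
const RE_USE =
  /^\s*USE\s+(?:GLOBAL\s+)?AFTER\s+(?:STANDARD\s+)?(?:EXCEPTION|ERROR)\s+(?:PROCEDURE\s+)?(?:ON\s+)?(.+?)\s*\.?\s*$/i;

interface UseBinding {
  target: string;
  kind: 'file' | 'io-mode';
}

/** Hypothetical helper: returns one binding per file name or I/O mode named in a USE statement. */
function parseUseStatement(line: string): UseBinding[] {
  const m = RE_USE.exec(line);
  if (!m) return [];
  return m[1]
    .toUpperCase()
    .split(/[\s,]+/)
    .filter(t => t.length > 0)
    .map((target): UseBinding => ({
      target,
      kind: IO_MODES.has(target) ? 'io-mode' : 'file',
    }));
}
```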
+
+#### 2.3 SET statement -> ACCESSES edges
+- **File:** `cobol-preprocessor.ts` (extractProcedure — new RE_SET regex)
+- **Interface:** Add `sets: Array<{ targets: string[]; form: 'to-true'|'to-value'|'up-by'|'down-by'|'address-of'|'to-null'|'to-entry'; value?: string; entryTarget?: string; entryIsLiteral?: boolean; line: number; caller: string | null }>` to CobolRegexResults
+- **Graph:** ACCESSES write edge with reason `cobol-set-condition` (TO TRUE), `cobol-set-index` (TO/UP/DOWN), `cobol-set-address` (ADDRESS OF). SET ENTRY with literal -> CALLS edge.
+- **Tests:** `SET WS-EOF TO TRUE`, `SET IDX-1 TO 5`, `SET IDX-1 UP BY 1`
+
+**Research insights (SET forms by frequency):**
+- `SET condition TO TRUE` — 80-90% of all SET usage. Multiple targets: `SET COND-A COND-B TO TRUE`
+- `SET index TO/UP BY/DOWN BY` — ~8%. Multiple indices: `SET IDX-1 IDX-2 UP BY 1`
+- `SET pointer TO ADDRESS OF data-item` / `SET ADDRESS OF data-item TO pointer` — ~2%
+- `SET proc-ptr TO ENTRY "PROGNAME"` — rare but creates CALLS edge (like dynamic CALL)
+- Filter OF/IN qualifiers: `SET COND-A OF WS-RECORD TO TRUE` (strip OF WS-RECORD)
+- **Prioritize**: SET TO TRUE alone covers 80-90% — implement this form first
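
A sketch of how the SET forms might be classified from a single joined statement. `parseSet` and `ParsedSet` are hypothetical, the form names mirror the proposed interface, and the `value` field simply carries the uppercased right-hand side (an assumption; the plan's `entryTarget` handling is not shown).

```typescript
type SetForm = 'to-true' | 'to-value' | 'up-by' | 'down-by' | 'address-of' | 'to-null' | 'to-entry';

interface ParsedSet {
  targets: string[];
  form: SetForm;
  value?: string;
}

/** Hypothetical helper: splits a SET statement into targets, form, and right-hand side. */
function parseSet(statement: string): ParsedSet | null {
  const m = /^\s*SET\s+(.+?)\s+(TO|UP\s+BY|DOWN\s+BY)\s+(.+?)\s*\.?\s*$/i.exec(statement);
  if (!m) return null;
  const lhs = m[1].toUpperCase();
  const op = m[2].toUpperCase().replace(/\s+/g, ' ');
  const rhs = m[3].toUpperCase();

  let form: SetForm;
  if (op === 'UP BY') form = 'up-by';
  else if (op === 'DOWN BY') form = 'down-by';
  else if (rhs === 'TRUE') form = 'to-true';
  else if (rhs === 'NULL' || rhs === 'NULLS') form = 'to-null';
  else if (/^ENTRY\s/.test(rhs)) form = 'to-entry';
  else if (/^ADDRESS\s+OF\s/.test(lhs) || /^ADDRESS\s+OF\s/.test(rhs)) form = 'address-of';
  else form = 'to-value';

  const targets = lhs
    .replace(/(^|\s)ADDRESS\s+OF\s+/g, '$1')          // SET ADDRESS OF item TO pointer
    .replace(/\s+(?:OF|IN)\s+[A-Z][A-Z0-9-]*/g, '')   // strip OF/IN qualifiers
    .split(/[\s,]+/)
    .filter(t => t.length > 0);

  const parsed: ParsedSet = { targets, form };
  if (form !== 'to-true') parsed.value = rhs;
  return parsed;
}
```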
+
+#### 2.4 INSPECT -> ACCESSES edges
+- **File:** `cobol-preprocessor.ts` (extractProcedure — new `inspectAccum` accumulator like SORT)
+- **What:** Accumulate multi-line INSPECT until period. Extract inspected field + tally counters.
+- **Interface:** Add `inspects: Array<{ inspectedField: string; counters: string[]; form: 'tallying'|'replacing'|'converting'|'tallying-replacing'; line: number; caller: string | null }>` to CobolRegexResults
+- **Graph:** ACCESSES read on inspected field always; write if REPLACING/CONVERTING. Write edges for tally counters. Reason: `cobol-inspect-read`/`cobol-inspect-write`/`cobol-inspect-tally`
+- **Tests:** `INSPECT WS-FIELD TALLYING WS-COUNT FOR ALL 'A'` -> read on WS-FIELD, write on WS-COUNT
+
+**Research insights (INSPECT forms by frequency):**
+- REPLACING (~60%): `INSPECT WS-STR REPLACING ALL 'A' BY 'B'`
+- TALLYING (~25%): `INSPECT WS-STR TALLYING WS-CNT FOR ALL 'A'` — multiple counters possible
+- CONVERTING (~10%): `INSPECT WS-STR CONVERTING 'abc' TO 'ABC'`
+- Combined (~5%): TALLYING + REPLACING in single statement
+- **Needs multi-line accumulator** — INSPECT frequently spans 3-5 lines in production
+- Extract tally counters with `([A-Z][A-Z0-9-]+)\s+FOR\b` matchAll pattern
+- Filter figurative constants (SPACES, ZEROS) using existing MOVE_SKIP set
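
The tally-counter pattern above can be exercised with a short sketch. It assumes the accumulated INSPECT statement is available as one string; `parseInspect` and `ParsedInspect` are hypothetical names, and literals are blanked out first so a quoted `'FOR'` cannot produce a false counter.

```typescript
interface ParsedInspect {
  inspectedField: string;
  counters: string[];
  form: 'tallying' | 'replacing' | 'converting' | 'tallying-replacing';
}

/** Hypothetical helper: extracts the inspected field, tally counters, and form. */
function parseInspect(statement: string): ParsedInspect | null {
  // Replace every quoted literal with an empty one before looking for keywords.
  const text = statement.replace(/'[^']*'|"[^"]*"/g, "''").toUpperCase();
  const head = /^\s*INSPECT\s+([A-Z][A-Z0-9-]*)/.exec(text);
  if (!head) return null;

  const tallying = /\sTALLYING\s/.test(text);
  const replacing = /\sREPLACING\s/.test(text);
  const converting = /\sCONVERTING\s/.test(text);

  let form: ParsedInspect['form'];
  if (tallying && replacing) form = 'tallying-replacing';
  else if (tallying) form = 'tallying';
  else if (replacing) form = 'replacing';
  else if (converting) form = 'converting';
  else return null;

  // Counter pattern from the plan: the name immediately before each FOR.
  const counters = tallying
    ? Array.from(text.matchAll(/([A-Z][A-Z0-9-]+)\s+FOR\b/g), m => m[1])
    : [];
  return { inspectedField: head[1], counters, form };
}
```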
+
+### Phase 3: Completeness Fixes (~60 LOC)
+
+Fix the 7 partial features and small gaps listed below.
+
+#### 3.1 CALL ... RETURNING extraction
+- Extend RE_CALL processing to capture RETURNING target after the USING clause
+- Store as `calls[].returning?: string`
+- Graph: ACCESSES write edge with reason `cobol-call-returning`
+
+#### 3.2 SELECT OPTIONAL flag preservation
+- Store `isOptional: boolean` in FileDeclaration interface
+- Include in Record node description
+
+#### 3.3 ALTERNATE RECORD KEY extraction
+- Add regex in parseSelectStatement: `/\bALTERNATE\s+RECORD\s+KEY\s+(?:IS\s+)?([A-Z][A-Z0-9-]+)/i`
+- Store as `alternateKeys?: string[]`
+
+#### 3.4 COMMON attribute on nested programs
+- Extend RE_PROGRAM_ID: `/\bPROGRAM-ID\.\s*([A-Z][A-Z0-9-]+)(\s+(?:IS\s+)?COMMON\b)?/i` (the COMMON group is capturing so `isCommon` can be derived from it; `IS` is optional)
+- Store `isCommon: boolean` on Module node
+- Affects cross-program CALL resolution scope
+
+#### 3.5 IS EXTERNAL / IS GLOBAL as first-class properties
+- Change from usage string hack to proper boolean fields on data items
+- Add `isExternal?: boolean`, `isGlobal?: boolean` to data item interface
+
+#### 3.6 AUTHOR / DATE-WRITTEN mapped to Module node
+- Already extracted as programMetadata — map to Module node properties
+- `graph.addNode({ ..., properties: { ..., author, dateWritten } })`
+
+#### 3.7 REPLACE statement
+- Track REPLACE / REPLACE OFF state in preprocessor
+- Apply text substitutions during preprocessing (before regex extraction)
+- Complex: requires careful scoping rules
+
+### Phase 4: Niche Features (~30 LOC)
+
+Low-priority but nice for completeness.
+
+#### 4.1 INITIALIZE statement -> write ACCESSES
+- `/\bINITIALIZE\s+([A-Z][A-Z0-9-]+)/i`
+- ACCESSES write edge with reason `cobol-initialize`
+
+#### 4.2 Remaining IDENTIFICATION DIVISION paragraphs
+- DATE-COMPILED, INSTALLATION, SECURITY, REMARKS
+- Map to Module node description properties
+
+#### 4.3 EXEC SQL INCLUDE -> IMPORTS edge (expansion)
+- For EXEC SQL INCLUDE inside EXEC blocks that reference copybooks containing SQL
+- Create IMPORTS edge similar to COPY
+
+## Acceptance Criteria
+
+### Functional Requirements
+
+- [ ] Phase 1: All 5 features implemented with unit + integration tests
+- [ ] Phase 2: All 4 features implemented with unit + integration tests
+- [ ] Phase 3: All 7 partial features fixed
+- [ ] Phase 4: At least 2 of 3 niche features implemented
+- [ ] All existing 145 tests continue to pass
+- [ ] TypeScript compiles cleanly
+
+### Non-Functional Requirements
+
+- [ ] No performance regression: CardDemo benchmark stays under 8s
+- [ ] No file exceeds 1500 LOC (preprocessor currently 1326)
+- [ ] ACAS benchmark shows increased node/edge counts (more data extracted)
+- [ ] CardDemo benchmark shows increased edge counts (CALL USING, STRING, etc.)
+
+### Quality Gates
+
+- [ ] Each phase has its own commit
+- [ ] Integration test assertions updated with exact counts per phase
+- [ ] Benchmark run after each phase to track graph growth
+
+## Dependencies & Risks
+
+### Dependencies
+- None. All changes are additive to existing COBOL processor code.
+- No LanguageProvider changes needed.
+- No graph schema changes needed (all new constructs map to existing node labels + edge types).
+
+### Risks
+- **`cobol-preprocessor.ts` size**: Currently 1326 LOC. Phase 1+2 adds ~200 LOC -> 1526 LOC. May need to extract helpers into a separate `cobol-data-flow.ts` module if it exceeds 1500.
+- **REPLACE statement** (Phase 3.7) is the most complex feature — requires tracking text substitution state across logical lines. Consider deferring to a separate PR if it takes >100 LOC.
+- **EXEC DLI** (Phase 2.1) is only testable against IMS codebases. Need fixture data or synthetic test cases.
+
+## Graph Value Ranking by MCP Tool Impact
+
+Research agent analyzed all 5 MCP tools (query, context, impact, detect_changes, rename) against planned edge types:
+
+| Edge Type | QUERY | CONTEXT | IMPACT | DETECT | RENAME | **Overall** |
+|-----------|-------|---------|--------|--------|--------|-------------|
+| `cobol-call-using` | 4/5 | 5/5 | 5/5 | 4/5 | 4/5 | **9.2/10** |
+| `cobol-error-handler` | 5/5 | 4/5 | 5/5 | 5/5 | 2/5 | **9.0/10** |
+| `dli-*` (IMS verbs) | 4/5 | 4/5 | 5/5 | 4/5 | 2/5 | **8.2/10** |
+| `cobol-string-*` | 4/5 | 3/5 | 3/5 | 3/5 | 2/5 | **6.2/10** |
+
+**Key finding**: `cobol-call-using` alone would fix ~40% of missing caller references in COBOL graphs.
+
+## Future Considerations
+
+This plan provides the graph data foundation for a future `modernize` MCP command (out of scope) that would:
+- Use CALL USING edges to map data contracts between programs
+- Use STRING/UNSTRING edges to identify data transformation logic
+- Use EXEC SQL/DLI edges to map database access patterns
+- Use DECLARATIVES to understand error handling architecture
+- Use the complete knowledge graph to generate migration plans
+
+**MCP tool enhancements needed** (after this plan ships):
+- Add `cobol-call-using`, `cobol-error-handler`, `dli-*` to IMPACT tool's default `relationTypes` for COBOL repos
+- Add confidence floors for new edge types in `IMPACT_RELATION_CONFIDENCE`
+- Register new edge types in `VALID_RELATION_TYPES` set (`local-backend.ts:52`)
+
+## Sources & References
+
+### Internal References
+- Feature audit: session 8642401e (COBOL expert agent, 123 features audited)
+- Prior plans: `docs/plans/2026-03-25-feat-cobol-100-percent-feature-coverage-plan.md`
+- Architecture: `docs/code-indexing/cobol/` (7 documentation files)
+
+### External References
+- COBOL features reference: mainframestechhelp.com/tutorials/cobol/features.htm
+- COBOL-85 standard: ISO/IEC 1989:1985
+- IBM Enterprise COBOL reference
diff --git a/gitnexus/src/config/supported-languages.ts b/gitnexus/src/config/supported-languages.ts
index a35c3d2b11..91a654e805 100644
--- a/gitnexus/src/config/supported-languages.ts
+++ b/gitnexus/src/config/supported-languages.ts
@@ -42,4 +42,6 @@ export enum SupportedLanguages {
Kotlin = 'kotlin',
Swift = 'swift',
Dart = 'dart',
+ /** Standalone regex processor — no tree-sitter, no LanguageProvider. */
+ Cobol = 'cobol',
}
diff --git a/gitnexus/src/core/graph/graph.ts b/gitnexus/src/core/graph/graph.ts
index 4658131ccb..b0a641ec69 100644
--- a/gitnexus/src/core/graph/graph.ts
+++ b/gitnexus/src/core/graph/graph.ts
@@ -34,7 +34,15 @@ export const createKnowledgeGraph = (): KnowledgeGraph => {
};
/**
- * Remove all nodes (and their relationships) belonging to a file
+ * Remove a single relationship by id.
+ * Returns true if the relationship existed and was removed, false otherwise.
+ */
+ const removeRelationship = (relationshipId: string): boolean => {
+ return relationshipMap.delete(relationshipId);
+ };
+
+ /**
+ * Remove all nodes (and their relationships) belonging to a file.
*/
const removeNodesByFile = (filePath: string): number => {
let removed = 0;
@@ -75,6 +83,7 @@ export const createKnowledgeGraph = (): KnowledgeGraph => {
addRelationship,
removeNode,
removeNodesByFile,
+ removeRelationship,
};
};
diff --git a/gitnexus/src/core/graph/types.ts b/gitnexus/src/core/graph/types.ts
index 594a94c7e4..69cfc81fed 100644
--- a/gitnexus/src/core/graph/types.ts
+++ b/gitnexus/src/core/graph/types.ts
@@ -141,4 +141,5 @@ export interface KnowledgeGraph {
addRelationship: (relationship: GraphRelationship) => void,
removeNode: (nodeId: string) => boolean,
removeNodesByFile: (filePath: string) => number,
+ removeRelationship: (relationshipId: string) => boolean,
}
diff --git a/gitnexus/src/core/ingestion/cobol-processor.ts b/gitnexus/src/core/ingestion/cobol-processor.ts
new file mode 100644
index 0000000000..8e7983d813
--- /dev/null
+++ b/gitnexus/src/core/ingestion/cobol-processor.ts
@@ -0,0 +1,1308 @@
+/**
+ * COBOL Processor
+ *
+ * Standalone regex-based processor for COBOL and JCL files.
+ * Follows the markdown-processor.ts pattern: takes (graph, files, allPathSet),
+ * does its own extraction, and writes directly to the graph.
+ *
+ * Pipeline:
+ * 1. Separate programs from copybooks
+ * 2. Build copybook map (name -> content)
+ * 3. For each program: expand COPY statements, then run regex extraction
+ * 4. Map CobolRegexResults to graph nodes and relationships
+ * 5. Optionally process JCL files for job-step cross-references
+ */
+
+import path from 'node:path';
+import { generateId } from '../../lib/utils.js';
+import { SupportedLanguages } from '../../config/supported-languages.js';
+import type { KnowledgeGraph } from '../graph/types.js';
+import {
+ preprocessCobolSource,
+ extractCobolSymbolsWithRegex,
+ type CobolRegexResults,
+} from './cobol/cobol-preprocessor.js';
+import { expandCopies } from './cobol/cobol-copy-expander.js';
+import { processJclFiles } from './cobol/jcl-processor.js';
+
+// ---------------------------------------------------------------------------
+// File detection
+// ---------------------------------------------------------------------------
+
+const COBOL_EXTENSIONS = new Set([
+ '.cob', '.cbl', '.cobol', '.cpy', '.copybook',
+]);
+
+const JCL_EXTENSIONS = new Set(['.jcl', '.job', '.proc']);
+
+const COPYBOOK_EXTENSIONS = new Set(['.cpy', '.copybook']);
+
+interface CobolFile {
+ path: string;
+ content: string;
+}
+
+export interface CobolProcessResult {
+ programs: number;
+ paragraphs: number;
+ sections: number;
+ dataItems: number;
+ calls: number;
+ copies: number;
+ execSqlBlocks: number;
+ execCicsBlocks: number;
+ entryPoints: number;
+ moves: number;
+ fileDeclarations: number;
+ jclJobs: number;
+ jclSteps: number;
+ sqlIncludes: number;
+ execDliBlocks: number;
+ declaratives: number;
+ sets: number;
+ inspects: number;
+ initializes: number;
+}
+
+/** Returns true if the file is a COBOL or copybook file. */
+export function isCobolFile(filePath: string): boolean {
+ return COBOL_EXTENSIONS.has(path.extname(filePath).toLowerCase());
+}
+
+/** Returns true if the file is a JCL file. */
+export function isJclFile(filePath: string): boolean {
+ return JCL_EXTENSIONS.has(path.extname(filePath).toLowerCase());
+}
+
+/** Returns true if the file is a COBOL copybook. */
+function isCopybook(filePath: string): boolean {
+ return COPYBOOK_EXTENSIONS.has(path.extname(filePath).toLowerCase());
+}
+
+// ---------------------------------------------------------------------------
+// Main processor
+// ---------------------------------------------------------------------------
+
+/**
+ * Process COBOL and JCL files into the knowledge graph.
+ *
+ * @param graph - The in-memory knowledge graph
+ * @param files - Array of { path, content } for COBOL/JCL files
+ * @param allPathSet - Set of all file paths in the repository
+ * @returns Summary of what was extracted
+ */
+export const processCobol = (
+ graph: KnowledgeGraph,
+ files: CobolFile[],
+ allPathSet: Set<string>,
+): CobolProcessResult => {
+ const result: CobolProcessResult = {
+ programs: 0,
+ paragraphs: 0,
+ sections: 0,
+ dataItems: 0,
+ calls: 0,
+ copies: 0,
+ execSqlBlocks: 0,
+ execCicsBlocks: 0,
+ entryPoints: 0,
+ moves: 0,
+ fileDeclarations: 0,
+ jclJobs: 0,
+ jclSteps: 0,
+ sqlIncludes: 0,
+ execDliBlocks: 0,
+ declaratives: 0,
+ sets: 0,
+ inspects: 0,
+ initializes: 0,
+ };
+
+ // ── 1. Separate programs, copybooks, and JCL ───────────────────────
+ const programs: CobolFile[] = [];
+ const copybooks: CobolFile[] = [];
+ const jclFiles: CobolFile[] = [];
+
+ for (const file of files) {
+ const ext = path.extname(file.path).toLowerCase();
+ if (JCL_EXTENSIONS.has(ext)) {
+ jclFiles.push(file);
+ } else if (isCopybook(file.path)) {
+ copybooks.push(file);
+ } else if (COBOL_EXTENSIONS.has(ext)) {
+ programs.push(file);
+ }
+ }
+
+ // ── 2. Build copybook map (uppercase name -> content) ──────────────
+ const copybookMap = new Map<string, { content: string; path: string }>();
+ for (const cb of copybooks) {
+ const name = path.basename(cb.path, path.extname(cb.path)).toUpperCase();
+ copybookMap.set(name, { content: cb.content, path: cb.path });
+ }
+
+ // Build reverse lookup: path -> content for O(1) readCopy
+ const copybookByPath = new Map<string, string>();
+ for (const [, entry] of copybookMap) {
+ copybookByPath.set(entry.path, entry.content);
+ }
+
+ // Resolve and read callbacks for expandCopies
+ const resolveCopy = (name: string): string | null => {
+ const entry = copybookMap.get(name.toUpperCase());
+ return entry ? entry.path : null;
+ };
+ const readCopy = (copyPath: string): string | null => {
+ const content = copybookByPath.get(copyPath);
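+ // An empty copybook is still a resolved copybook, so test for undefined rather than truthiness
+ return content !== undefined ? preprocessCobolSource(content) : null;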
+ };
+
+ // Track module names for cross-program CALL resolution
+ const moduleNodeIds = new Map<string, string>(); // uppercase program name -> node id
+
+ // ── 3. Process each COBOL program ──────────────────────────────────
+ for (const file of programs) {
+ const fileNodeId = generateId('File', file.path);
+ // Skip if file node doesn't exist (structure-processor creates it)
+ if (!graph.getNode(fileNodeId)) continue;
+
+ // Preprocess: clean patch markers
+ const cleaned = preprocessCobolSource(file.content);
+
+ // Expand COPY statements
+ const { expandedContent, copyResolutions } = expandCopies(
+ cleaned, file.path, resolveCopy, readCopy,
+ );
+
+ // Extract symbols from expanded source
+ const extracted = extractCobolSymbolsWithRegex(expandedContent, file.path);
+
+ // Map to graph
+ mapToGraph(graph, extracted, file, copyResolutions, moduleNodeIds);
+
+ // Accumulate stats
+ result.programs += extracted.programs.length || (extracted.programName ? 1 : 0);
+ result.paragraphs += extracted.paragraphs.length;
+ result.sections += extracted.sections.length;
+ result.dataItems += extracted.dataItems.length;
+ result.calls += extracted.calls.length;
+ result.copies += extracted.copies.length;
+ result.execSqlBlocks += extracted.execSqlBlocks.length;
+ result.sqlIncludes += extracted.execSqlBlocks.filter(s => s.includeMember).length;
+ result.execCicsBlocks += extracted.execCicsBlocks.length;
+ result.entryPoints += extracted.entryPoints.length;
+ result.moves += extracted.moves.length;
+ result.fileDeclarations += extracted.fileDeclarations.length;
+ result.execDliBlocks += extracted.execDliBlocks.length;
+ result.declaratives += extracted.declaratives.length;
+ result.sets += extracted.sets.length;
+ result.inspects += extracted.inspects.length;
+ result.initializes += extracted.initializes.length;
+ }
+
+ // ── 4. Second pass: resolve cross-program CALL targets ─────────────
+ // During mapToGraph, early programs create unresolved CALL edges
+ // (target = :PROGNAME) because later programs haven't
+ // been registered in moduleNodeIds yet. Now that ALL programs are
+ // processed, re-scan unresolved CALLS edges and patch them.
+ // This covers both `cobol-call-unresolved` and CICS LINK/XCTL edges
+ // whose targets contain `:`.
+ const unresolvedToRemove: string[] = [];
+
+ graph.forEachRelationship(rel => {
+ if (rel.type !== 'CALLS') return;
+ const match = rel.targetId.match(/:(.+)/);
+ if (!match) return;
+ const resolvedId = moduleNodeIds.get(match[1]);
+ if (!resolvedId) return;
+
+ if (rel.reason?.startsWith('cobol-call-unresolved') || rel.reason === 'cobol-cancel-unresolved') {
+ // Replace unresolved CALL/CANCEL with resolved edge
+ const resolvedReason = rel.reason === 'cobol-cancel-unresolved' ? 'cobol-cancel' : 'cobol-call';
+ graph.addRelationship({
+ id: rel.id + ':resolved',
+ type: 'CALLS',
+ sourceId: rel.sourceId,
+ targetId: resolvedId,
+ confidence: rel.reason === 'cobol-cancel-unresolved' ? 0.9 : 0.95,
+ reason: resolvedReason,
+ });
+ } else if (rel.reason?.startsWith('cics-') && rel.reason.endsWith('-unresolved')) {
+ // Replace unresolved CICS LINK/XCTL/LOAD with resolved edge
+ graph.addRelationship({
+ id: rel.id + ':resolved',
+ type: 'CALLS',
+ sourceId: rel.sourceId,
+ targetId: resolvedId,
+ confidence: 0.95,
+ reason: rel.reason.replace('-unresolved', ''),
+ });
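+ } else {
+ // Not an unresolved COBOL/CICS edge: keep it rather than deleting it without a replacement
+ return;
+ }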
+
+ // Mark original unresolved edge for removal after iteration
+ unresolvedToRemove.push(rel.id);
+ });
+
+ // Remove orphan unresolved edges (cannot delete during Map.forEach iteration)
+ for (const id of unresolvedToRemove) {
+ graph.removeRelationship(id);
+ }
+
+ // ── 5. Process JCL files ───────────────────────────────────────────
+ if (jclFiles.length > 0) {
+ const jclPaths = jclFiles.map(f => f.path);
+ const jclContents = new Map<string, string>();
+ for (const f of jclFiles) {
+ jclContents.set(f.path, f.content);
+ }
+ const jclResult = processJclFiles(graph, jclPaths, jclContents);
+ result.jclJobs += jclResult.jobCount;
+ result.jclSteps += jclResult.stepCount;
+ }
+
+ return result;
+};
+
+// ---------------------------------------------------------------------------
+// Graph mapping
+// ---------------------------------------------------------------------------
+
+/** Generate a deterministic Property node ID using composite key (section:level:name). */
+function generatePropertyId(
+ filePath: string,
+ item: { section: string; level: number; name: string },
+): string {
+ return generateId('Property', `${filePath}:${item.section}:${item.level}:${item.name}`);
+}
+
+/**
+ * Build a lookup Map from data item name (uppercase) to its Property node ID.
+ * First-wins semantics: if the same name appears in multiple sections,
+ * the first occurrence in extraction order is used for MOVE edge resolution.
+ */
+function buildDataItemMap(
+ dataItems: CobolRegexResults['dataItems'],
+ filePath: string,
+): Map<string, string> {
+ const map = new Map<string, string>();
+ for (const item of dataItems) {
+ if (item.name === 'FILLER') continue;
+ const key = item.name.toUpperCase();
+ if (!map.has(key)) {
+ map.set(key, generatePropertyId(filePath, item));
+ }
+ }
+ return map;
+}
+
+function mapToGraph(
+ graph: KnowledgeGraph,
+ extracted: CobolRegexResults,
+ file: CobolFile,
+ copyResolutions: Array<{ copyTarget: string; resolvedPath: string | null; line: number }>,
+ moduleNodeIds: Map<string, string>,
+): void {
+ const { path: filePath, content } = file;
+ const lines = content.split(/\r?\n/);
+ const fileNodeId = generateId('File', filePath);
+
+ // ── PROGRAM-ID -> Module node ────────────────────────────────────
+ let moduleId: string | undefined;
+ if (extracted.programName) {
+ moduleId = generateId('Module', `${filePath}:${extracted.programName}`);
+ const metaDesc = [
+ extracted.programMetadata.author && `author:${extracted.programMetadata.author}`,
+ extracted.programMetadata.dateWritten && `date:${extracted.programMetadata.dateWritten}`,
+ extracted.programMetadata.dateCompiled && `compiled:${extracted.programMetadata.dateCompiled}`,
+ extracted.programMetadata.installation && `install:${extracted.programMetadata.installation}`,
+ ].filter(Boolean).join(' ');
+ graph.addNode({
+ id: moduleId,
+ label: 'Module',
+ properties: {
+ name: extracted.programName,
+ filePath,
+ startLine: 1,
+ endLine: lines.length,
+ language: SupportedLanguages.Cobol,
+ isExported: true,
+ description: metaDesc || undefined,
+ },
+ });
+ graph.addRelationship({
+ id: generateId('CONTAINS', `${fileNodeId}->${moduleId}`),
+ type: 'CONTAINS',
+ sourceId: fileNodeId,
+ targetId: moduleId,
+ confidence: 1.0,
+ reason: 'cobol-program-id',
+ });
+ moduleNodeIds.set(extracted.programName.toUpperCase(), moduleId);
+ }
+
+ // ── Nested programs -> additional Module nodes ───────────────────
+ // programs[] contains all PROGRAM-IDs with line ranges. The first entry
+ // is the primary (outer) program (already created above). Additional
+ // entries are nested programs that get their own Module nodes.
+ const programModuleIds = new Map<string, string>();
+ if (moduleId) {
+ programModuleIds.set(extracted.programName!.toUpperCase(), moduleId);
+ }
+ for (const prog of extracted.programs) {
+ if (prog.name.toUpperCase() === extracted.programName?.toUpperCase()) continue; // skip primary
+ const nestedModuleId = generateId('Module', `${filePath}:${prog.name}`);
+ graph.addNode({
+ id: nestedModuleId,
+ label: 'Module',
+ properties: {
+ name: prog.name,
+ filePath,
+ startLine: prog.startLine,
+ endLine: prog.endLine,
+ language: SupportedLanguages.Cobol,
+ isExported: true,
+ description: `nested-program${prog.isCommon ? ' common' : ''}`,
+ },
+ });
+ // Find enclosing program by line-range containment
+ const enclosing = extracted.programs.find(p =>
+ p.startLine < prog.startLine && p.endLine > prog.endLine && p.nestingDepth < prog.nestingDepth,
+ );
+ const nestedParent = enclosing
+ ? (programModuleIds.get(enclosing.name.toUpperCase()) ?? moduleId ?? fileNodeId)
+ : (moduleId ?? fileNodeId);
+ graph.addRelationship({
+ id: generateId('CONTAINS', `${nestedParent}->${nestedModuleId}`),
+ type: 'CONTAINS',
+ sourceId: nestedParent,
+ targetId: nestedModuleId,
+ confidence: 1.0,
+ reason: 'cobol-nested-program',
+ });
+ moduleNodeIds.set(prog.name.toUpperCase(), nestedModuleId);
+ programModuleIds.set(prog.name.toUpperCase(), nestedModuleId);
+ }
+
+ const parentId = moduleId ?? fileNodeId;
+
+ // ── SECTIONs -> Namespace nodes ──────────────────────────────────
+ const sectionNodeIds = new Map<string, string>();
+ for (let i = 0; i < extracted.sections.length; i++) {
+ const sec = extracted.sections[i];
+ const nextLine = i + 1 < extracted.sections.length
+ ? extracted.sections[i + 1].line - 1
+ : lines.length;
+ const owningPgm = findOwningProgramName(sec.line, extracted.programs);
+ const secId = generateId('Namespace', `${filePath}:${owningPgm ? owningPgm + ':' : ''}${sec.name}`);
+ graph.addNode({
+ id: secId,
+ label: 'Namespace',
+ properties: {
+ name: sec.name,
+ filePath,
+ startLine: sec.line,
+ endLine: nextLine,
+ language: SupportedLanguages.Cobol,
+ isExported: true,
+ },
+ });
+ const secParent = programModuleIds.get(owningPgm ?? '') ?? parentId;
+ graph.addRelationship({
+ id: generateId('CONTAINS', `${secParent}->${secId}`),
+ type: 'CONTAINS',
+ sourceId: secParent,
+ targetId: secId,
+ confidence: 1.0,
+ reason: 'cobol-section',
+ });
+ sectionNodeIds.set(`${owningPgm ?? ''}:${sec.name.toUpperCase()}`, secId);
+ }
+
+ // ── PARAGRAPHs -> Function nodes ─────────────────────────────────
+ const paraNodeIds = new Map<string, string>();
+ for (let i = 0; i < extracted.paragraphs.length; i++) {
+ const para = extracted.paragraphs[i];
+ const nextLine = i + 1 < extracted.paragraphs.length
+ ? extracted.paragraphs[i + 1].line - 1
+ : lines.length;
+ const owningPgmPara = findOwningProgramName(para.line, extracted.programs);
+ const paraId = generateId('Function', `${filePath}:${owningPgmPara ? owningPgmPara + ':' : ''}${para.name}`);
+ graph.addNode({
+ id: paraId,
+ label: 'Function',
+ properties: {
+ name: para.name,
+ filePath,
+ startLine: para.line,
+ endLine: nextLine,
+ language: SupportedLanguages.Cobol,
+ isExported: true,
+ },
+ });
+ // Parent: find the containing section, or fall back to module/file
+ const containerId = findContainingSection(para.line, extracted.sections, sectionNodeIds, extracted.programs)
+ ?? (programModuleIds.get(owningPgmPara ?? '') ?? parentId);
+ graph.addRelationship({
+ id: generateId('CONTAINS', `${containerId}->${paraId}`),
+ type: 'CONTAINS',
+ sourceId: containerId,
+ targetId: paraId,
+ confidence: 1.0,
+ reason: 'cobol-paragraph',
+ });
+ paraNodeIds.set(`${owningPgmPara ?? ''}:${para.name.toUpperCase()}`, paraId);
+ }
+
+ // ── Data items -> Property nodes ─────────────────────────────────
+ for (const item of extracted.dataItems) {
+ if (item.name === 'FILLER') continue; // Skip anonymous fillers
+ const propId = generatePropertyId(filePath, item);
+ const itemOwner = findOwningProgramName(item.line, extracted.programs);
+ const itemParent = programModuleIds.get(itemOwner ?? '') ?? parentId;
+ graph.addNode({
+ id: propId,
+ label: 'Property',
+ properties: {
+ name: item.name,
+ filePath,
+ startLine: item.line,
+ endLine: item.line,
+ language: SupportedLanguages.Cobol,
+ description: `level:${item.level} section:${item.section}${item.pic ? ` pic:${item.pic}` : ''}`,
+ },
+ });
+ graph.addRelationship({
+ id: generateId('CONTAINS', `${itemParent}->${propId}`),
+ type: 'CONTAINS',
+ sourceId: itemParent,
+ targetId: propId,
+ confidence: 1.0,
+ reason: 'cobol-data-item',
+ });
+ }
+
+ // ── Build data item map early (needed by CALL USING, CICS INTO/FROM, MOVE, and PROCEDURE DIVISION USING) ──
+ const dataItemMap = buildDataItemMap(extracted.dataItems, filePath);
+
+ // ── OCCURS DEPENDING ON -> ACCESSES edges (variable-length table deps) ──
+ for (const item of extracted.dataItems) {
+ if (item.name === 'FILLER' || !item.dependingOn) continue;
+ const propId = generatePropertyId(filePath, item);
+ const depFieldId = dataItemMap.get(item.dependingOn.toUpperCase());
+ if (depFieldId) {
+ graph.addRelationship({
+ id: generateId('ACCESSES', `${propId}->depends-on->${item.dependingOn}`),
+ type: 'ACCESSES',
+ sourceId: propId,
+ targetId: depFieldId,
+ confidence: 1.0,
+ reason: 'cobol-depends-on',
+ });
+ }
+ }
+
+ // Helper: look up paragraph/section by name scoped to the owning program
+ const scopedParaLookup = (name: string, lineNum: number): string | undefined => {
+ const pgm = findOwningProgramName(lineNum, extracted.programs);
+ return paraNodeIds.get(`${pgm ?? ''}:${name.toUpperCase()}`)
+ ?? sectionNodeIds.get(`${pgm ?? ''}:${name.toUpperCase()}`);
+ };
+ /** Resolve the owning program's module ID for a given line (for nested program edge attribution). */
+ const owningModuleId = (lineNum: number): string => {
+ const pgm = findOwningProgramName(lineNum, extracted.programs);
+ return programModuleIds.get(pgm ?? '') ?? parentId;
+ };
+ const scopedCallerLookup = (name: string | null, lineNum: number): string => {
+ if (!name) return owningModuleId(lineNum);
+ const pgm = findOwningProgramName(lineNum, extracted.programs);
+ return paraNodeIds.get(`${pgm ?? ''}:${name.toUpperCase()}`)
+ ?? (programModuleIds.get(pgm ?? '') ?? parentId);
+ };
+
+ // ── PERFORM -> CALLS relationship (intra-file) ──────────────────
+ for (const perf of extracted.performs) {
+ const targetId = scopedParaLookup(perf.target, perf.line);
+ if (!targetId) continue;
+
+ // Source: the paragraph containing the PERFORM, or the module
+ const sourceId = scopedCallerLookup(perf.caller, perf.line);
+
+ graph.addRelationship({
+ id: generateId('CALLS', `${sourceId}->perform->${targetId}:L${perf.line}`),
+ type: 'CALLS',
+ sourceId,
+ targetId,
+ confidence: 1.0,
+ reason: 'cobol-perform',
+ });
+
+ // PERFORM THRU -> additional CALLS edge to the THRU target only (intermediate paragraphs are not expanded)
+ if (perf.thruTarget) {
+ const thruTargetId = scopedParaLookup(perf.thruTarget, perf.line);
+ if (thruTargetId && thruTargetId !== targetId) {
+ graph.addRelationship({
+ id: generateId('CALLS', `${sourceId}->perform-thru->${thruTargetId}:L${perf.line}`),
+ type: 'CALLS',
+ sourceId,
+ targetId: thruTargetId,
+ confidence: 1.0,
+ reason: 'cobol-perform-thru',
+ });
+ }
+ }
+ }
+
+ // ── CALL -> CALLS relationship (cross-program) ──────────────────
+ for (const call of extracted.calls) {
+ if (!call.isQuoted) {
+ // Dynamic CALL via data item — not statically resolvable.
+ // Emit a CodeElement annotation for visibility in impact analysis.
+ graph.addNode({
+ id: generateId('CodeElement', `${filePath}:dynamic-call:${call.target}:L${call.line}`),
+ label: 'CodeElement',
+ properties: {
+ name: `CALL ${call.target}`,
+ filePath,
+ startLine: call.line,
+ endLine: call.line,
+ language: SupportedLanguages.Cobol,
+ description: 'dynamic-call (target is a data item, not resolvable statically)',
+ },
+ });
+ const dynCallOwner = owningModuleId(call.line);
+ graph.addRelationship({
+ id: generateId('CONTAINS', `${dynCallOwner}->dynamic-call:${call.target}:L${call.line}`),
+ type: 'CONTAINS',
+ sourceId: dynCallOwner,
+ targetId: generateId('CodeElement', `${filePath}:dynamic-call:${call.target}:L${call.line}`),
+ confidence: 1.0,
+ reason: 'cobol-dynamic-call',
+ });
+
+ // CALL USING parameters for dynamic call too
+ if (call.parameters && call.parameters.length > 0) {
+ for (const param of call.parameters) {
+ const paramPropId = dataItemMap.get(param.toUpperCase());
+ if (paramPropId) {
+ graph.addRelationship({
+ id: generateId('ACCESSES', `${dynCallOwner}->call-using->${param}:L${call.line}`),
+ type: 'ACCESSES',
+ sourceId: dynCallOwner,
+ targetId: paramPropId,
+ confidence: 0.9,
+ reason: 'cobol-call-using',
+ });
+ }
+ }
+ }
+ // CALL RETURNING target for dynamic call too
+ if (call.returning) {
+ const retPropId = dataItemMap.get(call.returning.toUpperCase());
+ if (retPropId) {
+ graph.addRelationship({
+ id: generateId('ACCESSES', `${dynCallOwner}->call-returning->${call.returning}:L${call.line}`),
+ type: 'ACCESSES',
+ sourceId: dynCallOwner,
+ targetId: retPropId,
+ confidence: 0.9,
+ reason: 'cobol-call-returning',
+ });
+ }
+ }
+ continue;
+ }
+
+ const targetModuleId = moduleNodeIds.get(call.target.toUpperCase());
+ // Create edge even if target not yet known — use a synthetic target id
+ const targetId = targetModuleId
+ ?? generateId('Module', `:${call.target.toUpperCase()}`);
+
+ const callOwner = owningModuleId(call.line);
+ graph.addRelationship({
+ id: generateId('CALLS', `${callOwner}->call->${call.target}:L${call.line}`),
+ type: 'CALLS',
+ sourceId: callOwner,
+ targetId,
+ confidence: targetModuleId ? 0.95 : 0.5,
+ reason: targetModuleId ? 'cobol-call' : 'cobol-call-unresolved',
+ });
+
+ // CALL USING parameters -> ACCESSES edges (data flow across programs)
+ if (call.parameters && call.parameters.length > 0) {
+ for (const param of call.parameters) {
+ const paramPropId = dataItemMap.get(param.toUpperCase());
+ if (paramPropId) {
+ graph.addRelationship({
+ id: generateId('ACCESSES', `${callOwner}->call-using->${param}:L${call.line}`),
+ type: 'ACCESSES',
+ sourceId: callOwner,
+ targetId: paramPropId,
+ confidence: 0.9,
+ reason: 'cobol-call-using',
+ });
+ }
+ }
+ }
+ // CALL RETURNING target -> ACCESSES edge (return value data flow)
+ if (call.returning) {
+ const retPropId = dataItemMap.get(call.returning.toUpperCase());
+ if (retPropId) {
+ graph.addRelationship({
+ id: generateId('ACCESSES', `${callOwner}->call-returning->${call.returning}:L${call.line}`),
+ type: 'ACCESSES',
+ sourceId: callOwner,
+ targetId: retPropId,
+ confidence: 0.9,
+ reason: 'cobol-call-returning',
+ });
+ }
+ }
+ }
+
+ // ── COPY -> IMPORTS relationship ─────────────────────────────────
+ for (const res of copyResolutions) {
+ if (!res.resolvedPath) continue;
+ const targetFileId = generateId('File', res.resolvedPath);
+ graph.addRelationship({
+ id: generateId('IMPORTS', `${fileNodeId}->${targetFileId}:${res.copyTarget}`),
+ type: 'IMPORTS',
+ sourceId: fileNodeId,
+ targetId: targetFileId,
+ confidence: 1.0,
+ reason: 'cobol-copy',
+ });
+ }
+
+ // ── EXEC SQL blocks -> CodeElement nodes + ACCESSES edges ──────
+ for (const sql of extracted.execSqlBlocks) {
+ const sqlId = generateId('CodeElement', `${filePath}:exec-sql:L${sql.line}`);
+ graph.addNode({
+ id: sqlId,
+ label: 'CodeElement',
+ properties: {
+ name: `EXEC SQL ${sql.operation}`,
+ filePath,
+ startLine: sql.line,
+ endLine: sql.line,
+ language: SupportedLanguages.Cobol,
+ description: `tables:[${sql.tables.join(',')}] cursors:[${sql.cursors.join(',')}]`,
+ },
+ });
+ const sqlOwner = owningModuleId(sql.line);
+ graph.addRelationship({
+ id: generateId('CONTAINS', `${sqlOwner}->${sqlId}`),
+ type: 'CONTAINS',
+ sourceId: sqlOwner,
+ targetId: sqlId,
+ confidence: 1.0,
+ reason: 'cobol-exec-sql',
+ });
+ // ACCESSES edges to tables
+ for (const table of sql.tables) {
+ const tableId = generateId('Record', `:${table}`);
+ graph.addRelationship({
+ id: generateId('ACCESSES', `${sqlId}->${tableId}:${sql.operation}`),
+ type: 'ACCESSES',
+ sourceId: sqlId,
+ targetId: tableId,
+ confidence: 0.9,
+ reason: `sql-${sql.operation.toLowerCase()}`,
+ });
+ }
+
+ // EXEC SQL INCLUDE -> IMPORTS edge
+ if (sql.includeMember) {
+ // copybookMap is not available here, so the member is not resolved to a path;
+ // emit IMPORTS directly against a synthetic File id keyed by member name.
+ const includeTarget = sql.includeMember.toUpperCase();
+ // The edge uses reason 'sql-include' to distinguish it from COPY.
+ graph.addRelationship({
+ id: generateId('IMPORTS', `${fileNodeId}->sql-include->${includeTarget}:L${sql.line}`),
+ type: 'IMPORTS',
+ sourceId: fileNodeId,
+ targetId: generateId('File', `:${includeTarget}`),
+ confidence: 0.8,
+ reason: 'sql-include',
+ });
+ }
+ }
+
+ // ── PROCEDURE DIVISION USING -> ACCESSES edges (parameter contract) ──
+ // Iterate per-program to handle nested programs with their own USING clauses
+ for (const prog of extracted.programs) {
+ const progModId = programModuleIds.get(prog.name.toUpperCase()) ?? moduleId;
+ if (progModId && prog.procedureUsing && prog.procedureUsing.length > 0) {
+ for (const param of prog.procedureUsing) {
+ const paramPropId = dataItemMap.get(param.toUpperCase());
+ if (paramPropId) {
+ graph.addRelationship({
+ id: generateId('ACCESSES', `${progModId}->using->${param}`),
+ type: 'ACCESSES',
+ sourceId: progModId,
+ targetId: paramPropId,
+ confidence: 1.0,
+ reason: 'cobol-procedure-using',
+ });
+ }
+ }
+ }
+ }
+
+ // ── EXEC CICS blocks -> CodeElement nodes + CALLS/ACCESSES edges ──
+ for (const cics of extracted.execCicsBlocks) {
+ const cicsId = generateId('CodeElement', `${filePath}:exec-cics:L${cics.line}`);
+ graph.addNode({
+ id: cicsId,
+ label: 'CodeElement',
+ properties: {
+ name: `EXEC CICS ${cics.command}`,
+ filePath,
+ startLine: cics.line,
+ endLine: cics.line,
+ language: SupportedLanguages.Cobol,
+ description: [
+ cics.mapName && `map:${cics.mapName}`,
+ cics.programName && `program:${cics.programName}${cics.programIsLiteral === false ? ' (dynamic)' : ''}`,
+ cics.transId && `transid:${cics.transId}`,
+ cics.fileName && `file:${cics.fileName}`,
+ cics.queueName && `queue:${cics.queueName}`,
+ cics.labelName && `label:${cics.labelName}`,
+ ].filter(Boolean).join(' ') || undefined,
+ },
+ });
+ const cicsOwner = owningModuleId(cics.line);
+ graph.addRelationship({
+ id: generateId('CONTAINS', `${cicsOwner}->${cicsId}`),
+ type: 'CONTAINS',
+ sourceId: cicsOwner,
+ targetId: cicsId,
+ confidence: 1.0,
+ reason: 'cobol-exec-cics',
+ });
+ // LINK/XCTL/LOAD -> cross-program CALLS (handles both literal and variable PROGRAM)
+ if (cics.programName && ['LINK', 'XCTL', 'LOAD'].includes(cics.command)) {
+ if (cics.programIsLiteral === false) {
+ // Dynamic PROGRAM reference via variable — annotate, don't resolve
+ graph.addNode({
+ id: generateId('CodeElement', `${filePath}:cics-dynamic-pgm:${cics.programName}:L${cics.line}`),
+ label: 'CodeElement',
+ properties: {
+ name: `CICS ${cics.command} ${cics.programName}`,
+ filePath, startLine: cics.line, endLine: cics.line,
+ language: SupportedLanguages.Cobol,
+ description: `cics-dynamic-program (target is data item ${cics.programName})`,
+ },
+ });
+ graph.addRelationship({
+ id: generateId('CONTAINS', `${cicsOwner}->cics-dynamic-pgm:${cics.programName}:L${cics.line}`),
+ type: 'CONTAINS', sourceId: cicsOwner,
+ targetId: generateId('CodeElement', `${filePath}:cics-dynamic-pgm:${cics.programName}:L${cics.line}`),
+ confidence: 1.0, reason: 'cics-dynamic-program',
+ });
+ } else {
+ const cicsTargetModuleId = moduleNodeIds.get(cics.programName.toUpperCase());
+ const targetId = cicsTargetModuleId
+ ?? generateId('Module', `:${cics.programName.toUpperCase()}`);
+ const cicsReason = `cics-${cics.command.toLowerCase()}`;
+ graph.addRelationship({
+ id: generateId('CALLS', `${cicsOwner}->cics-${cics.command.toLowerCase()}->${cics.programName}:L${cics.line}`),
+ type: 'CALLS', sourceId: cicsOwner, targetId,
+ confidence: cicsTargetModuleId ? 0.95 : 0.5,
+ reason: cicsTargetModuleId ? cicsReason : `${cicsReason}-unresolved`,
+ });
+ }
+ }
+
+ // CICS FILE I/O -> ACCESSES edges (reads: READ/READNEXT/READPREV/STARTBR/ENDBR; writes: WRITE/REWRITE/DELETE)
+ if (cics.fileName) {
+ const fileRecordId = generateId('Record', `:${cics.fileName.toUpperCase()}`);
+ const ioCommand = cics.command.toUpperCase();
+ const isRead = ['READ', 'STARTBR', 'READNEXT', 'READPREV', 'READ NEXT', 'READ PREV', 'ENDBR'].includes(ioCommand);
+ const isWrite = ['WRITE', 'REWRITE', 'DELETE'].includes(ioCommand);
+ const reason = isRead ? 'cics-file-read' : isWrite ? 'cics-file-write' : 'cics-file-access';
+ graph.addRelationship({
+ id: generateId('ACCESSES', `${cicsId}->file->${cics.fileName}:L${cics.line}`),
+ type: 'ACCESSES', sourceId: cicsId, targetId: fileRecordId,
+ confidence: 0.9, reason,
+ });
+ }
+
+ // CICS QUEUE -> ACCESSES edge with differentiated reason (WRITEQ/READQ/DELETEQ TS/TD)
+ if (cics.queueName) {
+ const queueId = generateId('Record', `:${cics.queueName}`);
+ const qCmd = cics.command.toUpperCase();
+ const qReason = qCmd.startsWith('READQ') ? 'cics-queue-read'
+ : qCmd.startsWith('WRITEQ') ? 'cics-queue-write'
+ : qCmd.startsWith('DELETEQ') ? 'cics-queue-delete'
+ : 'cics-queue';
+ graph.addRelationship({
+ id: generateId('ACCESSES', `${cicsId}->queue->${cics.queueName}:L${cics.line}`),
+ type: 'ACCESSES', sourceId: cicsId, targetId: queueId,
+ confidence: 0.85, reason: qReason,
+ });
+ }
+
+ // CICS RETURN/START TRANSID -> CALLS edge (transaction flow)
+ if (cics.transId) {
+ const cmd = cics.command.toUpperCase();
+ if (cmd === 'RETURN' || cmd.startsWith('START')) {
+ const transNodeId = generateId('CodeElement', `:${cics.transId}`);
+ graph.addRelationship({
+ id: generateId('CALLS', `${cicsOwner}->${cmd === 'RETURN' ? 'return' : 'start'}-transid->${cics.transId}:L${cics.line}`),
+ type: 'CALLS', sourceId: cicsOwner, targetId: transNodeId,
+ confidence: 0.8,
+ reason: cmd === 'RETURN' ? 'cics-return-transid' : 'cics-start-transid',
+ });
+ }
+ }
+
+ // CICS MAP -> ACCESSES edge (screen/mapset traceability)
+ if (cics.mapName) {
+ const mapId = generateId('Record', `