feat: add COBOL language support with regex extraction pipeline by magyargergo · Pull Request #498 · abhigyanpatwari/GitNexus

magyargergo · 2026-03-24T14:39:48Z

Summary

Adds full COBOL language support to GitNexus using a standalone regex processor — no tree-sitter dependency, no LanguageProvider modification. Follows the markdown-processor.ts pattern established for non-tree-sitter languages.

Based on the extraction work from PR #359 by @naicud, restructured to follow the clean architecture from PR #488.

Architecture Decision

Three independent review agents (architecture-strategist, code-simplicity-reviewer, typescript-pro) unanimously recommended Approach A (standalone processor) over Approach B (LanguageProvider discriminator):

Zero changes to LanguageProvider interface
Zero changes to SupportedLanguages enum
1 new orchestrator file + 1 pipeline integration point
Follows the markdown-processor.ts precedent exactly

See docs/plans/2026-03-24-feat-cobol-language-provider-integration-plan.md for the full decision record.

Changes

New files (4,470 LOC):

cobol-processor.ts — orchestrator (processCobol, isCobolFile, isJclFile)
cobol/cobol-preprocessor.ts — regex state machine extraction (888 LOC)
cobol/cobol-copy-expander.ts — COPY statement expansion with circular detection
cobol/jcl-parser.ts — JCL job/step/DD extraction
cobol/jcl-processor.ts — JCL graph node creation
docs/code-indexing/cobol/ — 7 documentation files

Pipeline integration:

pipeline.ts — single processCobol() call in Phase 2.6 (alongside processMarkdown)

COBOL Extraction Produces

COBOL Construct	Node Label	Edge Type
PROGRAM-ID	Module	—
PARAGRAPH	Function	CALLS (via PERFORM)
SECTION	Namespace	CONTAINS
Data items (01-level)	Property	—
CALL 'X'	—	CALLS (cross-program)
COPY X	—	IMPORTS
JCL EXEC PGM=X	—	CALLS (JCL→COBOL)

Test plan

npx tsc --noEmit — zero type errors
54 new tests (33 COBOL + 21 JCL) — all pass
Full suite: 3,889 tests pass, 0 failures
Index a real COBOL repository and verify graph output

Standalone COBOL processor following the markdown-processor.ts pattern:
- No LanguageProvider modification — COBOL uses regex, not tree-sitter
- No SupportedLanguages enum change — standalone processor pattern

New files:
- cobol-processor.ts — orchestrator (processCobol, isCobolFile, isJclFile)
- cobol/cobol-preprocessor.ts — regex state machine extraction (~888 LOC)
- cobol/cobol-copy-expander.ts — COPY statement expansion with circular detection
- cobol/jcl-parser.ts — JCL job/step/DD extraction
- cobol/jcl-processor.ts — JCL graph node creation

Extraction produces:
- Module nodes (PROGRAM-ID)
- Function nodes (paragraphs)
- Namespace nodes (sections)
- Property nodes (data items)
- CALLS edges (PERFORM intra-file, CALL cross-program)
- IMPORTS edges (COPY statements)
- CONTAINS edges (section → paragraph hierarchy)

Pipeline integration: single processCobol() call in Phase 2.6

54 new tests (33 COBOL + 21 JCL), all 3889 tests pass.

vercel · 2026-03-24T14:39:55Z


Project	Deployment	Actions	Updated (UTC)
gitnexus	Ready	Preview, Comment	Mar 26, 2026 1:55pm

github-actions · 2026-03-24T14:51:58Z

CI Report

✅ All checks passed

Pipeline Status

Stage	Status	Details
✅ Typecheck	`success`	tsc --noEmit
✅ Tests	`success`	unit tests, 3 platforms
✅ E2E	`success`	gitnexus-web changes only

Test Results

Tests	Passed	Failed	Skipped	Duration
4602	4600	0	2	175s

✅ All 4600 tests passed

2 test(s) skipped:

buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
Dart field-type resolution > emits ACCESSES edges for field reads in chains

Code Coverage

Tests

Metric	Coverage	Covered	Base	Delta	Status
Statements	72.24%	12314/17045	70.65%	📈 +1.6	🟢 ██████████████░░░░░░
Branches	62.15%	8225/13233	61.22%	📈 +0.9	🟢 ████████████░░░░░░░░
Functions	75.86%	1031/1359	74.62%	📈 +1.2	🟢 ███████████████░░░░░
Lines	74.72%	11049/14787	73.19%	📈 +1.5	🟢 ██████████████░░░░░░


Add comment block at the custom processor integration point documenting the pattern for future non-tree-sitter language additions.

… flow, PERFORM THRU

Maps the remaining 60% of CobolRegexResults to the graph:
- EXEC SQL blocks → CodeElement nodes + ACCESSES edges to DB tables
- EXEC CICS LINK/XCTL → CodeElement nodes + cross-program CALLS edges
- ENTRY points → Constructor nodes (registered for cross-program resolution)
- MOVE statements → ACCESSES edges (read/write data flow tracking)
- PERFORM THRU → expanded CALLS edges for range targets
- File declarations → Record nodes with assignment metadata
- Cross-program CALL 2nd pass: resolves unresolved targets after all programs processed

…S resolution bug

Integration tests (test/integration/resolvers/cobol.test.ts):
- 26 tests covering full COBOL system extraction
- ALL assertions use exact toBe(N) — zero fuzzy assertions
- Fixtures: CUSTUPDT.cbl, AUDITLOG.cbl, CUSTDAT.cpy, RPTGEN.cbl, RUNJOBS.jcl

Bug fix (cobol-processor.ts):
- CICS LINK/XCTL cross-program resolution was broken — edges were created with "resolved" reason but pointing to <unresolved> targets
- Fix: use cics-link-unresolved / cics-xctl-unresolved suffix pattern matching the existing cobol-call-unresolved pattern
- Second-pass resolver now patches both CALL and CICS unresolved edges

All 3915 tests pass, 0 failures.

Complete rewrite of COBOL integration tests using ground-truth approach: dump the full graph, then assert EVERY node and EVERY edge.

57 tests across 9 sections:
- Node completeness: Module(3), Function(13), Namespace(2), Property(21), Record(1), CodeElement(8), Constructor(1) — exact sorted arrays
- Edge completeness: 22 tests covering every type+reason combination with exact source→target pairs
- Cross-program resolution: 6 tests verifying CALL, CICS LINK/XCTL, JCL
- COPY expansion: copybook data items in RPTGEN
- Section hierarchy: exact paragraph membership per section
- Data item ownership: exact per-module breakdown
- MOVE data flow: exact read/write pairs
- JCL integration: job/step/dataset containment
- Grand totals: CALLS(22), CONTAINS(48), IMPORTS(1), ACCESSES(7)

Fixture enhancements:
- CUSTUPDT.cbl: added INIT-SECTION + PROCESSING-SECTION, PERFORM THRU
- AUDITLOG.cbl: added ENTRY "AUDITLOG-BATCH"
- RPTGEN.cbl: added EXEC CICS XCTL

Zero fuzzy assertions — every expect uses toBe(N) or toEqual([...sorted]).

magyargergo · 2026-03-24T16:48:08Z

@copilot review the completeness of this COBOL processor and try to find and verify gaps and potential issues. Also check whether our parser is complete for the COBOL language.

Copilot · 2026-03-24T16:48:15Z

@magyargergo I've opened a new pull request, #500, to work on those changes. Once the pull request is ready, I'll request review from you.

magyargergo · 2026-03-24T18:25:51Z

@claude review the completeness of this COBOL processor and try to find and verify gaps and potential issues. Also check whether our parser is complete for the COBOL language.

github-actions · 2026-03-24T18:26:08Z

Claude finished @magyargergo's task in 3m 3s

COBOL Processor Completeness Review

Tasks:

Read and analyze all COBOL processor source files
Review regex patterns for correctness
Identify missing COBOL constructs
Check for structural/correctness bugs
Review JCL parser completeness

Summary

The implementation is solid for its stated scope (regex-based symbol extraction for code indexing). Architecture, test coverage, and graph integration are well done. Below are findings organized by severity.

🔴 Critical Bugs

1. CALL 'single-quoted' not detected — cobol-preprocessor.ts:176

const RE_CALL = /\bCALL\s+"([^"]+)"/i;

COBOL allows both "double" and 'single' quote delimiters. CALL 'SUBPROG' is extremely common in IBM mainframe COBOL (EBCDIC environments often prefer single quotes). This means a large class of cross-program calls goes completely undetected.


Same issue applies to RE_ENTRY at line 189 — ENTRY 'name' with single quotes is valid and won't be captured.
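One way to cover both delimiters is an alternation with one capture group per quote style. The sketch below is illustrative only: `RE_CALL_ANY_QUOTE` and `extractCallTarget` are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical replacement for RE_CALL: accepts "double" or 'single' quoted literals.
const RE_CALL_ANY_QUOTE = /\bCALL\s+(?:"([^"]+)"|'([^']+)')/i;

function extractCallTarget(line: string): string | null {
  const m = RE_CALL_ANY_QUOTE.exec(line);
  if (!m) return null;
  // Group 1 holds a double-quoted name, group 2 a single-quoted one.
  return m[1] ?? m[2] ?? null;
}
```

Using two groups rather than a `['"]` character class keeps a name from being closed by the wrong kind of quote.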

2. Sequence-numbered files: paragraph/section detection silently fails

// cobol-preprocessor.ts:171-172
const RE_PROC_SECTION = /^       ([A-Z][A-Z0-9-]+)\s+SECTION\.\s*$/;
const RE_PROC_PARAGRAPH = /^       ([A-Z][A-Z0-9-]+)\.\s*$/;

These require exactly 7 leading spaces. Standard COBOL files with sequence numbers in columns 1–6 (e.g. 000100 MAIN-PARAGRAPH.) have digits in those positions. preprocessCobolSource only replaces alpha/# markers, so numeric sequence numbers are preserved. Result: zero paragraphs and sections extracted for any file with numeric sequence numbers.

The fixture files happen to use all-space column 1–6 areas, so tests pass — but production COBOL files almost universally have sequence numbers.
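A minimal sketch of the failure and one possible remedy, assuming fixed-format source: blank columns 1–6 before applying the column-anchored pattern. `blankSequenceArea` is a hypothetical helper and not necessarily how the project normalizes lines.

```typescript
// Pattern quoted in the finding above: exactly 7 leading spaces.
const RE_PROC_PARAGRAPH = /^       ([A-Z][A-Z0-9-]+)\.\s*$/;

// Hypothetical helper: blank the sequence area (columns 1-6) so that
// column-anchored patterns still line up. Column 7 (indicator) is untouched.
function blankSequenceArea(line: string): string {
  if (line.length < 7) return line;
  return ' '.repeat(6) + line.slice(6);
}

const numbered = '000100 MAIN-PARAGRAPH.';
const before = RE_PROC_PARAGRAPH.test(numbered);                    // digits in cols 1-6
const after = RE_PROC_PARAGRAPH.exec(blankSequenceArea(numbered));  // cols 1-6 blanked
```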


3. Duplicate data item names produce colliding graph node IDs — cobol-processor.ts:355

const propId = generateId('Property', `${filePath}:${item.name}`);

The ID only includes filename and name, not section. If the same name appears in multiple sections (e.g. a STATUS field in both WORKING-STORAGE and LINKAGE SECTION), the second addNode silently overwrites the first. Level is also not included, so 01 CUST-REC and 05 CUST-REC (valid in different groups) collide too.
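The collision can be avoided by widening the key. The sketch below assumes a composite of file path, section, level, and name; `generateId` here is a local stand-in whose real signature and output format are assumptions, and `propertyId` is a hypothetical helper.

```typescript
interface DataItem {
  name: string;
  section: string;
  level: number;
}

// Stand-in for the project's generateId; the real implementation is an assumption.
function generateId(label: string, key: string): string {
  return `${label}:${key}`;
}

// Hypothetical helper: section and level are part of the key, so identical
// names in different sections or at different levels get distinct IDs.
function propertyId(filePath: string, item: DataItem): string {
  return generateId('Property', `${filePath}:${item.section}:${item.level}:${item.name}`);
}
```

Node creation and any later lookup must build the key through the same helper, otherwise edges would point at IDs that do not exist.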


4. Resolved cross-program CALL edges are duplicated, not replaced — cobol-processor.ts:196–214

The second pass adds new resolved edges (id: rel.id + ':resolved') but the original <unresolved>:PROGNAME edge is never removed. The graph ends up with both. Any upstream traversal will see two CALLS edges to the same logical target — one to a phantom node, one to the real module.
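A sketch of a second pass that replaces rather than duplicates, using a plain `Map` as a stand-in for the graph. The `Edge` shape, the `<unresolved>:` prefix handling, and `resolveCalls` are assumptions for illustration, not the project's API.

```typescript
interface Edge {
  id: string;
  sourceId: string;
  targetId: string;
}

// Hypothetical second pass: swap each placeholder edge for a resolved one.
function resolveCalls(edges: Map<string, Edge>, moduleIds: Map<string, string>): void {
  const prefix = '<unresolved>:';
  const orphanIds: string[] = [];
  const resolved: Edge[] = [];
  for (const edge of edges.values()) {
    if (!edge.targetId.startsWith(prefix)) continue;
    const moduleId = moduleIds.get(edge.targetId.slice(prefix.length));
    if (moduleId === undefined) continue; // target still unknown: keep the placeholder
    resolved.push({ ...edge, id: `${edge.id}:resolved`, targetId: moduleId });
    orphanIds.push(edge.id);
  }
  // Mutate only after iteration so the loop never sees its own changes.
  for (const id of orphanIds) edges.delete(id);
  for (const edge of resolved) edges.set(edge.id, edge);
}
```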

🟡 Notable Gaps

5. MOVE identifier TO a b c (multiple targets) not tracked — cobol-preprocessor.ts:192

const RE_MOVE = /\bMOVE\s+(CORRESPONDING\s+)?([A-Z][A-Z0-9-]+)\s+TO\s+([A-Z][A-Z0-9-]+)/i;

Only captures the first target. MOVE WS-AMOUNT TO FIELD-A FIELD-B FIELD-C produces one data-flow edge instead of three.
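Capturing every target amounts to taking everything after TO and splitting it. The sketch below is a simplified illustration: `extractMoveTargets` as written here is hypothetical and deliberately ignores qualified names (`OF`/`IN`) and subscripts.

```typescript
// Simplified sketch: returns every plain identifier after TO.
// Qualified names (A OF B) and subscripts (A(I)) are NOT handled here.
function extractMoveTargets(statement: string): string[] {
  const m = /\bMOVE\s+(?:CORRESPONDING\s+)?\S+\s+TO\s+(.+?)\.?\s*$/i.exec(statement);
  if (!m) return [];
  return (m[1] ?? '')
    .split(/\s+/)
    .filter((token) => /^[A-Z][A-Z0-9-]*$/i.test(token));
}
```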

6. COPY ... IN library / COPY ... OF library syntax not supported — cobol-copy-expander.ts:259

const targetMatch = text.match(/^COPY\s+(?:"([^"]+)"|([A-Z][A-Z0-9-]*))/i);

COPY CUSTDATA IN PRODLIB is valid COBOL. The IN/OF clause is silently ignored and only CUSTDATA is captured — but the IN PRODLIB qualifier is important for resolving to the right library, particularly in mainframe environments with multiple PDS libraries.
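A sketch of how the qualifier could be captured with one extra optional group. `RE_COPY_WITH_LIBRARY` and `parseCopy` are hypothetical names; the first part of the pattern mirrors the one quoted above.

```typescript
// Hypothetical extension: group 3 captures the library after IN or OF.
const RE_COPY_WITH_LIBRARY =
  /^COPY\s+(?:"([^"]+)"|([A-Z][A-Z0-9-]*))(?:\s+(?:IN|OF)\s+([A-Z][A-Z0-9-]*))?/i;

function parseCopy(text: string): { target: string; library: string | null } | null {
  const m = RE_COPY_WITH_LIBRARY.exec(text);
  if (!m) return null;
  const target = m[1] ?? m[2];
  if (target === undefined) return null;
  return { target, library: m[3] ?? null };
}
```

Capturing the library only helps if copybook resolution later consults it; recording it as metadata without using it leaves the lookup behaviour unchanged.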

7. COPY REPLACING pseudotext (==...==) not handled — cobol-copy-expander.ts

IBM Enterprise COBOL's most-used form of REPLACING is pseudotext:

COPY CUSTDATA REPLACING ==CUST-== BY ==WS-CUST-==.

The parseReplacingClause function tokenizes on "quoted" and unquoted tokens. ==pseudotext== delimiters will be completely unparsed, so REPLACING transformations in this format silently do nothing.
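A tokenizer can recognise `==...==` by trying that alternative first. The sketch below is a hedged illustration only: `tokenizeReplacing` and `ReplacingToken` are hypothetical and do not reflect the structure of `parseReplacingClause`.

```typescript
interface ReplacingToken {
  text: string;
  isPseudotext: boolean;
}

// Hypothetical tokenizer: pseudotext first, then "quoted" strings, then bare words.
// The trailing period of the COPY statement is skipped.
function tokenizeReplacing(clause: string): ReplacingToken[] {
  const tokens: ReplacingToken[] = [];
  const re = /==(.*?)==|"([^"]*)"|([^\s.]+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(clause)) !== null) {
    if (m[1] !== undefined) {
      tokens.push({ text: m[1].trim(), isPseudotext: true });
    } else {
      tokens.push({ text: m[2] ?? m[3] ?? '', isPseudotext: false });
    }
  }
  return tokens;
}
```

Carrying an explicit `isPseudotext` flag matters because pseudotext such as `CUST-` can look like an identifier once its delimiters are stripped.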

8. Inline PERFORM without paragraph name not indexed — cobol-preprocessor.ts:173

const RE_PERFORM = /\bPERFORM\s+([A-Z][A-Z0-9-]+)(?:\s+THRU\s+([A-Z][A-Z0-9-]+))?/i;

Inline PERFORM blocks (PERFORM ... END-PERFORM) have no paragraph target and are completely invisible to the extractor. This also means PERFORM VARYING / PERFORM UNTIL loops produce no graph edges.

9. Nested programs (multiple PROGRAM-IDs in one file) only capture the first — cobol-preprocessor.ts:686-691

if (result.programName === null) {
  const m = line.match(RE_PROGRAM_ID);
  if (m) { result.programName = m[1]; ... }
}

The guard === null means only the first PROGRAM-ID is captured. COBOL allows multiple programs per compilation unit (nested programs or a sequence of independent programs in one file). Only the first gets a Module node.
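Dropping the guard and collecting into an array is the smallest change that surfaces every program. In the sketch below the pattern is an assumption (the PR's `RE_PROGRAM_ID` is not shown in this review) and `collectProgramIds` is a hypothetical helper.

```typescript
// Assumed pattern; the project's RE_PROGRAM_ID may differ.
const RE_PROGRAM_ID_SKETCH = /\bPROGRAM-ID\.\s*([A-Z][A-Z0-9-]*)/i;

// Hypothetical helper: returns every PROGRAM-ID in source order.
function collectProgramIds(lines: string[]): string[] {
  const names: string[] = [];
  for (const line of lines) {
    const m = RE_PROGRAM_ID_SKETCH.exec(line);
    if (m && m[1] !== undefined) names.push(m[1].toUpperCase());
  }
  return names;
}
```

This only lists the programs; attributing paragraphs and data items to the right one additionally requires tracking END PROGRAM boundaries.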

10. procedureUsing parameters not linked to Linkage Section items in graph — cobol-processor.ts

procedureUsing is captured (e.g. ['LS-CUST-ID', 'LS-AMOUNT']) but no graph edge connects the procedure's formal parameters to their corresponding Property nodes in the Linkage Section. Callers have no graph-level way to know which data items are passed as parameters.

11. JCL in-stream PROC ordering bug — jcl-processor.ts:119–183

Steps are processed (section 2) before in-stream PROCs are registered in moduleNames (section 4). If a step EXECs a PROC that's defined later in the same file, the jcl-exec-proc CALLS edge will never be created. The fix is to pre-register all PROCs before processing steps.
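The pre-registration fix is a standard two-pass pattern: collect every definition first, then resolve references. A minimal sketch, with `JclStatement` and `linkProcCalls` as hypothetical names that do not mirror jcl-processor.ts:

```typescript
type JclStatement =
  | { kind: 'proc'; name: string }
  | { kind: 'exec'; step: string; proc: string };

// Hypothetical two-pass linker: returns [step, proc] pairs for resolvable EXECs.
function linkProcCalls(statements: JclStatement[]): Array<[string, string]> {
  // Pass 1: register every in-stream PROC, wherever it appears in the file.
  const procs = new Set<string>();
  for (const s of statements) {
    if (s.kind === 'proc') procs.add(s.name);
  }
  // Pass 2: resolve each EXEC against the complete set.
  const calls: Array<[string, string]> = [];
  for (const s of statements) {
    if (s.kind === 'exec' && procs.has(s.proc)) calls.push([s.step, s.proc]);
  }
  return calls;
}
```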

🟢 Minor / Design Notes

12. CALL identifier (dynamic calls via variable) not tracked

CALL WS-PROGRAM-NAME — call via a data item holding the program name — is invisible. This is hard to resolve statically, so it's acceptable to skip, but worth noting.

13. CICS MAP names given as unquoted identifiers are not captured

const mapMatch = body.match(/\bMAP\s*\(\s*['"]([^'"]+)['"]\s*\)/i);

CICS MAP names can be identifier references (not just literals). SEND MAP(WS-MAP-NAME) won't capture the map name, silently producing no mapName in the result.

14. SQL table extraction misses DELETE without FROM

/\bFROM\s+([A-Z][A-Z0-9_]+)/gi,

DELETE FROM TABLE is caught by the FROM pattern, but DELETE TABLE (without FROM, which some SQL dialects accept) is not. UPDATE TABLE SET is caught by the UPDATE pattern. Generally low impact.

15. findDataItemNode linear scan — cobol-processor.ts:238-246

Called once per MOVE statement. O(n) per call, O(n·m) total for n data items and m moves. For large programs this could be slow. A precomputed Map<string, dataItem> per file would fix this.
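A sketch of the precomputed index, built once per file before the MOVE loop. `DataItemNode` and this version of `buildDataItemMap` are illustrative; the case-insensitive key and the first-declaration-wins policy are assumptions.

```typescript
interface DataItemNode {
  id: string;
  name: string;
}

// Hypothetical index: one pass over the data items, then O(1) lookups per MOVE.
function buildDataItemMap(items: DataItemNode[]): Map<string, DataItemNode> {
  const byName = new Map<string, DataItemNode>();
  for (const item of items) {
    const key = item.name.toUpperCase(); // COBOL names are case-insensitive
    if (!byName.has(key)) byName.set(key, item); // assumption: first declaration wins
  }
  return byName;
}
```

Total cost drops from O(n·m) to O(n + m) for n data items and m MOVE statements.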

Test Coverage Assessment

The 57-test suite covers the happy paths well. Key untested scenarios:

Files with numeric sequence numbers in cols 1–6 (critical gap #2 above)
CALL 'single-quoted' programs
COPY ... IN library syntax
COPY REPLACING ==pseudotext==
PERFORM VARYING / PERFORM UNTIL (inline)
Duplicate data item names across sections
Multiple PROGRAM-IDs in one file

magyargergo · 2026-03-24T19:00:55Z

@claude can you compare PR #500 with this?

github-actions · 2026-03-24T19:01:10Z

Claude finished @magyargergo's task in 1m 58s

Comparing PR #498 vs PR #500

Tasks:

Post initial comment
Fetch PR #500 (fix(cobol): single-quote CALL/COPY, sequence number stripping, PERFORM keyword false-positives) details and changed files
Compare fixes/additions in #500 vs gaps identified in #498 (feat: add COBOL language support with regex extraction pipeline)
Post final comparison

Blocked: To compare PR #500, I need to fetch origin/copilot/sub-pr-498 (the PR #500 branch). The git fetch and gh CLI commands require approval that hasn't been granted yet.

Could you approve git fetch commands or the gh CLI so I can pull down PR #500's branch and diff it against feat/cobol-language-support?

Alternatively, if you'd like to grant both, the exact commands I'd need are:

git fetch origin copilot/sub-pr-498
git diff feat/cobol-language-support...origin/copilot/sub-pr-498

…, PERFORM keyword skip

Phase 0A: Add removeRelationship(id) to KnowledgeGraph interface and implementation (trivial Map.delete wrapper). Required for orphan edge cleanup in next commit.

Phase 1A (from PR #500 review, modified):
- RE_CALL and RE_COPY_QUOTED now match both "double" and 'single' quotes
- parseSingleCopyStatement in copy-expander updated for single quotes
- PERFORM_KEYWORD_SKIP set prevents UNTIL/VARYING/WITH/TEST/FOREVER from being stored as false-positive perform targets
- Sequence number stripping uses /[^0-9 ]/ (preserves numeric seq numbers unlike PR #500's /\S/ which stripped them)
- Normalized || to ?? for regex group extraction in copy-expander

5 new graph unit tests, all 57 COBOL integration tests pass.

Phase 1B: RE_ENTRY regex now supports both "double" and 'single' quoted ENTRY targets. Uses named intermediates (entryName, usingClause) with ?? operator. USING capture group shifted from [2] to [3].

Phase 1C: Second-pass resolution now collects resolved orphan edge IDs during iteration and removes them after the loop completes, using the new graph.removeRelationship() API. Graph no longer contains phantom <unresolved>: edges alongside their resolved replacements. CALLS count drops from 22 to 18 (4 orphan edges removed).

Phase 1D+3C (atomic): Property node IDs now use composite key filePath:section:level:name instead of filePath:name. This prevents duplicate data item names in different sections (e.g., STATUS in both WORKING-STORAGE and LINKAGE) from silently colliding. New generatePropertyId() helper ensures both node creation and MOVE edge lookup use the identical key formula. buildDataItemMap() replaces the O(n) findDataItemNode linear scan with O(1) Map lookup, built once per file before MOVE processing.

MOVE X TO A B C now produces write edges for all targets, not just the first. extractMoveTargets() helper handles OF/IN qualified names (WS-NAME OF WS-RECORD -> target is WS-NAME), subscript stripping (WS-TABLE(I) -> WS-TABLE), and MOVE_SKIP filtering on targets. Data model: CobolRegexResults.moves.to:string -> targets:string[] MOVE CORRESPONDING stays single-target per COBOL standard. Processor MOVE loop now iterates move.targets.

…PERFORM TIMES, CICS MAP unquoted

Phase 2B: COPY ... IN/OF library-name now captured as metadata in CopyResolution (IN and OF are synonyms per COBOL-85 standard).

Phase 2C: COPY REPLACING ==pseudotext== support. Tokenizer handles ==...== delimiters alongside "quoted" strings. Pseudotext forces EXACT type. Two-pass applyReplacing: first pass handles space-containing/non-identifier pseudotext via global string replace; second pass handles identifier-level LEADING/TRAILING/EXACT. New test file cobol-copy-expander.test.ts with 10 tests.

Phase 2E: PERFORM WS-COUNT TIMES no longer produces a false-positive perform target (checks for TIMES keyword after captured identifier).

Phase 2F: Dynamic CALL via data item (CALL WS-PROG-NAME without quotes) now emits a CodeElement annotation node with description 'dynamic-call' instead of silently ignoring. Adds isQuoted:boolean to call results.

Phase 3A: CICS MAP(WS-MAP-NAME) unquoted identifiers now captured.

Phase 3B: Normalized || to ?? in copy-expander (done in Phase 1A).

…r file Phase 2D: The state machine now captures all PROGRAM-IDs, not just the first. The primary program name stays in programName; additional nested programs go into nestedPrograms[]. The processor creates separate Module nodes for each nested program, contained by the outer module, and registers them in moduleNodeIds for cross-program CALL resolution. Paragraphs/data items are not yet scoped per-program (attributed to the outer module) — full per-program scoping is a future enhancement that requires END PROGRAM boundary tracking in the state machine.

New fixtures:
- NESTED.cbl — two PROGRAM-IDs (OUTER-PROG, INNER-PROG) for nested program support testing
- COPYLIB.cpy — copybook for pseudotext REPLACING test target

Modified fixtures:
- CUSTUPDT.cbl — single-quoted ENTRY 'ALTENTRY', multi-target MOVE (WS-AMT TO FIELD-A FIELD-B), dynamic CALL WS-PROG-NAME, COPY COPYLIB with pseudotext REPLACING, LINKAGE SECTION with LS-PARAM
- RPTGEN.cbl — PERFORM WS-COUNT TIMES (false-positive guard), unquoted MAP(WS-MAP-NAME), additional data items WS-COUNT WS-MAP-NAME

Integration test rewritten with 62 exact assertions covering:
- 5 Module, 17 Function, 33 Property, 9 CodeElement, 2 Constructor nodes
- Nested program containment (OUTER-PROG -> INNER-PROG)
- Dynamic CALL annotation (CodeElement with cobol-dynamic-call)
- Multi-target MOVE (UPDATE-BALANCE: 2 reads, 3 writes)
- Single-quoted ENTRY (ALTENTRY under CUSTUPDT)
- PERFORM TIMES guard (WS-COUNT not in CALLS)
- Orphan unresolved edge removal (zero -unresolved edges)
- Grand totals: 21 CALLS, 68 CONTAINS, 2 IMPORTS, 10 ACCESSES

…xt flag Root cause: ==PREFIX-== matched /^[A-Z][A-Z0-9-]*$/i (trailing hyphens allowed), routing it to the second-pass EXACT identifier match where PREFIX-RECORD !== PREFIX- failed silently. Fix: Propagate isPseudotext from parseReplacingClause to CopyReplacing interface, then use it in applyReplacing first-pass condition to force global string replacement for all pseudotext entries regardless of whether the content looks like an identifier. Result: COPY COPYLIB REPLACING ==PREFIX-== BY ==WS-==. now correctly transforms PREFIX-RECORD → WS-RECORD, PREFIX-CODE → WS-CODE, etc.

…ge grouping

State machine changes (minimal, ~30 lines):
- Add RE_END_PROGRAM regex for END PROGRAM program-name. detection
- Replace nestedPrograms[] with programs[] containing startLine/endLine/nestingDepth metadata for each PROGRAM-ID in the file
- Reset division/section/paragraph state on new PROGRAM-ID boundary
- EOF finalization flushes remaining stack entries (single-program files)
- Programs sorted by startLine (outer before inner)

Processor changes:
- Uses programs[] with line-range containment to find enclosing parent Module for nested programs (replaces hardcoded nestedParent logic)
- programModuleIds Map tracks Module node IDs per program name

Fixture: NESTED.cbl now includes END PROGRAM lines for both programs.

Integration test: PREFIX-* Property nodes now correctly appear as WS-* after the pseudotext REPLACING fix from the previous commit.

github-actions · 2026-03-26T10:43:35Z

Claude finished @magyargergo's task in 9m 24s

COBOL Processor — Fifteenth Deep Review (Compiler Front-End / Static Analysis Perspective)

Tasks:

Post initial comment
Read all COBOL processor source files (current state post-commit 2b222ef)
Verify all 14th review findings against current code
Deep compiler front-end analysis — new gap identification
Architectural fit review
Post final comprehensive review

14th Review — Verification Status

All 4 findings from the 14th review verified against the current branch (commit 2b222ef). Ground truth:

#	Finding	Status
14th #1	`callAccum` flush causes false paragraph + `currentParagraph` corruption when last CALL USING param is on period-terminated line	✅ FIXED — combination of (a) callAccum early-flush detection at lines 1154–1157 and (b) Area A guard in `extractProcedure` at line 1543. But see New #A below
14th #3	`inspectAccum` same false paragraph risk on continuation lines	✅ FIXED for Area B continuation lines — Area A guard at line 1543 rejects deeply-indented lines. But see New #B
14th #4	`callAccum` pre-check not guarded by `currentDivision === 'procedure'`	✅ FIXED — line 1165: `currentDivision === 'procedure' &&` guard present
14th #5	`RE_PROC_PARAGRAPH` uses `^\s*` (unlimited indent) rather than Area A column bounds	✅ FIXED — Area A guard at lines 1542–1543: `if (!isFreeFormat && leadingSpaces > 7) return;`

New Findings (This Review)

🔴 New #A — Last CALL USING Parameter Dropped When on Its Own Period-Terminated Line; False Paragraph in Free-Format Mode

cobol-preprocessor.ts:1154–1157:

if (/^(?:GO\s+TO|PERFORM|MOVE|...|START)\b/i.test(trimmedLine)
  || RE_PROC_SECTION.test(line) || RE_PROC_PARAGRAPH.test(line)) {
  flushCallAccum(); // Flush CALL without this line's content
  // Fall through to process this line normally

The fix for 14th review New #1 introduced this regression. When callAccum !== null and the line matches RE_PROC_PARAGRAPH = /^\s*([A-Z][A-Z0-9-]+)\.\s*$/i — which includes ANY line consisting of a single identifier + period — flushCallAccum() is called without the current line. In the dominant IBM mainframe coding style where the last USING parameter appears alone on a period-terminated line, it is silently dropped:

       CALL 'CUSTUPDT'
           USING BY REFERENCE WS-CUST-ID
                              WS-CUST-NAME
                              WS-CUST-ADDR.    ← RE_PROC_PARAGRAPH matches

Fixed-format mode: WS-CUST-ADDR. has many leading spaces (Area B). Early flush fires WITHOUT WS-CUST-ADDR. Falls through to extractProcedure where the Area A guard rejects it as a paragraph. Result: WS-CUST-ADDR silently dropped from CALL parameters — missing cobol-call-using ACCESSES edge.

Free-format mode (GnuCOBOL / ACAS benchmark — 268 programs): Same early flush, but isFreeFormat = true means the Area A guard at line 1543 is disabled. WS-CUST-ADDR is created as a false paragraph, currentParagraph = 'WS-CUST-ADDR' — all subsequent PERFORM/MOVE/CALL edges until the next real paragraph are misattributed to the spurious node. The original problem from the 14th review reappears, unchanged, in free-format mode.

Root cause: RE_PROC_PARAGRAPH cannot distinguish between an Area B USING parameter WS-ADDR. and an Area A paragraph header NEXT-PARA. without considering column position. The 14th review fix applies the Area A check in extractProcedure (after the early flush), not inside the callAccum decision block where it is needed.

Fix: Apply the Area A constraint inside the callAccum flush-trigger decision. In fixed-format, only use RE_PROC_PARAGRAPH as a flush trigger when the line is in Area A (≤7 leading spaces). In free-format, never use RE_PROC_PARAGRAPH as a flush trigger (rely only on statement verbs and RE_PROC_SECTION):

const matchesPara = RE_PROC_PARAGRAPH.test(line);
const leadingSpaces = (line.match(/^(\s*)/)?.[1].length ?? 0);
// Fixed-format: only flush for Area A para headers (≤7 spaces). Area B = USING param.
// Free-format: never flush for RE_PROC_PARAGRAPH (no column semantics).
const isParaFlushTrigger = !isFreeFormat && matchesPara && leadingSpaces <= 7;
if (statementVerbMatch || RE_PROC_SECTION.test(line) || isParaFlushTrigger) {
  flushCallAccum();
  // Fall through to process this line normally
} else {
  callAccum += ' ' + line;
  if (/\.\s*$/.test(callAccum) || /\bEND-CALL\b/i.test(callAccum)) {
    flushCallAccum();
  }
  return;
}

This correctly accumulates Area B USING parameters (fixed-format) or any period-terminated parameter line (free-format via the period trigger), while still detecting Area A paragraph headers as a flush signal.


🟡 New #B — `inspectAccum` State Stuck When Real Paragraph Header Arrives Mid-Accumulation

cobol-preprocessor.ts:1637–1643 (inside extractProcedure):

if (inspectAccum !== null) {
  inspectAccum += ' ' + line;
  if (/\.\s*$/.test(inspectAccum)) {
    flushInspect();
  } else {
    return;
  }
}

The inspectAccum block is at position 9 in extractProcedure. Paragraph detection is at position 4 (lines 1536–1548). When inspectAccum !== null and a real paragraph header in Area A (≤7 leading spaces) arrives:

Paragraph detection fires → paragraph created → return at line 1548
inspectAccum block (line 1637) is never reached
inspectAccum retains its partial text unchanged
Subsequent lines are appended to the corrupted accumulation
flushInspect() eventually runs with multi-statement garbage text

Example:

       INSPECT WS-STATUS
           TALLYING WS-COUNT
       VALIDATE-INPUT.      ← real paragraph header, Area A, ≤7 spaces
           PERFORM CHECK-FIELDS.

VALIDATE-INPUT. creates a paragraph correctly (step 1), but inspectAccum is left pointing at INSPECT WS-STATUS TALLYING WS-COUNT. PERFORM CHECK-FIELDS. is appended to inspectAccum. When a period-terminated line is finally found, flushInspect() extracts counters from the corrupted text.

Fix: In processLogicalLine, add an inspectAccum awareness block before the switch/case (mirroring the callAccum pattern): detect statement verbs and paragraph/section headers → flushInspect() and fall through. The INSPECT block already has a return for pure continuation lines, so the fix only needs to handle the paragraph/section boundary case.
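The two-path idea can be shown in isolation: check for a paragraph boundary before appending to the accumulator, and flush if one is found. This is a simplified, self-contained sketch; `groupInspect` and `RE_PARA_AREA_A` are hypothetical and stand in for logic that the PR spreads across processLogicalLine and extractProcedure. It assumes fixed-format source and checks paragraph headers only, not statement verbs.

```typescript
// Assumed Area A paragraph header: at most 7 leading spaces, one name, one period.
const RE_PARA_AREA_A = /^ {0,7}([A-Z][A-Z0-9-]+)\.\s*$/i;

function groupInspect(lines: string[]): { inspects: string[]; paragraphs: string[] } {
  const inspects: string[] = [];
  const paragraphs: string[] = [];
  let accum: string | null = null;
  const flush = (): void => {
    if (accum !== null) inspects.push(accum.replace(/\s+/g, ' ').trim());
    accum = null;
  };
  for (const line of lines) {
    const para = RE_PARA_AREA_A.exec(line);
    if (para) {
      flush(); // path 1: boundary reached, close the open INSPECT and fall through
      paragraphs.push(para[1] ?? '');
      continue;
    }
    if (accum !== null) {
      accum += ' ' + line.trim(); // path 2: plain continuation line
      if (/\.\s*$/.test(accum)) flush();
      continue;
    }
    if (/^\s*INSPECT\b/i.test(line)) {
      accum = line.trim();
      if (/\.\s*$/.test(accum)) flush();
    }
  }
  flush(); // EOF: do not lose an unterminated statement
  return { inspects, paragraphs };
}
```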


🟡 New #C — `callAccum` Flush Trigger Fires for Data Items Named with COBOL Reserved-Word Prefixes

cobol-preprocessor.ts:1154:

/^(?:GO\s+TO|PERFORM|MOVE|DISPLAY|ACCEPT|INSPECT|SEARCH|SORT|MERGE|IF|EVALUATE|SET|INITIALIZE|STOP|EXIT|GOBACK|CONTINUE|READ|WRITE|REWRITE|DELETE|OPEN|CLOSE|START)\b/i

\b matches between \w and \W. COBOL data items commonly use reserved words as prefixes: MOVE-COUNT, READ-STATUS, WRITE-BUFFER, PERFORM-LIMIT. For such an identifier on a CALL USING parameter line:

       CALL 'RPTGEN'
           USING BY REFERENCE MOVE-COUNT
                              READ-STATUS.

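On the first parameter line, trimmedLine = "USING BY REFERENCE MOVE-COUNT" starts with USING, so no verb matches and the line is accumulated.
But when MOVE-COUNT sits alone on its line, trimmedLine = "MOVE-COUNT" matches MOVE\b (the boundary between E and - is \w→\W). The early flush fires and MOVE-COUNT is dropped. In the example above, READ-STATUS. on the last line hits the same problem through READ\b.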

Same applies to parameters named PERFORM-CNT, READ-KEY, WRITE-AREA, SEARCH-FOUND.

Fix: change the statement verb regex to require a space or end-of-line after the verb (not just \b): PERFORM(?:\s|$) instead of PERFORM\b. This prevents matching hyphenated identifiers.
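The difference between the two boundaries is easy to check directly. The patterns below use a shortened verb list for illustration and are not the PR's regex. One caveat that follows from regex semantics: `(?:\s|$)` also rejects a verb immediately followed by a period (for example `GOBACK.`), so if the tested text can still carry its terminating period, a lookahead that also accepts `.` may be needed.

```typescript
// Shortened verb list for illustration only.
const VERB_WORD_BOUNDARY = /^(?:MOVE|PERFORM|READ|GOBACK)\b/i;
const VERB_SPACE_OR_EOL = /^(?:MOVE|PERFORM|READ|GOBACK)(?:\s|$)/i;
// Variant (an assumption, not from the review): also accept a terminating period.
const VERB_SPACE_DOT_OR_EOL = /^(?:MOVE|PERFORM|READ|GOBACK)(?=[\s.]|$)/i;

const falsePositive = VERB_WORD_BOUNDARY.test('MOVE-COUNT');   // hyphen counts as \W
const fixedCase = VERB_SPACE_OR_EOL.test('MOVE-COUNT');        // hyphen is neither \s nor end
const realVerb = VERB_SPACE_OR_EOL.test('MOVE WS-A TO WS-B');
const periodVerb = VERB_SPACE_OR_EOL.test('GOBACK.');          // period directly after verb
const periodVerbVariant = VERB_SPACE_DOT_OR_EOL.test('GOBACK.');
```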


🟢 New #D — Free-Format CALL USING Last Parameter: No Correct Termination Signal

In free-format mode (isFreeFormat = true), the proposed fix for New #A (not using RE_PROC_PARAGRAPH as a flush trigger) means the only flush triggers are: statement verbs, RE_PROC_SECTION, and the period in the accumulated text. For:

    CALL "PROG"
        USING WS-PARAM-A
              WS-PARAM-B
              WS-PARAM-C.      ← after New #A fix: not a flush trigger, appended to callAccum

WS-PARAM-C. is appended to callAccum, period found → flushCallAccum() with all three params. ✓

However, for free-format with no END-CALL and no period on the last parameter line (which is non-standard but possible in truncated/malformed source), callAccum would accumulate indefinitely until the next period-terminated line. This is the same behavior as any other non-period-terminated CALL and is handled by the EOF flush at line 959.

This is a design note, not a bug — just worth confirming the fix for New #A doesn't introduce a free-format regression.

Architectural Fit Assessment

The implementation is in excellent shape after 15 review cycles.

Dimension	Assessment
Boundary integrity	Zero `LanguageProvider` interface changes — perfect
Area A guard	Correct in concept; the gap is that it's applied after the callAccum flush decision rather than inside it
`callAccum` design	Intentionally does not `return` for all continuation lines — allows other constructs to be detected. The early-flush-on-paragraph-match approach is architecturally sound, but the `RE_PROC_PARAGRAPH` trigger must be column-position-aware
`inspectAccum` vs paragraph boundary	`callAccum` has a two-path design (early-flush-and-fall-through vs return). `inspectAccum` only has the return path. Adding the same two-path design to `inspectAccum` would make both accumulators consistent
Free-format mode vs fixed-format	The Area A guard correctly differentiates the two modes, but the callAccum early-flush check at line 1154 doesn't carry this differentiation
`SORT_CLAUSE_NOISE`	Comprehensive ✓
`EXCLUDED_PARA_NAMES`	Comprehensive ✓
Two-pass CALL/CANCEL resolution	All constructs (CALL, CANCEL, CICS, SQL, SEARCH, ENTRY) use `owningModuleId()` ✓
`findContainingSection`	`if (resolved) best = resolved` guard correctly prevents cross-program contamination ✓
`CopyResolution.library` dead code	Still captured in `cobol-copy-expander.ts` but `resolveCopy` ignores it — misleading interface ⚠️

Test Coverage Gap

The 145-test suite (86 unit + 59 integration) doesn't currently cover the New #A scenario:

Gap	Finding
Multi-line CALL with last parameter alone on period-terminated line (`WS-CUST-ADDR.`) in fixed-format	New #A — missing ACCESSES edge
Same scenario in free-format mode (ACAS-style)	New #A — false paragraph + state corruption
Real paragraph header in Area A while `inspectAccum` is active	New #B
CALL USING with `MOVE-COUNT` or `PERFORM-LIMIT` as a parameter name on its own line	New #C

Summary

All 4 findings from the 14th review are verified fixed. The Area A guard (14th #5) and callAccum early-flush detection (14th #1) are the right mechanisms, but they interact in a way that introduces a new issue in the most common IBM COBOL multi-line CALL pattern.

New findings (4 total):

🔴 New #A — Last CALL USING parameter dropped when on its own period-terminated Area B line (fixed-format: silent parameter loss; free-format: false paragraph + state corruption). This is the highest-priority remaining issue — it affects the dominant IBM mainframe CALL coding style and corrupts the GnuCOBOL free-format analysis.
🟡 New #B — inspectAccum state stuck when a real paragraph header arrives during accumulation (fix: add inspectAccum early-flush detection mirroring callAccum pattern)
🟡 New #C — callAccum flush trigger incorrectly fires for USING parameters named MOVE-COUNT, PERFORM-LIMIT, READ-STATUS etc. (fix: \b → (?:\s|$) in verb pattern)
🟢 New #D — Design note confirming the New #A fix doesn't introduce free-format regression

Priority order:

New #A — Column-position-aware flush trigger in callAccum; one-condition change with high correctness ROI for both fixed-format (parameter retention) and free-format (false paragraph prevention). This must be fixed before indexing the ACAS benchmark with the current branch.
New #C — One-character change in the verb regex (\b → (?:\s|$)); prevents USING parameter loss for common hyphenated identifier names.
New #B — Adds inspectAccum two-path design consistent with callAccum; affects rare but real edge case.
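The boundary change behind New #C can be illustrated with a minimal sketch. The three-verb list and the helper name `firesFlush` are hypothetical and shortened for illustration; the actual trigger in cobol-preprocessor.ts covers many more verbs.

```typescript
// Hypothetical, shortened verb list; the real flush trigger covers more verbs.
const VERBS = ['MOVE', 'PERFORM', 'READ'];

// Old style: \b treats the hyphen in MOVE-COUNT as a word boundary,
// so an identifier that merely starts with a verb looks like a statement.
const wordBoundaryTrigger = new RegExp(`^(?:${VERBS.join('|')})\\b`, 'i');

// New style: the verb must be followed by whitespace or end of line.
const spaceOrEndTrigger = new RegExp(`^(?:${VERBS.join('|')})(?:\\s|$)`, 'i');

// Returns true when the trimmed line would be treated as a new statement.
function firesFlush(trigger: RegExp, line: string): boolean {
  return trigger.test(line.trim());
}
```

With `\b`, a continuation line holding only `MOVE-COUNT` fires the flush; with `(?:\s|$)`, it does not, while a real `MOVE WS-A TO WS-B.` still does.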

#A: Column-position-aware paragraph detection in callAccum flush. #B: inspectAccum early-flush on paragraph/section/verb headers. #C: Verb boundary \b → (?:\s|$) prevents MOVE-COUNT false flush.

17 new tests covering all recurring review patterns:

Multi-line CALL USING (7 tests):
- Parameters on separate continuation lines (IBM mainframe style)
- No absorption of INSPECT/GO TO/paragraphs following CALL
- END-CALL scope terminator
- Hyphenated identifiers (MOVE-COUNT) not triggering false flush
- Dual quoted+dynamic CALL on same line (ON EXCEPTION)

Nested program attribution (2 tests):
- CALL in inner program within inner line range
- PERFORM before first paragraph has null caller

CRLF compatibility (1 test):
- GO TO DEPENDING ON with \r\n line endings

Area A paragraph detection (2 tests):
- Area B (>7 spaces) rejected; Area A (7 spaces) accepted

SORT/MERGE (1 test): COLLATING SEQUENCE keywords not captured
PROCEDURE USING (2 tests): RETURNING excluded, period-terminated
Comment stripping (1 test): pipe in quoted string preserved
SELECT OPTIONAL (1 test): correct file name, not OPTIONAL keyword

Bug fix: USING extraction regex verb terminators changed from \bVERB\b to \bVERB(?=\s|$) in flushCallAccum; this prevents truncation on hyphenated identifiers like MOVE-COUNT, PERFORM-LIMIT.

Total: 185 tests passing

13 new describe blocks covering all extraction features:
- EXEC DLI: no-SEGMENT, multi-line accumulation (2 tests)
- SET: multiple targets, DOWN BY, TO numeric (3 tests)
- INSPECT: CONVERTING, multiple counters, tallying-replacing, paragraph flush during accumulation (4 tests)
- DECLARATIVES: no-STANDARD keyword, I-O mode, post-END paragraphs (3)
- COPY REPLACING: pseudotext deletion ==OLD== BY ==== (1 test)
- VALUE: hex literal, negative numeric, ALL literal (3 tests)
- OCCURS: TO range, fixed-size without DEPENDING ON (2 tests)
- Dynamic CALL/CANCEL: end-of-line, multiple CANCELs (3 tests)
- EXEC SQL: INCLUDE skips tables, SELECT INTO host vars, host variable extraction (3 tests)
- INITIALIZE: target and caller context (1 test)
- Nested programs: sibling scoping, PROGRAM-ID without ID DIV (2)
- EXEC EOF flush: unclosed EXEC SQL flushed (1 test)
- Multi-PERFORM: IF/ELSE dual PERFORM on single line (1 test)
- IS EXTERNAL: USAGE not polluted by external flag (1 test)

Total: 215 tests passing

magyargergo · 2026-03-26T11:29:55Z

@claude ultra-think while identifying COBOL-language gaps. Perform this analysis on this pull request and verify those findings. Act as a senior compiler front-end engineer and expert in static analysis tools and review this change from that perspective and review the architectural fit. Also check out previous comments and reason with their requests.

github-actions · 2026-03-26T11:30:07Z

Claude finished @magyargergo's task in 15m 6s —— View job

COBOL Processor — Sixteenth Deep Review (Compiler Front-End / Static Analysis Perspective)

Tasks:

Post initial comment
Read all COBOL processor source files (current state post-commits 3296201/5aa0e18)
Verify all 15th review findings against current code
Deep compiler front-end analysis — new gap identification
Architectural fit review
Post final comprehensive review

15th Review — Verification Status

All 4 findings from the 15th review verified against the current branch. Ground truth:

#	Finding	Status
15th #A	Last CALL USING parameter dropped / false paragraph in free-format when callAccum flushes on `RE_PROC_PARAGRAPH`	✅ FIXED — `isAreaAParagraph = RE_PROC_PARAGRAPH.test(line) && (!isFreeFormat ? leadingSpaces <= 7 : false)` at line 1157; free-format always `false`
15th #B	`inspectAccum` state stuck when paragraph header arrives mid-accumulation	✅ FIXED — identical two-path guard at lines 1643-1659: `inspIsAreaAPara` + fall-through flush
15th #C	`callAccum` flush trigger fires for `MOVE-COUNT`, `PERFORM-LIMIT` etc.	✅ FIXED — verb regex uses `(?:\s|$)` at line 1158 (prevents hyphenated-identifier false matches)
15th #D	Free-format CALL USING last parameter — design note	✅ Confirmed: period in accumulated text triggers flush; design is correct
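The Area A guard verified in 15th #A can be sketched as follows. `RE_PARAGRAPH_SKETCH` is a hypothetical stand-in for `RE_PROC_PARAGRAPH`, whose real pattern is not shown in this thread; only the `leadingSpaces <= 7` condition and the free-format `false` branch come from the review.

```typescript
// Stand-in for RE_PROC_PARAGRAPH (assumption): a lone identifier followed by a period.
const RE_PARAGRAPH_SKETCH = /^\s*[A-Z0-9][A-Z0-9-]*\.\s*$/i;

// Returns true only for paragraph-shaped lines that start in Area A (fixed-format).
function isAreaAParagraph(line: string, isFreeFormat: boolean): boolean {
  if (isFreeFormat) return false; // free-format carries no column information
  const leadingSpaces = line.search(/\S|$/); // index of first non-space character
  return RE_PARAGRAPH_SKETCH.test(line) && leadingSpaces <= 7;
}
```

A line indented into Area B, such as a trailing `WS-CUST-ADDR.` parameter, has the same shape as a paragraph header but fails the column check, so it stays in the CALL accumulation.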

Score: All 60+ prior tracked findings verified resolved or WONTFIX across 15 review cycles.

New Findings (This Review)

🔴 New #1 — `CANCEL` Inside Multi-Line `CALL ... ON EXCEPTION ... END-CALL` Is Silently Lost

cobol-preprocessor.ts:1353–1384 (flushCallAccum)

When callAccum accumulates a full CALL ... END-CALL block, flushCallAccum() processes the accumulated text using only text.matchAll(RE_CALL) and text.matchAll(RE_CALL_DYNAMIC). There is no RE_CANCEL pass.

For the extremely common IBM mainframe COBOL error-handling pattern:

       CALL 'SUBPROG'
           USING WS-DATA
           ON EXCEPTION
               CANCEL 'SUBPROG'
               CALL 'SUBPROG'     ← retry
       END-CALL.

callAccum accumulates every line up to and including the END-CALL. line. In flushCallAccum:

RE_CALL finds both SUBPROG calls (first and retry) ✅
CANCEL 'SUBPROG' → no RE_CANCEL extraction — the CANCEL is silently lost

The CANCEL inside the ON EXCEPTION block is completely invisible to the graph. Any impact analysis for "which programs are cancelled when SUBPROG fails?" misses this edge entirely.

Note: the callAccumFlushedThisLine / Area A approach correctly prevents false paragraphs from CANCEL targets because 'SUBPROG'. doesn't match RE_PROC_PARAGRAPH. But CANCEL itself goes unextracted.

Fix: add a RE_CANCEL extraction pass inside flushCallAccum, mirroring the RE_CALL loop:

for (const cancelMatch of text.matchAll(RE_CANCEL)) {
  result.cancels.push({ target: cancelMatch[1] ?? cancelMatch[2], line: callAccumLine, isQuoted: true });
}
for (const dynCancelMatch of text.matchAll(RE_CANCEL_DYNAMIC)) {
  result.cancels.push({ target: dynCancelMatch[1], line: callAccumLine, isQuoted: false });
}
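A self-contained sketch of that extra pass is shown here. `RE_CALL_SKETCH` and `RE_CANCEL_SKETCH` are hypothetical stand-ins for `RE_CALL` and `RE_CANCEL` (the real patterns are not shown in this thread), and `extractQuoted` is an illustrative helper, not a function from the PR.

```typescript
interface Ref { target: string; line: number; isQuoted: boolean }

// Stand-ins for RE_CALL / RE_CANCEL (assumption): quoted literal targets only.
const RE_CALL_SKETCH = /\bCALL\s+(?:'([^']+)'|"([^"]+)")/gi;
const RE_CANCEL_SKETCH = /\bCANCEL\s+(?:'([^']+)'|"([^"]+)")/gi;

// Runs one matchAll pass over the accumulated statement text.
function extractQuoted(text: string, re: RegExp, line: number): Ref[] {
  const refs: Ref[] = [];
  for (const m of text.matchAll(re)) {
    refs.push({ target: m[1] ?? m[2], line, isQuoted: true });
  }
  return refs;
}

// The ON EXCEPTION pattern from the finding, joined into one logical line.
const accumulatedBlock =
  "CALL 'SUBPROG' USING WS-DATA ON EXCEPTION CANCEL 'SUBPROG' CALL 'SUBPROG' END-CALL.";
const blockCalls = extractQuoted(accumulatedBlock, RE_CALL_SKETCH, 10);
const blockCancels = extractQuoted(accumulatedBlock, RE_CANCEL_SKETCH, 10);
```

Running both passes over the same accumulated text yields the two CALLs and the one CANCEL; with only the CALL pass, the CANCEL never reaches the result.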


🔴 New #2 — `CANCEL` Keyword Falsely Captured as USING Parameter in Pre-COBOL-85 Style

cobol-preprocessor.ts:1361 (USING lookahead in flushCallAccum)

The current USING lookahead in flushCallAccum:

/\bUSING\s+([\s\S]*?)(?=\bRETURNING\b|\bON\s+(?:EXCEPTION|OVERFLOW)\b|\bNOT\s+ON\b|\bEND-CALL\b|\bINSPECT(?=\s|$)|...|\.\s*$|$)/i

\bCANCEL\b is not in the lookahead. In legacy pre-COBOL-85 code where CALL and CANCEL appear on consecutive lines terminated by the CANCEL's period:

       CALL 'PROG-A'
           USING WS-DATA
       CANCEL 'OLD-PROG'.     ← CANCEL's period terminates callAccum

callAccum accumulates all three lines until the CANCEL's period. In flushCallAccum:

usingMatch[1] captures WS-DATA CANCEL (lazy match stops at .\s*$)
After split(/\s+/).filter(...): ['WS-DATA', 'CANCEL'] — CANCEL passes the identifier regex /^[A-Z][A-Z0-9-]+$/i and is NOT in CALL_USING_FILTER
CANCEL is falsely stored as a USING parameter for PROG-A

Additionally, 'OLD-PROG' is missed (same as New #1).

Fix: add \bCANCEL(?=\s|$) and \bEVALUATE(?=\s|$) to the USING lookahead as additional boundary alternatives. Also add CANCEL to the callAccum flush trigger list at line 1158:

/^(?:GO\s+TO|PERFORM|MOVE|...|CANCEL)(?:\s|$)/i
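The effect of the missing boundary can be reproduced with a reduced version of the USING lookahead. This is a sketch under assumptions: `usingParams` is a hypothetical helper, and the lookahead here contains only the verbs passed in plus the period and end-of-text alternatives, not the full list elided above.

```typescript
// Builds a reduced USING lookahead from the given boundary verbs and
// returns the identifiers captured between USING and the first boundary.
function usingParams(text: string, boundaryVerbs: string[]): string[] {
  const alternatives = [
    ...boundaryVerbs.map((v) => `\\b${v}(?=\\s|$)`),
    '\\.\\s*$', // statement-terminating period
    '$',        // end of accumulated text
  ];
  const re = new RegExp(`\\bUSING\\s+([\\s\\S]*?)(?=${alternatives.join('|')})`, 'i');
  const captured = re.exec(text)?.[1] ?? '';
  // Same identifier filter as quoted in the finding.
  return captured.split(/\s+/).filter((t) => /^[A-Z][A-Z0-9-]+$/i.test(t));
}

// The three legacy lines from the finding, joined into one logical line.
const legacyBlock = "CALL 'PROG-A' USING WS-DATA CANCEL 'OLD-PROG'.";
const withoutBoundary = usingParams(legacyBlock, []);
const withBoundary = usingParams(legacyBlock, ['CANCEL']);
```

Without the boundary the lazy match runs to the final period and `CANCEL` passes the identifier filter; with it, the capture stops before the verb and only `WS-DATA` remains.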


🟡 New #3 — Multi-Line CALL Start Without Period Simultaneously Feeds `sortAccum`/`inspectAccum`

cobol-preprocessor.ts:1176–1181 and cobol-preprocessor.ts:1624–1637

When a multi-line CALL starts (no period, not single-line), callAccum = line is set at line 1177 but there is no return. The switch at line 1184 still calls extractProcedure(line, lineNum).

If sortAccum !== null at that point, extractProcedure appends the CALL start line to sortAccum (line 1626):

// Line 1626 in extractProcedure — no CALL boundary check here
sortAccum += ' ' + line;

Subsequent CALL continuation lines return early at line 1167 (return; // continuation line consumed), so they don't reach extractProcedure. But the CALL start line IS appended to sortAccum.

When sortAccum eventually flushes, the CALL keyword and program name appear in the accumulated SORT text. flushSort() extracts SORT_CLAUSE_NOISE-filtered names, but CALL and the program name (e.g., CUSTPRC) are not in SORT_CLAUSE_NOISE — they pass the identifier filter and become false SORT GIVING file entries, producing phantom sort-giving ACCESSES edges.

This scenario only occurs when a SORT statement lacks its terminating period before a CALL starts (technically invalid COBOL), but it has been observed in partial source extractions from mainframe repositories.

Fix: add an early return after starting multi-line CALL accumulation:

} else {
  // Multi-line CALL — start accumulating
  callAccum = line;
  callAccumLine = lineNum;
  return; // Don't pass the CALL start line to extractProcedure
}

Alternatively, add CALL to sortAccum/inspectAccum flush triggers inside extractProcedure.
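The fall-through can be modelled with a toy state machine. Everything here is a simplified assumption: `sortTextAfter` is a hypothetical function, the SORT handling is reduced to appending lines, and flushing discards the CALL rather than extracting it.

```typescript
// Returns the text held in the SORT accumulator after scanning the lines.
function sortTextAfter(lines: string[], returnAfterCallStart: boolean): string {
  let sortAccum: string | null = null;
  let callAccum: string | null = null;

  for (const raw of lines) {
    const line = raw.trim();
    if (callAccum !== null) {
      // CALL continuation lines are consumed here and never reach the SORT logic.
      callAccum += ' ' + line;
      if (line.endsWith('.')) callAccum = null; // flush (extraction omitted)
      continue;
    }
    if (/^CALL\b/i.test(line) && !line.endsWith('.')) {
      callAccum = line; // multi-line CALL starts
      if (returnAfterCallStart) continue; // the proposed early return
    }
    // Simplified extractProcedure: start or extend the SORT accumulator.
    if (sortAccum !== null) sortAccum += ' ' + line;
    else if (/^SORT\b/i.test(line)) sortAccum = line;
  }
  return sortAccum ?? '';
}

const unterminatedSort = [
  'SORT SORT-FILE ON ASCENDING KEY SK-ID', // no period: SORT stays open
  "CALL 'CUSTPRC'",
  '    USING WS-DATA.',
];
```

With the early return disabled, the CALL start line ends up inside the SORT text; with it enabled, only the SORT line itself is accumulated.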


🟡 New #4 — In Free-Format Mode, Real Paragraph Headers Absorbed Into `callAccum` When CALL Lacks Terminator

cobol-preprocessor.ts:1155–1167

In free-format mode, isAreaAParagraph = false (the ternary at line 1157 always evaluates the right branch to false). This correctly prevents WS-CUST-ADDR. from being treated as a paragraph header. However, it also means a real paragraph header like NEXT-PARA. arriving while callAccum is active is absorbed into the accumulation rather than treated as a flush signal.

For a GnuCOBOL (ACAS-style) program:

    CALL 'PROG'
        USING WS-DATA
NEXT-PARA.                  ← no END-CALL, no period before paragraph
    PERFORM PROCESS.

NEXT-PARA. → isAreaAParagraph = false → accumulated into callAccum → period triggers flushCallAccum(). Result:

NEXT-PARA is captured as a USING parameter for PROG (passes [A-Z][A-Z0-9-]+ regex, not in CALL_USING_FILTER)
NEXT-PARA. never reaches extractProcedure → paragraph not added to result.paragraphs → no Function node → PERFORM PROCESS below it has caller attributed to the wrong paragraph

This is the fundamental free-format tension: without column-position information, WS-CUST-ADDR. (legitimate parameter) and NEXT-PARA. (paragraph header) are syntactically identical. The 15th review fix chose to avoid false parameters (correct for the common ACAS benchmark case), but this creates the paragraph-missed problem for programs that don't use END-CALL.

In practice, well-formatted GnuCOBOL programs consistently use END-CALL, so the real paragraph is preceded by END-CALL which triggers proper callAccum flush. The issue only occurs for pre-COBOL-85 free-format style. Worth noting for the ACAS benchmark verification.

🟢 New #5 — Paragraph `endLine` Calculation Crosses Nested Program Boundaries

cobol-processor.ts:423–427:

const nextLine = i + 1 < extracted.paragraphs.length
  ? extracted.paragraphs[i + 1].line - 1
  : lines.length;

All paragraphs across all nested programs are in a single sorted extracted.paragraphs array. When the outer program has paragraph at line 20 and the inner program has paragraph at line 28, the outer paragraph's endLine is computed as 28 - 1 = 27 — a line inside the inner program. The paragraph extends "into" the inner program's territory in the graph.

For a single-program file or files with unique paragraph names (as in the test fixtures), this is invisible. For nested programs where outer paragraphs precede inner program boundaries, the endLine is incorrect.

Fix requires a program-boundary-aware nextLine computation: find the next paragraph at the same nesting depth, not just the next paragraph globally.
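One possible shape for a boundary-aware computation is sketched below. It is not the rule stated above (next paragraph at the same nesting depth) but a simpler clamp, and it assumes paragraphs are sorted by line and that program ranges are available as `startLine`/`endLine`; the type and function names are hypothetical.

```typescript
interface Paragraph { name: string; line: number }
interface Program { name: string; startLine: number; endLine: number }

// Innermost program whose line range contains the given line.
function owningProgram(line: number, programs: Program[]): Program | undefined {
  return programs
    .filter((p) => p.startLine <= line && line <= p.endLine)
    .sort((a, b) => (a.endLine - a.startLine) - (b.endLine - b.startLine))[0];
}

// Clamps a paragraph's end to the nearest of: the next paragraph,
// the next program start, or the end of the owning program.
function paragraphEndLine(
  index: number,
  paragraphs: Paragraph[],
  programs: Program[],
  totalLines: number,
): number {
  const para = paragraphs[index];
  const candidates = [owningProgram(para.line, programs)?.endLine ?? totalLines];
  const next = paragraphs[index + 1];
  if (next) candidates.push(next.line - 1);
  for (const prog of programs) {
    if (prog.startLine > para.line) candidates.push(prog.startLine - 1);
  }
  return Math.min(...candidates);
}
```

For the example in the finding (outer paragraph at line 20, inner paragraph at line 28), an inner program starting at line 25 would clamp the outer paragraph to line 24 instead of 27.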

🟢 New #6 — `callAccum` Not Flushed on Division/Section Transitions (Malformed COBOL Edge Case)

cobol-preprocessor.ts:1083–1133 vs 1150

The division transition check at line 1083 (RE_DIVISION) and section transition at line 1118 (RE_SECTION) both return before reaching the callAccum block at line 1150. For a file where a CALL statement is the last outer-program statement before a nested IDENTIFICATION DIVISION. header — without a terminating period — callAccum is not flushed and the CALL leaks into the inner program's state.

This only occurs for invalid COBOL (missing final period), and the effect is that the outer CALL is attributed to the outer program (since callAccumLine predates the inner program) but USING parameters from subsequent inner-program lines are falsely included. Low impact for well-formed COBOL. Mentioning for completeness.

Test Coverage Assessment

The 49 recent regression tests (commits 3296201 and 5aa0e18) cover the Area A/free-format fixes well. Test scenarios still missing for the new findings:

Gap	Finding
`CALL 'PROG' USING WS-DATA ON EXCEPTION CANCEL 'PROG' END-CALL.` — CANCEL inside ON EXCEPTION block	New #1
Pre-COBOL-85 style: `CALL 'A' USING WS-DATA` then `CANCEL 'B'.` on next line	New #2
Multi-line SORT followed immediately by multi-line CALL (no period on SORT)	New #3
Free-format: paragraph header while callAccum active without END-CALL	New #4

Architectural Fit Assessment

The implementation remains architecturally sound and well-factored after 16 review cycles.

Dimension	Assessment
Boundary integrity	Zero `LanguageProvider` interface changes — perfect
callAccum design	Multi-path accumulator (Area A / statement verb detection) is correct for fixed-format. Free-format limitation (New #4) is a fundamental tension without a clean solution
`flushCallAccum` completeness	Extracts quoted + dynamic CALLs correctly. Missing: CANCEL extraction (New #1), CANCEL boundary in USING lookahead (New #2)
`callAccum` start fall-through	Multi-line CALL start doesn't prevent `extractProcedure` from running — can corrupt sortAccum/inspectAccum (New #3)
`owningModuleId()` coverage	All CALL/CANCEL/CICS/SQL/ENTRY/SORT constructs use correct nested-program scoping ✅
Two-pass CALL/CANCEL resolution	Both quoted and dynamic forms; all CICS variants ✅
`SORT_CLAUSE_NOISE`	Comprehensive for SORT/MERGE keyword noise; doesn't help against callAccum contamination (New #3)
`EXCLUDED_PARA_NAMES`	Now includes common verbs; still missing PERFORM, INSPECT, EVALUATE as paragraph false-positive candidates
`CopyResolution.library`	Still captured but `resolveCopy` ignores it — dead code / misleading interface ⚠️
Paragraph `endLine` calculation	Crosses nested program boundaries (New #5) — architectural limitation

Summary

All 4 findings from the 15th review verified as fixed. 60+ findings across 16 review cycles; excellent progression.

New findings (6 total):

🔴 New #1 — CANCEL inside CALL ... ON EXCEPTION ... END-CALL silently lost. flushCallAccum uses only RE_CALL/RE_CALL_DYNAMIC; no RE_CANCEL pass. Common IBM mainframe error-handling pattern. One-loop fix.
🔴 New #2 — CANCEL keyword falsely captured as USING parameter in pre-COBOL-85 style (when CANCEL's period terminates callAccum). CANCEL not in USING lookahead. One-regex fix in flushCallAccum + add CANCEL to flush trigger list.
🟡 New #3 — Multi-line CALL start falls through to extractProcedure, contaminating active sortAccum/inspectAccum with CALL text (false SORT file names). Add return after multi-line CALL start, or add CALL to sort/inspect flush triggers.
🟡 New #4 — Free-format paragraph headers absorbed into callAccum when CALL lacks END-CALL (paragraph missed, name falsely in USING). Fundamental free-format tension; only for pre-COBOL-85 style without scope terminators.
🟢 New #5 — Paragraph endLine crosses nested program boundaries (off-by-N for paragraphs preceding nested program sections).
🟢 New #6 — callAccum not flushed on division/section transitions (only for malformed COBOL with missing final period).

Priority order:

New #1 — One-loop addition to flushCallAccum; closes a gap in a very common IBM mainframe CALL pattern (ON EXCEPTION CANCEL). High ROI.
New #2 — Add CANCEL to USING lookahead + flush trigger list; prevents parameter list corruption for legacy COBOL.
New #3 — Add return after multi-line CALL start; prevents sortAccum/inspectAccum contamination.

Production readiness: The implementation is production-quality for all well-formed COBOL. New #1 is the only gap that affects common, valid (with scope terminators) IBM mainframe code.

#1: flushCallAccum now extracts CANCEL statements from within CALL ON EXCEPTION blocks. Adds RE_CANCEL + RE_CANCEL_DYNAMIC matchAll passes alongside existing CALL extraction.
#2: Added \bCANCEL(?=\s|$) to USING lookahead regex to prevent CANCEL keyword being captured as false USING parameter.
#3: Multi-line CALL start now returns immediately to prevent the CALL start line from simultaneously feeding sortAccum/inspectAccum.
#6: Division transitions now flush all active accumulators (callAccum, sortAccum, inspectAccum) to prevent state leakage across programs.

Also added CANCEL to callAccum flush trigger verb list.

Tests: 215 passing | TypeScript clean

magyargergo · 2026-03-26T12:08:02Z

@claude ultra-think while identifying COBOL-language gaps. Perform this analysis on this pull request and verify those findings. Act as a senior compiler front-end engineer and expert in static analysis tools and review this change from that perspective and review the architectural fit. Also check out previous comments and reason with their requests.

github-actions · 2026-03-26T12:08:14Z

Claude finished @magyargergo's task in 13m 30s —— View job

18th Review — 17th Review Findings Resolved ✅

Commit 88d1b68 implements both substantive fixes from the 17th review.

17th Review — Verified Fixed

#	Severity	Finding	Status
New #1	🔴	`callAccum` not flushed when EXEC SQL/CICS/DLI block starts	✅ Fixed
New #2	🟡	`inspectAccum` flush trigger missing 14 verbs vs `callAccum`	✅ Fixed
New #3	🟢	CANCEL double-extracted for single-line `CALL…CANCEL…END-CALL`	ℹ️ Accepted (graph-level idempotent)
New #4	🟢	Section transitions don't flush `callAccum`	ℹ️ Accepted (well-formed COBOL constraint)

Fix Details

New #1 — flushCallAccum() before every EXEC entry (lines 1021, 1031, 1040)

Any multi-line CALL whose terminating period appears only after END-EXEC was silently abandoned — the EXEC early-return bypassed the accumulator check entirely. Adding flushCallAccum() before each of the three execAccum assignments mirrors the existing pattern at division and END PROGRAM boundaries.

* Before fix: this CALL was dropped
CALL 'PGMX'
     USING WS-PARM                 ← no period here
     EXEC SQL SELECT …              ← EXEC early-returns, callAccum orphaned
     END-EXEC.

New #2 — inspectAccum flush trigger brought to parity with callAccum (line 1663)

The inspectAccum trigger previously covered only 11 verbs; callAccum's trigger covers 25. The 14 missing verbs (ACCEPT, INSPECT, SEARCH, SORT, MERGE, IF, EVALUATE, CONTINUE, READ, WRITE, REWRITE, DELETE, OPEN, CLOSE, START) meant that an in-flight INSPECT block would absorb subsequent statements rather than closing, silently hiding graph edges from SEARCH, SORT, I/O verbs, and structured control flow.

Remaining Open Items

Finding	Source	Severity	Notes
Paragraph `endLine` ignores nested program boundaries	16th review #5	🟢 Cosmetic	Next paragraph's line - 1 may bleed into inner/outer program

No new architectural gaps found in this pass. The accumulator state machine is now consistent across all flush boundaries (division, END PROGRAM, EXEC entry, paragraph/section detection).

Extract COBOL_STATEMENT_VERBS, RE_STATEMENT_VERB_START, and RE_USING_PARAMS as shared constants; eliminates 4 duplicated 25-verb regex patterns.

17th review: #1 flushCallAccum before EXEC entry, #2 inspectAccum verb parity via shared constant.

Tests: 215 passing | TypeScript clean

Replaced 7 toBeGreaterThan/toBeLessThan/toBeGreaterThanOrEqual assertions with exact toBe values:
- dataItems.length: >= 3 → toBe(3)
- calls.length: >= 1 → toBe(1)
- calls[0].line: range check → toBe(10)
- programs[].startLine/endLine: comparison → exact values
- innerA.endLine/innerB.startLine: comparison → exact values

Also added 11 new edge-case tests (accumulator flush on EXEC/division transitions, free-format, CANCEL in CALL block, SORT THRU, verb flush, integration).

226 tests passing; zero fuzzy assertions remain.

magyargergo · 2026-03-26T13:07:25Z

@claude ultra-think while identifying COBOL-language gaps. Perform this analysis on this pull request and verify those findings. Act as a senior compiler front-end engineer and expert in static analysis tools and review this change from that perspective and review the architectural fit. Also check out previous comments and reason with their requests.

github-actions · 2026-03-26T13:07:46Z

Claude finished @magyargergo's task in 8m 56s —— View job

19th Review — COBOL Processor Deep Analysis (commit `7e52f98`)

Reviewer persona: senior compiler front-end engineer / static analysis tooling expert.
All prior findings re-verified against current code before identifying new gaps.

✅ Verification — 17th / 18th Review Items

#	Finding	Status
17.1	`callAccum` not flushed when EXEC SQL/CICS/DLI starts	✅ Fixed — `flushCallAccum()` called at lines 1050, 1060, 1069
17.2	`inspectAccum` flush trigger missing 14 verbs vs `callAccum`	✅ Fixed — both now share `RE_STATEMENT_VERB_START`
17.3	CANCEL double-extraction for single-line `CALL...CANCEL...END-CALL`	✅ Accepted — idempotent at graph-edge level
17.4	Section transitions don't flush `callAccum`	✅ Accepted — well-formed COBOL invariant documented
18.1	EXEC block `flushCallAccum()` ordering	✅ Fixed — flush occurs before `execAccum` is set

🔴 Finding 1 — `END PROGRAM` does not flush `callAccum` (HIGH)

Location: cobol-preprocessor.ts lines 1078–1093

const endProgramMatch = line.match(RE_END_PROGRAM);
if (endProgramMatch) {
  const topProgram = programBoundaryStack.pop();
  // ... push to result.programs ...
  return;   // ← callAccum NOT flushed
}

Impact: If a multi-line CALL statement is left open at the end of Program A's procedure division and END PROGRAM PROG-A. is the next structural line, callAccum silently carries live state across the program boundary. The accumulated text from Program A is then either:

Flushed during Program B's first matching flush trigger (wrong owning program, wrong callAccumLine), or
Silently dropped if Program B has no subsequent flush trigger before the file ends.

Concrete trigger:

       CALL 'PROG-A-SUB' USING WS-DATA       ← no period, no END-CALL
       END PROGRAM OUTER.                     ← returns without flush
       PROGRAM-ID. INNER.                     ← flush still pending

Fix: Call flushCallAccum(); flushSort(); flushInspect(); before the return in the endProgramMatch block — identical to the division-boundary flush at lines 1119–1122.
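The attribution problem can be reproduced with a toy scanner. This is a sketch under assumptions: `scan` and `CallRecord` are hypothetical, only quoted CALL targets are extracted, and the owning program is taken from a single `currentProgram` variable rather than the real `programBoundaryStack`.

```typescript
interface CallRecord { target: string; program: string }

// Scans lines and attributes each flushed CALL to the program active at flush time.
function scan(lines: string[], flushAtEndProgram: boolean): CallRecord[] {
  const calls: CallRecord[] = [];
  let currentProgram = '';
  let callAccum: string | null = null;

  const flushCallAccum = (): void => {
    if (callAccum === null) return;
    const m = /\bCALL\s+'([^']+)'/i.exec(callAccum);
    if (m) calls.push({ target: m[1], program: currentProgram });
    callAccum = null;
  };

  for (const raw of lines) {
    const line = raw.trim();
    if (/^END\s+PROGRAM\b/i.test(line)) {
      if (flushAtEndProgram) flushCallAccum(); // the proposed fix
      continue; // structural line: skips the accumulator logic below
    }
    const pgm = /^PROGRAM-ID\.\s*([A-Z0-9-]+)/i.exec(line);
    if (pgm) { currentProgram = pgm[1]; continue; }
    if (callAccum !== null) callAccum += ' ' + line;
    else if (/^CALL\b/i.test(line)) callAccum = line;
    if (callAccum !== null && line.endsWith('.')) flushCallAccum();
  }
  flushCallAccum(); // end-of-file flush
  return calls;
}

const twoPrograms = [
  'PROGRAM-ID. OUTER.',
  "CALL 'PROG-A-SUB' USING WS-DATA", // no period, no END-CALL
  'END PROGRAM OUTER.',
  'PROGRAM-ID. INNER.',
  "DISPLAY 'HI'.",
];
```

Without the flush, the open CALL survives the boundary and is attributed to INNER when the DISPLAY line's period finally closes it; with the flush, it is attributed to OUTER.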

🔴 Finding 2 — `PROGRAM-ID` detection path does not flush `callAccum` (HIGH)

Location: cobol-preprocessor.ts lines 1107–1113

if (currentDivision !== 'identification') {
  const pgmIdMatch = line.match(RE_PROGRAM_ID);
  if (pgmIdMatch) {
    extractIdentification(line, lineNum);
    return;   // ← callAccum NOT flushed
  }
}

This path is reached for sibling programs whose IDENTIFICATION DIVISION. header is omitted (common in COBOL source that uses abbreviated structure after END PROGRAM). The comment at line 1105 confirms this intent.

Impact: Compounds Finding 1. Even when END PROGRAM is present, if the next non-blank line is PROGRAM-ID. rather than IDENTIFICATION DIVISION., callAccum is still not flushed at this second-chance boundary. The accumulated CALL text from Program A gets attributed to Program B via callAccumLine which still points into Program A's line range.

Fix: Same as Finding 1 — add flush calls before return here.

🟡 Finding 3 — Arithmetic verbs absent from `COBOL_STATEMENT_VERBS` (MEDIUM)

Location: cobol-preprocessor.ts lines 377–398

const COBOL_STATEMENT_VERBS = [
  'GO\\s+TO', 'PERFORM', 'MOVE', 'DISPLAY', 'ACCEPT',
  'INSPECT', 'SEARCH', 'SORT', 'MERGE', 'IF', 'EVALUATE',
  'SET', 'INITIALIZE', 'STOP', 'EXIT', 'GOBACK', 'CONTINUE',
  'READ', 'WRITE', 'REWRITE', 'DELETE', 'OPEN', 'CLOSE', 'START',
  'CANCEL',
  // ← COMPUTE, ADD, SUBTRACT, MULTIPLY, DIVIDE, STRING, UNSTRING absent
];

Impact: A multi-line CALL with an arithmetic continuation (pre-COBOL-85 implicit termination via the next statement's period) is not flushed when the arithmetic line is encountered. Instead the arithmetic line is appended to callAccum. When the period on the arithmetic line triggers flushCallAccum(), RE_USING_PARAMS scans the combined text and extracts identifiers from the arithmetic line as false USING parameters:

       CALL 'PAYROLL' USING WS-RATE WS-HOURS
       COMPUTE WS-TOTAL = WS-RATE * WS-HOURS.

After accumulation, flushCallAccum sees:

CALL 'PAYROLL' USING WS-RATE WS-HOURS COMPUTE WS-TOTAL = WS-RATE * WS-HOURS.

RE_USING_PARAMS stops at \. (period) and the CALL_USING_FILTER does not contain COMPUTE or WS-TOTAL. The identifier filter /^[A-Z][A-Z0-9-]+$/i matches COMPUTE and WS-TOTAL, injecting them as phantom parameters.

Affected verbs to add: COMPUTE, ADD, SUBTRACT, MULTIPLY, DIVIDE, STRING, UNSTRING — all are COBOL statement-opening verbs that cannot legally appear within a CALL's USING clause.

Fix: Add these seven verbs to COBOL_STATEMENT_VERBS. They will automatically propagate into RE_STATEMENT_VERB_START (callAccum / inspectAccum flush trigger) and USING_VERB_LOOKAHEAD (RE_USING_PARAMS terminator), closing both the flush gap and the false-parameter extraction gap simultaneously.
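How a single verb array could feed both regexes is sketched below. The constant names come from the review, but the derivation shown (joining the array into an alternation) and the `extractUsing` helper are assumptions, not the PR's actual code.

```typescript
// Verb list as quoted in the finding, plus the seven proposed additions.
const COBOL_STATEMENT_VERBS = [
  'GO\\s+TO', 'PERFORM', 'MOVE', 'DISPLAY', 'ACCEPT',
  'INSPECT', 'SEARCH', 'SORT', 'MERGE', 'IF', 'EVALUATE',
  'SET', 'INITIALIZE', 'STOP', 'EXIT', 'GOBACK', 'CONTINUE',
  'READ', 'WRITE', 'REWRITE', 'DELETE', 'OPEN', 'CLOSE', 'START',
  'CANCEL',
  'COMPUTE', 'ADD', 'SUBTRACT', 'MULTIPLY', 'DIVIDE', 'STRING', 'UNSTRING',
];

const VERB_ALTERNATION = COBOL_STATEMENT_VERBS.join('|');

// Flush trigger: the line starts with a statement verb.
const RE_STATEMENT_VERB_START = new RegExp(`^(?:${VERB_ALTERNATION})(?:\\s|$)`, 'i');

// USING terminator: stop capturing parameters at the next statement verb.
const USING_VERB_LOOKAHEAD = `\\b(?:${VERB_ALTERNATION})(?=\\s|$)`;
const RE_USING_PARAMS = new RegExp(
  `\\bUSING\\s+([\\s\\S]*?)(?=${USING_VERB_LOOKAHEAD}|\\.\\s*$|$)`,
  'i',
);

// Hypothetical helper applying the identifier filter quoted in the finding.
function extractUsing(text: string): string[] {
  const captured = RE_USING_PARAMS.exec(text)?.[1] ?? '';
  return captured.split(/\s+/).filter((t) => /^[A-Z][A-Z0-9-]+$/i.test(t));
}
```

With COMPUTE in the list, the PAYROLL example stops at the verb, so only WS-RATE and WS-HOURS are kept as parameters.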

🟢 Finding 4 — Section `endLine` crosses nested program boundaries (LOW / cosmetic)

Location: cobol-processor.ts lines 392–394 (same as 16th review finding #5 — still present)

const nextLine = sections[i + 1] ? sections[i + 1].line - 1 : fileLines;

Section i and section i+1 may belong to different nested programs. endLine is over-extended across the program boundary, producing a wider line range than the true section scope. This does not affect correctness of edges (owning module is resolved by owningModuleId() / findOwningProgramName()), but makes Namespace node metadata misleading in the graph output.

Architectural Fit Assessment

Dimension	Assessment
Approach (regex vs tree-sitter)	Correct tradeoff for an indexing tool: regex is fast, streaming, and maintainable without a parser generator toolchain. The accumulator state machines (`callAccum`, `sortAccum`, `inspectAccum`, `selectAccum`, `execAccum`) are the right pattern for multi-line COBOL statements.
Two-pass CALL/CANCEL resolution	Architecturally sound. First pass emits unresolved edges; second pass resolves via `cics-*-unresolved` catch-all and program name lookup. Handles inter-file edges correctly.
Shared verb constants	`COBOL_STATEMENT_VERBS` + `RE_STATEMENT_VERB_START` + `USING_VERB_LOOKAHEAD` derived from one source of truth — excellent. Findings 1–3 show the pattern is applied consistently; only the arithmetic verb gap and program-boundary flush remain as exceptions.
Area A guard	Fixed-format paragraph detection at ≤7 leading spaces is correct and avoids the classic CALL USING false-paragraph bug. Free-format correctly disables this guard.
Nested program support	`programBoundaryStack` + `findOwningProgramName()` + `owningModuleId()` is clean. Findings 1–2 are the only gaps in the boundary tracking.
CICS global scoping	`<cics-file>:FILENAME` key design is pragmatic for cross-program VSAM analysis.

Priority Summary

Priority	Finding	Effort
🔴 HIGH	END PROGRAM does not flush `callAccum` (Finding 1)	1 line
🔴 HIGH	PROGRAM-ID sibling path does not flush `callAccum` (Finding 2)	1 line
🟡 MEDIUM	Arithmetic verbs missing from `COBOL_STATEMENT_VERBS` (Finding 3)	7 tokens
🟢 LOW	Section `endLine` crosses nested program boundary (Finding 4)	cosmetic

Findings 1 and 2 are one-liners each (add flushCallAccum(); flushSort(); flushInspect(); before return). Finding 3 requires adding seven verb strings to one array constant — the propagation into all three derived regexes is automatic.

Fixes:
#1: END PROGRAM flushes callAccum/sortAccum/inspectAccum
#2: PROGRAM-ID sibling path flushes all accumulators
#3: Added COMPUTE/ADD/SUBTRACT/MULTIPLY/DIVIDE/STRING/UNSTRING to COBOL_STATEMENT_VERBS (now 32 verbs)

Tests (15 new):
- END PROGRAM flush: single + nested programs (2)
- PROGRAM-ID sibling flush (1)
- Arithmetic verb flush: COMPUTE/ADD/SUBTRACT/MULTIPLY/DIVIDE (5)
- String verb flush: STRING/UNSTRING (2)
- Arithmetic not captured as false USING params (1)
- SORT flushed at END PROGRAM (1)
- INSPECT flushed at END PROGRAM (1)
- All with exact toBe assertions (2)

Total: 239 tests passing | Zero fuzzy assertions

magyargergo · 2026-03-26T13:38:28Z

@claude ultra-think while identifying COBOL-language gaps. Perform this analysis on this pull request and verify those findings. Act as a senior compiler front-end engineer and expert in static analysis tools and review this change from that perspective and review the architectural fit. Also check out previous comments and reason with their requests.

github-actions · 2026-03-26T13:38:41Z

Claude finished @magyargergo's task in 8m 44s —— View job

20th Review — Compiler Front-End & Static Analysis Audit

Acting as a senior compiler front-end engineer and static analysis expert.

✅ 19th Review Findings — Verification

#	Finding	Status
1	`END PROGRAM` block did not flush `callAccum` / `sortAccum` / `inspectAccum`	✅ FIXED — `flushCallAccum(); flushSort(); flushInspect();` at lines 1083–1085
2	`PROGRAM-ID` detection path did not flush accumulators before starting new program scope	✅ FIXED — same three flush calls at lines 1115–1117
3	Arithmetic verbs (`COMPUTE`, `ADD`, `SUBTRACT`, `MULTIPLY`, `DIVIDE`, `STRING`, `UNSTRING`) absent from `COBOL_STATEMENT_VERBS`, causing accumulator to never flush on those statement boundaries	✅ FIXED — all seven verbs added to `COBOL_STATEMENT_VERBS` at lines 382–383
4	`section.endLine` calculation crosses nested program boundaries (cosmetic — graph endLine wrong on boundary sections)	⚠️ Still present — cosmetic, no action required unless precise endLine fidelity is needed

All three actionable fixes from the 19th review are confirmed in-place. The cosmetic boundary issue remains as acknowledged.

🔬 New Findings — 20th Review

Finding 1 — `RE_INITIALIZE` captures only the first target identifier 🟡 MEDIUM

Location: cobol-preprocessor.ts line 313
Pattern: RE_INITIALIZE = /\bINITIALIZE\s+([A-Z][A-Z0-9-]+)/i

The regex captures exactly one identifier after INITIALIZE. In standard COBOL, INITIALIZE accepts a space-separated list of targets:

INITIALIZE WS-CUSTOMER WS-ORDER WS-LINE-ITEM.

With the current pattern, only WS-CUSTOMER produces an ACCESSES write edge. WS-ORDER and WS-LINE-ITEM are silently dropped — meaning the data-flow graph is incomplete for any multi-target INITIALIZE.

Fix: Replace the single-capture regex with a matchAll loop over token positions, or capture the full tail and split on whitespace (filtering out WITH, REPLACING, ALL, ALPHABETIC, NUMERIC, etc. which are INITIALIZE clause keywords):

const RE_INITIALIZE_TAIL = /\bINITIALIZE\s+((?:[A-Z][A-Z0-9-]+\s*)+)/i;
// then: tail.match(/[A-Z][A-Z0-9-]+/gi)?.filter(t => !INITIALIZE_CLAUSE_KEYWORDS.has(t.toUpperCase()))

This mirrors the global-match (matchAll) approach already used for CALL/CANCEL extraction, so it stays consistent with the rest of the pipeline.
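One possible shape for the multi-target extraction is sketched below. The function name `extractInitializeTargets` and the exact keyword list are assumptions, not the code in the PR. The sketch tokenises the tail and stops at the first clause keyword, a swapped-in variant of the filter proposed above that avoids picking up operands of a REPLACING clause.

```typescript
// Hypothetical keyword set; the real INITIALIZE_CLAUSE_KEYWORDS may differ.
const INITIALIZE_CLAUSE_KEYWORDS = new Set<string>([
  'WITH', 'FILLER', 'ALL', 'TO', 'VALUE', 'THEN', 'DEFAULT', 'REPLACING',
  'ALPHABETIC', 'ALPHANUMERIC', 'NUMERIC', 'NATIONAL',
  'ALPHANUMERIC-EDITED', 'NUMERIC-EDITED', 'NATIONAL-EDITED',
]);

// Returns every target identifier of an INITIALIZE statement on one logical line.
function extractInitializeTargets(line: string): string[] {
  // Lookbehind keeps compound names such as RE-INITIALIZE from matching.
  const m = /(?<![A-Z0-9-])INITIALIZE\s+([^.]*)/i.exec(line);
  if (!m) return [];

  // Upper-case for keyword comparison and drop subscripts like WS-TABLE(I).
  const tail = m[1].toUpperCase().replace(/\([^)]*\)/g, ' ');

  const targets: string[] = [];
  for (const token of tail.match(/[A-Z][A-Z0-9-]*/g) ?? []) {
    if (INITIALIZE_CLAUSE_KEYWORDS.has(token)) break; // clause starts, targets end
    targets.push(token);
  }
  return targets;
}
```

Qualified names (`WS-NAME OF WS-RECORD`) are not handled in this sketch; a real implementation would likely reuse whatever OF/IN filtering the MOVE extraction already applies.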

Finding 2 — COPY detection runs before `callAccum` continuation guard 🟢 LOW

Location: cobol-preprocessor.ts lines 1179–1193 (processLogicalLine)

The COPY detection block executes unconditionally before the callAccum block. In formally valid COBOL a COPY statement cannot appear mid-CALL; however a source file with a malformed or concatenated COPY inside a CALL argument list (e.g. a source generator artefact) would:

Push a spurious entry to result.copies
Skip the callAccum continuation, silently abandoning the in-progress CALL

This is an edge case for invalid source only, but a static analyser operating on real-world legacy codebases (which frequently contain pre-processor noise) will encounter it. No action required for MVP, but worth a defensive if (callAccum) { flushCallAccum(); } before the COPY push if robustness against malformed input is ever needed.
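The defensive guard can be pictured with a reduced model of the line handler. Everything here (`createLineProcessor`, the `ExtractionResult` shape, the simplified regexes) is hypothetical and only borrows the names `callAccum`, `flushCallAccum` and `processLogicalLine` from the finding.

```typescript
interface ExtractionResult {
  copies: string[];
  calls: string[];
}

function createLineProcessor(result: ExtractionResult) {
  let callAccum: string | null = null;

  // Emits the CALL target collected so far and resets the accumulator.
  const flushCallAccum = (): void => {
    if (callAccum === null) return;
    const m = /(?<![A-Z0-9-])CALL\s+["']([^"']+)["']/i.exec(callAccum);
    if (m) result.calls.push(m[1].toUpperCase());
    callAccum = null;
  };

  const endsStatement = (text: string): boolean =>
    /\.\s*$/.test(text) || /(?<![A-Z0-9-])END-CALL(?![A-Z0-9-])/i.test(text);

  const processLogicalLine = (line: string): void => {
    const text = line.trim();

    const copy = /^COPY\s+([A-Z0-9-]+)/i.exec(text);
    if (copy) {
      flushCallAccum(); // defensive guard: keep the in-progress CALL
      result.copies.push(copy[1].toUpperCase());
      return;
    }

    if (callAccum !== null) {
      callAccum = `${callAccum} ${text}`; // continuation of a multi-line CALL
      if (endsStatement(text)) flushCallAccum();
      return;
    }

    if (/(?<![A-Z0-9-])CALL(?![A-Z0-9-])/i.test(text)) {
      callAccum = text;
      if (endsStatement(text)) flushCallAccum();
    }
  };

  return { processLogicalLine, flush: flushCallAccum };
}
```

With the guard in place, a COPY line arriving in the middle of a CALL still records the CALL target before the COPY entry is pushed; without it, the accumulated CALL text would remain stranded in `callAccum`.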

Finding 3 — Single-line CALL + active `sortAccum` contamination 🟢 LOW

Location: cobol-preprocessor.ts lines 1213–1225 (single-line CALL path in processLogicalLine)

The single-line CALL path (terminated by a period or END-CALL on the same line) does not return early from processLogicalLine by design — it falls through to extractProcedure. If sortAccum happens to be active at the same time (which requires invalid COBOL — overlapping SORT and CALL blocks), the CALL line text gets appended to sortAccum, potentially producing phantom file-name tokens in SORT edges.

This only manifests on structurally invalid COBOL and is therefore informational only.

Finding 4 — `CopyResolution.library` is captured but never consumed 🟢 LOW (dead code)

Location: cobol-copy-expander.ts — CopyResolution interface and resolveCopy function

The library field of CopyResolution is populated by the resolution logic but is never read by the caller. This means the library qualifiers in COPY member IN library and COPY member OF library are parsed and then discarded — any disambiguation between identically-named copybooks in different libraries is silently lost.

For a future graph that models library-qualified COPY dependencies (e.g. COPY CUSTIO IN BATCHLIB) this is a pre-existing architectural gap. No immediate action required, but library should either be wired into the node/edge attribution or the field should be removed to avoid giving false assurance that library qualification is handled.

🏗️ Architectural Fit Assessment

Strengths (compiler front-end perspective):

The two-phase architecture (preprocessor extracts, processor maps to graph) correctly separates syntactic concerns from semantic graph construction — analogous to a front-end IR lowering pass.
The COBOL_STATEMENT_VERBS single source of truth (shared by callAccum flush trigger, inspectAccum flush trigger, and RE_USING_PARAMS lookahead) is the right pattern for a regex-based tokeniser — changes propagate consistently.
programBoundaryStack + owningModuleId() correctly models COBOL nested-program scoping as a stack — the right data structure for a block-scoped symbol table.
Two-pass CALL/CANCEL resolution (<unresolved>:PROGNAME synthetic IDs) correctly defers link resolution until all nodes exist, matching the linker-resolution step in compiled languages.
Area A / Area B column guards for paragraph detection (leadingSpaces <= 7) correctly encode COBOL fixed-format layout rules — without this, every DISPLAY statement could be misidentified as a paragraph name.
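The stack-based scoping noted in the strengths above can be sketched roughly as follows. This is an illustration under assumptions: `scanProgramBoundaries` and `findOwningProgram` are hypothetical names, the regexes are simplified, and the real programBoundaryStack and owningModuleId() may work differently.

```typescript
interface ProgramRange {
  name: string;
  startLine: number; // 1-based line of PROGRAM-ID
  endLine: number;   // 1-based line of END PROGRAM, or last line if unclosed
  nestingDepth: number;
}

const RE_PROGRAM_ID = /^\s*PROGRAM-ID\.\s*([A-Z][A-Z0-9-]*)/i;
const RE_END_PROGRAM = /^\s*END\s+PROGRAM\s+([A-Z][A-Z0-9-]*)\s*\./i;

// Walks the source once, pushing on PROGRAM-ID and popping on END PROGRAM.
function scanProgramBoundaries(lines: string[]): ProgramRange[] {
  const stack: ProgramRange[] = [];
  const programs: ProgramRange[] = [];

  lines.forEach((text, i) => {
    const lineNum = i + 1;
    const start = RE_PROGRAM_ID.exec(text);
    if (start) {
      const program: ProgramRange = {
        name: start[1].toUpperCase(),
        startLine: lineNum,
        endLine: lines.length, // provisional: closed at EOF unless END PROGRAM is seen
        nestingDepth: stack.length,
      };
      stack.push(program);
      programs.push(program);
      return;
    }
    if (RE_END_PROGRAM.test(text)) {
      const closed = stack.pop();
      if (closed) closed.endLine = lineNum;
    }
  });
  return programs; // already ordered by startLine, outer before inner
}

// Returns the innermost program whose line range contains lineNum.
function findOwningProgram(programs: ProgramRange[], lineNum: number): string | undefined {
  let best: ProgramRange | undefined;
  for (const p of programs) {
    const inside = lineNum >= p.startLine && lineNum <= p.endLine;
    if (inside && (!best || p.nestingDepth > best.nestingDepth)) best = p;
  }
  return best?.name;
}
```

Choosing the deepest enclosing range is what lets an edge from a paragraph in an inner program attach to the inner Module rather than to the outer one.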

Residual concerns:

The regex pipeline is a hand-rolled lexer with no formal grammar. This is appropriate for the stated scope (graph extraction from legacy COBOL, not full compilation), but the lack of a token stream means certain statement boundaries are context-dependent on accumulator state rather than syntax. The COBOL_STATEMENT_VERBS set is the de-facto token classifier; keeping it exhaustive is important for correctness.
Free-format (>>SOURCE FREE) detection is a single-pass flag — no column-range validation is applied in free-format mode. This is acceptable given that free-format COBOL is rare in the legacy corpus this tool targets.
The RE_INITIALIZE single-target gap (Finding 1) is the only remaining medium-severity data-flow accuracy issue. All other new findings are low-severity edge cases.
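For context on the free-format concern, a single-pass format flag might look like the sketch below. The helper names, the 10-line detection window as a default parameter, and the accepted directive spellings are assumptions for illustration; continuation lines (indicator `-`) are deliberately not handled.

```typescript
// Accepts ">>SOURCE FREE" and the longer ">>SOURCE FORMAT [IS] FREE" spelling.
const RE_SOURCE_FREE = /^\s*>>\s*SOURCE\s+(?:FORMAT\s+(?:IS\s+)?)?FREE(?![A-Z0-9-])/i;

// Decides the format once, from the first few lines of the file.
function isFreeFormat(source: string, maxLines = 10): boolean {
  return source
    .split(/\r?\n/)
    .slice(0, maxLines)
    .some((line) => RE_SOURCE_FREE.test(line));
}

// Returns the program text of one physical line, or null for comments,
// directives and lines with nothing in the program area.
function normalizeLine(line: string, freeFormat: boolean): string | null {
  if (freeFormat) {
    const text = line.trim();
    if (text === '' || text.startsWith('*>') || text.startsWith('>>')) return null;
    return text; // no column rules are validated in this mode
  }
  if (line.length <= 7) return null;
  const indicator = line[6]; // column 7
  if (indicator === '*' || indicator === '/') return null;
  // Blank out columns 1-7 and drop the identification area (columns 73-80).
  const text = ' '.repeat(7) + line.slice(7, 72);
  return text.trim() === '' ? null : text;
}
```

Because the flag is computed once, a file that switches format part-way through would be processed entirely in the initially detected mode, which matches the limitation described above.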

Priority Ordering

Priority	Item
1	Fix `RE_INITIALIZE` multi-target gap (Finding 1) — data-flow completeness
2	Wire or remove `CopyResolution.library` (Finding 4) — dead code hygiene
3	Document COPY-before-callAccum ordering (Finding 2) — robustness note
4	Document single-line CALL + sortAccum (Finding 3) — informational
5	Section endLine crossing nested program boundaries — cosmetic, carry forward

Overall: The three actionable findings from the 19th review are confirmed fixed. The preprocessor is in good shape for production use. The one remaining medium-severity gap is the INITIALIZE multi-target extraction. All other new findings are low-severity edge cases or dead-code hygiene items.

Task checklist

Post initial comment
Read all COBOL processor source files
Verify all 19th review findings against current code
Deep compiler front-end analysis — new gap identification
Architectural fit review
Post final comprehensive review

Finding 1: INITIALIZE now captures multiple targets with REPLACING clause keyword filtering. Regex changed to lazy match stopping at REPLACING/WITH/period boundary. Targets split on whitespace and filtered against INITIALIZE_CLAUSE_KEYWORDS set.

Tests (2 new):
- INITIALIZE multi-target: WS-CUSTOMER WS-ORDER WS-LINE-ITEM → 3
- INITIALIZE with REPLACING: only WS-RECORD captured, not keywords

Total: 241 tests passing | TypeScript clean

* main: (114 commits)
  feat(csharp): C# MethodExtractor config (abhigyanpatwari#582)
  docs: add gitnexus-shared build step before gitnexus-web (abhigyanpatwari#585)
  chore: add enterprise offering section to README, ignore local_docs/ (abhigyanpatwari#579)
  fix(eval): exclude litellm 1.82.7 and 1.82.8 due to compatibility issues (abhigyanpatwari#580)
  feat(java,kotlin): MethodExtractor abstraction with per-language configs (abhigyanpatwari#576)
  feat: added skip-agents-md cli flag (abhigyanpatwari#517)
  feat(wiki): Azure OpenAI support for wiki command (abhigyanpatwari#562)
  refactor: reduce explicit any types (abhigyanpatwari#566)
  feat(java): method references, worker overload disambiguation, interface dispatch (abhigyanpatwari#540)
  feat: configure eslint with unused import removal (abhigyanpatwari#564)
  feat: configure prettier with pre-commit hook (abhigyanpatwari#563)
  feat: unify web and cli ingestion pipeline (abhigyanpatwari#536)
  fix/opencode mcp gitnexus timeout (abhigyanpatwari#363)
  chore: bump version to 1.4.10, update CHANGELOG
  fix: resolve tree-sitter peer dependency conflicts (abhigyanpatwari#538)
  chore: bump version to 1.4.9, add CHANGELOG.md
  refactor: Phase 8 & 9 — Field Types and Return-Type Binding (abhigyanpatwari#494)
  feat: add COBOL language support with regex extraction pipeline (abhigyanpatwari#498)
  fix: close remaining Dart language support gaps (abhigyanpatwari#524)
  refactor: split global BUILT_IN_NAMES into per-language provider fields (abhigyanpatwari#523)
  ...

# Conflicts:
#	gitnexus/src/core/wiki/llm-client.ts

…gyanpatwari#498)

* feat: add COBOL language support with regex extraction pipeline

Standalone COBOL processor following the markdown-processor.ts pattern:
- No LanguageProvider modification — COBOL uses regex, not tree-sitter
- No SupportedLanguages enum change — standalone processor pattern

New files:
- cobol-processor.ts — orchestrator (processCobol, isCobolFile, isJclFile)
- cobol/cobol-preprocessor.ts — regex state machine extraction (~888 LOC)
- cobol/cobol-copy-expander.ts — COPY statement expansion with circular detection
- cobol/jcl-parser.ts — JCL job/step/DD extraction
- cobol/jcl-processor.ts — JCL graph node creation

Extraction produces:
- Module nodes (PROGRAM-ID)
- Function nodes (paragraphs)
- Namespace nodes (sections)
- Property nodes (data items)
- CALLS edges (PERFORM intra-file, CALL cross-program)
- IMPORTS edges (COPY statements)
- CONTAINS edges (section → paragraph hierarchy)

Pipeline integration: single processCobol() call in Phase 2.6

54 new tests (33 COBOL + 21 JCL), all 3889 tests pass.

* docs: document custom processor pattern in pipeline.ts

Add comment block at the custom processor integration point documenting the pattern for future non-tree-sitter language additions.
* feat(cobol): enrich graph with EXEC SQL/CICS, ENTRY points, MOVE data flow, PERFORM THRU

Maps the remaining 60% of CobolRegexResults to the graph:
- EXEC SQL blocks → CodeElement nodes + ACCESSES edges to DB tables
- EXEC CICS LINK/XCTL → CodeElement nodes + cross-program CALLS edges
- ENTRY points → Constructor nodes (registered for cross-program resolution)
- MOVE statements → ACCESSES edges (read/write data flow tracking)
- PERFORM THRU → expanded CALLS edges for range targets
- File declarations → Record nodes with assignment metadata
- Cross-program CALL 2nd pass: resolves unresolved targets after all programs processed

* test(cobol): add 26 integration tests with exact assertions + fix CICS resolution bug

Integration tests (test/integration/resolvers/cobol.test.ts):
- 26 tests covering full COBOL system extraction
- ALL assertions use exact toBe(N) — zero fuzzy assertions
- Fixtures: CUSTUPDT.cbl, AUDITLOG.cbl, CUSTDAT.cpy, RPTGEN.cbl, RUNJOBS.jcl

Bug fix (cobol-processor.ts):
- CICS LINK/XCTL cross-program resolution was broken — edges were created with "resolved" reason but pointing to <unresolved> targets
- Fix: use cics-link-unresolved / cics-xctl-unresolved suffix pattern matching the existing cobol-call-unresolved pattern
- Second-pass resolver now patches both CALL and CICS unresolved edges

All 3915 tests pass, 0 failures.

* test(cobol): exhaustive 57-test suite with strict exact assertions

Complete rewrite of COBOL integration tests using ground-truth approach: dump the full graph, then assert EVERY node and EVERY edge.
57 tests across 9 sections:
- Node completeness: Module(3), Function(13), Namespace(2), Property(21), Record(1), CodeElement(8), Constructor(1) — exact sorted arrays
- Edge completeness: 22 tests covering every type+reason combination with exact source→target pairs
- Cross-program resolution: 6 tests verifying CALL, CICS LINK/XCTL, JCL
- COPY expansion: copybook data items in RPTGEN
- Section hierarchy: exact paragraph membership per section
- Data item ownership: exact per-module breakdown
- MOVE data flow: exact read/write pairs
- JCL integration: job/step/dataset containment
- Grand totals: CALLS(22), CONTAINS(48), IMPORTS(1), ACCESSES(7)

Fixture enhancements:
- CUSTUPDT.cbl: added INIT-SECTION + PROCESSING-SECTION, PERFORM THRU
- AUDITLOG.cbl: added ENTRY "AUDITLOG-BATCH"
- RPTGEN.cbl: added EXEC CICS XCTL

Zero fuzzy assertions — every expect uses toBe(N) or toEqual([...sorted]).

* fix(cobol): add removeRelationship API + single-quote CALL/COPY/ENTRY, PERFORM keyword skip

Phase 0A: Add removeRelationship(id) to KnowledgeGraph interface and implementation (trivial Map.delete wrapper). Required for orphan edge cleanup in next commit.

Phase 1A (from PR abhigyanpatwari#500 review, modified):
- RE_CALL and RE_COPY_QUOTED now match both "double" and 'single' quotes
- parseSingleCopyStatement in copy-expander updated for single quotes
- PERFORM_KEYWORD_SKIP set prevents UNTIL/VARYING/WITH/TEST/FOREVER from being stored as false-positive perform targets
- Sequence number stripping uses /[^0-9 ]/ (preserves numeric seq numbers unlike PR abhigyanpatwari#500's /\S/ which stripped them)
- Normalized || to ?? for regex group extraction in copy-expander

5 new graph unit tests, all 57 COBOL integration tests pass.

* fix(cobol): RE_ENTRY single-quote + remove orphan unresolved CALLS edges

Phase 1B: RE_ENTRY regex now supports both "double" and 'single' quoted ENTRY targets. Uses named intermediates (entryName, usingClause) with ?? operator.
USING capture group shifted from [2] to [3].

Phase 1C: Second-pass resolution now collects resolved orphan edge IDs during iteration and removes them after the loop completes, using the new graph.removeRelationship() API. Graph no longer contains phantom <unresolved>: edges alongside their resolved replacements. CALLS count drops from 22 to 18 (4 orphan edges removed).

* fix(cobol): Property ID collisions + O(1) Map lookup for MOVE edges

Phase 1D+3C (atomic): Property node IDs now use composite key filePath:section:level:name instead of filePath:name. This prevents duplicate data item names in different sections (e.g., STATUS in both WORKING-STORAGE and LINKAGE) from silently colliding. New generatePropertyId() helper ensures both node creation and MOVE edge lookup use the identical key formula. buildDataItemMap() replaces the O(n) findDataItemNode linear scan with O(1) Map lookup, built once per file before MOVE processing.

* feat(cobol): MOVE multi-target extraction with OF/IN qualifier filtering

MOVE X TO A B C now produces write edges for all targets, not just the first. extractMoveTargets() helper handles OF/IN qualified names (WS-NAME OF WS-RECORD -> target is WS-NAME), subscript stripping (WS-TABLE(I) -> WS-TABLE), and MOVE_SKIP filtering on targets.

Data model: CobolRegexResults.moves.to:string -> targets:string[]

MOVE CORRESPONDING stays single-target per COBOL standard. Processor MOVE loop now iterates move.targets.

* feat(cobol): COPY IN/OF library, pseudotext REPLACING, dynamic CALL, PERFORM TIMES, CICS MAP unquoted

Phase 2B: COPY ... IN/OF library-name now captured as metadata in CopyResolution (IN and OF are synonyms per COBOL-85 standard).

Phase 2C: COPY REPLACING ==pseudotext== support. Tokenizer handles ==...== delimiters alongside "quoted" strings. Pseudotext forces EXACT type.
Two-pass applyReplacing: first pass handles space-containing/non-identifier pseudotext via global string replace; second pass handles identifier-level LEADING/TRAILING/EXACT. New test file cobol-copy-expander.test.ts with 10 tests.

Phase 2E: PERFORM WS-COUNT TIMES no longer produces a false-positive perform target (checks for TIMES keyword after captured identifier).

Phase 2F: Dynamic CALL via data item (CALL WS-PROG-NAME without quotes) now emits a CodeElement annotation node with description 'dynamic-call' instead of silently ignoring. Adds isQuoted:boolean to call results.

Phase 3A: CICS MAP(WS-MAP-NAME) unquoted identifiers now captured.

Phase 3B: Normalized || to ?? in copy-expander (done in Phase 1A).

* feat(cobol): nested program support — capture multiple PROGRAM-IDs per file

Phase 2D: The state machine now captures all PROGRAM-IDs, not just the first. The primary program name stays in programName; additional nested programs go into nestedPrograms[]. The processor creates separate Module nodes for each nested program, contained by the outer module, and registers them in moduleNodeIds for cross-program CALL resolution.

Paragraphs/data items are not yet scoped per-program (attributed to the outer module) — full per-program scoping is a future enhancement that requires END PROGRAM boundary tracking in the state machine.
* test(cobol): expand integration tests for all new language features

New fixtures:
- NESTED.cbl — two PROGRAM-IDs (OUTER-PROG, INNER-PROG) for nested program support testing
- COPYLIB.cpy — copybook for pseudotext REPLACING test target

Modified fixtures:
- CUSTUPDT.cbl — single-quoted ENTRY 'ALTENTRY', multi-target MOVE (WS-AMT TO FIELD-A FIELD-B), dynamic CALL WS-PROG-NAME, COPY COPYLIB with pseudotext REPLACING, LINKAGE SECTION with LS-PARAM
- RPTGEN.cbl — PERFORM WS-COUNT TIMES (false-positive guard), unquoted MAP(WS-MAP-NAME), additional data items WS-COUNT WS-MAP-NAME

Integration test rewritten with 62 exact assertions covering:
- 5 Module, 17 Function, 33 Property, 9 CodeElement, 2 Constructor nodes
- Nested program containment (OUTER-PROG -> INNER-PROG)
- Dynamic CALL annotation (CodeElement with cobol-dynamic-call)
- Multi-target MOVE (UPDATE-BALANCE: 2 reads, 3 writes)
- Single-quoted ENTRY (ALTENTRY under CUSTUPDT)
- PERFORM TIMES guard (WS-COUNT not in CALLS)
- Orphan unresolved edge removal (zero -unresolved edges)
- Grand totals: 21 CALLS, 68 CONTAINS, 2 IMPORTS, 10 ACCESSES

* fix(cobol): pseudotext REPLACING now applies correctly via isPseudotext flag

Root cause: ==PREFIX-== matched /^[A-Z][A-Z0-9-]*$/i (trailing hyphens allowed), routing it to the second-pass EXACT identifier match where PREFIX-RECORD !== PREFIX- failed silently.

Fix: Propagate isPseudotext from parseReplacingClause to CopyReplacing interface, then use it in applyReplacing first-pass condition to force global string replacement for all pseudotext entries regardless of whether the content looks like an identifier.

Result: COPY COPYLIB REPLACING ==PREFIX-== BY ==WS-==. now correctly transforms PREFIX-RECORD → WS-RECORD, PREFIX-CODE → WS-CODE, etc.

* refactor(cobol): per-program scoping via boundary tracking + line-range grouping

State machine changes (minimal, ~30 lines):
- Add RE_END_PROGRAM regex for END PROGRAM program-name. detection
- Replace nestedPrograms[] with programs[] containing startLine/endLine/nestingDepth metadata for each PROGRAM-ID in the file
- Reset division/section/paragraph state on new PROGRAM-ID boundary
- EOF finalization flushes remaining stack entries (single-program files)
- Programs sorted by startLine (outer before inner)

Processor changes:
- Uses programs[] with line-range containment to find enclosing parent Module for nested programs (replaces hardcoded nestedParent logic)
- programModuleIds Map tracks Module node IDs per program name

Fixture: NESTED.cbl now includes END PROGRAM lines for both programs.

Integration test: PREFIX-* Property nodes now correctly appear as WS-* after the pseudotext REPLACING fix from the previous commit.

* feat(cobol): free-format COBOL support (>>source free)

Auto-detects >>SOURCE FREE directive in the first 500 chars and switches to free-format line processing:
- No column-position rules (cols 1-6 are program text, not sequence area)
- Comments use *> prefix instead of col 7 indicator
- No continuation line indicator
- Strip inline *> comments
- Skip >>SOURCE directive lines

preprocessCobolSource() skips col-1-6 stripping for free-format files. Paragraph/section regexes relaxed from fixed 7-space prefix to flexible whitespace with case-insensitivity (/^\s*([A-Z][A-Z0-9-]+)\.\s*$/i). EXCLUDED_PARA_NAMES expanded with COBOL verbs (GOBACK, END-READ, etc.) to prevent false-positive paragraph detection in free-format.

Also fixes: entry-point-scoring.ts crash when language is 'cobol' (MERGED_ENTRY_POINT_PATTERNS[language] was undefined → optional chaining).

Benchmark on ACAS 3.01 (268 GnuCOBOL free-format programs, 10MB):
- Before: 407 nodes, 393 edges (near-empty, only file nodes)
- After: 4,297 nodes, 3,612 edges, 542 clusters, 11 flows

* fix(cobol): relax data item regexes for free-format (^\s+ to ^\s*)

RE_FD, RE_DATA_ITEM, RE_ANONYMOUS_REDEFINES, and RE_88_LEVEL all used ^\s+ which requires at least 1 leading space.
In free-format mode, lines are trimmed before processing, so data items like "01 WS-FIELD PIC X." have no leading whitespace after trimming. Changed to ^\s* (zero or more spaces) which works for both fixed-format (indented lines still have spaces) and free-format (trimmed lines).

ACAS benchmark (268 GnuCOBOL programs):
- Before: 4,297 nodes, 3,612 edges (paragraphs only)
- After: 13,832 nodes, 8,615 edges (+ data items, FDs, 88-levels)

* feat(cobol): 100% structural feature coverage — GO TO, SCREEN, SD/RD, SORT, SEARCH, CANCEL, Level 66

New extractions: GO TO (CALLS edges), SCREEN SECTION data items, SD/RD alongside FD (Record nodes), SORT/MERGE USING/GIVING (ACCESSES), SEARCH (ACCESSES), CANCEL (CALLS), Level 66 RENAMES (Property), IS EXTERNAL/IS GLOBAL (Property description enrichment).

ACAS: 13,951 nodes | 13,193 edges | 685 clusters | 150 flows (+53% edges from new GO TO/SORT/SEARCH/CANCEL extractions)

* feat(cobol): enriched CICS extraction — file I/O, dynamic PROGRAM, queues, HANDLE ABEND

EXEC CICS blocks now extract:
- FILE/DATASET clause: captures VSAM file name (literal or data item ref) for READ/WRITE/REWRITE/DELETE/STARTBR/READNEXT/READPREV → ACCESSES edges
- PROGRAM clause: now handles unquoted variable references (dynamic CICS program transfer) → CodeElement annotation with cics-dynamic-program reason
- QUEUE clause: captures TS/TD queue names from WRITEQ/READQ → ACCESSES edges
- LABEL clause: captures HANDLE ABEND error handler targets → CALLS edges
- TRANSID: now handles unquoted variable references

CodeElement descriptions enriched with all captured fields (map, program, transid, file, queue, label).

CardDemo benchmark: +49 nodes, +33 edges from enriched CICS extraction.

* feat(cobol): complete CICS command extraction — all 7 expert recommendations

From COBOL expert agent analysis:
1. ENDBR added to isRead file command list
2. LOAD added to PROGRAM edge commands (alongside LINK/XCTL)
3. Two-word commands expanded: WRITEQ/READQ/DELETEQ TS/TD, HANDLE ABEND/AID/CONDITION, START TRANSID
4. Queue reason differentiated: cics-queue-read/-write/-delete
5. RETURN/START TRANSID → CALLS edges to synthetic <transid> target
6. MAP → ACCESSES edges for screen traceability
7. INTO/FROM data fields extracted → ACCESSES edges to data items

Also: dataItemMap built before CICS block processing (was declared after), CodeElement descriptions enriched with all captured CICS fields.

* test(cobol): strict exhaustive integration tests with exact edgeSet assertions

Every edge reason has exact sorted pair assertions via edgeSet(), not just counts. Any change to extraction that adds, removes, or reorders edges will produce a precise, descriptive failure.

Updated RPTGEN.cbl fixture with:
- GO TO EXIT-PARAGRAPH, SORT USING/GIVING, SEARCH table
- EXEC CICS READ FILE INTO, WRITEQ TS QUEUE FROM, SEND MAP FROM
- EXEC CICS HANDLE ABEND LABEL, RETURN TRANSID, XCTL PROGRAM(variable)
- ABEND-HANDLER and EXIT-PARAGRAPH paragraphs

46 tests covering 24 CALLS + 79 CONTAINS + 18 ACCESSES + 2 IMPORTS edges across 15 distinct edge reason codes, all with exact sorted pair lists.

* fix(cobol): address 5 findings from second Claude review (compiler front-end perspective)

Finding #2: Numeric sequence numbers now stripped (changed /[^0-9 ]/ to /\S/ in preprocessCobolSource). Lines like "000100 MAIN-PARAGRAPH." now have cols 1-6 blanked so paragraph regex matches correctly.

Finding #11: JCL in-stream PROC ordering fixed — pre-register all PROCs into moduleNames before step processing. Steps that EXEC a PROC defined later in the same file now get CALLS edges.

Finding #A: PROCEDURE DIVISION USING no longer captures calling-convention keywords (BY, VALUE, REFERENCE, CONTENT, ADDRESS, OF) as parameter names.

Finding #C: SORT/MERGE USING/GIVING now captures ALL file references (multi-file), not just the first. Changed from single-match to section extraction with split.
Finding #D: Section headers no longer set currentParagraph, preventing PERFORM caller misattribution to Namespace instead of Function nodes.

* fix(cobol): address code review findings — ReDoS fix, perf, cleanup

P1 CRITICAL — ReDoS in SORT USING/GIVING: Replaced nested-quantifier regex with safe indexOf+substring+split approach. No backtracking possible on crafted input.

P2 — readCopy O(M) linear scan: Added copybookByPath reverse Map for O(1) path-to-content lookup.

P3 — Dead code removal: Deleted unused RE_SORT_USING and RE_SORT_GIVING constants.

P3 — EXCLUDED_PARA_NAMES simplification: Replaced 20 END-* entries with startsWith('END-') prefix check. Auto-covers future END-* verbs.

P3 — Misplaced JSDoc on removeRelationship: Fixed comment that described removeNodesByFile instead. Added missing JSDoc to removeNodesByFile.

Review agents: architecture-strategist, performance-oracle, security-sentinel, code-simplicity-reviewer.

* refactor: add Cobol to SupportedLanguages with parseStrategy: standalone

New languages/cobol.ts — standalone regex processor provider with no-op tree-sitter fields. Declares parseStrategy: 'standalone' to distinguish from tree-sitter-based languages.

Added parseStrategy: 'tree-sitter' | 'standalone' to LanguageProviderConfig for languages that use their own processor instead of tree-sitter.

Removed all 11 'cobol' as any casts — now uses SupportedLanguages.Cobol. Added empty Cobol entries to entry-point-scoring and framework-detection.
* fix(cobol): 5 fixes from third Claude review + 3 regression tests

Fixes:
- Line numbers now 1-indexed in fixed-format (was 0-indexed, off-by-one in jump-to-definition links)
- Copybook content preprocessed before COPY expansion (sequence numbers and patch markers in copybooks no longer survive into expanded source)
- ENTRY USING filters calling-convention keywords (BY, VALUE, REFERENCE, CONTENT, ADDRESS, OF) — same fix as PROCEDURE DIVISION USING
- SORT/MERGE trailing period stripped from USING/GIVING file tokens
- Paragraph exclusion uses exact match for SECTION/DIVISION (was substring match that excluded valid names like CROSS-SECTION-ANALYSIS)

USING_KEYWORDS moved to module scope for reuse by both PROCEDURE DIVISION USING and ENTRY USING handlers.

New unit tests:
- ENTRY USING BY VALUE filtering
- Paragraph names containing SECTION not excluded
- Numeric sequence numbers stripped enabling paragraph detection

* fix(cobol): address 6 findings from fourth Claude review + tests

Fourth review findings fixed:
- New #IV: PERFORM TIMES guard uses perfMatch.index instead of line.indexOf (prevents wrong match when target appears earlier in line)
- New #V: 88-level condition values now handle single-quoted literals ('Y' no longer stored with embedded quotes)
- New #I: CANCEL edges use two-pass resolution like CALL (no longer silently dropped when target indexed after source)
- New #3: Multi-line SORT/MERGE accumulation — sortAccum state variable accumulates lines until period, then extracts USING/GIVING from full statement (95% of production SORT statements span multiple lines)
- New #II: PROCEDURE DIVISION USING on split lines — pendingProcUsing flag defers parameter capture to next line if USING not on same line
- New #6 (prior): EXCLUDED_PARA_NAMES exact match for SECTION/DIVISION

Updated fixture: RPTGEN.cbl SORT now uses multi-line format with GIVING on separate line (period-terminated). New sort-giving integration test.
ACCESSES total: 18 → 19 (new sort-giving edge from multi-line capture).

* fix(cobol): address 4 findings from fifth Claude review

Finding #B (5 reviews old): Section/paragraph node IDs now include enclosing program name to prevent collision when nested programs share section/paragraph names. New findOwningProgramName() helper uses programs[] line ranges to find the innermost enclosing program.

Finding #α: pendingProcUsing now reset in the if(procUsingMatch) branch (was only set in else branch, could leak across nested programs).

Finding #β: RE_CALL_DYNAMIC uses negative lookbehind (?<![A-Z0-9-]) to prevent false-positive on compound identifiers like WS-CALL OCCURS.

Finding #γ: sortAccum flushed at EOF (parallel to flushSelect and pendingFdName EOF cleanup). Prevents silent loss of SORT USING/GIVING relationships in truncated files.

* fix(cobol): address findings from reviews 5+6 with full test coverage

Review 5 fixes:
- #α: pendingProcUsing reset in if(procUsingMatch) branch
- #β: RE_CALL_DYNAMIC negative lookbehind prevents WS-CALL false positive
- #γ: sortAccum flushed at EOF for truncated files
- #B: Section/paragraph IDs include owning program name

Review 6 fixes:
- #P: sectionNodeIds/paraNodeIds maps use program-scoped keys (PROGNAME:NAME). New scopedParaLookup/scopedCallerLookup helpers. findContainingSection updated with programs parameter.
- #Q: RETURNING added to USING_KEYWORDS for COBOL 2002+
- #R: RE_PERFORM matches both THRU and THROUGH via alternation

New unit tests (6):
- PERFORM THROUGH captures thruTarget
- PROCEDURE DIVISION USING RETURNING filters keyword
- RE_CALL_DYNAMIC no false-match on WS-CALL compound identifier
- Multi-line SORT captures USING/GIVING from continuation lines
- PROCEDURE DIVISION USING on split line via pendingProcUsing
- Copybook preprocessing strips sequence numbers

* fix(cobol): address findings from seventh Claude review + 3 tests

Review 7 fixes:
- #i: findContainingSection only updates best when lookup succeeds (prevents undefined overwriting valid parent section)
- #ii: RE_PROC_SECTION handles segment numbers (SECTION 30.)
- #III: procedureUsing now stored per-program on boundary stack entries, propagated to programs[] output. Inner programs no longer overwrite outer program's parameters.
- #δ: Dynamic CANCEL (CANCEL variable) now creates CodeElement annotation node, matching dynamic CALL behavior. RE_CANCEL_DYNAMIC with negative lookbehind. cancels[] gains isQuoted field.
- #Q: RETURNING added to USING_KEYWORDS (already in prev commit)
- #R: PERFORM THROUGH already fixed (THRU|THROUGH alternation)

New unit tests:
- Nested programs carry per-program procedureUsing
- SECTION with segment number detected
- Dynamic CANCEL via data item captured with isQuoted=false

* feat(cobol): link PROCEDURE DIVISION USING to LINKAGE data items + close 4 findings

Finding #10 FIXED: procedureUsing parameters now create ACCESSES edges with reason 'cobol-procedure-using' from Module to matching LINKAGE SECTION Property nodes. This exposes the program's parameter contract in the graph (e.g., AUDITLOG → LS-CUST-ID, AUDITLOG → LS-AMOUNT).

Findings closed by expert agent consensus:
- #6 COPY IN library: WONTFIX — captured metadata, no universal library-to-directory mapping exists. Field costs nothing and is useful for library queries.
- #14 SQL DELETE: WONTFIX — DB2 requires FROM; existing FROM pattern handles it. Bare DELETE would risk false positives.
- #E OCCURS DEPENDING ON: WONTFIX — runtime sizing concern, not structural. The static occurs count is sufficient for indexing.

All 39 findings from 7 Claude reviews now resolved or closed.

* fix(cobol): resolve 48 review findings across 9 review cycles

Ninth deep review resolved all remaining COBOL parser gaps identified by 5 specialist agents (COBOL expert, architecture strategist, TypeScript reviewer, security sentinel, code simplicity reviewer).

Fixes (P1 — critical):
- SELECT OPTIONAL now correctly skips OPTIONAL keyword (C1)
- RETURNING params excluded from PROCEDURE DIVISION USING list (C7)
- SORT GIVING no longer captures clause keywords as file names (C5)
- Extract flushSort() helper eliminating 40-line duplication (S2)
- Flush unclosed EXEC blocks at EOF matching SORT/SELECT pattern (S3)
- Guard undefined map key in jcl-processor moduleNames (S1)
- Add MAX_TOTAL_EXPANSIONS=500 to prevent exponential COPY breadth (S4)

Fixes (P2 — important):
- Quote-aware stripInlineComment for | and *> in string literals (C2+C3)
- Fixed-format literal continuation now handles quoted strings (C6)
- PROGRAM-ID detected regardless of division state for siblings (C9)

Fixes (P3 — cleanup):
- EXEC SQL INTO restricted to INSERT INTO to avoid FETCH false-pos (C8)
- Copy expander line numbers fixed from 0-based to 1-based (C11)
- Remove dead code: inInStreamProc, fileIsLiteral, expansionDepth (S7-S10)

Also fixes 8th-review findings: nested program CONTAINS attribution, multi-PERFORM on same line, INPUT/OUTPUT PROCEDURE IS in SORT, GO TO DEPENDING ON multi-target, MOVE CORR abbreviation, per-program procedureUsing ACCESSES edges.
Tests: 145 COBOL tests passing (59 integration + 86 unit)

Benchmarks:
CardDemo 12,323 nodes/8,893 edges (7.4s)
ACAS 14,016 nodes/15,452 edges (9.3s, -9% faster)

* docs(cobol): update documentation for ninth review cycle fixes

Update all 4 COBOL documentation files to reflect the 16 fixes from the ninth review cycle:
- regex-extraction.md: quote-aware comment stripping, SELECT OPTIONAL, RETURNING exclusion, SORT_CLAUSE_NOISE filter, flushSort() helper, GO TO multi-target, PROGRAM-ID division-independent detection
- copy-expansion.md: MAX_TOTAL_EXPANSIONS=500 breadth guard, 1-based line numbers, removed expansionDepth/warnedCircular param
- deep-indexing.md: GO TO DEPENDING ON, INPUT/OUTPUT PROCEDURE IS, MOVE CORR edge reasons, INSERT INTO restriction, literal continuation
- performance.md: updated benchmarks (CardDemo 12,323n/8,893e/7.4s, ACAS 14,016n/15,452e/9.3s), COPY breadth guard

* fix(cobol): resolve 10th review findings — nested program edge attribution

Fix 6 findings from the 10th review (PR abhigyanpatwari#498 comment #4132201110):

#A+#F: All CALL/CANCEL/CICS/ENTRY/SQL/SEARCH/file-declaration edges now use owningModuleId() for nested program attribution instead of the outer program's parentId. Added helper function owningModuleId() to centralize the pattern.

#B: Added USING and GIVING to SORT_CLAUSE_NOISE set to prevent MERGE USING + OUTPUT PROCEDURE from capturing clause keywords as file names.

#C: INPUT/OUTPUT PROCEDURE regex now captures optional THRU/THROUGH range end paragraph, mirroring RE_PERFORM's THRU support.

#D: scopedCallerLookup fallback now uses programModuleIds.get(pgm) instead of parentId, so PERFORM/MOVE/GOTO in nested programs with unresolvable paragraphs fall back to the correct inner module.

#E: pendingProcUsing only set when PROCEDURE DIVISION line is NOT period-terminated, preventing false USING expectation.
Tests: 145 passing | TypeScript clean

* fix(cobol): resolve 11th review findings - final nested program + multi-CALL gaps

#1: scopedCallerLookup(null) now uses owningModuleId(lineNum) instead of parentId, fixing PERFORM/MOVE/GOTO before first paragraph in nested programs.

#2+#3: CALL and CANCEL extraction now uses matchAll (global flag) to capture multiple occurrences on the same line. Dynamic CALL/CANCEL checked independently instead of in else branch.

#4: SORT/MERGE ACCESSES edge IDs now use owningModuleId(sort.line) instead of parentId for nested program correctness.

#5: preprocessCobolSource free-format detection now uses first 10 lines (consistent with extractCobolSymbolsWithRegex threshold).

#6: EXCLUDED_PARA_NAMES expanded with DISPLAY, ACCEPT, WRITE, READ, REWRITE, DELETE, OPEN, CLOSE, RETURN, RELEASE, SORT, MERGE to prevent false-positive paragraph detection on isolated verbs.
Also removed unused GraphNode import from cobol-processor.ts.

Tests: 145 passing | TypeScript clean

* docs(cobol): deepened full language coverage plan with research findings

3 research agents analyzed Phase 1-2 features and graph value ranking. Key findings: cobol-call-using is #1 edge type (9.2/10); multi-line accumulation is dominant challenge; DECLARATIVES is lowest-risk Phase 2 item; SET TO TRUE covers 80-90% of SET usage.

* feat(cobol): implement Phase 1 - high-value data flow edges

4 new extraction features that create new ACCESSES and IMPORTS edges:

1.1: EXEC SQL INCLUDE -> IMPORTS edges with reason 'sql-include'
Handles unquoted (SQLCA), quoted ('DBRMLIB.MEMBER'), and underscored (CUST_TBL_DCL) member names.

1.2: CALL USING parameter extraction -> ACCESSES edges
Extracts parameters from CALL USING clause, filtering BY/REFERENCE/CONTENT/VALUE/ADDRESS/OF/LENGTH/OMITTED keywords. Creates 'cobol-call-using' ACCESSES edges (graph value: 9.2/10).

1.4: OCCURS DEPENDING ON -> ACCESSES edges with reason 'cobol-depends-on'
Extended OCCURS regex captures DEPENDING ON field with subscript stripping. Creates dependency edge from table to controlling field.

1.5: VALUE clause for standard data items
Extracts VALUE from data item clauses: quoted strings with type prefix (X/N/G/B), ALL literals, numerics (incl negative/decimal), and figurative constants. Populates Property node values.

Tests: 145 passing (+2 ACCESSES from CALL USING) | TypeScript clean

* feat(cobol): implement Phase 2 - DECLARATIVES, SET, INSPECT, EXEC DLI

4 new extraction features for error handling, data flow, and IMS/DB:

2.1: EXEC DLI (IMS/DB) -> CodeElement + ACCESSES edges
Accumulates EXEC DLI blocks like EXEC SQL. Parses DLI verbs (GU, GN, ISRT, REPL, DLET, CHKP, SCHD, TERM). Extracts SEGMENT, PCB, INTO/FROM, PSB. Creates dli-{verb} ACCESSES edges to <ims>:segment Record nodes.

2.2: DECLARATIVES / USE AFTER EXCEPTION -> ACCESSES edges
Tracks inDeclaratives state.
Detects USE AFTER STANDARD EXCEPTION ON file-name. Creates cobol-error-handler ACCESSES edge from handler section to file Record.

2.3: SET statement -> ACCESSES edges
Detects SET TO TRUE (80-90% of SET usage) and SET index TO/UP BY/DOWN BY. Creates cobol-set-condition / cobol-set-index write edges + cobol-set-read for identifier values.

2.4: INSPECT -> ACCESSES edges with multi-line accumulator
Accumulates INSPECT until period (like SORT). Extracts inspected field + tally counters. Creates cobol-inspect-read/write/tally edges. Form detection: tallying/replacing/converting/combined.

Preprocessor: 1398 -> 1597 LOC (+199). Tests: 145 passing.

* feat(cobol): implement Phase 3 - completeness fixes

6 partial features fixed to first-class support:

3.1: CALL RETURNING -> ACCESSES write edge (cobol-call-returning)
3.2: SELECT OPTIONAL flag preserved in FileDeclaration + Record node
3.3: ALTERNATE RECORD KEY extraction (matchAll for multiple keys)
3.4: COMMON attribute on nested programs (RE_PROGRAM_ID extended)
3.5: IS EXTERNAL / IS GLOBAL as first-class boolean properties (removed usage string hack)
3.6: AUTHOR / DATE-WRITTEN mapped to Module node description

Tests: 145 passing | TypeScript clean

* feat(cobol): implement Phase 4 - INITIALIZE + metadata completeness

4.1: INITIALIZE statement -> ACCESSES write edge (cobol-initialize)
4.2: DATE-COMPILED and INSTALLATION paragraphs extracted and mapped to Module node description alongside existing AUTHOR/DATE-WRITTEN

All 4 plan phases complete. Coverage: ~95% (up from 71.9%).
Tests: 145 passing | TypeScript clean

* test(cobol): add 24 unit tests for Phase 1-4 features

Coverage for all new extraction features:

Phase 1 (8 tests):
- EXEC SQL INCLUDE (unquoted, quoted, underscored)
- CALL USING (simple, mixed modes, ADDRESS OF, OMITTED)
- CALL RETURNING
- OCCURS DEPENDING ON
- VALUE clause (string, numeric, figurative constant)

Phase 2 (10 tests):
- EXEC DLI GU/ISRT/SCHD (verb, segment, PCB, INTO, FROM, PSB)
- DECLARATIVES USE AFTER EXCEPTION (single + multiple sections)
- SET TO TRUE, SET index UP BY
- INSPECT TALLYING, INSPECT REPLACING

Phase 3-4 (6 tests):
- SELECT OPTIONAL flag
- ALTERNATE RECORD KEY
- PROGRAM-ID IS COMMON
- IS EXTERNAL / IS GLOBAL booleans
- INITIALIZE extraction
- Full programMetadata (AUTHOR, DATE-WRITTEN, DATE-COMPILED, INSTALLATION)

Total: 168 tests passing (145 + 24 - 1 removed duplicate)

* fix(cobol): use /\r?\n/ split for Windows CRLF compatibility

All 4 COBOL source files now split on /\r?\n/ instead of '\n' to handle CRLF line endings on Windows. Previously, trailing \r in lines caused RE_GOTO's $ anchor to fail on multi-line GO TO DEPENDING ON statements, producing only 1 goto edge instead of 4.

Files fixed: cobol-preprocessor.ts (2 sites), cobol-processor.ts, jcl-parser.ts, cobol-copy-expander.ts

Tests: 168 passing | TypeScript clean

* fix(cobol): resolve 12th review - dynamic CALL/CANCEL dedup + trailing anchors

#1+#2: Removed incorrect hasQuotedCall/hasQuotedCancel deduplication guards. RE_CALL_DYNAMIC and RE_CANCEL_DYNAMIC require [A-Z] after CALL/CANCEL, so they CANNOT match quoted targets - the guards were both unnecessary and actively harmful, suppressing dynamic CALL/CANCEL in ON EXCEPTION patterns.

#3+#5: Changed RE_CALL_DYNAMIC and RE_CANCEL_DYNAMIC trailing anchor from (?:\s|\.) to (?=\s|\.|$) (lookahead). The consuming anchor failed when the identifier was the last token on a physical line.
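The CRLF failure mode described in the `/\r?\n/` commit above is easy to reproduce in isolation. The sketch below is not the project's `RE_GOTO`; `RE_TRAILING_NAME` and `trailingName` are hypothetical stand-ins for any pattern that anchors an identifier with `$`. Splitting on `'\n'` leaves a trailing `\r` on every line, so the end anchor never lines up with the identifier.

```typescript
// Hypothetical end-anchored pattern standing in for a `$`-anchored regex.
const RE_TRAILING_NAME = /([A-Z0-9-]+)$/;

function trailingName(line: string): string | null {
  return RE_TRAILING_NAME.exec(line)?.[1] ?? null;
}

// A multi-line statement as it would arrive from a Windows checkout.
const crlfSource = "GO TO PARA-A\r\n      PARA-B\r\n      PARA-C\r\n";

// Naive split: each line still ends in "\r", which is not part of the
// character class, so the anchored match fails on every line.
const naiveNames = crlfSource.split("\n").map((l) => trailingName(l));

// Robust split: the optional "\r" is consumed by the separator.
const robustNames = crlfSource.split(/\r?\n/).map((l) => trailingName(l));
```

With the naive split every entry of `naiveNames` is `null`, while `robustNames` recovers `PARA-A`, `PARA-B`, and `PARA-C` (followed by `null` for the empty final line).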
Tests: 168 passing | TypeScript clean

* feat(cobol): add CALL accumulator + fix SORT double-statement (#4, #6)

Finding #4: Multi-line CALL USING accumulator
Added callAccum state variable that accumulates CALL statements spanning multiple physical lines until period or END-CALL is found. Uses flushCallAccum() to re-extract CALL target + USING parameters from the full accumulated statement. This fixes the silent loss of ACCESSES parameter edges when USING appears on lines after CALL.

Finding #6: SORT double-statement on same line
After flushSort(), the code now falls through to re-check the current line for a new SORT/MERGE start (was previously blocked by the sortAccum === null check evaluating before flushSort ran).

Also fixed: used non-global regex for CALL detection test to avoid the classic global regex .test() lastIndex bug.

Tests: 168 passing (+1 ACCESSES from multi-line CALL USING)

* fix(cobol): resolve 13th review - CICS LOAD, USING extraction, file scoping

#1: CICS LOAD unresolved edge no longer silently deleted in second pass. Changed narrow cics-link/cics-xctl check to catch-all pattern: rel.reason?.startsWith('cics-') && rel.reason.endsWith('-unresolved')

#2: flushCallAccum USING extraction now stops before COBOL statement verbs (INSPECT, SEARCH, SORT, MERGE, DISPLAY, ACCEPT, MOVE, PERFORM, GO TO, CALL, IF, EVALUATE). Prevents absorbing adjacent statements as false USING parameters in legacy pre-COBOL-85 code without END-CALL.

#3: CICS FILE Record nodes now globally-scoped (<cics-file>:FILENAME) instead of per-file-scoped. Enables cross-program CICS file access analysis, consistent with SQL table scoping (<db>:TABLE).

#4: callAccum pre-check regex now has (?<![A-Z0-9-]) lookbehind to prevent false activation on compound identifiers like WS-CALL-FLAG.
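The "classic global regex .test() lastIndex bug" mentioned in the CALL accumulator commit is standard JavaScript behaviour and can be shown in a few lines. The patterns below are simplified illustrations, not the project's `RE_CALL`; the names `RE_CALL_DETECT`, `RE_CALL_ALL`, `hasCall`, and `callTargets` are made up for this sketch.

```typescript
// A global regex keeps state in lastIndex between .test() calls.
const stateful = /\bCALL\b/g;
const firstTest = stateful.test("CALL 'A'");  // matches at index 0, lastIndex becomes 4
const secondTest = stateful.test("CALL 'A'"); // search resumes at index 4 and fails

// Safer split of responsibilities (hypothetical names):
// a non-global regex for yes/no detection...
const RE_CALL_DETECT = /\bCALL\s+'([A-Z0-9-]+)'/;
// ...and a global regex used only with matchAll, which works on a copy.
const RE_CALL_ALL = /\bCALL\s+'([A-Z0-9-]+)'/g;

function hasCall(line: string): boolean {
  return RE_CALL_DETECT.test(line);
}

function callTargets(line: string): string[] {
  return Array.from(line.matchAll(RE_CALL_ALL), (m) => m[1] ?? "");
}

const sample = "IF WS-FLAG CALL 'PGMA' ELSE CALL 'PGMB'";
const detectedTwice = hasCall(sample) && hasCall(sample); // stays true on repeat calls
const targets = callTargets(sample);
```

Keeping the detection regex non-global means repeated checks on successive lines cannot be skewed by leftover `lastIndex` state, while `matchAll` still captures every occurrence on a line.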
Tests: 168 passing | TypeScript clean

* fix(cobol): resolve 14th review - callAccum false paragraph + Area A guard

#1: callAccum continuation lines now check for COBOL statement verb starts (GO TO, PERFORM, MOVE, etc.) and paragraph/section headers. If detected, the CALL is flushed as-is and the line processed normally - prevents false paragraph detection and currentParagraph corruption from lines like "WS-ADDR." being treated as paragraphs.

#4: callAccum pre-check now guarded by currentDivision === 'procedure' to prevent unnecessary activations in DATA DIVISION.

#5: Fixed-format paragraph detection now rejects lines with >7 leading spaces (Area B indentation) as paragraph candidates. Paragraph names in fixed-format must start in Area A (col 8-11, max 7 spaces). Free-format mode is unaffected.

Tests: 168 passing | TypeScript clean

* fix(cobol): resolve 15th review - callAccum Area A + verb boundary fixes

#A: Column-position-aware paragraph detection in callAccum flush.
#B: inspectAccum early-flush on paragraph/section/verb headers.
#C: Verb boundary \b → (?:\s|$) prevents MOVE-COUNT false flush.
* test(cobol): add 17 edge-case regression tests + fix USING verb boundary

17 new tests covering all recurring review patterns:

Multi-line CALL USING (7 tests):
- Parameters on separate continuation lines (IBM mainframe style)
- No absorption of INSPECT/GO TO/paragraphs following CALL
- END-CALL scope terminator
- Hyphenated identifiers (MOVE-COUNT) not triggering false flush
- Dual quoted+dynamic CALL on same line (ON EXCEPTION)

Nested program attribution (2 tests):
- CALL in inner program within inner line range
- PERFORM before first paragraph has null caller

CRLF compatibility (1 test):
- GO TO DEPENDING ON with \r\n line endings

Area A paragraph detection (2 tests):
- Area B (>7 spaces) rejected; Area A (7 spaces) accepted

SORT/MERGE (1 test): COLLATING SEQUENCE keywords not captured
PROCEDURE USING (2 tests): RETURNING excluded, period-terminated
Comment stripping (1 test): pipe in quoted string preserved
SELECT OPTIONAL (1 test): correct file name, not OPTIONAL keyword

Bug fix: USING extraction regex verb terminators changed from \bVERB\b to \bVERB(?=\s|$) in flushCallAccum - prevents truncation on hyphenated identifiers like MOVE-COUNT, PERFORM-LIMIT.
Total: 185 tests passing

* test(cobol): add 32 comprehensive edge-case regression tests

13 new describe blocks covering all extraction features:
- EXEC DLI: no-SEGMENT, multi-line accumulation (2 tests)
- SET: multiple targets, DOWN BY, TO numeric (3 tests)
- INSPECT: CONVERTING, multiple counters, tallying-replacing, paragraph flush during accumulation (4 tests)
- DECLARATIVES: no-STANDARD keyword, I-O mode, post-END paragraphs (3)
- COPY REPLACING: pseudotext deletion ==OLD== BY ==== (1 test)
- VALUE: hex literal, negative numeric, ALL literal (3 tests)
- OCCURS: TO range, fixed-size without DEPENDING ON (2 tests)
- Dynamic CALL/CANCEL: end-of-line, multiple CANCELs (3 tests)
- EXEC SQL: INCLUDE skips tables, SELECT INTO host vars, host variable extraction (3 tests)
- INITIALIZE: target and caller context (1 test)
- Nested programs: sibling scoping, PROGRAM-ID without ID DIV (2)
- EXEC EOF flush: unclosed EXEC SQL flushed (1 test)
- Multi-PERFORM: IF/ELSE dual PERFORM on single line (1 test)
- IS EXTERNAL: USAGE not polluted by external flag (1 test)

Total: 215 tests passing

* fix(cobol): resolve 16th review - CANCEL in CALL block + USING boundary

#1: flushCallAccum now extracts CANCEL statements from within CALL ON EXCEPTION blocks. Adds RE_CANCEL + RE_CANCEL_DYNAMIC matchAll passes alongside existing CALL extraction.

#2: Added \bCANCEL(?=\s|$) to USING lookahead regex to prevent CANCEL keyword being captured as false USING parameter.

#3: Multi-line CALL start now returns immediately to prevent the CALL start line from simultaneously feeding sortAccum/inspectAccum.

#6: Division transitions now flush all active accumulators (callAccum, sortAccum, inspectAccum) to prevent state leakage across programs.

Also added CANCEL to callAccum flush trigger verb list.
Tests: 215 passing | TypeScript clean

* refactor(cobol): extract shared verb constants + resolve 17th review

Extract COBOL_STATEMENT_VERBS, RE_STATEMENT_VERB_START, and RE_USING_PARAMS as shared constants - eliminates 4 duplicated 25-verb regex patterns.

17th review: #1 flushCallAccum before EXEC entry, #2 inspectAccum verb parity via shared constant.

Tests: 215 passing | TypeScript clean

* test(cobol): replace all fuzzy assertions with exact toBe checks

Replaced 7 toBeGreaterThan/toBeLessThan/toBeGreaterThanOrEqual assertions with exact toBe values:
- dataItems.length: >= 3 → toBe(3)
- calls.length: >= 1 → toBe(1)
- calls[0].line: range check → toBe(10)
- programs[].startLine/endLine: comparison → exact values
- innerA.endLine/innerB.startLine: comparison → exact values

Also added 11 new edge-case tests (accumulator flush on EXEC/division transitions, free-format, CANCEL in CALL block, SORT THRU, verb flush, integration).

226 tests passing - zero fuzzy assertions remain.

* fix(cobol): resolve 19th review + 15 accumulator flush tests

Fixes:
#1: END PROGRAM flushes callAccum/sortAccum/inspectAccum
#2: PROGRAM-ID sibling path flushes all accumulators
#3: Added COMPUTE/ADD/SUBTRACT/MULTIPLY/DIVIDE/STRING/UNSTRING to COBOL_STATEMENT_VERBS (now 32 verbs)

Tests (15 new):
- END PROGRAM flush: single + nested programs (2)
- PROGRAM-ID sibling flush (1)
- Arithmetic verb flush: COMPUTE/ADD/SUBTRACT/MULTIPLY/DIVIDE (5)
- String verb flush: STRING/UNSTRING (2)
- Arithmetic not captured as false USING params (1)
- SORT flushed at END PROGRAM (1)
- INSPECT flushed at END PROGRAM (1)
- All with exact toBe assertions (2)

Total: 239 tests passing | Zero fuzzy assertions

* fix(cobol): resolve 20th review - INITIALIZE multi-target + 2 tests

Finding 1: INITIALIZE now captures multiple targets with REPLACING clause keyword filtering. Regex changed to lazy match stopping at REPLACING/WITH/period boundary.
Targets split on whitespace and filtered against INITIALIZE_CLAUSE_KEYWORDS set.

Tests (2 new):
- INITIALIZE multi-target: WS-CUSTOMER WS-ORDER WS-LINE-ITEM → 3
- INITIALIZE with REPLACING: only WS-RECORD captured, not keywords

Total: 241 tests passing | TypeScript clean
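The INITIALIZE multi-target approach from the 20th review fix (lazy match up to a REPLACING/WITH/period boundary, then a whitespace split filtered against a keyword set) might look roughly like the sketch below. Only the name `INITIALIZE_CLAUSE_KEYWORDS` and the overall strategy come from the commit message; the exact regex, the contents of the keyword set, and the `initializeTargets` function are assumptions.

```typescript
// Assumed keyword set; the real INITIALIZE_CLAUSE_KEYWORDS may differ.
const INITIALIZE_CLAUSE_KEYWORDS = new Set<string>([
  "REPLACING", "WITH", "FILLER", "ALL", "TO", "VALUE", "THEN", "DEFAULT",
]);

// Lazy capture that stops before REPLACING/WITH, a period, or end of line.
const RE_INITIALIZE = /\bINITIALIZE\s+(.+?)(?=\s+(?:REPLACING|WITH)\b|\s*\.|$)/i;

function initializeTargets(line: string): string[] {
  const captured = RE_INITIALIZE.exec(line)?.[1];
  if (captured === undefined) return [];
  return captured
    .trim()
    .split(/\s+/)
    .map((token) => token.toUpperCase())
    // Drop clause keywords that slipped past the regex boundary.
    .filter((token) => token.length > 0 && !INITIALIZE_CLAUSE_KEYWORDS.has(token));
}

const multi = initializeTargets("INITIALIZE WS-CUSTOMER WS-ORDER WS-LINE-ITEM.");
const replacing = initializeTargets("INITIALIZE WS-RECORD REPLACING NUMERIC DATA BY ZERO.");
const notInitialize = initializeTargets("MOVE WS-A TO WS-B.");
```

The two sample statements mirror the two tests named in the commit: the first yields three targets, the second yields only `WS-RECORD`. The keyword filter is a second line of defence for clauses such as `ALL TO VALUE` that the regex boundary does not cover.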

vercel Bot deployed to Preview March 24, 2026 14:40 View deployment

docs: document custom processor pattern in pipeline.ts

88c89c4

Add comment block at the custom processor integration point documenting the pattern for future non-tree-sitter language additions.

vercel Bot deployed to Preview March 24, 2026 15:00 View deployment

magyargergo mentioned this pull request Mar 24, 2026

feat: add COBOL language support with regex extraction pipeline #359

Closed

6 tasks

vercel Bot deployed to Preview March 24, 2026 15:42 View deployment

vercel Bot deployed to Preview March 24, 2026 16:11 View deployment

Copilot AI mentioned this pull request Mar 24, 2026

fix(cobol): single-quote CALL/COPY, sequence number stripping, PERFORM keyword false-positives #500

Closed

Merge remote-tracking branch 'origin' into feat/cobol-language-support

49fd493

vercel Bot deployed to Preview March 25, 2026 06:59 View deployment

magyargergo added 9 commits March 25, 2026 07:14

vercel Bot deployed to Preview March 26, 2026 10:44 View deployment

magyargergo added 2 commits March 26, 2026 11:07

fix(cobol): resolve 15th review — callAccum Area A + verb boundary fixes

86a36e5

#A: Column-position-aware paragraph detection in callAccum flush. #B: inspectAccum early-flush on paragraph/section/verb headers. #C: Verb boundary \b → (?:\s|$) prevents MOVE-COUNT false flush.
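The verb boundary change in #C hinges on how `\b` treats hyphens: in JavaScript regexes a hyphen is a non-word character, so `\b` sees a boundary between `MOVE` and `-COUNT`. The sketch below uses a two-verb subset and hypothetical pattern names to show the difference; it is not the project's `RE_STATEMENT_VERB_START`.

```typescript
// Word-boundary version: treats "MOVE-COUNT" as starting with the verb MOVE.
const RE_VERB_WORD_BOUNDARY = /^\s*(?:MOVE|PERFORM)\b/;

// Stricter version from the fix: the verb must be followed by whitespace
// or the end of the line, so hyphenated identifiers no longer qualify.
const RE_VERB_STRICT = /^\s*(?:MOVE|PERFORM)(?:\s|$)/;

const samples = [
  "MOVE-COUNT",             // data name, not a statement
  "    MOVE WS-A TO WS-B",  // real MOVE statement
  "PERFORM",                // verb alone at end of line
  "PERFORM-LIMIT.",         // data name with trailing period
];

const looseHits = samples.filter((s) => RE_VERB_WORD_BOUNDARY.test(s));
const strictHits = samples.filter((s) => RE_VERB_STRICT.test(s));
```

Here `looseHits` contains all four samples, while `strictHits` keeps only the real `MOVE` statement and the bare `PERFORM`, which is the behaviour needed to avoid flushing an accumulator on a data name such as `MOVE-COUNT`.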

vercel Bot deployed to Preview March 26, 2026 11:20 View deployment

vercel Bot deployed to Preview March 26, 2026 11:30 View deployment

vercel Bot deployed to Preview March 26, 2026 12:07 View deployment

vercel Bot deployed to Preview March 26, 2026 12:52 View deployment

vercel Bot deployed to Preview March 26, 2026 13:07 View deployment

vercel Bot deployed to Preview March 26, 2026 13:39 View deployment

vercel Bot deployed to Preview March 26, 2026 13:55 View deployment

magyargergo merged commit d2cd0b6 into main Mar 26, 2026
11 checks passed

magyargergo deleted the feat/cobol-language-support branch March 26, 2026 14:03

leoyamboW3 mentioned this pull request Mar 26, 2026

feat: Add Solidity language support #531

Open

1 task

Conversation

magyargergo commented Mar 24, 2026

Summary

Architecture Decision

Changes

COBOL Extraction Produces

Test plan

vercel Bot commented Mar 24, 2026

github-actions Bot commented Mar 24, 2026

CI Report

Pipeline Status

Test Results

Code Coverage

Tests

magyargergo commented Mar 24, 2026

Copilot AI commented Mar 24, 2026

magyargergo commented Mar 24, 2026

github-actions Bot commented Mar 24, 2026

COBOL Processor Completeness Review

Summary

🔴 Critical Bugs

🟡 Notable Gaps

🟢 Minor / Design Notes

Test Coverage Assessment

magyargergo commented Mar 24, 2026

github-actions Bot commented Mar 24, 2026

Comparing PR #498 vs PR #500

github-actions Bot commented Mar 26, 2026

COBOL Processor — Fifteenth Deep Review (Compiler Front-End / Static Analysis Perspective)

14th Review — Verification Status

New Findings (This Review)

🔴 New #A — Last CALL USING Parameter Dropped When on Its Own Period-Terminated Line; False Paragraph in Free-Format Mode

🟡 New #B — inspectAccum State Stuck When Real Paragraph Header Arrives Mid-Accumulation

🟡 New #C — callAccum Flush Trigger Fires for Data Items Named with COBOL Reserved-Word Prefixes

🟢 New #D — Free-Format CALL USING Last Parameter: No Correct Termination Signal

Architectural Fit Assessment

Test Coverage Gap

Summary

magyargergo commented Mar 26, 2026

github-actions Bot commented Mar 26, 2026

COBOL Processor — Sixteenth Deep Review (Compiler Front-End / Static Analysis Perspective)

15th Review — Verification Status

New Findings (This Review)

🔴 New #1 — CANCEL Inside Multi-Line CALL ... ON EXCEPTION ... END-CALL Is Silently Lost

🔴 New #2 — CANCEL Keyword Falsely Captured as USING Parameter in Pre-COBOL-85 Style

🟡 New #3 — Multi-Line CALL Start Without Period Simultaneously Feeds sortAccum/inspectAccum

🟡 New #4 — In Free-Format Mode, Real Paragraph Headers Absorbed Into callAccum When CALL Lacks Terminator

🟢 New #5 — Paragraph endLine Calculation Crosses Nested Program Boundaries

🟢 New #6 — callAccum Not Flushed on Division/Section Transitions (Malformed COBOL Edge Case)

Test Coverage Assessment

Architectural Fit Assessment

Summary

magyargergo commented Mar 26, 2026

github-actions Bot commented Mar 26, 2026

18th Review — 17th Review Findings Resolved ✅

17th Review — Verified Fixed

Fix Details

Remaining Open Items

github-actions Bot commented Mar 26, 2026

19th Review — COBOL Processor Deep Analysis (commit `7e52f98`)

🔴 Finding 1 — `END PROGRAM` does not flush `callAccum` (HIGH)

🔴 Finding 2 — `PROGRAM-ID` detection path does not flush `callAccum` (HIGH)

🟡 Finding 3 — Arithmetic verbs absent from `COBOL_STATEMENT_VERBS` (MEDIUM)

🟢 Finding 4 — Section `endLine` crosses nested program boundaries (LOW / cosmetic)

github-actions Bot commented Mar 26, 2026

Finding 1 — `RE_INITIALIZE` captures only the first target identifier 🟡 MEDIUM

Finding 2 — COPY detection runs before `callAccum` continuation guard 🟢 LOW

Finding 3 — Single-line CALL + active `sortAccum` contamination 🟢 LOW

Finding 4 — `CopyResolution.library` is captured but never consumed 🟢 LOW (dead code)