Skip to content

fix(parse): survive non-cloneable worker results so large-repo analyze doesn't crash (#2112)#2135

Merged
magyargergo merged 37 commits into
abhigyanpatwari:mainfrom
magyargergo:fix/2112-large-repo-degradation
Jun 10, 2026
Merged

fix(parse): survive non-cloneable worker results so large-repo analyze doesn't crash (#2112)#2135
magyargergo merged 37 commits into
abhigyanpatwari:mainfrom
magyargergo:fix/2112-large-repo-degradation

Conversation

@magyargergo

Copy link
Copy Markdown
Collaborator

Problem (#2112)

gitnexus analyze aborts on large repos of vendored/generated C++ (ExecutorTorch, TensorFlow Lite). One of the three failure modes in the issue is a deterministic crash that defeats even the conservative GITNEXUS_WORKER_POOL_SIZE=1 workaround:

DataCloneError: function toString() { [native code] } could not be cloned.
    at MessagePort.<anonymous> (.../parse-worker.js:1913:24)

A parse worker delivers its accumulated result via postMessage, which structured-clones the payload synchronously on the worker thread and throws on the first value it can't serialize (here, a node properties value pointing at a native toString). The worker re-posts the throw as {type:'error'}; the pool counts it as a worker death; and under POOL_SIZE=1 the same graph re-throws on every respawn until the slot's budget is exhausted and the whole parse phase aborts.

This was confirmed empirically — structuredClone({ toString: Object.prototype.toString }) reproduces the exact issue string. (It is not the Scope Maps — those clone fine — and not a leaked Error — those clone fine. Because production analyze runs in store-mode, where parsedFiles are emptied before the post and the shard write's JSON.stringify silently drops functions, the leak rides in a plain-record array, not captureSideChannel.)

What this PR does

Adds a clone-safety net at the worker result boundary (new workers/clone-safety.ts). On a clone failure the worker:

  • isolates the offending file (per-element structuredClone probe — clean arrays keep referential identity, zero copy),
  • strips the non-cloneable value from a plain extraction record (keeping the record — strictly-missing data, never wrong), or drops a whole ParsedFile so scope-resolution re-derives it on the main thread with intact edge data,
  • records the affected paths on result.skippedPaths, warns naming the field + file so the still-unpinned leak is diagnosable from logs and fixable at source, and re-posts.

Healthy runs are byte-identical — the net runs only after a real DataCloneError, so there's zero overhead on the fast path. The strip drops the same values the store path's JSON.stringify already silently removes, so store/no-store runs converge. Skipped paths surface via the parsing processor alongside the existing skipped-language telemetry.

The worker-pool resilience machinery is untouched (doc-comment corrections only — two stale comments wrongly claimed clone failures route to messageerror; the failure is a sender-side synchronous throw). U1 sanitizes at the source, so the pool's death/respawn path is never reached.

Scope

PR-1 — failure mode C, the deterministic POOL_SIZE=1 killer. The timeout/native-abort graceful-degradation cascade (failure modes A & B) is coupled to downstream re-parse exclusion + a hard per-file worker watchdog and is tracked as follow-up work (it would otherwise just relocate the hang/native-abort to the main thread).

Test plan

  • test/unit/clone-safety.test.ts (10 cases): the exact function toString() { [native code] } repro; the refuted Map case; strip-and-keep vs drop-and-reparse; referential identity on healthy results.
  • test/integration/parse-impl-clone-skip.test.ts: real worker, POOL_SIZE=1 — GREEN (a non-cloneable result is sanitized and the run completes with all files) + RED control (the same raw result aborts the phase). The test worker imports the built helper and applies the exact production pattern, exercising the real clone boundary the fake-worker doubles can't.
  • Regression: worker-pool unit suites (50/50), quarantine-cache-skip integration, tsc --noEmit, prettier — all green.

🤖 Generated with Claude Code

…e doesn't crash (abhigyanpatwari#2112)

A parse worker delivers its accumulated result to the main thread via
postMessage, which structured-clones the payload synchronously on the
worker thread and throws a DataCloneError on the first value it can't
serialize. The reporter's case was a node `properties` value pointing at
a native `toString`. The worker re-posted the throw as {type:'error'},
the pool counted it as a worker death, and under
GITNEXUS_WORKER_POOL_SIZE=1 the same graph re-threw on every respawn
until the slot's budget was exhausted and the whole parse phase aborted
-- defeating even the conservative single-worker workaround.

Add a clone-safety net at the worker result boundary. On a clone failure
the worker isolates the offending file, strips the non-cloneable value
from a plain extraction record (keeping the record -- strictly-missing
data, never wrong) or drops a whole ParsedFile so scope-resolution
re-derives it on the main thread with intact edge data, records the
affected paths on the result, warns naming the field + file so the leak
is diagnosable, and re-posts. Healthy runs are byte-identical: the net
runs only after a real DataCloneError, so there is zero overhead on the
fast path. Skipped paths surface via the parsing processor alongside the
skipped-language telemetry. The strip drops the same values the store
path's JSON.stringify already silently removes, so store/no-store runs
converge.

Scope: PR-1 -- failure mode C, the deterministic POOL_SIZE=1 killer. The
timeout/native-abort graceful-degradation cascade (failure modes A & B)
is coupled to downstream-exclusion + a hard worker watchdog and is
tracked as follow-up work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 10, 2026

Copy link
Copy Markdown

@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
11034 11018 0 16 578s

✅ All 11018 tests passed

16 test(s) skipped — expand for details
  • COBOL pipeline benchmark > scales with file count
  • C++ ADL emit benchmark > emit phase scales sub-quadratically with co-scaled files and sites
  • C++ pipeline benchmark > scales with file count
  • C# pipeline benchmark > scales with file count — namespaces spread across the solution
  • C# pipeline benchmark > scales with file count — all types in one (global) namespace bucket
  • C# pipeline benchmark > scales with file count — all types in one (named) namespace bucket
  • Go pipeline benchmark > scales with file count (workers enabled)
  • Go pipeline benchmark — worker pool (issue Worker idle timeout kills long Go scope extraction and surfaces as Napi::Error during analyze #1848) > does not quarantine the large generated Go file on sub-batch idle timeout
  • Go structural interface detection benchmark > scales linearly with interface × struct count
  • Go structural interface detection split-phase benchmark > separates index-build and detection time
  • PHP pipeline benchmark > scales with file count (workers enabled)
  • Ruby pipeline benchmark > scales with file count (workers enabled)
  • Rust pipeline benchmark > scales with file count (workers enabled)
  • Vue pipeline benchmark > scales with component count
  • run.cjs direct-exec entrypoint (fix(cli): steer docs, skills, and hooks through a CLI-neutral project-local runner (#1939) #1945) > resolves a .cmd shim via the Windows shell branch, passing args and exit code
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 75.35% 36129/47945 N/A% 🟢 ███████████████░░░░░
Branches 63.15% 22388/35451 N/A% 🟢 ████████████░░░░░░░░
Functions 80.97% 3885/4798 N/A% 🟢 ████████████████░░░░
Lines 79.12% 32655/41268 N/A% 🟢 ███████████████░░░░░

📋 View full run · Generated by CI

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tri-review: 3 methods — GitNexus swarm + Compound-Engineering personas + Codex

Engine breakdown: 6 Claude reviewer lanes (risk, test/CI, correctness, adversarial, maintainability, performance) + Codex (live — the one genuinely independent engine). Two of the three methods are Claude under different persona prompts, so Claude-only agreement is "consistent across personas," not independent confirmation; the strong signal is Codex+Claude agreement, flagged below.

What's solid (validated by the review, not merely unchallenged)

  • Healthy path is zero-overhead — all lanes + Codex confirm the fast path is a single postMessage with an early return; the sanitizer runs only inside the DataCloneError catch.
  • worker-pool.ts is genuinely comments-only — risk + correctness + Codex each diffed it; no logic change (two stale comments corrected).
  • The fix neutralizes the #2112 deterministic POOL_SIZE=1 killer — the DataCloneError is intercepted before the worker re-posts {type:'error'}; the integration GREEN/RED pair demonstrates it end-to-end.
  • skippedPaths is backward-compatible (optional, ?? []-guarded everywhere) and structured-clone reject-set parity holds for the real ParseWorkerResult shapes — the correctness lane probed functions/symbols/Promises/WeakMaps/data-class-instances/BigInt/boxed-primitives/null-proto/ArrayBuffer/getters empirically. The earlier "Maps don't clone" hypothesis stays refuted — Maps clone fine and are left untouched.

Headline (inline) — [P2 · defensive hardening] the safety net can re-arm the cascade it fixes

In postResultCloneSafe the recovery call makeWorkerResultCloneSafe(...) and the re-post sit outside the try/catch (which wraps only the first post), and containsNonCloneable/stripNonCloneable recurse with a cycle guard but no depth bound. A throw inside the sanitizer — a RangeError from a deeply-nested record (the adversarial lane reproduced this at depth ≥3000 against the real exported helper) or a throwing getter — escapes to the handler's {type:'error'}, which under POOL_SIZE=1 is exactly the worker-death → respawn-exhaustion → phase-abort cascade this PR fixes. Reachability is low and both engines agree (Codex + correctness: real parse-result records are shallow plain-JSON, no getters), so this is hardening, not a production-likely trigger. Cheap fix: wrap the sanitizer + re-post in try/catch (fail closed deliberately) and bound the recursion depth.

Inline — [P3] integration-test fidelity (Codex + 2 Claude lanes)

The GREEN worker re-implements postResultCloneSafe inline (it imports the real helper functions but hand-rolls the orchestration), so the production wiring — the {type:'warning'} post and the skippedPaths append/merge — has no integration coverage. The RED control asserts a bare .rejects.toThrow(), so an unrelated failure would satisfy it. (CI builds dist/ via setup-gitnexus build:'true' before vitest run, so the "stale dist → false green" risk is mitigated in CI; it bites only local runs that skip the build.)

Lower-priority (body only)

  • [P3] stripNonCloneable over-drops DAG-aliased records (Claude-only): the shared seen set returns the original on revisit, so when a non-cloneable is reachable via two paths the whole record is dropped (reported as skipped) instead of stripped-and-kept — contradicting the "record kept" intent. The last-resort guard prevents a crash. Fix: memoize the stripped copy per object.
  • Failure-path efficiency (Claude-only, severity disputed): the performance lane flags a transient ~2× memory spike (containsNonCloneable structuredClones a whole non-plain subtree to probe it — on a path that fires exactly when the worker is near its heap limit), plus a 2× per-element scan and a scan of all ~15 array fields regardless of which is dirty. The adversarial lane measured it as bounded (~185 ms / 1000 records, once per flush) and refuted it as a DoS. In store-mode the non-plain Scope Maps live in parsedFiles, which are emptied before the post, so the worst case is mostly the no-store path. A single-pass + shared-seen cleanup is worthwhile but not blocking.
  • [P2 type-safety] maintainability: the result as unknown as Record<string,unknown> double-cast and the untyped Set(['parsedFiles']) / Set(['skippedPaths']) field-name literals would silently break the drop-whole protection if a field is renamed — pin them with satisfies keyof ParseWorkerResult.
  • [P3] parsing-processor logs the skipped paths a second time but drops the per-file reason the worker already logged; [P3] findFilePath can attribute to a sibling child's path (telemetry-only); [P3] skippedPaths isn't zeroed in slimParseWorkerResultsForCache → minor cache-shard bloat (ignored on replay, no correctness impact).

Test gaps

No coverage for: a deeply-nested / throwing-getter / detached-buffer element (the slow-path escape); a DAG-aliased element (the over-drop); the {type:'warning'} + skippedPaths production wiring; the last-resort "unsalvageable → drop" branch.

CI

Quality (typecheck/lint/format/typecheck-web), CodeQL, gitleaks, benchmarks, packaged-install-smoke (ubuntu), tree-sitter ABI — pass. Pending at review time: Analyze (js-ts), Build & Push, windows/macos platform tests, ubuntu coverage (runs the new unit+integration tests). Vercel fail = deploy-authorization, unrelated to code.


Automated multi-tool digest (GitNexus swarm + CE personas + Codex), critic-gated. Verify before acting — the headline is a defensive-hardening recommendation with low production reachability, not a known production break.

Comment thread gitnexus/src/core/ingestion/workers/parse-worker.ts Outdated
Comment thread gitnexus/test/integration/parse-impl-clone-skip.test.ts Outdated
magyargergo and others added 11 commits June 10, 2026 08:11
…abhigyanpatwari#2135 review)

The clone-safety recovery path could re-arm the abhigyanpatwari#2112 worker-death cascade it
was built to prevent: in postResultCloneSafe the sanitizer call and the re-post
sat outside the try/catch, and containsNonCloneable/stripNonCloneable recursed
with a cycle guard but no depth bound. A throw inside the sanitizer (a RangeError
from a deeply-nested record, reproduced at depth >=3000) escaped to the message
handler's {type:'error'}, which under GITNEXUS_WORKER_POOL_SIZE=1 is the
respawn-budget-exhaustion abort.

Wrap the sanitizer + re-post in their own try/catch so any throw fails closed to
a primitive-only {type:'error'} deliberately, and thread a MAX_CLONE_DEPTH bound
through both scan/strip functions so an over-deep subtree is treated as
non-cloneable (dropped/undefined) instead of overflowing the stack. The
isStructuredCloneable catch-all is left broad on purpose — it bounds
structuredClone's own internal recursion in the non-plain-object probe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… buffers (abhigyanpatwari#2135 review)

Two sanitizer-defeat vectors let the re-post throw a DataCloneError again:

- A throwing getter on a record: containsNonCloneable/stripNonCloneable read
  obj[key], so a getter that throws escaped the scan/strip pass. Read defensively
  — a throwing property read is treated as non-cloneable (scan returns true,
  strip drops the property).
- A detached ArrayBuffer/TypedArray: both passed buffers/views through
  unconditionally, but structuredClone rejects a detached one, so the re-post
  threw. Route buffers/views through the authoritative isStructuredCloneable
  probe instead. No byteLength heuristic — a legitimately empty new Uint8Array(0)
  also has byteLength 0 yet clones fine, so a length check would false-positive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r-dropped (abhigyanpatwari#2135 review)

stripNonCloneable carried a shared `seen` WeakSet and returned the ORIGINAL
(un-stripped) value on revisit. When a non-cloneable was reachable via two paths
(a DAG), the second path spliced the original function-bearing object back into
the output, so the rebuilt element failed the last-resort isStructuredCloneable
guard and the whole record was dropped as "unsalvageable" — contradicting the
"record kept, value stripped" contract.

Replace the WeakSet with a Map<object, stripped-copy>: allocate the empty copy,
memoize it before recursing into children (so cycles return the in-progress
copy), and return the memoized copy on revisit. DAG-aliased subtrees now collapse
to one shared stripped copy and are kept-and-stripped, not dropped. The array
branch moves from .map() to allocate-then-push so its identity can be
pre-inserted. Object Map/Set keys aren't identity-preserved across stripping —
acceptable because parse-result Maps are primitive-keyed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bhigyanpatwari#2135 review)

makeWorkerResultCloneSafe scanned each dirty array twice — a field-level
whole-array containsNonCloneable probe, then a per-element pass — and always
reassigned the field. Fold into one per-element pass that builds the output
array lazily (copying the clean prefix only once the first dirty element
appears) and reassigns the field only when something changed. A fully-clean
array is now scanned once and keeps its referential identity; the clean prefix
of a dirty array is copied by reference. Behavior is otherwise identical
(failure-path-only code).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…keyof (abhigyanpatwari#2135 review)

makeWorkerResultCloneSafe carried a generic `<T extends Record<string,unknown>>`
that was never load-bearing (it mutates in place and returns {skipped}), and the
call site passed untyped string-literal option sets — so renaming `parsedFiles`
or `skippedPaths` would silently disable the drop-whole / skip protection.

Drop the generic (plain `Record<string,unknown>` param) and type the option sets
at the call site as `Set<keyof ParseWorkerResult>`, so a field rename is now a
compile error. The `as unknown as Record<string,unknown>` widening stays — it's
the standard cast for a no-index-signature interface (TS rejects a single-step
`as`); the function genuinely operates structurally on the result's arrays.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…higyanpatwari#2135 review)

The processor's skipped-file warning logged only the paths, dropping the
per-file reason the worker already attached — losing the distinction between a
recoverable "stripped N value(s)" and a whole-record "dropped" entry. Format each
entry as `path (reason)` so the aggregate line carries the diagnostic detail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…higyanpatwari#2135 review)

findFilePath swept all child objects one level deep in Object.keys order, so a
ParsedNode could be attributed to a sibling child's path-like key instead of its
real path at properties.filePath. Check the known `properties` child first, then
fall back to the generic sweep, so node attribution is deterministic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ri#2135 review)

slimParseWorkerResultsForCache spread the worker result without clearing the
clone-safety skippedPaths telemetry, so a sanitized result persisted its skip
list into the on-disk parse-cache shard. Replay already ignores the field; zero
it (like calls/assignments/parsedFiles) to keep shards lean and the intent
explicit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ontrol (abhigyanpatwari#2135 review)

The integration GREEN worker re-implemented postResultCloneSafe inline, so the
production wiring (the {type:'warning'} post + the skippedPaths append) had no
coverage, and the RED control asserted a bare .rejects.toThrow() that any
failure would satisfy.

Extract postResultCloneSafe into a side-effect-free module (post-result.ts) —
importing it from the parse-worker entry module would construct the parser, post
ready, and attach the real handler — and have the GREEN test worker import and
call the real one. Tighten the RED matcher to the actual abort contract
(/circuit breaker|consecutive failures|respawn budget|could not be cloned/),
which also documents that the raw poison result aborts via the pool's
consecutive-failure circuit breaker under POOL_SIZE=1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@magyargergo

Copy link
Copy Markdown
Collaborator Author

Tri-review findings addressed (9 commits)

All findings from the tri-review are fixed, one commit each (rebased onto the updated branch):

Commit Finding
14f79c32 [P2] fail-closed recovery (try/catch around the sanitizer + re-post) + MAX_CLONE_DEPTH bound, so the net can't re-arm the cascade it prevents
45c9f34b [P3] harden against throwing getters + detached buffers (probe, no byteLength heuristic)
fb8093e5 [P3] memoize stripped copies (Map<object,copy>) so DAG-aliased records are stripped-and-kept, not over-dropped
9c440b95 [perf] single-pass scan preserving array referential identity
873938a0 [P2] drop the unused generic + pin field-name sets to keyof ParseWorkerResult (rename → compile error)
04bbe0cb [P3] processor log keeps the per-file reason
1840f1af [P3] findFilePath prefers properties.filePath over the sibling sweep
c3f9c4af [P3] zero skippedPaths in the slim cache result
bc4a0a0b [P3] extract postResultCloneSafe → side-effect-free module; integration test calls the real one; RED matcher tightened to the contract

Notes worth surfacing:

  • The deepening pass caught three would-be bugs before coding: the byteLength === 0 detached-buffer check false-positives on an empty live Uint8Array(0) (dropped it for the probe), the memoization array branch couldn't use .map() (would infinite-loop on a cyclic array), and the isStructuredCloneable catch must stay broad.
  • TS rejects a single-step as Record<string,unknown> for a no-index-signature interface, so the as unknown as widening stayed — the real win (compile-checked field names) is intact.

Verified locally: tsc clean · clone-safety unit 20/20 · worker-pool suites 60/60 · integration (clone-skip + quarantine) 4/4 · prettier clean.

🤖 Generated with Claude Code

magyargergo and others added 6 commits June 10, 2026 09:10
…nly DataCloneError (abhigyanpatwari#2135 review)

The V8 structured-clone research surfaced the net's one real correctness hole:
structuredClone invokes getters, and a getter that THROWS surfaces its own error
(a RangeError, etc.) — NOT a DataCloneError (confirmed against a real
MessageChannel). postResultCloneSafe gated recovery on isDataCloneError, so such
a throw re-threw past the sanitizer and re-armed, under POOL_SIZE=1, the
worker-death cascade the net exists to prevent.

Attempt the sanitize + re-post recovery for ANY first-post failure (the sanitizer
already reads properties defensively, so a throwing getter is dropped), falling
closed to a primitive-only {type:'error'} only if the re-post still fails. Adds
an integration case: a node with a throwing getter is recovered and delivered,
not re-thrown.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nostic (abhigyanpatwari#2135 review)

The clone-safety net's skip reason named only the array field + file ("stripped
1 value from nodes"), not the offending property key — which is precisely why
the original abhigyanpatwari#2112 leak stayed unpinned. Thread a dotted key path through
stripNonCloneable (recording each stripped value's path: properties.toString,
meta.data[3], …) and surface the first few in the reason ("from nodes:
properties.toString"). Now a single log line — or the contract/strict checks —
names the leaking property, so a residual runtime escape can be fixed at source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tructured-cloneable (abhigyanpatwari#2135 review)

Shape-regression guard: builds a representative ParseWorkerResult (typed as the
real interface) and asserts isStructuredCloneable. Typing it as ParseWorkerResult
makes adding a new boundary field a compile error here until the test is updated,
and the runtime assert catches a field whose type regresses to a non-cloneable
shape — independent of language input.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…udly instead of silent sanitize (abhigyanpatwari#2135 review)

The runtime net's silent recovery in production is exactly what let the original
abhigyanpatwari#2112 leak stay unpinned. Add an opt-in strict mode (GITNEXUS_STRICT_CLONE=1,
inherited by workers): on a clone failure, postResultCloneSafe THROWS with the
exact offending key path instead of sanitizing + delivering, so a leak
introduced by a future provider/extractor change fails loudly at its origin
(CI/dev) rather than being quietly stripped. Off in production, where the net
keeps the run alive.

Adds a self-contained integration case (sets the flag, asserts the poison run
rejects with the key path) and skips the synthetic-poison suite under a global
strict run (its value there is running the REAL-extractor integration tests
under strict). Wiring a strict CI lane (GITNEXUS_STRICT_CLONE=1 on a vitest
integration step) is left to the maintainer — it needs a green full-suite
verification and touches the protected workflow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@magyargergo

Copy link
Copy Markdown
Collaborator Author

Root-cause deep-dive + prevention layer

A deep investigation of the V8/Node structured-clone serializer (the postMessage path), cross-checked empirically in this Node, established the authoritative non-cloneable rule and narrowed our exposure precisely:

  • Why it happens: the serializer recurses only own-enumerable, string-keyed properties via [[Get]]; any callable value (incl. native fns), Symbol, Promise/WeakMap/Proxy/generator, or a tree-sitter SyntaxNode (its own-enumerable tree.getText is the exact throw — confirmed live) aborts the clone.
  • Our exposure: the entire boundary surface is plain data except three unknown sinks (ParsedNode.properties index signature + the two provider hooks), and zero SyntaxNode-typed interface fields. The hooks' only impls are provably node-free today — so the risk is the untyped holes, not a current code path.

Added to this PR (the runtime guarantee)

  • c33e0455net-gap fix: the net's one real correctness hole — a throwing getter's RangeError (not a DataCloneError) re-threw past the sanitizer. Recovery now runs on any post failure (getter case added).
  • 43274216key-path diagnostic: the skip reason now names the exact offending property (nodes: properties.toString), so a residual escape is locatable from one log line.
  • baa2382bclone contract test: a representative ParseWorkerResult must be structured-cloneable (shape-regression guard + compile-nudge on new fields).
  • 1c0a4256strict-mode gate (GITNEXUS_STRICT_CLONE=1): throws with the key path instead of silently sanitizing, so a future extractor leak fails loudly at its origin. (Wiring a strict CI lane is left to the maintainer — it needs a green full-suite check and touches the protected workflow.)

Deferred (compile-time complement) → #2143

The typed Cloneable boundary (making a leak a compile error) hits a TypeScript wall — closed payload interfaces aren't assignable to a recursive index-signature type without index-signature changes across several interfaces, some in gitnexus-shared. Tracked in #2143.

The runtime net stays as the production backstop; strict mode + the contract test give the CI/test guarantee that a silent recurrence can't slip through unnoticed.

🤖 Generated with Claude Code

magyargergo and others added 5 commits June 10, 2026 10:25
…boundary (abhigyanpatwari#2112)

The forked analyze worker reports completion to the parent over
child_process IPC, which uses Node's DEFAULT 'json' serialization
(api.ts forks with no `serialization:` option). `AnalyzeResult.pipelineResult`
is populated on every successful analysis and carries `pipelineResult.graph`
— the live KnowledgeGraph closure object. Sending the raw result is wrong
three ways: (1) the graph's nodes/relationships getters force-materialize
the entire graph into two arrays and JSON-stringify them on every analyze,
discarded immediately (a multi-hundred-MB no-op on a large repo — the abhigyanpatwari#2112
scenario); (2) the graph's methods are own function properties that JSON
drops silently, so a surviving graph is a husk whose forEachNode() throws far
from the cause; (3) a BigInt/circular value anywhere in the payload makes
process.send throw TypeError synchronously — caught and re-sent as
{type:'error'}, mis-reporting a SUCCESSFUL analysis (DB already written) as a
FAILURE. This is the abhigyanpatwari#2112 failure family on the server path, and unlike the
parse-worker result boundary it has no clone-safety net.

The parent (api.ts) reads only result.repoName; pipelineResult's real
consumers (CLI skill generation, cli/analyze.ts) call runFullAnalysis
in-process and never cross this fork. So project the result down to an
explicit JSON-safe allowlist of scalar fields. Typed as
Omit<AnalyzeResult,'pipelineResult'> so a future non-serializable field added
to AnalyzeResult fails to compile until handled here deliberately.

Found by the abhigyanpatwari#2112 cross-process serialization-boundary audit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…boundary guard (abhigyanpatwari#2143)

The runtime clone-safety net is the production backstop; this is its
compile-time complement. The worker result is plain data except a few
`unknown`-typed sinks (a node's `properties` bag, the provider
`extractTemplateConstraints` / `collectCaptureSideChannel` hook returns) —
`unknown` lets a non-serializable value (a function, a leaked tree-sitter
SyntaxNode, …) cross the structured-clone boundary with no compile-time
guard. That is the structural hole abhigyanpatwari#2112 leaked through.

`Cloneable<T>` is a homomorphic recursive mapped type that maps a function or
symbol member to `never`, so a struct carrying one is no longer assignable to
its own `Cloneable<T>`. `assertCloneable(value)` is a runtime identity (zero
cost) whose parameter is `T extends Cloneable<T> ? T : Cloneable<T>`, so a
clone-unsafe argument fails to compile, naming the offending key.

Because it is a homomorphic mapped type it preserves `interface` shapes and
`readonly` modifiers and needs NO index signature on the payload types — this
sidesteps the "closed interface is not assignable to a recursive
index-signature type" wall that blocked the original value-typed-`Cloneable`
attempt (the reason abhigyanpatwari#2143 was deferred from PR abhigyanpatwari#2135). The conditional
parameter type avoids the `T extends Cloneable<T>` circular-constraint error.

Tests: runtime identity contract, plus type-level @ts-expect-error assertions
(enforced by tsconfig.test.json) that a function/symbol member is rejected and
clean interface payloads are accepted. Applied to the real provider hooks in
the next commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…able (abhigyanpatwari#2143)

Apply the compile-time guard to the provider hooks that feed the `unknown`-typed
worker-result sinks, so a future non-serializable value in their payloads is a
compile error at the source site rather than a runtime DataCloneError at the
worker post:

- C++  extractTemplateConstraints  (CppConstraintPayload)
- C++  collectCaptureSideChannel    (CppCaptureSideChannel)
- C    collectCaptureSideChannel    (CCaptureSideChannel)
- Kotlin collectCaptureSideChannel  (KotlinCaptureSideChannel)

The C++ template-constraint adapter previously returned `unknown`; it now
returns the concrete `CppConstraintPayload | undefined` and routes its payload
through `assertCloneable`. The side-channel hooks are wrapped at their provider
wiring sites. `assertCloneable` is a runtime identity, so behavior is unchanged
(C static-linkage + C++ constraint suites stay green); the guarantee is the
type-check — src tsc now proves every nested member of those real payload trees
is structured-clone safe.

Test: type-level assertions (enforced by tsconfig.test.json) that each concrete
payload type is `Cloneable<T>`, INDEPENDENT of the provider wiring — so the
regression is caught even if the assertCloneable wrapper is later removed.
Proven non-vacuous (a function-bearing type fails the same assertion).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

✨ PR Autofix

Found fixable formatting / unused-import issues across 28 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":2135,"changed_lines":28,"head_sha":"106bb540e0510ee30a1eb68134c80bfdb057fb63","run_id":"27271736369","apply_command":"/autofix"}

Comment thread gitnexus/test/unit/clone-safety-payloads.test.ts Fixed
Comment thread gitnexus/test/unit/clone-safety-payloads.test.ts Fixed

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tri-review (dismiss-and-replace): clone-safety + #2143 + analyze-worker IPC

Methods — 3, with engine breakdown: 7 Claude lanes (GitNexus swarm: risk, test/CI · CE personas: correctness, adversarial, performance, maintainability, testing) + Codex (live — the only independent engine). Two Claude lanes share priors, so Codex+Claude agreement is weighted strong and Claude-only as "consistent across personas." This review supersedes the prior tri-review (its two findings — recovery-path re-arm and test fidelity — were addressed in this PR's review-fix commits; those stale inline comments are minimized).

Verdict: the runtime clone-safety net and the #2143 compile-time guard are sound and well-tested for every payload the parse pipeline actually produces. Codex found no P0–P2 defect; adversarial+correctness refuted the scary hypotheses. The findings below are defense-in-depth holes for value shapes that don't occur today (P2/P3) plus one real CI failure. No P0/P1.

Credit (validated, not merely unchallenged):

  • Fast-path is genuine zero-overheadpostResultCloneSafe posts as-is and returns; the sanitizer runs only after a real post failure (performance lane, confirmed).
  • The analyze-worker IPC projection drops the right field — Codex + correctness confirmed the parent (api.ts:1560) reads only result.repoName; omitting pipelineResult (the live KnowledgeGraph) is correct and a perf win.
  • Refuted (validation is a feature): general sanitizer divergence on non-plain objects (both paths defer to the authoritative structuredClone probe), detached-buffer false-negatives, BigInt/circular in stats (all Number()-wrapped), boxed primitives / SharedArrayBuffer views, probe-induced OOM/hang, deep/cyclic/DAG over-drop, strict-mode double-throw, and transferList detached-buffer reuse.

Headline findings (inline)

  1. [P2] analyze-worker-ipc.ts — type/runtime contract split; the Omit<> guarantee is overstated. (3 lanes · code-read) The type AnalyzeResultIpc = Omit<AnalyzeResult,'pipelineResult'> (:42) includes isPrimaryBranch?, but the runtime projectAnalyzeResultForIpc (:54-61) is an explicit allowlist that omits it — so the type claims a field the wire never carries. The doc's "a future field fails to compile until handled here" holds only for required fields; an optional one (the common case for AnalyzeResult) is silently absent. No live break (parent reads only repoName; isPrimaryBranch?:boolean at run-analyze.ts:200 isn't consumed server-side over IPC). Fix: Pick<AnalyzeResult, 'repoName'|'repoPath'|'stats'|'alreadyUpToDate'|'ftsRepairedOnly'|'ftsSkipped'> makes the allowlist the type — exhaustive by construction.
  2. [P3] clone-safety.ts — the array branch ignores non-index own-enumerable properties. (correctness · reproduced) structuredClone of an array carrying e.g. arr.meta = fn THROWS, but containsNonCloneable/stripNonCloneable iterate numeric indices only (:169-173 / :279-284). So containsNonCloneable returns false, makeWorkerResultCloneSafe leaves the field unrewritten (skipped:[]), the re-post still throws, falls through to the outer catch (post-result.ts:87), and emits {type:'error'} — the same worker-death signal that arms the POOL_SIZE=1 cascade #2112 fixes. Not pipeline-reachable today (no producer attaches non-index array props), but a hole in a net whose purpose is the unanticipated leak. Fix: also scan/copy non-index Object.keys, or fall back to the isStructuredCloneable probe when an array's own-key set exceeds its indices.
  3. [P3] clone-safety.ts findFilePath violates its documented "never throws"; a sanitizer-internal throw escapes recovery. (correctness + adversarial · reproduced) findFilePath (:347 doc) reads rec[key] unguarded in its generic sweep (:353, :361-363; pathFromChild :338-345). A throwing getter at a non-path key (reproduced: makeWorkerResultCloneSafe throws RangeError out) — or a Proxy with a throwing getPrototypeOf/ownKeys trap (adversarial) — escapes to the fail-closed {type:'error'}, mis-reporting a recoverable result as a worker death. Not pipeline-reachable (real records aren't Proxies/throwing getters), but the "recover from ANY failure" promise doesn't cover throws during the sanitizer's own structural enumeration. Fix: try/catch findFilePath's reads (honor its doc) and wrap each element's sanitize in makeWorkerResultCloneSafe to drop-on-throw.

Lower-priority (body only)

  • Maintainability: isDataCloneError (clone-safety.ts:58) is now a dead export — the net recovers on ANY post failure and never inspects the error type; only clone-safety.test.ts imports it (git grep confirms no prod caller). makeWorkerResultCloneSafe's JSDoc still says "Call this ONLY after a real DataCloneError" though the caller is deliberately broader. parsing-processor.ts inlines {path,reason} instead of importing the exported SkippedPath. All trivial.
  • Forward-looking: makeWorkerResultCloneSafe sanitizes only array fields (:401) — a future non-array result sink would bypass the net; Cloneable<any> false-accepts (an any-typed member; add an IsAny branch); the extractTemplateConstraints/collectCaptureSideChannel hook contracts in language-provider.ts don't document the assertCloneable requirement for a future language implementer.
  • Performance: the failure path traverses each dirty element twice (containsNonCloneable then stripNonCloneable). Failure-path-only; the pre-scan is what preserves clean-element referential identity, so it's a deliberate tradeoff — notable only if the failure path becomes hot.
  • Tests: the GREEN integration test asserts graph content but not that skippedPaths/the {type:'warning'} post actually surfaced (its docstring claims it covers that wiring — dispatchChunkParse only logs them); the IPC test uses a synthetic object, not a real KnowledgeGraph; the GITNEXUS_STRICT_CLONE=1 gate exists but no CI job runs the suite under it, so it gives no continuous protection; the "dropped unsalvageable" branch and the mergeResult skippedPaths union are untested.

CI

  • quality / format (root prettier) FAILS on c-cpp.ts + kotlin.ts (the #2143 wrapping lines) — a real merge-blocker, fixable with prettier --write. CI Gate (2s) is the aggregate gating on it. Vercel fails on deploy authorization (not code). Everything substantive passes: typecheck, lint, all test matrices (ubuntu/windows/macOS coverage), tree-sitter ABI, CodeQL, packaged-install smoke, benchmarks.

Automated multi-tool digest (GitNexus swarm + CE personas + live Codex), critic-gated. All findings P2/P3 — none live-reachable from current parse output; verify before acting.

Comment thread gitnexus/src/server/analyze-worker-ipc.ts
// yet clones fine, so a length check would false-positive.
if (obj instanceof ArrayBuffer || ArrayBuffer.isView(obj)) return !isStructuredCloneable(obj);
seen.add(obj);
if (Array.isArray(obj)) {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3 · defense-in-depth hole] (correctness · reproduced) This array branch (and stripNonCloneable at line 279) iterates numeric indices only, but structuredClone of an array also serializes non-index own-enumerable properties and throws on a non-cloneable one. Reproduced: with arr.meta = ()=>{}, structuredClone throws, yet makeWorkerResultCloneSafe returns skipped:[] and leaves the field unrewritten — the re-post then throws, falls through to post-result.ts:87, and emits {type:'error'}, the worker-death signal that arms the POOL_SIZE=1 cascade. Not produced by any current extractor, but it's a silent gap in a net meant for unanticipated leaks. Fix: scan/copy non-index Object.keys here and at line 279, or fall back to the isStructuredCloneable probe when an array's own-key set exceeds its indices. [reproduced]

Comment thread gitnexus/src/core/ingestion/workers/clone-safety.ts Outdated
magyargergo and others added 12 commits June 10, 2026 11:38
…itizer (abhigyanpatwari#2135 review)

structuredClone serializes an array's NON-index own-enumerable properties (e.g.
`arr.meta = fn`) and throws DataCloneError on a non-cloneable one. The clone
sanitizer's array branches iterated numeric indices only, so such an array was
waved through (containsNonCloneable returned false, makeWorkerResultCloneSafe
left the field unrewritten with skipped:[]) — the re-post then threw, fell
through to the fail-closed {type:'error'}, and re-armed the POOL_SIZE=1 cascade
the net exists to prevent.

Add isArrayIndexKey() and, in BOTH containsNonCloneable and stripNonCloneable
array branches (kept in lockstep), scan/strip the non-index own-enumerable keys
after the index loop. A cloneable non-index prop is carried onto the stripped
copy; a non-cloneable one is stripped and recorded. Not reachable from current
parse output (no extractor attaches non-index array props) — a defense-in-depth
hole closed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aping to fail-closed (abhigyanpatwari#2135 review)

findFilePath was documented "never throws" but read element properties
unguarded in its generic sweep — a throwing getter at a non-path key (or a
Proxy with a throwing ownKeys trap) threw out of makeWorkerResultCloneSafe, past
postResultCloneSafe's recovery, to the fail-closed {type:'error'} that under
POOL_SIZE=1 re-arms the cascade the net prevents. Likewise a Proxy with a
throwing getPrototypeOf trap throws inside containsNonCloneable's instanceof
checks.

- findFilePath/pathFromChild now read via safeGet (try/catch) and guard
  Object.keys, honoring the "never throws" contract.
- Each element's sanitize in makeWorkerResultCloneSafe is wrapped: a throw during
  scan/strip drops that one element (recorded as "sanitizer error") rather than
  sinking the whole result — so one pathological element can't fail-close the run.
- Corrected the makeWorkerResultCloneSafe JSDoc ("ONLY after a DataCloneError" →
  after ANY post failure, matching the caller) and documented the deliberate
  failure-path double-traversal (the non-allocating pre-scan is what preserves
  clean-element referential identity).

Tests: a throwing getter on a path-less element is stripped & delivered (not
escaped); a Proxy structural-trap element is dropped, clean siblings survive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…itizer (abhigyanpatwari#2135 review)

makeWorkerResultCloneSafe rewrote only ARRAY result fields, so a future
non-array sink (a nested object / Map result field) carrying a non-cloneable
value — or an array field whose own non-index property the element loop didn't
reach — would survive the sanitizer and throw on the re-post. Add a final
`if (!isStructuredCloneable(result))` gate that strips any remaining offending
field in place, making "the returned result is structured-cloneable" a hard
postcondition independent of future ParseWorkerResult shape. Failure-path-only
and a no-op once the array loop already made the result clean (the per-field
probe short-circuits every clean field, so it adds no work or skip entries then).

Tests: a function on a non-array result field is stripped & the result becomes
cloneable; the gate adds no skip entry when the array loop already cleaned up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-time guard (abhigyanpatwari#2135 review)

`Cloneable<any>` previously resolved to `any` (not `never`), so a payload with
an `any`-typed member — the most likely escape hatch, since `unknown` is already
blocked — passed `assertCloneable` with no compile error. Add an `IsAny<T>`
branch (the canonical `0 extends 1 & T` probe) as the FIRST arm so `any` resolves
to `never`, matching how `unknown` is already rejected. It must precede the
primitive arm: `any extends CloneablePrimitive` would otherwise resolve to `any`
and re-admit it.

The IsAny-first arm perturbs inference for a bare `undefined` literal argument
(T infers as `unknown` → never); real consumers pass `X | undefined` unions
(the provider hooks), which are unaffected (src tsc clean), so the runtime
identity test now uses a `string | undefined` value — the realistic shape.

Tests: an `any` member fails `assertCloneable` (@ts-expect-error, enforced by
tsconfig.test.json) and `Cloneable<any>` resolves to `never` at the type level.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…st, not Omit (abhigyanpatwari#2135 review)

`AnalyzeResultIpc = Omit<AnalyzeResult,'pipelineResult'>` kept every other field
in the type — including optional ones like `isPrimaryBranch?` — so the type
advertised a field the runtime allowlist never sends, and the doc-comment's
"a future field fails to compile until handled here" only held for REQUIRED
fields. Switch to `Pick<AnalyzeResult, …the six scalar fields…>`: the allowlist
IS the type, so the projection return literal is exhaustive by construction
(omitting a key is a compile error) and a new `AnalyzeResult` field is simply
absent from the wire until deliberately added here. `isPrimaryBranch` is
intentionally excluded (nothing consumes it server-side over this fork; the
parent reads only `repoName`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…npatwari#2135 review)

postResultCloneSafe recovers on ANY fast-path post failure and never inspects
the error type (a throwing getter surfaces a RangeError, not a DataCloneError —
gating on the type was the original net-gap bug). isDataCloneError has no
production caller; it was only exercised by its own unit test. Remove the
function and that test block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or (abhigyanpatwari#2135 review)

The clone-safety telemetry accumulator inlined `Array<{path,reason}>` — a
structural duplicate of the exported `SkippedPath`. Import and use the canonical
type so a future rename of its fields is a compile error here instead of a silent
structural drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ndary hooks (abhigyanpatwari#2135 review)

extractTemplateConstraints and collectCaptureSideChannel return `unknown` and
feed values across the worker structured-clone boundary, but the hook contracts
didn't state the cloneability requirement — a future language implementing them
without care could leak a non-serializable value. Document that the return MUST
be structured-clone-safe and should be wrapped with assertCloneable, so the
guarantee is a compile error at the source (abhigyanpatwari#2143).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tegration case (abhigyanpatwari#2135 review)

The GREEN clone-safety integration test asserted only graph content (all files
present), not that the skippedPaths / {type:'warning'} wiring its docstring
claims to cover actually fired. Capture the production logger via _captureLogger
and assert the sanitize telemetry names the offending file (poison.ts) AND the
exact stripped key path (properties.toString) — proving the worker's
skippedPaths append + the parsing-processor warning surfaced end to end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bhigyanpatwari#2135 review)

The IPC projection tests used a hand-built hostile object. Add a case that puts
a real createKnowledgeGraph (whose nodes/relationships getters would materialize
the whole graph under JSON.stringify) in pipelineResult and asserts the
projection drops it entirely — the serialized payload stays under 300 bytes
(a materialized 50-node graph would be thousands), with the scalar fields intact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… merge union (abhigyanpatwari#2135 review)

Two untested clone-safety branches from the tri-review:

- "dropped unsalvageable": a dirty element whose stripped copy is STILL not
  structured-cloneable must be dropped, not delivered (else the re-post throws).
  Add a deterministic test (a non-plain member with a stateful getter that the
  strip-time probe sees clean but that turns into a function on the post-strip
  verification) asserting the element is dropped and the run survives.

- mergeResult skippedPaths union across sub-batches. mergeResult (and its
  appendAll helper) was module-private in the parse-worker ENTRY module, which a
  main-thread test can't import (it runs MessagePort setup). Extract it to a
  side-effect-free result-merge.ts (mirroring post-result.ts) and unit-test the
  union (including the `??=` target-init path), the skippedLanguages sum, and
  array append. parse-worker imports it back; verified the built worker still
  parses + merges via the real-worker integration path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bhigyanpatwari#2135 review)

Clears the failing `quality / format` CI gate (root prettier, not the
gitnexus-local config). Reformats the pre-existing abhigyanpatwari#2143 wrapping lines in
c-cpp.ts + kotlin.ts plus the clone-safety review-fix files touched in this
PR-update (clone-safety.ts and the new/updated tests). Formatting-only — no
behavior change; tsc, the type-level assertions (tsconfig.test.json), and the
unit + integration suites stay green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread gitnexus/test/unit/clone-safety-cloneable.test.ts Fixed
…ertions (abhigyanpatwari#2135 review)

CodeQL flagged the `expect(a && b && c).toBe(true)` lines in the type-level test
assertions as js/trivial-conditional: after type erasure the operands are
constant `true`, so the `&&` chain always evaluates the same. Replace the `&&`
chain with array equality (`expect([...]).toEqual([true, ...])`) — no
conditional, and the real assertions remain the `const x: …IsNever = true` /
`: IsCloneable<…> = true` annotations (enforced by tsconfig.test.json, which
fail to compile if a guard regresses).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@magyargergo magyargergo merged commit 3d30b94 into abhigyanpatwari:main Jun 10, 2026
43 of 46 checks passed
@magyargergo magyargergo deleted the fix/2112-large-repo-degradation branch June 10, 2026 12:47
@magyargergo magyargergo linked an issue Jun 10, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Large Repo Analyze Failed

2 participants