fix(ingestion): lazy-load optional grammars so analyze never crashes when one is missing (#2091, #2093)#2101
Conversation
…when one is missing (abhigyanpatwari#2091, abhigyanpatwari#2093) The scope-resolution registry and the language-provider index statically import every language provider; each optional grammar's query.ts did a top-level `import X from 'tree-sitter-Y'` (swift/dart/kotlin). On a default install where the vendored/optional native binding is absent, that import resolution threw ERR_MODULE_NOT_FOUND at module-load on the main thread — before any runtime gate — crashing `gitnexus analyze` regardless of the repo's actual languages. The worker, parser-loader, and optional-grammars warning helper were all already guarded; only these three query.ts modules defeated the design. - Route the three query.ts getters through the existing lazy, guarded parser-loader.getLanguageGrammar(), so a grammar binding is only required at first use (inside the worker, for a file of that language) — never at module-load. This removes the crash for analyze, the MCP server, and doctor. - Honor GITNEXUS_SKIP_OPTIONAL_GRAMMARS at runtime in parser-loader (it was an install-time-only env): a runtime opt-out for swift/dart/kotlin. Required deps routed through the optional machinery for ABI safety (C, severity:'error') are never skippable this way. - Exclude unavailable-grammar files at the scope-resolution phase too, mirroring the parse phase, so a skipped-language file never falls through to the main-thread re-extract (which would log a noisy "Unsupported language" and needlessly load the grammar on the main thread). - When a grammar is skipped via the runtime opt-out (not a genuinely-missing binding), the parse-phase warning says so instead of suggesting `npm rebuild`. - Close the warning-coverage gaps: add kotlin (.kt/.kts) to the optional-grammar warning list and widen the analyze precheck glob to .swift/.kt/.kts. - Tests: registry static-import-closure regression (no optional grammar binding loads — with a non-vacuity guard that a required binding still does — when importing the scope-resolution registry); a pipeline-level regression that the scope-resolution phase cleanly excludes an opted-out grammar (no "scope extraction failed" noise) while still indexing available languages; and parser-loader runtime skip-gate unit tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 10738 tests passed 16 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
magyargergo
left a comment
There was a problem hiding this comment.
Tri-review — optional-grammar lazy-load fix
Methods (3): GitNexus swarm (risk, test/CI) + Compound-Engineering personas (correctness, adversarial, performance, maintainability, testing) + Codex (live). Engine breakdown: 7 Claude lanes + 1 Codex. Only Codex is a different engine; agreement among the Claude lanes is "consistent across personas," not independent confirmation.
Verdict: no correctness/crash bugs — the core fix is sound
All three methods independently confirmed:
- The lazy
getLanguageGrammar()returns the same object the old static default-import did (CJSmodule.exports; no.defaultinterop) — no parse-result regression for repos that have the grammars. - No circular import (
query.ts → parser-loaderis leaf-direction). - Required-grammar (C) failures are not newly swallowed — still logged at
errorseverity inloadGrammarbefore the scope-phase skip. isRuntimeSkippedGrammarenv-parsing handles all documented forms.
Adversarial + correctness refuted every break attempt (C silent-drop, circular import, env abuse, worker bypass, test false-pass). Refuted nit: the not.toContain('Foo')/Struct assertion is meaningful — swift structs are labeled Struct.
Worth addressing (all P2/P3 — diagnostics, test-robustness, maintainability)
Inline comments mark the six in-diff items. Plus, in body (unchanged file):
- [P2 · Codex+Claude] Worker ignores
GITNEXUS_SKIP_OPTIONAL_GRAMMARS—parse-worker.ts:38-56,388-405keeps its own grammar table. Harmless today (the main-threadparseableScannedfilter excludes skipped-grammar files before dispatch), but the enforcement boundary is undocumented; a future change dispatching files without that pre-filter would silently parse a "skipped" grammar. Worth a comment now; full consolidation (route the worker throughparser-loader) is a planned follow-up.
Lower-priority P3 (not inlined): memoize the env-parse in isRuntimeSkippedGrammar; add a direct unit test for isGrammarRuntimeSkipped; naming consistency isGrammarRuntimeSkipped ↔ isLanguageAvailable; include result.signal in the registry-test diagnostic; factor the 3 near-identical query.ts comment/getter blocks.
Residual (pre-existing, out of scope)
- The worker statically
import C from 'tree-sitter-c', so a broken C ABI still crashes all workers — crash-immunity doesn't fully extend to required grammars in the worker (ties into the worker-consolidation follow-up). - COBOL emits a spurious "npm rebuild tree-sitter-cobol" skip warning (regex parser, no grammar).
CI
Green except a non-code Vercel deploy-auth failure and a pending Build&Push. tsc + eslint clean; targeted + scope-resolution suites pass locally.
Automated 3-tool digest (7 Claude lanes + Codex). 11-file diff fully read. Verify before acting.
| const result: LoadResult = { | ||
| ok: false, | ||
| error: new Error('skipped via GITNEXUS_SKIP_OPTIONAL_GRAMMARS'), | ||
| note: source.unavailableNote, |
There was a problem hiding this comment.
[P3 · Codex+Claude · reproduced] Under a deliberate GITNEXUS_SKIP_OPTIONAL_GRAMMARS opt-out, this skip branch reuses source.unavailableNote ("…no prebuilt .node… failed to load") — which contradicts the opt-out; the (skipped via …) suffix is the only hint. Observed live in the skip unit-test log output. Fix: in the skip branch use a dedicated note (e.g. "disabled via GITNEXUS_SKIP_OPTIONAL_GRAMMARS — unset to re-enable") instead of source.unavailableNote.
| const isRuntimeSkippedGrammar = (key: string, source: GrammarSource): boolean => { | ||
| // Only genuinely-optional grammars (optionalDependencies / vendored), never | ||
| // required deps that merely use the optional machinery for ABI safety. | ||
| if (source.optional !== true || source.severity === 'error') return false; |
There was a problem hiding this comment.
[P2 · maintainability · code-read] Skippability is inferred from optional !== true || severity === 'error'. A future required grammar added as optional:true without severity:'error' would silently become user-skippable. Consider an explicit userSkippable?: boolean on GrammarSource and gate on that — the C-vs-(swift/dart/kotlin) invariant then reads off the SOURCES table instead of a predicate.
| // degraded index. | ||
| try { | ||
| const matches = await glob(['**/*.dart', '**/*.proto'], { | ||
| const matches = await glob(['**/*.dart', '**/*.proto', '**/*.swift', '**/*.kt', '**/*.kts'], { |
There was a problem hiding this comment.
[P2 · maintainability · code-read] This hardcoded glob duplicates OPTIONAL_GRAMMARS[].extensions (optional-grammars.ts) — they must be kept in sync by hand. Export e.g. getOptionalGrammarExtensions() and derive the glob so a future grammar addition tracks automatically.
| }); | ||
|
|
||
| it('skips the Swift file at the parse phase (non-vacuity: Swift was present)', () => { | ||
| expect(messages.some((m) => /Skipping 1 swift file\(s\)/.test(m))).toBe(true); |
There was a problem hiding this comment.
[P2 · testing+adversarial · code-read] /Skipping 1 swift file\(s\)/ is a prefix shared by BOTH the new opt-out message and the old "npm rebuild" message, so the isGrammarRuntimeSkipped routing branch in parse-impl can regress silently. Also assert messages.some(m=>/GITNEXUS_SKIP_OPTIONAL_GRAMMARS/.test(m)) and messages.every(m=>!/npm rebuild/.test(m)).
|
|
||
| beforeAll(async () => { | ||
| prevEnv = process.env[ENV]; | ||
| process.env[ENV] = 'swift'; |
There was a problem hiding this comment.
[P2 · Codex+Claude · code-read] Env set in beforeAll without vi.resetModules(); parser-loader's loadCache is a module singleton, so this is safe only via fork-isolation + the fact nothing calls isLanguageAvailable(Swift) before this. Add vi.resetModules() (then dynamic-import the pipeline), or document the constraint, so a later import-order change can't make it pass vacuously.
| expect(skipped.isLanguageAvailable(SupportedLanguages.Python)).toBe(pyBase); | ||
| }); | ||
|
|
||
| it('a comma list skips only the named grammars (language-id form)', async () => { |
There was a problem hiding this comment.
[P2 · testing · code-read] "skips only the named grammars" asserts Swift is skipped but not that Dart/Kotlin are unaffected — a prefix/union bug wouldn't be caught. Add baseline-comparison assertions that the un-named optional grammars are unchanged (same pattern as the C/Python case).
…r handling (abhigyanpatwari#2101) Polish + test-hardening from the PR abhigyanpatwari#2101 tri-review (no correctness bugs were found; the lazy-load crash fix is unchanged). All P2/P3: - parser-loader: a runtime-skipped grammar now logs an accurate "disabled via GITNEXUS_SKIP_OPTIONAL_GRAMMARS (unset to re-enable)" note instead of reusing the install/platform `unavailableNote` ("no prebuilt .node … failed to load"), which misled a deliberate opt-out. - parser-loader: make skippability explicit — add `userSkippable` to GrammarSource (set on swift/dart/kotlin) and gate the runtime opt-out on it, replacing the implicit `optional && severity!=='error'` predicate. C (required dep via the optional machinery, severity:'error') stays non-skippable. - parser-loader: parse GITNEXUS_SKIP_OPTIONAL_GRAMMARS once (module-scope memo) instead of re-reading + re-allocating a Set on every call. - cli: derive the analyze precheck glob from OPTIONAL_GRAMMARS (no hand-synced duplicate extension list), and make the preflight runtime-skip-aware so an installed-but-opted-out grammar gets an accurate upfront note (reason: skipped) rather than silence — `getOptionalGrammarExtensions()` + a `reason` on MissingGrammar. - parse-worker: document the enforcement boundary — the worker intentionally ignores the runtime env; the main-thread `parseableScanned` filter excludes opted-out files before dispatch. (Comment only; the worker→parser-loader consolidation is the deferred Tier-1 follow-up.) - tests: assert the opt-out message routing (env-var named, no "npm rebuild"); harden the pipeline test's module-cache coherence (vi.resetModules + dynamic import); add comma-list isolation, direct isGrammarRuntimeSkipped, and accurate-note assertions; include result.signal in the registry-test diagnostic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…olute path (abhigyanpatwari#2111) The recurring Windows `EPERM: operation not permitted, symlink` (errno -4048) when adding the MCP server to Antigravity is NOT the abhigyanpatwari#2101/abhigyanpatwari#2110 module-load crash — it is an install-time arborist failure during the `_npx` reify that the MCP client triggers on every `npx gitnexus` launch. Root cause: the `postinstall` materialize step copied each vendored grammar (`vendor/tree-sitter-{c,dart,proto,swift,kotlin}`) into `node_modules/gitnexus/node_modules/tree-sitter-*` as a real package so runtime `require('tree-sitter-dart')` would resolve. Those packages are in no dependency graph, so every subsequent npm/npx reify treats them as **extraneous** and prunes/relocates them — on Windows the relocation goes through `@npmcli/move-file`'s symlink path and throws EPERM (symlinks need Developer Mode/admin), and on every OS the 2nd run silently deletes the grammars. This is the same class as abhigyanpatwari#1728, which the materialize step itself claimed to have fixed. Fix (the prebuildify + node-gyp-build ecosystem pattern): never copy grammars into node_modules. Load each by absolute path from `vendor/<name>` via the new `requireVendoredGrammar` helper — the grammar's own `bindings/node` runs `node-gyp-build(<dir>)` and loads the committed `vendor/<name>/prebuilds/ <platform>-<arch>/…` directly (all 5 ship all 6 tuples). vendor/ is inside the package but not a node_modules subtree, so arborist never sees the grammars and the reify is idempotent — no EPERM, no silent deletion. - new src/core/tree-sitter/vendored-grammars.ts (requireVendoredGrammar / vendoredGrammarDir / VENDORED_GRAMMAR_PACKAGES; VENDOR_ROOT stable in dev+dist) - route all consumers through it: parser-loader, parse-worker, grpc proto, include-extractor (C), http-patterns kotlin, cli optional-grammars probe - postinstall drops the materialize step; build-tree-sitter-grammars.cjs builds in-place under vendor/ (gitignored) and deletes materialize-vendor-grammars.cjs - tests + grammar-introspection helper load grammars from vendor/ too (single source of truth); new vendored-grammars.test.ts guards against reintroducing a bare `require('tree-sitter-<vendored>')` Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…olute path (#2111) (#2144) * fix(grammars): load vendored tree-sitter grammars from vendor/ by absolute path (#2111) The recurring Windows `EPERM: operation not permitted, symlink` (errno -4048) when adding the MCP server to Antigravity is NOT the #2101/#2110 module-load crash — it is an install-time arborist failure during the `_npx` reify that the MCP client triggers on every `npx gitnexus` launch. Root cause: the `postinstall` materialize step copied each vendored grammar (`vendor/tree-sitter-{c,dart,proto,swift,kotlin}`) into `node_modules/gitnexus/node_modules/tree-sitter-*` as a real package so runtime `require('tree-sitter-dart')` would resolve. Those packages are in no dependency graph, so every subsequent npm/npx reify treats them as **extraneous** and prunes/relocates them — on Windows the relocation goes through `@npmcli/move-file`'s symlink path and throws EPERM (symlinks need Developer Mode/admin), and on every OS the 2nd run silently deletes the grammars. This is the same class as #1728, which the materialize step itself claimed to have fixed. Fix (the prebuildify + node-gyp-build ecosystem pattern): never copy grammars into node_modules. Load each by absolute path from `vendor/<name>` via the new `requireVendoredGrammar` helper — the grammar's own `bindings/node` runs `node-gyp-build(<dir>)` and loads the committed `vendor/<name>/prebuilds/ <platform>-<arch>/…` directly (all 5 ship all 6 tuples). vendor/ is inside the package but not a node_modules subtree, so arborist never sees the grammars and the reify is idempotent — no EPERM, no silent deletion. - new src/core/tree-sitter/vendored-grammars.ts (requireVendoredGrammar / vendoredGrammarDir / VENDORED_GRAMMAR_PACKAGES; VENDOR_ROOT stable in dev+dist) - route all consumers through it: parser-loader, parse-worker, grpc proto, include-extractor (C), http-patterns kotlin, cli optional-grammars probe - postinstall drops the materialize step; build-tree-sitter-grammars.cjs builds in-place under vendor/ (gitignored) and deletes materialize-vendor-grammars.cjs - tests + grammar-introspection helper load grammars from vendor/ too (single source of truth); new vendored-grammars.test.ts guards against reintroducing a bare `require('tree-sitter-<vendored>')` Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(grammars): throw on a non-vendored name in requireVendoredGrammar Drift guard (PR #2144 review, P3): validate the argument against VENDORED_GRAMMAR_PACKAGES and fail loudly on an unknown name, so the three grammar lists (package set / CLI probe / build registry) drifting out of sync surfaces as a clear error instead of a confusing absolute-path require miss. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(grammars): prepack guard against stray vendor/<g>/build/ shadowing prebuilds Publish hygiene (PR #2144 review, P2). Now that build-tree-sitter-grammars.cjs source-builds into vendor/<name>/build/, a stray build dir would ship in the tarball (files:["vendor"] overrides .gitignore/.npmignore) AND shadow the committed prebuild — node-gyp-build resolves build/Release before prebuilds/. assert-publish-grammar-coverage.cjs (prepack) now fails `npm pack` if any vendor/*/build exists (findStrayBuildArtifacts), with a clear `rm -rf` fix hint. Adds unit coverage for the new pure function. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(grammars): harden the #2111 no-bare-require regression guard PR #2144 review (P2). The guard regex missed dynamic import(), side-effect `import 'x'`, /subpath, and backtick loads, and only scanned src/. It now covers every node_modules-forcing form (single/double/backtick quotes, optional subpath), scans test/ too (excluding fixtures and the guard file itself), drops the `//`-substring false-negative (leading-comment-only heuristic), and adds a self-test asserting every load form is caught while prose mentions and tree-sitter-cpp are ignored. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(grammars): correct stale vendored-grammar comments PR #2144 review (P3). kotlin/query.ts called tree-sitter-kotlin an "optionalDependency" — it is vendored and loaded from vendor/ by absolute path (#2111). proto.ts now states its remaining `_require` is only for the real `tree-sitter` dependency, not a vendored grammar (which goes through requireVendoredGrammar). Comment-only; no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
gitnexus analyzeaborts at module-load withERR_MODULE_NOT_FOUNDfor an optional tree-sitter grammar (e.g.tree-sitter-swift) on any install where that grammar's native binding is absent — regardless of the repo's actual languages. This blocks all indexing (incl. the.githooksauto-reindex), the MCP server, anddoctor.Closes #2091. Closes #2093.
Root cause
The optional grammars are swift / dart / kotlin (cobol is regex-based with no grammar; vue reuses the TypeScript grammar;
tree-sitter-protois a guarded gRPC-extractor concern, not an ingestion language). Eachlanguages/<lang>/query.tsdid a top-level staticimport X from 'tree-sitter-Y'. The scope-resolution registry (scope-resolution/pipeline/registry.ts) and the language-provider index (languages/index.ts) statically import every provider, so thosequery.tsmodules are pulled onto the main thread at module-load — and the static grammar import throwsERR_MODULE_NOT_FOUNDbefore any runtime gate can run.parse-worker.ts,parser-loader.ts, and theoptional-grammarswarning helper were all already guarded; only these threequery.tsmodules defeated the design.GITNEXUS_SKIP_OPTIONAL_GRAMMARSdidn't help because the static import fails before any runtime check, and that env was install-time-only.Changes
query.tsgetters (getXParser/getXScopeQuery, consumed only by the siblingcaptures.ts) now resolve the grammar through the existing lazy, guardedparser-loader.getLanguageGrammar(). No grammar is required at module-load — only at first use, inside the worker, for a file of that language.parser-loadernow honorsGITNEXUS_SKIP_OPTIONAL_GRAMMARSat runtime (1/true/all/*or a comma list of lang-ids /tree-sitter-*package names). Required deps routed through the optional machinery for ABI safety (C,severity:'error') are never skippable this way.isLanguageAvailable(mirroring the parse phase), so an unavailable/opted-out language's files are cleanly excluded instead of falling through to the main-thread re-extract (which logged a noisy "Unsupported language" and needlessly loaded the grammar on the main thread).npm rebuilda grammar that built fine. Added kotlin (.kt/.kts) to the optional-grammar warning list and widened the analyze precheck glob to.swift/.kt/.kts(was.dart/.protoonly).Testing
test/integration/optional-grammars/registry-import-closure.test.ts: importing the built scope-resolution registry loads no optional grammar binding (with a non-vacuity guard that a required binding still loads, proving the import chain is still walked). Characterization-first — fails on the pre-fix static import.test/integration/optional-grammars/skip-optional-pipeline.test.ts: drives the real pipeline over a Python+Swift repo with the opt-out set — Python indexes, Swift is cleanly excluded, no "scope extraction failed" noise.test/unit/parser-loader-skip-optional.test.ts: runtime skip-gate semantics (install-state-robust).tsc+ eslint clean; targeted + full scope-resolution suites pass (1189 tests); end-to-endanalyzeworks both with the grammar present (Swift indexed) and with it skipped (Swift excluded, exit 0, no crash).🤖 Generated with Claude Code