feat(install): toolchain-free tree-sitter via vendored prebuilds#2113
Conversation
… prebuilds
Eliminate the C/C++-toolchain requirement at install for the at-risk grammars
(dart, proto, kotlin) by generating + vendoring native prebuilds, mirroring the
existing vendored tree-sitter-swift. The 10 grammars that already ship 6 upstream
prebuilds stay npm dependencies (toolchain-free AND dependency-review-tracked).
- .github/workflows/build-tree-sitter-prebuilds.yml: a registry-parameterized
workflow that builds {dart,proto,kotlin} x {linux,darwin,win32}-{x64,arm64}
prebuilds natively, validates each loads + parses on its arch, and opens a PR
vendoring them. A `guard` job gates the heavy matrix to run ONLY on dispatch
or a real grammar-version change — ordinary code PRs cost zero matrix minutes.
- dart/proto: prefer a committed prebuild; fall back to today's source build
when none matches (no behavior change until prebuilds are vendored).
- kotlin: vendor it (Swift parity) instead of compiling the third-party
optionalDependency from source at the user's install — supersedes abhigyanpatwari#2110's
optionalDependency mechanism. The ~23 MB parser.c is NOT vendored (the
workflow builds from the published package); only node-types + bindings +
prebuilds are. Removed from optionalDependencies; lock regenerated; probe,
parser-loader note, README/.devcontainer docs, and the abhigyanpatwari#2110 tests updated.
DO NOT MERGE until vendor/tree-sitter-kotlin/prebuilds/ is populated by the
build-tree-sitter-prebuilds workflow: until then Kotlin is unavailable (vendored
with no source-build fallback). dart/proto remain fully functional throughout.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 10825 tests passed 16 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
Regression guard so a toolchain-less install can never silently lose a tree-sitter
language on a supported platform-arch:
- Vendored grammars (vendor/tree-sitter-*): every one MUST ship a loadable N-API
prebuild for all 6 tuples {linux,darwin,win32}-{x64,arm64}. Asserts the
napi_register_module_v1 entry symbol in each .node (cross-platform, no need to
run the binary). Currently RED for dart/proto/kotlin until the
build-tree-sitter-prebuilds workflow populates their prebuilds/ — this is the
must-fill-before-merge gate (swift already passes 6/6).
- npm-dependency grammars: asserts upstream ships 6/6 N-API too, catching a
future platform drop. tree-sitter-c is allow-listed at 4/6 (missing
linux-arm64/win32-arm64) pending abhigyanpatwari#2116; the guard also fails if that gap is
silently closed (prompting allow-list removal).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Added: full prebuild/ABI coverage verification + regression guardVerified every tree-sitter grammar for native-binding coverage across Result: 10 npm grammars + vendored New regression guard (
So the |
…builds (abhigyanpatwari#2116) tree-sitter-c is the one grammar dependency upstream ships incomplete prebuilds for (4/6 — no linux-arm64/win32-arm64), AND it is a REQUIRED grammar: its own `install` (node-gyp-build) compiles from source when no prebuild matches and exits non-zero, so on a toolchain-less ARM host `npm install gitnexus` HARD-FAILS at the c step — during npm's dependency phase, before any GitNexus postinstall runs (so a postinstall "supplement" can't help). Fix: vendor c prebuild-only at the pinned 0.21.4 (Kotlin pattern), with all six prebuilds GitNexus-cross-built, and drop it from `dependencies`: - vendor/tree-sitter-c/ (bindings + node-types + manifest + prebuilds); build probe scripts/build-tree-sitter-c.cjs; added to the build workflow registry (kind 'npm' — built from c@0.21.4 source). - materialize-vendor-grammars.cjs: c is REQUIRED, so it is always materialized, even under GITNEXUS_SKIP_OPTIONAL_GRAMMARS (it needs no toolchain). - Removed from package.json dependencies + lockfile (nothing else needs npm c — tree-sitter-cpp's dep on c is dev-only and not installed). Preserves the abhigyanpatwari#1242 ABI pin: vendoring 0.21.4 keeps the good ABI while closing the ARM gap. - parser-loader note + the prebuild-coverage guard + a cli-commands assertion updated; c moves from the npm-gap allow-list into the vendored 6/6 cohort. Verified: tsc clean, 31 unit tests pass, c loads/parses; the guard is RED for c/dart/proto/kotlin until the workflow populates prebuilds (the must-fill gate). Closes the operational risk in abhigyanpatwari#2116. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… pre-prebuilds The vendored prebuild-only grammars (c, kotlin) had empty prebuilds/ until the build-tree-sitter-prebuilds workflow runs, so they could not load in CI — and C is hard-required by cross-platform tests (tree-sitter-languages/parsing on ubuntu+macos+windows), which I cannot pre-build for macos/windows locally. The robust fix is a source-build fallback that works on every CI runner (all have a toolchain), mirroring dart/proto: - Vendor the grammar source (binding.gyp + src/) for c and kotlin; their build scripts now PREFER a committed prebuild (toolchain-free) and fall back to `node-gyp rebuild` from the vendored source when no prebuild matches. Verified both compile against the hoisted node-addon-api@^8 and the runtime loads. - prebuild-coverage guard is now bootstrap-tolerant: a grammar that vendors its source (binding.gyp) may have an incomplete prebuild set (the workflow fills it); a prebuild-only grammar (swift) still must ship all six. Any present prebuild must still be N-API. Guard goes green; it re-tightens per-grammar as the workflow populates prebuilds. - actionlint: silence a false-positive SC2016 (JS template literals inside the single-quoted `node -e` validate block). Note: kotlin's generated parser.c is large (~23 MB on disk; compresses heavily in git). Once the workflow populates all six kotlin prebuilds, the source serves only as the fallback and could be slimmed if desired. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`npm prune --omit=dev` in the gitnexus CLI image drops anything not in package.json's dependency tree — including the VENDORED tree-sitter grammars (materialized by postinstall, not declared deps) and their built bindings. The `serve` image analyzes/parses repos at runtime, so re-run the grammar postinstall after the prune (in the toolchain-equipped builder) to restore them. Load-bearing for tree-sitter-c, a core REQUIRED grammar now vendored (abhigyanpatwari#2116): as a former dependency it survived prune; vendored, it would not. Also restores swift/dart/proto/kotlin, which were silently pruned from the image before. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d pipeline Swift was the last grammar handled differently — it shipped only upstream prebuilds, while c/dart/proto/kotlin vendor their grammar source and use a prefer-prebuild -> source-build-fallback activation script. Vendor swift's source so all five are handled identically (one uniform build path). - vendor/tree-sitter-swift: add binding.gyp (win-hardened), bindings/node/ binding.cc, src/parser.c (ABI-14 default, ~18 MB), src/scanner.c, and src/tree_sitter/ headers. The 6/6 prebuilds are retained. The legacy parser_abi13.c alternate is intentionally not vendored. - build-tree-sitter-swift.cjs: rewrite the prebuild probe into the dart-style prefer-prebuild then source-build fallback (keeps the GITNEXUS_SKIP gate and the never-exit-non-zero postinstall invariant). - build-tree-sitter-prebuilds.yml: register swift (kind 'vendored'); add its package.json to the version-gated pull_request paths and a validate snippet. - prebuild-coverage guard auto-moves swift into the source-fallback cohort (binding.gyp now present); refresh the stale "swift is prebuild-only" comments. - tests: add build-tree-sitter-swift-probe.test.ts; fix the pre-existing build-tree-sitter-kotlin-probe.test.ts breakage (it still asserted the old probe strings after kotlin's dart-style conversion); assert swift's vendored source in cli-commands.test.ts. - docs: README / .devcontainer / kotlin vendor README — swift's prebuilds are now GitNexus-cross-built from vendored source like the rest, not upstream-only. Verified: swift source-builds against node-addon-api@8 -> N-API binary -> loads against the pinned tree-sitter@0.21.1 (ABI 14) -> parses cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ge guard Vendoring grammar source (parser.c) alongside the prebuilds means the npm tarball now carries ~50 MB of generated source it almost never compiles (every supported platform-arch has a prebuild). Prepare to drop it from the published package once all prebuilds exist — safely. - .npmignore: add a GATED, commented-out "lean publish" block that excludes the source-build inputs (parser.c/scanner.c/tree_sitter/binding.gyp/binding.cc) but keeps prebuilds/ + the runtime files. Uncommenting ships prebuilds-only. - scripts/assert-publish-grammar-coverage.cjs: a prepack guard that refuses to pack/publish if the source exclusion is active while any vendored grammar still lacks 6/6 prebuilds (which would ship a grammar with no loadable binding). Wired into `prepack` (runs on npm pack + publish, incl. the publish.yml dry-run) and exposed as `npm run assert-publish-coverage`. - test: pure-core decision cases + a real-repo publish-safety check that fails CI if .npmignore is activated prematurely. Net: the prebuilds already publish today (files: ["vendor"]); this makes the future switch to a prebuilds-only tarball a one-line uncomment that can't ship a dead grammar. The guard currently reports "source + prebuilds" (only swift has 6/6 prebuilds so far) and passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The per-grammar activation scripts (c/dart/proto/swift/kotlin) were ~95% identical — same prefer-prebuild → source-build → never-fail flow, differing only in name, target_name, required-vs-optional, and the display label in warnings. - scripts/build-tree-sitter-grammars.cjs: one registry-driven script. Bare call builds all (postinstall); `... <name>` builds only the named grammars (so the probe test can isolate one). c is `required: true` (ignores the opt-out gate); the rest honor GITNEXUS_SKIP_OPTIONAL_GRAMMARS. Per-grammar try/catch + a final process.exit(0) preserve the postinstall never-exit-non-zero invariant. - package.json: postinstall is now `materialize && build-tree-sitter-grammars.cjs` (was five chained `build-tree-sitter-<name>.cjs` calls). - tests: replace the two near-identical *-probe.test.ts files with one parameterized build-tree-sitter-grammars-probe.test.ts that also covers the required-vs-optional opt-out split and an unknown-grammar arg. - update cli-commands.test.ts postinstall assertions + the vendor c/kotlin/swift README + swift provenance to reference the consolidated script. Behavior is preserved (warnings normalized to one consistent format). Removes 5 scripts + 1 test file; adds 1 script + 1 test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
magyargergo
left a comment
There was a problem hiding this comment.
Tri-review (3 methods, Codex live) — toolchain-free vendored tree-sitter
Methods & engines. GitNexus swarm + Compound-Engineering personas (both Claude) + Codex (the one independent engine, live) + two requested personas (DevOps/release-pipeline, npm native-build). 4 of 6 Claude lanes returned full structured findings (correctness, adversarial, DevOps, npm-native); 2 (risk, test/CI) ended mid-investigation, their domains covered by the others. Claude-lane agreement = "consistent across personas," not independent; the strong signal is Codex+Claude.
What's solid (the reviews validated this). The consolidated build script's required-vs-optional gate + never-exit-non-zero invariant; the Docker prune → re-postinstall → COPY chain (builder has the toolchain, so the .node + node-gyp-build reach runtime); the publish coverage-guard correctly blocking a lean tarball while prebuilds are incomplete; the cost-gating (an ordinary PR triggers zero native matrix — confirmed); least-privilege workflow perms + persist-credentials: false; N-API / ABI-14 consistency. Genuine care here.
🔴 Headline — P1 (resolve before un-drafting/merge)
Unguarded static import C from 'tree-sitter-c' crashes analyze at module-load when C has no binding. tree-sitter-c is now vendored prebuild-only and removed from dependencies, but it has 0/6 committed prebuilds (only .gitkeep). On a toolchain-less host (source-build fails) or under npm install --ignore-scripts (materialize never runs), C ends up with no binding. Three sites import it with a hard static ESM import that runs before parser-loader's optional/degradation machinery — parse-worker.ts:7, core/ingestion/languages/c/query.ts:2 (eager on the main thread via languages/index.ts → c-cpp.ts → c/index.ts), core/group/extractors/include-extractor.ts:5 — so the import throws ERR_MODULE_NOT_FOUND at module-load and crashes the whole run, instead of the clean "C disabled, other languages fine" the parser-loader entry intends. This is the exact bug class #2091/#2093 fixed for Swift/Dart/Kotlin via the lazy getLanguageGrammar pattern; C was left static because it used to be an always-present npm dep.
- Mechanism: adversarial (Claude) lane + coordinator code-read + the #2091/#2093 precedent. Codex independently flagged the
--ignore-scriptsfailure path (its F2), corroborating the install-failure trigger — but it framed the result as "parser loader fails," not the static-import module-load crash, so the crash mechanism itself is single-Claude-lane + coordinator.[code-read] - Fix: commit C's six prebuilds before un-drafting, or convert the three C imports to the lazy/guarded pattern Swift/Dart/Kotlin already use.
🔴 P0 — supply chain (pre-merge)
attest-build-provenance is pinned to bd77c077… commented # v2.4.0 # PLACEHOLDER-PIN, but real v2.4.0 = e8998f94… (coordinator-verified, GitHub API) — an untagged mid-stream commit, so the SLSA-attestation step runs unvetted action code and the comment misrepresents what runs. The PR's own # PLACEHOLDER-PIN — verify before merge marker flags it as unfinished; the specific finding is that the SHA is wrong, not merely unverified. The sibling setup-python@a26af69b… # v5.6.0 PLACEHOLDER-PIN is, by contrast, a correct pin (coordinator-verified) — only its comment is stale. (Inline on the workflow line.)
Other findings (body)
- P1 (DevOps) —
aggregatehard-fails with no graceful skip ifRELEASE_APP_ID/RELEASE_APP_PRIVATE_KEYare unset; the "open a PR with prebuilds" feature can't run until the App is provisioned, after a full 6-runner build spends its budget. Gate on secret presence + fall back toopen_pr=falseartifacts.[code-read] - P2 (Codex F3 + correctness + npm-native — STRONG) — the publish guard detects source-exclusion only via the
vendor/**/src/parser.ctoggle; a partial.npmignoreedit (e.g. excludingbinding.gypbut leavingparser.ccommented) passes the guard while shipping an unbuildable grammar. Validate the full source-build set, or assert againstnpm pack --dry-run --json. - P2 (Codex F5 + adversarial + correctness — STRONG) — test false-confidence:
gitnexus/test/unit/prebuild-coverage.test.ts's strict 6/6 assertion is dormant wheneverbinding.gypexists (all grammars have it), so a missing prebuild passes CI silently; andparser-loader-abi.testonly drives the lazy loader, so it's blind to the P1 crash. CI is green precisely because CI hosts have a toolchain. Add a per-grammar "fully-prebuilt" gate + a smoke that imports the real static-import surface. - P2 (npm-native + adversarial) — "toolchain-free" is aspirational today: only Swift has 6/6 prebuilds; c/dart/proto/kotlin are source-build-only until
build-tree-sitter-prebuilds.ymlruns (itself blocked on the P0 SHA + new runners + RELEASE_APP secrets). This transitional state is what makes the P1 live. - P2/P3 (DevOps) —
gitnexus/package-lock.jsonin the workflowpaths:fires the (cheap) guard on most dependency PRs; the new arm runners (esp.windows-11-arm) are unproven in this repo (failure mode is safe —fail-fast: false+ the 6/6aggregateassertion refuse a partial set); the 30-min build timeout is tight for the 23.6 MB kotlinparser.c. - P3 (Codex F4 — verify) —
aggregate'sif: inputs.open_pr != falseisnullonpull_requestevents; GHA null-coercion makes the direction non-obvious (over-fire per Codex's read, or under-fire the documented version-change-PR → prebuild-PR flow). Make it explicitly event-gated. - P3 (npm-native) — promote
node-gyp-build/node-addon-apito gitnexus's owndependencies(currently safe under--omit=optionalonly via the requiredtree-sitter's transitive edge — see Refuted); add the/std:c11 /utf-8win block to c'sbinding.gypfor parity (inert today — no non-ASCII bytes in anyparser.c/scanner.c);AGENTS.md:177still calls kotlin/swift "optional" (stale). - P3 (correctness) — pre-existing materialize double-rename-failure edge could leave a grammar unmaterialized (very low probability).
✅ Validated / refuted (the reviews doing their job)
- REFUTED — Codex F1 (P1:
node-gyp-buildoptionalDependency →--omit=optionalbreaks C). The requiredtree-sitter@0.21.1declaresnode-addon-api+node-gyp-buildas regular dependencies (lockfile verified), as do 12 other regulartree-sitter-*deps — sonode-gyp-buildsurvives--omit=optionaland the npm-11 arborist prune, and every vendored grammar'srequire("node-gyp-build")resolves. The independent engine's plausible P1 is a false positive; two Claude lanes + the coordinator's lockfile check refute it. (The P3 hardening note above is the residual.) - Docker image, the publish guard's "can't ship a fully-dead grammar," and
GITNEXUS_SKIP_OPTIONAL_GRAMMARS=1were all probed and found safe.
CI: ABI gates ×3, packaged-install smoke (ubuntu+windows), lint/format/typecheck/actionlint/zizmor/CodeQL/gitleaks/Trivy all green; build-matrix + "Vendor prebuilds" correctly skip (no version change); a few pending; Vercel = deploy-auth (ignore).
Coverage: read the substantive diff (scripts, parser-loader, workflow, Dockerfile, .npmignore, package.json, vendor binding.gyp/package.json); the generated parser.c/node-types.json/binaries were not line-read.
Automated multi-tool digest (3 methods, Codex live). Verify before acting.
| 'usually indicates a corrupted install, an unsupported Node version, ' + | ||
| 'or a native ABI mismatch with the bundled tree-sitter runtime. ' + | ||
| 'Try `npm rebuild tree-sitter-c` or reinstalling, then re-run analyze. ' + | ||
| 'C parsing disabled: vendored `tree-sitter-c` (under ' + |
There was a problem hiding this comment.
P1 (resolve before un-drafting) — this C degradation is bypassed by unguarded static imports. This optional / severity: error entry is meant to turn a missing tree-sitter-c binding into a clean "C disabled, other languages fine." But tree-sitter-c is now vendored prebuild-only with 0/6 committed prebuilds, and three sites import it with a hard static ESM import that runs before this loader and never reaches it: parse-worker.ts:7, core/ingestion/languages/c/query.ts:2 (eager on the main thread via languages/index.ts → c-cpp.ts → c/index.ts), and core/group/extractors/include-extractor.ts:5.
On a toolchain-less host (source-build fails) or under npm install --ignore-scripts (materialize never runs), C has no binding → those imports throw ERR_MODULE_NOT_FOUND at module-load → the whole analyze crashes, never reaching this degradation. This is the exact bug class #2091/#2093 fixed for Swift/Dart/Kotlin via the lazy getLanguageGrammar pattern; C was left static because it used to be an always-present npm dep.
Fix: commit C's six prebuilds before un-drafting, OR convert the three C imports to the lazy/guarded pattern. (Anchored here on the bypassed degradation handler — the actual crash sites are the three import lines above. Adversarial lane traced the mechanism; Codex independently flagged the --ignore-scripts path; verified by code-read + the #2091/#2093 precedent.) [code-read]
tree-sitter-c is now vendored prebuild-only (abhigyanpatwari#2116) with 0/6 committed prebuilds, so on a toolchain-less or `--ignore-scripts` install C has no native binding. Three modules loaded it via a hard top-level `import C from 'tree-sitter-c'`, which throws ERR_MODULE_NOT_FOUND at module-load — crashing `analyze` before parser-loader's optional/severity:error degradation can run. This is the abhigyanpatwari#2091/abhigyanpatwari#2093 bug class (previously fixed for swift/dart/kotlin); C was left static because it used to be an always-present npm dependency. - languages/c/query.ts: load via the lazy guarded getLanguageGrammar(C), mirroring swift/query.ts; the main-thread isLanguageAvailable filter ensures the getters are reached only when C is present. - workers/parse-worker.ts: guarded `_require('tree-sitter-c')` + conditional languageMap spread, like swift/dart/kotlin. - group/extractors/include-extractor.ts: guarded `_require`; getLanguageForFile returns null for .c/.h when absent, so C include-extraction degrades to a no-op (C++ unaffected). - extend the registry-import-closure regression test (abhigyanpatwari#2091/abhigyanpatwari#2093) to assert C also loads lazily at registry static-import time.
The workflow pinned actions/attest-build-provenance@bd77c077… commented `# v2.4.0`, but v2.4.0 is e8998f94… (verified via the GitHub API); bd77c077… is an untagged mid-stream commit, so the SLSA-attestation step ran unvetted action code and the comment misrepresented what runs. Repin to the real v2.4.0 commit and drop the `# PLACEHOLDER-PIN` markers on both this line and the setup-python pin (a26af69b… is already the correct v5.6.0 — only its comment was stale). Update the header NOTE accordingly.
…absent The aggregate job mints a GitHub App token as its first step; with RELEASE_APP_ID/RELEASE_APP_PRIVATE_KEY unset it hard-failed AFTER a full (up-to-6-runner) native build. Since the `secrets` context isn't available in a job-level `if:`, the guard job now computes a `release_app` boolean output (a step can read secrets) and emits an actionable `::notice::`; aggregate gates on it and skips cleanly, while the build job's artifacts still upload (run with open_pr=false for artifacts-only).
…en build timeout `gitnexus/package-lock.json` changes on nearly every dependency PR, so it fired the prebuild workflow's guard job on unrelated churn (the matrix stayed correctly skipped — `gitnexus/package.json` already covers the transition-window pin, so removing the lock only drops guard noise). Also bump the native build job timeout 30 -> 45 min for headroom compiling the 23 MB kotlin / 18 MB swift parser.c, especially under arm emulation.
`inputs.open_pr` is null on pull_request events, and the prior `inputs.open_pr != false` leg relied on GHA's direction-ambiguous null coercion (Codex F4) to decide whether to open the prebuild PR. Gate explicitly on the event: a non-fork pull_request that bumped a grammar version opens the prebuild PR (the documented flow), and `open_pr` is only consulted on workflow_dispatch — so a manual run with open_pr=false stays artifacts-only and no event's behavior rests on coercion.
…e guard The publish guard inferred "is source shipped?" from a single .npmignore toggle line, which a partial/out-of-order edit could defeat (exclude binding.gyp but leave parser.c → unbuildable yet "source-shipping"). It now inspects the EFFECTIVE tarball via `npm pack --dry-run --ignore-scripts --json` (the --ignore-scripts avoids re-entering this guard through prepack): a grammar "ships source" only when EVERY on-disk source-build input (binding.gyp + binding.cc + parser.c + scanner.c when present + a tree_sitter header) is actually in the packed file list. This also surfaced that the gated lean-publish .npmignore block was inert: package.json's `files: ["vendor"]` allow-list overrides .npmignore for the vendored subtree, so those exclusion lines never dropped anything. Replace the dead toggle with documentation of the real mechanism (narrow the `files` field) and note the guard enforces safety on the effective pack regardless of how the slim is done.
…erage The strict 6/6 prebuild assertion was dormant whenever a grammar vendors source (binding.gyp) — which is every grammar — so a dropped prebuild passed CI silently. Add a FULLY_PREBUILT allowlist of grammars GitNexus has committed 6/6 for (today: swift); those must keep all six even with a source fallback, so losing one now fails CI. Grammars graduate into the set as the build-tree-sitter-prebuilds workflow lands their binaries. (The static-import degradation smoke is covered by the registry-import-closure regression test extended in the C lazy-load commit.)
…ncies
Every vendored grammar's index.js does `require("node-gyp-build")` at runtime
to load even a prebuilt .node, so node-gyp-build is runtime-load-critical (and
node-addon-api is needed for the source-build fallback). They were
optionalDependencies, surviving `--omit=optional` only via the required
tree-sitter's transitive edge — correct today but fragile. Promote both to
regular dependencies so the contract is explicit (optionalDependencies is now
empty and removed). Lock the contract with a cli-commands assertion.
…ng.gyp c's binding.gyp used an unconditional `cflags_c: ["-std=c11"]`, while kotlin/swift gate MSVC flags behind an `OS=='win'` condition (/std:c11 /utf-8). Inert today (no non-ASCII bytes in c's parser.c, and node-gyp ignores cflags_c on MSVC anyway), but align the three so a future source-build fallback on Windows behaves consistently.
AGENTS.md still said postinstall "patches tree-sitter-swift, builds tree-sitter-proto" and that only kotlin/swift are "optional". Update to the vendored-uniform model: postinstall materializes the vendored grammars and prefers a committed prebuild (source-build only when none matches); c is required while dart/proto/swift/kotlin are optional + skippable via GITNEXUS_SKIP_OPTIONAL_GRAMMARS=1, with non-fatal warnings only on a toolchain-less host with no matching prebuild.
…lize rollback If renameSync(partial, dest) failed AND the rollback renameSync(backup, dest) also failed, the grammar was left unmaterialized (node_modules/<name> missing) with only a generic "could not materialize" warning — the recoverable backup at <dest>.materialize-bak was unmentioned. Emit a CRITICAL warning naming the backup path and the recovery command on that double-failure, and document that the fail-soft catch removes only the scratch `partial`, never the `backup` (which may be the sole recoverable copy). Never-throw / exit-0 contract intact.
The prepack guard shelled out to `npm pack --dry-run --ignore-scripts --json`, but the `--ignore-scripts` flag is not reliably honored by npm pack's prepare/prepack lifecycle on the CI npm — so build.js ran, polluted the --json stdout with `[build] …`, and the guard's JSON.parse threw. That broke every `npm pack` (packaged-install-smoke on ubuntu+windows) and failed the guard's own real-repo unit test (the only coverage-job failure). Force script-skipping via the reliable `npm_config_ignore_scripts` env config (also removes the prepack re-entry/recursion risk) and parse defensively from the JSON-array start.
…ot `npm pack` The npm-pack-based guard timed out in CI: `npm pack`'s prepare/prepack lifecycle is not skipped by `--ignore-scripts` (flag or env config) on the CI npm, so the inner pack ran the full build (~20s+) — fine for the slow smoke job, but it blew past vitest's 30s test timeout in the coverage job (and risked re-entering this prepack guard). Replace it with a deterministic, fast (~0.1s) check that needs no subprocess: since `files: ["vendor"]` OVERRIDES `.npmignore` for the vendored subtree (so `.npmignore` can never drop vendored source — verified), the ONLY lever that can exclude source is narrowing the package.json `files` field. The guard now reads `files` directly: a grammar "ships source" iff `files` includes the vendor subtree AND the grammar carries a buildable source set on disk. A lean publish that narrows `files` while a grammar lacks 6/6 prebuilds still fails the gate.
Adds a weekly (+ dispatchable) workflow that checks each vendored grammar against its source-of-origin (npm for swift/kotlin, the GitHub default branch for dart/proto; c is excluded — held at 0.21.4 for ABI safety) and opens a PR re-vendoring any update that is ABI-COMPATIBLE with the pinned tree-sitter@0.21.1 (LANGUAGE_VERSION 13-14). ABI awareness is the point: most upstreams have moved to ABI 15 (newer tree-sitter), so a blind "bump to latest" would open PRs that can't build. The monitor fetches the candidate source, reads its parser.c LANGUAGE_VERSION, and only re-vendors 13/14 — incompatible updates are reported (notice + job summary), never applied. (Confirmed live: dart/proto upstreams are ABI 15 today and are correctly held; swift/kotlin are current.) The re-vendor refreshes only the source-build inputs + runtime entrypoints, preserving the GitNexus-hardened binding.gyp / README / prebuilds; the version bump then triggers build-tree-sitter-prebuilds.yml, whose ABI-validation is the final safety net so a subtly-wrong re-vendor can't silently ship. PR creation is gated on the RELEASE_APP secret (skips with a notice if absent), mirroring the build aggregate. Unit test locks the ABI gate; the script is import-safe.
c was excluded from the update monitor, so an upstream c update went unnoticed. Include it, but as report-only via a `hold`: c is ABI-pinned at 0.21.4 (abhigyanpatwari#1242/abhigyanpatwari#858) and must not auto-bump without a tree-sitter runtime upgrade, so an available c update is detected + surfaced (notice + job summary) but never auto-PR'd — even if it were ABI-13/14. `--apply c` refuses defensively. (Live: upstream c is 0.24.1 / ABI 15 today, so c is doubly held — reported, not applied.)
CodeQL flagged the GitHub-tarball fetch — it used `bash -c "gh api …/tarball/$ref
> src.tgz && tar xzf src.tgz"`, interpolating the API-derived ref into a shell
command (the shell-command-injection family: "this shell command depends on an
uncontrolled file name"). Replace it with a shell-free path: capture `gh api`'s
binary tarball as a Buffer via execFileSync, write it to a fixed file, and
extract with execFileSync('tar', …). No shell, no injection surface. Verified the
dart/proto fetch + ABI read still work.
abhigyanpatwari#2113 added the full vendored-grammar + prebuild + monitor feature with no changelog entry; document it under Unreleased ahead of the next release.
3ed551d to
089dad6
Compare
Warning
DRAFT — DO NOT MERGE until
gitnexus/vendor/tree-sitter-kotlin/prebuilds/is populated by thebuild-tree-sitter-prebuildsworkflow. Until then Kotlin is vendored with an emptyprebuilds/and no source-build fallback, so a fresh install has no Kotlin (even on toolchain hosts).dart/protoremain fully functional throughout.Goal
Eliminate the C/C++-toolchain requirement at install for every tree-sitter grammar GitNexus uses ("no operational risk for any tree-sitter"), by generating and vendoring native prebuilds — mirroring the existing vendored
tree-sitter-swift.Why this scope (audit-driven)
Of 15 grammars, only 3 are at risk:
dart+proto(vendored but compiled from source at postinstall) andkotlin(third-partyoptionalDependency, source-only). The other 10 already ship 6 upstream prebuilds and staydependency-review-tracked — left as npmdependencies.swiftalready vendors upstream prebuilds. Becausedart/protoare already off the dependency graph, only kotlin newly leaves it (the sole new CVE-tracking blind spot; mitigated by a recommended drift-check job).What this PR does
.github/workflows/build-tree-sitter-prebuilds.yml— registry-parameterized workflow building{dart,proto,kotlin} × {linux,darwin,win32}-{x64,arm64}natively (no cross-compile), validating each.nodeloads and parses on its arch, then opening a PR that vendors them. Aguardjob gates the heavy matrix to run only on dispatch or a real grammar-version change — ordinary code PRs cost zero matrix minutes.dart/proto— prefer a committed prebuild; fall back to today's source build when none matches → no regression before prebuilds exist.kotlin→ vendored (Swift parity; supersedes the merged fix(install): graceful Kotlin optional-grammar install + accurate toolchain docs #2110 optionalDependency mechanism). The ~23 MBparser.cis not vendored (the workflow builds from the published npm package); onlynode-types.json+ bindings + prebuilds are. Removed fromoptionalDependencies; lockfile regenerated; probe /parser-loadernote / README+devcontainer docs / the fix(install): graceful Kotlin optional-grammar install + accurate toolchain docs #2110 tests all updated.Verification (local)
tsc --noEmitclean · 44 unit tests pass (incl. updatedcli-commands+build-tree-sitter-kotlin-probe) · prettier + eslint clean · all build scriptsnode --check+ exit-0 smoke · workflow YAML + embedded scripts validated. The actual cross-platform prebuild load is only provable by dispatching the workflow (that's the point of the first run).To complete (open decisions)
ubuntu-24.04-arm,windows-11-arm) are available to this repo's plan — else those legs queue forever and no prebuild PR lands.setup-python,attest-build-provenance) + zizmor/Scorecard allowlist.RELEASE_APP_*token must have Contents+PRs write for the aggregate job (publish.yml pattern).prebuilds/here → un-draft.tree-sitter-c4/6-prebuild gap when the tree-sitter 0.21→0.23 runtime bump lands.🤖 Generated with Claude Code