fix(docker): ship runtime-needed published assets (hooks/, skills/) into the image (#2130)#2132
Conversation
…atwari#2130) `gitnexus analyze` inside the official image (akonlabs/gitnexus, ghcr.io/abhigyanpatwari/gitnexus) crashed at startup with: Error: Cannot find module '../../hooks/claude/resolve-analyze-cmd.cjs' Require stack: - /app/gitnexus/dist/cli/resolve-invocation.js `dist/cli/resolve-invocation.js` does `createRequire(import.meta.url)('../../hooks/claude/resolve-analyze-cmd.cjs')` at module load (it is the single source of truth for the npm-11 npx-crash invocation decision, abhigyanpatwari#1939), and `analyze.ts` statically imports it. The Dockerfile.cli runtime stage copied dist/node_modules/package.json/the duckdb script/vendor but never `hooks/`, so the require throws before the command does any work. `hooks/` is in package.json `files`, so npm already ships it — Docker was the only distribution dropping it. Fix: copy `hooks/` into the runtime stage, mirroring what npm publishes. Also add `test/unit/dockerfile-runtime-asset-parity.test.ts`: a regression guard that derives every out-of-dist `require()`/`createRequire()` target from source and asserts each is a runtime-stage `COPY`. Scoped to the require family (not `fs.access`/`new URL`), so it locks the abhigyanpatwari#2130 class without false-flagging the intentionally-omitted, gracefully-degrading `web/` and `skills/` assets. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@magyargergo is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
Follow-up to the hooks/ fix: `skills/` is another published runtime asset (in package.json `files`) the Docker image dropped. The CLI reads the bundled SKILL.md templates from `<pkg>/skills/` for `gitnexus analyze --skills` (ai-context skill generation) and `gitnexus setup`/`uninstall` (installing skills into editor configs). Unlike the hooks/ require(), these reads degrade SILENTLY when the dir is absent — `--skills` writes minimal placeholder content (ai-context.ts), `setup` installs zero skills (setup.ts readdir → []) — so the image looked fine but produced wrong output. Copy `skills/` so the image is fully usable for all CLI tooling. `web/` (also in `files`) is intentionally NOT shipped: this image never builds gitnexus-web (the builder doesn't copy it, build.js logs "skipping web UI"), so it is API-only by design — the UI is the separate Dockerfile.web image / hosted app. The duckdb script is the only runtime asset needed from scripts/, so that stays a single-file copy. Extends the runtime-asset-parity guard with an explicit skills/ assertion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 10863 tests passed 16 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
magyargergo
left a comment
There was a problem hiding this comment.
🔍 Tri-review — PR #2132 (Docker hooks/ + skills/ copy)
Methods: GitNexus risk lane + Compound-Engineering correctness / adversarial / testing lanes + Codex (the one independent engine, live). A GitNexus test/CI lane was also dispatched but truncated mid-run — CI coverage was instead verified directly by me. Engine breakdown: 4 Claude finding-lanes + Codex — not three independent engines (only Codex is a different engine). The strong findings below carry Codex + Claude agreement, and I reproduced the headline myself.
✅ The fix is correct and low-risk — mergeable
Every lane including Codex independently confirmed the two COPY lines are right: the sources exist in the builder (COPY gitnexus ./gitnexus, not dockerignored), --chown=node:node is consistent, ~108 KB is added, and the default serve CMD is unaffected (analyze is lazy-loaded). The web/ omission is validated as genuinely correct — web/ is read via fs.access with a landing-page fallback (api.ts:134-151) and is never produced in this image (the builder doesn't copy gitnexus-web), so a COPY web/ would actually fail the build. The new guard runs in CI (ci-tests.yml → npx vitest run) and passes 4/4.
All findings below concern the durability of the new regression test, not the fix. None are merge-blocking.
🟠 [P2 — guard quality, no runtime impact] The guard is weaker than its docstring claims
requiredExternalAssets() only matches string-literal require('…') / _require('…') / createRequire(…)('…'). It misses module-load requires via computed paths and aliased createRequire bindings — both of which already exist in this repo:
src/core/ingestion/community-processor.ts:30—const leiden = _require(leidenPath)loadsvendor/leiden/index.cjsat module load. I ran the guard's own scanner: it outputs only["package.json","hooks/claude/resolve-analyze-cmd.cjs"]—vendor/leidenis not detected. It survives only because thevendorCOPY happens to exist; deleting that COPY would not fail this test.src/storage/parse-cache.ts:68(requireCJS),src/core/embeddings/embedder.ts:48(ortRequire); backtick specifiers too.
So the 4th test's assertion "copies every out-of-dist asset that dist require()s at module load" (line 134) overstates coverage. The shipped fix is unaffected, but a future computed/aliased module-load require into a non-copied dir — the exact #2130 class — would slip the guard. (Codex + adversarial + correctness + testing; reproduced.)
🟡 [P3] Stale top-of-file docstring contradicts the skills/ test
Lines 24-25 list skills/ as "intentionally not copied … out of scope", but the 2nd commit copies skills/ and adds it('copies skills/…'). Drop skills/ from that sentence (keep only web/). (Codex + correctness + adversarial.)
🟡 [P3] Latent test brittleness (no current trigger)
- False-fail: the regex scans raw source without stripping comments/strings — a future doc-comment like
// require('../../web/x')in a shallowsrc/file would spuriously fail the build. (Verified harmless today: existingrequire('./m')doc-comments resolve insidedist/.) ~5-line fix: strip comments. - No
FROMstop-guard:runtimeStageCopiedSourcesscans to EOF; a stage added afterruntimewould have its COPYs miscounted. - Minor:
AS runtimematch is case-sensitive (/imakes it reformat-proof); addexpect(copied.length).toBeGreaterThan(0)to close a vacuous-pass window if the/app/gitnexus/prefix ever changes; the scanner also ignores shipped.cjs/.mjs(e.g.hooks/**).
🟡 [P3 — advisory, build-efficiency] Layer-cache ordering
The two new COPYs sit before the DuckDB FTS-extension RUN (which network-fetches), so future hooks//skills/ edits bust that cache layer. Moving them after the DuckDB step (before USER node) would shield it. (risk lane.)
CI
The load-bearing Build & Push gitnexus check (which actually builds the CLI image — the ultimate Dockerfile validation) and Tests / ubuntu / coverage are still pending; confirm both green before merge. Vercel fails on deploy-auth, not code.
Automated multi-tool digest (GitNexus swarm + Compound-Engineering personas + Codex). Verify findings before acting.
The 2nd commit on this branch added a skills/ COPY + an it('copies skills/…')
assertion, but the top-of-file docstring still grouped skills/ with web/ as
'intentionally not copied / out of scope'. Drop skills/ from that sentence and
note it is shipped (and covered by its own test). web/ remains the sole
fs-accessed-but-uncopied example. Documentation-only; assertions unchanged.
Docker accepts a lowercase `as runtime`; the parity guard's stage-detection regex was case-sensitive on `AS`, so a future Dockerfile reformat would empty the parsed COPY set and trip the named assertions. Add the /i flag.
runtimeStageCopiedSources scanned from the runtime FROM to EOF. Bound the scan to the runtime stage (start after its FROM, break on the next FROM) so a build stage added after runtime can't have its COPY lines misattributed. No-op today (runtime is the last stage); the copied set is unchanged.
If the runtime FROM or the /app/gitnexus/ source prefix ever stops matching, the copied set goes empty and the parity assertion passes vacuously. Add an explicit copied.length>0 guard so that failure mode is loud and named.
requiredExternalAssets() regex-scanned raw source, so a future doc-comment such
as a commented-out require('../../web/x') in a shallow src file would resolve
outside dist/ and spuriously fail the parity guard. Strip // line comments
first. Block comments are deliberately not stripped (a naive block strip mangles
slash-star inside string/glob literals). Verified the real-tree scanner output
is byte-identical with and without the strip, and resolve-invocation.ts's
multi-line createRequire is still detected. (Also swaps a stray non-ASCII glyph
in the prior commit's comment for ASCII.)
…closed)
The parity scanner only matched string-literal require/createRequire, so it
missed module-load requires via aliased createRequire bindings and computed
paths — and already failed to see community-processor.ts's
`_require(leidenPath)` -> vendor/leiden, making the "every out-of-dist asset"
claim untrue.
Broaden the scan:
- Discover per-file createRequire bindings (requireCJS, _require, …) and match
their literal-arg calls; keep the createRequire(...)('…') IIFE form.
- Detect COMPUTED (non-literal) requires and gate them on MODULE-LOAD position
(brace-depth 0), so the four in-function computed requires that target
node_modules/package.json (optional-grammars, native-check, capabilities,
parse-cache) are correctly out of charter and ignored. A module-load computed
require must be vetted in KNOWN_COMPUTED_REQUIRES (seed: community-processor ->
vendor/leiden) or the test FAILS CLOSED for manual review.
- Allowlist entries are coverage-checked via isCovered, never trusted: a new
test removes the `vendor` COPY from a fixture and asserts leiden surfaces as
uncovered (so deleting a COPY can't silently pass — the abhigyanpatwari#2130 class).
- Exclude `<id>.resolve(...)` (a path lookup, not a load).
- Upgrade the comment stripper to a string-aware pass that removes line AND
block comments without mangling slash-star inside string/glob literals — the
computed branch needs JSDoc requires (e.g. javascript/index.ts) gone, and the
literal scan output stays byte-identical.
Honest claim wording: the 4th test now says coverage = resolvable + vetted
module-load requires, unrecognized computed requires fail for review. Adds
unit tests for fail-closed, aliased-literal, and in-function-ignored paths.
The guard only scanned src/**/*.ts, so hand-written shipped runtime files were
invisible — and they DO require siblings: hooks/claude/gitnexus-hook.cjs and
hooks/antigravity/gitnexus-antigravity-hook.cjs each require('./hook-lock.cjs'),
'./hook-db-lock-probe.cjs', './resolve-analyze-cmd.cjs'. Add a second pass over
shipped .cjs/.mjs assets (the runtime COPY set minus dep/data roots), resolving
each relative require against the asset's OWN package-relative dir and checking
COPY coverage — by prefix, NOT on-disk existence: the antigravity hook's
'./hook-lock.cjs' resolves to hooks/antigravity/hook-lock.cjs (which doesn't
physically exist; hook-lock.cjs lives under hooks/claude) yet is covered by the
whole-hooks COPY. All 6 shipped sibling requires resolve under the hooks COPY.
The hooks/ and skills/ COPYs sat between the vendor COPY and the DuckDB FTS-extension install RUN, so any edit to hook/skill content invalidated that RUN's cache layer — which performs a one-time network INSTALL of the extension (~tens of seconds per affected build). The COPYs have no input dependency on the DuckDB step; relocate them to after it (before USER node) so stable infrastructure layers are not rebuilt on hook/skill churn. Image contents are unchanged. The runtime-asset-parity guard still detects both (its scan covers the whole runtime stage), and the two are consolidated under one comment.
|
Addressed all eight findings from the tri-review above — each as its own commit (planned via
Guard went 4 → 9 tests (added: coverage-not-trust via a vendor-removed fixture, fail-closed on an unrecognized computed require, aliased-literal + in-function-ignored, shipped- |
Summary
Fixes #2130 and closes the broader gap it exposed:
Dockerfile.cli's runtime stage hand-copies a subset of the package's published assets (package.jsonfiles=dist, hooks, scripts, skills, vendor, web), so assets the CLI needs at runtime silently go missing while the npm package keeps working.Two missing assets are restored so the image is fully usable for any CLI tooling, not just the reported
analyzecrash:1.
hooks/— crash (the reported #2130)dist/cli/resolve-invocation.jsdoescreateRequire(import.meta.url)('../../hooks/claude/resolve-analyze-cmd.cjs')at module load (single source of truth for the npm‑11 npx‑crash decision, #1939), andanalyze.tsstatically imports it. Withouthooks/,gitnexus analyzedies at startup:2.
skills/— silent degradationThe CLI reads the bundled
SKILL.mdtemplates from<pkg>/skills/for:gitnexus analyze --skills(ai-context skill generation,ai-context.ts:383) → writes minimal placeholder content when absent,gitnexus setup(setup.ts:842) → installs zero skills (readdir →[]),gitnexus uninstall(uninstall.ts:209).These don't crash — they silently produce wrong/empty output, so the image looked fine but wasn't fully functional.
Deliberately unchanged
web/is infilesbut is intentionally not shipped here: this image never buildsgitnexus-web(the builder doesn't copy it;build.jslogsskipping web UI), sogitnexus/web/is never produced. The CLI image is API‑only by design — the UI is the separateDockerfile.webimage / hosted app, which theservelanding page points to.scripts/is correctly a single‑file copy (install-duckdb-extension.mjs) — the onlyscripts/asset referenced at runtime (extension-loader.ts:91).Regression guard
gitnexus/test/unit/dockerfile-runtime-asset-parity.test.ts:distrequire()/createRequire()target from source and asserts each is a runtime‑stageCOPY(the crash class — future‑proof, would have caught Cannot find module '../../hooks/claude/resolve-analyze-cmd.cjs' in Docker image v1.6.7 —hooks directory not copied to runtime stage #2130 automatically). Scoped to the require family so it does not false‑flag the gracefully‑degradingweb/.hooks/andskills/are copied (the two runtime‑read dirs that must ship).Verification
hooks/crash in a scratch layout mirroring the Dockerfile's runtime copies; confirmed the copy clears it and the CLI proceeds normally.hooks/copy.prettier --check+eslintclean.hooks/andskills/survive into the builder stage (full‑treeCOPY+npm pruneonly touchesnode_modules), so the newCOPY --from=builderlines have a valid source.🤖 Generated with Claude Code