
fix(lbug): skip init lock and filesystem mutations for read-only opens (#1783)#1784

Merged
magyargergo merged 7 commits into
abhigyanpatwari:mainfrom
Avicennasis:fix/1783-readonly-init-lock
May 24, 2026


@Avicennasis

Contributor

Summary

  • Splits doInitLbug into a read-only fast path and a writable path
  • Read-only opens skip: path cleanup (symlink/directory removal), acquireInitLock, orphan sidecar removal, and fs.mkdir — all of which write to the workspace filesystem
  • Read-only opens go straight to preflightLbugSidecars (with allowQuarantine: false) → openLbugConnection → ensureReadOnlyConnectionUsable
  • Writable opens retain the existing behavior unchanged (lock, cleanup, open)
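The split described above can be sketched as an ordered plan of steps. This is an illustration only: the step names are hypothetical labels taken from the summary, the real doInitLbug performs these as async calls, and treating releaseInitLock as a mutation (it presumably removes the lock file) is an assumption.

```typescript
// Hypothetical step labels; the real doInitLbug runs these as async calls.
type InitStep =
  | 'removeSymlinkOrDirectory'
  | 'acquireInitLock'
  | 'removeOrphanSidecars'
  | 'mkdirParent'
  | 'preflightSidecars'
  | 'openConnection'
  | 'ensureReadOnlyConnectionUsable'
  | 'releaseInitLock';

// Steps assumed to write to the workspace filesystem.
const MUTATING_STEPS: InitStep[] = [
  'removeSymlinkOrDirectory',
  'acquireInitLock',
  'removeOrphanSidecars',
  'mkdirParent',
  'releaseInitLock',
];

// Returns the ordered steps for a read-only or writable open.
function planInitSteps(readOnly: boolean): InitStep[] {
  if (readOnly) {
    // Fast path: no lock, no cleanup, no mkdir.
    return ['preflightSidecars', 'openConnection', 'ensureReadOnlyConnectionUsable'];
  }
  return [
    'removeSymlinkOrDirectory',
    'acquireInitLock',
    'removeOrphanSidecars',
    'mkdirParent',
    'preflightSidecars',
    'openConnection',
    'releaseInitLock',
  ];
}

// Lists the planned steps that would touch the workspace filesystem.
function mutatingSteps(readOnly: boolean): InitStep[] {
  return planInitSteps(readOnly).filter((step) => MUTATING_STEPS.indexOf(step) !== -1);
}
```

Under these assumptions, mutatingSteps(true) is empty, which is the contract the PR aims for on a :ro mount.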

Fixes the EROFS: read-only file system, open '/workspace/dir/.gitnexus/lbug.init.lock' error reported in #1783 and by @sengirab in #1549.

This is the companion fix to #1550, which addressed the same :ro mount issue in ensureGitNexusIgnored. The LadybugDB adapter was the remaining code path that unconditionally mutated the workspace filesystem.

Test plan

  • npx tsc --noEmit — clean
  • npx vitest run test/unit/repo-manager.test.ts — existing tests pass
  • npx vitest run test/integration/lbug-lock-retry.test.ts — lock retry tests pass
  • Manual: Docker compose with :ro workspace mount → gitnexus index + query via MCP → no EROFS

@vercel

vercel Bot commented May 22, 2026


@Avicennasis is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented May 22, 2026

Contributor

Claude finished @magyargergo's task in 6m 15s


Deep PR Review: fix(lbug): skip init lock and filesystem mutations for read-only opens (#1784)



1. Verdict

Request Changes

One blocking finding: the read-only fast path still reaches a writable filesystem open inside ensureReadOnlyConnectionUsable in the shadow-replay crash-recovery scenario, which will throw EROFS on a :ro mount. The PR resolves the primary acquireInitLock regression but does not fully honour its stated contract of "no workspace filesystem mutations for read-only opens."

Additionally, no new automated regression tests pin the contract that was fixed.


2. Production-Readiness Bar

For this PR to merge, the bar is:

  1. A read-only doInitLbug path never performs any fs.open(O_CREAT), fs.mkdir, fs.unlink, fs.rm, fs.rename against the workspace — in all code paths reachable under readOnly=true, including recovery branches.
  2. The writable path's lock, orphan-cleanup, and crash-recovery behavior is unchanged (verified by reading the diff and existing tests).
  3. At minimum one automated test pins the regression: "read-only init never attempts to create lbug.init.lock."
  4. All existing tests pass.

The PR meets bar #2 and #4. It does not fully meet #1, and does not meet #3.


3. What the PR Gets Right

Correct split of the init critical section. The refactor moves acquireInitLock, orphan sidecar cleanup (fs.unlink), and fs.mkdir into the else branch exclusively. These mutations only run for writable opens. The read-only fast path starts at preflightLbugSidecars with allowQuarantine: false — verified in sidecar-recovery.ts line 241: if (sidecarPreflightDisabled() || !options.allowQuarantine) return state; — which means inspectLbugSidecars is called (pure stat reads) but no rename/unlink/create occurs.

Correct currentDbReadOnly = true assignment. The PR correctly sets this so ensureLbugInitialized's cache check (currentDbReadOnly === readOnly at line 677) returns the cached read-only handle on repeated calls, preventing spurious re-inits.

Correct loadFTSExtension(..., { policy: 'load-only' }) for read-only path. This was already correct pre-PR and is preserved: the read-only branch skips DDL/schema creation and only loads the FTS extension.

Writable path is structurally unchanged. The diff shows only reorganization: the writable block is now inside else { ... } with identical logic. Lock acquisition, orphan sidecar cleanup, fs.mkdir, preflightLbugSidecars(mode: 'write', allowQuarantine: true), and writable open all remain in the same order with the same error handling. The finally { await releaseInitLock(); } scope is preserved.

preflightLbugSidecars is called before open in both paths. This was an inconsistency in the old code — the writable path called preflight after acquiring the lock (correct), and now the read-only path also has its own preflight before opening. The sequencing is sensible.


4. Blocking Findings

BLOCKING-1: ensureReadOnlyConnectionUsable performs a writable open on the shadow-replay recovery branch — EROFS on :ro mounts

Severity: HIGH

File: gitnexus/src/core/lbug/lbug-adapter.ts, lines 524–572

Exact code path:

// doInitLbug (read-only branch):
const opened = await openLbugConnection(lbug, dbPath, { readOnly: true }); // ✓ read-only
const usable = await ensureReadOnlyConnectionUsable(dbPath, opened);        // ← enters here

// ensureReadOnlyConnectionUsable:
try {
  await queryAndDrain(handle.conn, READ_ONLY_SHADOW_REPLAY_PROBE); // throws isReadOnlyShadowReplayError
  return handle;
} catch (err) {
  if (isMissingShadowSidecarError(err)) { /* ... */ }
  if (!isReadOnlyShadowReplayError(err)) { /* throw */ }
}

// Reach here on isReadOnlyShadowReplayError — WAL shadow replay needs write access:
await closeLbugConnection(handle);
const writable = await openLbugConnection(lbug, dbPath);   // ← NO { readOnly: true } → EROFS on :ro mount

Why it matters. The PR's stated contract is "read-only opens skip all workspace filesystem mutations." openLbugConnection wraps new lbug.Database(dbPath, ...) without readOnly: true. On a Docker :ro bind mount, LadybugDB's native constructor creates/opens the WAL file with write flags (O_RDWR), which throws EROFS.

When does this trigger? When gitnexus analyze crashes mid-checkpoint — leaving lbug.wal and lbug.shadow both on disk — and then gitnexus serve runs on the :ro-mounted workspace. LadybugDB opens the DB read-only, then the probe MATCH (n) RETURN n LIMIT 1 triggers shadow-page replay, LadybugDB sees it needs write access and throws the replay error, and ensureReadOnlyConnectionUsable attempts the writable open.

Reproduction scenario:

# 1. Run analyze, kill it mid-checkpoint (e.g., kill -9 during heavy write load)
# 2. Verify: lbug.wal and lbug.shadow both exist on disk
# 3. Mount workspace :ro
# 4. gitnexus serve → EROFS from openLbugConnection (line ~544)

Before this PR, :ro mounts failed at acquireInitLock. After this PR, the primary scenario (clean DB) works, but the crash-recovery scenario still fails — with a raw EROFS that lacks any actionable guidance.

Suggested minimal fix:

// In ensureReadOnlyConnectionUsable, replace the writable open with an early
// check and a clear actionable error when the filesystem may be read-only:
await closeLbugConnection(handle);

let writable: LbugConnectionHandle;
try {
  writable = await openLbugConnection(lbug, dbPath);
} catch (openErr) {
  const code = (openErr as NodeJS.ErrnoException)?.code;
  if (code === 'EROFS' || code === 'EACCES' || code === 'EPERM') {
    throw new Error(
      shadowSidecarRecoveryMessage(dbPath, err) +
      '\n  The workspace is mounted read-only — mount it read-write to perform shadow replay recovery.',
    );
  }
  throw openErr;
}

Alternatively, pass a readOnly flag into ensureReadOnlyConnectionUsable and surface a policy-level error before attempting the writable open on a known-:ro path.
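One possible shape for that alternative is a write probe that runs before the writable fallback. The sketch below is an assumption, not code from the PR: writeProbeErrno and assertShadowReplayAllowed are hypothetical names, and fs.accessSync with W_OK is only an advisory check (it can race with remounts and behaves differently for root and on Windows).

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Returns null when `dir` accepts a write probe, otherwise the errno code
// reported by the probe (for example 'EROFS' on a read-only mount).
function writeProbeErrno(dir: string): string | null {
  try {
    fs.accessSync(dir, fs.constants.W_OK);
    return null;
  } catch (err) {
    const code = (err as NodeJS.ErrnoException).code;
    return code ?? 'UNKNOWN';
  }
}

// Hypothetical policy check: fail early with guidance instead of letting the
// writable open surface a raw errno later.
function assertShadowReplayAllowed(dbDir: string): void {
  const code = writeProbeErrno(dbDir);
  if (code !== null) {
    throw new Error(
      `Shadow replay recovery needs a writable workspace, but ${dbDir} rejected a write probe (${code}). ` +
        'Mount the workspace read-write or rebuild the index on a writable filesystem.',
    );
  }
}
```

Catching the errno at the writable open, as in the minimal fix above, stays the more reliable option because it reports what the open actually did rather than what a probe predicted.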

Suggested test:

it('ensureReadOnlyConnectionUsable surfaces actionable error instead of EROFS on :ro shadow-replay path', async () => {
  // Mock openLbugConnection to succeed for readOnly=true, then throw EROFS on writable open
  // Verify the error message contains the shadowSidecarRecoveryMessage text
});
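The suggested test can be exercised without the real adapter by isolating the errno translation. The following is a minimal synchronous sketch under stated assumptions: openWritableOrExplain, errnoCode, and fakeErrno are hypothetical names, and the real call is async and builds its message with shadowSidecarRecoveryMessage.

```typescript
// Errno codes the review treats as "workspace is not writable".
const READ_ONLY_FS_CODES = ['EROFS', 'EACCES', 'EPERM'];

// Extracts a string errno code from an unknown thrown value, if present.
function errnoCode(err: unknown): string | undefined {
  if (typeof err === 'object' && err !== null && 'code' in err) {
    const code = (err as { code?: unknown }).code;
    return typeof code === 'string' ? code : undefined;
  }
  return undefined;
}

// Runs `open`; converts read-only-filesystem failures into an actionable
// error and lets every other failure propagate unchanged.
function openWritableOrExplain<T>(open: () => T, recoveryMessage: string): T {
  try {
    return open();
  } catch (openErr) {
    const code = errnoCode(openErr);
    if (code !== undefined && READ_ONLY_FS_CODES.indexOf(code) !== -1) {
      throw new Error(
        recoveryMessage +
          '\n  The workspace appears to be read-only; mount it read-write to perform shadow replay recovery.',
      );
    }
    throw openErr;
  }
}

// Test helper: an Error carrying an errno code, with no filesystem involved.
function fakeErrno(code: string): Error {
  return Object.assign(new Error(`${code}: simulated`), { code });
}
```

Because the EROFS condition is simulated with a plain object, the same check runs identically on Linux, macOS, Windows, and root CI containers.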

5. Non-Blocking Findings / Nits

MEDIUM-1: No new automated regression test pins the #1783 fix

The PR test plan lists:

  • npx tsc --noEmit
  • Existing tests pass
  • Manual Docker test

There is no new test that verifies: "when doInitLbug(dbPath, readOnly=true) runs, acquireInitLock is never called."

The existing lbug-orphan-sidecar-recovery.test.ts tests the writable initLbug path (orphan recovery, lock lifecycle) but has no test for readOnly=true. The PR's correctness is verified by code review alone — a future refactor could accidentally re-add the lock to the read-only path with no automated safety net.

Minimum expected test:

it('initLbug(readOnly=true) never creates .init.lock on the workspace', async () => {
  const tmp = await createTempDir('gitnexus-lbug-ro-');
  const dbPath = path.join(tmp.dbPath, 'lbug');

  // Create a pre-existing DB (writable first)
  const adapter = await import('../../src/core/lbug/lbug-adapter.js');
  await adapter.initLbug(dbPath);
  await adapter.closeLbug();

  // Verify no lock file is created during read-only open
  await adapter.withLbugDb(dbPath, async () => {}, { readOnly: true });
  await expect(fs.access(`${dbPath}.init.lock`)).rejects.toMatchObject({ code: 'ENOENT' });
  await adapter.closeLbug();
  await tmp.cleanup();
});

This test would work without chmod/EROFS simulation and is portable across all platforms.

MEDIUM-2: finalizeLbugSidecarsAfterClose can attempt fs.rename on a :ro workspace after read-only close

safeClose() (line 1620) always calls finalizeLbugSidecarsAfterClose(closingDbPath, { logger }). In sidecar-recovery.ts lines 296–312, a tiny-orphan-wal state (WAL ≤4KB, no shadow) triggers quarantineWalForMissingShadow, which does fs.rename(walPath, quarantinePath).

When does this trigger on :ro? If a previous writable analyze left a tiny orphan WAL (≤ 4096 bytes) without a shadow file (rare but possible after a crash at WAL initialization), then a read-only serve opens and closes successfully, but safeClose tries to rename that WAL, which fails with EROFS.

Mitigating factor: The finalizeLbugSidecarsAfterClose failure is caught at lines 303–311:

} catch (err) {
  if (!missing(err)) {
    warnOnce(..., 'failed to quarantine tiny orphan WAL after close...');
  }
}

So this does NOT crash the process — it logs a warning. This is a spurious warning on :ro mounts, not a fatal error. This is acceptable as a nit rather than a blocker.

Nit: Consider checking currentDbReadOnly (or passing a flag) in finalizeLbugSidecarsAfterClose to suppress the warning on known :ro closes. Or, export a finalizeLbugSidecarsAfterClose variant that takes an allowMutations: boolean.
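A variant with an allowMutations flag might look like the sketch below. It is illustrative only: the sidecar naming (`${dbPath}.wal`, `${dbPath}.shadow`), the quarantine suffix, and the 4096-byte threshold are assumptions drawn from this review, and the real function is async and logs through warnOnce.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Threshold quoted in the review for a "tiny" orphan WAL (assumed).
const TINY_WAL_MAX_BYTES = 4096;

type FinalizeResult = 'nothing-to-do' | 'skipped-read-only' | 'quarantined';

// Hypothetical post-close finalizer that never renames when mutations are disallowed.
function finalizeSidecarsSketch(dbPath: string, opts: { allowMutations: boolean }): FinalizeResult {
  const walPath = `${dbPath}.wal`;
  const shadowPath = `${dbPath}.shadow`;

  // Only a WAL without a shadow file is a quarantine candidate.
  if (!fs.existsSync(walPath) || fs.existsSync(shadowPath)) return 'nothing-to-do';
  // Larger WALs may hold unrecovered data; this sketch leaves them alone.
  if (fs.statSync(walPath).size > TINY_WAL_MAX_BYTES) return 'nothing-to-do';
  // Read-only closes stop here, so no rename is attempted and no warning fires.
  if (!opts.allowMutations) return 'skipped-read-only';

  fs.renameSync(walPath, `${walPath}.quarantine`);
  return 'quarantined';
}
```

safeClose could then pass allowMutations: !currentDbReadOnly, which would remove the spurious warning on :ro mounts without changing writable behavior.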

NIT-1: Writable open inside ensureReadOnlyConnectionUsable lacks a clear actionable error

Even on a writable filesystem, if openLbugConnection(lbug, dbPath) succeeds but the subsequent probe fails with a missing-shadow error (lines 549–553), the error message is shadowSidecarRecoveryMessage which says "Rebuild the index with gitnexus analyze --force." This is correct. But the failure from EROFS (see BLOCKING-1) would surface as a raw node error, not this actionable message.

NIT-2: Overly broad mode: 'read-only' label in preflightLbugSidecars warning

sidecar-recovery.ts line 256 warns: "lbug.wal without lbug.shadow before read-only open" when mode='read-only' and a large orphan WAL is found. This warning fires even though allowQuarantine: false means no action is taken. The warning is visible to operators and might prompt them to take action when there's nothing to do. Consider clarifying: "found orphan WAL before read-only open; no repair attempted because allowQuarantine=false — run gitnexus analyze to repair." But this is cosmetic.

NIT-3: doInitLbug comment says "skip ... orphan sidecar removal" but preflightLbugSidecars with allowQuarantine:false still runs inspectLbugSidecars

The stat calls in inspectLbugSidecars are pure reads, so this isn't a correctness issue. But the comment "skip orphan sidecar removal" could be read as "skip all sidecar operations," which is slightly inaccurate — inspection still runs. Minor documentation clarity gap.


6. Historical Compatibility Check

| Issue/PR | Role | Status after #1784 |
| --- | --- | --- |
| #1549 | Original :ro Docker break (two failures: binary on PATH + .gitignore EROFS) | .gitignore was fixed by #1550. LadybugDB lock failure addressed by #1784. Primary scenario resolved. |
| #1550 | Fixed ensureGitNexusIgnored EROFS | Unaffected by #1784. ✅ |
| #1618 | Crash loop from orphan .wal.checkpoint (motivated acquireInitLock + orphan cleanup in #1622) | The writable path retains full #1622 behavior: lock acquired, orphan .shadow and .wal.checkpoint cleaned when main DB is missing. ✅ |
| #1622 | Init lock + orphan sidecar recovery for writable opens | Writable path is structurally identical post-refactor. Lock acquire, cleanup loop, mkdir, preflightLbugSidecars(allowQuarantine:true), and finally { releaseInitLock() } are all preserved. ✅ |
| #1747 | Sidecar preflight/recovery, finalizeLbugSidecarsAfterClose, reopenReadOnlyAfterMissingShadow | Read-only path correctly uses allowQuarantine: false. Writable path correctly uses allowQuarantine: true. finalizeLbugSidecarsAfterClose still called from safeClose for both. reopenReadOnlyAfterMissingShadow still reachable from ensureReadOnlyConnectionUsable. ✅ (writable open inside that function is the BLOCKING-1 concern) |
| #1783 | Init lock EROFS on :ro mount (lbug.init.lock creation fails) | Fixed for primary scenario (clean DB, no crash-recovery state needed). ✅ for clean state, ⚠️ for crash-recovery state (BLOCKING-1). |
| #1784 | This PR | Partially achieves stated goal. See BLOCKING-1. |

Gap identified: The historical chain is nearly complete. The one gap is the crash-recovery scenario where lbug.wal + lbug.shadow exist simultaneously on a :ro mount. This leaves a narrow but real failure window.


7. OS/Filesystem Matrix

| Environment | Primary scenario (#1783 fix) | Crash-recovery (BLOCKING-1) | Notes |
| --- | --- | --- | --- |
| Linux Docker :ro | ✅ Fixed: no EROFS from acquireInitLock | ❌ EROFS from writable open in ensureReadOnlyConnectionUsable if .shadow+.wal present | EROFS is the expected errno on bind-mount write attempts |
| macOS Docker Desktop | ✅ Expected fixed | ❌ May surface as EPERM or EACCES instead of EROFS (VirtioFS/gRPC-FUSE layer). Error message less obvious. | Case-insensitive fs may mask path comparison issues, but none are in the changed code |
| Windows native | ✅ Expected fixed | ⚠️ EPERM/EBUSY rather than EROFS; writable open in ensureReadOnlyConnectionUsable may fail differently | Known LadybugDB handle-release lag (see waitForWindowsHandleRelease); post-close finalizeLbugSidecarsAfterClose is the higher risk on Windows |
| WSL2 | ✅ Expected fixed | EROFS expected on :ro mounts via WSL2 Docker | 9P/virtio-fs :ro semantics mirror Linux |
| CI / root containers | chmod simulation unreliable as root, but mocks avoid this | ⚠️ Root CI can write to :ro mounts depending on bind-mount implementation, so tests may not catch the failure | Use mock-based tests (not chmod) for EROFS simulation |
| Network / synced FS | ✅ Lock file never created | ⚠️ fs.rename (quarantine in ensureReadOnlyConnectionUsable) may fail with EXDEV on cross-device rename; finalizeLbugSidecarsAfterClose also at risk | NFS/CIFS stale-handle semantics add unpredictability |

Untested risk: No CI job runs gitnexus serve against a Docker :ro workspace. The "manual Docker test" in the PR test plan is the only evidence for the primary scenario.


8. LadybugDB Lifecycle Analysis

Lock necessity for read-only opens: The PR's reasoning is correct. acquireInitLock guards three operations: (1) checking whether the main DB file is missing, (2) unlinking orphan sidecars, (3) creating the DB. A read-only open does none of these. The race protected by the lock — another process creating a fresh DB between the access() check and the sidecar unlink() — cannot happen on a read-only open that does neither.
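To make concrete why the lock itself is a filesystem mutation, here is a minimal exclusive lock-file sketch. It is an assumption about the general technique, not the project's acquireInitLock (which is not shown in this thread and evidently includes retry behavior): tryAcquireInitLock is a hypothetical name and uses Node's 'wx' open flag, which creates the file and fails if it already exists.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Returns a release function on success, or null when another holder owns the lock.
function tryAcquireInitLock(lockPath: string): (() => void) | null {
  let fd: number;
  try {
    // 'wx' = create for writing, fail with EEXIST if the path already exists.
    fd = fs.openSync(lockPath, 'wx');
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === 'EEXIST') return null;
    // Any other failure propagates; on a read-only mount this is where EROFS appears.
    throw err;
  }
  fs.closeSync(fd);
  // Releasing the lock is also a mutation: it unlinks the lock file.
  return () => fs.unlinkSync(lockPath);
}
```

Both acquiring and releasing write to the directory that holds the database, so any lock of this shape is incompatible with a :ro workspace regardless of whether the guarded operations are needed.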

Read-only open semantics: new lbug.Database(path, ..., readOnly=true) is expected to open the main DB file for reading only. However, the error text matched by isReadOnlyShadowReplayError ("Couldn't replay shadow pages under read-only mode") indicates that the native engine may need write access to complete a checkpoint/shadow replay even on a read-only open if the shadow file is present. This is the BLOCKING-1 issue: LadybugDB's read-only contract is not "never writes" but "never writes after successful open."

WAL/shadow replay behavior: When lbug.shadow exists, LadybugDB needs to apply those shadow pages to the main DB file during open. This is a write operation. If the DB is opened read-only, LadybugDB throws isReadOnlyShadowReplayError. The GitNexus workaround (open writable, replay, close, reopen read-only) is correct for writable filesystems but fails on :ro at the writable open step.

Does MATCH (n) RETURN n LIMIT 1 trigger replay? Yes — the probe executes a full read scan, which forces LadybugDB to apply any pending shadow pages. This is by design (the probe is intentionally chosen to trigger replay so failures surface early rather than during real MCP queries). On :ro with crash-recovery state, this probe causes the shadow-replay error.

Is skipping schema creation in read-only safe? Yes. Schema queries (SCHEMA_QUERIES) are DDL (CREATE NODE TABLE IF NOT EXISTS, etc.). Running them read-only would either fail silently (isReadOnlyDbError is swallowed in runSchemaCreationQueries) or throw. Skipping them entirely for read-only opens is correct — the schema was created by the writable gitnexus analyze run.

Data-safety: The allowQuarantine: false policy in the read-only path correctly prevents discarding any WAL content. A large orphan WAL (potentially containing unrecovered data) is left intact — only logged as a warning. This is the safe choice. Tiny orphan WALs (≤4KB, effectively empty WAL files from initialization) are also preserved when allowQuarantine: false. No data can be lost by the read-only path.

LADYBUGDB-CONTRACT coupling: The isReadOnlyShadowReplayError regex in sidecar-recovery.ts (/replay shadow pages under read-only mode/i) is version-pinned to @ladybugdb/core ^0.16.1. If LadybugDB changes this error message, the ensureReadOnlyConnectionUsable shadow-replay branch won't trigger and the writable open won't be attempted — the error would instead fall through to if (!isReadOnlyShadowReplayError(err)) { throw err }, surfacing the raw LadybugDB error. This is acceptable degradation.
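The message coupling can be illustrated with the regex quoted above. This is a sketch, not the contents of sidecar-recovery.ts: the function name carries a Sketch suffix to mark it as hypothetical, and the real predicate may inspect more than the message.

```typescript
// Pattern quoted in the review; it matches on message text only.
const READ_ONLY_SHADOW_REPLAY_PATTERN = /replay shadow pages under read-only mode/i;

// Hypothetical stand-in for isReadOnlyShadowReplayError.
function isReadOnlyShadowReplayErrorSketch(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return READ_ONLY_SHADOW_REPLAY_PATTERN.test(message);
}

// Message text cited earlier in this review.
const currentWording = new Error("Couldn't replay shadow pages under read-only mode");
// An invented rewording, used only to show the degradation path.
const hypotheticalRewording = new Error('Shadow page replay requires a writable database');

const matchesCurrent = isReadOnlyShadowReplayErrorSketch(currentWording);
const matchesReworded = isReadOnlyShadowReplayErrorSketch(hypotheticalRewording);
```

Here matchesCurrent is true and matchesReworded is false, which is the degradation described above: a reworded upstream message skips the recovery branch and surfaces as a raw error.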


9. Test Plan Assessment

PR test plan:

  • npx tsc --noEmit
  • npx vitest run test/unit/repo-manager.test.ts
  • npx vitest run test/integration/lbug-lock-retry.test.ts
  • Manual Docker :ro test

What's actually changed: Only lbug-adapter.ts — specifically doInitLbug and the path for readOnly=true.

Missing tests (blockers/required follow-up):

| Test | Why missing | Severity |
| --- | --- | --- |
| Unit: doInitLbug(readOnly=true) never calls acquireInitLock | The core regression (#1783) has no automated pin | Required before merge |
| Unit: doInitLbug(readOnly=true) never calls fs.mkdir | Secondary mutation not tested | Required before merge |
| Unit: read-only open on mock-EROFS workspace succeeds (no lock created) | Tests the exact failure scenario from #1783 | Required before merge |
| Unit: ensureReadOnlyConnectionUsable with writable-open EROFS surfaces actionable error | Tests BLOCKING-1 scenario | Required if BLOCKING-1 is fixed |
| Integration: read-only + allowQuarantine: false does not rename WAL | Tests finalizeLbugSidecarsAfterClose on :ro | Nice-to-have |
| Integration: writable path still quarantines tiny orphan WAL | Regression for #1622/#1747 | Existing test covers this indirectly via sidecar-recovery.test.ts |

What tests exist that give confidence:

  • lbug-orphan-sidecar-recovery.test.ts — integration tests for writable initLbug with orphan sidecars. These pass (test plan says so), confirming the writable path is unbroken.
  • sidecar-recovery.test.ts — unit tests for preflightLbugSidecars, finalizeLbugSidecarsAfterClose, inspectLbugSidecars. These pass.
  • lbug-lock-retry.test.ts — lock retry tests (writable path).

None of these test the read-only doInitLbug path directly.


10. Final Recommendation

Request changes on BLOCKING-1 before merge. The fix for the primary acquireInitLock regression (#1783) is correct and well-structured. The writable path is provably unchanged. The code is clean and the split is architecturally sound.

Minimal patch direction:

  1. In ensureReadOnlyConnectionUsable (line ~544), catch EROFS/EACCES/EPERM from the writable openLbugConnection call and surface shadowSidecarRecoveryMessage with an additional note about read-only filesystem. This prevents raw EROFS from reaching the user and completes the "no surprising write errors on :ro" contract.


  2. Add a unit test in lbug-orphan-sidecar-recovery.test.ts (or a new lbug-readonly-init.test.ts) that:

    • Creates a real DB (writable)
    • Mocks or verifies acquireInitLock is never called during withLbugDb(dbPath, ..., { readOnly: true })
    • Confirms no lbug.init.lock file is created on disk


These two changes would bring this PR to "Approve with nits" standing.


@github-actions

github-actions Bot commented May 22, 2026

Contributor

CI Report

All checks passed

Pipeline Status

| Stage | Status | Details |
| --- | --- | --- |
| ✅ Typecheck | success | tsc --noEmit |
| ✅ Tests | success | unit tests, 3 platforms |
| ✅ E2E | success | gitnexus-web changes only |

Test Results

| Tests | Passed | Failed | Skipped | Duration |
| --- | --- | --- | --- | --- |
| 9669 | 9668 | 0 | 1 | 492s |

✅ All 9668 tests passed

1 test skipped:
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 79.37% 32477/40917 N/A% 🟢 ███████████████░░░░░
Branches 68% 20792/30575 N/A% 🟢 █████████████░░░░░░░
Functions 84.21% 3389/4024 N/A% 🟢 ████████████████░░░░
Lines 82.73% 29241/35341 N/A% 🟢 ████████████████░░░░


@magyargergo

Collaborator

@Avicennasis Thank you for raising this PR! Really good job! Could you please look into Claude's findings? 🙏

@Avicennasis

Contributor Author

@magyargergo Thanks for the review! I've pushed two commits addressing the findings:

BLOCKING-1 (EROFS in shadow-replay recovery): ensureReadOnlyConnectionUsable's writable fallback at line ~544 now catches EROFS/EACCES/EPERM and surfaces an actionable error message (shadowSidecarRecoveryMessage + guidance to mount read-write or re-run analyze) instead of a raw EROFS.

MEDIUM-1 (No regression test): Added test/integration/lbug-readonly-init.test.ts with two tests:

Also fixed a variable scoping issue — the original err from the shadow replay catch block wasn't in scope at the writable open catch; hoisted it to shadowReplayErr.

NIT-2 / NIT-3 — Acknowledged but deferred; they're cosmetic and don't affect correctness.

MEDIUM-2 (finalizeLbugSidecarsAfterClose rename warning on :ro) — As the review noted, this is caught and logged as a warning, not a fatal error. Acceptable as-is for this PR.

@github-actions

github-actions Bot commented May 22, 2026

Contributor

Claude finished @magyargergo's task in 7m 4s


Deep Follow-up PR Review: fix(lbug): skip init lock and filesystem mutations for read-only opens (#1784)

Tasks:

  • Post initial task list
  • Verify current head SHA and PR state
  • Read lbug-adapter.ts — full diff and key functions
  • Read lbug-readonly-init.test.ts — new test file
  • Read sidecar-recovery.ts — verify allowQuarantine behavior
  • Read test-indexed-db.ts — verify helper API contract
  • Verify correct withTestLbugDB calling convention across test suite
  • Run Unicode/bidi scans on changed files
  • Assess: shadow-replay EROFS handling (BLOCKING-1 from prior review)
  • Assess: test strength (mutation thinking)
  • Assess: read-only mutation boundary
  • Assess: writable path preservation
  • CI status check
  • Final verdict

Review Bar for This PR

A read-only Docker :ro open must never perform filesystem mutations. The prior review identified two gaps: (1) raw EROFS from the shadow-replay writable fallback path, and (2) no automated regression test that would fail if acquireInitLock were re-added to the read-only path. The author claims both are addressed. This review verifies the current head directly.


Problem Being Solved

doInitLbug unconditionally called acquireInitLock, which created ${dbPath}.init.lock on the workspace filesystem. Docker :ro bind mounts fail that fs.open(O_CREAT) with EROFS. This PR splits doInitLbug into a read-only fast path (no lock, no mkdir, no sidecar mutation) and the existing writable path.


Current PR State

Head SHA: f6b8e7f — confirmed. Three commits: de1b91f, c6660db, f6b8e7f. Changed files: exactly lbug-adapter.ts and lbug-readonly-init.test.ts.

CI on current head: Only 5 completed workflows visible (Dependency Review, Gitleaks, Docker Build & Push, PR Autofix, CodeQL — all green). The main test pipeline (unit/integration on Ubuntu/macOS/Windows with 9554+ tests) is not confirmed on the current head f6b8e7f. The 9554-test report in the PR timeline predates these commits.


Findings


FINDING 1 — BLOCKING: New test file uses wrong withTestLbugDB API — tests are broken and will fail at runtime

Risk: The two new regression tests appear to cover the MEDIUM-1 gap from the prior review, but they are written using an incorrect API. If run, both tests fail immediately. The DoD criterion "Automated tests would fail if the read-only path called acquireInitLock" is not met.

Evidence to check:

  • withTestLbugDB signature in test/helpers/test-indexed-db.ts: (prefix: string, fn: (handle: IndexedDBHandle) => void, options?: WithTestLbugDBOptions)
  • Canonical usage from every other test file: withTestLbugDB('name', (handle) => { it('...', async () => { /* use handle.dbPath from closure */ }); }, { seed: [...] })
  • New test call at lbug-readonly-init.test.ts:15: withTestLbugDB(describe, { seed: [...] })

Confirmed evidence:

The new test file calls the helper as:

withTestLbugDB(describe, {           // ← 'describe' function as prefix, options object as fn
  seed: [ 'CREATE NODE TABLE...', 'CREATE (:File ...)' ],
});

Both arguments are wrong, and the failure cascades:

  1. prefix receives the vitest describe function (not a string). fn receives { seed: [...] } (an object, not a callback).
  2. Inside withTestLbugDB, the library calls fn(lazyHandle) at line 176 of test-indexed-db.ts. Since fn = { seed: [...] } is a plain object, this throws TypeError: fn is not a function inside the inner describe callback.
  3. The two it tests — registered at the outer describe level — access (ctx as any).lbugHandle. The helper never sets ctx.lbugHandle; it passes the handle only to fn (the callback). This property is always undefined.
  4. Both it tests throw 'withTestLbugDB did not attach handle' and fail.

Every other test file using withTestLbugDB accesses the handle from the closure argument to fn, not from ctx:

withTestLbugDB('core-adapter', (handle) => {
  it('...', async () => {
    const { dbPath } = handle;  // ← from closure
  });
}, options);

The new test uses a pattern that does not exist in this helper. There is no vitest context injection (test.extend) wiring up lbugHandle.
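The failure mode can be reproduced in miniature. The helper below is a hypothetical stand-in with the same parameter order as the quoted signature, not the project's withTestLbugDB; the `as any` cast only simulates a call that reached runtime unchecked (Vitest transpiles TypeScript without type-checking it, which may be how the original call went unnoticed).

```typescript
interface DbHandle {
  dbPath: string;
  seedCount: number;
}

interface HelperOptions {
  seed?: string[];
}

// Hypothetical miniature of the helper: (prefix, fn, options).
function withTestDbSketch(
  prefix: string,
  fn: (handle: DbHandle) => void,
  options: HelperOptions = {},
): DbHandle {
  const handle: DbHandle = {
    dbPath: `/tmp/${prefix}/lbug`,
    seedCount: (options.seed ?? []).length,
  };
  fn(handle); // throws TypeError at runtime when fn is not callable
  return handle;
}

// Correct call: the handle arrives through the callback argument.
let seenPath = '';
withTestDbSketch(
  'lbug-readonly-init',
  (handle) => {
    seenPath = handle.dbPath;
  },
  { seed: ['CREATE NODE TABLE IF NOT EXISTS File (path STRING, PRIMARY KEY (path))'] },
);

// Misuse mirroring the finding: a function as prefix, the options object as fn.
function callWithSwappedArgs(): unknown {
  try {
    (withTestDbSketch as any)(() => {}, { seed: [] });
    return null;
  } catch (err) {
    return err;
  }
}
```

With the cast removed, tsc rejects the swapped call because a function is not assignable to string, so running the test files through a type check would likely have caught this before runtime.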

Recommended fix: Rewrite lbug-readonly-init.test.ts to use the correct API:

withTestLbugDB(
  'lbug-readonly-init',
  (handle) => {
    it('read-only open never creates lbug.init.lock on disk', async () => {
      const { dbPath } = handle;
      const lockPath = _initLockPathForTest(dbPath);
      const adapter = await import('../../src/core/lbug/lbug-adapter.js');
      await adapter.closeLbug();
      await expect(fs.access(lockPath)).rejects.toMatchObject({ code: 'ENOENT' });
      await adapter.withLbugDb(dbPath, async () => {}, { readOnly: true });
      await expect(fs.access(lockPath)).rejects.toMatchObject({ code: 'ENOENT' });
      await adapter.closeLbug();
    });

    it('read-only open does not create parent directories (fs.mkdir spy)', async () => {
      const { dbPath } = handle;
      const adapter = await import('../../src/core/lbug/lbug-adapter.js');
      await adapter.closeLbug();
      // Spy on fs.mkdir to confirm it is not called during read-only open
      const mkdirSpy = vi.spyOn(fs, 'mkdir');
      try {
        await adapter.withLbugDb(dbPath, async () => {}, { readOnly: true });
        expect(mkdirSpy).not.toHaveBeenCalled();
      } finally {
        mkdirSpy.mockRestore();
        await adapter.closeLbug();
      }
    });
  },
  {
    seed: [
      `CREATE NODE TABLE IF NOT EXISTS File (path STRING, PRIMARY KEY (path))`,
      `CREATE (:File {path: '/test/file.ts'})`,
    ],
  },
);

Note the fs.mkdir spy in test 2 — this actually proves fs.mkdir was not called, rather than just verifying the parent exists (which would pass regardless). Without the spy, test 2 passes even if fs.mkdir is erroneously called on a writable temp filesystem.

Blocks merge: YES. Both tests fail at runtime. The only regression coverage for the #1783 fix is non-functional.



FINDING 2 — RESOLVED: Shadow-replay EROFS handling (prior BLOCKING-1)

Risk: The writable openLbugConnection call inside ensureReadOnlyConnectionUsable could throw raw EROFS on :ro mounts in the crash-recovery scenario.

Evidence to check: lbug-adapter.ts lines 524–587, specifically the writable open after isReadOnlyShadowReplayError.

Confirmed evidence:

shadowReplayErr is now correctly scoped outside the try-catch at line 528:

let shadowReplayErr: unknown;    // ← hoisted by f6b8e7f
try {
  await queryAndDrain(handle.conn, READ_ONLY_SHADOW_REPLAY_PROBE);
  return handle;
} catch (err) {
  if (isMissingShadowSidecarError(err)) { ... }
  if (!isReadOnlyShadowReplayError(err)) { ... throw err; }
  shadowReplayErr = err;         // ← assigned in catch, in scope below
}
// ...
try {
  writable = await openLbugConnection(lbug, dbPath);
} catch (openErr) {
  const code = extractErrnoCode(openErr);
  if (code === 'EROFS' || code === 'EACCES' || code === 'EPERM') {
    throw new Error(
      shadowSidecarRecoveryMessage(dbPath, shadowReplayErr) +   // ← uses shadowReplayErr ✅
        '\n  The workspace appears to be read-only — mount it read-write to perform shadow replay recovery,' +
        ' or re-run `gitnexus analyze` on a writable filesystem to rebuild the index.',
    );
  }
  throw openErr;    // ← non-permission errors propagate correctly ✅
}

Error taxonomy is correct: only EROFS, EACCES, EPERM are converted to the actionable message. DB corruption errors, ENOENT, ENOSPC, and other failures propagate as raw errors.

The message includes shadowSidecarRecoveryMessage(dbPath, shadowReplayErr) which contains the original LadybugDB error text and the actionable gitnexus analyze --force guidance. The EROFS-specific note adds mount-level guidance. openErr is not chained as cause, which means the EROFS path/errno is not in the error chain — but this matches project error style and is not blocking.

Recommended fix: None required. Resolved correctly.

Blocks merge: No (resolved).


FINDING 3 — CONFIRMED CLEAN: Read-only mutation boundary

Confirmed evidence by reading doInitLbug lines 699–843:

Every operation reachable in the if (readOnly) branch:

| Operation | Classification |
| --- | --- |
| preflightLbugSidecars(... allowQuarantine: false) | Read-only: calls inspectLbugSidecars (stat reads only); allowQuarantine: false returns at sidecar-recovery.ts:241 before any rename/quarantine |
| openLbugConnection(lbug, dbPath, { readOnly: true }) | Read-only: LadybugDB opens in read-only mode |
| ensureReadOnlyConnectionUsable(dbPath, opened) | Read-only probe + EROFS-caught writable fallback (see Finding 2) |
| loadFTSExtension(undefined, { policy: 'load-only' }) | Read-only: policy: 'load-only' skips DDL and INSTALL |
| finalizeLbugSidecarsAfterClose (in safeClose) | Best-effort non-fatal: quarantineWalForMissingShadow failure is caught at sidecar-recovery.ts:303–310 and logged as warn, never thrown |

acquireInitLock, fs.mkdir, fs.unlink, fs.rm, and fs.rename are all in the else branch only. ✅

Blocks merge: No.
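
As a rough model of the boundary checked in this finding, the two branches can be written as ordered step lists. This is only a sketch: `planInit`, `mutatesWorkspace`, and the step names are hypothetical labels for the operations named in this review, not functions in lbug-adapter.ts.

```typescript
// Hypothetical labels for the operations named in the review.
type InitStep =
  | "pathCleanup"
  | "acquireInitLock"
  | "orphanSidecarCleanup"
  | "mkdirParent"
  | "preflightSidecars"
  | "openConnection"
  | "releaseInitLock";

// Steps the review classifies as workspace mutations (lock file, mkdir, unlink/rm).
const MUTATING_STEPS: ReadonlySet<InitStep> = new Set<InitStep>([
  "pathCleanup",
  "acquireInitLock",
  "orphanSidecarCleanup",
  "mkdirParent",
]);

// Read-only opens go straight to preflight and open; writable opens keep the full sequence.
function planInit(readOnly: boolean): InitStep[] {
  if (readOnly) {
    return ["preflightSidecars", "openConnection"];
  }
  return [
    "pathCleanup",
    "acquireInitLock",
    "orphanSidecarCleanup",
    "mkdirParent",
    "preflightSidecars",
    "openConnection",
    "releaseInitLock",
  ];
}

// True if any step in the plan writes to the workspace filesystem.
function mutatesWorkspace(steps: InitStep[]): boolean {
  return steps.some((step) => MUTATING_STEPS.has(step));
}
```

Under this model, `mutatesWorkspace(planInit(true))` is false and `mutatesWorkspace(planInit(false))` is true, which is the property the finding confirms for the real code.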


FINDING 4 — CONFIRMED CLEAN: Writable path preservation

Confirmed evidence by diffing the else branch against origin/main behavior:

The else block preserves the exact pre-PR ordering from #1622:

  1. fs.lstat + symlink/directory removal (unchanged) ✅
  2. acquireInitLock(dbPath) (unchanged) ✅
  3. fs.access(dbPath) + orphan sidecar fs.unlink (unchanged, inside try under lock) ✅
  4. fs.mkdir(parentDir, { recursive: true }) (unchanged, after orphan cleanup) ✅
  5. preflightLbugSidecars(... allowQuarantine: true) (unchanged) ✅
  6. openLbugConnection(lbug, dbPath) writable open (unchanged) ✅
  7. releaseInitLock() in finally (unchanged) ✅
  8. runSchemaCreationQueries and reopenWritableAfterMissingShadow retry (unchanged) ✅
  9. loadFTSExtension(undefined, {}) default policy (unchanged) ✅

Blocks merge: No.
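
The lock-then-release ordering in steps 2 through 7 follows the usual acquire/try/finally shape. The sketch below assumes an exclusive-create lock file at `${dbPath}.init.lock`; `withInitLock` is a hypothetical helper, and the project's actual acquireInitLock (retries, stale-lock handling) is not reproduced here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: hold an exclusive lock file while `body` runs.
function withInitLock<T>(dbPath: string, body: () => T): T {
  const lockPath = `${dbPath}.init.lock`;
  // "wx" creates the file and fails with EEXIST if it already exists.
  // On a read-only mount this call is where EROFS would surface.
  const fd = fs.openSync(lockPath, "wx");
  try {
    return body();
  } finally {
    // Release runs on both success and failure, mirroring step 7.
    fs.closeSync(fd);
    fs.rmSync(lockPath, { force: true });
  }
}

// Usage: the lock exists only while the body runs.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "init-lock-demo-"));
const demoDb = path.join(demoDir, "lbug");
const heldDuringBody = withInitLock(demoDb, () => fs.existsSync(`${demoDb}.init.lock`));
const heldAfterBody = fs.existsSync(`${demoDb}.init.lock`);
fs.rmSync(demoDir, { recursive: true, force: true });
```

Because the lock file is created inside the workspace, any caller of such a helper writes to disk, which is why the read-only branch has to avoid it entirely.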


FINDING 5 — CONFIRMED CLEAN: Unicode and file hygiene

Confirmed evidence:

git diff --check HEAD~3..HEAD → clean (no output)
grep -Pn "[\x{202A}-\x{202E}\x{2066}-\x{2069}]" → no matches

Non-ASCII characters found: em dashes (U+2014) in comments only, consistent with existing file style throughout lbug-adapter.ts. No bidi controls. ✅


FINDING 6 — CONFIRMED: CI on current head is incomplete

Confirmed evidence: mcp__github_ci__get_ci_status returns 5 completed workflows for the current head — Dependency Review, Gitleaks, Docker Build, PR Autofix, CodeQL. The main CI workflow (unit/integration tests, multi-platform) is absent from completed runs. The 9554-test report in PR comments predates the c6660db/f6b8e7f commits. CI cannot be confirmed green on the current head for the test suite that would catch the broken tests.

Blocks merge: No on its own, but combined with FINDING 1 (broken tests) this means there is no automated validation for the current head.


Assessment Sections

1. Read-only mutation boundary

Clean. All fs.* mutations are gated by the else branch. allowQuarantine: false in preflight is enforced at sidecar-recovery.ts:241. Post-close finalizer is best-effort non-fatal. ✅

2. Shadow-replay recovery

Resolved. The writable fallback attempt still occurs (required by LadybugDB semantics) but permission failures are caught and surfaced as actionable recovery guidance. Policy contract: "clean read-only opens do not mutate; crash-recovery requiring shadow replay surfaces clear guidance." This is acceptable. ✅

3. Error taxonomy

Correct. Only EROFS/EACCES/EPERM convert to the shadow-replay recovery message. All other errors propagate unmodified. ✅

4. Writable path preservation

Identical to pre-PR behavior. Ordering, locking, cleanup, and finally-block semantics all preserved. ✅

5. Test strength

Broken. The test file calls withTestLbugDB with the wrong argument order and wrong types. Both it tests fail at runtime with "withTestLbugDB did not attach handle". See FINDING 1.

Even if fixed for correct API usage, test 2 ("does not create parent directories") still needs an fs.mkdir spy — asserting the parent exists beforehand does not prove mkdir was not called.

6. Cross-platform behavior

The EROFS/EACCES/EPERM catch covers Linux Docker :ro, macOS Docker Desktop (EACCES/EPERM from VirtioFS), and Windows (EPERM). WSL2 would surface EROFS like Linux. CI root containers: the tests should not rely on chmod simulation (and don't). ✅ for the code. Tests don't cover EPERM path at all (broken). ⚠️

7. Observability

Recovery error message (lines 552–556) is clear: identifies the shadow-replay state, instructs operator to mount read-write or run gitnexus analyze. shadowSidecarRecoveryMessage provides the original LadybugDB error text. Adequate. ✅

8. Unicode hygiene

Em dashes in comments only. No hidden/bidi controls. ✅


Back-and-forth Avoided

  • The production code change (lbug-adapter.ts) is correct. The read-only fast path, the EROFS handling fix, and the writable path preservation are all verified directly in the file.
  • The f6b8e7f scoping fix correctly hoists shadowReplayErr — the variable is in scope at the EROFS catch.
  • The preflightLbugSidecars allowQuarantine: false early return at sidecar-recovery.ts:241 is confirmed: inspection-only, no mutations.
  • finalizeLbugSidecarsAfterClose failure on :ro mount is non-fatal (caught at sidecar-recovery.ts:303–310) — this remains an accepted nit from the prior review.

Final Verdict

not production-ready

The production code change in lbug-adapter.ts is correct: the read-only fast path is clean, the EROFS handling in ensureReadOnlyConnectionUsable is properly scoped and actionable, the writable path is structurally identical to the pre-PR baseline, and there are no bidi controls. However, the new test file lbug-readonly-init.test.ts calls withTestLbugDB with incorrect arguments (describe function as prefix, seed options object as the fn callback), causing a TypeError: fn is not a function in the nested describe callback and both it tests to fail with "withTestLbugDB did not attach handle". The DoD criterion — "Automated tests would fail if the read-only path called acquireInitLock, fs.mkdir, or tried to create ${dbPath}.init.lock" — is not met because neither test runs against a real DB handle. Additionally, the main test pipeline has not completed on the current head f6b8e7f. The test file needs to be rewritten using the correct callback-based API (consistent with every other test in the suite), with handle access from the closure rather than ctx.lbugHandle, and test 2 should add an fs.mkdir spy to actually prove the mutation is skipped rather than just asserting the parent directory exists.

@magyargergo

Collaborator

@Avicennasis please look into this review and please make sure that the CI is healthy

@Avicennasis

Contributor Author

@magyargergo Apologies for the broken tests — I used the wrong withTestLbugDB API. Fixed in b64391d:

What was wrong: Passed describe function as the prefix argument and the options object as the fn callback. Both tests would throw TypeError: fn is not a function at runtime.

What's fixed:

  • Rewrote to the correct pattern: withTestLbugDB('lbug-readonly-init', (handle) => { ... }, { seed: [...] })
  • Handle accessed from the closure (not ctx.lbugHandle which doesn't exist)
  • Test 2 now uses vi.spyOn(fs, 'mkdir') to actually prove fs.mkdir is not called during read-only open
  • Removed unused describe import

CI should be clean on this push.

@magyargergo

Collaborator

@Avicennasis tests are failing again

@Avicennasis

Contributor Author

@magyargergo Pushed c952c8a — the test failure was caused by seed data that conflicted with the existing LadybugDB schema. The seed tried to create a File table with a path column, but the real schema has id, name, filePath, content with id as PK. Error was: Prepare failed: Binder exception: Cannot find property path for .

Fix: Removed the seed entirely. The test only needs a valid, empty DB to verify that read-only opens don't create filesystem artifacts. This matches the pattern in lbug-lock-retry.test.ts which also uses withTestLbugDB with no seed.

Changes in this push:

  • No seed data (avoids schema conflicts)
  • Follows exact withTestLbugDB('prefix', (handle) => { ... }) pattern from lbug-lock-retry.test.ts
  • Test 1: verifies no lbug.init.lock created on read-only open
  • Test 2: vi.spyOn(fs, 'mkdir') proves fs.mkdir is never called

@magyargergo

Collaborator

Please make sure the removal retries are in place for the cleanup 🙏

@magyargergo

Collaborator

@Avicennasis please revert the mkdir mocking from your integration tests

#1783)

`doInitLbug` unconditionally called `acquireInitLock`, which creates
`${dbPath}.init.lock` inside the workspace. On a Docker `:ro` bind
mount this fails with EROFS.

The init lock prevents a TOCTOU race during DB creation — read-only
opens never create databases and don't need it. Split the init path:

- Read-only: skip path cleanup, init lock, orphan sidecar removal,
  and mkdir. Go straight to preflightLbugSidecars (allowQuarantine:
  false) then openLbugConnection with readOnly: true.
- Writable: unchanged behavior (lock, cleanup, open).
- Shadow-replay recovery: catch EROFS/EACCES/EPERM from the writable
  fallback in ensureReadOnlyConnectionUsable and surface an actionable
  error instead of a raw filesystem exception.

Includes integration test verifying read-only open never creates
lbug.init.lock on disk.

Fixes #1783
@Avicennasis Avicennasis force-pushed the fix/1783-readonly-init-lock branch from c952c8a to c2b1222 Compare May 22, 2026 16:40
@Avicennasis

Contributor Author

@magyargergo Squashed to a single clean commit (c2b1222) after setting up a proper local dev environment and running the tests.

Changes from previous iteration:

  • Removed the vi.spyOn(fs, 'mkdir') mock from the integration test as requested
  • Removed the broken seed data that caused the CI failure (Binder exception: Cannot find property path)
  • Single test remains: verifies read-only open never creates lbug.init.lock — real integration test, no mocking
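
One mock-free way to express that kind of check is to compare the directory listing before and after the operation. The sketch below is an assumption-laden illustration, not the test in this PR: `newEntriesAfter` is a hypothetical helper, and it is synchronous for brevity, whereas a real read-only open would be awaited.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: run `action` and report any directory entries it created.
function newEntriesAfter(dir: string, action: () => void): string[] {
  const before = new Set(fs.readdirSync(dir));
  action();
  return fs
    .readdirSync(dir)
    .filter((name) => !before.has(name))
    .sort();
}

// Usage against a throwaway directory containing a placeholder DB file.
const snapshotDir = fs.mkdtempSync(path.join(os.tmpdir(), "readonly-init-"));
fs.writeFileSync(path.join(snapshotDir, "lbug"), "");

// An action that touches nothing yields an empty list.
const createdByNoop = newEntriesAfter(snapshotDir, () => {});

// An action that drops a lock file is reported by name.
const createdByLock = newEntriesAfter(snapshotDir, () => {
  fs.writeFileSync(path.join(snapshotDir, "lbug.init.lock"), "");
});

fs.rmSync(snapshotDir, { recursive: true, force: true });
```

A listing comparison also catches unexpected artifacts other than the lock file, at the cost of being sensitive to anything else that writes to the same directory during the test.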

Local validation:

  • npx tsc --noEmit — clean
  • npx vitest run test/integration/lbug-readonly-init.test.ts — PASS (1), FAIL (0)
  • npx vitest run test/integration/lbug-lock-retry.test.ts test/integration/lbug-orphan-sidecar-recovery.test.ts test/unit/repo-manager.test.ts — PASS (83), FAIL (0)

Cleanup is handled by the withTestLbugDB helper's built-in afterAll teardown which includes retry logic for Windows handle-release.
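
For readers unfamiliar with retrying removal, Node's `fs.rm` family accepts `maxRetries` and `retryDelay` options that retry on errors such as EBUSY and EPERM when `recursive` is true. The sketch below shows that built-in mechanism only; `removeTestDir` is a hypothetical name, the retry values are arbitrary, and whether withTestLbugDB uses these options or its own loop is not stated in this thread.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical teardown helper using Node's built-in removal retries.
function removeTestDir(dir: string): void {
  fs.rmSync(dir, {
    recursive: true, // required for maxRetries to take effect
    force: true, // ignore a directory that is already gone
    maxRetries: 5, // arbitrary illustrative value
    retryDelay: 100, // milliseconds, grows linearly per retry
  });
}

// Usage: create a throwaway directory with one file, then remove it.
const teardownDir = fs.mkdtempSync(path.join(os.tmpdir(), "lbug-teardown-"));
fs.writeFileSync(path.join(teardownDir, "lbug"), "");
removeTestDir(teardownDir);
const stillThere = fs.existsSync(teardownDir);
```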

@magyargergo magyargergo merged commit a8a8a37 into abhigyanpatwari:main May 24, 2026
34 of 35 checks passed