fix(lbug): skip init lock and filesystem mutations for read-only opens (#1783)#1784
Conversation
|
@Avicennasis is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
|
Claude finished @magyargergo's task in 6m 15s —— View job Deep PR Review: fix(lbug): skip init lock and filesystem mutations for read-only opens (#1784)Tasks:
1. VerdictRequest Changes One blocking finding: the read-only fast path still reaches a writable filesystem open inside Additionally, no new automated regression tests pin the contract that was fixed. 2. Production-Readiness BarFor this PR to merge, the bar is:
The PR meets bar #2 and #4. It does not fully meet #1, and does not meet #3. 3. What the PR Gets RightCorrect split of the init critical section. The refactor moves Correct Correct Writable path is structurally unchanged. The diff shows only reorganization: the writable block is now inside
4. Blocking FindingsBLOCKING-1:
|
| Issue/PR | Role | Status after #1784 |
|---|---|---|
| #1549 | Original :ro Docker break (two failures: binary on PATH + .gitignore EROFS) |
.gitignore was fixed by #1550. LadybugDB lock failure addressed by #1784. Primary scenario resolved. |
| #1550 | Fixed ensureGitNexusIgnored EROFS |
Unaffected by #1784. ✅ |
| #1618 | Crash loop from orphan .wal.checkpoint (motivated acquireInitLock + orphan cleanup in #1622) |
The writable path retains full #1622 behavior: lock acquired, orphan .shadow and .wal.checkpoint cleaned when main DB is missing. ✅ |
| #1622 | Init lock + orphan sidecar recovery for writable opens | Writable path is structurally identical post-refactor. Lock acquire, cleanup loop, mkdir, preflightLbugSidecars(allowQuarantine:true), finally { releaseInitLock() } — all preserved. ✅ |
| #1747 | Sidecar preflight/recovery, finalizeLbugSidecarsAfterClose, reopenReadOnlyAfterMissingShadow |
Read-only path correctly uses allowQuarantine: false. Writable path correctly uses allowQuarantine: true. finalizeLbugSidecarsAfterClose still called from safeClose for both. reopenReadOnlyAfterMissingShadow still reachable from ensureReadOnlyConnectionUsable. ✅ (writable open inside that function is the BLOCKING-1 concern) |
| #1783 | Init lock EROFS on :ro mount (lbug.init.lock creation fails) |
Fixed for primary scenario (clean DB, no crash-recovery state needed). ✅ for clean state, |
| #1784 | This PR | Partially achieves stated goal. See BLOCKING-1. |
Gap identified: The historical chain is nearly complete. The one gap is the crash-recovery scenario where lbug.wal + lbug.shadow exist simultaneously on a :ro mount. This leaves a narrow but real failure window.
7. OS/Filesystem Matrix
| Environment | Primary scenario (#1783 fix) | Crash-recovery (BLOCKING-1) | Notes |
|---|---|---|---|
Linux Docker :ro |
✅ Fixed — no EROFS from acquireInitLock |
❌ EROFS from writable open in ensureReadOnlyConnectionUsable if .shadow+.wal present |
EROFS is the expected errno on bind-mount write attempts |
| macOS Docker Desktop | ✅ Expected fixed | ❌ May surface as EPERM or EACCES instead of EROFS (VirtioFS/gRPC-FUSE layer). Error message less obvious. |
Case-insensitive fs may mask path comparison issues, but none are in the changed code |
| Windows native | ✅ Expected fixed | EPERM/EBUSY rather than EROFS; writable open in ensureReadOnlyConnectionUsable may fail differently |
Known LadybugDB handle-release lag (see waitForWindowsHandleRelease); post-close finalizeLbugSidecarsAfterClose is the higher risk on Windows |
| WSL2 | ✅ Expected fixed | ❌ EROFS expected on :ro mounts via WSL2 Docker |
9P/virtio-fs :ro semantics mirror Linux |
| CI / root containers | ✅ chmod simulation unreliable as root — but mocks avoid this |
:ro mounts depending on bind-mount implementation — tests may not catch the failure |
Use mock-based tests (not chmod) for EROFS simulation |
| Network / synced FS | ✅ Lock file never created | fs.rename (quarantine in ensureReadOnlyConnectionUsable) may fail with EXDEV on cross-device rename; finalizeLbugSidecarsAfterClose also at risk |
NFS/CIFS stale-handle semantics add unpredictability |
Untested risk: No CI job runs gitnexus serve against a Docker :ro workspace. The "manual Docker test" in the PR test plan is the only evidence for the primary scenario.
8. LadybugDB Lifecycle Analysis
Lock necessity for read-only opens: The PR's reasoning is correct. acquireInitLock guards three operations: (1) checking whether the main DB file is missing, (2) unlinking orphan sidecars, (3) creating the DB. A read-only open does none of these. The race protected by the lock — another process creating a fresh DB between the access() check and the sidecar unlink() — cannot happen on a read-only open that does neither.
Read-only open semantics: new lbug.Database(path, ..., readOnly=true) is expected to open the main DB file for reading only. However, LadybugDB (based on the isReadOnlyShadowReplayError error text "Couldn't replay shadow pages under read-only mode") indicates that the native engine may need write access to complete a checkpoint/shadow replay even on a read-only open if the shadow file is present. This is the BLOCKING-1 issue: LadybugDB's read-only contract is not "never writes" — it's "never writes after successful open."
WAL/shadow replay behavior: When lbug.shadow exists, LadybugDB needs to apply those shadow pages to the main DB file during open. This is a write operation. If the DB is opened read-only, LadybugDB throws isReadOnlyShadowReplayError. The GitNexus workaround (open writable, replay, close, reopen read-only) is correct for writable filesystems but fails on :ro at the writable open step.
Does MATCH (n) RETURN n LIMIT 1 trigger replay? Yes — the probe executes a full read scan, which forces LadybugDB to apply any pending shadow pages. This is by design (the probe is intentionally chosen to trigger replay so failures surface early rather than during real MCP queries). On :ro with crash-recovery state, this probe causes the shadow-replay error.
Is skipping schema creation in read-only safe? Yes. Schema queries (SCHEMA_QUERIES) are DDL (CREATE NODE TABLE IF NOT EXISTS, etc.). Running them read-only would either fail silently (isReadOnlyDbError is swallowed in runSchemaCreationQueries) or throw. Skipping them entirely for read-only opens is correct — the schema was created by the writable gitnexus analyze run.
Data-safety: The allowQuarantine: false policy in the read-only path correctly prevents discarding any WAL content. A large orphan WAL (potentially containing unrecovered data) is left intact — only logged as a warning. This is the safe choice. Tiny orphan WALs (≤4KB, effectively empty WAL files from initialization) are also preserved when allowQuarantine: false. No data can be lost by the read-only path.
LADYBUGDB-CONTRACT coupling: The isReadOnlyShadowReplayError regex in sidecar-recovery.ts (/replay shadow pages under read-only mode/i) is version-pinned to @ladybugdb/core ^0.16.1. If LadybugDB changes this error message, the ensureReadOnlyConnectionUsable shadow-replay branch won't trigger and the writable open won't be attempted — the error would instead fall through to if (!isReadOnlyShadowReplayError(err)) { throw err }, surfacing the raw LadybugDB error. This is acceptable degradation.
9. Test Plan Assessment
PR test plan:
-
npx tsc --noEmit -
npx vitest run test/unit/repo-manager.test.ts -
npx vitest run test/integration/lbug-lock-retry.test.ts - Manual Docker
:rotest
What's actually changed: Only lbug-adapter.ts — specifically doInitLbug and the path for readOnly=true.
Missing tests (blockers/required follow-up):
| Test | Why missing | Severity |
|---|---|---|
Unit: doInitLbug(readOnly=true) never calls acquireInitLock |
The core regression (#1783) has no automated pin | Required before merge |
Unit: doInitLbug(readOnly=true) never calls fs.mkdir |
Secondary mutation not tested | Required before merge |
| Unit: read-only open on mock-EROFS workspace succeeds (no lock created) | Tests the exact failure scenario from #1783 | Required before merge |
Unit: ensureReadOnlyConnectionUsable with writable-open EROFS surfaces actionable error |
Tests BLOCKING-1 scenario | Required if BLOCKING-1 is fixed |
Integration: read-only + allowQuarantine: false does not rename WAL |
Tests finalizeLbugSidecarsAfterClose on :ro |
Nice-to-have |
| Integration: writable path still quarantines tiny orphan WAL | Regression for #1622/#1747 | Existing test covers this indirectly via sidecar-recovery.test.ts |
What tests exist that give confidence:
lbug-orphan-sidecar-recovery.test.ts— integration tests for writableinitLbugwith orphan sidecars. These pass (test plan says so), confirming the writable path is unbroken.sidecar-recovery.test.ts— unit tests forpreflightLbugSidecars,finalizeLbugSidecarsAfterClose,inspectLbugSidecars. These pass.lbug-lock-retry.test.ts— lock retry tests (writable path).
None of these test the read-only doInitLbug path directly.
10. Final Recommendation
Request changes on BLOCKING-1 before merge. The fix for the primary acquireInitLock regression (#1783) is correct and well-structured. The writable path is provably unchanged. The code is clean and the split is architecturally sound.
Minimal patch direction:
-
In
ensureReadOnlyConnectionUsable(line ~544), catch EROFS/EACCES/EPERM from the writableopenLbugConnectioncall and surfaceshadowSidecarRecoveryMessagewith an additional note about read-only filesystem. This prevents raw EROFS from reaching the user and completes the "no surprising write errors on:ro" contract. -
Add a unit test in
lbug-orphan-sidecar-recovery.test.ts(or a newlbug-readonly-init.test.ts) that:- Creates a real DB (writable)
- Mocks or verifies
acquireInitLockis never called duringwithLbugDb(dbPath, ..., { readOnly: true }) - Confirms no
lbug.init.lockfile is created on disk
These two changes would bring this PR to "Approve with nits" standing.
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 9668 tests passed 1 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
|
@Avicennasis Thank you for raising this PR! Really good job! Could you please look into claude's findings? 🙏 |
|
@magyargergo Thanks for the review! I've pushed two commits addressing the findings: BLOCKING-1 (EROFS in shadow-replay recovery): MEDIUM-1 (No regression test): Added
Also fixed a variable scoping issue — the original NIT-2 / NIT-3 — Acknowledged but deferred; they're cosmetic and don't affect correctness. MEDIUM-2 ( |
|
Claude finished @magyargergo's task in 7m 4s —— View job Deep Follow-up PR Review: fix(lbug): skip init lock and filesystem mutations for read-only opens (#1784)Tasks:
Review Bar for This PRA read-only Docker Problem Being Solved
Current PR StateHead SHA: CI on current head: Only 5 completed workflows visible (Dependency Review, Gitleaks, Docker Build & Push, PR Autofix, CodeQL — all green). The main test pipeline (unit/integration on Ubuntu/macOS/Windows with 9554+ tests) is not confirmed on the current head FindingsFINDING 1 — BLOCKING: New test file uses wrong
|
| Operation | Classification |
|---|---|
preflightLbugSidecars(... allowQuarantine: false) |
Read-only: calls inspectLbugSidecars (stat reads only); allowQuarantine: false returns at sidecar-recovery.ts:241 before any rename/quarantine |
openLbugConnection(lbug, dbPath, { readOnly: true }) |
Read-only: LadybugDB opens in read-only mode |
ensureReadOnlyConnectionUsable(dbPath, opened) |
Read-only probe + EROFS-caught writable fallback (see Finding 2) |
loadFTSExtension(undefined, { policy: 'load-only' }) |
Read-only: policy: 'load-only' skips DDL and INSTALL |
finalizeLbugSidecarsAfterClose (in safeClose) |
Best-effort non-fatal: quarantineWalForMissingShadow failure is caught at sidecar-recovery.ts:303–310 and logged as warn, never thrown |
acquireInitLock, fs.mkdir, fs.unlink, fs.rm, and fs.rename are all in the else branch only. ✅
Blocks merge: No.
FINDING 4 — CONFIRMED CLEAN: Writable path preservation
Confirmed evidence by diffing the else branch against origin/main behavior:
The else block preserves the exact pre-PR ordering from #1622:
fs.lstat+ symlink/directory removal (unchanged) ✅acquireInitLock(dbPath)(unchanged) ✅fs.access(dbPath)+ orphan sidecarfs.unlink(unchanged, inside try under lock) ✅fs.mkdir(parentDir, { recursive: true })(unchanged, after orphan cleanup) ✅preflightLbugSidecars(... allowQuarantine: true)(unchanged) ✅openLbugConnection(lbug, dbPath)writable open (unchanged) ✅releaseInitLock()infinally(unchanged) ✅runSchemaCreationQueriesandreopenWritableAfterMissingShadowretry (unchanged) ✅loadFTSExtension(undefined, {})default policy (unchanged) ✅
Blocks merge: No.
FINDING 5 — CONFIRMED CLEAN: Unicode and file hygiene
Confirmed evidence:
git diff --check HEAD~3..HEAD → clean (no output)
grep -Pn "[\x{202A}-\x{202E}\x{2066}-\x{2069}]" → no matches
Non-ASCII characters found: em dashes (—, U+2014) in comments only, consistent with existing file style throughout lbug-adapter.ts. No bidi controls. ✅
FINDING 6 — CONFIRMED: CI on current head is incomplete
Confirmed evidence: mcp__github_ci__get_ci_status returns 5 completed workflows for the current head — Dependency Review, Gitleaks, Docker Build, PR Autofix, CodeQL. The main CI workflow (unit/integration tests, multi-platform) is absent from completed runs. The 9554-test report in PR comments predates the c6660db/f6b8e7f commits. CI cannot be confirmed green on the current head for the test suite that would catch the broken tests.
Blocks merge: No on its own, but combined with FINDING 1 (broken tests) this means there is no automated validation for the current head.
Assessment Sections
1. Read-only mutation boundary
Clean. All fs.* mutations are gated by the else branch. allowQuarantine: false in preflight is enforced at sidecar-recovery.ts:241. Post-close finalizer is best-effort non-fatal. ✅
2. Shadow-replay recovery
Resolved. The writable fallback attempt still occurs (required by LadybugDB semantics) but permission failures are caught and surfaced as actionable recovery guidance. Policy contract: "clean read-only opens do not mutate; crash-recovery requiring shadow replay surfaces clear guidance." This is acceptable. ✅
3. Error taxonomy
Correct. Only EROFS/EACCES/EPERM convert to the shadow-replay recovery message. All other errors propagate unmodified. ✅
4. Writable path preservation
Identical to pre-PR behavior. Ordering, locking, cleanup, and finally-block semantics all preserved. ✅
5. Test strength
Broken. The test file calls withTestLbugDB with the wrong argument order and wrong types. Both it tests fail at runtime with "withTestLbugDB did not attach handle". See FINDING 1.
Even if fixed for correct API usage, test 2 ("does not create parent directories") still needs an fs.mkdir spy — asserting the parent exists beforehand does not prove mkdir was not called.
6. Cross-platform behavior
The EROFS/EACCES/EPERM catch covers Linux Docker :ro, macOS Docker Desktop (EACCES/EPERM from VirtioFS), and Windows (EPERM). WSL2 would surface EROFS like Linux. CI root containers: the tests should not rely on chmod simulation (and don't). ✅ for the code. Tests don't cover EPERM path at all (broken).
7. Observability
Recovery error message (lines 552–556) is clear: identifies the shadow-replay state, instructs operator to mount read-write or run gitnexus analyze. shadowSidecarRecoveryMessage provides the original LadybugDB error text. Adequate. ✅
8. Unicode hygiene
Em dashes in comments only. No hidden/bidi controls. ✅
Back-and-forth Avoided
- The production code change (
lbug-adapter.ts) is correct. The read-only fast path, the EROFS handling fix, and the writable path preservation are all verified directly in the file. - The
f6b8e7fscoping fix correctly hoistsshadowReplayErr— the variable is in scope at the EROFS catch. - The
preflightLbugSidecarsallowQuarantine: falseearly return atsidecar-recovery.ts:241is confirmed: inspection-only, no mutations. finalizeLbugSidecarsAfterClosefailure on:romount is non-fatal (caught atsidecar-recovery.ts:303–310) — this remains an accepted nit from the prior review.
Final Verdict
not production-ready
The production code change in lbug-adapter.ts is correct: the read-only fast path is clean, the EROFS handling in ensureReadOnlyConnectionUsable is properly scoped and actionable, the writable path is structurally identical to the pre-PR baseline, and there are no bidi controls. However, the new test file lbug-readonly-init.test.ts calls withTestLbugDB with incorrect arguments (describe function as prefix, seed options object as the fn callback), causing a TypeError: fn is not a function in the nested describe callback and both it tests to fail with "withTestLbugDB did not attach handle". The DoD criterion — "Automated tests would fail if the read-only path called acquireInitLock, fs.mkdir, or tried to create ${dbPath}.init.lock" — is not met because neither test runs against a real DB handle. Additionally, the main test pipeline has not completed on the current head f6b8e7f. The test file needs to be rewritten using the correct callback-based API (consistent with every other test in the suite), with handle access from the closure rather than ctx.lbugHandle, and test 2 should add an fs.mkdir spy to actually prove the mutation is skipped rather than just asserting the parent directory exists.
|
@Avicennasis please look into this review and please make sure that the CI is healthy |
|
@magyargergo Apologies for the broken tests — I used the wrong What was wrong: Passed What's fixed:
CI should be clean on this push. |
|
@Avicennasis tests are failing again |
|
@magyargergo Pushed Fix: Removed the seed entirely. The test only needs a valid, empty DB to verify that read-only opens don't create filesystem artifacts. This matches the pattern in Changes in this push:
|
|
Please make sure the removal retries are in place for the cleanup 🙏 |
|
@Avicennasis please revert the mkdir mocking from your integration tests |
#1783) `doInitLbug` unconditionally called `acquireInitLock`, which creates `${dbPath}.init.lock` inside the workspace. On a Docker `:ro` bind mount this fails with EROFS. The init lock prevents a TOCTOU race during DB creation — read-only opens never create databases and don't need it. Split the init path: - Read-only: skip path cleanup, init lock, orphan sidecar removal, and mkdir. Go straight to preflightLbugSidecars (allowQuarantine: false) then openLbugConnection with readOnly: true. - Writable: unchanged behavior (lock, cleanup, open). - Shadow-replay recovery: catch EROFS/EACCES/EPERM from the writable fallback in ensureReadOnlyConnectionUsable and surface an actionable error instead of a raw filesystem exception. Includes integration test verifying read-only open never creates lbug.init.lock on disk. Fixes #1783
c952c8a to
c2b1222
Compare
|
@magyargergo Squashed to a single clean commit ( Changes from previous iteration:
Local validation:
Cleanup is handled by the |
Summary
doInitLbuginto a read-only fast path and a writable pathacquireInitLock, orphan sidecar removal, andfs.mkdir— all of which write to the workspace filesystempreflightLbugSidecars(withallowQuarantine: false) →openLbugConnection→ensureReadOnlyConnectionUsableFixes the
EROFS: read-only file system, open '/workspace/dir/.gitnexus/lbug.init.lock'error reported in #1783 and by @sengirab in #1549.This is the companion fix to #1550, which addressed the same
:romount issue inensureGitNexusIgnored. The LadybugDB adapter was the remaining code path that unconditionally mutated the workspace filesystem.Test plan
npx tsc --noEmit— cleannpx vitest run test/unit/repo-manager.test.ts— existing tests passnpx vitest run test/integration/lbug-lock-retry.test.ts— lock retry tests pass:roworkspace mount →gitnexus index+ query via MCP → no EROFS