fix(lbug): add WAL checkpoint-threshold control#1772
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
@copilot auto checkpoint must be always on no need cli option for that but the size is limited to 16mb when the last parameter set to -1 and that's what i want to be able to control |
Implemented in
|
|
@copilot we need to write integration tests as well and if there's a failure about that we need to recommend using this new option. Read ladybug documentation around this checkpoint size and what exception it will raise to nodejs. |
Addressed in I added checkpoint-failure guidance in I also added tests:
|
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 9554 tests passed 1 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
|
@copilot update title + description to reflect our changes |
Updated wording proposal to match the implemented changes: Title Description
|
…manual checkpoint driver with retry Address review feedback on PR #1772: - Rename CLI flag, env var, AnalyzeOptions field, recovery-hint tag, and parser/constants from lbug-* to engine-neutral wal-* (matches the existing WAL_RECOVERY_SUGGESTION / isWalCorruptionError convention). - Raise default threshold from -1 (Ladybug stock ~16 MiB) to 64 MiB so users on the default config no longer hit the original rename/remove race. - Align both READMEs to publish 67108864 (64 MiB) instead of 65536 (which would have made the crash more frequent). - Add wal-checkpoint-driver.ts: a periodic manual CHECKPOINT driver wrapped in a 3-attempt jittered retry (50/200/500 ms), driven from runFullAnalysis. Opt-out via GITNEXUS_WAL_MANUAL_CHECKPOINT=0. Moves the race window into a JS-controllable retry surface while keeping native auto-checkpoint on. - Move LBUG_CHECKPOINT_RENAME_RE / REMOVE_RE plus the predicate (renamed to isLbugCheckpointIoError) into lbug-config.ts alongside isWalCorruptionError. Predicate is now exported. Add a permissive fallback matcher and pin the matched Ladybug version in comments. - Warn instead of silently defaulting when GITNEXUS_WAL_CHECKPOINT_THRESHOLD is set to a non-empty unparseable value (closes the CLI-vs-env asymmetry). - Add a typed RecoveryHint string-literal union in cli-message.ts so future hint tags can't drift. - Add a real integration test under test/integration/ that triggers a Ladybug checkpoint IO failure via a pre-existing directory at the rename target (portable across platforms; no test-only injection hook). - Add small-disk / CI caveat (32 MiB secondary suggestion) to the recovery hint and README env-var rows. - Document CLI/env precedence in the analyze --help block. - Help placeholder: <value> -> <bytes>. - Rename analyze-lbug-auto-checkpoint.test.ts to use the new wal-* token.
✨ PR AutofixFound fixable formatting / unused-import issues across 170 changed lines. Comment |
- Drop unused `jitteredDelay` function flagged by CodeQL in PR #1772; the retry loop already inlines the same calculation with the injectable `randomImpl` so the helper was dead. Move the non-cryptographic-by-design comment next to the actual jitter site. - Apply `prettier --write` to wal-checkpoint-driver.ts and the new integration test to absorb the PR autofix bot's formatting findings.
Fixes the WAL rename/remove race that crashed large
gitnexus analyzeruns mid-index when LadybugDB auto-checkpoint hit the default ~16 MiB threshold. Keeps native auto-checkpoint always on (per maintainer requirement) and adds a JS-side manual checkpoint driver that pre-empts the native rotation, so the race window moves into a path where bounded retries can absorb transient FS failures (AV scanners, NTFS shadow copy, file locks).Behavior
gitnexus/src/core/lbug/wal-checkpoint-driver.ts): periodicCHECKPOINTissued from JS during analyze, wrapped in 3 attempts with 50 / 200 / 500 ms jittered back-off. Retries only onisLbugCheckpointIoError; every other error surfaces immediately. Opt-out viaGITNEXUS_WAL_MANUAL_CHECKPOINT=0. Driver is analyze-only; MCP and other long-lived flows keep relying on the close-timesafeCloseflush.-1(Ladybug stock ~16 MiB) to67108864(64 MiB). Default users no longer hit the original race; the recovery hint and the new default match.lbug-config.tsnext toisWalCorruptionError; matcher uses strict + permissive regex tiers so a future Ladybug message-shape change degrades to the permissive fallback instead of silently breaking. Ladybug version is pinned in a comment beside the regexes.Public surface
--wal-checkpoint-threshold <bytes>(integer ≥ -1; default 67108864;-1keeps Ladybug stock ~16 MiB).GITNEXUS_WAL_CHECKPOINT_THRESHOLD(same semantics). Flag overrides env when both are set — documented in--help.logger.warnand falls back to the default, closing the previous CLI-vs-env asymmetry (CLI rejects; env silently fell through).RecoveryHintunion ingitnexus/src/cli/cli-message.tsenumerating the 8 existing hint tags so this enum can't drift unnoticed. New tag:'wal-checkpoint-threshold'.Docs
README.mdandgitnexus/README.mdupdated with the new flag, env var, default (67108864), and a tradeoff note: larger threshold reduces checkpoint frequency but increases WAL size at rotation.Tests
gitnexus/test/integration/analyze-wal-checkpoint-failure.test.ts— drives a reallbug.Database+ CLI spawn viatsxand provokes a real checkpoint failure portably (planting a directory at the rename target). Novi.mockof the orchestrator; no test-only injection hook.wal-*rename, the strict + permissive regex tiers, and the warn-on-invalid-env path.Verification
tsc --noEmit: clean.Example: