Merged
17 commits
97234fb
Initial plan
Copilot May 22, 2026
dffc711
fix(analyze): add WAL auto-checkpoint CLI control and default-off beh…
Copilot May 22, 2026
3a2bc30
test(analyze): share lbug auto-checkpoint parsing and align validation
Copilot May 22, 2026
7ca6469
fix(analyze): always enable lbug auto-checkpoint and expose threshold…
Copilot May 22, 2026
8828eee
refactor(lbug): inline always-on auto-checkpoint constructor arg
Copilot May 22, 2026
77ace7c
fix(analyze): guide checkpoint-threshold on Ladybug WAL checkpoint IO…
Copilot May 22, 2026
508cea1
test(analyze): cover checkpoint IO guidance and add integration guard
Copilot May 22, 2026
25401a0
Merge branch 'main' into copilot/fix-wal-rename-crash
magyargergo May 22, 2026
b6af46a
fix(analyze): tighten checkpoint IO detection and remove test hook
Copilot May 22, 2026
46d1243
fix(analyze): remove checkpoint test hook and tighten error matching
Copilot May 22, 2026
7f7bbad
fix(analyze): rename to wal-checkpoint-threshold, raise default, add …
May 22, 2026
9ef7caa
Merge branch 'main' into copilot/fix-wal-rename-crash
magyargergo May 22, 2026
d34d43a
chore(lbug): remove dead jitteredDelay helper and apply prettier
May 22, 2026
f700be5
Merge branch 'main' into copilot/fix-wal-rename-crash
magyargergo May 22, 2026
845090a
Merge branch 'main' into copilot/fix-wal-rename-crash
magyargergo May 22, 2026
7eb298c
Merge branch 'main' into copilot/fix-wal-rename-crash
magyargergo May 22, 2026
3c7b83b
Merge branch 'main' into copilot/fix-wal-rename-crash
magyargergo May 22, 2026
2 changes: 2 additions & 0 deletions README.md
@@ -203,6 +203,7 @@ gitnexus analyze --skip-git # Index folders that are not Git repositories
gitnexus analyze --embeddings # Enable embedding generation (slower, better search)
gitnexus analyze --verbose # Log skipped files when parsers are unavailable
gitnexus analyze --worker-timeout 60 # Increase worker idle timeout for slow parses
gitnexus analyze --wal-checkpoint-threshold 67108864 # Set the LadybugDB WAL auto-checkpoint threshold in bytes (default: 67108864 = 64 MiB; -1 keeps Ladybug's stock ~16 MiB)
gitnexus analyze --workers <n> # Parse worker pool size (default: cores-1, capped at 16; 0 = sequential)
gitnexus mcp # Start MCP server (stdio) — serves all indexed repos
gitnexus serve # Start local HTTP server (multi-repo) for web UI connection
@@ -241,6 +242,7 @@ Most `analyze` knobs are also CLI flags (`--workers`, `--worker-timeout`, `--max
| `GITNEXUS_PROFILE_DEFERRED_SLOW_MS` | `3000` (verbose) / `5000` | Per-file threshold in ms above which `processCallsFromExtracted` emits a `slow file …` log line. Parsed via `Number()`: accepts integers (`5000`), scientific notation (`2.5e3`), decimals (`.5`), and hex (`0x10`). Non-finite or non-positive values fall back to the default. | Hunting a few outlier files dominating the deferred call-resolution stage; lower to surface more, raise to focus only on the worst. |
| `GITNEXUS_MAX_FILE_SIZE` | `512` (KB) | Walker skip threshold in KB. Hard cap is `32768` (tree-sitter buffer ceiling). Equivalent to `--max-file-size <kb>`. | Indexing repos with intentionally-large source files (generated parsers, vendored bundles) that should still be parsed. |
| `GITNEXUS_WORKER_SUB_BATCH_TIMEOUT_MS` | `30000` | Worker idle timeout in milliseconds before retry/fallback. Equivalent to `--worker-timeout <seconds>` × 1000. | Slow-parsing files (large minified JS, deeply-nested TS types) that legitimately need more than 30s. |
| `GITNEXUS_WAL_CHECKPOINT_THRESHOLD` | `67108864` (64 MiB) | LadybugDB WAL auto-checkpoint threshold in bytes. Equivalent to `--wal-checkpoint-threshold <bytes>`. `-1` keeps LadybugDB's stock threshold (~16 MiB). Larger thresholds reduce checkpoint frequency but increase the WAL size at rotation time — choose a smaller value on disk-constrained environments. | You need a larger or smaller WAL auto-checkpoint threshold for your analyze workload. |
| `GITNEXUS_WORKER_SUB_BATCH_MAX_BYTES` | `8388608` (8 MB) | Per-job byte budget the pool will send to a worker in one `postMessage`. | Very large individual files; mostly diagnostic — bumping past 8 MB risks structured-clone memory pressure. |
| `GITNEXUS_WORKER_MAX_RESPAWNS_PER_SLOT` | `3` | Max replacement spawns per worker slot before the slot is dropped from the active rotation. Bounds respawn loops on a chronically-crashing slot. | Hosts where a flaky worker should retry more (raise) or fail-fast (lower) before the slot is dropped. |
| `GITNEXUS_WORKER_MAX_CUMULATIVE_TIMEOUT_MS` | `5 × subBatchTimeoutMs` | Total retry wall-time budget per job before quarantining. Combined with `timeoutBackoffFactor`, prevents exponentially-growing retries from stalling for hours. | Slow files that legitimately need long total retry windows; lower to fail-fast on stalls. |
2 changes: 2 additions & 0 deletions gitnexus/README.md
@@ -158,6 +158,7 @@ gitnexus analyze --skip-agents-md # Preserve custom AGENTS.md/CLAUDE.md gitnexu
gitnexus analyze --verbose # Log skipped files when parsers are unavailable
gitnexus analyze --max-file-size 1024 # Skip files larger than N KB (default: 512, cap: 32768)
gitnexus analyze --worker-timeout 60 # Increase worker idle timeout for slow parses
gitnexus analyze --wal-checkpoint-threshold 67108864 # Set the LadybugDB WAL auto-checkpoint threshold in bytes (default: 67108864 = 64 MiB; -1 keeps Ladybug's stock ~16 MiB)
gitnexus mcp # Start MCP server (stdio) — serves all indexed repos
gitnexus serve # Start local HTTP server (multi-repo) for web UI
gitnexus index # Register an existing .gitnexus/ folder into the global registry
@@ -307,6 +308,7 @@ Configure the behavior with two environment variables:
|----------|--------|---------|--------|
| `GITNEXUS_LBUG_EXTENSION_INSTALL` | `auto`, `load-only`, `never` | `auto` | `auto` runs one bounded INSTALL if LOAD fails. `load-only` only uses already-installed extensions (recommended for offline / firewalled environments). `never` skips optional extensions entirely. |
| `GITNEXUS_LBUG_EXTENSION_INSTALL_TIMEOUT_MS` | positive integer | `15000` | Wall-clock budget for the out-of-process `INSTALL` child before it is killed. |
| `GITNEXUS_WAL_CHECKPOINT_THRESHOLD` | integer `>= -1` | `67108864` (64 MiB) | LadybugDB WAL auto-checkpoint threshold during analyze (bytes). Auto-checkpoint remains enabled; `-1` keeps Ladybug's stock ~16 MiB. Larger thresholds reduce checkpoint frequency but increase the WAL size at rotation time — choose a smaller value on disk-constrained environments. |

```bash
# Offline/airgapped: never reach the network for extensions
44 changes: 43 additions & 1 deletion gitnexus/src/cli/analyze.ts
@@ -13,7 +13,12 @@ import { spawn } from 'child_process';
import v8 from 'v8';
import cliProgress from 'cli-progress';
import { closeLbug } from '../core/lbug/lbug-adapter.js';
import { isWalCorruptionError, WAL_RECOVERY_SUGGESTION } from '../core/lbug/lbug-config.js';
import {
isLbugCheckpointIoError,
isWalCorruptionError,
parseWalCheckpointThreshold,
WAL_RECOVERY_SUGGESTION,
} from '../core/lbug/lbug-config.js';
import {
getStoragePaths,
getGlobalRegistryPath,
@@ -415,6 +420,15 @@ const forceHeapOOMForTestIfEnabled = (): void => {
for (;;) chunks.push('x'.repeat(1024 * 1024));
};

// 64 MiB keeps auto-checkpoint enabled but triggers less frequently than
// Ladybug's stock ~16 MiB threshold, reducing rename/remove churn on large
// runs. Also matches the GitNexus default in `lbug-config.ts`.
//
// IMPORTANT: keep README examples (`README.md`, `gitnexus/README.md`) and
// the `DEFAULT_WAL_CHECKPOINT_THRESHOLD` constant in
// `gitnexus/src/core/lbug/lbug-config.ts` in sync with this value.
const RECOMMENDED_WAL_CHECKPOINT_THRESHOLD = 64 * 1024 * 1024;

/** Re-exec the process with a 16GB heap and larger stack if we're currently below that. */
async function ensureHeap(): Promise<boolean> {
const nodeOpts = process.env.NODE_OPTIONS || '';
@@ -477,6 +491,8 @@ const ANALYZE_CLI_ENV_KEYS = [
'GITNEXUS_PROFILE_DEFERRED_SLOW_MS',
'GITNEXUS_MAX_FILE_SIZE',
'GITNEXUS_WORKER_SUB_BATCH_TIMEOUT_MS',
'GITNEXUS_WAL_CHECKPOINT_THRESHOLD',
'GITNEXUS_WAL_MANUAL_CHECKPOINT',
'GITNEXUS_EMBEDDING_THREADS',
'GITNEXUS_EMBEDDING_BATCH_SIZE',
'GITNEXUS_EMBEDDING_SUB_BATCH_SIZE',
@@ -562,6 +578,8 @@ export interface AnalyzeOptions {
maxFileSize?: string;
/** Override worker sub-batch idle timeout in seconds. */
workerTimeout?: string;
/** LadybugDB WAL auto-checkpoint threshold in bytes during analyze (integer >= -1; -1 keeps Ladybug's stock ~16 MiB). */
walCheckpointThreshold?: string;
/** Parse worker pool size; 0 disables workers (sequential fallback). */
workers?: string;
embeddingThreads?: string;
@@ -633,6 +651,16 @@ const analyzeCommandImpl = async (inputPath?: string, options?: AnalyzeOptions):
);
}

if (options?.walCheckpointThreshold !== undefined) {
const parsed = parseWalCheckpointThreshold(options.walCheckpointThreshold);
if (parsed === undefined) {
cliError(' --wal-checkpoint-threshold must be an integer >= -1.\n');
process.exitCode = 1;
return;
}
process.env.GITNEXUS_WAL_CHECKPOINT_THRESHOLD = String(parsed);
}

// `--workers` is threaded through `runFullAnalysis` options → PipelineOptions
// → createWorkerPool, intentionally bypassing the GITNEXUS_WORKER_POOL_SIZE
// env channel so this CLI surface never mutates `process.env` for pool size.
@@ -1130,6 +1158,20 @@ const analyzeCommandImpl = async (inputPath?: string, options?: AnalyzeOptions):
return;
}

if (isLbugCheckpointIoError(err)) {
cliError(
` LadybugDB failed while rotating/removing WAL checkpoint files.\n` +
` This can happen when auto-checkpoint runs frequently, as with Ladybug's stock threshold (~16 MiB).\n` +
` Retry with a larger checkpoint threshold to reduce checkpoint frequency:\n` +
` gitnexus analyze --wal-checkpoint-threshold ${RECOMMENDED_WAL_CHECKPOINT_THRESHOLD}\n` +
` (or set GITNEXUS_WAL_CHECKPOINT_THRESHOLD=${RECOMMENDED_WAL_CHECKPOINT_THRESHOLD})\n` +
` (Try 33554432 = 32 MiB on small-disk / CI runners.)\n`,
{ recoveryHint: 'wal-checkpoint-threshold' },
);
process.exitCode = 1;
return;
}

// HF download failure — show clean guidance without the raw stack trace.
// Checked before writeFatalToStderr so the user sees one focused message
// rather than a stack-trace dump followed by a second remediation block.
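
Note on the `--wal-checkpoint-threshold` validation in this file: `parseWalCheckpointThreshold` is imported from `lbug-config.ts`, which this diff does not show. The sketch below is only one parser that would be consistent with the CLI error text ("must be an integer >= -1"). The name `parseThresholdSketch` and the strict base-10 matching are assumptions for illustration, not the project's implementation.

```typescript
// Hypothetical stand-in for `parseWalCheckpointThreshold` (the real helper
// lives in `lbug-config.ts` and may accept or reject different inputs).
function parseThresholdSketch(raw: string): number | undefined {
  const trimmed = raw.trim();
  // Assumption: only plain base-10 integers such as "67108864" or "-1" are valid.
  if (!/^-?\d+$/.test(trimmed)) return undefined;
  const value = Number(trimmed);
  // Reject values below -1 and anything too large to represent exactly.
  if (!Number.isSafeInteger(value) || value < -1) return undefined;
  return value;
}
```

Under these assumptions, `parseThresholdSketch('-1')` yields `-1` (keep Ladybug's stock threshold), while inputs such as `'64MiB'` or `'-2'` yield `undefined`, which is the case the CLI turns into exit code 1.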
38 changes: 35 additions & 3 deletions gitnexus/src/cli/cli-message.ts
@@ -29,6 +29,38 @@
*/
import { logger } from '../core/logger.js';

/**
* String-literal union of all `recoveryHint` tags emitted by the CLI.
*
* Centralized so a new recovery branch added in `analyze.ts` cannot land
* without updating this union — TypeScript will reject the unknown literal
* passed via `cliError({ recoveryHint: '...' })`. To add a new hint:
* 1. Add the tag string to this union.
* 2. Pass it as the `recoveryHint` field at the relevant `cliError`
* call site.
*
* Consumers can import this type to narrow log-record `recoveryHint`
* fields without restating the literal list.
*/
export type RecoveryHint =
| 'wal-corruption'
| 'wal-checkpoint-threshold'
| 'heap-oom-respawn'
| 'native-worker-abort'
| 'hf-endpoint-unreachable'
| 'large-repo'
| 'npm-resolution'
| 'module-not-found';

/**
* Common shape for the optional structured-field bag passed to
* `cliError`/`cliWarn`/`cliInfo`. Typed so the `recoveryHint` slot is
* checked against the {@link RecoveryHint} union.
*/
export interface CliMessageFields extends Record<string, unknown> {
recoveryHint?: RecoveryHint;
}

function writeStderr(msg: string): void {
// Direct write — bypassing `console.*` so it cannot be intercepted by
// progress-bar redirection (see `cli/analyze.ts:barLog`) or other
@@ -41,7 +73,7 @@ function writeStderr(msg: string): void {
* User-facing informational message. Use for banners, listening URLs,
* and any message the user expects to read in plain text.
*/
export function cliInfo(msg: string, fields?: Record<string, unknown>): void {
export function cliInfo(msg: string, fields?: CliMessageFields): void {
writeStderr(msg);
logger.info(fields ?? {}, msg);
}
@@ -50,7 +82,7 @@ export function cliInfo(msg: string, fields?: Record<string, unknown>): void {
* User-facing warning. Operator-actionable but non-fatal — `cliWarn`
* indicates the command can still proceed in some form.
*/
export function cliWarn(msg: string, fields?: Record<string, unknown>): void {
export function cliWarn(msg: string, fields?: CliMessageFields): void {
writeStderr(msg);
logger.warn(fields ?? {}, msg);
}
@@ -59,7 +91,7 @@ export function cliWarn(msg: string, fields?: Record<string, unknown>): void {
* User-facing error. Indicates the command cannot proceed; usually
* paired with a non-zero exit code at the call site.
*/
export function cliError(msg: string, fields?: Record<string, unknown>): void {
export function cliError(msg: string, fields?: CliMessageFields): void {
writeStderr(msg);
logger.error(fields ?? {}, msg);
}
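
Note on consuming `RecoveryHint`: the doc comment says consumers can use the type to narrow log-record `recoveryHint` fields. The `RecoveryHint` union only checks call sites at compile time, so a log consumer reading untyped records would still need a runtime check. The sketch below shows one way to do that. It derives the union from a const array instead of declaring it directly as the PR does, and `isRecoveryHint` and `hintOf` are hypothetical helper names, not part of this change.

```typescript
// Same tags as the `RecoveryHint` union in this PR, held in a const array
// so a runtime guard can share the list (a variation, not the PR's layout).
const RECOVERY_HINTS = [
  'wal-corruption',
  'wal-checkpoint-threshold',
  'heap-oom-respawn',
  'native-worker-abort',
  'hf-endpoint-unreachable',
  'large-repo',
  'npm-resolution',
  'module-not-found',
] as const;

type RecoveryHint = (typeof RECOVERY_HINTS)[number];

// Hypothetical type guard: true only for one of the known tags.
function isRecoveryHint(value: unknown): value is RecoveryHint {
  return typeof value === 'string' && (RECOVERY_HINTS as readonly string[]).includes(value);
}

// Hypothetical consumer: pull a typed hint out of an untyped log record.
function hintOf(record: Record<string, unknown>): RecoveryHint | undefined {
  const hint = record['recoveryHint'];
  return isRecoveryHint(hint) ? hint : undefined;
}
```

With this shape, `hintOf({ recoveryHint: 'wal-checkpoint-threshold' })` returns the tag, and a record carrying an unknown or non-string value returns `undefined`.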
7 changes: 7 additions & 0 deletions gitnexus/src/cli/index.ts
@@ -71,6 +71,11 @@ program
'--worker-timeout <seconds>',
'Worker sub-batch idle timeout before retry/fallback. Default: 30.',
)
.option(
'--wal-checkpoint-threshold <bytes>',
'LadybugDB WAL auto-checkpoint threshold in bytes during analyze ' +
'(integer >= -1; default: 67108864 = 64 MiB; -1 keeps Ladybug stock ~16 MiB).',
)
.option(
'--workers <n>',
'Parse worker pool size. Default: cores-1 capped at 16. Pass 0 to disable workers (sequential).',
@@ -85,6 +90,7 @@ program
' GITNEXUS_NO_GITIGNORE=1 Skip .gitignore parsing (still reads .gitnexusignore)\n' +
' GITNEXUS_MAX_FILE_SIZE=N Override large-file skip threshold (KB). Default 512, max 32768.\n' +
' GITNEXUS_WORKER_SUB_BATCH_TIMEOUT_MS=N Worker idle timeout in milliseconds. Default 30000.\n' +
' GITNEXUS_WAL_CHECKPOINT_THRESHOLD=N LadybugDB WAL auto-checkpoint threshold in bytes (default 67108864 = 64 MiB; -1 keeps Ladybug stock ~16 MiB).\n' +
' GITNEXUS_WORKER_SUB_BATCH_MAX_BYTES=N Worker job byte budget. Default 8388608.\n' +
' GITNEXUS_WORKER_POOL_SIZE=N Parse worker count override. Default cores-1 capped at 16.\n' +
' GITNEXUS_PARSE_CHUNK_CONCURRENCY=N Concurrent in-flight parse chunks. Default 2.\n' +
@@ -93,6 +99,7 @@ program
' GITNEXUS_WORKER_CONSECUTIVE_FAILURE_THRESHOLD=N Per-slot deaths to trip circuit breaker. Default max(3, poolSize).\n' +
' GITNEXUS_EMBEDDING_THREADS=N Limit local ONNX CPU threads for --embeddings.\n' +
' GITNEXUS_SEMANTIC_EXACT_SCAN_LIMIT=N Max embedding chunks for exact-scan fallback. Default 10000.\n' +
'\nFlags override the corresponding env vars when both are provided.\n' +
'\nTip: `.gitnexusignore` supports `.gitignore`-style negation. Add e.g.\n' +
' `!__tests__/` to index a directory that is auto-filtered by default (#771).',
)
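
Note on "Flags override the corresponding env vars": for the WAL threshold, `analyze.ts` implements this by writing the validated flag value into `process.env.GITNEXUS_WAL_CHECKPOINT_THRESHOLD`. The sketch below isolates that precedence rule against a plain object so it can be exercised without touching the real environment. `applyThresholdFlag` and `EnvLike` are hypothetical names, and the flag value is assumed to have been validated already.

```typescript
const THRESHOLD_ENV_KEY = 'GITNEXUS_WAL_CHECKPOINT_THRESHOLD';

// Minimal stand-in for `process.env` so the rule can be tested in isolation.
type EnvLike = Record<string, string | undefined>;

// Hypothetical helper: a provided flag overwrites the env entry; without a
// flag, whatever the environment already holds (possibly nothing) is kept.
function applyThresholdFlag(env: EnvLike, flagBytes: number | undefined): string | undefined {
  if (flagBytes !== undefined) {
    env[THRESHOLD_ENV_KEY] = String(flagBytes);
  }
  return env[THRESHOLD_ENV_KEY];
}
```

If the env already holds `1024` and the flag is `33554432`, the result is `'33554432'`. If neither is set, the result is `undefined`, and the default would be chosen further downstream (per the comments in this PR, in `lbug-config.ts`).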
21 changes: 21 additions & 0 deletions gitnexus/src/core/lbug/lbug-adapter.ts
@@ -1527,6 +1527,27 @@ export const flushWAL = async (): Promise<void> => {
}
};

/**
* Issue a manual `CHECKPOINT` against the current connection and surface
* any engine error to the caller. Unlike {@link flushWAL}, this variant
* does NOT swallow Ladybug rename/remove IO failures — the manual
* checkpoint driver (`wal-checkpoint-driver.ts`) relies on the rejection
* to drive its bounded retry loop. Returns `false` when no connection is
* open (the caller treats this as a no-op success — there is no WAL to
* flush). Returns `true` after a successful CHECKPOINT + drain.
*
* The split from `flushWAL` is deliberate: every other CHECKPOINT site
* (server flush, safeClose) is best-effort and prefers a silent skip;
* the manual driver, by contrast, must observe failures to decide
* whether to retry.
*/
export const tryFlushWAL = async (): Promise<boolean> => {
if (!conn) return false;
const checkpointResult = await conn.query('CHECKPOINT');
await drainQueryResult(checkpointResult);
return true;
};
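
Note on the "bounded retry loop" mentioned in the `tryFlushWAL` doc comment: `wal-checkpoint-driver.ts` is not part of the diff shown here, so the following is only a sketch of how a caller could build such a loop on a function with `tryFlushWAL`'s contract (resolves `true` or `false`, rejects on engine error). `checkpointWithRetry`, `RetryOptions`, and the fixed delay are assumptions. The real driver may filter which errors are retried or use a different backoff.

```typescript
// Same contract as `tryFlushWAL`: true after a checkpoint, false when no
// connection is open, rejection when the engine reports an error.
type FlushFn = () => Promise<boolean>;

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  delayMs: number; // fixed pause between attempts (assumed, not from the PR)
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

async function checkpointWithRetry(
  flush: FlushFn,
  opts: RetryOptions,
): Promise<{ flushed: boolean; attempts: number }> {
  if (opts.maxAttempts < 1) throw new RangeError('maxAttempts must be >= 1');
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      // A resolved value (true or false) ends the loop; only rejections retry.
      const flushed = await flush();
      return { flushed, attempts: attempt };
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts) await sleep(opts.delayMs);
    }
  }
  // Budget exhausted: surface the last engine error to the caller.
  throw lastError;
}
```

The loop depends on the flush function rejecting. A best-effort variant that swallows IO failures, as the doc comment describes for `flushWAL`, would make every attempt look successful and the retry budget would never be used.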

/**
* Flush the WAL and close the connection and database handles.
*