
feat(hooks): auto-reindex after git commit with embeddings preservation #205

Merged: magyargergo merged 2 commits into abhigyanpatwari:main from L1nusB:auto-analyze on Mar 7, 2026


L1nusB (Contributor) commented Mar 6, 2026

Summary

  • Adds a PostToolUse hook that automatically re-runs gitnexus analyze after git commit or git merge, keeping the knowledge graph index fresh without manual intervention
  • Detects and preserves embeddings: reads .gitnexus/meta.json to check if the previous index included embeddings (stats.embeddings > 0) and passes --embeddings accordingly — preventing accidental deletion of expensive vector embeddings
  • Persists embeddings count in meta.json so hooks and external tools can detect previous embeddings without opening KuzuDB
  • Adds a "Keeping the Index Fresh" section to the auto-generated CLAUDE.md/AGENTS.md, providing reindex guidance for all AI coding integrations (not just Claude Code)

Why

The GitNexus index becomes stale after every commit. Currently, staleness is detected reactively: MCP tools warn the agent, and CLAUDE.md says "run analyze if stale." This has two problems:

  1. Reactive, not proactive: The agent works with stale data until it happens to trigger a staleness check, potentially making decisions based on outdated call graphs and execution flows.
  2. Embeddings get silently destroyed: Running analyze without --embeddings wipes previously generated embeddings because KuzuDB is fully rebuilt every time (lines 196-200 in analyze.ts). There was no way to detect whether embeddings existed before, so a naive reindex would lose them.

This PR solves both by automating reindex at the right moment (post-commit) with the right flags (auto-detecting embeddings).

How It Works

  1. PostToolUse hook fires after any Bash tool execution completes
  2. Hook checks if the command matches /\bgit\s+(commit|merge)(\s|$)/ — skips everything else instantly (including git merge-base and similar subcommands)
  3. Walks up the directory tree to find .gitnexus/
  4. Reads meta.json → checks stats.embeddings > 0
  5. Spawns gitnexus analyze [--embeddings] synchronously (120s timeout)
  6. Returns additionalContext to the agent confirming the index was updated
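Steps 2 and 3 reduce to a regex test and a directory walk. The sketch below is a hypothetical TypeScript rendering, not the hook's actual code: the regex is the one quoted in step 2, and findGitNexusDir is modeled on the helper of that name in the commit log, but its signature here is an assumption. It takes an injected exists predicate (standing in for fs.existsSync) so the walk can be exercised without a real filesystem.

```typescript
import * as path from "node:path";

// Regex from step 2: "git commit" or "git merge" followed by whitespace or end of string,
// so "git merge-base" and "git commit-tree" do not match.
const GIT_COMMIT_OR_MERGE = /\bgit\s+(commit|merge)(\s|$)/;

function isCommitOrMerge(command: string): boolean {
  return GIT_COMMIT_OR_MERGE.test(command);
}

// Step 3: walk up from `start` until a directory containing `.gitnexus/` is found.
// `exists` is an injected predicate (hypothetical); a real hook would likely pass fs.existsSync.
function findGitNexusDir(start: string, exists: (p: string) => boolean): string | null {
  let dir = path.resolve(start);
  while (true) {
    const candidate = path.join(dir, ".gitnexus");
    if (exists(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root without finding an index
    dir = parent;
  }
}
```

Anchoring the subcommand with `(\s|$)` is what lets the check reject plumbing commands that merely share a prefix with commit or merge.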

On failure or timeout, the hook returns a recovery command that includes --embeddings when appropriate, so the user can run it manually.
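The recovery command only needs to carry the same flag decision as the automatic run. A hypothetical sketch: analyzeArgs, recoveryCommand, recoveryMessage, and the exact message wording are illustrative and not taken from the hook; only the `npx gitnexus analyze [--embeddings]` command shape comes from this PR.

```typescript
// Argument list for `gitnexus analyze`, keeping --embeddings when the previous index had them.
function analyzeArgs(hadEmbeddings: boolean): string[] {
  return hadEmbeddings ? ["analyze", "--embeddings"] : ["analyze"];
}

// The same decision, rendered as a command the user can paste into a terminal.
function recoveryCommand(hadEmbeddings: boolean): string {
  return ["npx", "gitnexus", ...analyzeArgs(hadEmbeddings)].join(" ");
}

// Illustrative wording for the fallback message returned as additional context.
function recoveryMessage(reason: "timeout" | "failed", hadEmbeddings: boolean): string {
  const what = reason === "timeout" ? "timed out" : "failed";
  return `GitNexus reindex ${what}. Run \`${recoveryCommand(hadEmbeddings)}\` manually.`;
}
```

Deriving both the automatic invocation and the manual fallback from one function keeps the two from drifting apart.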

For non-Claude-Code integrations (Cursor, Windsurf, etc.), the new "Keeping the Index Fresh" section in AGENTS.md provides equivalent guidance as instructions the agent can follow after committing.

Changes

| File | Change |
|---|---|
| gitnexus/src/cli/analyze.ts | Query CodeEmbedding count after indexing; persist it in meta.json as stats.embeddings |
| gitnexus/hooks/claude/gitnexus-hook.cjs | Add handlePostToolUse() for auto-reindex; refactor into modular functions |
| gitnexus/src/cli/setup.ts | Register PostToolUse hook in ~/.claude/settings.json during gitnexus setup |
| gitnexus/src/cli/ai-context.ts | Add "Keeping the Index Fresh" section to generated CLAUDE.md/AGENTS.md |
| gitnexus-claude-plugin/hooks/gitnexus-hook.js | Add PostToolUse handler (plugin variant) |
| gitnexus-claude-plugin/hooks/hooks.json | Register PostToolUse hook for plugin |
| README.md | Update editor support table to reflect PostToolUse hooks |
| gitnexus/skills/gitnexus-cli.md | Document auto-reindex behavior in "When to run" section |

Design Decisions

Why PostToolUse hook instead of git post-commit hook?

  • Git hooks aren't committed to the repo (require core.hooksPath or husky)
  • GitNexus is optional tooling — shouldn't impose workflow requirements on all contributors
  • PostToolUse catches agent-initiated commits, which is the primary use case

Why blocking (synchronous) instead of background?

  • The agent shouldn't proceed with stale data immediately after committing
  • A few seconds of indexing is worth accurate results for subsequent queries
  • Background reindex creates race conditions where the agent queries mid-rebuild

Why 120s timeout?

  • Small-to-medium repos index in 5-15 seconds
  • Large repos can take up to 60 seconds
  • 120s provides headroom without being indefinite
  • On timeout, the hook returns a fallback message suggesting manual reindex with correct flags
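Because the 120 s budget is enforced through spawnSync's timeout option, the outcome has to be read from the returned object rather than caught as an exception. A minimal sketch, assuming a hypothetical classifyReindex helper and a structural SpawnOutcome type that mirrors only the SpawnSyncReturns fields it inspects:

```typescript
// Structural subset of Node's SpawnSyncReturns (only the fields inspected here).
interface SpawnOutcome {
  status: number | null;
  signal: string | null;
  error?: Error & { code?: string };
}

type ReindexResult = "ok" | "timeout" | "failed";

// 120 s budget from the design note. spawnSync's `timeout` option is in milliseconds,
// while Claude Code hook timeouts in settings are expressed in seconds.
const ANALYZE_TIMEOUT_MS = 120 * 1000;

function classifyReindex(r: SpawnOutcome): ReindexResult {
  // spawnSync does not throw on timeout: it sets error.code to "ETIMEDOUT"
  // and reports the signal used to kill the child (SIGTERM by default).
  if (r.error?.code === "ETIMEDOUT") return "timeout";
  // Any other spawn error (e.g. ENOENT) or termination by signal counts as a failure.
  if (r.error !== undefined || r.signal !== null) return "failed";
  return r.status === 0 ? "ok" : "failed";
}

// Usage in a hook (not executed here):
//   const r = spawnSync("npx", ["gitnexus", "analyze"], { timeout: ANALYZE_TIMEOUT_MS });
//   const outcome = classifyReindex(r);
```

Distinguishing "timeout" from "failed" is what lets the hook choose between the two fallback messages.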

Why persist embeddings count in meta.json?

  • The only prior way to detect embeddings was querying KuzuDB (MATCH (e:CodeEmbedding) RETURN count(e)), which requires loading the native module
  • A JSON file read is instant and works from any context (shell hooks, external scripts)
  • The stats.embeddings field already existed in the RepoMeta interface but was never populated
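Reading the flag back is then a plain JSON parse. In this sketch RepoMetaLike is a hypothetical partial shape covering only the fields this PR mentions (stats.embeddings and lastCommit), not the actual RepoMeta interface, and both helpers are illustrative:

```typescript
// Hypothetical partial shape of .gitnexus/meta.json: only the fields this PR mentions.
interface RepoMetaLike {
  lastCommit?: string;
  stats?: { embeddings?: number; [key: string]: unknown };
}

// Writer side: record the embeddings count after indexing without dropping other stats.
function withEmbeddingsCount(meta: RepoMetaLike, count: number): RepoMetaLike {
  return { ...meta, stats: { ...meta.stats, embeddings: count } };
}

// Reader side: a missing file (null), invalid JSON, or an absent or zero field
// all mean "no embeddings", so the default is the cheaper reindex.
function hadEmbeddings(metaJson: string | null): boolean {
  if (metaJson === null) return false;
  try {
    const meta = JSON.parse(metaJson) as RepoMetaLike | null;
    const count = meta?.stats?.embeddings;
    return typeof count === "number" && count > 0;
  } catch {
    return false;
  }
}
```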

Robustness

Several hardening measures were applied during review:

  • Strict git regex: /\bgit\s+(commit|merge)(\s|$)/ prevents false triggers on git merge-base, git commit-tree, etc.
  • Proper spawnSync error handling: Checks child.error/child.signal instead of relying on try/catch (spawnSync doesn't throw on timeout)
  • PreToolUse stderr guard: Only forwards augment output as additionalContext when exit code is 0 — CLI errors no longer leak into agent context
  • Single-launch CLI resolution (plugin): Detects gitnexus binary via which/where once, then runs exactly once (prevents double execution)
  • Windows compatibility: shell: true for npx fallback on Windows where .cmd files need a shell
  • Hook timeouts in correct units: seconds (not milliseconds), matching Claude Code's expected format
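The single-launch rule in the plugin can be sketched as "resolve first, then launch exactly once." In this hypothetical version, findOnPath stands in for one `which gitnexus` (or `where gitnexus`) lookup and run stands in for the actual spawn; neither name comes from the plugin, and Windows-specific command naming is left out.

```typescript
interface Launch {
  command: string;
  args: string[];
}
type Runner = (launch: Launch) => { status: number | null };

// Decide the launch target up front: the resolved binary if present, otherwise npx.
function resolveLaunch(findOnPath: (name: string) => string | null, cliArgs: string[]): Launch {
  const bin = findOnPath("gitnexus"); // looked up exactly once
  return bin !== null
    ? { command: bin, args: cliArgs }
    : { command: "npx", args: ["gitnexus", ...cliArgs] };
}

// Exactly one launch: a non-zero exit from the resolved binary is returned as-is,
// rather than being treated as "binary missing" and retried through npx.
function runCliOnce(
  findOnPath: (name: string) => string | null,
  run: Runner,
  cliArgs: string[],
): { status: number | null } {
  return run(resolveLaunch(findOnPath, cliArgs));
}
```

The double-execution bug described in the commit log comes from using the command's failure as the signal to try the fallback; separating resolution from execution removes that coupling.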

🤖 Generated with Claude Code

feat(hooks): auto-reindex after git commit with embeddings preservation

Add PostToolUse hook that re-runs `gitnexus analyze` after git commit/merge,
automatically detecting and preserving embeddings via meta.json stats.

- Persist embeddings count in meta.json stats.embeddings field
- Add PostToolUse handler to both hook variants (cjs + plugin)
- Register PostToolUse hook in setup.ts for Claude Code
- Add "Keeping the Index Fresh" section to generated CLAUDE.md/AGENTS.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

revert running gitnexus analyze

fix: address code review findings for auto-reindex hooks

- Fix hook timeout units: seconds not milliseconds (8000->8, 120000->120)
- Remove unused execFileSync import from gitnexus-hook.cjs
- Remove unused `output` variable in PostToolUse handler
- Remove spurious template interpolation in ai-context.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: clean up gitnexus-hook.cjs per review feedback

- Hoist spawnSync import to module scope
- Add shell: isWin for npx fallback on Windows
- Extract findGitNexusDir helper, reuse in both PreToolUse and PostToolUse

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(hooks): stricter git regex, proper spawnSync error handling, embeddings in recovery commands

- Tighten commit/merge regex to not match git merge-base (require \s|$ after subcommand)
- Replace try/catch with child.error/signal inspection for spawnSync timeout detection
- Include --embeddings in manual recovery commands when embeddings were detected
- Extract emitPostToolContext helper to reduce duplication
- Apply all fixes to both hook variants (cjs + plugin)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(hooks): single-launch CLI resolution, guard PreToolUse stderr on failure

- Plugin: detect gitnexus binary via which/where once, then run exactly once
  (prevents double execution when binary exists but command fails)
- Both hooks: only forward augment stderr as additionalContext when exit code
  is 0, preventing CLI error output from leaking into agent context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: update README and CLI skill for PostToolUse auto-reindex

- README: editor support table now shows PreToolUse + PostToolUse
- README: description mentions auto-reindex after commits
- gitnexus-cli skill: document auto-reindex in "When to run" section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vercel (bot) commented Mar 6, 2026

@L1nusB is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

- Remove shell:true from CJS hook npx fallback, use npx.cmd on Windows
- Use sendHookResponse() consistently in both hook variants
- Fix setup.ts path escaping with JSON.stringify for safe interpolation
- Add path.isAbsolute(cwd) guards against crafted stdin input
- Reduce PreToolUse CLI timeout from 8s to 7s
- Truncate debug error messages to 200 chars
- Add 73 regression tests covering shell injection, cwd validation,
  dispatch routing, staleness detection, and cross-platform spawning
magyargergo (Collaborator)

Security & Cross-Platform Hardening (Review Round 2)

After multi-agent code review, this commit addresses 7 findings across both hook variants and the setup installer:

Critical Fix

  • Removed shell: true from CJS hook (gitnexus-hook.cjs) — the npm-packaged hook still used shell: isWin for the npx fallback, creating a command injection vector. Now uses npx.cmd directly on Windows, matching the plugin variant.
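The fix amounts to passing an argument array with no shell option. Below is a hypothetical helper (npxSpawnSpec is not a name from the PR) that only builds the spawn description, so it can be checked without launching anything. One caveat worth verifying: Node.js releases patched for CVE-2024-27980 (18.20.2, 20.12.2, 21.7.3 and later) refuse to spawn .cmd or .bat files without shell: true and report EINVAL instead, so the npx.cmd path should be tested on the Node version the hook actually runs under.

```typescript
interface SpawnSpec {
  command: string;
  args: string[];
  options: { shell: false; timeout: number };
}

// Build (but do not run) the npx fallback invocation for the GitNexus CLI.
function npxSpawnSpec(cliArgs: string[], platform: string, timeoutMs: number): SpawnSpec {
  const isWin = platform === "win32";
  return {
    // Name the .cmd shim explicitly on Windows instead of asking a shell to resolve it.
    command: isWin ? "npx.cmd" : "npx",
    // Arguments stay a separate argv array; Node is not asked to build a shell command line.
    // (See the CVE-2024-27980 caveat above for .cmd targets.)
    args: ["gitnexus", ...cliArgs],
    options: { shell: false, timeout: timeoutMs },
  };
}

// Usage (not executed here):
//   const s = npxSpawnSpec(["analyze"], process.platform, 120000);
//   spawnSync(s.command, s.args, s.options);
```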

Medium Fixes

  • Consistent sendHookResponse() — CJS handlePreToolUse was inlining JSON output instead of using the shared helper. Now both handlers in both hook variants use the same output function.
  • Safe path escaping in setup.ts — replaced manual single-quote escaping with JSON.stringify() for the injected CLI path, which correctly handles backslashes, newlines, and all special characters.
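The difference between the two escaping strategies shows up with a Windows path that contains a backslash followed by a letter that forms an escape sequence. A sketch, assuming (the PR does not show this) that setup.ts emits the path into JavaScript source; cliPathLiteral and naiveLiteral are hypothetical names, and CLI_PATH is an illustrative variable:

```typescript
// JSON.stringify yields a double-quoted literal with backslashes, quotes,
// and control characters escaped, which JavaScript parses back to the same string.
function cliPathLiteral(cliPath: string): string {
  return `const CLI_PATH = ${JSON.stringify(cliPath)};`;
}

// The manual approach the review replaced: only single quotes are escaped,
// so "\t" or "\n" inside a Windows path silently turns into a tab or newline.
function naiveLiteral(cliPath: string): string {
  return `const CLI_PATH = '${cliPath.replace(/'/g, "\\'")}';`;
}
```

For example, with the path `C:\tools\o'brien\new\cli.js`, evaluating the JSON.stringify version returns the original path, while the naive version turns `\t` and `\n` into whitespace.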

Low-Risk Hardening

  • path.isAbsolute(cwd) guards in both hook variants — defense-in-depth against crafted stdin input providing relative paths.
  • Reduced PreToolUse CLI timeout from 8s to 7s to stay within the 10s hook timeout with margin.
  • Truncated debug error messages to 200 chars to prevent log flooding.
  • Cleaned up verbose comments in the Windows .cmd explanation.
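The three low-risk items are small guards. A hypothetical sketch (validCwd, truncateForDebug, and the constant names are illustrative) that rejects non-absolute cwd values from hook stdin, caps debug messages at 200 characters, and encodes the 7 s vs 10 s timeout margin:

```typescript
import * as path from "node:path";

// Debug messages are capped to keep hook logs from flooding.
const MAX_DEBUG_CHARS = 200;

// CLI call timeout stays below the hook timeout so the hook can still respond.
const CLI_TIMEOUT_MS = 7000; // 7 s, per this review round
const HOOK_TIMEOUT_S = 10; // hook timeout, in seconds

// Accept a cwd from hook stdin only if it is a string and an absolute path.
function validCwd(input: unknown): string | null {
  if (typeof input !== "object" || input === null) return null;
  const cwd = (input as { cwd?: unknown }).cwd;
  return typeof cwd === "string" && path.isAbsolute(cwd) ? cwd : null;
}

function truncateForDebug(message: string): string {
  return message.length > MAX_DEBUG_CHARS ? message.slice(0, MAX_DEBUG_CHARS) : message;
}
```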

Regression Tests

Added 73 new tests in test/unit/hooks.test.ts covering:

  • Shell injection prevention (no shell: true anywhere)
  • Windows .cmd extension usage
  • cwd validation (relative paths rejected)
  • sendHookResponse consistency across both variants
  • Dispatch map routing (unknown events, empty input, invalid JSON)
  • extractPattern for Grep/Glob/Bash tool inputs
  • Git mutation regex coverage (commit/merge/rebase/cherry-pick/pull)
  • PostToolUse staleness detection with temp git repos
  • Embeddings flag preservation in reindex commands
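The dispatch-map cases in that list (unknown events, empty input, invalid JSON) can be sketched as a small router over hook stdin. This is a hypothetical version: the handlers are stubs, while hook_event_name is the field Claude Code includes in hook input. A Map is used so that event names like "toString" cannot hit Object.prototype.

```typescript
type HookInput = Record<string, unknown>;
type Handler = (input: HookInput) => string;

// Stub handlers; in the hooks these would build a response via sendHookResponse().
const handlers = new Map<string, Handler>([
  ["PreToolUse", () => "pre"],
  ["PostToolUse", () => "post"],
]);

// Returns the handler's output, or null when the input should be ignored silently.
function dispatch(raw: string, table: Map<string, Handler> = handlers): string | null {
  if (raw.trim() === "") return null; // empty stdin: nothing to do
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // invalid JSON: ignore rather than fail the agent's tool call
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const input = parsed as HookInput;
  const event = input["hook_event_name"];
  const handler = typeof event === "string" ? table.get(event) : undefined;
  return handler ? handler(input) : null; // unknown event: no output
}
```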

All 921 tests pass (41 test files), build clean.

magyargergo (Collaborator)

Design Note: Notify-Only Staleness Detection (Not Auto-Reindex)

This PR intentionally does not run gitnexus analyze automatically. Instead, the PostToolUse hook performs a lightweight staleness check and notifies the agent to reindex when ready.

Why this approach is better

Running gitnexus analyze synchronously in a hook is dangerous:

  • Blocks the agent for up to 120s while KuzuDB rebuilds
  • Risk of KuzuDB corruption if the hook times out mid-write
  • Claude Code hook timeout is 10s — a full analyze would always be killed
  • The killed process leaves the database in an inconsistent state

Notify-only is safer and smarter:

  • The staleness check takes <100ms (just git rev-parse HEAD vs meta.json)
  • The agent decides when to reindex — it can finish its current task first
  • No risk of database corruption from timeouts
  • The notification includes the right command (--embeddings flag preserved if previously used)

How it works

Agent runs: git commit -m "fix: something"
  ↓
PostToolUse hook fires (matcher: Bash)
  ↓
Checks: was it a git mutation? (commit|merge|rebase|cherry-pick|pull)
  ↓
Checks: did it succeed? (exit_code === 0)
  ↓
Compares: git rev-parse HEAD vs .gitnexus/meta.json.lastCommit
  ↓
If stale → sends: "GitNexus index is stale. Run `npx gitnexus analyze` to update."
  ↓
Agent decides when to run it (not forced mid-task)

This pattern follows the same principle as LSP diagnostics — the tool reports, the user (or agent) acts.
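The flow above can be sketched as a pure decision function over facts the hook has already gathered. Assumptions: the mutation regex reuses the `(\s|$)` boundary from the earlier commit/merge regex (the exact pattern is not shown here), stalenessNotice and PostToolUseFacts are hypothetical names, and a missing lastCommit is treated as "cannot compare, stay quiet." The notification text is the one quoted in the flow.

```typescript
// Git mutations from the flow: commit, merge, rebase, cherry-pick, pull.
const GIT_MUTATION = /\bgit\s+(commit|merge|rebase|cherry-pick|pull)(\s|$)/;

interface PostToolUseFacts {
  command: string; // Bash command the agent ran
  exitCode: number | null; // exit code reported for that command
  head: string | null; // trimmed output of `git rev-parse HEAD`
  lastCommit: string | null; // lastCommit recorded in .gitnexus/meta.json
  hadEmbeddings: boolean; // whether the previous index included embeddings
}

// Returns the notification text, or null when the agent should not be told anything.
function stalenessNotice(f: PostToolUseFacts): string | null {
  if (!GIT_MUTATION.test(f.command)) return null; // not a git mutation
  if (f.exitCode !== 0) return null; // command failed, HEAD likely unchanged
  if (f.head === null || f.lastCommit === null) return null; // cannot compare
  if (f.head === f.lastCommit) return null; // index already matches HEAD
  const cmd = f.hadEmbeddings ? "npx gitnexus analyze --embeddings" : "npx gitnexus analyze";
  return `GitNexus index is stale. Run \`${cmd}\` to update.`;
}
```

Keeping the decision pure leaves only `git rev-parse HEAD` and one JSON read as I/O, which is consistent with the sub-100 ms budget claimed for the check.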

magyargergo merged commit c4eaf45 into abhigyanpatwari:main on Mar 7, 2026
7 of 8 checks passed
magyargergo (Collaborator)

Thank you for your contribution @L1nusB !

L1nusB deleted the auto-analyze branch on March 7, 2026 at 11:09
CrazyBunQnQ pushed a commit to CrazyBunQnQ/GitNexus that referenced this pull request Mar 8, 2026
…bhigyanpatwari#205)

Adds PostToolUse hook that detects stale GitNexus index after git mutations (commit, merge, rebase, cherry-pick, pull) and notifies the agent to reindex. Uses lightweight staleness check (git rev-parse HEAD vs meta.json) instead of running gitnexus analyze synchronously, avoiding KuzuDB corruption and 120s blocks. Security and cross-platform hardening: remove shell:true from all spawnSync calls, use .cmd extensions on Windows, add path.isAbsolute(cwd) guards, fix setup.ts path escaping with JSON.stringify, use sendHookResponse() consistently. Includes 73 regression tests.
magyargergo added the enhancement (New feature or request) label on Mar 8, 2026
motolese pushed a commit to motolese/GitNexus that referenced this pull request Apr 23, 2026
…bhigyanpatwari#205)

Adds PostToolUse hook that detects stale GitNexus index after git mutations (commit, merge, rebase, cherry-pick, pull) and notifies the agent to reindex. Uses lightweight staleness check (git rev-parse HEAD vs meta.json) instead of running gitnexus analyze synchronously, avoiding KuzuDB corruption and 120s blocks. Security and cross-platform hardening: remove shell:true from all spawnSync calls, use .cmd extensions on Windows, add path.isAbsolute(cwd) guards, fix setup.ts path escaping with JSON.stringify, use sendHookResponse() consistently. Includes 73 regression tests.

2 participants