Skip to content

fix(server): sanitize repo name to prevent argument injection#1305

Merged
magyargergo merged 8 commits into
abhigyanpatwari:mainfrom
RinZ27:fix/sanitize-repo-name-rebased-v2
May 11, 2026
Merged

fix(server): sanitize repo name to prevent argument injection#1305
magyargergo merged 8 commits into
abhigyanpatwari:mainfrom
RinZ27:fix/sanitize-repo-name-rebased-v2

Conversation

@RinZ27

@RinZ27 RinZ27 commented May 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Sanitizes the extracted repository name to prevent argument injection during git clone operations and ensures compatibility with various file systems.

Motivation / context

When a user provides a repository URL that ends in a segment starting with a dash (e.g., https://github.com/user/--upload-pack=payload.git), the logic extracts --upload-pack=payload as the repository name. This name is then used as the target directory in the git clone command. Git interprets leading dashes in the target path as command-line flags, potentially leading to unintended behavior or security risks.

Areas touched

  • gitnexus/ (CLI / core / MCP server)

Scope & constraints

In scope

  • Stripping leading dashes from the extracted repository name.
  • Sanitizing unsafe directory characters (<>:"/\|?*) to ensure cross-platform compatibility.
  • Consolidating logic into a shared utility in src/storage/git.ts.
  • Added unit tests to verify the sanitization logic.

Explicitly out of scope / not done here

  • Changes to the core indexing pipeline or UI.
  • Broad architectural changes to the job management system.

Implementation notes

Applying the fix within a new sanitizeRepoName helper in storage/git.ts keeps the core logic centralized. After looking at the duplicated extraction code, I decided to route both git-clone.ts and the registry inference path through this shared function. All leading dashes are now stripped to ensure the resulting name never starts with a dash, regardless of how many were provided in the URL.

Testing & verification

  • cd gitnexus && npm test
  • Verified the logic via new test cases for argument injection and unsafe characters in both git.test.ts and git-clone.test.ts. Everything is green locally.

Risk & rollout

Low risk. This only affects the naming of the local clone directory. Existing indexes are not affected unless they are re-analyzed from a URL that previously triggered the issue.

Checklist

  • PR body meets repo minimum length
  • No secrets, tokens, or machine-specific paths committed

@vercel

vercel Bot commented May 3, 2026

Copy link
Copy Markdown

@RinZ27 is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@magyargergo

Copy link
Copy Markdown
Collaborator

@RinZ27 before i review you changes, please make sure our CI is healthy! 🙏

@RinZ27 RinZ27 changed the title [CLEAN] fix(server): sanitize repo name to prevent argument injection (rebased) fix(server): sanitize repo name to prevent argument injection May 4, 2026
@RinZ27 RinZ27 force-pushed the fix/sanitize-repo-name-rebased-v2 branch from 019f1d7 to ba4b704 Compare May 5, 2026 09:57
@RinZ27

RinZ27 commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

@magyargergo ready for another look whenever you're free.

@github-actions

github-actions Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
8672 8671 0 1 403s

✅ All 8671 tests passed

1 test(s) skipped — expand for details
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 77.96% 25730/33004 N/A% 🟢 ███████████████░░░░░
Branches 66.47% 16265/24468 N/A% 🟢 █████████████░░░░░░░
Functions 83.14% 2590/3115 N/A% 🟢 ████████████████░░░░
Lines 81.01% 23259/28710 N/A% 🟢 ████████████████░░░░

📋 View full run · Generated by CI

@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 7m 27s —— View job


PR #1305 — Full Security Review


Review Bar for This PR

  • The PR must not be superseded by stronger hardening already on main.
  • A user-supplied URL must not produce a dash-prefixed clone directory that git interprets as an option flag.
  • sanitizeRepoName() must itself be safe; callers must not need extra checks for ., .., or path separators.
  • parseRepoNameFromUrl() must not produce the string 'unknown' where callers expect either a safe name or null.
  • extractRepoName() must have one clear, non-redundant implementation path.
  • Deleted tests must not reduce coverage of documented security invariants.
  • No hidden bidi/Unicode controls in changed source or test files.

Current PR State

Field Value
Head SHA cdf7a12 (latest PR-specific commit; merge to main at ba4b704)
Status Open, not draft
Commits 3 total: 11a1ed3 (refactor), cdf7a12 (fix), ba4b704 (merge from main)
Changed files 4: git-clone.ts, git.ts, git-clone.test.ts, git.test.ts
CI ✅ typecheck pass, ✅ tests pass (8026/8026), ✅ E2E pass
Hidden bidi Unicode ❌ None found. Non-ASCII chars in comments are em-dashes (U+2014) and § — benign
Vercel Irrelevant — server-side security fix, no deployment preview needed
PR body vs code Matches. sanitizeRepoName, parseRepoNameFromUrl, extractRepoName delegation are all present

Staleness / Duplication Assessment

Verdict: no material duplication; PR remains relevant.

Commit 0844973 (already on main) hardened git-clone.ts by:

  • Adding buildCloneArgs with -- separator (CLI injection fix)
  • Adding cloneOrPull path-containment barrier
  • Adding validateGitUrl unconditional call
  • Adding assertRemoteMatchesRequestedUrl (wrong-repo pull protection)
  • Fixing the ReDoS in extractRepoName (charCode loop)
  • Making extractRepoName throw for any name not matching [a-zA-Z0-9._-]+

This PR's actual delta over main:

  1. Adds sanitizeRepoName() to storage/git.ts — new
  2. Fixes a ReDoS-vulnerable regex still present in parseRepoNameFromUrl on main (/\.git\/*$/i and /\/+$/) — genuine fix
  3. Makes parseRepoNameFromUrl call sanitizeRepoName instead of returning raw candidates
  4. Changes extractRepoName to delegate to parseRepoNameFromUrl instead of inlining the logic
  5. Changes the security boundary from REJECT to SANITIZE for unsafe URL segments

The PR is not stale — the ReDoS in parseRepoNameFromUrl (storage/git.ts) is a real residual gap on main.


Understanding of the Change

Original risk: A URL such as https://github.com/user/--upload-pack=payload.git has --upload-pack=payload as its last path segment. On main (before this PR), extractRepoName already threw for this because it doesn't match REPO_NAME_PATTERN. Additionally, buildCloneArgs already placed -- before the URL, preventing the URL itself from being git-flag-interpreted. So the original injection risk was doubly-covered on main.

What this PR adds: Centralizes URL-to-name parsing in storage/git.ts with sanitizeRepoName, and changes the policy at extractRepoName from fail-fast (throw) to permissive (return sanitized name). A URL with an unsafe last segment now clones into a directory named e.g. upload-pack_payload rather than causing the API to return an error. Additionally it genuinely fixes the ReDoS regex in parseRepoNameFromUrl.

What must remain unchanged: getCloneDir still validates via REPO_NAME_PATTERN at the boundary; buildCloneArgs still uses --; cloneOrPull still uses the path-containment barrier. All of these are unchanged.


Findings

[HIGH] sanitizeRepoName() passes . and .. through unmodified

  • Category: Path traversal / defensive coding
  • File: gitnexus/src/storage/git.ts:249
  • Issue: Both . and .. match the allowlist [a-zA-Z0-9._-], so sanitizeRepoName('.')'.' and sanitizeRepoName('..')'..'. extractRepoName does check for these afterward, but sanitizeRepoName is now a public export that getInferredRepoName chains through and that any future caller could use directly. If a new consumer calls sanitizeRepoName and relies on it to return a "safe" name, they get a path-traversal-capable string without warning.
  • Why it matters: The function's stated purpose is to return a name that is "safe for all FS". . and .. violate that contract silently. The current guard is in the caller, not the function.
  • Recommended fix: name.replace(/^-+/, '').replace(/[^a-zA-Z0-9_-]/g, '_') || 'unknown' — remove dots from the allowlist, or add an explicit guard: const safe = ...; if (safe === '.' || safe === '..') return 'unknown';
  • Blocks merge: Yes — sanitizeRepoName does not fulfill its stated contract.

Fix this →

[HIGH] parseRepoNameFromUrl() returns the string 'unknown' — not null — for all-dash inputs

  • Category: Registry/project-name semantics, identity collision
  • File: gitnexus/src/storage/git.ts:258-279
  • Issue: sanitizeRepoName('---')'unknown' (non-null string). parseRepoNameFromUrl('https://x.com/user/---.git') therefore returns 'unknown' — a string, not null. Callers that use ?? or || null checks for "no name found" will not detect this case, because 'unknown' is truthy. Specifically: getInferredRepoName in storage/git.ts:286 passes this through, and run-analyze.ts:182 uses it as name = inferred ?? path.basename(...). Since 'unknown' is not nullish, path.basename fallback is not triggered — the repo registers as 'unknown'. Any subsequent repo with the same edge-case URL pattern also registers as 'unknown', causing a silent registry collision.
  • Why it matters: The ?? null-coalescing pattern throughout run-analyze.ts and repo-manager.ts relies on null to mean "could not infer". Returning 'unknown' as a string breaks this contract and can silently rename/collide repos in the registry.
  • Recommended fix: Have parseRepoNameFromUrl return null when sanitization yields an empty result: return candidate ? (sanitizeRepoName(candidate) || null) : null. The || null turns 'unknown' back into null so callers' ?? chains work correctly. The 'unknown' fallback string can remain purely as a sentinel in sanitizeRepoName for direct callers that want a non-null guarantee.
  • Blocks merge: Yes.

Fix this →

[MEDIUM] Removal of "rejects shell metacharacters" test removes documented security intent

  • Category: Test coverage, security documentation
  • File: gitnexus/test/unit/git-clone.test.ts
  • Issue: Commit 11a1ed3 removed this test from git-clone.test.ts:
    it('rejects URLs with shell metacharacters in the last segment', () => {
      expect(() => extractRepoName('https://example.com/foo:repo;rm')).toThrow('valid repository name');
      expect(() => extractRepoName('https://example.com/foo:repo$x')).toThrow('valid repository name');
    });
    After this PR, these calls no longer throw — they return repo_rm and repo_x. While the sanitized names are safe (; and $ are replaced by _), the test removal loses explicit documentation that the server boundary handles these metacharacters. No equivalent positive test was added (e.g. expect(extractRepoName('https://example.com/foo:repo;rm')).toBe('repo_rm')).
  • Why it matters: Future maintainers have no test asserting what happens to ; or $ in URL segments. If sanitizeRepoName is ever loosened, there will be no regression guard.
  • Recommended fix: Add sanitization assertion tests for ;, $, whitespace, and \ in URL segments. They don't need to test for throws — testing the sanitized result is correct given the new policy — but they must exist.
  • Blocks merge: No — but should be addressed before merge.

[MEDIUM] Windows reserved names pass sanitizeRepoName unchanged

  • Category: Cross-platform filesystem compatibility
  • File: gitnexus/src/storage/git.ts:249
  • Issue: sanitizeRepoName('CON')'CON', sanitizeRepoName('NUL')'NUL', sanitizeRepoName('COM1')'COM1'. All match the allowlist. getCloneDir('CON') would also pass REPO_NAME_PATTERN. On Windows, creating a directory named CON or NUL under CLONE_ROOT will fail or corrupt at the OS level.
  • Why it matters: GitNexus's CI tests on 3 platforms. The PR body states "ensures compatibility with various file systems."
  • Recommended fix: Add a Windows reserved name blocklist to sanitizeRepoName, or add a test that asserts these names are handled (even if only to document the known gap).
  • Blocks merge: No — known gap in the original codebase, but the PR body's claim is overstated.

[LOW] Identity collision via _ substitution

  • Category: Registry semantics
  • File: gitnexus/src/storage/git.ts:249
  • Issue: repo<tag> and repo?tag both sanitize to repo_tag_ and repo_tag respectively. More critically, repo;rm and repo_rm both sanitize to repo_rm. Two repos with different (but similarly-structured) URLs can collide on the same clone directory name and registry entry. The assertRemoteMatchesRequestedUrl check in cloneOrPull will catch wrong-repo pulls for existing clones, but newly-cloned repos from colliding URLs will still land in the same directory on first use.
  • Why it matters: Not an injection risk, but a correctness/availability risk under adversarial URL crafting against shared infrastructure.
  • Blocks merge: No — existing assertRemoteMatchesRequestedUrl provides a strong mitigation layer.

Repo-Name Parsing / Sanitization Assessment

sanitizeRepoName (git.ts:245): Uses an allowlist [^a-zA-Z0-9._-]_. Strips leading dashes. Returns 'unknown' for empty result. Passes . and .. through — see HIGH finding above. Does not handle Windows reserved names.

parseRepoNameFromUrl (git.ts:258): Correctly uses a charCode loop instead of /\/+$/ regex — this is the genuine ReDoS fix. Strips .git case-insensitively. Splits on /[/:]/.pop() — O(n), not O(n²). Returns sanitizeRepoName(candidate) including the string 'unknown' — see HIGH finding above. Handles HTTPS, SSH git@host:, ssh://, git://, file:// forms correctly via the unified split approach.

extractRepoName (git-clone.ts:31): Cleanly delegates to parseRepoNameFromUrl. Correctly adds guards for null, '.', '..', 'unknown', and non-REPO_NAME_PATTERN names. No unreachable code or duplicate returns found. The behavior shift from REJECT to SANITIZE is intentional but should be tested more explicitly.


Clone Security Assessment

buildCloneArgs: ['clone', '--depth', '1', '--', url, targetDir]-- separator is present. Correctly prevents option-like URLs from being parsed as git flags. ✅

cloneOrPull: Path-containment barrier at entry using path.relative(CLONE_ROOT, safeTarget) — CodeQL-recognized idiom. validateGitUrl called unconditionally. assertRemoteMatchesRequestedUrl called before any pull. ✅

runGit: Uses spawn('git', args, ...) with args array. No shell: true. ✅

getCloneDir: Re-validates REPO_NAME_PATTERN at the boundary even after extractRepoName. ✅

Injection risk verdict: Git argument injection is doubly-prevented — by -- in buildCloneArgs (URL level) and by REPO_NAME_PATTERN check in extractRepoName/getCloneDir (directory name level). This PR adds a third layer (sanitization) which is additive, not a substitute.


Registry / Project-Name Assessment

getInferredRepoNameparseRepoNameFromUrlsanitizeRepoName. The chain now returns sanitized names, which is an improvement over the raw (possibly dash-prefixed) candidates on main. However, it can return the string 'unknown' (see HIGH finding), which breaks callers' ?? null-coalescing logic in run-analyze.ts:182, run-analyze.ts:371, and repo-manager.ts:479.

Explicit --name flag (handled as opts.name in run-analyze.ts:474) still wins over inferred names. Existing aliases are preserved via hasCustomAlias. Stable URLs like https://github.com/owner/my-repo.git produce identical names before and after this PR (my-repo). No AGENTS.md / MCP URI generation paths are changed.


Test Assessment

Added (git.test.ts): Tests for sanitizeRepoName (leading dash strip, unsafe char replacement, empty/all-dash → 'unknown') and parseRepoNameFromUrl (HTTPS, SSH, null input). These are meaningful and would have failed before the PR.

Added (git-clone.test.ts): Tests for strips leading dashes (HTTPS), sanitizes unsafe directory characters. These correctly document the new SANITIZE policy.

Removed (git-clone.test.ts): The "rejects shell metacharacters" test (;, $) was removed without a replacement sanitization assertion. Many explanatory comments documenting WHY specific security decisions were made were also stripped. These comments were valuable for future security reviewers — their removal is a maintainability regression.

Coverage gaps:

  • No test for sanitizeRepoName('.') or sanitizeRepoName('..') — which currently return . and .. (the HIGH finding)
  • No test for sanitizeRepoName('CON') or similar Windows reserved names
  • No positive assertion for what extractRepoName returns for ; or $ URLs

Hidden Unicode / Hygiene Assessment

  • Bidi controls (\x{202A}-\x{202E}, \x{2066}-\x{2069}): None found. ✅
  • Non-ASCII in executable code: None. Non-ASCII characters are em-dashes (U+2014) and § in comments. ✅
  • Unreachable code / duplicate returns: None found in current HEAD files. extractRepoName has one clean implementation path. ✅
  • Stale old code: The old inline trailing-slash loop and segment-split logic in extractRepoName has been cleanly removed. ✅
  • Import path: Fixed from '../storage/git' to '../storage/git.js' in cdf7a12. ✅

Elegance / Maintainability Assessment

The centralization of parsing and sanitization into storage/git.ts is architecturally correct. The sanitizeRepoName / parseRepoNameFromUrl / extractRepoName layering is clear. The simplification of parseRepoNameFromUrl (removing the SSH-branch conditional from 11a1ed3 and replacing it with unified split(/[/:]/).pop() in cdf7a12) is an improvement.

The main elegance concern is that sanitizeRepoName promises more than it delivers — its stated purpose includes cross-platform safety, but it passes ., .., and Windows reserved names through. The 'unknown' string return from a function typed as string | null (via parseRepoNameFromUrl) is a design smell that breaks null-coalescing callers.

Removing 50+ lines of explanatory security comments (the WHY behind specific test cases and barriers) reduces the future-maintainability value of the test suite.


Final Verdict

Not production-ready.

Two blocking issues prevent merge:

  1. sanitizeRepoName passes . and .. through. The function is a public export called by getInferredRepoName and intended as a safe-name guarantee. Returning . or .. violates that contract and creates a latent path-traversal bug for any future consumer that doesn't add its own caller-side check.

  2. parseRepoNameFromUrl returns the string 'unknown' instead of null for all-dash inputs. Every caller (getInferredRepoNamerun-analyze.ts, repo-manager.ts) uses ?? null-coalescing to detect "name could not be inferred." The string 'unknown' is truthy and non-null, so those callers silently name the project 'unknown' instead of falling back to path.basename(...). This can cause registry collisions for edge-case malformed remote URLs.

Both are small, targeted fixes. The core contribution of this PR (ReDoS fix in parseRepoNameFromUrl, sanitizeRepoName centralization, extractRepoName delegation) is sound and worth landing — these fixes address a genuine residual gap on main. Once the two blockers are resolved and the missing sanitization assertions are added for ;/$/./.., this PR can be merged.


@RinZ27 RinZ27 force-pushed the fix/sanitize-repo-name-rebased-v2 branch from 535d0ee to 7c781d5 Compare May 8, 2026 13:29
@RinZ27

RinZ27 commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

Rebased on main and addressed the security review findings.

  • Added explicit guards for . and .. in sanitizeRepoName to prevent path traversal.
  • Handled Windows reserved names (CON, NUL, etc.) during sanitization.
  • Updated parseRepoNameFromUrl to return null instead of 'unknown' to ensure correct null-coalescing for registry fallbacks.
  • Restored missing security tests and added new assertions for the sanitization logic. @magyargergo.

@github-actions

github-actions Bot commented May 8, 2026

Copy link
Copy Markdown
Contributor

Claude finished @magyargergo's task in 4m 2s —— View job


PR #1305 — Full Security Review


Review Bar for This PR

  • sanitizeRepoName() must never return ., .., empty string, or a Windows reserved device name — callers must not need extra traversal guards.
  • parseRepoNameFromUrl() must return null (not 'unknown') for all-dash/unparseable inputs so ?? null-coalescing in registry callers falls through correctly.
  • extractRepoName() must delegate cleanly with one implementation path and no stale code.
  • buildCloneArgs() must include -- before the URL; runGit() must use spawn() with no shell: true.
  • cloneOrPull() must enforce CLONE_ROOT path containment and call validateGitUrl() unconditionally.
  • ReDoS-prone regexes must not appear in URL suffix stripping or character sanitization.
  • All high blockers from the prior review must be verified fixed in code, not just in the author's comment.
  • CI (typecheck, lint, tests, CodeQL, Gitleaks, Dependency Review) must be green on the latest head.

Current PR State

Field Value
Head SHA 7c781d5 (single commit, force-rebased from prior multi-commit history)
Status Open, not draft
Commits 1
Changed files 4 (git-clone.ts, git.ts, git-clone.test.ts, git.test.ts)
Net diff +144 / −39
Branch hygiene ✅ Clean
Prior review comments Outdated — superseded by force-push; re-verified below
CodeQL ✅ success
Gitleaks ✅ success
Dependency Review ✅ success
Typecheck ✅ success
Lint / format ✅ success
macOS tests ✅ success
Ubuntu/coverage tests ✅ success
Windows tests ⏳ in progress at time of review
Docker Build & Push ⏳ in progress (irrelevant to security review)
Hidden Unicode (bidi) ✅ None found — all non-ASCII are em-dashes and §/ in comments
Vercel Irrelevant — server-side/storage change

Branch Hygiene Assessment

Branch hygiene verdict: clean feature/fix PR

Exactly 4 changed files, single focused commit. No lockfile churn, no workflow changes, no dependency bumps, no unrelated server or core changes. The prior multi-commit history was cleanly rebased away.


Understanding of the Change

Original risk: A URL like https://github.com/user/--upload-pack=payload.git extracts --upload-pack=payload as the repo name. If passed as the git clone target directory, git could interpret it as an option flag.

Pre-existing protections on main: buildCloneArgs already placed -- before the URL; extractRepoName already threw on names not matching REPO_NAME_PATTERN; cloneOrPull already enforced CLONE_ROOT containment. Injection was doubly blocked before this PR.

What this PR adds: Centralizes URL-to-name parsing in storage/git.ts via sanitizeRepoName + parseRepoNameFromUrl. Changes extractRepoName from pure fail-fast to sanitize-then-validate (unsafe segments now yield a sanitized clone-safe name instead of an API error). Genuinely fixes the ReDoS-prone /\.git\/*$/i and /\/+$/ regexes in parseRepoNameFromUrl that remained on main. Fixes the two high blockers the prior review raised. Adds explicit regression tests.

Policy change: Sanitize rather than reject. A URL like https://github.com/user/repo;rm.git now clones into repo_rm instead of throwing. This is intentional and documented in the tests.

Out of scope: Core indexing pipeline, UI, broad architectural changes, symlink-aware containment.


Findings

[MEDIUM] Comment overclaims Windows reserved-name-with-extension coverage — code and tests do not match

  • Category: Cross-platform filesystem safety / documentation correctness
  • File: gitnexus/src/storage/git.ts:275–279, gitnexus/test/unit/git.test.ts
  • Issue: The comment at line 275 reads: "Windows reserved names like CON, PRN, AUX, NUL, COM1-9, LPT1-9 cannot be used as directory names on Windows even if they have an extension." The implementation uses /^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$/i. The $ anchor means CON.txt, NUL.log, COM1.repo are not matched. Tracing sanitizeRepoName('CON.txt'): step 1 strips leading dashes (none) → 'CON.txt'; step 2 replaces unsafe chars (none, dots are in the allowlist) → 'CON.txt'; step 3 reserved.test('CON.txt')false → returns 'CON.txt'. parseRepoNameFromUrl('https://example.com/user/CON.txt') returns 'CON.txt'. getCloneDir('CON.txt') passes REPO_NAME_PATTERN (dots allowed). On Windows, creating ~/.gitnexus/repos/CON.txt will fail or produce undefined behavior at the OS level.
  • Why it matters: The comment is a correctness contract. The PR body claims "ensures compatibility with various file systems." The code delivers this for bare reserved names but not for reserved names with extensions. No tests cover CON.txt, NUL.log, COM1.repo. A future reviewer or caller of sanitizeRepoName will trust the comment.
  • Recommended fix (pick one):
    1. Widen the regex: const reserved = /^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$/i; — blocks reserved names with any extension.
    2. Narrow the comment: "Windows bare reserved device names (CON, NUL, COM1, LPT1–9) are blocked; names with extensions (CON.txt) are not currently handled." Add a TODO or tracked issue.
    3. Add tests asserting the current behavior so the gap is explicitly documented: expect(sanitizeRepoName('CON.txt')).toBe('CON.txt'); // known gap — tracked in issue #XXXX.
  • Blocks merge: No — the prior review classified this non-blocking and the core injection risk is unaffected. But the comment/code mismatch should be corrected before merge.

Fix this →

[LOW] Identity collision via _ substitution — known trade-off

  • Category: Registry semantics / correctness
  • File: gitnexus/src/storage/git.ts:271
  • Issue: repo;rm and repo_rm both sanitize to repo_rm. Two distinct remote URLs with similarly structured names can collide on the same clone directory and registry entry on first use.
  • Why it matters: Not an injection risk. assertRemoteMatchesRequestedUrl catches wrong-repo pulls on subsequent use. The first-use collision leaves both repos mapped to the same directory, which would be caught on the second clone attempt.
  • Recommended fix: Document as an accepted trade-off. The existing assertRemoteMatchesRequestedUrl is strong mitigation.
  • Blocks merge: No.

Clone Security Assessment

All clone security barriers are intact and unchanged by this PR:

Check Status
runGit uses spawn('git', args, {...}) — no shell: true
buildCloneArgs returns ['clone', '--depth', '1', '--', url, targetDir]
-- separator is before both URL and targetDir
validateGitUrl called unconditionally in cloneOrPull (both clone and pull paths)
assertRemoteMatchesRequestedUrl called before pull on existing clone
path.relative(CLONE_ROOT, safeTarget) containment check at entry
getCloneDir re-validates REPO_NAME_PATTERN as a secondary boundary
No new code execution paths
No shell metacharacters can reach a shell

Injection risk is triply prevented: validateGitUrl, -- separator via buildCloneArgs, and REPO_NAME_PATTERN validation via getCloneDir/extractRepoName.


Repo-Name Parsing / Sanitization Assessment

sanitizeRepoName (git.ts:268):

  • Strips leading dashes ✅
  • All-dash → 'unknown'
  • Empty string → 'unknown'
  • . and ..'unknown' ✅ (prior HIGH blocker — FIXED)
  • Replaces unsafe chars with _ using /[^a-zA-Z0-9._-]/g — simple character class, O(n), no ReDoS ✅
  • Windows bare reserved names (CON, NUL, COM1, LPT9) → 'unknown'
  • Windows reserved names with extensions (CON.txt, NUL.log): passes through — comment overclaims (see MEDIUM finding above)
  • Safe names (my-repo, repo.name, repo-1.2.3) preserved ✅
  • Trailing dots/spaces not explicitly handled — minor, not claimed

parseRepoNameFromUrl (git.ts:292):

  • Returns null for null/undefined/empty input ✅
  • Trailing slash strip is a charCode loop — O(n), no ReDoS ✅ (prior HIGH blocker — FIXED)
  • .git suffix strip uses .endsWith() + .slice() — O(n) ✅
  • Splits on /[/:]/.pop() — handles HTTPS, SSH (git@host:owner/repo), ssh://, git://, file:// correctly ✅
  • Returns null when sanitizeRepoName yields 'unknown' ✅ (prior HIGH blocker — FIXED)
  • Never returns the string 'unknown' to callers ✅
  • Normal URLs produce the correct names: my-repo, repo, etc. ✅

extractRepoName (git-clone.ts:32):

  • Single clean delegation path: const name = parseRepoNameFromUrl(url)
  • No stale duplicate parsing logic ✅
  • No unreachable returns ✅
  • Guards for null, '.', '..', 'unknown', and !REPO_NAME_PATTERN.test(name)
  • Import uses .js extension correctly ✅
  • Normal URLs return the expected name; adversarial URLs throw a clear error ✅

Registry / Inferred-Name Assessment

All three ?? null-coalescing callers correctly fall through to path.basename(...) when getInferredRepoName returns null:

File Line Pattern
run-analyze.ts 190 getInferredRepoName(repoPath) ?? path.basename(...)
run-analyze.ts 398 getInferredRepoName(repoPath) ?? path.basename(...)
repo-manager.ts 490 inferred ?? path.basename(resolveRepoIdentityRoot(...))

Since parseRepoNameFromUrl now returns null (not 'unknown') for all-dash/unparseable inputs, no repo can be silently registered as 'unknown' through getInferredRepoName. The ?? fallback chain works correctly. Explicit --name override wins in all callers. Existing custom aliases are preserved via hasCustomAlias. MCP URI/name generation is unaffected for normal repos.


Cross-Platform Filesystem Assessment

Case Handled?
Leading dashes ✅ stripped
Path separators /, \ in candidate split(/[/:]/) in parseRepoNameFromUrl ensures candidate has none
. and .. ✅ blocked to 'unknown'
Unsafe chars <>:"/|?*;$& ✅ replaced with _
Windows bare reserved names ✅ blocked
Windows reserved names with extension (CON.txt, NUL.log) ❌ not blocked — comment overclaims
Trailing dots/spaces ❌ not addressed — minor, not claimed in PR
Max filename length Out of scope — documented
Unicode normalization / confusables Out of scope

The PR body's claim "ensures compatibility with various file systems" is slightly overstated for Windows reserved-name-with-extension cases.


Test Assessment

Case Covered?
Leading dashes (--upload-pack, -repo)
All-dash → null (registry fallback)
sanitizeRepoName('.') and sanitizeRepoName('..')'unknown'
Windows bare reserved names (CON, PRN, AUX, NUL, COM1, LPT9)
Windows reserved names with extensions (CON.txt, NUL.log) ❌ missing
Unsafe chars <>:"
Shell metacharacters ; and $ ✅ (restored, now assert sanitized result)
Whitespace and backslash
Normal HTTPS URLs
Normal SSH/SCP URLs
Trailing slashes
ReDoS guard (10k trailing slashes, <500ms)
buildCloneArgs -- separator
cloneOrPull path containment
getCloneDir . / .. / path separators
validateGitUrl unconditional on pull path
assertRemoteMatchesRequestedUrl wrong-repo vector
Removed metacharacter throw-tests replaced with sanitization assertions

The test suite is thorough. The one gap is the missing assertions for CON.txt/NUL.log (either to document the current pass-through behavior, or to test the fixed regex if option 1 from the MEDIUM finding is taken).


GHAS / Hidden Unicode Assessment

Check Status
CodeQL ✅ success on 7c781d5
Gitleaks ✅ success
Dependency Review ✅ success
Bidi controls (\x{202A}–\x{202E}, \x{2066}–\x{2069}) ✅ None found
Non-ASCII in source ✅ Only em-dashes (U+2014), § (U+00A7), and (U+2192) in comments — all visible, benign
Non-ASCII in regex literals or string literals ✅ None
Prior ReDoS issue in parseRepoNameFromUrl ✅ Fixed — charCode loop + .endsWith() replaces polynomial regex

Elegance / Maintainability Assessment

The centralization of parsing and sanitization in storage/git.ts is architecturally correct and clean. The sanitizeRepoNameparseRepoNameFromUrlextractRepoName layering is clear and unambiguous. The single-responsibility delegation in extractRepoName (≤12 lines) eliminates the prior inline duplication. The 'unknown'-to-null conversion at the parseRepoNameFromUrl boundary preserves sanitizeRepoName's non-null sentinel contract while satisfying callers' ?? coalescing expectations — a good design decision.

The only maintainability concern is the comment at git.ts:275 that overclaims Windows reserved-name-with-extension coverage. A future maintainer reading that comment will assume CON.txt is blocked; it is not. This comment should be corrected.

Security comments in the test suite are good — the WHY behind the barriers is documented for both buildCloneArgs and cloneOrPull. The restored metacharacter tests are correctly updated to assert sanitized output rather than throwing behavior.


Final Verdict

production-ready with minor follow-ups

Both high-severity blockers from the prior review have been resolved: sanitizeRepoName now explicitly guards . and .., and parseRepoNameFromUrl returns null (not the string 'unknown') for all-dash/unparseable inputs, preserving correct ?? null-coalescing semantics for all registry callers. The core contribution — ReDoS fix, sanitizeRepoName centralization, clean extractRepoName delegation — is sound and addresses a genuine residual gap on main.

The one remaining item before merge is the comment/code mismatch for Windows reserved names with extensions: the comment at git.ts:275 explicitly says extension forms (CON.txt, NUL.log) are blocked, but the regex ^(CON|...)$ does not cover them. This should be resolved by either widening the regex to ^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$/i or correcting the comment and adding a test documenting the gap. Either path is a small, targeted change. The clone security boundary, registry semantics, ReDoS posture, and test coverage are all solid.


RinZ27 added a commit to RinZ27/GitNexus that referenced this pull request May 8, 2026
- Updated sanitizeRepoName to block Windows reserved names (CON, NUL, etc.) even when they have extensions (e.g., CON.txt).
- Corrected regex and added unit tests for these edge cases to resolve CI failures on Windows.
- Ref: abhigyanpatwari#1305 (comment)
Sanitizes the extracted repository name to prevent argument injection during git clone operations and ensures compatibility with various file systems.

1. Strips leading dashes to prevent git command-line argument injection.

2. Replaces unsafe directory characters with underscores.

3. Blocks path traversal segments ('.' and '..') and Windows reserved names.

4. Fixes ReDoS vulnerability in parseRepoNameFromUrl regex.

5. Added unit tests for sanitization and path traversal edge cases.
@RinZ27 RinZ27 force-pushed the fix/sanitize-repo-name-rebased-v2 branch from 24c3890 to ea9ed3b Compare May 9, 2026 13:06
RinZ27 added a commit to RinZ27/GitNexus that referenced this pull request May 9, 2026
- Updated sanitizeRepoName to block Windows reserved names (CON, NUL, etc.) even when they have extensions (e.g., CON.txt).
- Corrected regex and added unit tests for these edge cases to resolve CI failures on Windows.
- Ref: abhigyanpatwari#1305 (comment)
@github-actions

github-actions Bot commented May 9, 2026

Copy link
Copy Markdown
Contributor

✨ PR Autofix

Posted formatting / unused-import suggestions inline. Click Apply suggestion on each, or run locally: npm run lint:fix && npm run format.

{"schema":"gitnexus.pr-autofix/v1","state":"suggestions-posted","pr_number":1305,"changed_lines":13,"head_sha":"ea9ed3b03a0f36669e572b86c1939390aa562c1e","run_id":"25602068211"}

- Updated sanitizeRepoName to block Windows reserved names (CON, NUL, etc.) even when they have extensions (e.g., CON.txt).
- Corrected regex and added unit tests for these edge cases to resolve CI failures on Windows.
- Ref: abhigyanpatwari#1305 (comment)
@RinZ27 RinZ27 force-pushed the fix/sanitize-repo-name-rebased-v2 branch from ea9ed3b to 40ccda1 Compare May 9, 2026 13:46
@magyargergo magyargergo merged commit 6a89472 into abhigyanpatwari:main May 11, 2026
43 of 46 checks passed
@RinZ27 RinZ27 deleted the fix/sanitize-repo-name-rebased-v2 branch May 12, 2026 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants