fix(server): sanitize repo name to prevent argument injection #1305
@RinZ27 is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it.
@RinZ27 before I review your changes, please make sure our CI is healthy! 🙏
019f1d7 to ba4b704
@magyargergo ready for another look whenever you're free.
CI Report: ✅ All checks passed
Test Results: ✅ All 8671 tests passed, 1 test skipped
Claude finished @magyargergo's task in 7m 27s
PR #1305: Full Security Review
Review Bar for This PR
Current PR State
Staleness / Duplication Assessment
Verdict: no material duplication; PR remains relevant. Commit
This PR's actual delta over
The PR is not stale: the ReDoS in
Understanding of the Change
Original risk: A URL such as
What this PR adds: Centralizes URL-to-name parsing in
What must remain unchanged:
Findings
[HIGH]
535d0ee to 7c781d5
Rebased on main and addressed the security review findings.
Claude finished @magyargergo's task in 4m 2s
PR #1305: Full Security Review
Review Bar for This PR
Current PR State
Branch Hygiene Assessment
Branch hygiene verdict: clean feature/fix PR. Exactly 4 changed files, single focused commit. No lockfile churn, no workflow changes, no dependency bumps, no unrelated server or core changes. The prior multi-commit history was cleanly rebased away.
Understanding of the Change
Original risk: A URL like
Pre-existing protections on
What this PR adds: Centralizes URL-to-name parsing in
Policy change: Sanitize rather than reject. A URL like
Out of scope: Core indexing pipeline, UI, broad architectural changes, symlink-aware containment.
Findings
[MEDIUM] Comment overclaims Windows reserved-name-with-extension coverage: code and tests do not match
[LOW] Identity collision via
| Check | Status |
|---|---|
| runGit uses spawn('git', args, {...}) with no shell: true | ✅ |
| buildCloneArgs returns ['clone', '--depth', '1', '--', url, targetDir] | ✅ |
| -- separator is before both URL and targetDir | ✅ |
| validateGitUrl called unconditionally in cloneOrPull (both clone and pull paths) | ✅ |
| assertRemoteMatchesRequestedUrl called before pull on existing clone | ✅ |
| path.relative(CLONE_ROOT, safeTarget) containment check at entry | ✅ |
| getCloneDir re-validates REPO_NAME_PATTERN as a secondary boundary | ✅ |
| No new code execution paths | ✅ |
| No shell metacharacters can reach a shell | ✅ |

Injection risk is triply prevented: validateGitUrl, -- separator via buildCloneArgs, and REPO_NAME_PATTERN validation via getCloneDir/extractRepoName.
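To make two of the checked boundaries concrete, here is a minimal sketch of an argv builder that places `--` before the URL and target, plus a `path.relative` containment check. The `CLONE_ROOT` value and the `isInsideCloneRoot` helper name are assumptions for illustration; the actual implementations in the PR are not quoted in this review.

```typescript
import * as path from "node:path";

// Hypothetical clone root for illustration; the real CLONE_ROOT is not shown in this thread.
const CLONE_ROOT = path.resolve("/tmp/clones");

// Build an argv array for spawn('git', args) without a shell.
// "--" ends option parsing, so neither the URL nor the target can be read as a flag.
function buildCloneArgs(url: string, targetDir: string): string[] {
  return ["clone", "--depth", "1", "--", url, targetDir];
}

// Containment check (hypothetical helper name): the resolved target must sit
// strictly inside CLONE_ROOT, so the relative path may not be empty, absolute,
// or start with a ".." segment.
function isInsideCloneRoot(targetDir: string): boolean {
  const rel = path.relative(CLONE_ROOT, path.resolve(targetDir));
  if (rel === "" || path.isAbsolute(rel)) return false;
  return rel !== ".." && !rel.startsWith(".." + path.sep);
}
```

Because the arguments are passed as an array and no shell is involved, shell metacharacters in the URL are never interpreted; the `--` separator covers the remaining case of values that begin with a dash.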
Repo-Name Parsing / Sanitization Assessment
sanitizeRepoName (git.ts:268):
- Strips leading dashes ✅
- All-dash → 'unknown' ✅
- Empty string → 'unknown' ✅
- . and .. → 'unknown' ✅ (prior HIGH blocker, FIXED)
- Replaces unsafe chars with _ using /[^a-zA-Z0-9._-]/g: simple character class, O(n), no ReDoS ✅
- Windows bare reserved names (CON, NUL, COM1, LPT9) → 'unknown' ✅
- Windows reserved names with extensions (CON.txt, NUL.log): passes through; comment overclaims (see MEDIUM finding above)
- Safe names (my-repo, repo.name, repo-1.2.3) preserved ✅
- Trailing dots/spaces not explicitly handled (minor, not claimed)
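The behavior listed above can be approximated with the following sketch. It is a reconstruction from the review's description, not the code at git.ts:268: the order of the steps and the case-insensitive flag on the reserved-name pattern are assumptions.

```typescript
// Bare Windows device names only; extension forms such as CON.txt are not matched,
// mirroring the gap described in the MEDIUM finding. The /i flag is an assumption.
const WINDOWS_RESERVED = /^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$/i;

function sanitizeRepoName(candidate: string): string {
  // Skip every leading dash with a linear scan so the name can never look like a flag.
  let start = 0;
  while (start < candidate.length && candidate.charCodeAt(start) === 45 /* '-' */) {
    start++;
  }
  // Replace anything outside the safe set with an underscore (single character class, O(n)).
  const cleaned = candidate.slice(start).replace(/[^a-zA-Z0-9._-]/g, "_");
  // Empty, traversal segments, and bare reserved names collapse to the sentinel.
  if (cleaned === "" || cleaned === "." || cleaned === ".." || WINDOWS_RESERVED.test(cleaned)) {
    return "unknown";
  }
  return cleaned;
}
```

Under these assumptions, `--upload-pack=payload` becomes `upload-pack_payload`, `---` becomes `unknown`, and `CON.txt` passes through unchanged.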
parseRepoNameFromUrl (git.ts:292):
- Returns null for null/undefined/empty input ✅
- Trailing slash strip is a charCode loop: O(n), no ReDoS ✅ (prior HIGH blocker, FIXED)
- .git suffix strip uses .endsWith() + .slice(): O(n) ✅
- Splits on /[/:]/ and takes .pop(): handles HTTPS, SSH (git@host:owner/repo), ssh://, git://, file:// correctly ✅
- Returns null when sanitizeRepoName yields 'unknown' ✅ (prior HIGH blocker, FIXED)
- Never returns the string 'unknown' to callers ✅
- Normal URLs produce the correct names: my-repo, repo, etc. ✅
extractRepoName (git-clone.ts:32):
- Single clean delegation path: const name = parseRepoNameFromUrl(url) ✅
- No stale duplicate parsing logic ✅
- No unreachable returns ✅
- Guards for null, '.', '..', 'unknown', and !REPO_NAME_PATTERN.test(name) ✅
- Import uses .js extension correctly ✅
- Normal URLs return the expected name; adversarial URLs throw a clear error ✅
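The guard sequence described for extractRepoName might look roughly like the sketch below. The `REPO_NAME_PATTERN` value is an assumption (the real pattern is not quoted in this review), and the parser is passed in as a parameter only to keep the example self-contained; in the PR the call is made directly to parseRepoNameFromUrl.

```typescript
// Assumed pattern for illustration; the real REPO_NAME_PATTERN is not shown in this thread.
const REPO_NAME_PATTERN = /^[a-zA-Z0-9._-]+$/;

// Delegates parsing, then rejects anything that is not a safe directory name.
function extractRepoName(url: string, parse: (url: string) => string | null): string {
  const name = parse(url);
  if (
    name === null ||
    name === "." ||
    name === ".." ||
    name === "unknown" ||
    !REPO_NAME_PATTERN.test(name)
  ) {
    throw new Error(`Cannot derive a safe repository name from URL: ${url}`);
  }
  return name;
}
```

The point of the layering is that the clone path fails loudly on an unusable name, while the registry path can treat the same null as a signal to fall back to another source.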
Registry / Inferred-Name Assessment
All three ?? null-coalescing callers correctly fall through to path.basename(...) when getInferredRepoName returns null:
| File | Line | Pattern |
|---|---|---|
| run-analyze.ts | 190 | getInferredRepoName(repoPath) ?? path.basename(...) |
| run-analyze.ts | 398 | getInferredRepoName(repoPath) ?? path.basename(...) |
| repo-manager.ts | 490 | inferred ?? path.basename(resolveRepoIdentityRoot(...)) |

Since parseRepoNameFromUrl now returns null (not 'unknown') for all-dash/unparseable inputs, no repo can be silently registered as 'unknown' through getInferredRepoName. The ?? fallback chain works correctly. Explicit --name override wins in all callers. Existing custom aliases are preserved via hasCustomAlias. MCP URI/name generation is unaffected for normal repos.
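A small sketch shows why returning null rather than the string 'unknown' matters for these callers. The `resolveName` helper is hypothetical and takes the inferred value directly instead of calling getInferredRepoName.

```typescript
import * as path from "node:path";

// Hypothetical stand-in for the caller pattern `inferred ?? path.basename(...)`.
function resolveName(inferred: string | null, repoPath: string): string {
  // ?? only falls through on null or undefined, never on a non-empty string.
  return inferred ?? path.basename(repoPath);
}

// null triggers the fallback to the directory name.
console.log(resolveName(null, "/work/my-project"));
// A sentinel string would be accepted as a real name and defeat the fallback.
console.log(resolveName("unknown", "/work/my-project"));
```

The first call yields `my-project`; the second yields `unknown`, which is the silent-registration case the null return is meant to rule out.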
Cross-Platform Filesystem Assessment
| Case | Handled? |
|---|---|
| Leading dashes | ✅ stripped |
| Path separators /, \ in candidate | ✅ split(/[/:]/) in parseRepoNameFromUrl ensures candidate has none |
| . and .. | ✅ blocked to 'unknown' |
| Unsafe chars <>:"/\|?*;$& | ✅ replaced with _ |
| Windows bare reserved names | ✅ blocked |
| Windows reserved names with extension (CON.txt, NUL.log) | ❌ not blocked; comment overclaims |
| Trailing dots/spaces | ❌ not addressed (minor, not claimed in PR) |
| Max filename length | Out of scope (documented) |
| Unicode normalization / confusables | Out of scope |

The PR body's claim "ensures compatibility with various file systems" is slightly overstated for Windows reserved-name-with-extension cases.
Test Assessment
| Case | Covered? |
|---|---|
| Leading dashes (--upload-pack, -repo) | ✅ |
| All-dash → null (registry fallback) | ✅ |
| sanitizeRepoName('.') and sanitizeRepoName('..') → 'unknown' | ✅ |
| Windows bare reserved names (CON, PRN, AUX, NUL, COM1, LPT9) | ✅ |
| Windows reserved names with extensions (CON.txt, NUL.log) | ❌ missing |
| Unsafe chars <>:" | ✅ |
| Shell metacharacters ; and $ | ✅ (restored, now assert sanitized result) |
| Whitespace and backslash | ✅ |
| Normal HTTPS URLs | ✅ |
| Normal SSH/SCP URLs | ✅ |
| Trailing slashes | ✅ |
| ReDoS guard (10k trailing slashes, <500ms) | ✅ |
| buildCloneArgs -- separator | ✅ |
| cloneOrPull path containment | ✅ |
| getCloneDir . / .. / path separators | ✅ |
| validateGitUrl unconditional on pull path | ✅ |
| assertRemoteMatchesRequestedUrl wrong-repo vector | ✅ |
| Removed metacharacter throw-tests replaced with sanitization assertions | ✅ |

The test suite is thorough. The one gap is the missing assertions for CON.txt/NUL.log (either to document the current pass-through behavior, or to test the fixed regex if option 1 from the MEDIUM finding is taken).
GHAS / Hidden Unicode Assessment
| Check | Status |
|---|---|
| CodeQL | ✅ success on 7c781d5 |
| Gitleaks | ✅ success |
| Dependency Review | ✅ success |
| Bidi controls (\x{202A}–\x{202E}, \x{2066}–\x{2069}) | ✅ None found |
| Non-ASCII in source | ✅ Only em-dashes (U+2014), § (U+00A7), and → (U+2192) in comments; all visible, benign |
| Non-ASCII in regex literals or string literals | ✅ None |
| Prior ReDoS issue in parseRepoNameFromUrl | ✅ Fixed: charCode loop + .endsWith() replaces polynomial regex |
Elegance / Maintainability Assessment
The centralization of parsing and sanitization in storage/git.ts is architecturally correct and clean. The sanitizeRepoName → parseRepoNameFromUrl → extractRepoName layering is clear and unambiguous. The single-responsibility delegation in extractRepoName (≤12 lines) eliminates the prior inline duplication. The 'unknown'-to-null conversion at the parseRepoNameFromUrl boundary preserves sanitizeRepoName's non-null sentinel contract while satisfying callers' ?? coalescing expectations — a good design decision.
The only maintainability concern is the comment at git.ts:275 that overclaims Windows reserved-name-with-extension coverage. A future maintainer reading that comment will assume CON.txt is blocked; it is not. This comment should be corrected.
Security comments in the test suite are good — the WHY behind the barriers is documented for both buildCloneArgs and cloneOrPull. The restored metacharacter tests are correctly updated to assert sanitized output rather than throwing behavior.
Final Verdict
production-ready with minor follow-ups
Both high-severity blockers from the prior review have been resolved: sanitizeRepoName now explicitly guards . and .., and parseRepoNameFromUrl returns null (not the string 'unknown') for all-dash/unparseable inputs, preserving correct ?? null-coalescing semantics for all registry callers. The core contribution — ReDoS fix, sanitizeRepoName centralization, clean extractRepoName delegation — is sound and addresses a genuine residual gap on main.
The one remaining item before merge is the comment/code mismatch for Windows reserved names with extensions: the comment at git.ts:275 explicitly says extension forms (CON.txt, NUL.log) are blocked, but the regex ^(CON|...)$ does not cover them. This should be resolved by either widening the regex to ^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$/i or correcting the comment and adding a test documenting the gap. Either path is a small, targeted change. The clone security boundary, registry semantics, ReDoS posture, and test coverage are all solid.
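The difference between the two patterns is easy to check side by side. The sketch below compares the bare-name form with the widened form quoted in the verdict; the constant names are illustrative, and the `i` flag on the bare form is an assumption since the review elides part of that regex.

```typescript
// Bare device names only (flag assumed), as described for the current code.
const BARE_ONLY = /^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$/i;
// Widened form suggested in the review: also matches an optional extension.
const WITH_EXTENSION = /^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$/i;

for (const name of ["CON", "CON.txt", "NUL.log", "console", "repo.name"]) {
  console.log(name, BARE_ONLY.test(name), WITH_EXTENSION.test(name));
}
```

Only the widened pattern flags `CON.txt` and `NUL.log`, while ordinary names such as `console` and `repo.name` are matched by neither.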
- Updated sanitizeRepoName to block Windows reserved names (CON, NUL, etc.) even when they have extensions (e.g., CON.txt).
- Corrected regex and added unit tests for these edge cases to resolve CI failures on Windows.
- Ref: abhigyanpatwari#1305 (comment)
Sanitizes the extracted repository name to prevent argument injection during git clone operations and ensures compatibility with various file systems.
1. Strips leading dashes to prevent git command-line argument injection.
2. Replaces unsafe directory characters with underscores.
3. Blocks path traversal segments ('.' and '..') and Windows reserved names.
4. Fixes ReDoS vulnerability in parseRepoNameFromUrl regex.
5. Added unit tests for sanitization and path traversal edge cases.
24c3890 to ea9ed3b
ea9ed3b to 40ccda1
Summary
Sanitizes the extracted repository name to prevent argument injection during git clone operations and ensures compatibility with various file systems.
Motivation / context
When a user provides a repository URL that ends in a segment starting with a dash (e.g., https://github.com/user/--upload-pack=payload.git), the logic extracts --upload-pack=payload as the repository name. This name is then used as the target directory in the git clone command. Git interprets leading dashes in the target path as command-line flags, potentially leading to unintended behavior or security risks.
Areas touched
gitnexus/ (CLI / core / MCP server)
Scope & constraints
In scope
<>:"/\|?*) to ensure cross-platform compatibility.
src/storage/git.ts.
Explicitly out of scope / not done here
Implementation notes
Applying the fix within a new sanitizeRepoName helper in storage/git.ts keeps the core logic centralized. After looking at the duplicated extraction code, I decided to route both git-clone.ts and the registry inference path through this shared function. All leading dashes are now stripped to ensure the resulting name never starts with a dash, regardless of how many were provided in the URL.
Testing & verification
cd gitnexus && npm test
git.test.ts and git-clone.test.ts. Everything is green locally.
Risk & rollout
Low risk. This only affects the naming of the local clone directory. Existing indexes are not affected unless they are re-analyzed from a URL that previously triggered the issue.
Checklist