Skip to content

fix(host-service): count untracked file lines in getStatus#3701

Merged
saddlepaddle merged 3 commits into
mainfrom
fabulous-orange
Apr 24, 2026
Merged

fix(host-service): count untracked file lines in getStatus#3701
saddlepaddle merged 3 commits into
mainfrom
fabulous-orange

Conversation

@saddlepaddle
Copy link
Copy Markdown
Collaborator

@saddlepaddle saddlepaddle commented Apr 24, 2026

Summary

  • v2 host-service getStatus pushed untracked (new, working-tree) files into unstaged with additions: 0, so the sidebar LOC delta in useDiffStats missed every newly-added file. Committed/staged new files were already correct via git diff --numstat.
  • Added countUntrackedFileLines(worktreePath, files) in git-helpers.ts and called it after the staged/unstaged loop. Reads each untracked file directly with realpath symlink-escape protection and a 1MB cap (skips multi-GB non-code files).
  • v1 desktop worker (apps/desktop/.../git-task-handlers.ts) already does this via applyUntrackedLineCount; this brings v2 to parity.

Why not git diff --no-index --numstat /dev/null <file>: it would spawn a subprocess per untracked file. Reading files directly is significantly cheaper for workspaces with many untracked files (e.g. fresh checkout with un-gitignored build artifacts). git add -N would mutate the index inside a read-only getStatus, which we don't want.

Linear: SUPER-472

Test plan

  • In a v2 workspace, create a brand-new file in the worktree (don't git-add it). Sidebar LOC delta increases by the file's line count.
  • git add a new file. Sidebar LOC reflects it (already worked, regression check).
  • Modify an existing tracked file. Sidebar LOC reflects the diff.
  • Symlink an untracked file pointing outside the worktree — line count is skipped, no error.
  • Drop a >1MB file in the worktree — additions stays 0, no hang.

Summary by cubic

Fixes v2 sidebar LOC and file list accuracy by counting untracked file lines and detecting/collapsing renames and copies in host-service getStatus. Brings v2 to parity with v1 and addresses Linear SUPER-472; the sidebar now shows old → new for renames.

  • Bug Fixes
    • Added countUntrackedFileLines(worktreePath, files); reads untracked files with realpath checks, a 1 MB cap, 64-file I/O concurrency, and NUL-byte sniffing to skip binaries; sets additions so useDiffStats includes new files.
    • Added rename/copy detection via a temp-index git diff -M -C to merge deleted+untracked into single renamed/copied entries with correct stats; also apply -M -C to staged diffs and propagate oldPath so staged renames collapse to one 0/0 row.

Written for commit b0fbff1. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • UI now shows prior filenames for renamed/copied entries in the changes list (e.g., "old → new").
  • Bug Fixes

    • Improved detection of unstaged renames/copies so change lists are more accurate.
    • Better handling of staged renames/copies to reduce duplicate or misattributed entries.
    • More accurate line counting for untracked files with safeguards for large/non-file paths and binary detection.

Untracked files were pushed into `unstaged` with `additions: 0`, so the
sidebar LOC delta missed every newly-added file in the working tree
(`useDiffStats` sums against-base + staged + unstaged).
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 98bad7ae-fdbf-4a9a-9f9c-748e9f98951a

📥 Commits

Reviewing files that changed from the base of the PR and between 351dca0 and b0fbff1.

📒 Files selected for processing (2)
  • packages/host-service/src/trpc/router/git/git.ts
  • packages/host-service/src/trpc/router/git/utils/git-helpers.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/host-service/src/trpc/router/git/git.ts
  • packages/host-service/src/trpc/router/git/utils/git-helpers.ts

📝 Walkthrough

Walkthrough

Untracked-file handling and rename detection in Git status were reworked: staged numstat now runs with rename/copy collapse flags and sets oldPath when applicable; untracked paths are collected, their line counts are computed from the filesystem, and unstaged rename/copy detection is performed by simulating intent-to-add with a temporary index and merging detected renames into the unstaged results.

Changes

Cohort / File(s) Summary
Git status logic
packages/host-service/src/trpc/router/git/git.ts
Staged numstat invoked with -M -C to collapse renames/copies and staged ChangedFile entries set oldPath from file.from when different. Untracked paths are gathered, countUntrackedFileLines(...) is called, detectUnstagedRenames(...) is run with a hasDeletions flag, and detected renames are merged into the final unstaged list (consumed deleted/untracked entries removed, rename entries appended).
Git helpers: untracked line counting & rename detection
packages/host-service/src/trpc/router/git/utils/git-helpers.ts
Adds countUntrackedFileLines(worktreePath, files) to validate paths, skip non-files and >1MiB, NUL-sniff for binary, decode UTF‑8 and count newlines (concurrency-limited, per-file failures swallowed). Adds detectUnstagedRenames(git, worktreePath, untrackedPaths, hasDeletions) which copies the index to a temp GIT_INDEX_FILE, runs git add --intent-to-add against it, executes git diff (name-status + numstat with -M -C) to detect renames/copies, returns DetectedRename[], and attempts temp cleanup; introduces the DetectedRename interface.
UI: file row rename display
apps/desktop/src/renderer/.../FileRow.tsx
Derives oldBasename from file.oldPath for renamed/copied statuses and conditionally renders the old basename followed by an arrow before the current basename in the file row UI.

Sequence Diagram(s)

sequenceDiagram
  participant Client as "Host Service API"
  participant Router as "git.ts (router)"
  participant Helper as "git-helpers"
  participant FS as "Filesystem"
  participant Git as "git (CLI)"
  participant Temp as "Temp Index"

  Client->>Router: getStatus request
  Router->>Helper: run staged numstat (-M -C) & build staged entries
  Router->>Helper: collect untracked paths
  Helper->>FS: resolve worktree & validate untracked paths
  Helper->>FS: countUntrackedFileLines(untrackedPaths)
  FS-->>Helper: additions per untracked file
  Helper->>Temp: copy real index to temp GIT_INDEX_FILE
  Helper->>Temp: set GIT_INDEX_FILE env
  Helper->>Git: git add --intent-to-add ...untrackedPaths (temp index)
  Git-->>Helper: temp index updated
  Helper->>Git: git diff --name-status -M -C and git diff --numstat -M -C (temp index)
  Git-->>Helper: rename/copy detections + numstat info
  Helper->>Helper: merge detected renames into unstaged list (filter consumed entries)
  Helper->>Temp: cleanup temp index
  Helper-->>Router: merged unstaged list
  Router-->>Client: final status response (staged + merged unstaged)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through indexes, copies, and line,
I counted new files beneath the pine.
I played pretend—an index in the grass—
Then watched git tell where names had changed en masse.
A tiny rabbit, proud of this small sign.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title accurately summarizes the main change: counting untracked file lines in getStatus to fix sidebar LOC tracking.
Description check ✅ Passed Description includes clear problem statement, solution, rationale, test plan, and Linear issue reference; covers all essential template sections with comprehensive detail.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fabulous-orange

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR fixes a LOC delta undercount in getStatus by adding countUntrackedFileLines, which reads each untracked file directly to populate the additions field that git diff --numstat cannot provide for unindexed files. The path-traversal guards (lexical pre-check + realpath post-check) and 1 MB cap are well-designed; two minor style concerns remain around unbounded concurrency and silent binary-file counting.

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/quality suggestions that do not affect correctness or security.

The fix is targeted and correct: object references are shared between untrackedFiles and unstaged so in-place mutations propagate properly, the path-traversal guards are sound (lexical pre-check + double realpath check), and the line-counting formula handles empty files, LF, and CRLF correctly. Both open findings are P2 (unbounded concurrency and binary-file line noise) — neither causes wrong data to be persisted or crashes under normal use.

packages/host-service/src/trpc/router/git/utils/git-helpers.ts — the two P2 concerns live here but are non-blocking.

Important Files Changed

Filename Overview
packages/host-service/src/trpc/router/git/utils/git-helpers.ts Adds countUntrackedFileLines with path-traversal guards (double realpath check), 1 MB file-size cap, and correct CRLF-aware line-counting logic; two P2 concerns: unbounded Promise.all concurrency and silent binary-file line counting.
packages/host-service/src/trpc/router/git/git.ts Correctly separates untracked entries into a dedicated untrackedFiles array (sharing object references with unstaged) and calls countUntrackedFileLines to mutate additions in-place before returning.

Sequence Diagram

sequenceDiagram
    participant Client
    participant getStatus
    participant git
    participant countUntrackedFileLines
    participant fs

    Client->>getStatus: query(workspaceId)
    getStatus->>git: status()
    getStatus->>git: diff --numstat -z --cached (staged)
    getStatus->>git: diff --numstat -z (unstaged)
    git-->>getStatus: status.files, stagedNumstat, unstagedNumstat

    loop for each status.files entry
        alt index=="?" && working_dir=="?"
            getStatus->>getStatus: push entry to untrackedFiles[] + unstaged[]
        else tracked & modified
            getStatus->>getStatus: push to staged[] or unstaged[] with numstat
        end
    end

    getStatus->>countUntrackedFileLines: (worktreePath, untrackedFiles[])
    loop for each untracked file (Promise.all)
        countUntrackedFileLines->>fs: realpath(worktreePath)
        countUntrackedFileLines->>fs: realpath(absolutePath)
        countUntrackedFileLines->>fs: stat(fileReal)
        alt size <= 1MB && isFile
            countUntrackedFileLines->>fs: readFile(fileReal, utf-8)
            fs-->>countUntrackedFileLines: content
            countUntrackedFileLines->>countUntrackedFileLines: count lines → file.additions
        end
    end
    countUntrackedFileLines-->>getStatus: mutations applied in-place

    getStatus-->>Client: staged, unstaged (with additions), currentBranch, ...
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/host-service/src/trpc/router/git/utils/git-helpers.ts
Line: 227-250

Comment:
**Unbounded concurrency may exhaust file descriptors**

`Promise.all` launches one `realpath` + `stat` + `readFile` for every untracked file simultaneously. On a fresh checkout where build artifacts are not gitignored there could be thousands of untracked files, which can hit the process's open-file-descriptor limit (`ulimit -n`, typically 1024 on Linux). Individual file operations will throw and be swallowed by the empty `catch {}`, so additions silently stay at 0 rather than crashing, but the counts will be silently wrong. A concurrency limiter (e.g. `p-limit` with ~64 slots) would prevent this without meaningfully increasing wall-clock time.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: packages/host-service/src/trpc/router/git/utils/git-helpers.ts
Line: 241-247

Comment:
**Binary files under 1 MB will be read and "counted"**

`readFile(fileReal, "utf-8")` succeeds for binary files; Node.js replaces invalid byte sequences with the Unicode replacement character (`\uFFFD`) rather than throwing. A `.png`, `.lock`, or compiled object file under the 1 MB cap will get a non-zero `additions` value that is meaningless. Checking `stats.size === 0` (already covered) doesn't help here; you may want to also guard against files that look binary, e.g. by inspecting the first few bytes for null bytes before counting lines. This is a LOC quality concern, not a correctness or security issue.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(host-service): count untracked file ..." | Re-trigger Greptile

Comment on lines +227 to +250
await Promise.all(
files.map(async (file) => {
try {
const absolutePath = resolve(worktreePath, file.path);
if (!isPathWithinWorktree(worktreePath, absolutePath)) return;

const fileReal = await realpath(absolutePath);
if (!isPathWithinWorktree(worktreeReal, fileReal)) return;

const stats = await stat(fileReal);
if (!stats.isFile() || stats.size > MAX_UNTRACKED_LINE_COUNT_SIZE) {
return;
}

const content = await readFile(fileReal, "utf-8");
file.additions =
content === ""
? 0
: content.endsWith("\n")
? content.split(/\r?\n/).length - 1
: content.split(/\r?\n/).length;
} catch {}
}),
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unbounded concurrency may exhaust file descriptors

Promise.all launches one realpath + stat + readFile for every untracked file simultaneously. On a fresh checkout where build artifacts are not gitignored there could be thousands of untracked files, which can hit the process's open-file-descriptor limit (ulimit -n, typically 1024 on Linux). Individual file operations will throw and be swallowed by the empty catch {}, so additions silently stay at 0 rather than crashing, but the counts will be silently wrong. A concurrency limiter (e.g. p-limit with ~64 slots) would prevent this without meaningfully increasing wall-clock time.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/host-service/src/trpc/router/git/utils/git-helpers.ts
Line: 227-250

Comment:
**Unbounded concurrency may exhaust file descriptors**

`Promise.all` launches one `realpath` + `stat` + `readFile` for every untracked file simultaneously. On a fresh checkout where build artifacts are not gitignored there could be thousands of untracked files, which can hit the process's open-file-descriptor limit (`ulimit -n`, typically 1024 on Linux). Individual file operations will throw and be swallowed by the empty `catch {}`, so additions silently stay at 0 rather than crashing, but the counts will be silently wrong. A concurrency limiter (e.g. `p-limit` with ~64 slots) would prevent this without meaningfully increasing wall-clock time.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +241 to +247
const content = await readFile(fileReal, "utf-8");
file.additions =
content === ""
? 0
: content.endsWith("\n")
? content.split(/\r?\n/).length - 1
: content.split(/\r?\n/).length;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Binary files under 1 MB will be read and "counted"

readFile(fileReal, "utf-8") succeeds for binary files; Node.js replaces invalid byte sequences with the Unicode replacement character (\uFFFD) rather than throwing. A .png, .lock, or compiled object file under the 1 MB cap will get a non-zero additions value that is meaningless. Checking stats.size === 0 (already covered) doesn't help here; you may want to also guard against files that look binary, e.g. by inspecting the first few bytes for null bytes before counting lines. This is a LOC quality concern, not a correctness or security issue.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/host-service/src/trpc/router/git/utils/git-helpers.ts
Line: 241-247

Comment:
**Binary files under 1 MB will be read and "counted"**

`readFile(fileReal, "utf-8")` succeeds for binary files; Node.js replaces invalid byte sequences with the Unicode replacement character (`\uFFFD`) rather than throwing. A `.png`, `.lock`, or compiled object file under the 1 MB cap will get a non-zero `additions` value that is meaningless. Checking `stats.size === 0` (already covered) doesn't help here; you may want to also guard against files that look binary, e.g. by inspecting the first few bytes for null bytes before counting lines. This is a LOC quality concern, not a correctness or security issue.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
packages/host-service/src/trpc/router/git/git.ts (1)

139-164: Shared object reference across unstaged and untrackedFiles is an implicit contract.

The same entry object is pushed into both arrays, so the subsequent countUntrackedFileLines(worktreePath, untrackedFiles) call mutates objects that are also referenced by unstaged. This is what makes the fix work, but a future reader could reasonably assume untrackedFiles is independent of unstaged and introduce a bug (e.g., by cloning, filtering, or reassigning). Consider either a short comment pinning the invariant, or building unstaged from the final untrackedFiles state after counting to make the data flow explicit.

♻️ Option: make the mutation explicit with a comment
 			const unstaged: ChangedFile[] = [];
 			const untrackedFiles: ChangedFile[] = [];
 			for (const file of status.files) {
 				const wd = file.working_dir;
 				if (file.index === "?" && wd === "?") {
 					const entry: ChangedFile = {
 						path: file.path,
 						status: "untracked",
 						additions: 0,
 						deletions: 0,
 					};
+					// Same object reference — mutations by countUntrackedFileLines
+					// below propagate into `unstaged`.
 					untrackedFiles.push(entry);
 					unstaged.push(entry);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/host-service/src/trpc/router/git/git.ts` around lines 139 - 164, The
code currently pushes the same object reference "entry" into both untrackedFiles
and unstaged, relying on countUntrackedFileLines(worktreePath, untrackedFiles)
to mutate those objects for unstaged; make this invariant explicit or remove the
implicit sharing: either add a short comment above the loop noting that
untrackedFiles and unstaged intentionally share object references so
countUntrackedFileLines can update line counts, or change the flow to keep
untrackedFiles independent (push a new object to untrackedFiles, call
countUntrackedFileLines(worktreePath, untrackedFiles), then build unstaged from
the updated untrackedFiles) and use mapGitStatus(file.working_dir) as before to
set status — reference symbols: untrackedFiles, unstaged, entry,
countUntrackedFileLines, mapGitStatus, worktreePath.
packages/host-service/src/trpc/router/git/utils/git-helpers.ts (2)

236-247: Sub-1MB binary files still contribute misleading LOC.

stats.isFile() + size cap filters out very large non-code files, but any binary file under 1 MiB (small PNG, compiled artifact, pdf, etc.) is still read as utf-8 and split on newlines, producing an essentially random "additions" count in the sidebar. A cheap guard is to bail when the first read chunk contains a NUL byte — matching how git itself detects binary content.

♻️ Proposed guard
 				const content = await readFile(fileReal, "utf-8");
+				// Match git's binary heuristic: NUL byte in the content.
+				if (content.includes("\0")) return;
 				file.additions =
 					content === ""
 						? 0
 						: content.endsWith("\n")
 							? content.split(/\r?\n/).length - 1
 							: content.split(/\r?\n/).length;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/host-service/src/trpc/router/git/utils/git-helpers.ts` around lines
236 - 247, The current logic uses stat(...) and readFile(fileReal, "utf-8")
which still miscounts small binary files; modify the flow in the block that sets
file.additions (using stat, fileReal, MAX_UNTRACKED_LINE_COUNT_SIZE,
file.additions) to first read a small binary buffer of the file (e.g., first few
KB) and check for a NUL (0x00) byte — if found, return/do not set additions;
only if no NUL is detected then read the file as UTF-8 and compute additions as
before. Ensure the NUL-check happens before any utf-8 decoding to avoid
miscounting binary files.

227-250: Unbounded parallel filesystem reads on untracked files.

Promise.all(files.map(...)) fans out one concurrent realpath + stat + readFile per untracked file with no concurrency cap. In pathological repos (e.g., a freshly-scaffolded workspace, build outputs that aren't gitignored yet, or a repo state after an unpacked archive), this can hit the per-process FD limit (EMFILE) or spike memory when many <1MB files are read simultaneously. Consider batching with a small concurrency limit (e.g., 8–16).

♻️ Simple batching example
-	await Promise.all(
-		files.map(async (file) => {
-			try {
-				const absolutePath = resolve(worktreePath, file.path);
-				if (!isPathWithinWorktree(worktreePath, absolutePath)) return;
-
-				const fileReal = await realpath(absolutePath);
-				if (!isPathWithinWorktree(worktreeReal, fileReal)) return;
-
-				const stats = await stat(fileReal);
-				if (!stats.isFile() || stats.size > MAX_UNTRACKED_LINE_COUNT_SIZE) {
-					return;
-				}
-
-				const content = await readFile(fileReal, "utf-8");
-				file.additions =
-					content === ""
-						? 0
-						: content.endsWith("\n")
-							? content.split(/\r?\n/).length - 1
-							: content.split(/\r?\n/).length;
-			} catch {}
-		}),
-	);
+	const CONCURRENCY = 16;
+	for (let i = 0; i < files.length; i += CONCURRENCY) {
+		await Promise.all(
+			files.slice(i, i + CONCURRENCY).map(async (file) => {
+				/* same body */
+			}),
+		);
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/host-service/src/trpc/router/git/utils/git-helpers.ts` around lines
227 - 250, The current Promise.all(files.map(...)) in git-helpers.ts performs
unbounded concurrent realpath/stat/readFile calls and can exhaust file
descriptors; replace this flat Promise.all fan-out with a bounded concurrency
approach (e.g., use a p-limit with concurrency 8–16 or implement an async worker
queue) to process the files array, keeping the same per-file logic
(resolve(worktreePath, file.path), isPathWithinWorktree checks, realpath, stat
with MAX_UNTRACKED_LINE_COUNT_SIZE guard, readFile, and setting file.additions)
and preserving the existing try/catch behavior for each file; reference the
async block around realpath/stat/readFile and variables worktreePath,
worktreeReal, isPathWithinWorktree, realpath, stat, readFile,
MAX_UNTRACKED_LINE_COUNT_SIZE, and file.additions when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/host-service/src/trpc/router/git/git.ts`:
- Around line 139-164: The code currently pushes the same object reference
"entry" into both untrackedFiles and unstaged, relying on
countUntrackedFileLines(worktreePath, untrackedFiles) to mutate those objects
for unstaged; make this invariant explicit or remove the implicit sharing:
either add a short comment above the loop noting that untrackedFiles and
unstaged intentionally share object references so countUntrackedFileLines can
update line counts, or change the flow to keep untrackedFiles independent (push
a new object to untrackedFiles, call countUntrackedFileLines(worktreePath,
untrackedFiles), then build unstaged from the updated untrackedFiles) and use
mapGitStatus(file.working_dir) as before to set status — reference symbols:
untrackedFiles, unstaged, entry, countUntrackedFileLines, mapGitStatus,
worktreePath.

In `@packages/host-service/src/trpc/router/git/utils/git-helpers.ts`:
- Around line 236-247: The current logic uses stat(...) and readFile(fileReal,
"utf-8") which still miscounts small binary files; modify the flow in the block
that sets file.additions (using stat, fileReal, MAX_UNTRACKED_LINE_COUNT_SIZE,
file.additions) to first read a small binary buffer of the file (e.g., first few
KB) and check for a NUL (0x00) byte — if found, return/do not set additions;
only if no NUL is detected then read the file as UTF-8 and compute additions as
before. Ensure the NUL-check happens before any utf-8 decoding to avoid
miscounting binary files.
- Around line 227-250: The current Promise.all(files.map(...)) in git-helpers.ts
performs unbounded concurrent realpath/stat/readFile calls and can exhaust file
descriptors; replace this flat Promise.all fan-out with a bounded concurrency
approach (e.g., use a p-limit with concurrency 8–16 or implement an async worker
queue) to process the files array, keeping the same per-file logic
(resolve(worktreePath, file.path), isPathWithinWorktree checks, realpath, stat
with MAX_UNTRACKED_LINE_COUNT_SIZE guard, readFile, and setting file.additions)
and preserving the existing try/catch behavior for each file; reference the
async block around realpath/stat/readFile and variables worktreePath,
worktreeReal, isPathWithinWorktree, realpath, stat, readFile,
MAX_UNTRACKED_LINE_COUNT_SIZE, and file.additions when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fe52e57e-5504-4fe1-bcd4-a99ddd4a9c46

📥 Commits

Reviewing files that changed from the base of the PR and between f59eeba and 85c8782.

📒 Files selected for processing (2)
  • packages/host-service/src/trpc/router/git/git.ts
  • packages/host-service/src/trpc/router/git/utils/git-helpers.ts

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 24, 2026

🧹 Preview Cleanup Complete

The following preview resources have been cleaned up:

  • ✅ Neon database branch

Thank you for your contribution! 🎉

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/host-service/src/trpc/router/git/utils/git-helpers.ts">

<violation number="1" location="packages/host-service/src/trpc/router/git/utils/git-helpers.ts:227">
P1: Unbounded concurrency on file I/O: `Promise.all` opens `realpath` + `stat` + `readFile` for every untracked file simultaneously. In workspaces with thousands of untracked files (the PR description explicitly mentions this scenario), this will hit the OS file-descriptor limit (`EMFILE`). Because errors are swallowed by the empty `catch {}`, additions silently stay at `0` — defeating the purpose of this change. Use a concurrency limiter (e.g. `p-limit(64)`) to cap parallel file operations.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread packages/host-service/src/trpc/router/git/utils/git-helpers.ts Outdated
Run git's real rename/copy detection over the working tree by copying
.git/index to a temp file, marking untracked files intent-to-add against
that copy, and running `git diff -M -C`. Real index is never mutated.

Merges matched deleted+untracked pairs into single rename entries with
correct additions/deletions, and renders `old → new` in the v2 sidebar
file row.

Catches:
- mv tracked → untracked (rename)
- cp tracked → untracked (copy, sourced from the tracked blob)
- mv with edits (rename with non-zero stats)

Falls back silently to the unmerged deleted+untracked listing on any
error (missing index, copy failure, intent-to-add reject).
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/hooks/useChangesTab/components/ChangesFileList/components/FileRow/FileRow.tsx (1)

46-75: ⚠️ Potential issue | 🟡 Minor

Cross-directory renames render a misleading path.

dir comes from the new path, but oldBasename is only the leaf of oldPath. If a file was renamed src/foo.ts → lib/bar.ts, this renders as lib/ foo.ts → bar.ts, which reads as if the old file lived in lib/ too. For the case rename detection is most useful for (moves across directories), the old location is lost in the UI.

Either show the full old relative path, or split and render the old dir when it differs:

🛠️ One possible shape
-	const { dir, basename } = splitPath(file.path);
-	const oldBasename =
-		file.oldPath && (file.status === "renamed" || file.status === "copied")
-			? splitPath(file.oldPath).basename
-			: null;
+	const { dir, basename } = splitPath(file.path);
+	const old =
+		file.oldPath && (file.status === "renamed" || file.status === "copied")
+			? splitPath(file.oldPath)
+			: null;
-				{oldBasename && (
-					<span className="truncate text-muted-foreground">
-						{oldBasename}
-						<span className="px-1">→</span>
-					</span>
-				)}
+				{old && (
+					<span className="truncate text-muted-foreground">
+						{old.dir !== dir && old.dir}
+						{old.basename}
+						<span className="px-1">→</span>
+					</span>
+				)}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/`$workspaceId/components/WorkspaceSidebar/hooks/useChangesTab/components/ChangesFileList/components/FileRow/FileRow.tsx
around lines 46 - 75, oldBasename only shows the old path's leaf which loses
directory info for cross-directory renames; update the logic around oldBasename
(and where splitPath(file.oldPath) is used) to compute oldDir and oldRelPath
(e.g., splitPath(file.oldPath).dir and basename) and then render the old path's
directory when it differs from the new dir (or render the full old relative
path) inside the FileRow render path that currently uses dir and oldBasename;
ensure on the UI you show the old directory segment (or full old relative path)
before the arrow so renames like src/foo.ts → lib/bar.ts read correctly.
🧹 Nitpick comments (3)
packages/host-service/src/trpc/router/git/git.ts (1)

165-173: Independent work — can run in parallel.

countUntrackedFileLines only touches untrackedFiles[].additions on the filesystem and detectUnstagedRenames only shells out to git in a throwaway index; neither reads the other's output. A Promise.all would knock the added getStatus latency down noticeably on workspaces with many new files.

⚙️ Suggested tweak
-			await countUntrackedFileLines(worktreePath, untrackedFiles);
-
-			const hasDeletions = unstaged.some((f) => f.status === "deleted");
-			const renames = await detectUnstagedRenames(
-				git,
-				worktreePath,
-				untrackedFiles.map((f) => f.path),
-				hasDeletions,
-			);
+			const hasDeletions = unstaged.some((f) => f.status === "deleted");
+			const [, renames] = await Promise.all([
+				countUntrackedFileLines(worktreePath, untrackedFiles),
+				detectUnstagedRenames(
+					git,
+					worktreePath,
+					untrackedFiles.map((f) => f.path),
+					hasDeletions,
+				),
+			]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/host-service/src/trpc/router/git/git.ts` around lines 165 - 173,
countUntrackedFileLines and detectUnstagedRenames are independent and currently
run sequentially, adding unnecessary latency; modify the code around
countUntrackedFileLines(...) and detectUnstagedRenames(git, worktreePath,
untrackedFiles.map(...), hasDeletions) to execute them in parallel (e.g., via
Promise.all) while still preserving the computed hasDeletions and the
untrackedFiles input, and await both results before proceeding.
packages/host-service/src/trpc/router/git/utils/git-helpers.ts (2)

244-255: Consider a binary heuristic to skip non-text files.

Files under 1 MiB that are binary (images, small pdfs, sqlite dbs, .lock blobs, etc.) still get readFile(..., "utf-8") and split on \r?\n, producing an arbitrary additions based on how many \n bytes happen to appear. That contradicts the comment's own rationale ("LOC signal isn't useful"), and the value still ends up in the sidebar delta. A cheap NUL-byte sniff on the first read chunk (or a filename/extension skip list) would match the stated intent without adding meaningful cost.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/host-service/src/trpc/router/git/utils/git-helpers.ts` around lines
244 - 255, The current logic in git-helpers.ts reads any file under
MAX_UNTRACKED_LINE_COUNT_SIZE with readFile(..., "utf-8") and computes
file.additions by splitting on newlines, which miscounts binary blobs; before
calling readFile (after stat and before computing file.additions), perform a
cheap binary sniff (e.g., read the first chunk or N bytes and check for a NUL
byte) or match against a small binary-extension whitelist, and if the sniff
indicates binary, skip reading/parsing and set file.additions to 0 or undefined;
update the code paths around the stat -> readFile -> file.additions logic to use
this binary check so binary files are not treated as text.

307-317: git add --intent-to-add -- ...untrackedPaths can hit ARG_MAX on bulk-scaffolded repos.

For workspaces with a very large untracked set (e.g., first commit of a scaffolded project, or a node_modules-adjacent folder that isn't yet ignored), spreading every path into a single argv crosses the OS argument length limit (~128 KB on Linux) and git add fails with E2BIG. The whole function then silently returns [], so users lose unstaged rename detection in exactly the scenario where it matters most.

Two cheap mitigations:

  • Batch the paths into chunks (e.g., 500–1000 per git add invocation), or
  • Pipe paths via stdin with git add --intent-to-add --pathspec-from-file=- --pathspec-file-nul - and feed NUL-separated paths.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/host-service/src/trpc/router/git/utils/git-helpers.ts` around lines
307 - 317, The current call in git-helpers
(tempGit.raw(["add","--intent-to-add","--", ...untrackedPaths])) can exceed
ARG_MAX for large untrackedPaths; change the add invocation to feed
NUL-separated paths via stdin instead: call tempGit.raw with args
["add","--intent-to-add","--pathspec-from-file=-","--pathspec-file-nul","--"]
and pass a NUL-terminated Buffer of untrackedPaths (e.g.,
Buffer.from(untrackedPaths.join("\0") + "\0")) as stdin to avoid E2BIG; update
the code around tempGit, worktreePath and tempIndex usages to use this stdin
approach (or as a fallback implement chunked batches of paths if you prefer) so
the function no longer silently fails on very large untracked sets.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In
`@apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/`$workspaceId/components/WorkspaceSidebar/hooks/useChangesTab/components/ChangesFileList/components/FileRow/FileRow.tsx:
- Around line 46-75: oldBasename only shows the old path's leaf which loses
directory info for cross-directory renames; update the logic around oldBasename
(and where splitPath(file.oldPath) is used) to compute oldDir and oldRelPath
(e.g., splitPath(file.oldPath).dir and basename) and then render the old path's
directory when it differs from the new dir (or render the full old relative
path) inside the FileRow render path that currently uses dir and oldBasename;
ensure on the UI you show the old directory segment (or full old relative path)
before the arrow so renames like src/foo.ts → lib/bar.ts read correctly.

---

Nitpick comments:
In `@packages/host-service/src/trpc/router/git/git.ts`:
- Around line 165-173: countUntrackedFileLines and detectUnstagedRenames are
independent and currently run sequentially, adding unnecessary latency; modify
the code around countUntrackedFileLines(...) and detectUnstagedRenames(git,
worktreePath, untrackedFiles.map(...), hasDeletions) to execute them in parallel
(e.g., via Promise.all) while still preserving the computed hasDeletions and the
untrackedFiles input, and await both results before proceeding.

In `@packages/host-service/src/trpc/router/git/utils/git-helpers.ts`:
- Around line 244-255: The current logic in git-helpers.ts reads any file under
MAX_UNTRACKED_LINE_COUNT_SIZE with readFile(..., "utf-8") and computes
file.additions by splitting on newlines, which miscounts binary blobs; before
calling readFile (after stat and before computing file.additions), perform a
cheap binary sniff (e.g., read the first chunk or N bytes and check for a NUL
byte) or match against a small binary-extension whitelist, and if the sniff
indicates binary, skip reading/parsing and set file.additions to 0 or undefined;
update the code paths around the stat -> readFile -> file.additions logic to use
this binary check so binary files are not treated as text.
- Around line 307-317: The current call in git-helpers
(tempGit.raw(["add","--intent-to-add","--", ...untrackedPaths])) can exceed
ARG_MAX for large untrackedPaths; change the add invocation to feed
NUL-separated paths via stdin instead: call tempGit.raw with args
["add","--intent-to-add","--pathspec-from-file=-","--pathspec-file-nul","--"]
and pass a NUL-terminated Buffer of untrackedPaths (e.g.,
Buffer.from(untrackedPaths.join("\0") + "\0")) as stdin to avoid E2BIG; update
the code around tempGit, worktreePath and tempIndex usages to use this stdin
approach (or as a fallback implement chunked batches of paths if you prefer) so
the function no longer silently fails on very large untracked sets.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5e96475b-cd08-435b-80e5-fc2ef3e15ad6

📥 Commits

Reviewing files that changed from the base of the PR and between 85c8782 and 351dca0.

📒 Files selected for processing (3)
  • apps/desktop/src/renderer/routes/_authenticated/_dashboard/v2-workspace/$workspaceId/components/WorkspaceSidebar/hooks/useChangesTab/components/ChangesFileList/components/FileRow/FileRow.tsx
  • packages/host-service/src/trpc/router/git/git.ts
  • packages/host-service/src/trpc/router/git/utils/git-helpers.ts

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/host-service/src/trpc/router/git/utils/git-helpers.ts">

<violation number="1" location="packages/host-service/src/trpc/router/git/utils/git-helpers.ts:340">
P3: Do not silently swallow temp-dir cleanup failures; log a warning with context so cleanup issues are observable.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread packages/host-service/src/trpc/router/git/utils/git-helpers.ts Outdated
- Cap parallel file I/O at 64 workers in countUntrackedFileLines so
  workspaces with thousands of untracked files don't EMFILE
- Sniff first 8KB for NUL bytes before counting lines so binary files
  under the 1MB cap (PNGs, lockfiles, compiled artifacts) don't get a
  meaningless utf-8 line count
- Log temp-dir cleanup failures instead of silently swallowing
- Add -M -C to staged numstat and propagate file.from as oldPath so
  staged renames collapse to a single 0/0 entry (matches the working-
  tree rename detection added in the previous commit)
@saddlepaddle saddlepaddle merged commit ba044f7 into main Apr 24, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant