[codex] add v2 end-to-end QA loop#4544
Conversation
📝 WalkthroughWalkthroughThis pull request adds three independent safety and performance improvements: Git pager environment sanitization to prevent simple-git conflicts, WebSocket close reason truncation to prevent protocol violations, and terminal resize batching to reduce layout thrashing. It also documents a comprehensive V2 end-to-end QA test plan with execution results and findings. ChangesGit Pager Environment Sanitization
WebSocket Close Reason Truncation
Terminal Resize Measurement Batching
V2 End-to-End QA Test Plan
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🚀 Preview Deployment🔗 Preview Links
Preview updates automatically with new commits |
|
Capy auto-review is paused for this organization because the monthly auto-review limit has been reached. Increase the limit or turn it off in billing settings to resume automatic reviews. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/host-service/src/terminal/terminal.test.ts (1)
4-18: ⚡ Quick winConsider adding edge case tests for comprehensive coverage.
The current tests cover the main paths effectively. Adding tests for edge cases would further improve confidence:
- Empty string input
- Reason exactly at 123-byte boundary
- Multi-byte UTF-8 characters near the truncation boundary
- String that becomes empty after truncation (e.g., first character is very close to limit)
🧪 Suggested additional test cases
test("truncates long UTF-8 close reasons to the WebSocket byte limit", () => { const reason = `Terminal session "${"session-".repeat(40)}🔥" not found; create it before connecting.`; const safeReason = toSafeWebSocketCloseReason(reason); expect(Buffer.byteLength(safeReason, "utf8")).toBeLessThanOrEqual(123); expect(safeReason.endsWith("...")).toBe(true); }); + + test("handles empty string", () => { + expect(toSafeWebSocketCloseReason("")).toBe(""); + }); + + test("handles reason at exact byte limit", () => { + // "a" is 1 byte, create a 123-byte string + const reason = "a".repeat(123); + expect(toSafeWebSocketCloseReason(reason)).toBe(reason); + }); + + test("truncates multi-byte UTF-8 at boundary correctly", () => { + // Create string with 4-byte emoji characters near the limit + const reason = "🔥".repeat(50); // 200 bytes total + const safeReason = toSafeWebSocketCloseReason(reason); + + expect(Buffer.byteLength(safeReason, "utf8")).toBeLessThanOrEqual(123); + expect(safeReason.endsWith("...")).toBe(true); + // Verify it's still valid UTF-8 (no partial characters) + expect(() => Buffer.from(safeReason, "utf8")).not.toThrow(); + }); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/host-service/src/terminal/terminal.test.ts` around lines 4 - 18, Add unit tests for toSafeWebSocketCloseReason to cover edge cases: (1) an empty string input should return an empty string; (2) a reason whose UTF-8 byte length is exactly 123 bytes should be preserved unchanged; (3) a reason with multi-byte UTF-8 characters placed so truncation falls inside a multi-byte sequence should still produce a valid UTF-8 string and end with "..." when truncated; and (4) a very short limit case where truncation would remove all characters (e.g., single multi-byte character exceeding 123 bytes) should return "..." or the function's defined safe output. Target the test file that contains toSafeWebSocketCloseReason and add these cases alongside the existing tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/desktop/src/renderer/lib/terminal/terminal-runtime.ts`:
- Around line 181-201: The loop in flushQueuedResizes must isolate per-runtime
failures so one thrown error doesn't abort processing and leave entries
stranded; for each iteration (using runtimesWithQueuedResize, runtime, onResize,
measureAndResize, and MAX_RESIZE_MEASURES_PER_FRAME) ensure you remove the
runtime from runtimesWithQueuedResize up front (or in a finally), then wrap the
call to measureAndResize() and the subsequent onResize?.() in a try/catch that
catches/logs the error and continues to the next runtime; preserve the processed
count behavior and still call scheduleResizeFlush() if items remain so remaining
runtimes are retried.
---
Nitpick comments:
In `@packages/host-service/src/terminal/terminal.test.ts`:
- Around line 4-18: Add unit tests for toSafeWebSocketCloseReason to cover edge
cases: (1) an empty string input should return an empty string; (2) a reason
whose UTF-8 byte length is exactly 123 bytes should be preserved unchanged; (3)
a reason with multi-byte UTF-8 characters placed so truncation falls inside a
multi-byte sequence should still produce a valid UTF-8 string and end with "..."
when truncated; and (4) a very short limit case where truncation would remove
all characters (e.g., single multi-byte character exceeding 123 bytes) should
return "..." or the function's defined safe output. Target the test file that
contains toSafeWebSocketCloseReason and add these cases alongside the existing
tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 45666193-7f70-4282-b5e6-cbe9d3fb7048
📒 Files selected for processing (8)
apps/desktop/src/lib/trpc/routers/workspaces/utils/shell-env.test.tsapps/desktop/src/lib/trpc/routers/workspaces/utils/shell-env.tsapps/desktop/src/main/lib/host-service-coordinator.test.tsapps/desktop/src/main/lib/host-service-coordinator.tsapps/desktop/src/renderer/lib/terminal/terminal-runtime.tspackages/host-service/src/terminal/terminal.test.tspackages/host-service/src/terminal/terminal.tsplans/v2-end-to-end-qa-loop.md
| function flushQueuedResizes() { | ||
| resizeFlushRafId = null; | ||
| if (resizeFlushTimeoutId !== null) { | ||
| clearTimeout(resizeFlushTimeoutId); | ||
| resizeFlushTimeoutId = null; | ||
| } | ||
|
|
||
| let processed = 0; | ||
| for (const [runtime, onResize] of Array.from(runtimesWithQueuedResize)) { | ||
| runtimesWithQueuedResize.delete(runtime); | ||
| if (!runtime.container) continue; | ||
| const changed = measureAndResize(runtime); | ||
| if (changed) onResize?.(); | ||
| processed += 1; | ||
| if (processed >= MAX_RESIZE_MEASURES_PER_FRAME) break; | ||
| } | ||
|
|
||
| if (runtimesWithQueuedResize.size > 0) { | ||
| scheduleResizeFlush(); | ||
| } | ||
| } |
There was a problem hiding this comment.
Isolate per-runtime measurement errors in the flush loop.
If measureAndResize throws for any one runtime (e.g., fitAddon.fit() failing on a partially torn-down terminal), the for loop aborts. Since resizeFlushRafId/resizeFlushTimeoutId are already cleared at the top of flushQueuedResizes, the remaining entries stay in runtimesWithQueuedResize with no scheduled flush — they will only be processed if another scheduleMeasureAndResize happens to be called later, and they continue to hold strong references in the meantime.
Wrap each iteration so one bad runtime can't strand the rest.
🛡️ Proposed fix
let processed = 0;
for (const [runtime, onResize] of Array.from(runtimesWithQueuedResize)) {
runtimesWithQueuedResize.delete(runtime);
if (!runtime.container) continue;
- const changed = measureAndResize(runtime);
- if (changed) onResize?.();
+ try {
+ const changed = measureAndResize(runtime);
+ if (changed) onResize?.();
+ } catch (err) {
+ console.error("terminal-runtime: measureAndResize failed", err);
+ }
processed += 1;
if (processed >= MAX_RESIZE_MEASURES_PER_FRAME) break;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function flushQueuedResizes() { | |
| resizeFlushRafId = null; | |
| if (resizeFlushTimeoutId !== null) { | |
| clearTimeout(resizeFlushTimeoutId); | |
| resizeFlushTimeoutId = null; | |
| } | |
| let processed = 0; | |
| for (const [runtime, onResize] of Array.from(runtimesWithQueuedResize)) { | |
| runtimesWithQueuedResize.delete(runtime); | |
| if (!runtime.container) continue; | |
| const changed = measureAndResize(runtime); | |
| if (changed) onResize?.(); | |
| processed += 1; | |
| if (processed >= MAX_RESIZE_MEASURES_PER_FRAME) break; | |
| } | |
| if (runtimesWithQueuedResize.size > 0) { | |
| scheduleResizeFlush(); | |
| } | |
| } | |
| function flushQueuedResizes() { | |
| resizeFlushRafId = null; | |
| if (resizeFlushTimeoutId !== null) { | |
| clearTimeout(resizeFlushTimeoutId); | |
| resizeFlushTimeoutId = null; | |
| } | |
| let processed = 0; | |
| for (const [runtime, onResize] of Array.from(runtimesWithQueuedResize)) { | |
| runtimesWithQueuedResize.delete(runtime); | |
| if (!runtime.container) continue; | |
| try { | |
| const changed = measureAndResize(runtime); | |
| if (changed) onResize?.(); | |
| } catch (err) { | |
| console.error("terminal-runtime: measureAndResize failed", err); | |
| } | |
| processed += 1; | |
| if (processed >= MAX_RESIZE_MEASURES_PER_FRAME) break; | |
| } | |
| if (runtimesWithQueuedResize.size > 0) { | |
| scheduleResizeFlush(); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/desktop/src/renderer/lib/terminal/terminal-runtime.ts` around lines 181
- 201, The loop in flushQueuedResizes must isolate per-runtime failures so one
thrown error doesn't abort processing and leave entries stranded; for each
iteration (using runtimesWithQueuedResize, runtime, onResize, measureAndResize,
and MAX_RESIZE_MEASURES_PER_FRAME) ensure you remove the runtime from
runtimesWithQueuedResize up front (or in a finally), then wrap the call to
measureAndResize() and the subsequent onResize?.() in a try/catch that
catches/logs the error and continues to the next runtime; preserve the processed
count behavior and still call scheduleResizeFlush() if items remain so remaining
runtimes are retried.
Greptile SummaryThis PR fixes three real bugs surfaced by an expanded renderer stress run and adds the QA plan that found them. All three fixes ship with targeted regression tests.
Confidence Score: 4/5Safe to merge; all three bug fixes are well-targeted and covered by new regression tests, with one minor batch-scheduling edge case worth a follow-up. The git env stripping and WebSocket close truncation are clean and correct. The batched resize queue is a sound architectural improvement, but the per-frame slot counter increments for hidden terminals even though they exit measureAndResize immediately without doing any layout work — in a high-terminal-count font-change scenario this could delay visible terminals by extra frames. apps/desktop/src/renderer/lib/terminal/terminal-runtime.ts — the flushQueuedResizes batch counter, and apps/desktop/src/lib/trpc/routers/workspaces/utils/shell-env.ts — the missing JSDoc on the newly exported function.
|
| Filename | Overview |
|---|---|
| apps/desktop/src/renderer/lib/terminal/terminal-runtime.ts | Introduces a module-level RAF-batched resize queue to coalesce rapid terminal fit/visibility reads; hidden terminals consume per-frame batch slots even though measureAndResize returns early for them. |
| apps/desktop/src/lib/trpc/routers/workspaces/utils/shell-env.ts | Extracts pager-stripping logic into exported stripUnsafeSimpleGitEnv; extends coverage to GH_PAGER and LESS; the prior explanatory comment about git ≥ 2.50 was not preserved on the new export. |
| packages/host-service/src/terminal/terminal.ts | Adds toSafeWebSocketCloseReason to clamp close reason strings to the 123-byte RFC 6455 limit; wraps all four socket.close call sites; correctly handles multi-byte UTF-8. |
| apps/desktop/src/main/lib/host-service-coordinator.ts | Calls stripUnsafeSimpleGitEnv on the child-process environment for host-service spawn, ensuring pager vars are stripped consistently across both git call sites. |
| packages/host-service/src/terminal/terminal.test.ts | New test file covering short-reason passthrough and UTF-8 multi-byte truncation for toSafeWebSocketCloseReason. |
| apps/desktop/src/main/lib/host-service-coordinator.test.ts | Adds a test suite verifying that buildEnv strips all four pager env vars while preserving HOST_SERVICE_PORT and HOST_SERVICE_SECRET. |
| apps/desktop/src/lib/trpc/routers/workspaces/utils/shell-env.test.ts | Adds a test for getProcessEnvWithShellPath confirming all four pager vars are stripped while PATH is preserved. |
| plans/v2-end-to-end-qa-loop.md | New QA plan mapping 12 user journeys with happy paths, edge cases, implementation owners, runnable command sets, and a first-pass execution log including three bugs found and fixed. |
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
apps/desktop/src/renderer/lib/terminal/terminal-runtime.ts:188-196
**Hidden terminals consume per-frame batch slots unnecessarily**
`measureAndResize` returns `false` immediately for any container that fails `hostIsVisible` (zero `clientWidth`/`clientHeight`), meaning it never calls `fitAddon.fit()`. The `processed` counter still increments for these no-op entries, however, so they compete for the `MAX_RESIZE_MEASURES_PER_FRAME = 8` budget alongside terminals that actually need a layout read. In a font-change scenario with many background-tab terminals queued (e.g. the 128-terminal stress fixture), visible terminals could be pushed into later frames unnecessarily. Adding a `hostIsVisible` guard before `processed += 1` (or before calling `measureAndResize`) would restrict slot consumption to the expensive `fit()` path.
### Issue 2 of 2
apps/desktop/src/lib/trpc/routers/workspaces/utils/shell-env.ts:205-211
The prior comment block explaining the git ≥ 2.50 / `simple-git` "allowUnsafePager" restriction was removed when this function was extracted. Since `stripUnsafeSimpleGitEnv` is now exported and called from a second callsite (`host-service-coordinator.ts`), future contributors won't know why `GH_PAGER` and `LESS` are treated as unsafe without digging into git history. A brief JSDoc preserves that context.
```suggestion
/**
* Strips env vars that cause simple-git / git ≥ 2.50 to throw
* `GitPluginError: Use of "PAGER" is not permitted without enabling
* allowUnsafePager` on non-interactive callers. Programmatic git
* invocations have piped stdout so git skips paging entirely; the
* stripped keys are safe to remove from app-owned git environments.
*/
export function stripUnsafeSimpleGitEnv(
env: Record<string, string | undefined>,
): void {
for (const key of SIMPLE_GIT_UNSAFE_ENV_KEYS) {
delete env[key];
}
}
```
Reviews (1): Last reviewed commit: "fix(desktop): address v2 QA regressions" | Re-trigger Greptile
| let processed = 0; | ||
| for (const [runtime, onResize] of Array.from(runtimesWithQueuedResize)) { | ||
| runtimesWithQueuedResize.delete(runtime); | ||
| if (!runtime.container) continue; | ||
| const changed = measureAndResize(runtime); | ||
| if (changed) onResize?.(); | ||
| processed += 1; | ||
| if (processed >= MAX_RESIZE_MEASURES_PER_FRAME) break; | ||
| } |
There was a problem hiding this comment.
Hidden terminals consume per-frame batch slots unnecessarily
measureAndResize returns false immediately for any container that fails hostIsVisible (zero clientWidth/clientHeight), meaning it never calls fitAddon.fit(). The processed counter still increments for these no-op entries, however, so they compete for the MAX_RESIZE_MEASURES_PER_FRAME = 8 budget alongside terminals that actually need a layout read. In a font-change scenario with many background-tab terminals queued (e.g. the 128-terminal stress fixture), visible terminals could be pushed into later frames unnecessarily. Adding a hostIsVisible guard before processed += 1 (or before calling measureAndResize) would restrict slot consumption to the expensive fit() path.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/renderer/lib/terminal/terminal-runtime.ts
Line: 188-196
Comment:
**Hidden terminals consume per-frame batch slots unnecessarily**
`measureAndResize` returns `false` immediately for any container that fails `hostIsVisible` (zero `clientWidth`/`clientHeight`), meaning it never calls `fitAddon.fit()`. The `processed` counter still increments for these no-op entries, however, so they compete for the `MAX_RESIZE_MEASURES_PER_FRAME = 8` budget alongside terminals that actually need a layout read. In a font-change scenario with many background-tab terminals queued (e.g. the 128-terminal stress fixture), visible terminals could be pushed into later frames unnecessarily. Adding a `hostIsVisible` guard before `processed += 1` (or before calling `measureAndResize`) would restrict slot consumption to the expensive `fit()` path.
How can I resolve this? If you propose a fix, please make it concise.| export function stripUnsafeSimpleGitEnv( | ||
| env: Record<string, string | undefined>, | ||
| ): void { | ||
| for (const key of SIMPLE_GIT_UNSAFE_ENV_KEYS) { | ||
| delete env[key]; | ||
| } | ||
| } |
There was a problem hiding this comment.
The prior comment block explaining the git ≥ 2.50 /
simple-git "allowUnsafePager" restriction was removed when this function was extracted. Since stripUnsafeSimpleGitEnv is now exported and called from a second callsite (host-service-coordinator.ts), future contributors won't know why GH_PAGER and LESS are treated as unsafe without digging into git history. A brief JSDoc preserves that context.
| export function stripUnsafeSimpleGitEnv( | |
| env: Record<string, string | undefined>, | |
| ): void { | |
| for (const key of SIMPLE_GIT_UNSAFE_ENV_KEYS) { | |
| delete env[key]; | |
| } | |
| } | |
| /** | |
| * Strips env vars that cause simple-git / git ≥ 2.50 to throw | |
| * `GitPluginError: Use of "PAGER" is not permitted without enabling | |
| * allowUnsafePager` on non-interactive callers. Programmatic git | |
| * invocations have piped stdout so git skips paging entirely; the | |
| * stripped keys are safe to remove from app-owned git environments. | |
| */ | |
| export function stripUnsafeSimpleGitEnv( | |
| env: Record<string, string | undefined>, | |
| ): void { | |
| for (const key of SIMPLE_GIT_UNSAFE_ENV_KEYS) { | |
| delete env[key]; | |
| } | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/lib/trpc/routers/workspaces/utils/shell-env.ts
Line: 205-211
Comment:
The prior comment block explaining the git ≥ 2.50 / `simple-git` "allowUnsafePager" restriction was removed when this function was extracted. Since `stripUnsafeSimpleGitEnv` is now exported and called from a second callsite (`host-service-coordinator.ts`), future contributors won't know why `GH_PAGER` and `LESS` are treated as unsafe without digging into git history. A brief JSDoc preserves that context.
```suggestion
/**
* Strips env vars that cause simple-git / git ≥ 2.50 to throw
* `GitPluginError: Use of "PAGER" is not permitted without enabling
* allowUnsafePager` on non-interactive callers. Programmatic git
* invocations have piped stdout so git skips paging entirely; the
* stripped keys are safe to remove from app-owned git environments.
*/
export function stripUnsafeSimpleGitEnv(
env: Record<string, string | undefined>,
): void {
for (const key of SIMPLE_GIT_UNSAFE_ENV_KEYS) {
delete env[key];
}
}
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
Adds a v2-only end-to-end QA loop plan covering the user journeys documented in Superset docs and represented in the desktop, host-service, CLI, and MCP implementation surfaces.
The plan includes:
Validation
bun run lintbun --cwd packages/host-service test src/trpc/router/workspace-creation src/trpc/router/project/utils src/trpc/router/git/v2-diff-surfaces.integration.test.ts test/integration/workspace-create-delete.integration.test.ts test/integration/workspace-cleanup.integration.test.ts test/integration/git.integration.test.ts test/integration/bug-hunt-3.integration.test.tsbun --cwd packages/panes testbun --cwd apps/desktop test src/shared/workspace-run-definition.test.ts src/shared/context src/renderer/routes/_authenticated/_dashboard/utils/workspace-navigation.test.ts 'src/renderer/routes/_authenticated/_dashboard/v2-workspace'bun --cwd apps/desktop test src/renderer/lib/terminal src/main/terminal-host src/main/lib/host-service-coordinator.test.ts src/main/lib/terminal-escape-filter.test.tsbun --cwd packages/host-service test src/trpc/router/terminal src/trpc/router/attachments src/trpc/router/settings src/trpc/router/config src/trpc/router/notificationsbun test packages/cli/src/lib/resolve-auth.test.tsbun --cwd packages/mcp testbun --cwd packages/mcp-v2 typecheckSummary by cubic
Adds a v2 end-to-end QA loop plan and fixes regressions found during the run: sanitize git env for desktop and host-service, batch terminal resize work to keep the UI responsive, and clamp terminal WebSocket close reasons to spec.
New Features
plans/v2-end-to-end-qa-loop.mdwith scope, test data, and a journey matrix (onboarding, workspaces, panes/tabs, terminal, files/diff/git, setup/run/presets, ports/browser, chat/agents, tasks/automations, settings/auth/hosts).apps/desktop,packages/host-service,packages/panes,packages/cli,packages/mcp, andpackages/mcp-v2.tanstack-db.sqlite.Bug Fixes
PAGER,GIT_PAGER,GH_PAGER, andLESSfrom app-owned git environments in bothapps/desktopandpackages/host-serviceto preventsimple-gitfailures.Written for commit e1a9776. Summary will update on new commits.
Summary by CodeRabbit
Bug Fixes
New Features
Tests
Documentation