fix(desktop): fix extension host memory leaks to resolve the session reset issue #123

Merged
MocA-Love merged 4 commits into main from fix/issue-118-extension-host-memory-leaks
Apr 8, 2026


MocA-Love (Owner) commented Apr 8, 2026

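Overview

Fixes the problem where codex/Claude Code extension sessions start over from scratch after some time has passed (Issue #118).

Sessions were being reset even though the PC had not gone to sleep. The cause turned out to be several memory leaks in the extension host worker, which accumulated until the worker was restarted internally.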

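Changes

P1: stdout/stderr listeners were never released

  • Changed the handlers to named functions and explicitly remove them with off() when the worker exits
  • removeAllListeners() is not used because it would also remove other listeners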
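
A minimal sketch of the P1 pattern, not the PR's actual code. The `attachOutputLogging` helper, the `DataSource`/`ExitSource` shapes, and the log format are assumptions made for illustration; only the idea of named handlers removed with off() on exit comes from the description above.

```typescript
// Assumed minimal shapes, modeled on Node's on/off/once emitter methods.
interface DataSource {
  on(event: "data", listener: (chunk: string) => void): unknown;
  off(event: "data", listener: (chunk: string) => void): unknown;
}

interface ExitSource {
  once(event: "exit", listener: () => void): unknown;
}

// Hypothetical helper: attaches log handlers and detaches them when the child exits.
function attachOutputLogging(
  child: ExitSource & { stdout: DataSource; stderr: DataSource },
  log: (line: string) => void,
): void {
  // Named handlers, so the same function references can be passed to off().
  const onStdout = (chunk: string) => log(`[stdout] ${chunk}`);
  const onStderr = (chunk: string) => log(`[stderr] ${chunk}`);

  child.stdout.on("data", onStdout);
  child.stderr.on("data", onStderr);

  child.once("exit", () => {
    // Remove only our own handlers; listeners registered elsewhere stay attached.
    child.stdout.off("data", onStdout);
    child.stderr.off("data", onStderr);
  });
}
```

Because off() is given the exact handler references, a listener that some other module attached to the same stream survives the cleanup, which is the reason the PR avoids removeAllListeners().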

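P2: Startup message listener left behind

  • After startup settles via ready, error, or timeout, explicitly remove the startup onMessage listener with off()
  • The main handleWorkerMessage listener, which receives messages for the worker's whole lifetime, stays attached
  • A settled flag also prevents double resolve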
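
The P2 fix could look roughly like the sketch below. `waitForStartup`, the `WorkerMessage` shape, and the error texts are hypothetical; the description only states that the startup listener is removed with off() once ready, error, or the timeout settles, guarded by a settled flag.

```typescript
interface WorkerMessage {
  type: string;
  message?: string;
}
type MessageListener = (msg: WorkerMessage) => void;

// Assumed minimal shape of the worker's message channel.
interface MessageSource {
  on(event: "message", listener: MessageListener): unknown;
  off(event: "message", listener: MessageListener): unknown;
}

// Hypothetical helper: resolves on "ready", rejects on "error" or timeout.
function waitForStartup(worker: MessageSource, timeoutMs: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let settled = false;

    const settle = (error?: Error) => {
      if (settled) return; // guard against double resolve/reject
      settled = true;
      clearTimeout(timer);
      worker.off("message", onStartupMessage); // remove only the startup listener
      if (error) reject(error);
      else resolve();
    };

    const onStartupMessage: MessageListener = (msg) => {
      if (msg.type === "ready") settle();
      else if (msg.type === "error") settle(new Error(msg.message ?? "worker startup failed"));
    };

    const timer = setTimeout(() => settle(new Error("worker startup timed out")), timeoutMs);
    worker.on("message", onStartupMessage);
  });
}
```

All three outcomes go through `settle`, so the listener and the timer are released on every path and a late message cannot settle the promise a second time.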

P3: htmlStore entries were never cleared

  • Added a clearTrackedWebviewsForWorkspace() helper that deletes the HTML for every related viewId in one pass when the worker exits
  • The disposeWebview tRPC mutation now also calls clearWebviewHtml()
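
As an illustration of the P3 cleanup, the sketch below assumes htmlStore is a Map from viewId to HTML and that a second map tracks which viewIds belong to each workspace. Both data structures and the `trackWebview` helper are assumptions; only the two function names `clearTrackedWebviewsForWorkspace` and `clearWebviewHtml` come from the description.

```typescript
// Assumed shapes: htmlStore maps viewId -> HTML, trackedViews maps workspaceId -> viewIds.
const htmlStore = new Map<string, string>();
const trackedViews = new Map<string, Set<string>>();

// Hypothetical registration helper used to populate both maps.
function trackWebview(workspaceId: string, viewId: string, html: string): void {
  htmlStore.set(viewId, html);
  let views = trackedViews.get(workspaceId);
  if (!views) {
    views = new Set<string>();
    trackedViews.set(workspaceId, views);
  }
  views.add(viewId);
}

// Removes the stored HTML for a single view (the disposeWebview path).
function clearWebviewHtml(viewId: string): void {
  htmlStore.delete(viewId);
}

// Removes the stored HTML for every view of a workspace (the worker exit path).
function clearTrackedWebviewsForWorkspace(workspaceId: string): void {
  const views = trackedViews.get(workspaceId);
  if (!views) return;
  views.forEach((viewId) => clearWebviewHtml(viewId));
  trackedViews.delete(workspaceId);
}
```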

P5: Timers were not reliably cleared

  • Consolidated clearing of the HTML-wait setInterval and setTimeout into a finish() function
  • A settled flag prevents double resolve
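
One possible shape for the consolidated finish() function is sketched below. The `waitForHtml` name, its parameters, the default intervals, and the returned cancel handle are all hypothetical, and a callback stands in for whatever the real code resolves.

```typescript
// Hypothetical helper: polls getHtml() until it yields a value or the timeout elapses.
// Returns a cancel function that finishes the wait early with undefined.
function waitForHtml(
  getHtml: () => string | undefined,
  onDone: (html: string | undefined) => void,
  pollMs = 50,
  timeoutMs = 5000,
): () => void {
  let settled = false;

  // Single exit point: both timers are cleared here and nowhere else.
  const finish = (html: string | undefined) => {
    if (settled) return; // guard against finishing twice
    settled = true;
    clearInterval(poll);
    clearTimeout(timeout);
    onDone(html);
  };

  const poll = setInterval(() => {
    const html = getHtml();
    if (html !== undefined) finish(html);
  }, pollMs);
  const timeout = setTimeout(() => finish(undefined), timeoutMs);

  return () => finish(undefined);
}
```

Routing the poll hit, the timeout, and cancellation through the same function means no path can leave an interval running after the wait is over.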

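Review feedback: consolidation into cleanupWorker

  • Both the exit and error handlers now call a shared cleanupWorker(intentional) function
  • An instance.process !== child guard prevents double execution when error and exit fire back to back
  • scheduleRestart is now also called on error, fixing the case where the process never recovered after a spawn failure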
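
The guard described above can be sketched as follows. `watchChild`, the `ChildLike` and `WorkerInstance` shapes, and the `intentionalStop` field are assumptions; `cleanupWorker(intentional)`, the `instance.process !== child` check, and `scheduleRestart` are the names used in the description.

```typescript
// Assumed minimal shape of the spawned child process.
interface ChildLike {
  once(event: "exit" | "error", listener: () => void): unknown;
}

// Assumed bookkeeping record for one worker.
interface WorkerInstance {
  process: ChildLike | null;
  intentionalStop: boolean; // hypothetical flag set when the stop was requested
}

// Hypothetical wiring; scheduleRestart is passed in to keep the sketch self-contained.
function watchChild(
  instance: WorkerInstance,
  child: ChildLike,
  scheduleRestart: () => void,
): void {
  const cleanupWorker = (intentional: boolean) => {
    // If error and exit both fire, only the first call still sees this child as current.
    if (instance.process !== child) return;
    instance.process = null;
    if (!intentional) scheduleRestart();
  };

  child.once("exit", () => cleanupWorker(instance.intentionalStop));
  child.once("error", () => cleanupWorker(false));
}
```

With this wiring a spawn failure that only emits error still schedules a restart, and a failure that emits error followed by exit schedules exactly one.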

Changed files

  • apps/desktop/src/main/lib/vscode-shim/extension-host-manager.ts
  • apps/desktop/src/main/extension-host-worker/index.ts
  • apps/desktop/src/lib/trpc/routers/vscode-extensions/index.ts

Closes #118

- Name stdout/stderr data handlers and remove them on worker exit to
  prevent listener accumulation across restarts (P1)
- Explicitly off() the startup message listener after ready/error/timeout
  so it does not persist alongside the permanent handleWorkerMessage
  listener (P2)
- Add clearTrackedWebviewsForWorkspace() helper; call it on worker exit
  to purge htmlStore entries for the workspace (P3)
- Add clearWebviewHtml() call in disposeWebview tRPC mutation (P3)
- Consolidate HTML-wait timer cleanup into a single finish() function
  with a settled guard to prevent double-resolve (P5)

Closes #118
MocA-Love merged commit 56f7900 into main on Apr 8, 2026
MocA-Love pushed a commit that referenced this pull request Apr 13, 2026
…#3356)

* Gaps

* Link support

* feat(host-service,desktop): add PR URL parsing and cross-repo validation to V2 workspace modal

Move GitHub PR URL detection, `#123` shorthand normalization, and
cross-repo validation into the host-service `searchPullRequests`
endpoint so the V2 client stays thin. The client sends raw user input
and reacts to a `repoMismatch` field in the response. Also adds
debounce gap handling to avoid empty-state flash while typing.

* fix(host-service): use direct PR lookup for URL paste and #123 shorthand

When query is a PR number (from URL parsing or `#` shorthand), use
`octokit.pulls.get()` instead of `in:title` text search. The text
search only matched if the number appeared in the PR title.

* fix(host-service): direct PR lookup for numbers, broader text search, and tests

- Extract normalizePullRequestQuery into its own module for testability
- Use octokit.pulls.get() for bare numbers (e.g. "3130") not just #shorthand
- Remove `in:title` from text search so it also matches PR body
- Add 36 test cases covering: URL tabs (/files, /changes, /commits,
  /checks), query params, hash fragments, www prefix, http, cross-repo
  mismatch, #shorthand, bare numbers, non-PR URLs, GitHub Enterprise

* feat(host-service,desktop): unify GitHub query normalization for PRs and issues

Generalize normalizePullRequestQuery into normalizeGitHubQuery with a
`kind` parameter ("pull" | "issue"). Single regex matches both
/pull/:number and /issues/:number URLs.

- Wire into searchGitHubIssues: URL paste, #N shorthand, bare number
  direct lookup via octokit.issues.get(), cross-repo validation
- Filter out PRs from issues.get() (GitHub returns PRs as issues)
- Remove redundant `in:title,body` from issue text search
- Update GitHubIssueLinkCommand client: debounce gap handling,
  repoMismatch display
- Cross-entity fallback: wrong URL kind falls through to text search
- 50 tests covering PRs, issues, cross-entity, and edge cases
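
A rough sketch of what a normalizer with a `kind` parameter might look like. The return shape, the `currentRepo` parameter, the regex, and the order of the kind and repo checks are assumptions; the commit only states that one regex covers both URL forms, that `#N` and bare numbers become direct lookups, and that a wrong URL kind falls through to text search.

```typescript
type Kind = "pull" | "issue";

// Hypothetical result shape; the real module may return something different.
type NormalizedQuery =
  | { type: "number"; number: number }
  | { type: "repoMismatch"; repo: string }
  | { type: "text"; query: string };

// One pattern for both /pull/:number and /issues/:number URLs, on any host,
// tolerating trailing tabs (/files), query params, and hash fragments.
const GITHUB_URL = /^https?:\/\/[^/]+\/([^/]+)\/([^/]+)\/(pull|issues)\/(\d+)(?:[/?#].*)?$/i;

function normalizeGitHubQuery(raw: string, kind: Kind, currentRepo: string): NormalizedQuery {
  const query = raw.trim();

  const url = GITHUB_URL.exec(query);
  if (url) {
    const repo = `${url[1]}/${url[2]}`;
    // Lowercase the captured path segment so "PULL" and "Issues" compare correctly.
    const urlKind: Kind = url[3].toLowerCase() === "pull" ? "pull" : "issue";
    // Wrong entity kind (e.g. an issue URL in the PR search) falls through to text search.
    if (urlKind !== kind) return { type: "text", query };
    if (repo.toLowerCase() !== currentRepo.toLowerCase()) return { type: "repoMismatch", repo };
    return { type: "number", number: Number(url[4]) };
  }

  // "#123" shorthand or a bare number such as "3130".
  const shorthand = /^#?(\d+)$/.exec(query);
  if (shorthand) return { type: "number", number: Number(shorthand[1]) };

  return { type: "text", query };
}
```

A `number` result would then drive the direct lookup (`octokit.pulls.get()` or `octokit.issues.get()`, as the commit describes), while `text` goes to the search path.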

* fix: address PR review feedback (items 1-4)

1. Delay ctx.github() until after repoMismatch short-circuit in both
   searchPullRequests and searchGitHubIssues — avoids auth errors when
   the query is a cross-repo URL
2. Lowercase urlPath before comparing entity kind — fixes case-sensitive
   mismatch when regex captures "PULL" or "Issues" from uppercase URLs
3. Use isFetching instead of isLoading from useQuery in both client
   components — correctly reflects background refetch state
4. Use debouncedTrimmed for issue list heading instead of raw searchQuery
   — prevents "Results" label on whitespace-only input

* fix(desktop): match V1 issue search UX with client-side Fuse.js filtering

V2 was sending every keystroke to the GitHub search API which was slow
and couldn't match issue numbers reliably. V1 pre-fetches all open
issues once and does instant client-side fuzzy search with Fuse.js
(issue number weighted 3x, title weighted 2x).

Now V2 does the same: pre-fetch all open issues on popover open, Fuse.js
for text/number filtering, and only hits the server for URL paste and
#N shorthand (which need cross-repo validation and direct lookup).

* Revert "fix(desktop): match V1 issue search UX with client-side Fuse.js filtering"

This reverts commit 86c6151.

* fix(host-service): use search API for issue listing to avoid PR contamination

The no-query path used octokit.issues.listForRepo() which returns PRs
mixed with issues. With per_page: 30, most slots were consumed by PRs
that got filtered out client-side, leaving very few actual issues.

Switch to octokit.search.issuesAndPullRequests() with `is:issue is:open`
so GitHub filters server-side and the full page is real issues.

* refactor(host-service): collapse duplicate issue search paths into one query

Both the text search and no-query paths were doing the same search API
call with `is:issue is:open`. Merged into a single path that appends
the effective query when present.

* refactor(host-service): single search path for PRs, drop is:open from issues

Collapse PR text search + no-query list into one search API call (same
pattern as issues). Drop `is:open` from issue search so closed issues
are findable — useful when linking context for workspace creation.
Both endpoints now use one query: `repo:owner/name is:<type> <query>`.
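
For illustration, the single query string could be assembled like this. `buildSearchQuery` and its parameters are hypothetical names; the `repo:owner/name is:<type> <query>` format is taken from the commit message.

```typescript
type EntityType = "pr" | "issue";

// Builds `repo:owner/name is:<type> <query>`; the query part is appended only when non-empty.
function buildSearchQuery(owner: string, name: string, type: EntityType, query?: string): string {
  const parts = [`repo:${owner}/${name}`, `is:${type}`];
  const trimmed = query?.trim();
  if (trimmed) parts.push(trimmed);
  return parts.join(" ");
}
```

The resulting string would be passed as the `q` parameter of the search call, so the no-query listing and the text search share one code path.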

* fix(desktop): align PR and issue link command labels and limits

- Drop stale "open" from empty states — search is no longer open-only
- Issue limit 20 → 30 to match PRs
- Issue heading shows result count like PRs
- Both default headings say "Recent" instead of implying open-only
- Consistent "Loading..." text

* Lint
MocA-Love added a commit that referenced this pull request Apr 15, 2026
…oModal

Two requests from the review:

1. **Reusable system prompt presets** that users can attach to new
   TODOs at creation time, managed from a Settings row at the bottom
   of the Agent Manager's left sidebar.
2. **TodoModal is too text-heavy** — simplified the copy so the form
   matches the visual density of the rest of the app.

Schema + migration
------------------

- `packages/local-db/src/schema/todo-prompt-presets.ts` (new): new
  `todo_prompt_presets` table (id, name, content, createdAt,
  updatedAt) with `name` and `updatedAt` indexes.
- `packages/local-db/src/schema/todo-sessions.ts`: new nullable
  `custom_system_prompt` column. Selected preset content is copied
  into this column at session create time so later preset edits do
  not retroactively change a session that has already run.
- `packages/local-db/src/schema/index.ts` + `schema.ts`: re-export
  the new table so drizzle-kit picks it up via the existing root.
- `packages/local-db/drizzle/0054_todo_prompt_presets.sql`:
  auto-generated migration (CREATE TABLE + ALTER TABLE ADD COLUMN).

Backend
-------

- `apps/desktop/src/main/todo-agent/types.ts`:
  - `todoCreateInputSchema` gains optional `customSystemPrompt`
    (trimmed, max 20k, empty→undefined).
  - `todoPresetCreateInputSchema` and `todoPresetUpdateInputSchema`
    new zod shapes for the CRUD endpoints.
- `apps/desktop/src/main/todo-agent/supervisor.ts`:
  - `runClaudeTurn` params gain `customSystemPrompt: string | null`.
  - When present it is threaded into the spawned claude args as
    `--append-system-prompt <content>`. This composes with the
    iteration prompt + `--resume` so every turn in the session
    inherits the steering without re-injecting it in every prompt.
  - The per-turn call site in `runSession` reads the session row
    at turn boundary and passes `currentSession.customSystemPrompt
    ?? null`.
- `apps/desktop/src/main/todo-agent/trpc-router.ts`:
  - `create` now persists `input.customSystemPrompt ?? null` on
    the new DB column.
  - `rerun` now copies `source.customSystemPrompt` into the clone
    so re-running preserves the steering.
  - New nested `todoAgent.presets` router with:
    * `list` query (orderBy updatedAt desc)
    * `create` mutation (inputs: name 1..120, content 1..20k)
    * `update` mutation (inputs: id + name + content)
    * `delete` mutation (inputs: id; returns ok boolean)
  - All mutations run against `localDb` via drizzle directly —
    presets are a tiny kv-ish table, no caching needed.
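
A small sketch of how the `customSystemPrompt` threading in supervisor.ts might be expressed. `withCustomSystemPrompt` and `baseArgs` are hypothetical; `baseArgs` stands for whatever arguments the supervisor already builds, and only the `--append-system-prompt <content>` pair is taken from the commit message.

```typescript
// Hypothetical helper: returns a new argument list, leaving baseArgs untouched.
function withCustomSystemPrompt(
  baseArgs: readonly string[],
  customSystemPrompt: string | null,
): string[] {
  const prompt = customSystemPrompt?.trim();
  // Null or blank prompt: spawn with the arguments unchanged.
  if (!prompt) return [...baseArgs];
  return [...baseArgs, "--append-system-prompt", prompt];
}
```

Passing the prompt as its own argv element avoids any shell quoting of the preset content, which can run to 20k characters.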

Renderer
--------

- `apps/desktop/src/renderer/features/todo-agent/TodoManager/
  PresetsDialog/PresetsDialog.tsx` (new):
  - Full `Dialog` at 960×80vh with a 2-column layout: list of
    presets on the left, edit form on the right.
  - "新規プリセット" ("New preset") button at the top of the sidebar
    resets the draft state and clears selection.
  - Selecting a row populates the draft; editing flips a
    `dirty` flag that gates the save button.
  - Save routes to `create` or `update` depending on whether the
    draft has an id; success toast on both paths.
  - Delete uses the inline "本当に削除 / キャンセル" ("Really delete /
    Cancel") confirm pattern already established in the SessionRow
    kebab menu.
- `apps/desktop/src/renderer/features/todo-agent/TodoManager/
  TodoManager.tsx`:
  - `presetsDialogOpen` state + mounted `<PresetsDialog>` as a
    sibling Dialog inside the existing outer Dialog so it stacks
    on top of the Manager the way `<TodoModal>` does.
  - Left sidebar gains a `shrink-0 border-t` footer row with a
    "設定 / プリセット" ("Settings / Presets") button using
    `HiMiniCog6Tooth`. Clicking it opens PresetsDialog. The row
    mirrors the compact ghost-link styling of the existing row
    controls.

TodoModal simplification + preset picker
----------------------------------------

- Removed the 5-line `DialogDescription` entirely. Users reached
  the feature through the button's tooltip; the modal body needs
  to carry only what is actionable.
- Title placeholder: "例: Issue #123 のログインリダイレクト問題を
  修正" ("e.g. Fix the login redirect problem in Issue #123") →
  "例: Issue #123 を修正" ("e.g. Fix Issue #123"); half the width,
  same intent.
- Replaced the two-line "new worktree" card with a single-row
  label that renders as a checkbox-styled button: "新しい
  worktree を作成して実行" ("Create a new worktree and run") with a
  sparkle icon on the right. The description text was the biggest
  offender and was cut entirely. The disabled state still shows
  via the muted opacity treatment.
- Description placeholder: long sentence → "やってほしい作業を
  書く" ("Describe the work you want done").
- Goal: "(任意)" ("(optional)") moved into a compact `text-[10px]`
  suffix on the label; placeholder shortened to "完了条件(空欄可)"
  ("Completion criteria (may be left blank)"). Textarea rows
  dropped from 3 to 2.
- Verify: same treatment. Placeholder → "例: bun test"
  ("e.g. bun test"). Removed the two-line explanation block below
  it entirely.
- New "システムプロンプト (任意)" ("System prompt (optional)") row
  hosts a `PresetPicker` trigger that renders the selected preset
  name + an inline clear (×) button when set. Dropdown shows the
  full preset list with name + first ~2 lines of content as a
  preview, plus a "選択を解除" ("Clear selection") footer row and a
  hint when no presets exist. Selected preset content is read at
  submit time and passed as `customSystemPrompt` to the create
  mutation.
- All form inputs gained `rounded-md` to match the rest of the
  app.

Verified
--------
- `bun run typecheck` in apps/desktop — clean.
MocA-Love pushed a commit that referenced this pull request Apr 16, 2026
…#3467)

* Create doc

* docs(desktop): finalize v2 launch context plan

Replace initial draft with V2-greenfield architecture: structured
AgentLaunchSpec (system/user/attachments ContentPart[]), per-agent
contextPromptTemplate on ResolvedAgentConfig, Uint8Array over IPC,
vendor-aligned (AI SDK v3, Anthropic cache_control, Continue.dev
contributor metadata). CLI agents keep disk + path-ref pattern;
chat agents get structured passthrough with Files API as phase 6.

* feat(desktop/context): add v2 launch context types and fixtures

Step 1 of the v2 launch-context composition plan. Defines the core
discriminated types (LaunchSource, ContextSection, LaunchContext,
AgentLaunchSpec, ContentPart with Uint8Array data) and the canonical
multi-source + prompt-only fixtures that the composer and
buildLaunchSpec tests will share.

No runtime code yet — types and fixtures only.

* feat(desktop/context): add composer with dedup, ordering, failure tolerance

Step 2 of the v2 launch-context composition plan.

buildLaunchContext parallel-resolves sources via a contributor registry,
dedups URL/id-kinds (attachments never dedup), preserves input order
within a kind, applies the default kind group order at the end,
tolerates per-source failures (populated on failures[]), and enforces a
10s per-contributor timeout.
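
The dedup and ordering rules can be sketched synchronously as below; the parallel resolution, failure collection, and 10s timeout are left out. The `Source` shapes and the concrete order in `KIND_ORDER` are assumptions (the commit only says a default kind group order exists), while the names `KIND_ORDER` and `sourceIdentity` appear in a later commit in this log.

```typescript
// Assumed source shapes, reduced to the fields needed for identity.
type Source =
  | { kind: "user-prompt"; text: string }
  | { kind: "internal-task"; id: string }
  | { kind: "github-issue"; url: string }
  | { kind: "github-pr"; url: string }
  | { kind: "attachment"; filename: string };

// Assumed group order; the real default order is not given in the commit message.
const KIND_ORDER: Source["kind"][] = [
  "user-prompt",
  "internal-task",
  "github-issue",
  "github-pr",
  "attachment",
];

// Identity used for dedup; null means "never deduplicate" (e.g. attachments).
function sourceIdentity(source: Source): string | null {
  switch (source.kind) {
    case "github-issue":
    case "github-pr":
      return `${source.kind}:${source.url}`;
    case "internal-task":
      return `${source.kind}:${source.id}`;
    default:
      return null;
  }
}

function dedupAndOrder(sources: Source[]): Source[] {
  const seen = new Set<string>();
  const unique = sources.filter((source) => {
    const id = sourceIdentity(source);
    if (id === null) return true;
    if (seen.has(id)) return false;
    seen.add(id);
    return true;
  });

  // Group by kind in KIND_ORDER; filter keeps the input order within each kind.
  const ordered: Source[] = [];
  for (const kind of KIND_ORDER) {
    ordered.push(...unique.filter((source) => source.kind === kind));
  }
  return ordered;
}
```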

taskSlug derivation: first internal-task section wins, falling back to
first github-issue. 12 tests pass.

* feat(desktop/context): add six contributors and default registry

Step 3 of the v2 launch-context composition plan. One contributor per
LaunchSource kind, each with Continue.dev-style metadata (displayName,
description, requiresQuery), its own co-located test file, and a
consistent 404 -> null (non-fatal) pattern for fetch-based kinds:

- userPrompt        -- trims, returns null on empty
- attachment        -- file or image ContentPart by mediaType
- agentInstructions -- system-scoped, cacheControl: ephemeral
- githubIssue       -- title + body markdown, meta.taskSlug from slug
- githubPr          -- title + branch + body markdown
- internalTask      -- title + description, meta.taskSlug

Also adds composer.integration.test.ts covering the real registry
end-to-end against the multi-source fixture. 41 tests green.

* feat(shared): generic renderPromptTemplate + context prompt variables

Step 4 of the v2 launch-context composition plan.

- Extract renderPromptTemplate(template, Record<string, string>) as the
  generic primitive; existing renderTaskPromptTemplate is now a shim
  (same API, same behavior — callers unchanged).
- Add AGENT_CONTEXT_PROMPT_VARIABLES (userPrompt, tasks, issues, prs,
  attachments, agentInstructions) + getSupportedContextPromptVariables +
  validateContextPromptTemplate.
- Ship default context templates for markdown (codex/cursor/custom) and
  Claude (XML-wrapped user-request) — both for system + user scopes.
- Collapse runs of 3+ newlines to 2 so empty variables produce clean
  output. Empty-string values substitute in (not treated as missing).
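
A minimal sketch of the generic primitive, assuming `{{name}}` placeholder syntax as used elsewhere in this log. Leaving unknown placeholders untouched and trimming the result are assumptions; substituting empty strings and collapsing 3+ newlines to 2 come from the list above.

```typescript
// Replaces {{name}} placeholders with values from `variables`.
// Assumption: placeholders with no matching key are left as-is.
function renderPromptTemplate(template: string, variables: Record<string, string>): string {
  const rendered = template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(variables, name) ? variables[name] : match,
  );
  // Collapse runs of 3+ newlines to 2 so empty variables leave no large gaps.
  return rendered.replace(/\n{3,}/g, "\n\n").trim();
}
```

For example, rendering `"{{userPrompt}}\n\n{{issues}}\n\n{{prs}}"` with an empty `issues` value yields the prompt and the PR block separated by a single blank line.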

16 tests green; no consumer breakage.

* feat(agents): add contextPromptTemplate {system, user} to agent configs

Step 5 of the v2 launch-context composition plan.

Extends the agent config surface so V2 launches can render structured
context into per-agent system/user templates:

- packages/shared/agent-definition: required contextPromptTemplateSystem
  and contextPromptTemplateUser fields on BaseAgentDefinition;
  createTerminalAgentDefinition fills defaults with the markdown
  templates from step 4.
- packages/shared/builtin-terminal-agents: Claude terminal ships the
  Claude-XML defaults; other builtins inherit markdown defaults.
- packages/shared/agent-catalog: BUILTIN_CHAT_AGENT (Claude-based
  superset-chat) ships the Claude-XML defaults.
- packages/local-db/schema/zod: add both fields to AGENT_PRESET_FIELDS,
  agentPresetOverrideSchema, agentCustomDefinitionSchema (optional).
- apps/desktop/shared/utils/agent-settings: thread through
  TERMINAL_OVERRIDE_FIELDS, CHAT_OVERRIDE_FIELDS, AgentPresetPatch,
  CustomAgentDefinitionPatch, resolveAgentConfig (both branches),
  applyCustomAgentDefinitionPatch, createOverrideEnvelopeWithPatch.
- apps/desktop/test-setup: update the mocked @superset/local-db schema
  (the Bun test workaround for drizzle-orm/sqlite-core) so tests see
  the same shape as runtime.

New tests: contextPromptTemplate resolution for claude terminal,
codex markdown defaults, superset-chat claude defaults, terminal and
chat override replacement, custom terminal fallback to markdown. 113
tests green across context + agent-settings suites.

* chore: biome format pass

* feat(desktop/context): user-prompt source takes ContentPart[] (multimodal)

Future-proofs the user-prompt LaunchSource for an eventual rich-editor
input (interleaved text, inline images, inline files). The rest of the
pipeline was already ContentPart[]-native; this removes the last narrow
string-only call site.

- LaunchSource["user-prompt"]: { text: string } -> { content: ContentPart[] }
- userPromptContributor: normalizes (drops empty text parts, trims bookend
  whitespace), returns null when nothing remains, passes file/image parts
  through untouched.
- Adds userPromptFromText(text) helper for plain-string callers so
  modal/cli/task flows don't repeat the [{ type: "text", text }] boilerplate.
- Three new tests: multimodal text+image+text preservation, whitespace-only
  content returns null, empty text parts dropped between non-empty ones.
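
The normalization rules for the user-prompt contributor might be implemented along these lines. The `ContentPart` shape is simplified, `normalizeUserPrompt` is a hypothetical name, and treating whitespace-only text parts as empty is an assumption.

```typescript
// Simplified ContentPart; the real type likely carries more fields.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image" | "file"; mediaType: string; data: Uint8Array };

function normalizeUserPrompt(content: ContentPart[]): ContentPart[] | null {
  // Drop empty text parts (assumption: whitespace-only counts as empty).
  // File and image parts pass through untouched.
  const parts = content.filter(
    (part) => part.type !== "text" || part.text.trim().length > 0,
  );
  if (parts.length === 0) return null;

  // Trim bookend whitespace: leading on the first part, trailing on the last.
  const first = parts[0];
  if (first && first.type === "text") {
    parts[0] = { type: "text", text: first.text.replace(/^\s+/, "") };
  }
  const lastIndex = parts.length - 1;
  const last = parts[lastIndex];
  if (last && last.type === "text") {
    parts[lastIndex] = { type: "text", text: last.text.replace(/\s+$/, "") };
  }
  return parts;
}
```

Interior whitespace next to an image is preserved, so text + image + text input keeps its inline spacing.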

* refactor(desktop/context): drop agent-instructions source — harnesses handle it

Agent harnesses (Claude CLI, Codex, Cursor Agent) discover their own
conventions files natively from the worktree — no injection needed from
our side. V1 confirms: zero references to AGENTS.md/CLAUDE.md as
injected context. Only the agent itself reads them.

Removing this also gets us closer to the "no Electron IPC in V2" rule —
the composer no longer needs to read files from disk.

- Drop {kind: "agent-instructions"} from LaunchSource union.
- Delete agentInstructionsContributor + its test.
- Remove readAgentInstructions from ResolveCtx; update the three
  contributor test stubs that referenced it.
- Drop "agent-instructions" from the composer KIND_ORDER and
  sourceIdentity switch.
- Drop the AGENTS.md sample from the multi-source fixture + the
  composer integration test.
- Remove "agentInstructions" from AGENT_CONTEXT_PROMPT_VARIABLES.
- System default templates are now empty strings (chat agents get no
  system context yet; future host-service-backed path can fill later).

54 tests green across context + agent-settings; 16 green in shared.

* feat(desktop/context): add buildLaunchSpec (LaunchContext -> AgentLaunchSpec)

Step 6 of the v2 launch-context composition plan.

buildLaunchSpec(ctx, agentConfig):
- Returns null for "none" agent or missing config (V1-parity semantics).
- Pre-renders per-kind markdown sub-blocks ({{tasks}}/{{issues}}/{{prs}}/
  {{attachments}}) and a flattened {{userPrompt}} text variable from the
  LaunchContext sections.
- Fills the agent's contextPromptTemplate{System,User} (from step 5)
  into ContentPart[] arrays.
- Collects non-text parts (attachment-kind files/images + inline
  non-text parts from user-prompt) into the structured attachments[]
  field — chat agents see them as proper content parts, terminal
  adapters will flatten to disk refs in step 7.

Also fixes the multi-source fixture to match what contributors actually
emit (`# Title\n\nBody` markdown bodies) so the new snapshot tests
exercise a realistic LaunchContext shape.

15 tests green (2 inline snapshots for Claude-XML + codex-markdown
rendering of the canonical multi-source fixture). 69 tests green total
across context + agent-settings.

* Lint

* refactor(agents): drop Claude XML default template — markdown is enough

V1 never wrapped prompts in XML; V1 has shipped with bare markdown/text
forever. Shipping an XML-only Claude default was speculative and added a
per-agent divergence without evidence.

- Remove DEFAULT_CLAUDE_CONTEXT_PROMPT_TEMPLATE_SYSTEM / _USER.
- Claude terminal + superset-chat ship the default markdown templates
  (same as codex/cursor/custom).
- Users can still override per-agent in settings if XML wrapping helps
  their use case — defaults stay neutral.

Also ships scripts/demo-launch-spec.ts for local template iteration:
run `bun run scripts/demo-launch-spec.ts [agent...]` to eyeball what
buildLaunchSpec produces for canonical inputs.

66 tests green across context + agent-settings; 14 green in shared.

* feat(desktop/context): preserve inline order for rich-editor user prompts

buildLaunchSpec used to flatten the user-prompt section's ContentPart[]
to a single text blob. That lost inline ordering when a rich editor
produces text + image + text — the image landed in spec.attachments
disconnected from its position.

Now:
- Split the agent's user template on {{userPrompt}}; render each half's
  other placeholders raw (no trim / no newline collapse) so whitespace
  around the placeholder is preserved.
- Splice the user-prompt section's ContentPart[] in at the split
  position, keeping [text, image, text] ordering intact.
- Merge adjacent text parts, then collapse 3+ newlines to 2 and trim
  document boundaries in a final pass.
- spec.attachments now carries only *explicit* attachment-kind sections;
  inline non-text parts from user-prompt stay inline in spec.user.
- Inline parts still appear in the {{attachments}} list so CLI agents
  reading just the flattened text get a file-path reference.

Chat agents: hand spec.user straight to the Anthropic/AI SDK user
message content[] — model sees the image between the text chunks.
Terminal adapters (step 7) will flatten file/image parts to
`![filename](.superset/attachments/...)` markdown refs at their inline
position, then write files to disk.

Demo script gets two new scenarios exercising the rich-editor flow:
text+image+text alone, and the same with a linked issue. 67 tests
green across context + agent-settings.

* feat(desktop/context): add buildAgentLaunchRequest — V2 spec to V1 launch bridge

Step 7 (scoped): hand V2's AgentLaunchSpec off to V1's battle-tested
terminal-adapter / chat-adapter without building new execution
infrastructure. Bytes-over-IPC / SuperJSON transformer work is deferred
to a follow-up.

buildAgentLaunchRequest(spec, agentConfig, { workspaceId, source }):
- Returns null for agentId "none" or disabled agents (V1 parity).
- Assigns collision-safe filenames across all binary parts (inline in
  spec.user + explicit spec.attachments). Uses the same sanitize +
  dedup algorithm V1 already uses so nothing drifts.
- Flattens spec.user to markdown text with file/image parts rendered
  as `![filename](.superset/attachments/filename)` at their inline
  positions — rich-editor ordering survives into the CLI prompt.
- Converts Uint8Array binary data to base64 data URLs at this boundary
  (V1 wire format). Internal plumbing stays on Uint8Array.
- Chat: initialPrompt = flattened text, taskSlug/model flow through.
- Terminal: command = buildPromptCommandFromAgentConfig(flattened text),
  or the non-prompt command when the prompt is empty.
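
To make the filename and flattening steps concrete, here is a sketch under stated assumptions. The `-1`, `-2` suffix scheme in `uniqueName` is invented for illustration, since V1's actual sanitize + dedup algorithm is not shown in the commit; `flattenUserParts` and the `UserPart` shape are likewise hypothetical. The `![filename](.superset/attachments/filename)` reference format is taken from the list above, and the base64 data-URL conversion is left out.

```typescript
type UserPart =
  | { type: "text"; text: string }
  | { type: "file" | "image"; filename: string; data: Uint8Array };

// Hypothetical dedup: "a.png", "a-1.png", "a-2.png", ...
function uniqueName(name: string, used: Set<string>): string {
  const dot = name.lastIndexOf(".");
  const stem = dot > 0 ? name.slice(0, dot) : name;
  const ext = dot > 0 ? name.slice(dot) : "";
  let candidate = name;
  for (let i = 1; used.has(candidate); i++) candidate = `${stem}-${i}${ext}`;
  used.add(candidate);
  return candidate;
}

// Flattens parts to markdown text and collects the files to write to disk.
function flattenUserParts(parts: UserPart[]): {
  text: string;
  files: { filename: string; data: Uint8Array }[];
} {
  const used = new Set<string>();
  const files: { filename: string; data: Uint8Array }[] = [];
  const text = parts
    .map((part) => {
      if (part.type === "text") return part.text;
      const filename = uniqueName(part.filename, used);
      files.push({ filename, data: part.data });
      // Path reference rendered at the part's inline position.
      return `![${filename}](.superset/attachments/${filename})`;
    })
    .join("");
  return { text, files };
}
```

Assigning names in one pass over the parts keeps each markdown reference and the file written to disk in agreement, even when two uploads share a name.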

10 new tests cover: "none" short-circuit, terminal command rendering,
chat initialPrompt/taskSlug, inline image path-ref correctness, explicit
attachment filename preservation, filename dedup across user + attachments,
base64 data-URL format, workspaceId/source passthrough.

77 tests green across context + agent-settings.

* feat(desktop): add useEnqueueAgentLaunch hook

Step 8 of the v2 launch-context composition plan.

Thin wrapper around V1's useWorkspaceInitStore.addPendingTerminalSetup
for the V2 submit flow. Called after host-service.workspaceCreation
resolves the real workspaceId. V1's terminal-adapter / chat-adapter
pick up the pending setup when the workspace mounts and execute the
launch — no new adapter code needed.

- buildPendingSetup(args) — pure function, rewrites launchRequest
  workspaceId to the real id and assembles the PendingTerminalSetup.
- useEnqueueAgentLaunch() — React hook wrapper that calls into the
  zustand action.
- Null launchRequest is a no-op (nothing to enqueue, e.g. agent "none").

Tests cover: null short-circuit, workspaceId rewrite, projectId
passthrough, initialCommands shape, non-workspaceId field preservation.
5 tests green; typecheck clean.

* feat(desktop/v2): wire v2 workspace launch into pending page

Step 9 of the v2 launch-context composition plan. Closes Gaps 4, 5 in
V2_WORKSPACE_MODAL_GAPS.md for the "fork" intent (plain prompt + local
host). Gaps 3 (AI branch name) and 6 (create-from-PR) remain as
follow-ups.

What runs now:
- Modal submit → pending row → pending page fires createWorkspace.
- On success, buildForkAgentLaunch runs the V2 pipeline:
    sources <- pending row (user-prompt, linked issues/PRs/tasks, attachments)
    buildLaunchContext → buildLaunchSpec → buildAgentLaunchRequest
- useEnqueueAgentLaunch stashes the V1-shaped AgentLaunchRequest in
  useWorkspaceInitStore. V1's terminal-adapter / chat-adapter pick it
  up when the workspace mounts and execute the launch — no new adapter
  code needed.

New file buildForkAgentLaunch.ts is a pure helper: builds sources from
a PendingWorkspaceRow, stubs ResolveCtx from the same row's metadata,
runs the pipeline, returns an AgentLaunchRequest or null.

Phase 1 gap: issue / PR / task bodies are not fetched over HTTP yet —
host-service lacks a body endpoint. The resolver returns empty bodies,
so agents see title + URL + task-slug metadata only. Full-body
injection is a follow-up once host-service grows getIssueContent /
getPullRequestContent / getInternalTaskContent.

13 new tests cover: empty sources → null, no-agent → null, prompt-only
terminal launch via default agent, taskSlug derivation, attachment
passthrough, source-kind ordering. 88 tests green across pending +
context + agent-settings suites.

* Lint

* docs(desktop): add v2 launch context reference doc

Post-phase-1 reference: what shipped, manual + automated test plan,
known gaps, prioritized follow-ups, and a file-layout map. Lives in
apps/desktop/docs/ per AGENTS.md rule 7 (architecture docs). The
original plan stays in plans/ since phases 2-6 are still unshipped.

* chore(debug): add [v2-launch] console logs across the launch pipeline

Temporary logs for manual testing:
- pending page: what buildForkAgentLaunch returned + enqueue inputs.
- useEnqueueAgentLaunch: stash / null-short-circuit.
- WorkspaceInitEffects: every handleTerminalSetup + dispatch branch,
  launchAgentViaOrchestrator invocation.

Grep devtools console on "[v2-launch]" to trace a full submit.
Remove or soften once the dispatch path is dialed in.

* docs(desktop): document pending-row-as-bus launch dispatch

V2 must own its own launch dispatch. V1's WorkspaceInitEffects →
orchestrator → terminal-adapter path writes panes into V1's useTabsStore,
which V2 doesn't render from, so launches dispatched through V1 land
invisibly for V2 workspaces.

Documents the replacement: pending-row-as-bus. Pending page produces
terminalLaunch / chatLaunch fields on the collection-backed pending row;
V2 workspace page mount-effect consumes them, opens a pane in the
@superset/panes store, and wires PTY via workspaceTrpc.

This mirrors the pattern V2 preset execution already uses
(useV2PresetExecution): live-query a record, open a pane, call
workspaceTrpc.terminal.ensureSession. Zero V1 primitives, zero new
host-service work, and leaves a clean migration path to host-owned
terminal launch when phase 5 ships.

Adds a blocking follow-up (#0) for the dispatch rewrite; marks
useEnqueueAgentLaunch + buildAgentLaunchRequest for removal.

* feat(desktop/v2): rewrite launch dispatch as pending-row-as-bus

The original step-8/9 wire-up stashed an AgentLaunchRequest in V1's
useWorkspaceInitStore, expecting V1's WorkspaceInitEffects to dispatch.
V1's orchestrator writes panes into useTabsStore — which V2 never
renders from — so launches landed invisibly for V2 workspaces.

This rewrite keeps V2 self-contained. After host-service.create
resolves, the pending page runs the composer pipeline and stashes a
terminalLaunch or chatLaunch on the pending row. The V2 workspace
page's new useConsumePendingLaunch mount-effect live-queries that row,
opens a pane in @superset/panes, and drives PTY via workspaceTrpc.
Same pattern as useV2PresetExecution.

Changes:
- Schema: pendingWorkspaceSchema gains optional terminalLaunch and
  chatLaunch fields, cleared to null once consumed.
- buildForkAgentLaunch returns a PendingLaunchBuild union (terminal
  with attachmentsToWrite / chat with inline initialFiles) instead of
  the V1 AgentLaunchRequest shape.
- dispatchForkLaunch: new pending-page helper that runs the composer,
  writes attachments to .superset/attachments/ via workspaceTrpc
  .filesystem.writeFile for the terminal path, and applies the launch
  field to the pending row.
- useConsumePendingLaunch: new V2-workspace-page mount effect. Reads
  row by workspaceId, opens pane in V2 store, calls workspaceTrpc
  .terminal.ensureSession with initialCommand for terminal launches,
  clears the field.
- ChatPaneData gains a transient launchConfig slot. ChatPane and
  WorkspaceChatInterface thread initialLaunchConfig +
  onConsumeLaunchConfig through. After the V2 chat runtime auto-sends
  the initial message, it clears the pane's launchConfig.
- Rip out useEnqueueAgentLaunch hook, buildAgentLaunchRequest, and
  the debug logs in WorkspaceInitEffects.

23 tests green for buildForkAgentLaunch / buildLaunchSourcesFromPending;
type-check clean in the touched surface area.

See apps/desktop/docs/V2_LAUNCH_CONTEXT.md "Dispatch architecture".
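
The consume side of the pending-row-as-bus flow can be sketched roughly as follows. This is a minimal in-memory model, not the actual hook: `PendingRow`, `LaunchHandlers`, and `consumePendingLaunch` are hypothetical names, and the real useConsumePendingLaunch live-queries the collection and calls workspaceTrpc.terminal.ensureSession instead of plain callbacks.

```typescript
// Hypothetical shapes for illustration; the real fields are defined by
// pendingWorkspaceSchema and may carry more data.
interface TerminalLaunch { initialCommand: string }
interface ChatLaunch { initialMessage: string }

interface PendingRow {
  workspaceId: string;
  terminalLaunch: TerminalLaunch | null;
  chatLaunch: ChatLaunch | null;
}

interface LaunchHandlers {
  openTerminalPane(workspaceId: string, launch: TerminalLaunch): void;
  openChatPane(workspaceId: string, launch: ChatLaunch): void;
}

// Consume whichever launch fields are set, then return the row with those
// fields cleared to null. Running it again on the returned row is a no-op
// in this model, so a second live-query tick opens no extra pane.
function consumePendingLaunch(row: PendingRow, handlers: LaunchHandlers): PendingRow {
  let next = row;
  if (row.terminalLaunch !== null) {
    handlers.openTerminalPane(row.workspaceId, row.terminalLaunch);
    next = { ...next, terminalLaunch: null };
  }
  if (row.chatLaunch !== null) {
    handlers.openChatPane(row.workspaceId, row.chatLaunch);
    next = { ...next, chatLaunch: null };
  }
  return next;
}
```

The row acts as a one-shot message: the producer writes the field, the consumer reads it and clears it.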

* docs(desktop): add V2_LAUNCH_TEST_PLAN.md

Structured manual test checklist for the V2 launch dispatch pipeline:
terminal + chat happy paths, pending-row lifecycle, failure paths,
source-mapping edge cases, custom agents, cross-pane behavior, V1
regression.

Paired with copy-pasteable fixtures on ~/Desktop/v2-launch-test-artifacts/
(trace.log, notes.md, sample.png, prompts.txt, README) for drag-and-
drop testing.

* chore(debug): add url probe + submit logs for v2 attachment flow

Logs the blob/data URLs we get from the PromptInput provider at
submit time, then does a fetch() probe on each URL before
storeAttachments runs. Lets us see whether the URL is already dead
when useSubmitWorkspace fires — which would confirm a pre-submit
revocation (as opposed to a race inside storeAttachments itself).

Not a fix. Remove once the root cause is nailed down.

* fix(desktop/v2): pass converted files through PromptInput onSubmit

Root cause of the "Failed to fetch" attachment toast: the
PromptInput library calls clearComposer() before invoking onSubmit,
which revokes all blob: URLs stored in the provider. Our
useSubmitWorkspace was reading attachments back from the provider
via takeFiles() after that — so it got file entries whose URLs had
just been invalidated.

The library already does the blob→data-URL conversion itself and
passes the converted files into onSubmit's message arg. Use them
directly:

- useSubmitWorkspace now takes `files: SubmitAttachment[]` as an
  explicit argument. Drops the `useProviderAttachments()` dependency.
- handlePromptSubmit receives `{text, files}` from PromptInput and
  forwards the files.
- The existing Cmd+Enter keyboard fallback calls handleCreate()
  without files (unchanged behavior for the no-attachments path; the
  PromptInput's own Enter handler takes the file-carrying path).

* refactor(desktop): use dexie for the pending-attachment store

The prior hand-rolled IDB wrapper had two transaction-lifecycle bugs:

1. storeAttachments opened a readwrite transaction, then awaited
   fetch() on each file before calling store.put() — IDB auto-commits
   when the event loop yields with no pending requests, so the first
   put() fired against a finished transaction ("The transaction has
   finished.").
2. The same file (150+ lines of raw IDB callback plumbing) is exactly
   the shape of code where this class of bug keeps reappearing as
   the flow evolves.

Swap to Dexie 4 — the de-facto IndexedDB wrapper for apps (~11.9k⭐,
actively maintained, typed, handles transaction lifecycle correctly).

- storeAttachments: resolve blobs async outside any tx, then
  bulkPut() in one shot.
- loadAttachments / clearAttachments: where("key").startsWith(prefix).
- File collapses from ~150 to ~90 lines, no raw transactions, no
  cursor dance.

Behavior is identical from the caller's side. Schema version 1;
Dexie will open the existing database transparently (same DB name).
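
The ordering fix can be illustrated without Dexie itself. In this sketch, `AttachmentStore`, `AttachmentRow`, `PendingFile`, `resolveBlob`, and the key format are hypothetical stand-ins (the real code uses a Dexie table and fetch()); the only point is that every slow await finishes before the single batched write.

```typescript
interface AttachmentRow { key: string; bytes: Uint8Array }
interface PendingFile { name: string; url: string }

// Stand-in for a Dexie table; only the batched write is modelled here.
interface AttachmentStore {
  bulkPut(rows: AttachmentRow[]): Promise<void>;
}

async function storeAttachments(
  prefix: string,
  files: PendingFile[],
  resolveBlob: (url: string) => Promise<Uint8Array>,
  store: AttachmentStore,
): Promise<number> {
  // Phase 1: do all slow async work while no transaction is open.
  const rows: AttachmentRow[] = await Promise.all(
    files.map(async (file, index) => ({
      key: `${prefix}/${index}-${file.name}`, // assumed key layout
      bytes: await resolveBlob(file.url),
    })),
  );
  // Phase 2: one batched write, so no await sits between opening the
  // write and issuing its requests.
  await store.bulkPut(rows);
  return rows.length;
}
```

Keeping the fetches out of the write phase avoids the auto-commit problem described in item 1.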

* chore(debug): add verbose [v2-launch] logs to dispatch + consume paths

Traces:
- dispatchForkLaunch start / built / chatLaunch-applied /
  terminalLaunch-applied
- useConsumePendingLaunch tick (live-query fires) + whether
  terminalKey / chatKey are already consumed
- consumeTerminalLaunch ensureSession + addTab + clear
- consumeChatLaunch addTab + clear

Grep devtools on "[v2-launch]" through the full submit -> open-workspace
flow. Lets us pin where dispatch stalls when no pane appears.

Temporary — remove once the end-to-end flow is nailed down.

* fix(desktop/v2): replace Buffer with browser-native base64 in renderer

Electron renderer doesn't expose Node's `Buffer` global (nodeIntegration
off). The fork-launch dispatch path and buildForkAgentLaunch were both
using `Buffer.from(...).toString("base64")` / `Buffer.from(base64, "base64")`
for binary <-> base64 conversion, which ReferenceError'd at runtime.

Swap to standards-based `btoa` / `atob` + a small byte <-> binary-string
helper. Works in renderer and Bun alike.

Applies to:
- dataUrlAttachmentToBytes (buildForkAgentLaunch.ts) — decode
  attachment data URL into Uint8Array.
- toBase64DataUrl (buildForkAgentLaunch.ts) — encode chat-bound files
  for ChatLaunchConfig.initialFiles.
- writeAttachmentsToWorktree (dispatchForkLaunch.ts) — encode bytes
  for host-service filesystem.writeFile's base64 content variant.
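
A byte <-> base64 helper of the kind described above might look like the following. This is a sketch under assumptions: `bytesToBase64`, `base64ToBytes`, and `dataUrlToBytes` are illustrative names, not necessarily the helpers in buildForkAgentLaunch.ts. It relies only on the standard `btoa` / `atob` globals.

```typescript
// Encode raw bytes as base64 by first building a "binary string"
// (one char per byte), which is the input format btoa expects.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

// Decode base64 back into bytes via atob's binary-string output.
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Extract the payload of a `data:<mime>;base64,<payload>` URL.
function dataUrlToBytes(dataUrl: string): Uint8Array {
  const marker = ";base64,";
  const at = dataUrl.indexOf(marker);
  if (dataUrl.indexOf("data:") !== 0 || at === -1) {
    throw new Error("expected a base64 data: URL");
  }
  return base64ToBytes(dataUrl.slice(at + marker.length));
}
```

Appending one character per byte is simple but slow for very large attachments; a chunked variant would be a reasonable refinement.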

* docs(desktop): capture v2-launch footgun backlog

Seven items we caught during manual testing and intentionally deferred:

1. Deep solve for binary transport (blob URL / base64 fragility)
2. Reload-mid-launch spawns duplicate PTY (key terminalId off pending row)
3. Silent failure in consume hook — add toast
4. joinPath assumes POSIX — breaks for Windows hosts (phase 5)
5. Dexie schema coupling with pre-existing IDB store
6. PendingTerminalLaunch.attachmentNames unused by consumer
7. Remove [v2-launch] debug logs once flow is stable

Tracked in V2_LAUNCH_CONTEXT.md "Known footguns to revisit". None
are blocking phase-1 behavior; all have notes on the proper fix.

* feat(desktop/v2): toast on silent launch-dispatch failures

Eight silent swallow points across the launch path now surface a
toast so the user knows why the agent didn't auto-launch instead of
seeing "nothing happened":

- dispatchForkLaunch: buildForkAgentLaunch throw -> "Couldn't prepare
  agent launch" (description = error message).
- dispatchForkLaunch: buildForkAgentLaunch returned null AND user
  gave meaningful input -> warning "Workspace created but no agent
  launched" with hint to enable one in settings. Silent for the
  "fresh empty workspace, no agent configured" case (expected).
- dispatchForkLaunch: host-service URL not resolved -> "Couldn't
  reach host service".
- dispatchForkLaunch: writeAttachmentsToWorktree throw -> warning
  "Attachments didn't save to the workspace; agent will launch
  without files".
- writeAttachmentsToWorktree: missing worktreePath -> throw instead
  of silent return so the outer catch's toast fires.
- consumeTerminalLaunch: defensive bail -> "Couldn't open agent
  pane" (shouldn't happen, but defensive).
- consumeTerminalLaunch: ensureSession throw -> "Couldn't start
  agent terminal" with error message.
- pending page: loadAttachments throw in fork intent -> warning
  "Couldn't load saved attachments" (non-fatal, workspace still
  creates).

All keep their [v2-launch] console.warn/log so trace survives alongside
the toast.

* lint

* fix(desktop): address PR review — real issues only

Addresses the non-stale, non-debatable feedback from review bots:

- Prototype-chain substitution in prompt templates (agent-prompt-
  template.ts + buildLaunchSpec.ts): {{toString}} and similar now
  stay intact. Use Object.hasOwn() instead of `variables[key] ??`.
- renderTaskPromptTemplate no longer picks up generic 3+-newline
  collapsing — task-flow output matches V1 exactly: own-property
  substitution + trim only.
- buildLaunchSpec.renderUserTemplate tolerates whitespace in the
  placeholder: {{ userPrompt }} / {{userPrompt}} / {{  userPrompt  }}
  all match.
- Pending page's fork dispatch fetches agent configs imperatively
  via trpcUtils.settings.getAgentPresets.fetch() instead of reading
  from a useQuery hook — eliminates the race where a not-yet-
  resolved query silently skipped the dispatch and lost the launch
  for a successful workspace create.
- Drop ContextSection.scope field. It was never read (buildLaunchSpec
  ignored it); no contributor populated anything but "user" after we
  removed agent-instructions. The type is cleaner, and the field can
  be reintroduced when a real system-scope consumer lands (phase 6
  host-side instructions injection).

Tests: 54 context-suite passing, 14 shared-suite passing; desktop
typecheck clean in touched areas.
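
The two template fixes (own-property lookup and whitespace-tolerant placeholders) can be combined in one small sketch. `renderTemplate` and `PLACEHOLDER` are hypothetical names, not the functions in agent-prompt-template.ts or buildLaunchSpec.ts. The sketch uses `Object.prototype.hasOwnProperty.call`, which for this purpose behaves like the `Object.hasOwn()` check named above but also compiles on older targets.

```typescript
// Matches {{name}}, {{ name }}, {{  name  }} and captures the bare name.
const PLACEHOLDER = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

function renderTemplate(template: string, variables: Record<string, string>): string {
  return template.replace(PLACEHOLDER, (match: string, key: string): string => {
    // Only substitute keys the caller actually supplied. A plain
    // `variables[key] ?? match` would also find inherited members such as
    // `toString` on Object.prototype and splice them into the prompt.
    const isOwn = Object.prototype.hasOwnProperty.call(variables, key);
    return isOwn ? String(variables[key]) : match;
  });
}
```

Unknown placeholders, including ones that happen to share a name with a prototype member, are left in the output untouched.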

* docs(desktop): capture body-fetching gaps observed in manual test

Claude currently sees title-only for linked issues / PRs / tasks —
no bodies. Documents the gap, what V1 did (Electron IPC to
projects.getIssueContent), why we can't reuse it for V2 (no Electron
in V2 rule), and proposes the host-service procedures + stub swap.

Also covers:
- Empty `Branch:` in PR block — pending-row schema doesn't carry
  branch; fix via getPullRequestContent body fetch.
- Sanitization helpers to extract from V1 into a shared util.
- Attach-as-file vs inline-in-prompt decision (V1 attached,
  current V2 inlines — keeping inline for phase 1).

Ordered work plan at the bottom: getIssueContent first, then PR,
then internal-task (requires scoping). Acceptance criteria show
the expected prompt shape after the fixes land.

* Update PR notes

* lint

* feat(desktop/v2): attachment framing + PR checkout hint + gap doc rewrite

Prompt refinements from manual testing:

- buildLaunchSpec {{attachments}} block now includes a short framing
  header: "Attached files — read them to understand the request."
  Cues the agent to actually use the files rather than treating them
  as passive metadata. Only appears when there are files.

- githubPr contributor says "Branch `X` is checked out in this
  workspace — commits you make continue this PR." Confirms to the
  agent that the worktree is on the PR's branch, so it shouldn't
  create a new branch or open a new PR.

- V2_LAUNCH_CONTEXT_GAPS.md rewritten with locked design decisions:
  bodies inline in prompt (no file writes for linked context), no
  truncation, no sanitization, PR checkout is true. Work plan:
  host-service getIssueContent → getPullRequestContent → task body
  API → swap stubs. Target prompt shape included.

54 tests green; 2 snapshots updated for new PR format.

* feat(desktop/context): explicit kind labels in contributor headers

Agents shouldn't guess whether a section is a task, issue, or PR from
context clues. Each contributor now prefixes its heading with the kind:

- `# GitHub Issue #123 — Auth middleware stores tokens in plaintext`
- `# Task TASK-42 — Refactor auth middleware`
- `# PR #200 — Rewrite auth middleware`

PR phrasing also clarified: "This PR is checked out in this workspace
on branch `fix/auth-encryption`. Commits you make here will be added
to this PR."

54 tests green; 2 snapshots updated.
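
As an illustration of the heading format only, a formatter for the three kinds could look like this. `LinkedSource` and `contributorHeading` are hypothetical; the actual contributors are separate modules and may take different inputs.

```typescript
// Hypothetical union of the three linked-context kinds.
type LinkedSource =
  | { kind: "github-issue"; number: number; title: string }
  | { kind: "pr"; number: number; title: string }
  | { kind: "task"; slug: string; title: string };

// The separator used in the headings shown above (U+2014).
const DASH = "\u2014";

// Prefix every heading with its kind so the agent never has to infer it.
function contributorHeading(source: LinkedSource): string {
  switch (source.kind) {
    case "github-issue":
      return `# GitHub Issue #${source.number} ${DASH} ${source.title}`;
    case "pr":
      return `# PR #${source.number} ${DASH} ${source.title}`;
    case "task":
      return `# Task ${source.slug} ${DASH} ${source.title}`;
  }
}
```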

* feat(desktop/v2): fetch issue + PR bodies via host-service, task via cloud API

The launch prompt now includes full bodies for linked GitHub issues,
PRs, and internal tasks instead of title-only stubs.

Host-service (packages/host-service):
- getGitHubPullRequestContent: new procedure wrapping octokit.pulls.get.
  Returns body, branch, baseBranch, author, isDraft, timestamps.
  (getGitHubIssueContent already existed.)

Client (apps/desktop pending page):
- buildForkAgentLaunch accepts an optional hostServiceClient. When
  provided, the issue + PR resolvers call getGitHubIssueContent /
  getGitHubPullRequestContent for full bodies. Falls back to
  pending-row title-only if the call fails (non-fatal).
- Task resolver calls apiTrpcClient.task.byId (Superset cloud API,
  same source as the task view) for description. Falls back to
  title-only on failure.
- dispatchForkLaunch threads the host-service client through.

Contributors (already landed earlier this session):
- GitHub issue header: `# GitHub Issue #N — Title`
- PR header: `# PR #N — Title` + "This PR is checked out in this
  workspace on branch `X`. Commits you make here will be added to
  this PR."
- Task header: `# Task ID — Title`
- Attachments block: framing header cueing the agent to read the
  files.

77 tests green. Typecheck clean.

* chore(desktop): fix biome warning + stale doc comment in fork launch

- internalTask.test.ts: replace `TASK.description!` non-null
  assertion with `if (TASK.description)` guard (biome
  lint/style/noNonNullAssertion).
- buildForkAgentLaunch.ts: update stale docstring that claimed
  bodies aren't fetched yet — they are, via host-service and the
  cloud task API.

77 tests green, biome clean, typecheck clean.

* fix(host-service): shell out to gh CLI for issue/PR content (V1 parity)

host-service's octokit path needs a GitHub token from
providers.credentials.getToken("github.com") — which most users don't
have set up (requires GITHUB_TOKEN env or git credential helper config
for github.com). Result: getGitHubIssueContent / getGitHubPullRequestContent
silently 500'd, buildForkAgentLaunch fell back to title-only, and the
agent received empty bodies for linked issues/PRs.

V1's projects.getIssueContent shells out to `gh issue view` via the
user's `gh auth login` — that works out of the box.

Port the same approach:

- New packages/host-service/src/trpc/router/workspace-creation/utils/exec-gh.ts
  — promisified execFile("gh", ...) with user shell env so PATH
  resolves on macOS GUI contexts.
- getGitHubIssueContent now calls `gh issue view <n> --repo owner/name
  --json number,title,body,url,state,author,createdAt,updatedAt`.
- getGitHubPullRequestContent calls `gh pr view <n> --repo owner/name
  --json number,title,body,url,state,author,headRefName,baseRefName,isDraft,...`.
- Zod-validate the JSON output before returning.
- Normalize state to lowercase (gh returns "OPEN"/"CLOSED" uppercase).

Drops the Octokit dependency on these two procedures. Other host-service
paths that still use ctx.github() are unchanged.
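
The argument building and output handling for the issue path can be sketched as below. This is not the code in exec-gh.ts: `ghIssueViewArgs`, `parseIssueJson`, and `IssueContent` are hypothetical, the hand-written checks stand in for the Zod schema, and only a subset of the requested fields is validated. The resulting array is what would be passed to execFile("gh", args).

```typescript
interface IssueContent {
  number: number;
  title: string;
  body: string;
  url: string;
  state: string; // normalized to lowercase
}

const ISSUE_FIELDS = "number,title,body,url,state,author,createdAt,updatedAt";

// Arguments for `gh issue view <n> --repo owner/name --json <fields>`.
function ghIssueViewArgs(issueNumber: number, repo: string): string[] {
  return ["issue", "view", String(issueNumber), "--repo", repo, "--json", ISSUE_FIELDS];
}

// Validate gh's JSON output before trusting it, then normalize `state`
// ("OPEN" / "CLOSED" in gh's output) to lowercase.
function parseIssueJson(stdout: string): IssueContent {
  const raw: unknown = JSON.parse(stdout);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("gh output is not a JSON object");
  }
  const { number, title, body, url, state } = raw as Record<string, unknown>;
  if (
    typeof number !== "number" ||
    typeof title !== "string" ||
    typeof body !== "string" ||
    typeof url !== "string" ||
    typeof state !== "string"
  ) {
    throw new Error("gh output is missing expected fields");
  }
  return { number, title, body, url, state: state.toLowerCase() };
}
```

Passing arguments as an array to execFile, instead of building a shell string, keeps issue numbers and repo names from being interpreted by a shell.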

* fix typecheck

* clean up