feat(providers): add GitHub Copilot community provider (builtIn: false)#1351

Open
popemkt wants to merge 15 commits into coleam00:dev from popemkt:emdash/add-copilot-2er

@popemkt popemkt commented Apr 22, 2026

Context — this is the follow-up @Wirasm requested when closing #1111.
When closing #1111, @Wirasm wrote: "Please feel free to readd this behind the community provider registry."
This PR does exactly that: Copilot as a community provider (builtIn: false), registered through the same seam merged for Pi in #1270 and extended in #1297.

Closes #1115.

Summary

  • Tested against my own Copilot subscription.
  • Problem: GitHub Copilot shipped a first-party SDK (@github/copilot-sdk) in 2025 that drives the Copilot CLI through a supported JSON-RPC bridge. Archon had no way to use it — the only path to Copilot would have been screen-scraping the interactive TUI.
  • Why it matters: Copilot CLI gives users access to gpt-5, gpt-5-mini, and cross-family models (Claude via Copilot) under an existing GitHub billing relationship. This is the second community provider after Pi (#1270) and validates that the community-provider seam scales.
  • What changed: New packages/providers/src/community/copilot/ wrapping @github/copilot-sdk, registered via registerCopilotProvider() with builtIn: false. Seven phased commits plus one hermeticity fix on the resolver test: initial scaffold → tool restrictions → MCP → skills → structured output → hardening fixes from CodeRabbit/Devin → sub-agents → hermetic vendor-path test. Nine of the thirteen capability flags are wired end-to-end.
  • What did NOT change (scope boundary): No hooks, no fallback model, no sandbox, no promotion to builtIn: true. Those remain deliberate follow-ups (plan preserved at .claude/archon/plans/copilot-provider-parity.md). Claude and Codex code paths untouched. Pi provider's behavior is preserved byte-for-byte via re-exports.

UX Journey

Before

  User                            Archon                     AI Client
  ────                            ──────                     ─────────
  workflow.yaml:                  load workflow
    provider: copilot ─────────▶  resolve provider
                                  ❌ UnknownProviderError
  load failure ◀───────────────

Copilot was reachable only by screen-scraping the interactive Copilot TUI — no supported path from a workflow.

After

  User                            Archon                     Copilot CLI
  ────                            ──────                     ───────────
  $ copilot login ──────────────────────────────────────▶   GitHub OAuth
  workflow.yaml:                  load workflow
    provider: copilot ─────────▶  registry resolves
    [+] effort: high              translate node options
    [+] allowed_tools: [...]      build SessionConfig
    [+] mcp: ./mcp.json           [+] expand $VARS
    [+] skills: [...]             [+] resolve dirs
                                  spawn via SDK ─────────▶  CopilotClient
                                  stream chunks ◀────────   sendAndWait
  sees reply ◀──────────────────  forward to platform
  resume conv ─────────────────▶  reuse sessionId ───────▶  resumes session

[+] marks fields newly accepted on workflow nodes when provider: copilot.

Architecture Diagram

Before

  @archon/workflows
        │
        │ IAgentProvider
        ▼
  @archon/providers ──┬── claude/                  ─── @anthropic-ai/claude-agent-sdk
  (registry)          ├── codex/                   ─── @openai/codex-sdk
                      └── community/pi/            === @mariozechner/pi-coding-agent

  registerCommunityProviders() — pi (builtIn: false)

After

  @archon/workflows
        │
        │ IAgentProvider  (unchanged contract)
        ▼
  @archon/providers ──┬── claude/                  ─── @anthropic-ai/claude-agent-sdk
  (registry)          ├── codex/                   ─── @openai/codex-sdk
                      ├── community/pi/            === @mariozechner/pi-coding-agent
                      ├── [+] community/copilot/   === @github/copilot-sdk
                      └── [+] shared/              ─── skills.ts, structured-output.ts
                                                       (extracted from pi/, pi re-exports)

  registerCommunityProviders() — pi + [+] copilot (both builtIn: false)

Shared extractions (no behavior change to Pi):

| Extracted to | Source | Back-compat |
|---|---|---|
| shared/skills.ts::resolveSkillDirectories | pi/options-translator.ts::resolvePiSkills | Pi re-exports under the original name; all existing Pi skill tests pass unchanged |
| shared/structured-output.ts::augmentPromptForJsonSchema | pi/provider.ts | Pi re-exports; no Pi test changes |
| shared/structured-output.ts::tryParseStructuredOutput | pi/event-bridge.ts | Pi re-exports; existing Pi JSON-parse tests pass unchanged |

MCP reuse (no extraction): Copilot imports Claude's existing public loadMcpConfig re-export directly. Env-var expansion and missing-var detection behave identically across providers.
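
As a rough illustration of the behavior described above, here is a minimal sketch of `$VAR` expansion with missing-variable detection. It is not Claude's `loadMcpConfig`; the function names, the return shape, and the choice to expand every string value recursively are assumptions made for the example.

```typescript
// Hypothetical sketch: expand $NAME references in an MCP server map and
// collect the names that could not be resolved. Not the real loadMcpConfig.
type EnvMap = Record<string, string | undefined>;

function expandEnvRefs(value: string, env: EnvMap, missing: Set<string>): string {
  return value.replace(/\$([A-Za-z_][A-Za-z0-9_]*)/g, (match: string, varName: string) => {
    const resolved = env[varName];
    if (resolved === undefined) {
      missing.add(varName);
      return match; // leave the reference in place so the gap stays visible
    }
    return resolved;
  });
}

function expandDeep(value: unknown, env: EnvMap, missing: Set<string>): unknown {
  if (typeof value === "string") return expandEnvRefs(value, env, missing);
  if (Array.isArray(value)) return value.map((item) => expandDeep(item, env, missing));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = expandDeep(inner, env, missing);
    }
    return out;
  }
  return value; // numbers, booleans, null pass through untouched
}

function expandMcpServers(
  raw: Record<string, unknown>,
  env: EnvMap,
): { mcpServers: Record<string, unknown>; missingVars: string[] } {
  const missing = new Set<string>();
  const mcpServers = expandDeep(raw, env, missing) as Record<string, unknown>;
  return { mcpServers, missingVars: Array.from(missing).sort() };
}
```

A caller would assign `mcpServers` to the session config and turn a non-empty `missingVars` into a warning rather than an error.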

Capability matrix

| Field | Copilot | Notes |
|---|---|---|
| sessionResume | yes | Returns sessionId, reused on resume |
| envInjection | yes | Codebase env vars merged into spawned CLI environment |
| effortControl | yes | effort: low\|medium\|high\|max → reasoningEffort |
| thinkingControl | yes | String form of thinking: accepted; object form warns (Claude-specific) |
| toolRestrictions | yes | allowed_tools → availableTools, denied_tools → excludedTools |
| mcp | yes | nodeConfig.mcp JSON → SessionConfig.mcpServers, $VAR expanded |
| skills | yes | skills: [name] → absolute dirs containing SKILL.md, missing names warn |
| structuredOutput | yes | Best-effort prompt-engineering (same pattern as Pi #1297) |
| agents | yes | nodeConfig.agents → customAgents; Claude-specific fields warn per agent |
| hooks | no | Copilot's SessionHooks event vocabulary diverges from Archon's hook schema; deferred |
| fallbackModel | no | No native SDK field; one-shot retry would be adapter-level; deferred |
| costControl | no | Not wired (no maxBudgetUsd enforcement) |
| sandbox | no | Copilot's onPermissionRequest is policy control, not sandbox semantics |
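
The effort and tool-restriction rows amount to a field-by-field rename. A minimal sketch of that translation follows; the interface and function names are hypothetical, and passing the `effort` value through verbatim (including `max`) is an assumption, since the matrix only names the target field.

```typescript
// Hypothetical shapes; only the field names on both sides come from the matrix.
interface NodeOptionsSketch {
  effort?: "low" | "medium" | "high" | "max";
  allowed_tools?: string[];
  denied_tools?: string[];
}

interface SessionConfigSketch {
  reasoningEffort?: string;
  availableTools?: string[];
  excludedTools?: string[];
}

function translateNodeOptions(node: NodeOptionsSketch): SessionConfigSketch {
  const config: SessionConfigSketch = {};
  // Assumption: the effort value is forwarded unchanged.
  if (node.effort !== undefined) config.reasoningEffort = node.effort;
  // Copy the arrays so later mutation of the node config cannot leak through.
  if (node.allowed_tools !== undefined) config.availableTools = node.allowed_tools.slice();
  if (node.denied_tools !== undefined) config.excludedTools = node.denied_tools.slice();
  return config;
}
```

Fields that are absent on the node stay absent on the session config, so the SDK's own defaults apply.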

Label Snapshot

  • Risk: risk: low — purely additive new community provider behind builtIn: false; no existing code path changes behavior unless a workflow opts in via provider: copilot.
  • Size: size: L — ~1,128 lines added across 28 files (provider + 9 test files + docs + small shared extractions).
  • Scope: core (the new provider), dependencies (@github/copilot-sdk ^0.2.2), docs (getting-started AI-assistants page), tests (52 new tests).
  • Module: providers:community/copilot, providers:shared (new shared/skills + shared/structured-output), providers:registry (one new registration entry).

Change Metadata

  • Change type: feature
  • Primary scope: multi — primarily providers (new community provider package), with secondary touches to docs (getting-started page) and tests. No changes to core, workflows, isolation, git, adapters, server, web, cli, or paths aside from one config-loader line that adds a copilot key to the assistants config schema.

Linked Issue

Validation Evidence

bun run validate    # ← fully green end-to-end
  • Static analysis: bun run type-check — all 10 packages exit 0; bun run lint — 0 errors 0 warnings; bun run format:check — clean.
  • Tests: bun --filter @archon/providers test passes every per-file batch.
  • New Copilot tests (9 files, 52 tests): config parsing, binary resolver (with platform-correct .exe handling on win32), provider happy path + streaming + resume, tool restrictions, MCP translation, skills translation, structured output, agents translation, PR-review hardening. Plus 6 new tests in registry.test.ts for Copilot registration + capability flags + isModelCompatible.
  • Real end-to-end smoke against live Copilot CLI (Windows, v1.0.31) — three workflows passed:
    • e2e-copilot-smoke (basic prompt → COPILOT_OK) — 13.0s
    • effort: high reasoning (17×23 → 391) — 14.1s
    • denied_tools: [shell, write] passthrough — 13.7s

Security Impact

  • New permissions/capabilities? No — Copilot runs with the same process privileges as Claude / Codex / Pi.
  • New external network calls? Yes — the Copilot CLI calls GitHub's Copilot API. Equivalent surface to the other providers.
  • Secrets/tokens handling changed? Tokens read from the per-request env + process.env (COPILOT_GITHUB_TOKEN, GH_TOKEN, GITHUB_TOKEN) and passed to CopilotClient({ githubToken, ... }). Not persisted by Archon. useLoggedInUser: true is the default when no token env var is present.
  • File-system access scope changed? No — the Copilot CLI operates under cwd like other providers. enableConfigDiscovery: false is Archon's default, documented as a trust boundary in the docs.
  • Binary resolution hardening: isExecutableFile validates the resolved CLI path is a regular file with the exec bit set (posix), not a directory or non-executable.
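
For readers who want to see what the executable check might look like, here is a minimal sketch under the stated rule (regular file, plus exec bit on posix). It is written from the description in this PR, not copied from binary-resolver.ts; the `platform` parameter is an addition for testability.

```typescript
import { statSync } from "node:fs";

// Sketch only: true when `path` is a regular file and, on posix, has at least
// one execute bit set. Directories and missing paths return false.
function isExecutableFileSketch(path: string, platform: string = process.platform): boolean {
  try {
    const stats = statSync(path);
    if (!stats.isFile()) return false;
    // Windows has no exec bit, so being a regular file is the whole check.
    if (platform === "win32") return true;
    return (stats.mode & 0o111) !== 0;
  } catch {
    // statSync throws when the path is missing or unreadable.
    return false;
  }
}
```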

Compatibility / Migration

  • Backward compatible? Yes — purely additive. Existing workflows untouched.
  • Config/env changes? Optional — users may add assistants.copilot.model to .archon/config.yaml and supply auth via copilot login or one of three env vars.
  • Database migration needed? No.
  • Shared-utility extractions are back-compat via re-exports: pi/options-translator.ts, pi/provider.ts, and pi/event-bridge.ts re-export the extracted functions under their historical names.

Human Verification

What was personally validated beyond CI:

  • Verified scenarios (live Copilot CLI v1.0.31 on Windows, against my own Copilot subscription):
    • Basic prompt round-trip (e2e-copilot-smoke workflow) — full streaming response, sessionId returned.
    • Reasoning under effort: high — multi-digit arithmetic (17×23) returned 391; no truncation.
    • Tool restriction passthrough — denied_tools: [shell, write] honored, model declined to use blocked tools.
  • Edge cases checked:
    • Missing token env: useLoggedInUser: true fallback path engages cleanly.
    • Binary resolver precedence: env > config > vendor > PATH (with vendor winning over PATH in compiled-binary mode) — covered by hermetic test.
    • .exe suffix only applied on win32 — verified via os.platform() mock in binary-resolver.test.ts.
    • Structured-output fenced-code slip — fenced-block stripper recovers; total parse failure degrades to dag.structured_output_missing (not a hard error).
    • Pi back-compat — re-exports preserve every existing Pi test path; bun --filter @archon/providers test is green for the full Pi batch as well as the new Copilot batches.
  • What was not verified:
    • Live Copilot CLI on Linux or macOS — I only have a Windows Copilot subscription. The CLI is officially supported on all three; I'm relying on @github/copilot-sdk's cross-platform contract plus the os.platform()-aware tests.
    • Long-running session abandonment past the SDK's 24-hour ceiling — the dag-executor always supplies an AbortSignal, so this is the documented happy path; I did not artificially exhaust the ceiling.
    • Promotion to builtIn: true and the four deferred capabilities (hooks, fallback model, sandbox, costControl) — explicitly out of scope.
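
The resolver precedence listed under edge cases (env > config > vendor > PATH) can be sketched as below. This is an illustration, not the code in binary-resolver.ts: the input shape and function name are hypothetical, the executable check is injected so the example stays filesystem-free, and the compiled-binary-mode distinction is not modeled.

```typescript
type BinarySource = "env" | "config" | "vendor" | "path";

// Hypothetical input shape for the sketch.
interface ResolveInputs {
  envPath?: string;
  configPath?: string;
  vendorPath?: string;
  platform: string;
  isExecutable: (candidate: string) => boolean;
}

function resolveCopilotBinarySketch(inputs: ResolveInputs): { path: string; source: BinarySource } {
  const explicit: Array<["env" | "config", string | undefined]> = [
    ["env", inputs.envPath],
    ["config", inputs.configPath],
  ];
  for (const [source, candidate] of explicit) {
    if (candidate === undefined || candidate.trim() === "") continue;
    // Explicit overrides fail early instead of falling through silently.
    if (!inputs.isExecutable(candidate)) {
      throw new Error(`Copilot CLI path from ${source} is not an executable file: ${candidate}`);
    }
    return { path: candidate, source };
  }
  if (inputs.vendorPath !== undefined && inputs.isExecutable(inputs.vendorPath)) {
    return { path: inputs.vendorPath, source: "vendor" };
  }
  // Last resort: a bare command name, left for the OS to look up on PATH.
  return { path: inputs.platform === "win32" ? "copilot.exe" : "copilot", source: "path" };
}
```

Injecting `isExecutable` is also what makes a hermetic test straightforward: no real binary needs to exist on disk.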

Side Effects / Blast Radius

  • Affected subsystems:
    • @archon/providers — new community/copilot/ directory (provider, config, binary resolver, capabilities, registration, 9 test files); new shared/ directory (skills + structured-output) extracted from Pi; one new registry entry; one new line in index.ts for the registration export.
    • @archon/providers Pi — three files trimmed (event-bridge.ts, options-translator.ts, provider.ts) to delegate to shared/, with re-exports preserving every public name and signature.
    • @archon/core config-loader — one additional key (copilot) accepted in the assistants config schema.
    • packages/docs-web — getting-started AI-assistants page gains install/auth instructions, a config example, a feature compatibility table, and an enableConfigDiscovery trust-boundary note.
    • bun.lock — pulls in @github/copilot-sdk@^0.2.2 and its transitive deps.
  • Potential unintended effects:
    • Pi provider behavior — the most plausible regression vector, mitigated by exact-name re-exports and the existing Pi test suite continuing to pass unchanged.
    • Workflow loading — workflows that reference provider: copilot now load successfully where they previously errored. No existing workflow references it.
    • Config schema — adding assistants.copilot to the schema does not invalidate any existing config (Zod additions to an object are non-breaking when the new key is optional).
  • Guardrails / monitoring for early detection:
    • The 52 new tests (parsing, resolver precedence, options translation, MCP env expansion, skills resolution, structured-output round-trips, agents translation, hardening cases) plus the 6 new registry.test.ts cases gate the Copilot path.
    • The capability flags in capabilities.ts are the public truth for what is wired — anything still false (hooks, fallbackModel, sandbox, costControl) will surface as a clean "capability not supported" warning at workflow load, not a runtime crash.
    • bun run validate (which CI runs) covers check:bundled, type-check, lint, format check, and the per-package test suites.

Rollback Plan

  • Fast rollback: revert the PR merge commit on dev. Eight commits, purely additive, no schema or interface changes — revert is clean.
  • User-side rollback: remove provider: copilot from workflow YAML or assistants.copilot from .archon/config.yaml.

Risks and Mitigations

  • Risk: @github/copilot-sdk is pre-1.0 (v0.2.2) — breaking changes possible on minor bumps.
    • Mitigation: pinned ^0.2.2. The 52 new Copilot tests plus type-check gate any API-surface drift.
  • Risk: Generator abandonment without an AbortSignal can leave a Copilot session holding resources until sendAndWait resolves (up to 24 h).
    • Mitigation: the dag-executor (primary caller) always threads an AbortSignal.
  • Risk: Best-effort structured output depends on model instruction-following, not SDK enforcement.
  • Risk: enableConfigDiscovery: true would let the Copilot CLI load repo-level config outside Archon's workflow validation surface.
    • Mitigation: defaults to false. Docs carry an explicit trust-boundary note.
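
The AbortSignal risk above hinges on one detail: a listener added to an already-aborted signal never fires, so the signal has to be checked before the long-running await starts. A minimal sketch of that guard follows; the function name and the `cancel` callback are hypothetical stand-ins for whatever the provider does around sendAndWait.

```typescript
// Sketch: run `work` under an optional AbortSignal. `cancel` is called if the
// signal fires while `work` is in flight.
async function runWithAbortGuard<T>(
  work: () => Promise<T>,
  cancel: () => void,
  signal?: AbortSignal,
): Promise<T> {
  // Check first: the "abort" event has already been dispatched on an aborted
  // signal, so a listener registered now would never run.
  if (signal?.aborted) {
    throw new Error("Aborted before the request was sent");
  }
  const onAbort = (): void => cancel();
  signal?.addEventListener("abort", onAbort, { once: true });
  try {
    return await work();
  } finally {
    // Remove the listener so a later abort cannot touch a finished session.
    signal?.removeEventListener("abort", onAbort);
  }
}
```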

Credits

Original first-class-provider attempt by @mhingston in #1111 — this PR re-implements the same capability behind the community provider registry per @Wirasm's guidance on close. The community-provider seam itself is @Wirasm's (#1270 / #1297). @mhingston is not continuing the work per their comment on #1111.

🤖 Includes AI-assisted development and review.

Summary by CodeRabbit

  • New Features

    • Added GitHub Copilot as a community AI assistant provider with session resume, MCP support, skills, agents, structured output, tool restrictions, and reasoning control.
    • New Copilot setup options: model selection, CLI path override, discovery flags, and multiple auth methods (CLI login or token env vars).
    • Copilot provider export/registration for easy discovery.
  • Documentation

    • Getting Started guide updated with Copilot setup, examples, auth guidance, and a feature support table.
  • Tests & CI

    • End-to-end and unit tests plus smoke/e2e workflows for Copilot.

popemkt and others added 8 commits April 21, 2026 21:48
New community provider wired through @github/copilot-sdk. Registered as
builtIn: false alongside Pi. Covers session resume, effort/thinking
controls, envInjection, and a config-aware binary resolver (env >
config > vendor > PATH, with vendor winning over PATH in binary mode).

Advanced workflow features (MCP, skills, tool restrictions, structured
output, fallback model, sandbox) are intentionally flagged false —
they are not wired to Archon's workflow surface yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Translates nodeConfig.allowed_tools/denied_tools to Copilot SessionConfig
availableTools/excludedTools. SDK enforces availableTools precedence when
both are set. Also refactors sendQuery to route all SessionConfig
construction through buildSessionConfig() with a ProviderWarning collector,
so subsequent parity phases (MCP, skills, structured output, agents,
fallbackModel) can plug in as single applyX() calls.

Flips COPILOT_CAPABILITIES.toolRestrictions = true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reuses Claude's loadMcpConfig() to parse the MCP JSON file referenced by
nodeConfig.mcp, expand $ENV_VAR references, and assign the result to
Copilot's SessionConfig.mcpServers. Missing env vars surface as a
system warning chunk; IO/JSON errors propagate.

Flips COPILOT_CAPABILITIES.mcp = true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extracts Pi's resolvePiSkills into a provider-agnostic shared utility at
packages/providers/src/shared/skills.ts as resolveSkillDirectories. Pi
re-exports it under the historical name for back-compat.

Copilot's applySkills() maps nodeConfig.skills (names) → absolute skill
directory paths → SessionConfig.skillDirectories. Missing names surface
as a single system warning chunk.

Flips COPILOT_CAPABILITIES.skills = true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot has no native JSON-mode equivalent to Claude's outputFormat or
Codex's outputSchema, so this mirrors Pi's best-effort approach: augment
the user prompt with a "respond with JSON matching this schema"
instruction, accumulate the assistant transcript, and parse it on the
terminal result chunk. Parse failure leaves structuredOutput unset — the
dag-executor's existing dag.structured_output_missing warning handles
downstream degradation.

Also extracts augmentPromptForJsonSchema and tryParseStructuredOutput
into packages/providers/src/shared/structured-output.ts. Pi re-exports
them under the original paths for back-compat.

Flips COPILOT_CAPABILITIES.structuredOutput = true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Abort-guard: check abortSignal.aborted BEFORE awaiting sendAndWait,
  since addEventListener('abort', ...) is a no-op on already-aborted
  signals and would otherwise enter the 24h timeout path after cancel.
- Cleanup: wrap session.disconnect() and client.stop() in independent
  try/catch-log blocks in the finally so a cleanup throw can't replace
  a successful result or the friendly auth/model error.
- Fallback-content flag: flip sawAssistantContent = true when emitting
  the final-message fallback, so a session.error received on the same
  turn doesn't double-emit as a spurious system warning (Devin finding).
- Model trim: strip leading/trailing whitespace on requestOptions.model
  and copilotConfig.model before assigning to SessionConfig.model.
- Binary resolver: replace existsSync() with isExecutableFile() (isFile
  + exec bit on posix, isFile on win32) so env/config/vendor overrides
  fail early instead of passing a directory or non-executable to the
  SDK.
- Binary resolver test: per-test tmpdir + chmod 0o755, no more shared
  /tmp/.archon state.
- Docs: trust-boundary note that enableConfigDiscovery = true bypasses
  Archon's workflow validation surface for MCP/skills.
- Registry test: drop stale "Pi is currently the only community
  provider" comment now that Copilot is bundled too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Maps nodeConfig.agents (Record<name, AgentDef>) to Copilot
SessionConfig.customAgents. Direct pass-through for the fields Copilot's
CustomAgentConfig supports: name (from the map key), description, prompt,
and tools (allowlist — Copilot has no per-agent denylist).

Archon agent fields Copilot cannot represent (model, disallowedTools,
skills, maxTurns) surface as one consolidated system-warning chunk per
agent. We deliberately do NOT set SessionConfig.agent: Archon's
workflow model invokes sub-agents via the Task tool, not by switching
the active agent at session start.

Flips COPILOT_CAPABILITIES.agents = true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
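
The agent mapping in the commit above can be sketched as follows. The type names are hypothetical, and modeling customAgents as an array is an assumption; the supported and unsupported field lists are the ones the commit message names.

```typescript
// Hypothetical shapes for illustration.
interface AgentDefSketch {
  description?: string;
  prompt?: string;
  tools?: string[];
  model?: string;
  disallowedTools?: string[];
  skills?: string[];
  maxTurns?: number;
}

interface CustomAgentSketch {
  name: string;
  description?: string;
  prompt?: string;
  tools?: string[];
}

const UNSUPPORTED_AGENT_FIELDS = ["model", "disallowedTools", "skills", "maxTurns"] as const;

function translateAgentsSketch(
  agents: Record<string, AgentDefSketch> | undefined,
): { customAgents?: CustomAgentSketch[]; warnings: string[] } {
  const entries = Object.entries(agents ?? {});
  // Absent or empty input: leave customAgents off the session config entirely.
  if (entries.length === 0) return { warnings: [] };

  const customAgents: CustomAgentSketch[] = [];
  const warnings: string[] = [];
  for (const [agentName, def] of entries) {
    const agent: CustomAgentSketch = { name: agentName };
    if (def.description !== undefined) agent.description = def.description;
    if (def.prompt !== undefined) agent.prompt = def.prompt;
    if (def.tools !== undefined) agent.tools = def.tools.slice();
    customAgents.push(agent);

    // One consolidated warning per agent, listing every ignored field.
    const ignored = UNSUPPORTED_AGENT_FIELDS.filter((field) => def[field] !== undefined);
    if (ignored.length > 0) {
      warnings.push(`Agent "${agentName}": Copilot cannot represent ${ignored.join(", ")}`);
    }
  }
  return { customAgents, warnings };
}
```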
Replace the real-filesystem vendor binary with spyOn(isExecutableFile) +
path-fragment assertion, matching the sibling Codex resolver test. The
previous test hardcoded the posix binary name 'copilot' while the resolver
looks for 'copilot.exe' on win32, so the vendor branch never matched on
Windows and the test then leaked to the system-installed Copilot CLI via
PATH.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a new GitHub Copilot community provider (implementation, tests, registration, and exports) and shared utilities for structured output and skill resolution, updates Pi to use the shared utilities, expands the core safe-assistant allowlist to include copilot (retaining model), and adds documentation and E2E workflows for Copilot.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core config & docs: packages/core/src/config/config-loader.ts, packages/docs-web/src/content/docs/getting-started/ai-assistants.md | Added copilot to assistant safe-defaults allowlist (retain model); extended docs to cover Copilot setup, auth, configuration, and feature support. |
| Copilot provider core: packages/providers/src/community/copilot/provider.ts, packages/providers/src/community/copilot/registration.ts, packages/providers/src/community/copilot/capabilities.ts, packages/providers/src/community/copilot/config.ts, packages/providers/src/community/copilot/binary-resolver.ts, packages/providers/src/community/copilot/index.ts | New CopilotProvider class and supporting modules: session config mapping, event streaming → MessageChunks, structured-output integration, CLI binary resolution, config parsing, capabilities constant, registration and barrel exports. |
| Copilot tests: packages/providers/src/community/copilot/*.test.ts (provider.test, provider-hardening.test, config.test, binary-resolver.test, mcp-translation.test, skills-translation.test, agents-translation.test, tool-restrictions.test, structured-output.test) | Comprehensive Bun test suites covering binary resolution, config parsing, session translation (agents/skills/MCP/tool restrictions), streaming behavior, structured-output parsing, and hardening/cleanup cases. |
| Shared utilities: packages/providers/src/shared/structured-output.ts, packages/providers/src/shared/skills.ts | New shared helpers: prompt augmentation + best-effort JSON parse for structured output; skill-directory resolver returning resolved paths and missing names. |
| Pi provider adjustments: packages/providers/src/community/pi/event-bridge.ts, packages/providers/src/community/pi/options-translator.ts, packages/providers/src/community/pi/provider.ts | Replaced local structured-output and skill-resolution logic with imports/re-exports from shared modules (no behavioral change aside from sourcing). |
| Provider registry & exports: packages/providers/src/registry.ts, packages/providers/src/registry.test.ts, packages/providers/src/index.ts, packages/providers/package.json | Register Copilot in community providers, add re-exports and package export entry, add @github/copilot-sdk runtime dependency and test script updates. |
| Workflows: .archon/workflows/test-workflows/e2e-copilot-smoke.yaml, .archon/workflows/test-workflows/e2e-copilot-all-features.yaml | New GitHub workflow YAMLs: smoke and comprehensive E2E checks exercising connectivity, reasoning, tool restrictions, structured output, skills, MCP, and custom agent wiring. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Provider as CopilotProvider
    participant FS as File System (MCP/Skills)
    participant SDK as Copilot SDK (Client/Session)
    participant Queue as Async Chunk Queue

    User->>Provider: sendQuery(prompt, cwd, options)
    activate Provider

    Provider->>FS: load MCP config (if any)
    Provider->>FS: resolve skill directories (if any)
    Provider->>Provider: build SessionConfig (model, env, agents, tools, skills, mcp)
    Provider->>SDK: createSession(config) or resumeSession(id)
    activate SDK

    SDK-->>Provider: emit events (message_delta, reasoning, tool_start/complete, usage, error)
    Provider->>Queue: enqueue translated chunks (assistant, thinking, tool, tool_result, result)
    Provider->>Provider: tryParseStructuredOutput(transcript) (if requested)
    Provider->>Queue: emit final result (tokens/cost, structuredOutput if parsed)

    Provider->>SDK: disconnect() / stop()
    deactivate SDK

    Provider-->>User: async generator yielding MessageChunks
    deactivate Provider

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • PR #1195: Directly relates to core safe-assistant allowlist logic for toSafeAssistantDefaults() that this change extends to include copilot.
  • PR #1185: Introduced the provider-capabilities seam and static capability constants that Copilot integrates with (COPILOT_CAPABILITIES).
  • PR #1297: Refactored structured-output helpers into a shared module; this PR extracts and reuses those helpers across providers (Pi and Copilot).

Poem

🐇 I hopped through code to add a friend,

Copilot joins the provider blend.
Prompts and skills and MCP streams,
JSON dreams and CLI schemes.
A rabbit cheers—new features penned! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.71% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding GitHub Copilot as a community provider with builtIn: false, which is the primary feature of this PR. |
| Description check | ✅ Passed | The PR description is comprehensive and addresses the required template sections: Summary (5 bullets), UX Journey (Before/After), Architecture Diagram (Before/After with connection inventory), Label Snapshot, Change Metadata, Linked Issues, Validation Evidence (with specific test results), Security Impact (with token handling details), Compatibility/Migration (backward compatible with re-exports), Human Verification (live Windows CLI testing, edge cases), Side Effects/Blast Radius (subsystems and guardrails), Rollback Plan (clean revert), and Risks/Mitigations (pre-1.0 SDK, resource leaks, structured output). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (6)
packages/providers/src/community/copilot/config.ts (2)

30-65: Document the silent-drop fallback behavior.

parseCopilotConfig silently discards values with wrong types (e.g., model: 123, logLevel: 'verbose'). This is intentional — see the matching drops invalid values silently test — but without a comment at the function header a future reader will reasonably expect a thrown error on malformed YAML. Add a brief comment documenting the fallback, per the repo's "document fallback behavior with a comment when intentional and safe" rule.

✏️ Suggested doc comment
+/**
+ * Parse raw YAML-derived config into a typed `CopilotProviderDefaults`.
+ *
+ * Fallback behavior: fields with unexpected types (or `logLevel` outside the
+ * enumerated set) are silently omitted rather than throwing, matching the
+ * lenient parsing used by other provider config loaders. Callers see only
+ * well-typed fields in the result.
+ */
 export function parseCopilotConfig(raw: Record<string, unknown>): CopilotProviderDefaults {

As per coding guidelines: "Prefer throwing early with a clear error for unsupported or unsafe states — never silently swallow errors or broaden permissions; document fallback behavior with a comment when intentional and safe".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/src/community/copilot/config.ts` around lines 30 - 65, Add
a brief doc comment above the parseCopilotConfig function describing that it
intentionally drops/masks invalid or mistyped input values (e.g., non-string
model or unsupported logLevel) rather than throwing, and reference the
CopilotProviderDefaults fallback behavior and the existing test "drops invalid
values silently" so future readers know this is intentional and safe; keep the
comment concise and include guidance that callers should validate upstream if
they need strict errors.
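
To make the silent-drop behavior concrete, here is a minimal sketch of lenient parsing for the string and boolean fields named in this review. It is not the actual parseCopilotConfig: the names carry a Sketch suffix, and logLevel is left out because its allowed values are not listed in this thread.

```typescript
// Hypothetical result type covering only the string/boolean fields.
interface CopilotDefaultsSketch {
  model?: string;
  copilotCliPath?: string;
  configDir?: string;
  enableConfigDiscovery?: boolean;
  useLoggedInUser?: boolean;
}

const STRING_FIELDS = ["model", "copilotCliPath", "configDir"] as const;
const BOOLEAN_FIELDS = ["enableConfigDiscovery", "useLoggedInUser"] as const;

/**
 * Fallback behavior: values with an unexpected type are omitted rather than
 * rejected, so callers only ever see well-typed fields in the result.
 */
function parseCopilotConfigSketch(raw: Record<string, unknown>): CopilotDefaultsSketch {
  const parsed: CopilotDefaultsSketch = {};
  for (const key of STRING_FIELDS) {
    const value = raw[key];
    if (typeof value === "string") parsed[key] = value;
  }
  for (const key of BOOLEAN_FIELDS) {
    const value = raw[key];
    if (typeof value === "boolean") parsed[key] = value;
  }
  return parsed;
}
```

Keys outside the two allowlists never reach the result, which is also why an open index signature on the result type adds little.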

1-28: Consider narrowing the index signature.

The [key: string]: unknown on line 2 opens the interface to arbitrary extra keys, which weakens the type guarantee of parseCopilotConfig (callers can store anything on the result without TS complaining). If the intent is "forward-compat for unknown SDK fields", the parser itself doesn't propagate unknown keys — only the allowlisted ones — so the index signature may be over-permissive. Consider removing it unless there's a specific consumer that needs it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/src/community/copilot/config.ts` around lines 1 - 28,
Remove the overly-broad index signature from the CopilotProviderDefaults
interface (the "[key: string]: unknown" entry) so callers cannot attach
arbitrary properties; keep only the explicit optional fields (model,
copilotCliPath, configDir, enableConfigDiscovery, useLoggedInUser, logLevel). If
forward-compatibility for unknown SDK fields is required, instead introduce a
separate explicit type (e.g., CopilotProviderRaw or Record<string, unknown>)
used only where raw/unvalidated config is handled and ensure parseCopilotConfig
continues to return the narrowed CopilotProviderDefaults type.
packages/providers/src/shared/structured-output.ts (1)

46-60: Consider distinguishing valid null/primitive JSON from parse failure.

JSON.parse('null'), JSON.parse('42'), and JSON.parse('"str"') all succeed and return non-object values. Since the contract returns unknown and callers treat any non-undefined result as "structured output available", a model emitting a literal null would be attached as the structured output rather than triggering the dag.structured_output_missing warning path. If structured output is expected to always be an object/array, consider rejecting non-object results here.

♻️ Optional tightening
   try {
-    return JSON.parse(cleaned);
+    const parsed: unknown = JSON.parse(cleaned);
+    // Schema augmentation asks for a JSON object; reject bare primitives/null
+    // so callers degrade through the existing missing-structured-output path.
+    if (parsed === null || typeof parsed !== 'object') return undefined;
+    return parsed;
   } catch {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/src/shared/structured-output.ts` around lines 46 - 60, The
function tryParseStructuredOutput currently returns any JSON value (including
null/primitives); change it to only accept parsed values that are
objects/arrays: after parsing the cleaned input in tryParseStructuredOutput,
assign the result to a variable and return it only if the value is non-null and
typeof value === 'object' (arrays are fine since typeof is 'object'); otherwise
return undefined to signal "no structured output". Keep the existing trimming
and fence-stripping logic and the same error-catching behavior.
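A minimal sketch of the tightening described above. It assumes a hypothetical `stripCodeFence` helper in place of the real trimming and fence-stripping logic, which this review does not show; only the object/array guard mirrors the proposed diff.

```typescript
// Three backticks, built at runtime so this example can sit inside a fenced block.
const FENCE = '`'.repeat(3);

// Hypothetical stand-in for the existing cleaning step (an assumption, not the real helper).
function stripCodeFence(raw: string): string {
  const trimmed = raw.trim();
  const fenced =
    trimmed.length >= FENCE.length * 2 &&
    trimmed.startsWith(FENCE) &&
    trimmed.endsWith(FENCE);
  if (!fenced) return trimmed;
  const firstNewline = trimmed.indexOf('\n');
  if (firstNewline === -1) return trimmed;
  // Drop the opening fence line (with any language tag) and the closing fence.
  return trimmed.slice(firstNewline + 1, trimmed.length - FENCE.length).trim();
}

function tryParseStructuredOutput(raw: string): unknown {
  const cleaned = stripCodeFence(raw);
  try {
    const parsed: unknown = JSON.parse(cleaned);
    // Reject null and bare primitives; arrays pass because typeof [] is 'object'.
    if (parsed === null || typeof parsed !== 'object') return undefined;
    return parsed;
  } catch {
    // Invalid JSON is treated the same way as "no structured output".
    return undefined;
  }
}
```

With this guard, a model that emits a literal `null` or `42` falls through to the same path as unparseable text, so callers only need to check for `undefined`.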
packages/providers/src/community/copilot/agents-translation.test.ts (1)

71-88: Make the omission tests prove SessionConfig was built.

Both tests fall back to {}, so they can pass if the provider exits before creating a session.

🧪 Proposed assertion hardening
-    const cfg = capturedSessionConfigs[0] ?? {};
+    expect(capturedSessionConfigs).toHaveLength(1);
+    const cfg = capturedSessionConfigs[0]!;
     expect(cfg.customAgents).toBeUndefined();
-    const cfg = capturedSessionConfigs[0] ?? {};
+    expect(capturedSessionConfigs).toHaveLength(1);
+    const cfg = capturedSessionConfigs[0]!;
     expect(cfg.customAgents).toBeUndefined();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/src/community/copilot/agents-translation.test.ts` around
lines 71 - 88, The tests "'omits customAgents when nodeConfig.agents is absent'"
and "'omits customAgents when agents is an empty object'" currently allow a
false positive because they default to {} if no session was recorded; update
each test to assert that a SessionConfig was actually built by verifying
capturedSessionConfigs contains an entry (e.g., length > 0 or a concrete
SessionConfig field exists) before checking cfg.customAgents. Locate the
assertions around capturedSessionConfigs in the tests for
CopilotProvider.sendQuery and add a precondition that the provider created a
session (using capturedSessionConfigs or a known SessionConfig property) so the
subsequent expect(cfg.customAgents).toBeUndefined() proves SessionConfig
construction rather than an early exit.
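The false positive can be reproduced without any test framework. The sketch below is illustrative only: `SessionConfig` is reduced to one optional field, and both check functions are hypothetical names rather than code from this PR.

```typescript
// Reduced, hypothetical shape; the real SessionConfig has more fields.
interface SessionConfig {
  customAgents?: unknown[];
}

// Weak check: the {} fallback satisfies "field is undefined" by itself,
// so it passes even when no session config was ever captured.
function weakOmissionCheck(captured: SessionConfig[]): boolean {
  const cfg: SessionConfig = captured[0] ?? {};
  return cfg.customAgents === undefined;
}

// Hardened check: first require exactly one captured config,
// then inspect the field on that real object.
function hardenedOmissionCheck(captured: SessionConfig[]): boolean {
  if (captured.length !== 1) return false;
  const cfg = captured[0];
  return cfg !== undefined && cfg.customAgents === undefined;
}
```

Called with an empty array, which models a provider that exits before creating a session, the weak check still returns `true` while the hardened one returns `false`.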
packages/providers/src/community/copilot/tool-restrictions.test.ts (1)

76-82: Assert that a session config was captured before checking omitted fields.

The ?? {} fallback means this test passes even if sendQuery() never reaches createSession().

🧪 Proposed assertion hardening
     await drain(new CopilotProvider().sendQuery('hi', '/repo', undefined, { model: 'gpt-5' }));
 
-    const cfg = capturedSessionConfigs[0] ?? {};
+    expect(capturedSessionConfigs).toHaveLength(1);
+    const cfg = capturedSessionConfigs[0]!;
     expect(cfg.availableTools).toBeUndefined();
     expect(cfg.excludedTools).toBeUndefined();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/src/community/copilot/tool-restrictions.test.ts` around
lines 76 - 82, The test currently uses a fallback (capturedSessionConfigs[0] ??
{}) which masks the case where createSession() was never called; update the test
to first assert that a session config was actually captured by checking
capturedSessionConfigs.length (or that capturedSessionConfigs[0] is defined)
before inspecting fields. Specifically, in the test that calls new
CopilotProvider().sendQuery('hi', '/repo', ...), add an assertion like
expect(capturedSessionConfigs.length).toBeGreaterThan(0) or
expect(capturedSessionConfigs[0]).toBeDefined() prior to reading cfg and then
keep the existing checks for cfg.availableTools and cfg.excludedTools.
packages/providers/src/community/copilot/skills-translation.test.ts (1)

111-112: Don’t let omission checks pass without a captured session config.

Line 111 and Line 157 fall back to {}, so these tests would still pass if sendQuery() stopped before createSession() and never produced a session config.

🧪 Proposed assertion hardening
-    const cfg = capturedSessionConfigs[0] ?? {};
+    expect(capturedSessionConfigs).toHaveLength(1);
+    const cfg = capturedSessionConfigs[0]!;
     expect(cfg.skillDirectories).toBeUndefined();
-    const cfg = capturedSessionConfigs[0] ?? {};
+    expect(capturedSessionConfigs).toHaveLength(1);
+    const cfg = capturedSessionConfigs[0]!;
     expect(cfg.skillDirectories).toBeUndefined();

Also applies to: 157-158

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/src/community/copilot/skills-translation.test.ts` around
lines 111 - 112, The test is currently allowing a missing session config by
falling back to {} (const cfg = capturedSessionConfigs[0] ?? {}), so if
sendQuery() never reached createSession() the omission checks would still pass;
change the assertions to first verify a session config was captured (e.g.,
expect(capturedSessionConfigs.length).toBeGreaterThan(0) or
expect(capturedSessionConfigs).not.toHaveLength(0)), then assign without a
fallback (const cfg = capturedSessionConfigs[0]) and keep the existing
expect(cfg.skillDirectories).toBeUndefined(); apply the same change to the other
occurrence around the later assertion so both checks fail fast if no session was
created.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.archon/workflows/e2e-copilot-smoke.yaml:
- Around line 15-25: The test currently interpolates the AI-controlled value
"$simple.output" into output="..." which allows shell substitutions to execute;
change the assignment so the response is captured as raw data via a quoted
here-doc (use a single-quoted delimiter like <<'EOF' and read into the variable
with a raw read to avoid backslash or command expansion) instead of direct
interpolation, then run the existing rg check against that safe variable; update
the lines that reference output="$simple.output" and the subsequent checks to
use the here-doc-captured variable (and ensure you use a raw/read -r form so
backslashes and substitutions aren't processed).

In `@packages/providers/package.json`:
- Line 28: The package.json currently uses a caret range for the preview SDK
entry "@github/copilot-sdk": "^0.2.2", which will allow future 0.2.x changes
(some breaking); decide and update that dependency to the intended immutability
level — replace the caret with a tilde "~0.2.2" to allow only patch fixes within
0.2.x, or pin exactly to "0.2.2" for full stability, then commit with a note
stating the chosen policy for "@github/copilot-sdk".

In `@packages/providers/src/community/copilot/binary-resolver.test.ts`:
- Around line 38-50: The test deletes process.env.COPILOT_CLI_PATH in the
beforeEach and never restores it, risking cross-test pollution; capture the
original value into a variable (e.g., origCopilotCliPath) before tests run, then
in beforeEach clear or set process.env.COPILOT_CLI_PATH as the test needs, and
in afterEach restore it by assigning back origCopilotCliPath when defined or
deleting the env var if it was undefined; update the existing
beforeEach/afterEach around tmpRoot/archonHome to use origCopilotCliPath so the
environment is restored after each test.

In `@packages/providers/src/community/copilot/binary-resolver.ts`:
- Around line 89-123: The logs currently include full executable paths (e.g.,
envPath, configCopilotCliPath, vendorBinaryPath, fromPath) which may contain
PII; update the getLog().info calls inside binary-resolver (the branches using
envPath, configCopilotCliPath, vendorBinaryPath, fromPath and functions like
getVendorBinaryName(), resolveFromPath(), BUNDLED_IS_BINARY, isExecutableFile())
to avoid emitting full paths by default—log the resolution source and either the
basename/redacted path (use path.basename()) or omit the path entirely, and only
include the full binaryPath under a debug/verbose logging level so sensitive
home/username components are not logged at info level.
- Around line 16-21: The isExecutableFile function currently uses
_statSync(...).mode & 0o111 which only checks for any execute bit, not whether
the current process/user can execute the file; replace that check with
fs.accessSync(path, fs.constants.X_OK) to test executability for the current
user (keep the early return for !stat.isFile() and the Windows shortcut
returning true), and catch any thrown error from accessSync to return false;
reference isExecutableFile and _statSync to locate the change and use
fs.accessSync and fs.constants.X_OK for the permission check.

In `@packages/providers/src/community/copilot/mcp-translation.test.ts`:
- Around line 69-82: The afterEach block currently deletes
COPILOT_MCP_TEST_TOKEN unconditionally which can clobber a pre-existing env var;
update the test setup to save the original value (e.g., capture
process.env.COPILOT_MCP_TEST_TOKEN in beforeEach or at top-level) and then in
afterEach restore it (set process.env.COPILOT_MCP_TEST_TOKEN back to the saved
value or delete only if it was originally undefined). Modify the existing
beforeEach/afterEach helpers around applyMcpServers / workDir to use the saved
token variable so tests restore the original environment state.

In `@packages/providers/src/community/copilot/provider.ts`:
- Around line 281-308: buildFriendlyCopilotError currently prefers the thrown
error message and can miss actionable details in lastSessionError (e.g.,
session.error from the SDK); modify the function to check both rawMessage and
lastSessionError when classifying with isModelAccessError and the
authentication/login checks, and when constructing the returned Error include
the session detail (lastSessionError) alongside the primary message so users see
the SDK-provided actionable text; reference the rawMessage, lastSessionError,
isModelAccessError, and the authentication string checks in your changes.
- Around line 377-580: The generator doesn't cancel the Copilot SDK run if the
caller stops iterating early; hoist the session variable out of the background
IIFE (currently declared inside sendQuery's async block as `let session`) so the
outer generator can access it, and after the for-await loop (or in a finally
surrounding it) call `session?.abort()` (catch/log errors) to terminate
`session.sendAndWait()` when the generator is closed early; keep the existing
`onAbort`/`abortSignal` logic intact and ensure you don't replace the primary
error by swallowing abort errors (use getLog().warn).

---

Nitpick comments:
In `@packages/providers/src/community/copilot/agents-translation.test.ts`:
- Around line 71-88: The tests "'omits customAgents when nodeConfig.agents is
absent'" and "'omits customAgents when agents is an empty object'" currently
allow a false positive because they default to {} if no session was recorded;
update each test to assert that a SessionConfig was actually built by verifying
capturedSessionConfigs contains an entry (e.g., length > 0 or a concrete
SessionConfig field exists) before checking cfg.customAgents. Locate the
assertions around capturedSessionConfigs in the tests for
CopilotProvider.sendQuery and add a precondition that the provider created a
session (using capturedSessionConfigs or a known SessionConfig property) so the
subsequent expect(cfg.customAgents).toBeUndefined() proves SessionConfig
construction rather than an early exit.

In `@packages/providers/src/community/copilot/config.ts`:
- Around line 30-65: Add a brief doc comment above the parseCopilotConfig
function describing that it intentionally drops/masks invalid or mistyped input
values (e.g., non-string model or unsupported logLevel) rather than throwing,
and reference the CopilotProviderDefaults fallback behavior and the existing
test "drops invalid values silently" so future readers know this is intentional
and safe; keep the comment concise and include guidance that callers should
validate upstream if they need strict errors.
- Around line 1-28: Remove the overly-broad index signature from the
CopilotProviderDefaults interface (the "[key: string]: unknown" entry) so
callers cannot attach arbitrary properties; keep only the explicit optional
fields (model, copilotCliPath, configDir, enableConfigDiscovery,
useLoggedInUser, logLevel). If forward-compatibility for unknown SDK fields is
required, instead introduce a separate explicit type (e.g., CopilotProviderRaw
or Record<string, unknown>) used only where raw/unvalidated config is handled
and ensure parseCopilotConfig continues to return the narrowed
CopilotProviderDefaults type.

In `@packages/providers/src/community/copilot/skills-translation.test.ts`:
- Around line 111-112: The test is currently allowing a missing session config
by falling back to {} (const cfg = capturedSessionConfigs[0] ?? {}), so if
sendQuery() never reached createSession() the omission checks would still pass;
change the assertions to first verify a session config was captured (e.g.,
expect(capturedSessionConfigs.length).toBeGreaterThan(0) or
expect(capturedSessionConfigs).not.toHaveLength(0)), then assign without a
fallback (const cfg = capturedSessionConfigs[0]) and keep the existing
expect(cfg.skillDirectories).toBeUndefined(); apply the same change to the other
occurrence around the later assertion so both checks fail fast if no session was
created.

In `@packages/providers/src/community/copilot/tool-restrictions.test.ts`:
- Around line 76-82: The test currently uses a fallback
(capturedSessionConfigs[0] ?? {}) which masks the case where createSession() was
never called; update the test to first assert that a session config was actually
captured by checking capturedSessionConfigs.length (or that
capturedSessionConfigs[0] is defined) before inspecting fields. Specifically, in
the test that calls new CopilotProvider().sendQuery('hi', '/repo', ...), add an
assertion like expect(capturedSessionConfigs.length).toBeGreaterThan(0) or
expect(capturedSessionConfigs[0]).toBeDefined() prior to reading cfg and then
keep the existing checks for cfg.availableTools and cfg.excludedTools.

In `@packages/providers/src/shared/structured-output.ts`:
- Around line 46-60: The function tryParseStructuredOutput currently returns any
JSON value (including null/primitives); change it to only accept parsed values
that are objects/arrays: after parsing the cleaned input in
tryParseStructuredOutput, assign the result to a variable and return it only if
the value is non-null and typeof value === 'object' (arrays are fine since
typeof is 'object'); otherwise return undefined to signal "no structured
output". Keep the existing trimming and fence-stripping logic and the same
error-catching behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 726053e6-274e-4f7e-b850-15e38885d083

📥 Commits

Reviewing files that changed from the base of the PR and between ae2d936 and e0b57a8.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (27)
  • .archon/workflows/e2e-copilot-smoke.yaml
  • packages/core/src/config/config-loader.ts
  • packages/docs-web/src/content/docs/getting-started/ai-assistants.md
  • packages/providers/package.json
  • packages/providers/src/community/copilot/agents-translation.test.ts
  • packages/providers/src/community/copilot/binary-resolver.test.ts
  • packages/providers/src/community/copilot/binary-resolver.ts
  • packages/providers/src/community/copilot/capabilities.ts
  • packages/providers/src/community/copilot/config.test.ts
  • packages/providers/src/community/copilot/config.ts
  • packages/providers/src/community/copilot/index.ts
  • packages/providers/src/community/copilot/mcp-translation.test.ts
  • packages/providers/src/community/copilot/provider-hardening.test.ts
  • packages/providers/src/community/copilot/provider.test.ts
  • packages/providers/src/community/copilot/provider.ts
  • packages/providers/src/community/copilot/registration.ts
  • packages/providers/src/community/copilot/skills-translation.test.ts
  • packages/providers/src/community/copilot/structured-output.test.ts
  • packages/providers/src/community/copilot/tool-restrictions.test.ts
  • packages/providers/src/community/pi/event-bridge.ts
  • packages/providers/src/community/pi/options-translator.ts
  • packages/providers/src/community/pi/provider.ts
  • packages/providers/src/index.ts
  • packages/providers/src/registry.test.ts
  • packages/providers/src/registry.ts
  • packages/providers/src/shared/skills.ts
  • packages/providers/src/shared/structured-output.ts

Comment thread .archon/workflows/e2e-copilot-smoke.yaml Outdated
"dependencies": {
"@anthropic-ai/claude-agent-sdk": "^0.2.89",
"@archon/paths": "workspace:*",
"@github/copilot-sdk": "^0.2.2",

@coderabbitai coderabbitai Bot Apr 22, 2026


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

@github/copilot-sdk changelog breaking changes 0.2.x

💡 Result:

The GitHub Copilot SDK (github/copilot-sdk) 0.2.x series, starting with v0.2.0 (2026-03-20), v0.2.1 (2026-04-03), and v0.2.2 (2026-04-10), includes the following breaking changes documented in the official CHANGELOG.md and release notes:

All SDKs:
- autoRestart option deprecated and removed from client options (no effect now, remove references) [1,2,3]

Node.js-specific (v0.2.1):
- onElicitationRequest handler signature changed from (request, invocation) to single ElicitationContext argument. Access via context.sessionId and context.message [1,15]

Python-specific (major API overhaul in v0.2.0):
- CopilotClient constructor: CopilotClientOptions TypedDict replaced by dataclasses like ExternalServerConfig(url=...) or SubprocessConfig(cli_path=..., log_level=...) [1,2,3,16]
- create_session and resume_session: SessionConfig/ResumeSessionConfig TypedDicts removed; use keyword arguments directly (e.g., on_permission_request=..., model=...). on_permission_request now required [1,2,3,16]
- send and send_and_wait: MessageOptions TypedDict removed; first positional arg is prompt str, attachments/mode as kwargs [1,2,3,16]
- MessageOptions, SessionConfig, ResumeSessionConfig no longer exported publicly [1,2,3]
- Internal modules renamed: copilot.jsonrpc -> copilot._jsonrpc, etc. [1,2,3]
- copilot.types module removed; import types from copilot, copilot.generated.session_events, copilot.session, copilot.tools, etc. [16,18]
- PermissionRequest is now a dataclass (.kind attribute, not dict access) [16]

Go-specific (v0.2.0):
- Client.Start context cancel no longer kills CLI process; use client.Stop explicitly [1,2,3]
- LogOptions.Ephemeral changed from bool to *bool; use copilot.Bool(true) [1,2,3]

Other notes:
- v0.2.0 described as "big update with broad API refinements" [1,2]
- No comprehensive migration guide found; changes detailed in CHANGELOG with before/after code examples for Python [1,13,16].
- Issues mention additional compatibility notes, e.g., ModelCapabilities for BYOK in later previews [9].

Citations:


🏁 Script executed:

# Verify PR context to understand what the "pinned" claim refers to
git log --oneline --all | head -20

Repository: coleam00/Archon

Length of output: 1792


🏁 Script executed:

# Find recent changes to packages/providers/package.json related to copilot-sdk
git log --oneline -n 50 -- packages/providers/package.json

Repository: coleam00/Archon

Length of output: 138


🏁 Script executed:

# Get the PR description and context for the most recent change
git show e0b57a8f --stat

Repository: coleam00/Archon

Length of output: 50371


🏁 Script executed:

# Get the current state of the package.json file
cat -n packages/providers/package.json | grep -A 5 -B 5 "copilot-sdk"

Repository: coleam00/Archon

Length of output: 485


Confirm intent for ^0.2.2 version range on a public-preview SDK.

@github/copilot-sdk is in public preview with documented breaking changes across the 0.2.x series (v0.2.0, v0.2.1, v0.2.2): autoRestart removal, Node.js onElicitationRequest handler signature change, major Python API overhaul, and Go Client.Start behavior change. The caret range allows upgrades within >=0.2.2 <0.3.0, so future 0.2.x patches and their breaking changes will be auto-pulled, while 0.3.0 will remain excluded. If strict pinning was intended, consider ~0.2.2 or exact 0.2.2. Otherwise, this is fine—just confirm the range matches your intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/package.json` at line 28, The package.json currently uses
a caret range for the preview SDK entry "@github/copilot-sdk": "^0.2.2", which
will allow future 0.2.x changes (some breaking); decide and update that
dependency to the intended immutability level — replace the caret with a tilde
"~0.2.2" to allow only patch fixes within 0.2.x, or pin exactly to "0.2.2" for
full stability, then commit with a note stating the chosen policy for
"@github/copilot-sdk".

Author


Acknowledged but skipping — for 0.x.y, npm semver makes ^0.2.2 and ~0.2.2 resolve to the same range (>=0.2.2 <0.3.0). And since bun.lock is committed, the installed version is exact regardless of the spec. The only meaningful tightening would be exact pin "0.2.2", which I don't think is worth losing patch fixes for. Will pin if 0.3.x lands with a known regression.
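The range equivalence claimed in this reply can be checked with a small calculation. This is a simplified sketch of the npm caret and tilde rules for plain `major.minor.patch` versions only (no prerelease tags, no partial versions); the function names are hypothetical and it does not use the `semver` package.

```typescript
type Version = [major: number, minor: number, patch: number];

// Parse a plain "x.y.z" version; anything else is rejected in this sketch.
function parseVersion(v: string): Version {
  const parts = v.split('.').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${v}`);
  }
  return parts as Version;
}

// Caret: allow changes that do not modify the left-most non-zero component.
function caretUpperBound(v: string): string {
  const [major, minor, patch] = parseVersion(v);
  if (major > 0) return `${major + 1}.0.0`;
  if (minor > 0) return `0.${minor + 1}.0`;
  return `0.0.${patch + 1}`;
}

// Tilde (with all three components given): allow patch-level changes only.
function tildeUpperBound(v: string): string {
  const [major, minor] = parseVersion(v);
  return `${major}.${minor + 1}.0`;
}
```

For `0.2.2` both functions give an exclusive upper bound of `0.3.0`, whereas for `1.2.3` the caret bound is `2.0.0` and the tilde bound is `1.3.0`, which is why the two operators only diverge once the major version is non-zero.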


Comment thread packages/providers/src/community/copilot/binary-resolver.test.ts
Comment thread packages/providers/src/community/copilot/binary-resolver.ts Outdated
Comment on lines +89 to +123
getLog().info({ binaryPath: envPath, source: 'env' }, 'copilot.binary_resolved');
return envPath;
}

if (configCopilotCliPath) {
if (!isExecutableFile(configCopilotCliPath)) {
throw new Error(
`assistants.copilot.copilotCliPath is set to "${configCopilotCliPath}" but it is not an executable file.\n` +
'Please verify the path in .archon/config.yaml points to the Copilot CLI executable (chmod +x if needed).'
);
}
getLog().info(
{ binaryPath: configCopilotCliPath, source: 'config' },
'copilot.binary_resolved'
);
return configCopilotCliPath;
}

if (BUNDLED_IS_BINARY) {
const vendorBinaryName = getVendorBinaryName();
if (vendorBinaryName) {
const vendorBinaryPath = join(getArchonHome(), COPILOT_VENDOR_DIR, vendorBinaryName);
if (isExecutableFile(vendorBinaryPath)) {
getLog().info(
{ binaryPath: vendorBinaryPath, source: 'vendor' },
'copilot.binary_resolved'
);
return vendorBinaryPath;
}
}
}

const fromPath = resolveFromPath();
if (fromPath && isExecutableFile(fromPath)) {
getLog().info({ binaryPath: fromPath, source: 'path' }, 'copilot.binary_resolved');

@coderabbitai coderabbitai Bot Apr 22, 2026


⚠️ Potential issue | 🟡 Minor

Avoid logging full local executable paths.

binaryPath can include usernames or home-directory details. Log the resolution source and a redacted/basename path instead, or omit the path unless debug logging explicitly needs it.

🔒 Proposed logging adjustment
-    getLog().info({ binaryPath: envPath, source: 'env' }, 'copilot.binary_resolved');
+    getLog().info({ source: 'env' }, 'copilot.binary_resolved');
@@
-      { binaryPath: configCopilotCliPath, source: 'config' },
+      { source: 'config' },
@@
-          { binaryPath: vendorBinaryPath, source: 'vendor' },
+          { source: 'vendor' },
@@
-    getLog().info({ binaryPath: fromPath, source: 'path' }, 'copilot.binary_resolved');
+    getLog().info({ source: 'path' }, 'copilot.binary_resolved');

As per coding guidelines, “Never log API keys, tokens … user message content, or PII.”

📝 Committable suggestion


Suggested change
getLog().info({ source: 'env' }, 'copilot.binary_resolved');
return envPath;
}
if (configCopilotCliPath) {
if (!isExecutableFile(configCopilotCliPath)) {
throw new Error(
`assistants.copilot.copilotCliPath is set to "${configCopilotCliPath}" but it is not an executable file.\n` +
'Please verify the path in .archon/config.yaml points to the Copilot CLI executable (chmod +x if needed).'
);
}
getLog().info(
{ source: 'config' },
'copilot.binary_resolved'
);
return configCopilotCliPath;
}
if (BUNDLED_IS_BINARY) {
const vendorBinaryName = getVendorBinaryName();
if (vendorBinaryName) {
const vendorBinaryPath = join(getArchonHome(), COPILOT_VENDOR_DIR, vendorBinaryName);
if (isExecutableFile(vendorBinaryPath)) {
getLog().info(
{ source: 'vendor' },
'copilot.binary_resolved'
);
return vendorBinaryPath;
}
}
}
const fromPath = resolveFromPath();
if (fromPath && isExecutableFile(fromPath)) {
getLog().info({ source: 'path' }, 'copilot.binary_resolved');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/src/community/copilot/binary-resolver.ts` around lines 89
- 123, The logs currently include full executable paths (e.g., envPath,
configCopilotCliPath, vendorBinaryPath, fromPath) which may contain PII; update
the getLog().info calls inside binary-resolver (the branches using envPath,
configCopilotCliPath, vendorBinaryPath, fromPath and functions like
getVendorBinaryName(), resolveFromPath(), BUNDLED_IS_BINARY, isExecutableFile())
to avoid emitting full paths by default—log the resolution source and either the
basename/redacted path (use path.basename()) or omit the path entirely, and only
include the full binaryPath under a debug/verbose logging level so sensitive
home/username components are not logged at info level.

Author


Skipping. The resolved path is the load-bearing piece of debug info for "which Copilot binary did Archon pick up?" — without it, the source: vendor|env|config|path label tells you the tier but not which CLI actually ran (someone could have multiple installs on PATH). Username-in-path is the weakest tier of PII, and the rest of the codebase already logs paths (worktree paths, repo paths, cwd, etc.) at info level. The CLAUDE.md rule is aimed at tokens / secrets / user message content, not ~/Users/foo/.archon/.... Happy to revisit if Pino structured logs ever become a shared / shipped surface.


Comment thread packages/providers/src/community/copilot/mcp-translation.test.ts
Comment thread packages/providers/src/community/copilot/provider.ts Outdated
Comment thread packages/providers/src/community/copilot/provider.ts
Hoang Nguyen Gia added 2 commits April 22, 2026 18:45
Single conflict in packages/providers/package.json test script — resolved
as union (all pi/ tests from upstream including new provider-lazy-load.test.ts
plus the copilot/ test files from this branch).

Upstream's pi-provider lazy-load refactor (coleam00#1355) auto-merged cleanly with
this branch's pi → shared/ extraction because the extracted shared modules
have no Pi SDK deps, so the compiled-binary contract is preserved.

Follow-up to the upstream/dev merge — the Edit that resolved the test-script
conflict only rewrote the body, leaving the opening marker. JSON now parses,
prettier clean.
@Wirasm
Collaborator

Wirasm commented Apr 27, 2026

Hi @popemkt — thanks for opening this PR.

This repository uses a PR template at .github/pull_request_template.md with several required sections. A few of them appear to be empty or placeholder here:

  • UX Journey
  • Label Snapshot
  • Change Metadata
  • Human Verification
  • Side Effects / Blast Radius

Could you fill those out (even briefly)? The template helps reviewers understand scope, risk, and rollback — it speeds up review significantly.

If a section genuinely doesn't apply, just write "N/A" in it rather than leaving it blank.

@popemkt
Author

popemkt commented Apr 27, 2026

Thanks @Wirasm — sorry for the rough edges. I've filled in the five missing sections directly in the PR description (rather than as a reply, so future readers see the full template):

  • UX Journey — before/after flow (Copilot was previously only reachable by screen-scraping the TUI; the after-diagram marks the newly accepted node fields with [+]).
  • Label Snapshot — risk: low, size: L, scope core/dependencies/docs/tests, module providers:community/copilot.
  • Change Metadata — feature, primary scope multi (mostly providers, with small docs and one-line core/config-loader touch).
  • Human Verification — explicit list of what I personally validated against my live Copilot subscription (Windows v1.0.31, three workflows) and what I did not verify (Linux/macOS native CLI, the SDK's 24h abandonment ceiling, the four deferred capabilities).
  • Side Effects / Blast Radius — affected subsystems, the Pi back-compat surface (re-exports preserve every existing name), and the test/capability-flag guardrails that gate the new path.

Also added a proper Linked Issue section (the Closes #1115 was previously only in the context blockquote) and tightened the Validation Evidence with the per-batch test details.

Let me know if any of those want more detail.

Quick wins from CodeRabbit's review pass — corrections only, no new
behavior. Live smoke (Linux + gpt-5-mini, Copilot CLI 1.0.36) passes
end-to-end after the bash hardening.

Provider runtime:
- binary-resolver: use accessSync(X_OK) instead of mode & 0o111 — owner
  vs. world execute bits are not the same thing for the current process.
- provider: classify against both thrown error and lastSessionError so
  session.error details (auth, model access) surface in the user-facing
  message.
- provider: abort the Copilot session on early generator close — prevents
  sendAndWait from running until the 24h ceiling when the consumer breaks
  out of the for-await loop.
- shared/structured-output: reject bare null / number / string JSON
  results so callers degrade through the structured_output_missing path
  instead of attaching a primitive.
- config: document the lenient parser's silent-drop fallback.

E2E smoke YAML (.archon/workflows/e2e-copilot-smoke.yaml):
- Wrap $simple.output in single quotes so model-controlled command
  substitution / backticks are treated as data, not executed. CodeRabbit's
  proposed heredoc would have broken YAML's literal-block indentation;
  the single-quote variant is a strict improvement over the previous
  double-quoted interpolation. Switched to grep -F for portability.

Tests:
- binary-resolver / mcp-translation: save and restore env vars in
  beforeEach/afterEach instead of unconditionally deleting.
- agents-, mcp-, skills-, tool-restrictions translation tests: replace
  capturedSessionConfigs[0] ?? {} with explicit toHaveLength(1) so omit-
  field tests can no longer pass when sendQuery exits before
  createSession.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
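
The binary-resolver fix in the commit message above can be illustrated with a minimal sketch. `isExecutable` is a hypothetical helper name, not the function in the PR. It relies only on Node's `fs.accessSync` with `fs.constants.X_OK`, which asks the OS whether the current process may execute the file. A `mode & 0o111` test, by contrast, is true as soon as any of the owner, group, or world execute bits is set, whether or not that bit applies to the calling user.

```typescript
import { accessSync, constants } from "node:fs";

// Hypothetical helper (name assumed for illustration, not taken from the PR).
// Returns true only if the current process is allowed to execute `path`.
function isExecutable(path: string): boolean {
  try {
    // Throws if the file is missing or the process lacks execute permission.
    accessSync(path, constants.X_OK);
    return true;
  } catch {
    return false;
  }
}

// The runtime's own binary is a convenient file that should be executable.
console.log(isExecutable(process.execPath));
// A path that does not exist is reported as not executable.
console.log(isExecutable("/nonexistent/copilot"));
```

On Windows, Node treats `X_OK` like an existence check, so this sketch is mainly meaningful on POSIX systems.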
@popemkt
Author

popemkt commented Apr 27, 2026

Worked through the CodeRabbit pass. Pushed in bf8734b. Live smoke (Linux + gpt-5-mini, Copilot CLI 1.0.36) passes end-to-end after the bash hardening.

Applied (10)

| # | File | Fix |
|---|------|-----|
| 🟠 | e2e-copilot-smoke.yaml | Single-quote $simple.output so command substitution / backticks in a model response are data, not exec. The proposed heredoc would have broken YAML literal-block indentation — single-quote is a strict improvement over the previous double-quote. rg → grep -F for portability. |
| 🟡 | binary-resolver.ts | accessSync(X_OK) instead of mode & 0o111. |
| 🟡 | binary-resolver.test.ts | Save/restore COPILOT_CLI_PATH. |
| 🟡 | mcp-translation.test.ts | Save/restore COPILOT_MCP_TEST_TOKEN. |
| 🟡 | provider.ts buildFriendlyCopilotError | Classify against both thrown error and lastSessionError; combined message in user-facing text. |
| 🟠 | provider.ts sendQuery | Hoisted session ref + outer try/finally; aborts SDK run if the consumer stops iterating before the queue closes. |
| nit | agents-translation.test.ts (×7), tool-restrictions.test.ts (×5), skills-translation.test.ts (×4), mcp-translation.test.ts (×3) | Replaced every capturedSessionConfigs[0] ?? {} with an explicit expect(...).toHaveLength(1) precondition. 19 sites in total. |
| nit | config.ts | Doc-comment on parseCopilotConfig describing the silent-drop fallback. |
| nit | shared/structured-output.ts | Reject null / number / string JSON results so callers degrade via structured_output_missing instead of attaching a primitive. Pi tests stay green (no Pi test asserts on bare-primitive parse). |
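
As a rough illustration of the structured-output fix in the last row, here is a minimal sketch of rejecting bare primitives. `parseStructuredOutput` is a hypothetical name, and the real shared/structured-output.ts may be organized differently. Accepting arrays and rejecting booleans are assumptions of this sketch; the PR only states that null, number, and string results are rejected.

```typescript
type JsonContainer = Record<string, unknown> | unknown[];

// Hypothetical helper: returns the parsed value only when it is an object or
// an array, and undefined otherwise so the caller can take its
// "structured output missing" path.
function parseStructuredOutput(raw: string): JsonContainer | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return undefined; // not valid JSON at all
  }
  // typeof null === "object", so null needs its own check.
  if (parsed === null || typeof parsed !== "object") {
    return undefined; // bare null / number / string / boolean
  }
  return parsed as JsonContainer;
}

console.log(parseStructuredOutput('{"model":"gpt-5-mini","ok":true}'));
console.log(parseStructuredOutput("42"));
```

Returning `undefined` instead of throwing keeps the degradation decision with the caller, which matches the "degrade via structured_output_missing" wording above.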

Not applied (3) — see inline replies for the rationale

  • package.json ^0.2.2 → ~0.2.2 — for 0.x.y with x ≥ 1, npm semver makes ^ and ~ produce the same range; with bun.lock committed the installed version is exact regardless. Dismissing.
  • binary-resolver.ts "don't log full paths" — the resolved path is the load-bearing debug info for "which Copilot did Archon pick up?" Username-in-path is the weakest tier of PII; the rest of the codebase already logs paths (worktree paths, repo paths, cwd). Stripping here just makes triage harder.
  • config.ts drop [key: string]: unknown index signature — style nit, not a bug, and the index signature is doing work for the YAML pass-through callers. Keeping for now.
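
The semver point in the first bullet can be checked with a small hand-rolled sketch. It does not use the `semver` package and ignores pre-release tags; the helper names are invented for illustration. It computes the exclusive upper bound of a caret and a tilde range for plain `x.y.z` versions, following npm's documented range rules.

```typescript
type Version = { major: number; minor: number; patch: number };

// Parses a plain "x.y.z" string (no pre-release or build metadata).
function parseVersion(text: string): Version {
  const parts = text.split(".").map(Number);
  return { major: parts[0] ?? 0, minor: parts[1] ?? 0, patch: parts[2] ?? 0 };
}

// ~x.y.z allows patch-level changes: upper bound is the next minor.
function tildeUpperBound(v: Version): string {
  return `${v.major}.${v.minor + 1}.0`;
}

// ^x.y.z allows changes that keep the left-most non-zero component fixed.
function caretUpperBound(v: Version): string {
  if (v.major > 0) return `${v.major + 1}.0.0`;
  if (v.minor > 0) return `0.${v.minor + 1}.0`;
  return `0.0.${v.patch + 1}`;
}

const pinned = parseVersion("0.2.2");
console.log(caretUpperBound(pinned), tildeUpperBound(pinned));
```

For `0.2.2` both bounds come out as `0.3.0`, so the two ranges coincide. They diverge for `1.x.y` (caret allows minor bumps) and for `0.0.z` (caret pins the patch).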

Validation

  • bun --filter @archon/providers test — all batches green (Pi included via re-exports).
  • bun run validate — check:bundled, type-check, lint, format:check all clean. Six pre-existing bun run test failures in @archon/server are unrelated (Bun.YAML.stringify is not a function on Bun 1.2.21 — also fails on the un-modified branch).
  • Live smoke against my authenticated Copilot subscription: workflow e2e-copilot-smoke passes, [assert] Completed (11ms), PASS: simple=COPILOT_OK.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/providers/src/community/copilot/provider.ts (1)

502-515: Pre-aborted signal short-circuit is correct, but session is created before the check.

Lines 438–440 create/resume the session before the abortSignal?.aborted check on line 506. If a caller passes an already-aborted signal, the SDK still performs the session round-trip first, then DOMException('AbortError') is thrown and the finally disconnects. Functionally fine, but consider checking abortSignal.aborted before createSession/resumeSession to avoid the wasted session creation. Optional.

Suggested early-abort placement
       try {
+        if (requestOptions?.abortSignal?.aborted) {
+          throw new DOMException('Copilot sendQuery aborted before start', 'AbortError');
+        }
         session = resumeSessionId
           ? await client.resumeSession(resumeSessionId, sessionConfig)
           : await client.createSession(sessionConfig);
         activeSession = session;
@@
-        const abortSignal = requestOptions?.abortSignal;
-        // `addEventListener('abort', ...)` is a no-op on an already-aborted
-        // signal, so short-circuit before handing the 24-hour sendAndWait
-        // path a signal that will never fire.
-        if (abortSignal?.aborted) {
-          throw new DOMException('Copilot sendQuery aborted before start', 'AbortError');
-        }
+        const abortSignal = requestOptions?.abortSignal;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/providers/src/community/copilot/provider.ts` around lines 502 - 515,
The pre-aborted signal check (abortSignal?.aborted) happens after
creating/resuming the session, causing unnecessary session creation when callers
pass an already-aborted signal; move the abortSignal?.aborted check to before
any call that creates a session (before createSession/resumeSession) and only
proceed to call createSession or resumeSession once the signal is confirmed not
aborted; keep the existing onAbort handler and abort cleanup (session.abort(),
finally disconnect) as-is so that runtime aborts are still handled correctly.
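
The behavior behind this suggestion can be demonstrated with standard `AbortController` semantics alone. The sketch below is not the provider's code and does not call the Copilot SDK: `startUnlessAborted` is a hypothetical helper, and the `start` callback stands in for `createSession` / `resumeSession`. It shows that a listener attached to an already-aborted signal never fires, and that checking `aborted` first avoids starting any work.

```typescript
// Hypothetical helper: refuses to start work when the signal is already aborted.
function startUnlessAborted<T>(signal: AbortSignal | undefined, start: () => T): T {
  if (signal?.aborted) {
    throw new DOMException("aborted before start", "AbortError");
  }
  return start();
}

const controller = new AbortController();
controller.abort(); // the caller hands us a signal that is already aborted

// The 'abort' event was dispatched above, so this late listener never runs.
let listenerFired = false;
controller.signal.addEventListener("abort", () => {
  listenerFired = true;
});

let sessionsCreated = 0;
let errorName = "";
try {
  startUnlessAborted(controller.signal, () => {
    sessionsCreated += 1; // stand-in for the session round-trip
    return "session";
  });
} catch (err) {
  errorName = (err as { name?: string }).name ?? "";
}

console.log({ listenerFired, sessionsCreated, errorName });
```

With the check placed before the session call, no stand-in session is created and the caller sees an `AbortError` immediately.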

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fb20b38-e9d9-4759-88ac-3856eee6823a

📥 Commits

Reviewing files that changed from the base of the PR and between efd838e and bf8734b.

📒 Files selected for processing (10)
  • .archon/workflows/e2e-copilot-smoke.yaml
  • packages/providers/src/community/copilot/agents-translation.test.ts
  • packages/providers/src/community/copilot/binary-resolver.test.ts
  • packages/providers/src/community/copilot/binary-resolver.ts
  • packages/providers/src/community/copilot/config.ts
  • packages/providers/src/community/copilot/mcp-translation.test.ts
  • packages/providers/src/community/copilot/provider.ts
  • packages/providers/src/community/copilot/skills-translation.test.ts
  • packages/providers/src/community/copilot/tool-restrictions.test.ts
  • packages/providers/src/shared/structured-output.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/providers/src/community/copilot/config.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • .archon/workflows/e2e-copilot-smoke.yaml
  • packages/providers/src/community/copilot/agents-translation.test.ts

@Wirasm
Collaborator

Wirasm commented Apr 27, 2026

Thanks for putting this together, @popemkt!

Since this is a community provider (builtIn: false), I won't review or run it deeply myself. To merge with confidence I need evidence it works. Could you share:

  • Models tested — which Copilot models did you exercise?
  • Configs used — auth setup, extension config, any MCP/skills.
  • Results — what scenarios you tested (chat, tool use, structured output, etc.) and what worked.
  • Video evidence if possible — a short screen recording of an end-to-end run.

A smoke test workflow in .archon/workflows/ (like e2e-pi-smoke.yaml) with the run output captured would be ideal — it doubles as adoption docs.

Once that's in, happy to merge.

Resolves the conflict on packages/providers/src/community/pi/provider.ts.

- Upstream's coleam00#1284 (ModelRegistry) and our shared/structured-output
  extraction both touch the same region. Upstream removed the inline
  augmentPromptForJsonSchema call site that coleam00#1284 didn't itself need;
  our branch had moved that function to shared/. Resolution keeps the
  shared/ extraction (single source of truth for both Pi and Copilot)
  and re-exports it from pi/provider.ts under the original name so
  existing Pi callers and tests stay byte-for-byte.
- Drops the dead-code lookupPiModel/GetModelFn helper that was a stale
  leftover from an earlier merge attempt — never had a caller and was
  superseded by ModelRegistry upstream.
- Picks up coleam00#1431 — moves e2e-copilot-smoke.yaml under test-workflows/
  alongside the other e2e-*.yaml smokes.

Adds e2e-copilot-all-features smoke (mirrors e2e-minimax-smoke):
  basic chat (PONG) + effort: high (17×23 = 391) + denied_tools: [shell,
  write] (DENIED_OK) + output_format JSON (best-effort via shared/
  structured-output, parsed as {model, ok}). Single bash assert verifies
  all four end-to-end. Doubles as adoption docs.

Validation:
- bun run check:bundled, type-check, lint, format:check — all green.
- bun --filter @archon/providers test — fully green (Pi included).
- Live smoke (Linux + gpt-5-mini, Copilot CLI 1.0.36):
    e2e-copilot-smoke           → 12s, PASS
    e2e-copilot-all-features    → 25s, PASS (all four caps)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
.archon/workflows/test-workflows/e2e-copilot-all-features.yaml (1)

22-24: Idle timeouts vs. expected runtime.

hello uses idle_timeout: 60000 (60s) and reasoning uses 90000 (90s) — fine — but the file header advertises ~45-90s total runtime. With four sequential-ish nodes (3 of the 4 fan out from hello, but assert is all_success), worst-case is ~3min before the assert fails. Consider tightening the header range or noting it's a worst-case ceiling, otherwise CI runners may flag the workflow as hung-looking on slow days.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.archon/workflows/test-workflows/e2e-copilot-all-features.yaml around lines
22 - 24, The workflow's advertised runtime range doesn't match the worst-case
sum of node idle_timeouts; update the header or the node timeouts so they align:
either shorten idle_timeout on the 'hello' (currently 60000) and/or 'reasoning'
(currently 90000) nodes to fit the advertised "~45-90s" ceiling, or expand the
header note to explicitly state the worst-case ceiling (≈3 minutes) when
accounting for sequential nodes and the final 'assert' with all_success
aggregation; reference and adjust the 'hello', 'reasoning', and 'assert' nodes
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5e67ecc0-2171-426e-9928-aa0c8ebccee9

📥 Commits

Reviewing files that changed from the base of the PR and between bf8734b and 06c3f9e.

📒 Files selected for processing (4)
  • .archon/workflows/test-workflows/e2e-copilot-all-features.yaml
  • .archon/workflows/test-workflows/e2e-copilot-smoke.yaml
  • packages/docs-web/src/content/docs/getting-started/ai-assistants.md
  • packages/providers/src/community/pi/provider.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/providers/src/community/pi/provider.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/docs-web/src/content/docs/getting-started/ai-assistants.md

Comment on lines +35 to +46
  # 3. denied_tools — the model is asked to do something it would normally
  #    use the shell tool for. With shell denied, it must decline or fall
  #    back to inline reasoning. The assert below checks the model did NOT
  #    invoke the shell tool by inspecting the result text for a refusal /
  #    inline-only marker.
  - id: tool_restricted
    prompt: |
      You have NO shell access. Without running any tools, reply with exactly:
      DENIED_OK
    denied_tools: [shell, write]
    idle_timeout: 60000
    depends_on: [hello]

⚠️ Potential issue | 🟡 Minor

tool_restricted assertion verifies prompt compliance, not deny-list enforcement.

The comment at lines 35-39 says the assert "checks the model did NOT invoke the shell tool", but the actual check at lines 113-118 only greps the final transcript for DENIED_OK. A model can:

  • emit DENIED_OK while attempting a shell call (which would be blocked by the SDK regardless), or
  • refuse correctly without emitting DENIED_OK (false negative).

In other words: a DENIED_OK substring proves the model followed the prompt; it doesn't prove excludedTools was wired through, which is the actual capability you're smoking. Two cheap improvements:

  1. Use a prompt that requires the shell tool to succeed (e.g., "list files in cwd"), and assert refusal/inability — this exercises the deny-list path.
  2. Or capture and assert on the SDK request payload via debug logging if available.

At minimum, soften the comment so the test claim matches what the assertion proves.

-  # 3. denied_tools — the model is asked to do something it would normally
-  #    use the shell tool for. With shell denied, it must decline or fall
-  #    back to inline reasoning. The assert below checks the model did NOT
-  #    invoke the shell tool by inspecting the result text for a refusal /
-  #    inline-only marker.
+  # 3. denied_tools — the model is told it has no shell access and asked
+  #    to reply with DENIED_OK. NOTE: this asserts the model followed the
+  #    prompt; SDK-level enforcement of `excludedTools` is covered by
+  #    tool-restrictions.test.ts. Treat this as a regression smoke for
+  #    the YAML→provider config pass-through, not for SDK enforcement.

Also applies to: 113-118

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.archon/workflows/test-workflows/e2e-copilot-all-features.yaml around lines
35 - 46, The test labeled id: tool_restricted currently only asserts the
transcript contains "DENIED_OK" (the prompt in prompt: | and the denied_tools:
[shell, write] setting), but the comment wrongly claims it verifies that the
shell tool was not invoked; update the test to either (A) change the comment to
accurately state it only verifies prompt compliance (i.e., that the model output
contains DENIED_OK) or (B) change the test to exercise deny-list behavior by
replacing the prompt with a task that requires shell access (e.g., "list files
in cwd") and then assert the model refused or the SDK returned a denied-tool
error, or (C) capture and assert the SDK request payload/debug log to ensure
excludedTools includes "shell"; refer to id: tool_restricted, prompt,
denied_tools and the transcript assertion around lines 113-118 when making the
change.

popemkt and others added 3 commits April 27, 2026 22:21
Extends the smoke from 4 to 7 live capabilities, plus a cleanup pass.

New nodes:
- setup_mcp_fixture / setup_skills_fixture (bash) — stage the
  fixtures at run time so they don't pollute the repo when idle.
- mcp_demo (AI, mcp: ./.archon/test-fixtures/copilot-mcp.json) —
  spawns the canonical @modelcontextprotocol/server-everything stdio
  MCP server and asks the model to call its add(2,3) tool.
- skills_demo (AI, skills: [copilot-smoke]) — uses a staged SKILL.md
  with a fixed marker.
- agents_demo (AI, inline agents: { smoke-helper: ... }) — asks the
  model to invoke a custom sub-agent via the Task tool and surface
  its marker.
- cleanup (bash, trigger_rule: all_done) — removes the fixtures
  regardless of pass/fail.

Assert hardening:
- Drops the redundant outer single quotes around $nodeId.output
  references. The DAG executor already shell-escapes for bash nodes
  (coleam00#591 + dag-executor.ts:282 substituteNodeOutputRefs with
  escapedForBash=true). Wrapping the engine's pre-quoted value in a
  second pair of quotes broke parsing on outputs containing
  apostrophes / parens (the `it's the "copilot-smoke" skill (listed
  in available skills)` case).
- Routes the human-readable results table to stderr so it surfaces in
  the terminal — successful bash node stdout is captured silently as
  the node's output for downstream substitution.
- Skills check is soft: assert non-empty + log a WARN if the model
  paraphrases. The deterministic proof is the Pino
  copilot.skills_resolved event (resolved:1, missing:[]). gpt-5-mini
  is known to paraphrase rather than emit exact SKILL.md markers, so
  failing the workflow on model-adherence flake would be a false
  negative for the wiring itself.

Live evidence on Linux + Copilot CLI 1.0.36 + gpt-5-mini:
- copilot.agents_registered { count: 1, names: ["smoke-helper"] }
- copilot.mcp_loaded { serverNames: ["everything"], missingVars: [] }
- copilot.skills_resolved { resolved: 1, missing: [] }
- mcp_demo returned `5` (model invoked the MCP add tool).
- agents_demo returned `COPILOT_AGENT_MARKER_OK`.
- skills_demo returned skill description chitchat — soft warn, expected.
- Workflow exit: PASS, ~110s end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
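
The quoting failure described under "Assert hardening" above can be reproduced with a small sketch. It assumes the executor uses conventional POSIX single-quote escaping (close the quote, emit an escaped apostrophe, reopen); the real `substituteNodeOutputRefs` implementation may differ. Both function names are hypothetical, and the scanner is deliberately simplified to track only single quotes and backslashes.

```typescript
// Assumed POSIX-style single-quote escaping; the executor's real code may differ.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, "'\\''")}'`;
}

// Simplified scanner: tracks only single quotes and backslash escapes outside
// quotes, which is enough to detect an unterminated single-quoted string.
function hasBalancedSingleQuotes(command: string): boolean {
  let inQuote = false;
  for (let i = 0; i < command.length; i++) {
    const ch = command[i];
    if (inQuote) {
      if (ch === "'") inQuote = false; // backslash is literal inside '...'
    } else if (ch === "\\") {
      i++; // skip the escaped character
    } else if (ch === "'") {
      inQuote = true;
    }
  }
  return !inQuote;
}

const output = `it's the "copilot-smoke" skill (listed in available skills)`;
const quotedOnce = shellQuote(output); // what the engine is assumed to emit
const quotedTwice = `'${quotedOnce}'`; // the redundant outer quotes from the YAML

console.log(hasBalancedSingleQuotes(quotedOnce));
console.log(hasBalancedSingleQuotes(quotedTwice));
```

The once-quoted value scans as balanced, while the twice-quoted one ends inside an open quote: the extra outer pair moves the backslash inside a quoted region, where it no longer escapes the apostrophe.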
Skill assertion was previously soft (warn-not-fail) because gpt-5-mini
paraphrased the SKILL.md body instead of emitting the marker verbatim.
Agent assertion regressed in the prior strengthening pass — model
confabulated the skill marker rather than invoking the registered agent.

Two changes pin both checks deterministically:

1. Stronger SKILL.md and prompt — the description now carries an
   explicit "Use when..." trigger so the metadata scan picks the right
   invocation; the body has the directive in three places (frontmatter,
   ## Output, ## Behavior) so the lazy-loaded body is unambiguous; the
   skills_demo prompt explicitly tells the model to "invoke the
   copilot-smoke skill" so the SDK loads the body (Copilot lazy-loads
   skill bodies on invocation, like Claude's progressive disclosure).

2. Unguessable random tokens. Skill emits SK_a8f3kL2qZTOK; agent
   emits AG_n5k7HpT3wAGOK. The previous COPILOT_SKILL_MARKER_OK /
   COPILOT_AGENT_MARKER_OK pattern was guessable from training data —
   gpt-5-mini confabulated the skill marker even when only the agent
   was registered (no skills loaded for that node). Random tokens
   make a lucky guess vanishingly unlikely, so a token's presence
   in the response is strong evidence that the SDK actually loaded that
   capability's content into the model's context.

Also: agent renamed `smoke-helper` → `task-responder` so its name no
longer pattern-matches as a skill, and the kebab-case validator (added
in the loader) accepts it. agents_demo prompt now mandates Task-tool
invocation explicitly.

Live evidence on Linux + Copilot CLI 1.0.36 + gpt-5-mini, fresh DB:
  ── results ──
  hello       = PONG
  reasoning   = 391
  restricted  = DENIED_OK
  json.model  = gpt-5-mini, json.ok = true
  mcp_demo    = 5
  skills_demo = SK_a8f3kL2qZTOK
  agents_demo = ...AG_n5k7HpT3wAGOKAG_n5k7HpT3wAGOK
  PASS: all seven capabilities exercised end-to-end

Pino confirms wiring (visible in run output):
  copilot.skills_resolved   { resolved: 1, missing: [] }
  copilot.mcp_loaded        { serverNames: ["everything"], missingVars: [] }
  copilot.agents_registered { count: 1, names: ["task-responder"] }

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s in smoke

Doc clarification:
Unlike Claude (OAuth subscription OR ANTHROPIC_API_KEY) and Codex (OAuth OR
OPENAI_API_KEY), GitHub Copilot has only ONE auth model: GitHub OAuth bound
to a Copilot subscription. The four "options" previously listed as a flat
bullet list were misleading — they're different *delivery paths for the
same OAuth token*, not separate auth schemes.

The doc now leads with that fact and presents the four paths as a When-to-
Use table, with an explicit warning that GitHub Actions' workflow-scoped
${{ github.token }} does NOT carry Copilot scope.

Smoke quoting fix:
e2e-copilot-smoke.yaml had the same redundant-outer-single-quote pattern
we already fixed in e2e-copilot-all-features (coleam00#591 + dag-executor's
substituteNodeOutputRefs already shell-quotes for bash). Bumped
idle_timeout from 30s → 60s — gpt-5-mini occasionally pauses past the 30s
default on a cold session, leading to the smoke failing with empty output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@popemkt
Author

popemkt commented Apr 27, 2026

Thanks @Wirasm — quick run-down on your four asks.

Video: my machine's codec installs are not cooperating, but the logs below are from the actual run.

Models tested: gpt-5-mini live on Linux + Copilot CLI 1.0.36, plus three earlier scenarios on Windows + CLI 1.0.31 (in PR description).

Configs: copilot login for auth (no env tokens). MCP / skills / agents staged at runtime by the smoke workflow itself — no fixtures committed, cleanup node strips them after.

Results: one DAG covers all 7 wired capabilities — chat, effort: high, denied_tools, output_format JSON, mcp:, skills:, agents:. The assertions are straightforward; you can inspect the bash in the assert step of the all-features workflow file. The skill and agent assertions use unguessable random tokens (SK_a8f3kL2qZTOK / AG_n5k7HpT3wAGOK), so the model can't fake them.

── results ──
hello       = PONG
reasoning   = 391
restricted  = DENIED_OK
json.model  = gpt-5-mini, json.ok = true
mcp_demo    = ...Running the add call now.5
skills_demo = SK_a8f3kL2qZTOK
agents_demo = ...AG_n5k7HpT3wAGOKAG_n5k7HpT3wAGOK
──────────────
PASS: all seven capabilities exercised end-to-end

Smoke files: .archon/workflows/test-workflows/e2e-copilot-{smoke,all-features}.yaml (per #1431's grouping). Documented inline so they double as adoption docs.

Full run output — bun run cli workflow run e2e-copilot-all-features (~62s)
Dispatching workflow: **e2e-copilot-all-features**
{"level":30,"time":1777306037705,"pid":282416,"hostname":"fedora","module":"provider.claude","authMode":"global","msg":"using_global_auth"}
{"level":30,"time":1777306037714,"pid":282416,"hostname":"fedora","module":"workflow.executor","workflowName":"e2e-copilot-all-features","provider":"copilot","providerSource":"workflow definition","model":"gpt-5-mini","msg":"workflow_provider_resolved"}
{"level":30,"time":1777306037719,"pid":282416,"hostname":"fedora","module":"workflow.executor","workflowName":"e2e-copilot-all-features","workflowRunId":"72860ab9d699cdbb8e117153c8b91137","hasIssueContext":false,"issueContextLength":0,"msg":"workflow_starting"}
🚀 **Starting workflow**: `e2e-copilot-all-features`

> Copilot provider feature smoke — chat + effort + tool restrictions + structured output + MCP + skills + agents.
{"level":30,"time":1777306037734,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","workflowName":"e2e-copilot-all-features","nodeCount":11,"layerCount":4,"hasIssueContext":false,"issueContextLength":0,"msg":"dag_workflow_starting"}
{"level":30,"time":1777306037736,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"setup_skills_fixture","type":"bash","msg":"dag_node_started"}
{"level":30,"time":1777306037736,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"setup_mcp_fixture","type":"bash","msg":"dag_node_started"}
{"level":30,"time":1777306037738,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"hello","provider":"copilot","msg":"dag_node_started"}
[setup_skills_fixture] Started
[setup_mcp_fixture] Started
[hello] Started
{"level":30,"time":1777306037766,"pid":282416,"hostname":"fedora","module":"copilot-binary","binaryPath":"/stuff/WorkSpace/Archon/node_modules/.bin/copilot","source":"path","msg":"copilot.binary_resolved"}
{"level":30,"time":1777306037771,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"setup_skills_fixture","durationMs":35,"msg":"dag_node_completed"}
{"level":30,"time":1777306037771,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"setup_mcp_fixture","durationMs":35,"msg":"dag_node_completed"}
[setup_skills_fixture] Completed (35ms)
[setup_mcp_fixture] Completed (35ms)
{"level":40,"time":1777306043827,"pid":282416,"hostname":"fedora","module":"provider.claude","rateLimitInfo":{"status":"allowed","resetsAt":1777320600,"rateLimitType":"five_hour","overageStatus":"allowed","overageResetsAt":1777306200,"isUsingOverage":false},"msg":"claude.rate_limit_event"}
{"level":30,"time":1777306044469,"pid":282416,"hostname":"fedora","module":"service.title-generator","conversationDbId":"0053c6e4e8e0178b2c1dd9c9a593a260","title":"Run E2E Copilot All Features","msg":"title.generate_completed"}
PONG
{"level":30,"time":1777306051039,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"hello","durationMs":13301,"msg":"dag_node_completed"}
[hello] Completed (13.3s)
{"level":30,"time":1777306051044,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"reasoning","provider":"copilot","msg":"dag_node_started"}
{"level":30,"time":1777306051044,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"tool_restricted","provider":"copilot","msg":"dag_node_started"}
{"level":30,"time":1777306051044,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"structured","provider":"copilot","msg":"dag_node_started"}
{"level":30,"time":1777306051044,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"agents_demo","provider":"copilot","msg":"dag_node_started"}
{"level":30,"time":1777306051044,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"skills_demo","provider":"copilot","msg":"dag_node_started"}
[reasoning] Started
[tool_restricted] Started
[agents_demo] Started
[skills_demo] Started
[structured] Started
{"level":30,"time":1777306051071,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"mcp_demo","provider":"copilot","msg":"dag_node_started"}
{"level":30,"time":1777306051073,"pid":282416,"hostname":"fedora","module":"copilot-binary","binaryPath":"/stuff/WorkSpace/Archon/node_modules/.bin/copilot","source":"path","msg":"copilot.binary_resolved"}
[mcp_demo] Started
{"level":30,"time":1777306051083,"pid":282416,"hostname":"fedora","module":"copilot-binary","binaryPath":"/stuff/WorkSpace/Archon/node_modules/.bin/copilot","source":"path","msg":"copilot.binary_resolved"}
{"level":30,"time":1777306051084,"pid":282416,"hostname":"fedora","module":"copilot-binary","binaryPath":"/stuff/WorkSpace/Archon/node_modules/.bin/copilot","source":"path","msg":"copilot.binary_resolved"}
{"level":30,"time":1777306051085,"pid":282416,"hostname":"fedora","module":"provider.copilot","resolved":1,"missing":[],"msg":"copilot.skills_resolved"}
{"level":30,"time":1777306051086,"pid":282416,"hostname":"fedora","module":"provider.copilot","serverNames":["everything"],"missingVars":[],"msg":"copilot.mcp_loaded"}
{"level":30,"time":1777306051088,"pid":282416,"hostname":"fedora","module":"copilot-binary","binaryPath":"/stuff/WorkSpace/Archon/node_modules/.bin/copilot","source":"path","msg":"copilot.binary_resolved"}
{"level":30,"time":1777306051088,"pid":282416,"hostname":"fedora","module":"provider.copilot","count":1,"names":["task-responder"],"msg":"copilot.agents_registered"}
{"level":30,"time":1777306051091,"pid":282416,"hostname":"fedora","module":"copilot-binary","binaryPath":"/stuff/WorkSpace/Archon/node_modules/.bin/copilot","source":"path","msg":"copilot.binary_resolved"}
{"level":30,"time":1777306051092,"pid":282416,"hostname":"fedora","module":"copilot-binary","binaryPath":"/stuff/WorkSpace/Archon/node_modules/.bin/copilot","source":"path","msg":"copilot.binary_resolved"}
391
{"level":30,"time":1777306062470,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"reasoning","durationMs":11426,"msg":"dag_node_completed"}
[reasoning] Completed (11.4s)
DENIED_OK
{"level":30,"time":1777306065092,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"tool_restricted","durationMs":14047,"msg":"dag_node_completed"}
[tool_restricted] Completed (14s)
{"model":"gpt-5-mini","ok":true}
{"level":30,"time":1777306065623,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"structured","durationMs":14579,"msg":"dag_node_completed"}
[structured] Completed (14.6s)
SK

_a

8

f

3

k

L

2

q

ZT

OK
{"level":30,"time":1777306076448,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"skills_demo","durationMs":25404,"msg":"dag_node_completed"}
[skills_demo] Completed (25.4s)
Inv

oking

 the

 task

-res

ponder

 sub

-agent

 to

 retrieve

 the

 fixed

 smoke

-test

 token

.

 Running

 it

 now

.

AG_n5k7HpT3wAGOK

AG

_n

5

k

7

Hp

T

3

w

AG

OK
{"level":30,"time":1777306088834,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"agents_demo","durationMs":37790,"msg":"dag_node_completed"}
[agents_demo] Completed (37.8s)
Comput

ing

 

2

 +

 

3

 using

 the

 MCP

 add

 tool

 on

 the

 everything

 server

.

 Running

 the

 add

 call

 now

.

5
{"level":30,"time":1777306100084,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"mcp_demo","durationMs":49039,"msg":"dag_node_completed"}
[mcp_demo] Completed (49s)
{"level":30,"time":1777306100089,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"assert","type":"bash","msg":"dag_node_started"}
[assert] Started
{"level":40,"time":1777306100107,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"assert","stderr":"── results ──\nhello       = PONG\nreasoning   = 391\nrestricted  = DENIED_OK\njson.model  = gpt-5-mini\njson.ok     = true\nmcp_demo    = Computing 2 + 3 using the MCP add tool on the everything server. Running the add call now.5\nskills_demo = SK_a8f3kL2qZTOK\nagents_demo = Invoking the task-responder sub-agent to retrieve the fixed smoke-test token. Running it now.AG_n5k7HpT3wAGOKAG_n5k7HpT3wAGOK\n──────────────\nPASS: all seven capabilities exercised end-to-end","msg":"bash_node_stderr"}
Bash node 'assert' stderr:

── results ──
hello       = PONG
reasoning   = 391
restricted  = DENIED_OK
json.model  = gpt-5-mini
json.ok     = true
mcp_demo    = Computing 2 + 3 using the MCP add tool on the everything server. Running the add call now.5
skills_demo = SK_a8f3kL2qZTOK
agents_demo = Invoking the task-responder sub-agent to retrieve the fixed smoke-test token. Running it now.AG_n5k7HpT3wAGOKAG_n5k7HpT3wAGOK
──────────────
PASS: all seven capabilities exercised end-to-end

{"level":30,"time":1777306100110,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"assert","durationMs":22,"msg":"dag_node_completed"}
[assert] Completed (22ms)
{"level":30,"time":1777306100114,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"cleanup","type":"bash","msg":"dag_node_started"}
[cleanup] Started
{"level":30,"time":1777306100129,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"cleanup","durationMs":15,"msg":"dag_node_completed"}
[cleanup] Completed (15ms)
{"level":30,"time":1777306100134,"pid":282416,"hostname":"fedora","module":"workflow.dag-executor","nodeCount":11,"anyCompleted":true,"anyFailed":false,"msg":"dag_workflow_finished"}
cleanup complete

Workflow completed successfully.
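
A note for anyone skimming the output above: the short fragments (`DEN` / `IED` / `_OK` and similar) are streamed chunks, each printed on its own line with an empty line between them. Joined in order, they give the values the `assert` node reports. A minimal sketch of that reconstruction follows; `joinStreamedChunks` is a hypothetical helper for reading the log and is not part of the provider code.

```typescript
// Hypothetical helper (not in the PR): rebuild a node's streamed output
// from the chunk lines shown in the log.
function joinStreamedChunks(lines: string[]): string {
  // Drop only the empty separator lines. Chunks are never trimmed, because
  // leading spaces (" the", " task") and a lone " " are real content.
  return lines.filter((line) => line !== "").join("");
}

// Chunks copied from the tool_restricted node output.
const restricted = joinStreamedChunks(["DEN", "", "IED", "", "_OK"]);

// Chunks copied from the skills_demo node output.
const skills = joinStreamedChunks([
  "SK", "", "_a", "", "8", "", "f", "", "3", "", "k", "",
  "L", "", "2", "", "q", "", "ZT", "", "OK",
]);

console.log(restricted); // DENIED_OK
console.log(skills); // SK_a8f3kL2qZTOK
```

Both results match the `restricted` and `skills_demo` lines in the `assert` summary.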
Full run output — bun run cli workflow run e2e-copilot-smoke (~11s)
Dispatching workflow: **e2e-copilot-smoke**
{"level":30,"time":1777306126378,"pid":284572,"hostname":"fedora","module":"provider.claude","authMode":"global","msg":"using_global_auth"}
{"level":30,"time":1777306126388,"pid":284572,"hostname":"fedora","module":"workflow.executor","workflowName":"e2e-copilot-smoke","provider":"copilot","providerSource":"workflow definition","model":"gpt-5-mini","msg":"workflow_provider_resolved"}
{"level":30,"time":1777306126393,"pid":284572,"hostname":"fedora","module":"workflow.executor","workflowName":"e2e-copilot-smoke","workflowRunId":"daf59b6774dcf92630100e98635a30c1","hasIssueContext":false,"issueContextLength":0,"msg":"workflow_starting"}
🚀 **Starting workflow**: `e2e-copilot-smoke`

> Smoke test for the GitHub Copilot community provider.
{"level":30,"time":1777306126416,"pid":284572,"hostname":"fedora","module":"workflow.dag-executor","workflowName":"e2e-copilot-smoke","nodeCount":2,"layerCount":2,"hasIssueContext":false,"issueContextLength":0,"msg":"dag_workflow_starting"}
{"level":30,"time":1777306126419,"pid":284572,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"simple","provider":"copilot","msg":"dag_node_started"}
[simple] Started
{"level":30,"time":1777306126440,"pid":284572,"hostname":"fedora","module":"copilot-binary","binaryPath":"/stuff/WorkSpace/Archon/node_modules/.bin/copilot","source":"path","msg":"copilot.binary_resolved"}
{"level":40,"time":1777306133883,"pid":284572,"hostname":"fedora","module":"provider.claude","rateLimitInfo":{"status":"allowed","resetsAt":1777314000,"rateLimitType":"five_hour","overageStatus":"allowed","overageResetsAt":1777593600,"isUsingOverage":false},"msg":"claude.rate_limit_event"}
{"level":30,"time":1777306134434,"pid":284572,"hostname":"fedora","module":"service.title-generator","conversationDbId":"c2a6d79b3ff57bd2a2a2a8d1be283c94","title":"E2E Copilot Smoke Test Run","msg":"title.generate_completed"}
COP

IL

OT

_OK
{"level":30,"time":1777306137192,"pid":284572,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"simple","durationMs":10773,"msg":"dag_node_completed"}
[simple] Completed (10.8s)
{"level":30,"time":1777306137199,"pid":284572,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"assert","type":"bash","msg":"dag_node_started"}
[assert] Started
{"level":30,"time":1777306137210,"pid":284572,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"assert","durationMs":11,"msg":"dag_node_completed"}
[assert] Completed (11ms)
{"level":30,"time":1777306137215,"pid":284572,"hostname":"fedora","module":"workflow.dag-executor","nodeCount":2,"anyCompleted":true,"anyFailed":false,"msg":"dag_workflow_finished"}
PASS: simple=COPILOT_OK

Workflow completed successfully.
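
If it helps when comparing runs, the per-node timings can be pulled out of the structured lines rather than read by eye. This is a rough sketch under two assumptions: each structured log line is a single JSON object (the fields look pino-style), and `dag_node_completed` entries carry `nodeId` and `durationMs` as they do above. `nodeDurations` is a hypothetical name, not something in the repo.

```typescript
interface NodeCompleted {
  nodeId: string;
  durationMs: number;
}

// Hypothetical helper: collect node timings from mixed CLI output.
function nodeDurations(logText: string): NodeCompleted[] {
  const results: NodeCompleted[] = [];
  for (const line of logText.split("\n")) {
    // Human-readable lines ("[simple] Completed (10.8s)") and streamed
    // chunks do not start with "{", so skip them early.
    if (!line.startsWith("{")) continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // starts with "{" but is not valid JSON
    }
    if (typeof entry !== "object" || entry === null) continue;
    const rec = entry as Record<string, unknown>;
    // Model output can also be JSON (e.g. {"model":"gpt-5-mini","ok":true}),
    // so require the msg field before treating a line as a log entry.
    if (
      rec.msg === "dag_node_completed" &&
      typeof rec.nodeId === "string" &&
      typeof rec.durationMs === "number"
    ) {
      results.push({ nodeId: rec.nodeId, durationMs: rec.durationMs });
    }
  }
  return results;
}

// Two lines copied from the smoke run above, plus non-log lines to skip.
const sample = [
  '{"level":30,"time":1777306137192,"pid":284572,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"simple","durationMs":10773,"msg":"dag_node_completed"}',
  "[simple] Completed (10.8s)",
  '{"level":30,"time":1777306137210,"pid":284572,"hostname":"fedora","module":"workflow.dag-executor","nodeId":"assert","durationMs":11,"msg":"dag_node_completed"}',
  "PASS: simple=COPILOT_OK",
].join("\n");

for (const { nodeId, durationMs } of nodeDurations(sample)) {
  console.log(`${nodeId}: ${durationMs} ms`);
}
```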

Side note while I have you: the docs page listed four auth options flat. I tightened it in ef16bac to make clear that Copilot has one auth model (GitHub OAuth, subscription-billed); the four "options" are delivery paths for the same OAuth token. Worth a quick eyeball.
