feat(providers): add GitHub Copilot community provider (builtIn: false)#1351
feat(providers): add GitHub Copilot community provider (builtIn: false)#1351popemkt wants to merge 15 commits intocoleam00:devfrom
Conversation
New community provider wired through @github/copilot-sdk. Registered as builtIn: false alongside Pi. Covers session resume, effort/thinking controls, envInjection, and a config-aware binary resolver (env > config > vendor > PATH, with vendor winning over PATH in binary mode). Advanced workflow features (MCP, skills, tool restrictions, structured output, fallback model, sandbox) are intentionally flagged false — they are not wired to Archon's workflow surface yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Translates nodeConfig.allowed_tools/denied_tools to Copilot SessionConfig availableTools/excludedTools. SDK enforces availableTools precedence when both are set. Also refactors sendQuery to route all SessionConfig construction through buildSessionConfig() with a ProviderWarning collector, so subsequent parity phases (MCP, skills, structured output, agents, fallbackModel) can plug in as single applyX() calls. Flips COPILOT_CAPABILITIES.toolRestrictions = true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reuses Claude's loadMcpConfig() to parse the MCP JSON file referenced by nodeConfig.mcp, expand $ENV_VAR references, and assign the result to Copilot's SessionConfig.mcpServers. Missing env vars surface as a system warning chunk; IO/JSON errors propagate. Flips COPILOT_CAPABILITIES.mcp = true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extracts Pi's resolvePiSkills into a provider-agnostic shared utility at packages/providers/src/shared/skills.ts as resolveSkillDirectories. Pi re-exports it under the historical name for back-compat. Copilot's applySkills() maps nodeConfig.skills (names) → absolute skill directory paths → SessionConfig.skillDirectories. Missing names surface as a single system warning chunk. Flips COPILOT_CAPABILITIES.skills = true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot has no native JSON-mode equivalent to Claude's outputFormat or Codex's outputSchema, so this mirrors Pi's best-effort approach: augment the user prompt with a "respond with JSON matching this schema" instruction, accumulate the assistant transcript, and parse it on the terminal result chunk. Parse failure leaves structuredOutput unset — the dag-executor's existing dag.structured_output_missing warning handles downstream degradation. Also extracts augmentPromptForJsonSchema and tryParseStructuredOutput into packages/providers/src/shared/structured-output.ts. Pi re-exports them under the original paths for back-compat. Flips COPILOT_CAPABILITIES.structuredOutput = true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Abort-guard: check abortSignal.aborted BEFORE awaiting sendAndWait,
since addEventListener('abort', ...) is a no-op on already-aborted
signals and would otherwise enter the 24h timeout path after cancel.
- Cleanup: wrap session.disconnect() and client.stop() in independent
try/catch-log blocks in the finally so a cleanup throw can't replace
a successful result or the friendly auth/model error.
- Fallback-content flag: flip sawAssistantContent = true when emitting
the final-message fallback, so a session.error received on the same
turn doesn't double-emit as a spurious system warning (Devin finding).
- Model trim: strip leading/trailing whitespace on requestOptions.model
and copilotConfig.model before assigning to SessionConfig.model.
- Binary resolver: replace existsSync() with isExecutableFile() (isFile
+ exec bit on posix, isFile on win32) so env/config/vendor overrides
fail early instead of passing a directory or non-executable to the
SDK.
- Binary resolver test: per-test tmpdir + chmod 0o755, no more shared
/tmp/.archon state.
- Docs: trust-boundary note that enableConfigDiscovery = true bypasses
Archon's workflow validation surface for MCP/skills.
- Registry test: drop stale "Pi is currently the only community
provider" comment now that Copilot is bundled too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Maps nodeConfig.agents (Record<name, AgentDef>) to Copilot SessionConfig.customAgents. Direct pass-through for the fields Copilot's CustomAgentConfig supports: name (from the map key), description, prompt, and tools (allowlist — Copilot has no per-agent denylist). Archon agent fields Copilot cannot represent (model, disallowedTools, skills, maxTurns) surface as one consolidated system-warning chunk per agent. We deliberately do NOT set SessionConfig.agent: Archon's workflow model invokes sub-agents via the Task tool, not by switching the active agent at session start. Flips COPILOT_CAPABILITIES.agents = true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the real-filesystem vendor binary with spyOn(isExecutableFile) + path-fragment assertion, matching the sibling Codex resolver test. The previous test hardcoded the posix binary name 'copilot' while the resolver looks for 'copilot.exe' on win32, so the vendor branch never matched on Windows and the test then leaked to the system-installed Copilot CLI via PATH. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new GitHub Copilot community provider (implementation, tests, registration, and exports), shared utilities for structured-output and skill resolution, updates Pi to use shared utilities, expands core safe-assistant allowlist to include Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant Provider as CopilotProvider
participant FS as File System (MCP/Skills)
participant SDK as Copilot SDK (Client/Session)
participant Queue as Async Chunk Queue
User->>Provider: sendQuery(prompt, cwd, options)
activate Provider
Provider->>FS: load MCP config (if any)
Provider->>FS: resolve skill directories (if any)
Provider->>Provider: build SessionConfig (model, env, agents, tools, skills, mcp)
Provider->>SDK: createSession(config) or resumeSession(id)
activate SDK
SDK-->>Provider: emit events (message_delta, reasoning, tool_start/complete, usage, error)
Provider->>Queue: enqueue translated chunks (assistant, thinking, tool, tool_result, result)
Provider->>Provider: tryParseStructuredOutput(transcript) (if requested)
Provider->>Queue: emit final result (tokens/cost, structuredOutput if parsed)
Provider->>SDK: disconnect() / stop()
deactivate SDK
Provider-->>User: async generator yielding MessageChunks
deactivate Provider
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (6)
packages/providers/src/community/copilot/config.ts (2)
30-65: Document the silent-drop fallback behavior.
parseCopilotConfigsilently discards values with wrong types (e.g.,model: 123,logLevel: 'verbose'). This is intentional — see the matchingdrops invalid values silentlytest — but without a comment at the function header a future reader will reasonably expect a thrown error on malformed YAML. Add a brief comment documenting the fallback, per the repo's "document fallback behavior with a comment when intentional and safe" rule.✏️ Suggested doc comment
+/** + * Parse raw YAML-derived config into a typed `CopilotProviderDefaults`. + * + * Fallback behavior: fields with unexpected types (or `logLevel` outside the + * enumerated set) are silently omitted rather than throwing, matching the + * lenient parsing used by other provider config loaders. Callers see only + * well-typed fields in the result. + */ export function parseCopilotConfig(raw: Record<string, unknown>): CopilotProviderDefaults {As per coding guidelines: "Prefer throwing early with a clear error for unsupported or unsafe states — never silently swallow errors or broaden permissions; document fallback behavior with a comment when intentional and safe".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/providers/src/community/copilot/config.ts` around lines 30 - 65, Add a brief doc comment above the parseCopilotConfig function describing that it intentionally drops/masks invalid or mistyped input values (e.g., non-string model or unsupported logLevel) rather than throwing, and reference the CopilotProviderDefaults fallback behavior and the existing test "drops invalid values silently" so future readers know this is intentional and safe; keep the comment concise and include guidance that callers should validate upstream if they need strict errors.
1-28: Consider narrowing the index signature.The
[key: string]: unknownon line 2 opens the interface to arbitrary extra keys, which weakens the type guarantee ofparseCopilotConfig(callers can store anything on the result without TS complaining). If the intent is "forward-compat for unknown SDK fields", the parser itself doesn't propagate unknown keys — only the allowlisted ones — so the index signature may be over-permissive. Consider removing it unless there's a specific consumer that needs it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/providers/src/community/copilot/config.ts` around lines 1 - 28, Remove the overly-broad index signature from the CopilotProviderDefaults interface (the "[key: string]: unknown" entry) so callers cannot attach arbitrary properties; keep only the explicit optional fields (model, copilotCliPath, configDir, enableConfigDiscovery, useLoggedInUser, logLevel). If forward-compatibility for unknown SDK fields is required, instead introduce a separate explicit type (e.g., CopilotProviderRaw or Record<string, unknown>) used only where raw/unvalidated config is handled and ensure parseCopilotConfig continues to return the narrowed CopilotProviderDefaults type.packages/providers/src/shared/structured-output.ts (1)
46-60: Consider distinguishing validnull/primitive JSON from parse failure.
JSON.parse('null'),JSON.parse('42'), andJSON.parse('"str"')all succeed and return non-object values. Since the contract returnsunknownand callers treat any non-undefinedresult as "structured output available", a model emitting a literalnullwould be attached as the structured output rather than triggering thedag.structured_output_missingwarning path. If structured output is expected to always be an object/array, consider rejecting non-object results here.♻️ Optional tightening
try { - return JSON.parse(cleaned); + const parsed: unknown = JSON.parse(cleaned); + // Schema augmentation asks for a JSON object; reject bare primitives/null + // so callers degrade through the existing missing-structured-output path. + if (parsed === null || typeof parsed !== 'object') return undefined; + return parsed; } catch {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/providers/src/shared/structured-output.ts` around lines 46 - 60, The function tryParseStructuredOutput currently returns any JSON value (including null/primitives); change it to only accept parsed values that are objects/arrays: after parsing the cleaned input in tryParseStructuredOutput, assign the result to a variable and return it only if the value is non-null and typeof value === 'object' (arrays are fine since typeof is 'object'); otherwise return undefined to signal "no structured output". Keep the existing trimming and fence-stripping logic and the same error-catching behavior.packages/providers/src/community/copilot/agents-translation.test.ts (1)
71-88: Make the omission tests proveSessionConfigwas built.Both tests fall back to
{}, so they can pass if the provider exits before creating a session.🧪 Proposed assertion hardening
- const cfg = capturedSessionConfigs[0] ?? {}; + expect(capturedSessionConfigs).toHaveLength(1); + const cfg = capturedSessionConfigs[0]!; expect(cfg.customAgents).toBeUndefined();- const cfg = capturedSessionConfigs[0] ?? {}; + expect(capturedSessionConfigs).toHaveLength(1); + const cfg = capturedSessionConfigs[0]!; expect(cfg.customAgents).toBeUndefined();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/providers/src/community/copilot/agents-translation.test.ts` around lines 71 - 88, The tests "'omits customAgents when nodeConfig.agents is absent'" and "'omits customAgents when agents is an empty object'" currently allow a false positive because they default to {} if no session was recorded; update each test to assert that a SessionConfig was actually built by verifying capturedSessionConfigs contains an entry (e.g., length > 0 or a concrete SessionConfig field exists) before checking cfg.customAgents. Locate the assertions around capturedSessionConfigs in the tests for CopilotProvider.sendQuery and add a precondition that the provider created a session (using capturedSessionConfigs or a known SessionConfig property) so the subsequent expect(cfg.customAgents).toBeUndefined() proves SessionConfig construction rather than an early exit.packages/providers/src/community/copilot/tool-restrictions.test.ts (1)
76-82: Assert that a session config was captured before checking omitted fields.The
?? {}fallback means this test passes even ifsendQuery()never reachescreateSession().🧪 Proposed assertion hardening
await drain(new CopilotProvider().sendQuery('hi', '/repo', undefined, { model: 'gpt-5' })); - const cfg = capturedSessionConfigs[0] ?? {}; + expect(capturedSessionConfigs).toHaveLength(1); + const cfg = capturedSessionConfigs[0]!; expect(cfg.availableTools).toBeUndefined(); expect(cfg.excludedTools).toBeUndefined();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/providers/src/community/copilot/tool-restrictions.test.ts` around lines 76 - 82, The test currently uses a fallback (capturedSessionConfigs[0] ?? {}) which masks the case where createSession() was never called; update the test to first assert that a session config was actually captured by checking capturedSessionConfigs.length (or that capturedSessionConfigs[0] is defined) before inspecting fields. Specifically, in the test that calls new CopilotProvider().sendQuery('hi', '/repo', ...), add an assertion like expect(capturedSessionConfigs.length).toBeGreaterThan(0) or expect(capturedSessionConfigs[0]).toBeDefined() prior to reading cfg and then keep the existing checks for cfg.availableTools and cfg.excludedTools.packages/providers/src/community/copilot/skills-translation.test.ts (1)
111-112: Don’t let omission checks pass without a captured session config.Line 111 and Line 157 fall back to
{}, so these tests would still pass ifsendQuery()stopped beforecreateSession()and never produced a session config.🧪 Proposed assertion hardening
- const cfg = capturedSessionConfigs[0] ?? {}; + expect(capturedSessionConfigs).toHaveLength(1); + const cfg = capturedSessionConfigs[0]!; expect(cfg.skillDirectories).toBeUndefined();- const cfg = capturedSessionConfigs[0] ?? {}; + expect(capturedSessionConfigs).toHaveLength(1); + const cfg = capturedSessionConfigs[0]!; expect(cfg.skillDirectories).toBeUndefined();Also applies to: 157-158
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/providers/src/community/copilot/skills-translation.test.ts` around lines 111 - 112, The test is currently allowing a missing session config by falling back to {} (const cfg = capturedSessionConfigs[0] ?? {}), so if sendQuery() never reached createSession() the omission checks would still pass; change the assertions to first verify a session config was captured (e.g., expect(capturedSessionConfigs.length).toBeGreaterThan(0) or expect(capturedSessionConfigs).not.toHaveLength(0)), then assign without a fallback (const cfg = capturedSessionConfigs[0]) and keep the existing expect(cfg.skillDirectories).toBeUndefined(); apply the same change to the other occurrence around the later assertion so both checks fail fast if no session was created.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.archon/workflows/e2e-copilot-smoke.yaml:
- Around line 15-25: The test currently interpolates the AI-controlled value
"$simple.output" into output="..." which allows shell substitutions to execute;
change the assignment so the response is captured as raw data via a quoted
here-doc (use a single-quoted delimiter like <<'EOF' and read into the variable
with a raw read to avoid backslash or command expansion) instead of direct
interpolation, then run the existing rg check against that safe variable; update
the lines that reference output="$simple.output" and the subsequent checks to
use the here-doc-captured variable (and ensure you use a raw/read -r form so
backslashes and substitutions aren't processed).
In `@packages/providers/package.json`:
- Line 28: The package.json currently uses a caret range for the preview SDK
entry "@github/copilot-sdk": "^0.2.2", which will allow future 0.2.x changes
(some breaking); decide and update that dependency to the intended immutability
level — replace the caret with a tilde "~0.2.2" to allow only patch fixes within
0.2.x, or pin exactly to "0.2.2" for full stability, then commit with a note
stating the chosen policy for "@github/copilot-sdk".
In `@packages/providers/src/community/copilot/binary-resolver.test.ts`:
- Around line 38-50: The test deletes process.env.COPILOT_CLI_PATH in the
beforeEach and never restores it, risking cross-test pollution; capture the
original value into a variable (e.g., origCopilotCliPath) before tests run, then
in beforeEach clear or set process.env.COPILOT_CLI_PATH as the test needs, and
in afterEach restore it by assigning back origCopilotCliPath when defined or
deleting the env var if it was undefined; update the existing
beforeEach/afterEach around tmpRoot/archonHome to use origCopilotCliPath so the
environment is restored after each test.
In `@packages/providers/src/community/copilot/binary-resolver.ts`:
- Around line 89-123: The logs currently include full executable paths (e.g.,
envPath, configCopilotCliPath, vendorBinaryPath, fromPath) which may contain
PII; update the getLog().info calls inside binary-resolver (the branches using
envPath, configCopilotCliPath, vendorBinaryPath, fromPath and functions like
getVendorBinaryName(), resolveFromPath(), BUNDLED_IS_BINARY, isExecutableFile())
to avoid emitting full paths by default—log the resolution source and either the
basename/redacted path (use path.basename()) or omit the path entirely, and only
include the full binaryPath under a debug/verbose logging level so sensitive
home/username components are not logged at info level.
- Around line 16-21: The isExecutableFile function currently uses
_statSync(...).mode & 0o111 which only checks for any execute bit, not whether
the current process/user can execute the file; replace that check with
fs.accessSync(path, fs.constants.X_OK) to test executability for the current
user (keep the early return for !stat.isFile() and the Windows shortcut
returning true), and catch any thrown error from accessSync to return false;
reference isExecutableFile and _statSync to locate the change and use
fs.accessSync and fs.constants.X_OK for the permission check.
In `@packages/providers/src/community/copilot/mcp-translation.test.ts`:
- Around line 69-82: The afterEach block currently deletes
COPILOT_MCP_TEST_TOKEN unconditionally which can clobber a pre-existing env var;
update the test setup to save the original value (e.g., capture
process.env.COPILOT_MCP_TEST_TOKEN in beforeEach or at top-level) and then in
afterEach restore it (set process.env.COPILOT_MCP_TEST_TOKEN back to the saved
value or delete only if it was originally undefined). Modify the existing
beforeEach/afterEach helpers around applyMcpServers / workDir to use the saved
token variable so tests restore the original environment state.
In `@packages/providers/src/community/copilot/provider.ts`:
- Around line 281-308: buildFriendlyCopilotError currently prefers the thrown
error message and can miss actionable details in lastSessionError (e.g.,
session.error from the SDK); modify the function to check both rawMessage and
lastSessionError when classifying with isModelAccessError and the
authentication/login checks, and when constructing the returned Error include
the session detail (lastSessionError) alongside the primary message so users see
the SDK-provided actionable text; reference the rawMessage, lastSessionError,
isModelAccessError, and the authentication string checks in your changes.
- Around line 377-580: The generator doesn't cancel the Copilot SDK run if the
caller stops iterating early; hoist the session variable out of the background
IIFE (currently declared inside sendQuery's async block as `let session`) so the
outer generator can access it, and after the for-await loop (or in a finally
surrounding it) call `session?.abort()` (catch/log errors) to terminate
`session.sendAndWait()` when the generator is closed early; keep the existing
`onAbort`/`abortSignal` logic intact and ensure you don't replace the primary
error by swallowing abort errors (use getLog().warn).
---
Nitpick comments:
In `@packages/providers/src/community/copilot/agents-translation.test.ts`:
- Around line 71-88: The tests "'omits customAgents when nodeConfig.agents is
absent'" and "'omits customAgents when agents is an empty object'" currently
allow a false positive because they default to {} if no session was recorded;
update each test to assert that a SessionConfig was actually built by verifying
capturedSessionConfigs contains an entry (e.g., length > 0 or a concrete
SessionConfig field exists) before checking cfg.customAgents. Locate the
assertions around capturedSessionConfigs in the tests for
CopilotProvider.sendQuery and add a precondition that the provider created a
session (using capturedSessionConfigs or a known SessionConfig property) so the
subsequent expect(cfg.customAgents).toBeUndefined() proves SessionConfig
construction rather than an early exit.
In `@packages/providers/src/community/copilot/config.ts`:
- Around line 30-65: Add a brief doc comment above the parseCopilotConfig
function describing that it intentionally drops/masks invalid or mistyped input
values (e.g., non-string model or unsupported logLevel) rather than throwing,
and reference the CopilotProviderDefaults fallback behavior and the existing
test "drops invalid values silently" so future readers know this is intentional
and safe; keep the comment concise and include guidance that callers should
validate upstream if they need strict errors.
- Around line 1-28: Remove the overly-broad index signature from the
CopilotProviderDefaults interface (the "[key: string]: unknown" entry) so
callers cannot attach arbitrary properties; keep only the explicit optional
fields (model, copilotCliPath, configDir, enableConfigDiscovery,
useLoggedInUser, logLevel). If forward-compatibility for unknown SDK fields is
required, instead introduce a separate explicit type (e.g., CopilotProviderRaw
or Record<string, unknown>) used only where raw/unvalidated config is handled
and ensure parseCopilotConfig continues to return the narrowed
CopilotProviderDefaults type.
In `@packages/providers/src/community/copilot/skills-translation.test.ts`:
- Around line 111-112: The test is currently allowing a missing session config
by falling back to {} (const cfg = capturedSessionConfigs[0] ?? {}), so if
sendQuery() never reached createSession() the omission checks would still pass;
change the assertions to first verify a session config was captured (e.g.,
expect(capturedSessionConfigs.length).toBeGreaterThan(0) or
expect(capturedSessionConfigs).not.toHaveLength(0)), then assign without a
fallback (const cfg = capturedSessionConfigs[0]) and keep the existing
expect(cfg.skillDirectories).toBeUndefined(); apply the same change to the other
occurrence around the later assertion so both checks fail fast if no session was
created.
In `@packages/providers/src/community/copilot/tool-restrictions.test.ts`:
- Around line 76-82: The test currently uses a fallback
(capturedSessionConfigs[0] ?? {}) which masks the case where createSession() was
never called; update the test to first assert that a session config was actually
captured by checking capturedSessionConfigs.length (or that
capturedSessionConfigs[0] is defined) before inspecting fields. Specifically, in
the test that calls new CopilotProvider().sendQuery('hi', '/repo', ...), add an
assertion like expect(capturedSessionConfigs.length).toBeGreaterThan(0) or
expect(capturedSessionConfigs[0]).toBeDefined() prior to reading cfg and then
keep the existing checks for cfg.availableTools and cfg.excludedTools.
In `@packages/providers/src/shared/structured-output.ts`:
- Around line 46-60: The function tryParseStructuredOutput currently returns any
JSON value (including null/primitives); change it to only accept parsed values
that are objects/arrays: after parsing the cleaned input in
tryParseStructuredOutput, assign the result to a variable and return it only if
the value is non-null and typeof value === 'object' (arrays are fine since
typeof is 'object'); otherwise return undefined to signal "no structured
output". Keep the existing trimming and fence-stripping logic and the same
error-catching behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 726053e6-274e-4f7e-b850-15e38885d083
⛔ Files ignored due to path filters (1)
bun.lockis excluded by!**/*.lock
📒 Files selected for processing (27)
.archon/workflows/e2e-copilot-smoke.yamlpackages/core/src/config/config-loader.tspackages/docs-web/src/content/docs/getting-started/ai-assistants.mdpackages/providers/package.jsonpackages/providers/src/community/copilot/agents-translation.test.tspackages/providers/src/community/copilot/binary-resolver.test.tspackages/providers/src/community/copilot/binary-resolver.tspackages/providers/src/community/copilot/capabilities.tspackages/providers/src/community/copilot/config.test.tspackages/providers/src/community/copilot/config.tspackages/providers/src/community/copilot/index.tspackages/providers/src/community/copilot/mcp-translation.test.tspackages/providers/src/community/copilot/provider-hardening.test.tspackages/providers/src/community/copilot/provider.test.tspackages/providers/src/community/copilot/provider.tspackages/providers/src/community/copilot/registration.tspackages/providers/src/community/copilot/skills-translation.test.tspackages/providers/src/community/copilot/structured-output.test.tspackages/providers/src/community/copilot/tool-restrictions.test.tspackages/providers/src/community/pi/event-bridge.tspackages/providers/src/community/pi/options-translator.tspackages/providers/src/community/pi/provider.tspackages/providers/src/index.tspackages/providers/src/registry.test.tspackages/providers/src/registry.tspackages/providers/src/shared/skills.tspackages/providers/src/shared/structured-output.ts
| "dependencies": { | ||
| "@anthropic-ai/claude-agent-sdk": "^0.2.89", | ||
| "@archon/paths": "workspace:*", | ||
| "@github/copilot-sdk": "^0.2.2", |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
@github/copilot-sdk changelog breaking changes 0.2.x
💡 Result:
The GitHub Copilot SDK (github/copilot-sdk) 0.2.x series, starting with v0.2.0 (2026-03-20), v0.2.1 (2026-04-03), and v0.2.2 (2026-04-10), includes the following breaking changes documented in the official CHANGELOG.md and release notes: All SDKs: - autoRestart option deprecated and removed from client options (no effect now, remove references) [1,2,3] Node.js-specific (v0.2.1): - onElicitationRequest handler signature changed from (request, invocation) to single ElicitationContext argument. Access via context.sessionId and context.message [1,15] Python-specific (major API overhaul in v0.2.0): - CopilotClient constructor: CopilotClientOptions TypedDict replaced by dataclasses like ExternalServerConfig(url=...) or SubprocessConfig(cli_path=..., log_level=...) [1,2,3,16] - create_session and resume_session: SessionConfig/ResumeSessionConfig TypedDicts removed; use keyword arguments directly (e.g., on_permission_request=..., model=...). on_permission_request now required [1,2,3,16] - send and send_and_wait: MessageOptions TypedDict removed; first positional arg is prompt str, attachments/mode as kwargs [1,2,3,16] - MessageOptions, SessionConfig, ResumeSessionConfig no longer exported publicly [1,2,3] - Internal modules renamed: copilot.jsonrpc -> copilot._jsonrpc, etc. [1,2,3] - copilot.types module removed; import types from copilot, copilot.generated.session_events, copilot.session, copilot.tools, etc. [16,18] - PermissionRequest is now a dataclass (.kind attribute, not dict access) [16] Go-specific (v0.2.0): - Client.Start context cancel no longer kills CLI process; use client.Stop explicitly [1,2,3] - LogOptions.Ephemeral changed from bool to *bool; use copilot.Bool(true) [1,2,3] Other notes: - v0.2.0 described as "big update with broad API refinements" [1,2] - No comprehensive migration guide found; changes detailed in CHANGELOG with before/after code examples for Python [1,13,16]. - Issues mention additional compatibility notes, e.g., ModelCapabilities for BYOK in later previews [9].
Citations:
- 1: https://github.com/github/copilot-sdk/blob/main/CHANGELOG.md
- 2: https://github.com/github/copilot-sdk/releases
- 3: https://github.com/github/copilot-sdk/blob/40887393/CHANGELOG.md
- 4: github/copilot-sdk@5ecfc94
- 5: https://github.com/github/copilot-sdk/releases/tag/v0.2.1
- 6: Python: Migrate GitHub Copilot package to SDK 0.2.x microsoft/agent-framework#5107
🏁 Script executed:
# Verify PR context to understand what the "pinned" claim refers to
git log --oneline --all | head -20Repository: coleam00/Archon
Length of output: 1792
🏁 Script executed:
# Find recent changes to packages/providers/package.json related to copilot-sdk
git log --oneline -n 50 -- packages/providers/package.jsonRepository: coleam00/Archon
Length of output: 138
🏁 Script executed:
# Get the PR description and context for the most recent change
git show e0b57a8f --statRepository: coleam00/Archon
Length of output: 50371
🏁 Script executed:
# Get the current state of the package.json file
cat -n packages/providers/package.json | grep -A 5 -B 5 "copilot-sdk"Repository: coleam00/Archon
Length of output: 485
Confirm intent for ^0.2.2 version range on a public-preview SDK.
@github/copilot-sdk is in public preview with documented breaking changes across the 0.2.x series (v0.2.0, v0.2.1, v0.2.2): autoRestart removal, Node.js onElicitationRequest handler signature change, major Python API overhaul, and Go Client.Start behavior change. The caret range allows upgrades within >=0.2.2 <0.3.0, so future 0.2.x patches and their breaking changes will be auto-pulled, while 0.3.0 will remain excluded. If strict pinning was intended, consider ~0.2.2 or exact 0.2.2. Otherwise, this is fine—just confirm the range matches your intent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/providers/package.json` at line 28, The package.json currently uses
a caret range for the preview SDK entry "@github/copilot-sdk": "^0.2.2", which
will allow future 0.2.x changes (some breaking); decide and update that
dependency to the intended immutability level — replace the caret with a tilde
"~0.2.2" to allow only patch fixes within 0.2.x, or pin exactly to "0.2.2" for
full stability, then commit with a note stating the chosen policy for
"@github/copilot-sdk".
There was a problem hiding this comment.
Acknowledged but skipping — for 0.x.y, npm semver makes ^0.2.2 and ~0.2.2 resolve to the same range (>=0.2.2 <0.3.0). And since bun.lock is committed, the installed version is exact regardless of the spec. The only meaningful tightening would be exact pin "0.2.2", which I don't think is worth losing patch fixes for. Will pin if 0.3.x lands with a known regression.
There was a problem hiding this comment.
Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit.
| getLog().info({ binaryPath: envPath, source: 'env' }, 'copilot.binary_resolved'); | ||
| return envPath; | ||
| } | ||
|
|
||
| if (configCopilotCliPath) { | ||
| if (!isExecutableFile(configCopilotCliPath)) { | ||
| throw new Error( | ||
| `assistants.copilot.copilotCliPath is set to "${configCopilotCliPath}" but it is not an executable file.\n` + | ||
| 'Please verify the path in .archon/config.yaml points to the Copilot CLI executable (chmod +x if needed).' | ||
| ); | ||
| } | ||
| getLog().info( | ||
| { binaryPath: configCopilotCliPath, source: 'config' }, | ||
| 'copilot.binary_resolved' | ||
| ); | ||
| return configCopilotCliPath; | ||
| } | ||
|
|
||
| if (BUNDLED_IS_BINARY) { | ||
| const vendorBinaryName = getVendorBinaryName(); | ||
| if (vendorBinaryName) { | ||
| const vendorBinaryPath = join(getArchonHome(), COPILOT_VENDOR_DIR, vendorBinaryName); | ||
| if (isExecutableFile(vendorBinaryPath)) { | ||
| getLog().info( | ||
| { binaryPath: vendorBinaryPath, source: 'vendor' }, | ||
| 'copilot.binary_resolved' | ||
| ); | ||
| return vendorBinaryPath; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| const fromPath = resolveFromPath(); | ||
| if (fromPath && isExecutableFile(fromPath)) { | ||
| getLog().info({ binaryPath: fromPath, source: 'path' }, 'copilot.binary_resolved'); |
There was a problem hiding this comment.
Avoid logging full local executable paths.
binaryPath can include usernames or home-directory details. Log the resolution source and a redacted/basename path instead, or omit the path unless debug logging explicitly needs it.
🔒 Proposed logging adjustment
- getLog().info({ binaryPath: envPath, source: 'env' }, 'copilot.binary_resolved');
+ getLog().info({ source: 'env' }, 'copilot.binary_resolved');
@@
- { binaryPath: configCopilotCliPath, source: 'config' },
+ { source: 'config' },
@@
- { binaryPath: vendorBinaryPath, source: 'vendor' },
+ { source: 'vendor' },
@@
- getLog().info({ binaryPath: fromPath, source: 'path' }, 'copilot.binary_resolved');
+ getLog().info({ source: 'path' }, 'copilot.binary_resolved');As per coding guidelines, “Never log API keys, tokens … user message content, or PII.”
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| getLog().info({ binaryPath: envPath, source: 'env' }, 'copilot.binary_resolved'); | |
| return envPath; | |
| } | |
| if (configCopilotCliPath) { | |
| if (!isExecutableFile(configCopilotCliPath)) { | |
| throw new Error( | |
| `assistants.copilot.copilotCliPath is set to "${configCopilotCliPath}" but it is not an executable file.\n` + | |
| 'Please verify the path in .archon/config.yaml points to the Copilot CLI executable (chmod +x if needed).' | |
| ); | |
| } | |
| getLog().info( | |
| { binaryPath: configCopilotCliPath, source: 'config' }, | |
| 'copilot.binary_resolved' | |
| ); | |
| return configCopilotCliPath; | |
| } | |
| if (BUNDLED_IS_BINARY) { | |
| const vendorBinaryName = getVendorBinaryName(); | |
| if (vendorBinaryName) { | |
| const vendorBinaryPath = join(getArchonHome(), COPILOT_VENDOR_DIR, vendorBinaryName); | |
| if (isExecutableFile(vendorBinaryPath)) { | |
| getLog().info( | |
| { binaryPath: vendorBinaryPath, source: 'vendor' }, | |
| 'copilot.binary_resolved' | |
| ); | |
| return vendorBinaryPath; | |
| } | |
| } | |
| } | |
| const fromPath = resolveFromPath(); | |
| if (fromPath && isExecutableFile(fromPath)) { | |
| getLog().info({ binaryPath: fromPath, source: 'path' }, 'copilot.binary_resolved'); | |
| getLog().info({ source: 'env' }, 'copilot.binary_resolved'); | |
| return envPath; | |
| } | |
| if (configCopilotCliPath) { | |
| if (!isExecutableFile(configCopilotCliPath)) { | |
| throw new Error( | |
| `assistants.copilot.copilotCliPath is set to "${configCopilotCliPath}" but it is not an executable file.\n` + | |
| 'Please verify the path in .archon/config.yaml points to the Copilot CLI executable (chmod +x if needed).' | |
| ); | |
| } | |
| getLog().info( | |
| { source: 'config' }, | |
| 'copilot.binary_resolved' | |
| ); | |
| return configCopilotCliPath; | |
| } | |
| if (BUNDLED_IS_BINARY) { | |
| const vendorBinaryName = getVendorBinaryName(); | |
| if (vendorBinaryName) { | |
| const vendorBinaryPath = join(getArchonHome(), COPILOT_VENDOR_DIR, vendorBinaryName); | |
| if (isExecutableFile(vendorBinaryPath)) { | |
| getLog().info( | |
| { source: 'vendor' }, | |
| 'copilot.binary_resolved' | |
| ); | |
| return vendorBinaryPath; | |
| } | |
| } | |
| } | |
| const fromPath = resolveFromPath(); | |
| if (fromPath && isExecutableFile(fromPath)) { | |
| getLog().info({ source: 'path' }, 'copilot.binary_resolved'); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/providers/src/community/copilot/binary-resolver.ts` around lines 89
- 123, The logs currently include full executable paths (e.g., envPath,
configCopilotCliPath, vendorBinaryPath, fromPath) which may contain PII; update
the getLog().info calls inside binary-resolver (the branches using envPath,
configCopilotCliPath, vendorBinaryPath, fromPath and functions like
getVendorBinaryName(), resolveFromPath(), BUNDLED_IS_BINARY, isExecutableFile())
to avoid emitting full paths by default—log the resolution source and either the
basename/redacted path (use path.basename()) or omit the path entirely, and only
include the full binaryPath under a debug/verbose logging level so sensitive
home/username components are not logged at info level.
There was a problem hiding this comment.
Skipping. The resolved path is the load-bearing piece of debug info for "which Copilot binary did Archon pick up?" — without it, the source: vendor|env|config|path label tells you the tier but not which CLI actually ran (someone could have multiple installs on PATH). Username-in-path is the weakest tier of PII, and the rest of the codebase already logs paths (worktree paths, repo paths, cwd, etc.) at info level. The CLAUDE.md rule is aimed at tokens / secrets / user message content, not ~/Users/foo/.archon/.... Happy to revisit if Pino structured logs ever become a shared / shipped surface.
There was a problem hiding this comment.
Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit.
Single conflict in packages/providers/package.json test script — resolved as union (all pi/ tests from upstream including new provider-lazy-load.test.ts plus the copilot/ test files from this branch). Upstream's pi-provider lazy-load refactor (coleam00#1355) auto-merged cleanly with this branch's pi → shared/ extraction because the extracted shared modules have no Pi SDK deps, so the compiled-binary contract is preserved.
Follow-up to the upstream/dev merge — the Edit that resolved the test-script conflict only rewrote the body, leaving the opening marker. JSON now parses, prettier clean.
|
Hi @popemkt — thanks for opening this PR. This repository uses a PR template at
Could you fill those out (even briefly)? The template helps reviewers understand scope, risk, and rollback — it speeds up review significantly. If a section genuinely doesn't apply, just write "N/A" in it rather than leaving it blank. |
|
Thanks @Wirasm — sorry for the rough edges. I've filled in the five missing sections directly in the PR description (rather than as a reply, so future readers see the full template):
Also added a proper Linked Issue section (the Let me know if any of those want more detail. |
Quick wins from CodeRabbit's review pass — corrections only, no new
behavior. Live smoke (Linux + gpt-5-mini, Copilot CLI 1.0.36) passes
end-to-end after the bash hardening.
Provider runtime:
- binary-resolver: use accessSync(X_OK) instead of mode & 0o111 — owner
vs. world execute bits are not the same thing for the current process.
- provider: classify against both thrown error and lastSessionError so
session.error details (auth, model access) surface in the user-facing
message.
- provider: abort the Copilot session on early generator close — prevents
sendAndWait from running until the 24h ceiling when the consumer breaks
out of the for-await loop.
- shared/structured-output: reject bare null / number / string JSON
results so callers degrade through the structured_output_missing path
instead of attaching a primitive.
- config: document the lenient parser's silent-drop fallback.
E2E smoke YAML (.archon/workflows/e2e-copilot-smoke.yaml):
- Wrap $simple.output in single quotes so model-controlled command
substitution / backticks are treated as data, not executed. CodeRabbit's
proposed heredoc would have broken YAML's literal-block indentation;
the single-quote variant is a strict improvement over the previous
double-quoted interpolation. Switched to grep -F for portability.
Tests:
- binary-resolver / mcp-translation: save and restore env vars in
beforeEach/afterEach instead of unconditionally deleting.
- agents-, mcp-, skills-, tool-restrictions translation tests: replace
capturedSessionConfigs[0] ?? {} with explicit toHaveLength(1) so omit-
field tests can no longer pass when sendQuery exits before
createSession.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Worked through the CodeRabbit pass. Pushed in Applied (10)
Not applied (3) — see inline replies for the rationale
Validation
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/providers/src/community/copilot/provider.ts (1)
502-515: Pre-aborted signal short-circuit is correct, but session is created before the check.Lines 438–440 create/resume the session before the
abortSignal?.abortedcheck on line 506. If a caller passes an already-aborted signal, the SDK still performs the session round-trip first, thenDOMException('AbortError')is thrown and thefinallydisconnects. Functionally fine, but consider checkingabortSignal.abortedbeforecreateSession/resumeSessionto avoid the wasted session creation. Optional.Suggested early-abort placement
try { + if (requestOptions?.abortSignal?.aborted) { + throw new DOMException('Copilot sendQuery aborted before start', 'AbortError'); + } session = resumeSessionId ? await client.resumeSession(resumeSessionId, sessionConfig) : await client.createSession(sessionConfig); activeSession = session; @@ - const abortSignal = requestOptions?.abortSignal; - // `addEventListener('abort', ...)` is a no-op on an already-aborted - // signal, so short-circuit before handing the 24-hour sendAndWait - // path a signal that will never fire. - if (abortSignal?.aborted) { - throw new DOMException('Copilot sendQuery aborted before start', 'AbortError'); - } + const abortSignal = requestOptions?.abortSignal;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/providers/src/community/copilot/provider.ts` around lines 502 - 515, The pre-aborted signal check (abortSignal?.aborted) happens after creating/resuming the session, causing unnecessary session creation when callers pass an already-aborted signal; move the abortSignal?.aborted check to before any call that creates a session (before createSession/resumeSession) and only proceed to call createSession or resumeSession once the signal is confirmed not aborted; keep the existing onAbort handler and abort cleanup (session.abort(), finally disconnect) as-is so that runtime aborts are still handled correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/providers/src/community/copilot/provider.ts`:
- Around line 502-515: The pre-aborted signal check (abortSignal?.aborted)
happens after creating/resuming the session, causing unnecessary session
creation when callers pass an already-aborted signal; move the
abortSignal?.aborted check to before any call that creates a session (before
createSession/resumeSession) and only proceed to call createSession or
resumeSession once the signal is confirmed not aborted; keep the existing
onAbort handler and abort cleanup (session.abort(), finally disconnect) as-is so
that runtime aborts are still handled correctly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0fb20b38-e9d9-4759-88ac-3856eee6823a
📒 Files selected for processing (10)
.archon/workflows/e2e-copilot-smoke.yamlpackages/providers/src/community/copilot/agents-translation.test.tspackages/providers/src/community/copilot/binary-resolver.test.tspackages/providers/src/community/copilot/binary-resolver.tspackages/providers/src/community/copilot/config.tspackages/providers/src/community/copilot/mcp-translation.test.tspackages/providers/src/community/copilot/provider.tspackages/providers/src/community/copilot/skills-translation.test.tspackages/providers/src/community/copilot/tool-restrictions.test.tspackages/providers/src/shared/structured-output.ts
✅ Files skipped from review due to trivial changes (1)
- packages/providers/src/community/copilot/config.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- .archon/workflows/e2e-copilot-smoke.yaml
- packages/providers/src/community/copilot/agents-translation.test.ts
|
Thanks for putting this together, @popemkt! Since this is a community provider (
A smoke test workflow in Once that's in, happy to merge. |
Resolves the conflict on packages/providers/src/community/pi/provider.ts. - Upstream's coleam00#1284 (ModelRegistry) and our shared/structured-output extraction both touch the same region. Upstream removed the inline augmentPromptForJsonSchema call site that coleam00#1284 didn't itself need; our branch had moved that function to shared/. Resolution keeps the shared/ extraction (single source of truth for both Pi and Copilot) and re-exports it from pi/provider.ts under the original name so existing Pi callers and tests stay byte-for-byte. - Drops the dead-code lookupPiModel/GetModelFn helper that was a stale leftover from an earlier merge attempt — never had a caller and was superseded by ModelRegistry upstream. - Picks up coleam00#1431 — moves e2e-copilot-smoke.yaml under test-workflows/ alongside the other e2e-*.yaml smokes. Adds e2e-copilot-all-features smoke (mirrors e2e-minimax-smoke): basic chat (PONG) + effort: high (17×23 = 391) + denied_tools: [shell, write] (DENIED_OK) + output_format JSON (best-effort via shared/ structured-output, parsed as {model, ok}). Single bash assert verifies all four end-to-end. Doubles as adoption docs. Validation: - bun run check:bundled, type-check, lint, format:check — all green. - bun --filter @archon/providers test — fully green (Pi included). - Live smoke (Linux + gpt-5-mini, Copilot CLI 1.0.36): e2e-copilot-smoke → 12s, PASS e2e-copilot-all-features → 25s, PASS (all four caps) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.archon/workflows/test-workflows/e2e-copilot-all-features.yaml (1)
22-24: Idle timeouts vs. expected runtime.
hellousesidle_timeout: 60000(60s) andreasoninguses90000(90s) — fine — but the file header advertises~45-90stotal runtime. With four sequential-ish nodes (3 of the 4 fan out fromhello, but assert isall_success), worst-case is ~3min before the assert fails. Consider tightening the header range or noting it's a worst-case ceiling, otherwise CI runners may flag the workflow as hung-looking on slow days.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.archon/workflows/test-workflows/e2e-copilot-all-features.yaml around lines 22 - 24, The workflow's advertised runtime range doesn't match the worst-case sum of node idle_timeouts; update the header or the node timeouts so they align: either shorten idle_timeout on the 'hello' (currently 60000) and/or 'reasoning' (currently 90000) nodes to fit the advertised "~45-90s" ceiling, or expand the header note to explicitly state the worst-case ceiling (≈3 minutes) when accounting for sequential nodes and the final 'assert' with all_success aggregation; reference and adjust the 'hello', 'reasoning', and 'assert' nodes accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.archon/workflows/test-workflows/e2e-copilot-all-features.yaml:
- Around line 35-46: The test labeled id: tool_restricted currently only asserts
the transcript contains "DENIED_OK" (the prompt in prompt: | and the
denied_tools: [shell, write] setting), but the comment wrongly claims it
verifies that the shell tool was not invoked; update the test to either (A)
change the comment to accurately state it only verifies prompt compliance (i.e.,
that the model output contains DENIED_OK) or (B) change the test to exercise
deny-list behavior by replacing the prompt with a task that requires shell
access (e.g., "list files in cwd") and then assert the model refused or the SDK
returned a denied-tool error, or (C) capture and assert the SDK request
payload/debug log to ensure excludedTools includes "shell"; refer to id:
tool_restricted, prompt, denied_tools and the transcript assertion around lines
113-118 when making the change.
---
Nitpick comments:
In @.archon/workflows/test-workflows/e2e-copilot-all-features.yaml:
- Around line 22-24: The workflow's advertised runtime range doesn't match the
worst-case sum of node idle_timeouts; update the header or the node timeouts so
they align: either shorten idle_timeout on the 'hello' (currently 60000) and/or
'reasoning' (currently 90000) nodes to fit the advertised "~45-90s" ceiling, or
expand the header note to explicitly state the worst-case ceiling (≈3 minutes)
when accounting for sequential nodes and the final 'assert' with all_success
aggregation; reference and adjust the 'hello', 'reasoning', and 'assert' nodes
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5e67ecc0-2171-426e-9928-aa0c8ebccee9
📒 Files selected for processing (4)
.archon/workflows/test-workflows/e2e-copilot-all-features.yaml.archon/workflows/test-workflows/e2e-copilot-smoke.yamlpackages/docs-web/src/content/docs/getting-started/ai-assistants.mdpackages/providers/src/community/pi/provider.ts
✅ Files skipped from review due to trivial changes (1)
- packages/providers/src/community/pi/provider.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/docs-web/src/content/docs/getting-started/ai-assistants.md
| # 3. denied_tools — the model is asked to do something it would normally | ||
| # use the shell tool for. With shell denied, it must decline or fall | ||
| # back to inline reasoning. The assert below checks the model did NOT | ||
| # invoke the shell tool by inspecting the result text for a refusal / | ||
| # inline-only marker. | ||
| - id: tool_restricted | ||
| prompt: | | ||
| You have NO shell access. Without running any tools, reply with exactly: | ||
| DENIED_OK | ||
| denied_tools: [shell, write] | ||
| idle_timeout: 60000 | ||
| depends_on: [hello] |
There was a problem hiding this comment.
tool_restricted assertion verifies prompt compliance, not deny-list enforcement.
The comment at lines 35-39 says the assert "checks the model did NOT invoke the shell tool", but the actual check at lines 113-118 only greps the final transcript for DENIED_OK. A model can:
- emit
DENIED_OKwhile attempting a shell call (which would be blocked by the SDK regardless), or - refuse correctly without emitting
DENIED_OK(false negative).
In other words: a DENIED_OK substring proves the model followed the prompt; it doesn't prove excludedTools was wired through, which is the actual capability you're smoking. Two cheap improvements:
- Use a prompt that requires the shell tool to succeed (e.g., "list files in cwd"), and assert refusal/inability — this exercises the deny-list path.
- Or capture and assert on the SDK request payload via debug logging if available.
At minimum, soften the comment so the test claim matches what the assertion proves.
- # 3. denied_tools — the model is asked to do something it would normally
- # use the shell tool for. With shell denied, it must decline or fall
- # back to inline reasoning. The assert below checks the model did NOT
- # invoke the shell tool by inspecting the result text for a refusal /
- # inline-only marker.
+ # 3. denied_tools — the model is told it has no shell access and asked
+ # to reply with DENIED_OK. NOTE: this asserts the model followed the
+ # prompt; SDK-level enforcement of `excludedTools` is covered by
+ # tool-restrictions.test.ts. Treat this as a regression smoke for
+ # the YAML→provider config pass-through, not for SDK enforcement.Also applies to: 113-118
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.archon/workflows/test-workflows/e2e-copilot-all-features.yaml around lines
35 - 46, The test labeled id: tool_restricted currently only asserts the
transcript contains "DENIED_OK" (the prompt in prompt: | and the denied_tools:
[shell, write] setting), but the comment wrongly claims it verifies that the
shell tool was not invoked; update the test to either (A) change the comment to
accurately state it only verifies prompt compliance (i.e., that the model output
contains DENIED_OK) or (B) change the test to exercise deny-list behavior by
replacing the prompt with a task that requires shell access (e.g., "list files
in cwd") and then assert the model refused or the SDK returned a denied-tool
error, or (C) capture and assert the SDK request payload/debug log to ensure
excludedTools includes "shell"; refer to id: tool_restricted, prompt,
denied_tools and the transcript assertion around lines 113-118 when making the
change.
Extends the smoke from 4 to 7 live capabilities, plus a cleanup pass.
New nodes:
- setup_mcp_fixture / setup_skills_fixture (bash) — stage the
fixtures at run time so they don't pollute the repo when idle.
- mcp_demo (AI, mcp: ./.archon/test-fixtures/copilot-mcp.json) —
spawns the canonical @modelcontextprotocol/server-everything stdio
MCP server and asks the model to call its add(2,3) tool.
- skills_demo (AI, skills: [copilot-smoke]) — uses a staged SKILL.md
with a fixed marker.
- agents_demo (AI, inline agents: { smoke-helper: ... }) — asks the
model to invoke a custom sub-agent via the Task tool and surface
its marker.
- cleanup (bash, trigger_rule: all_done) — removes the fixtures
regardless of pass/fail.
Assert hardening:
- Drops the redundant outer single quotes around $nodeId.output
references. The DAG executor already shell-escapes for bash nodes
(coleam00#591 + dag-executor.ts:282 substituteNodeOutputRefs with
escapedForBash=true). Wrapping the engine's pre-quoted value in a
second pair of quotes broke parsing on outputs containing
apostrophes / parens (the `it's the "copilot-smoke" skill (listed
in available skills)` case).
- Routes the human-readable results table to stderr so it surfaces in
the terminal — successful bash node stdout is captured silently as
the node's output for downstream substitution.
- Skills check is soft: assert non-empty + log a WARN if the model
paraphrases. The deterministic proof is the Pino
copilot.skills_resolved event (resolved:1, missing:[]). gpt-5-mini
is known to paraphrase rather than emit exact SKILL.md markers, so
failing the workflow on model-adherence flake would be a false
negative for the wiring itself.
Live evidence on Linux + Copilot CLI 1.0.36 + gpt-5-mini:
- copilot.agents_registered { count: 1, names: ["smoke-helper"] }
- copilot.mcp_loaded { serverNames: ["everything"], missingVars: [] }
- copilot.skills_resolved { resolved: 1, missing: [] }
- mcp_demo returned `5` (model invoked the MCP add tool).
- agents_demo returned `COPILOT_AGENT_MARKER_OK`.
- skills_demo returned skill description chitchat — soft warn, expected.
- Workflow exit: PASS, ~110s end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skill assertion was previously soft (warn-not-fail) because gpt-5-mini
paraphrased the SKILL.md body instead of emitting the marker verbatim.
Agent assertion regressed in the prior strengthening pass — model
confabulated the skill marker rather than invoking the registered agent.
Two changes pin both checks deterministically:
1. Stronger SKILL.md and prompt — the description now carries an
explicit "Use when..." trigger so the metadata scan picks the right
invocation; the body has the directive in three places (frontmatter,
## Output, ## Behavior) so the lazy-loaded body is unambiguous; the
skills_demo prompt explicitly tells the model to "invoke the
copilot-smoke skill" so the SDK loads the body (Copilot lazy-loads
skill bodies on invocation, like Claude's progressive disclosure).
2. Unguessable random tokens. Skill emits SK_a8f3kL2qZTOK; agent
emits AG_n5k7HpT3wAGOK. The previous COPILOT_SKILL_MARKER_OK /
COPILOT_AGENT_MARKER_OK pattern was guessable from training data —
gpt-5-mini confabulated the skill marker even when only the agent
was registered (no skills loaded for that node). Random tokens
make confabulation arithmetically impossible: a token's presence
in the response is mathematical proof the SDK actually loaded that
capability's content into the model's context.
Also: agent renamed `smoke-helper` → `task-responder` so its name no
longer pattern-matches as a skill, and the kebab-case validator (added
in the loader) accepts it. agents_demo prompt now mandates Task-tool
invocation explicitly.
Live evidence on Linux + Copilot CLI 1.0.36 + gpt-5-mini, fresh DB:
── results ──
hello = PONG
reasoning = 391
restricted = DENIED_OK
json.model = gpt-5-mini, json.ok = true
mcp_demo = 5
skills_demo = SK_a8f3kL2qZTOK
agents_demo = ...AG_n5k7HpT3wAGOKAG_n5k7HpT3wAGOK
PASS: all seven capabilities exercised end-to-end
Pino confirms wiring (visible in run output):
copilot.skills_resolved { resolved: 1, missing: [] }
copilot.mcp_loaded { serverNames: ["everything"], missingVars: [] }
copilot.agents_registered { count: 1, names: ["task-responder"] }
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s in smoke
Doc clarification:
Unlike Claude (OAuth subscription OR ANTHROPIC_API_KEY) and Codex (OAuth OR
OPENAI_API_KEY), GitHub Copilot has only ONE auth model: GitHub OAuth bound
to a Copilot subscription. The four "options" previously listed as a flat
bullet list were misleading — they're different *delivery paths for the
same OAuth token*, not separate auth schemes.
The doc now leads with that fact and presents the four paths as a When-to-
Use table, with an explicit warning that GitHub Actions' workflow-scoped
${{ github.token }} does NOT carry Copilot scope.
Smoke quoting fix:
e2e-copilot-smoke.yaml had the same redundant-outer-single-quote pattern
we already fixed in e2e-copilot-all-features (coleam00#591 + dag-executor's
substituteNodeOutputRefs already shell-quotes for bash). Bumped
idle_timeout from 30s → 60s — gpt-5-mini occasionally pauses past the 30s
default on a cold session, leading to the smoke failing with empty output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks @Wirasm — quick run-down on your four asks. Video: my machine's codec installs are not cooperating, but the logs below are from the actual run. Models tested: Configs: Results: one DAG covers all 7 wired capabilities — chat, Smoke files: Full run output —
|
Summary
@github/copilot-sdk) in 2025 that drives the Copilot CLI through a supported JSON-RPC bridge. Archon had no way to use it — the only path to Copilot would have been screen-scraping the interactive TUI.gpt-5,gpt-5-mini, and cross-family models (Claude via Copilot) under an existing GitHub billing relationship. This is the second community provider after Pi (feat(providers): add Pi community provider (@mariozechner/pi-coding-agent) #1270) and validates that the community-provider seam scales.packages/providers/src/community/copilot/wrapping@github/copilot-sdk, registered viaregisterCopilotProvider()withbuiltIn: false. Seven phased commits plus one hermeticity fix on the resolver test: initial scaffold → tool restrictions → MCP → skills → structured output → hardening fixes from CodeRabbit/Devin → sub-agents → hermetic vendor-path test. Nine of the thirteen capability flags are wired end-to-end.builtIn: true. Those remain deliberate follow-ups (plan preserved at.claude/archon/plans/copilot-provider-parity.md). Claude and Codex code paths untouched. Pi provider's behavior is preserved byte-for-byte via re-exports.UX Journey
Before
Copilot was reachable only by screen-scraping the interactive Copilot TUI — no supported path from a workflow.
After
[+]marks fields newly accepted on workflow nodes whenprovider: copilot.Architecture Diagram
Before
After
Shared extractions (no behavior change to Pi):
shared/skills.ts::resolveSkillDirectoriespi/options-translator.ts::resolvePiSkillsshared/structured-output.ts::augmentPromptForJsonSchemapi/provider.tsshared/structured-output.ts::tryParseStructuredOutputpi/event-bridge.tsMCP reuse (no extraction): Copilot imports Claude's existing public
loadMcpConfigre-export directly. Env-var expansion and missing-var detection behave identically across providers.Capability matrix
sessionResumesessionId, reused on resumeenvInjectioneffortControleffort: low|medium|high|max→reasoningEffortthinkingControlthinking:accepted; object form warns (Claude-specific)toolRestrictionsallowed_tools→availableTools,denied_tools→excludedToolsmcpnodeConfig.mcpJSON →SessionConfig.mcpServers,$VARexpandedskillsskills: [name]→ absolute dirs containingSKILL.md, missing names warnstructuredOutputagentsnodeConfig.agents→customAgents; Claude-specific fields warn per agenthooksSessionHooksevent vocabulary diverges from Archon's hook schema; deferredfallbackModelcostControlmaxBudgetUsdenforcement)sandboxonPermissionRequestis policy control, not sandbox semanticsLabel Snapshot
risk: low— purely additive new community provider behindbuiltIn: false; no existing code path changes behavior unless a workflow opts in viaprovider: copilot.size: L— ~1,128 lines added across 28 files (provider + 9 test files + docs + small shared extractions).core(the new provider),dependencies(@github/copilot-sdk ^0.2.2),docs(getting-started AI-assistants page),tests(52 new tests).providers:community/copilot,providers:shared(new shared/skills + shared/structured-output),providers:registry(one new registration entry).Change Metadata
featuremulti— primarilyproviders(new community provider package), with secondary touches todocs(getting-started page) andtests. No changes tocore,workflows,isolation,git,adapters,server,web,cli, orpathsaside from one config-loader line that adds acopilotkey to the assistants config schema.Linked Issue
Validation Evidence
bun run type-check— all 10 packages exit 0;bun run lint— 0 errors 0 warnings;bun run format:check— clean.bun --filter @archon/providers testpasses every per-file batch..exehandling on win32), provider happy path + streaming + resume, tool restrictions, MCP translation, skills translation, structured output, agents translation, PR-review hardening. Plus 6 new tests inregistry.test.tsfor Copilot registration + capability flags +isModelCompatible.e2e-copilot-smoke(basic prompt →COPILOT_OK) — 13.0seffort: highreasoning (17×23 → 391) — 14.1sdenied_tools: [shell, write]passthrough — 13.7sSecurity Impact
process.env(COPILOT_GITHUB_TOKEN,GH_TOKEN,GITHUB_TOKEN) and passed toCopilotClient({ githubToken, ... }). Not persisted by Archon.useLoggedInUser: trueis the default when no token env var is present.cwdlike other providers.enableConfigDiscovery: falseis Archon's default, documented as a trust boundary in the docs.isExecutableFilevalidates the resolved CLI path is a regular file with the exec bit set (posix), not a directory or non-executable.Compatibility / Migration
assistants.copilot.modelto.archon/config.yamland supply auth viacopilot loginor one of three env vars.pi/options-translator.ts,pi/provider.ts, andpi/event-bridge.tsre-export the extracted functions under their historical names.Human Verification
What was personally validated beyond CI:
e2e-copilot-smokeworkflow) — full streaming response,sessionIdreturned.effort: high— multi-digit arithmetic (17×23) returned 391; no truncation.denied_tools: [shell, write]honored, model declined to use blocked tools.useLoggedInUser: truefallback path engages cleanly..exesuffix only applied onwin32— verified viaos.platform()mock inbinary-resolver.test.ts.dag.structured_output_missing(not a hard error).bun --filter @archon/providers testis green for the full Pi batch as well as the new Copilot batches.@github/copilot-sdk's cross-platform contract plus theos.platform()-aware tests.AbortSignal, so this is the documented happy path; I did not artificially exhaust the ceiling.builtIn: trueand the four deferred capabilities (hooks, fallback model, sandbox, costControl) — explicitly out of scope.Side Effects / Blast Radius
@archon/providers— newcommunity/copilot/directory (provider, config, binary resolver, capabilities, registration, 9 test files); newshared/directory (skills + structured-output) extracted from Pi; one new registry entry; one new line inindex.tsfor the registration export.@archon/providersPi — three files trimmed (event-bridge.ts,options-translator.ts,provider.ts) to delegate toshared/, with re-exports preserving every public name and signature.@archon/coreconfig-loader — one additional key (copilot) accepted in theassistantsconfig schema.packages/docs-web— getting-started AI-assistants page gains install/auth instructions, a config example, a feature compatibility table, and aenableConfigDiscoverytrust-boundary note.bun.lock— pulls in@github/copilot-sdk@^0.2.2and its transitive deps.provider: copilotnow load successfully where they previously errored. No existing workflow references it.assistants.copilotto the schema does not invalidate any existing config (Zod additions to an object are non-breaking when the new key is optional).registry.test.tscases gate the Copilot path.capabilities.tsare the public truth for what is wired — anything stillfalse(hooks, fallbackModel, sandbox, costControl) will surface as a clean "capability not supported" warning at workflow load, not a runtime crash.bun run validate(which CI runs) coverscheck:bundled, type-check, lint, format check, and the per-package test suites.Rollback Plan
dev. Eight commits, purely additive, no schema or interface changes — revert is clean.provider: copilotfrom workflow YAML orassistants.copilotfrom.archon/config.yaml.Risks and Mitigations
@github/copilot-sdkis pre-1.0 (v0.2.2) — breaking changes possible on minor bumps.^0.2.2. The 52 new Copilot tests plus type-check gate any API-surface drift.AbortSignalcan leave a Copilot session holding resources untilsendAndWaitresolves (up to 24 h).AbortSignal.dag.structured_output_missing.enableConfigDiscovery: truewould let the Copilot CLI load repo-level config outside Archon's workflow validation surface.false. Docs carry an explicit trust-boundary note.Credits
Original first-class-provider attempt by @mhingston in #1111 — this PR re-implements the same capability behind the community provider registry per @Wirasm's guidance on close. The community-provider seam itself is @Wirasm's (#1270 / #1297). @mhingston is not continuing the work per their comment on #1111.
🤖 Includes AI-assisted development and review.
Summary by CodeRabbit
New Features
Documentation
Tests & CI