Skip to content
Closed
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
373 changes: 373 additions & 0 deletions docs/research/gemini-cli-capability-map.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,373 @@
# Google Gemini CLI capability map — for other AI pilots

**Status:** first map — verified against `gemini --version` 0.38.2
on 2026-04-22. Revise when the CLI version changes materially.

**Audience:** other AI pilots (Claude Code CLI, OpenAI Codex
CLI, Amara ChatGPT-surface, Playwright-driven agents) that may

Copilot AI Apr 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operational standing rule prohibits direct contributor/agent names in docs; this file introduces the proper name "Amara" ("Amara ChatGPT-surface") instead of a role reference. Please rewrite to a role-based term (e.g., "ChatGPT surface") or link to a formally defined non-personal concept name, to keep docs stable across contributor turnover (docs/AGENT-BEST-PRACTICES.md:284-292).

Suggested change
CLI, Amara ChatGPT-surface, Playwright-driven agents) that may
CLI, ChatGPT surface, Playwright-driven agents) that may

Copilot uses AI. Check for mistakes.
want to orchestrate Google's Gemini agent as a sub-substrate —
either for capability-stepdown experiments (see
[`docs/research/arc3-dora-benchmark.md`](./arc3-dora-benchmark.md))
or cross-substrate triangulation where one substrate queries
another.

Companion to:

- [`docs/research/claude-cli-capability-map.md`](./claude-cli-capability-map.md)
— the Claude Code CLI map (v2.1.116).
- [`docs/research/openai-codex-cli-capability-map.md`](./openai-codex-cli-capability-map.md)
— the OpenAI Codex CLI map (v0.122.0).

This doc is **descriptive**, not prescriptive.

## Install + identity

- **Install:** `npm install -g @google/gemini-cli` (or via
Homebrew). The binary lives at `/opt/homebrew/bin/gemini` on
macOS Apple Silicon.
- **Binary:** `gemini` on `PATH`.
- **Check version:** `gemini --version` -> e.g. `0.38.2`.
- **Help top-level:** `gemini --help`.
- **Auth:** Gemini CLI authenticates against a consumer Google
account (Gemini Ultra in the maintainer's case) or a Google
Cloud service account, depending on configuration. Unlike
Claude's OAuth-at-`claude.ai` and Codex's shared-ChatGPT-auth,
Gemini's auth story is Google-account-centric.
- **Identity the factory tracks:** in the Zeta context, the CLI
is bound to the maintainer's personal Gemini Ultra account
(consumer tier), not an enterprise/workspace tenant. This is
relevant for ARC3-DORA stepdown experiments because the seat
class differs from the Claude (ServiceTitan enterprise) and
Codex (shared ChatGPT business) substrates.
- **Config home:** `~/.gemini/` (per-user) is the conventional
location; extensions, skills, and hooks surface below all
resolve against that directory plus repo-local `.gemini/`.

## Two operating modes

### Interactive (default)

`gemini` (or `gemini "your prompt"`) launches the interactive
TUI. This is the mode humans use for pair-work. `-i` /
`--prompt-interactive` accepts an opening prompt but continues
interactively afterwards (useful for hand-offs into a human-
driven session).

### Non-interactive (`-p` / `--prompt`)

`gemini -p "your prompt"` runs headless: prints output, exits.
This is the mode pilots orchestrate. Stdin is appended to the
prompt if piped, so `cat file | gemini -p "summarize this"`
composes cleanly.

Defaulting to interactive is the structural note — Gemini's
top-level command is "launch the agent"; non-interactive is a
flag-guarded variant. Contrast with Codex, where `codex exec`
is a sibling subcommand.

## Model selection

- `-m, --model <model>` — pick a Gemini model (e.g.
`gemini-2.5-pro`, `gemini-2.5-flash`). Consult Google's model
roster for the current set; this map records the lever only.
- No discrete `--effort` tier flag analog to Claude's
`low/medium/high/xhigh/max`. Capability tiering is done by
model selection instead.

For the ARC3-DORA stepdown experiment, treat model selection
as the capability-axis lever: Pro = higher tier, Flash = lower
tier, with specific variants (thinking / non-thinking /
preview) as intermediate rungs.

## Approval mode — distinctive surface

Gemini exposes a top-level `--approval-mode <mode>` flag with
four choices:

| Mode | Behaviour |
|-------------|-----------------------------------------------------------------|
| `default` | Prompt for approval on each tool call. |
| `auto_edit` | Auto-approve edit tools (file write/edit); ask for the rest. |
| `yolo` | Auto-approve all tools (same as `-y` / `--yolo`). |
| `plan` | Read-only mode. The agent analyses and plans but takes no writes. |
Comment on lines +87 to +92

Copilot AI Apr 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approval-mode table is written with a double leading pipe (|| ...). In standard Markdown this renders an empty first column and can misalign the table. Use a single leading | for each row (and keep the same number of columns across rows).

Copilot uses AI. Check for mistakes.

`plan` has no direct analog in the Claude or Codex maps. It is
a free "analyze-only" tier with zero-write blast radius —
useful when a pilot wants Gemini to propose a plan for an
unfamiliar substrate without risking side effects.

The legacy `--allowed-tools` flag is marked DEPRECATED in
favour of the Policy Engine; new pilots should use `--policy`
and `--admin-policy` to layer policy files.

## Sandbox

- `-s, --sandbox` — boolean; run in Gemini's sandbox
environment. Unlike Codex's three-level sandbox
(`read-only` / `workspace-write` / `danger-full-access`),
Gemini's sandbox is binary at this surface. The
`--approval-mode plan` + `-s` pair is the closest Gemini
equivalent to Codex's `read-only` sandbox — no writes, no
prompts.
- `-y, --yolo` — YOLO mode; auto-accept all actions. Equivalent
to `--approval-mode yolo`. Use only under external sandbox
(container, VM, worktree).

## Worktree isolation

- `-w, --worktree [name]` — start Gemini in a new git worktree.
If no name is given, one is generated. This is Gemini's
native answer to the "run the agent against an isolated copy
of the repo" pattern. Claude Code has a similar concept via
the `isolation: "worktree"` Agent parameter; Codex does not
surface a top-level equivalent at this CLI revision.
- `--include-directories <dirs>` — extend the workspace beyond
the current directory (comma-separated or repeated). Useful
when the pilot needs Gemini to see a sibling repo without
fully `cd`-ing into a meta-root.

## Extensions, skills, hooks — the ecosystem surface

Gemini CLI ships with three distinct extension mechanisms. All
three have sibling subcommands for install/enable/disable/list;
only the highlights differ.

### `gemini extensions`

- `extensions install <source>` — install from git URL or
local path; `--auto-update`, `--pre-release` flags available.
- `extensions new <path> [template]` — scaffold a new
extension from a boilerplate template. This is Gemini's
analog to `skill-creator` at the ecosystem level.
- `extensions link <path>` — link a local-path extension so
live edits reflect immediately (dev-loop friendly).
- `extensions validate <path>` — local-path lint; useful
before publishing.

### `gemini skills`

Explicit "agent skills" concept — named feature-parity with
Claude Code's skills surface. Same install/enable/disable/list
shape. Discovered skills surface with `gemini skills list
--all`. The maintainer's install currently surfaces
`skill-creator` (built-in) and `microsoft-foundry` (enabled).

`skills link <path>` is the equivalent of Claude Code's
`.claude/skills/` repo-local discovery — the pilot can drop
skills into a shared path and have them resolved.

### `gemini hooks`

Currently exposes one command: `gemini hooks migrate`,
explicitly labelled **"Migrate hooks from Claude Code to
Gemini CLI"**. This is a structural acknowledgement that
Claude Code is the de-facto reference for hook semantics;
Gemini consumes the same shape with a migration path.

Pilots orchestrating both CLIs can keep hooks authored against
Claude Code's schema and rely on `gemini hooks migrate` to
re-project them, rather than double-authoring.

## MCP — pilot-bridge surface

- `gemini mcp add <name> <commandOrUrl> [args...]` — register
an MCP server.
- `gemini mcp remove <name>` — deregister.
- `gemini mcp list` — list configured servers.
- `gemini mcp enable <name>` / `gemini mcp disable <name>` —
toggle a registered server without removing it.
- `--allowed-mcp-server-names <list>` — whitelist at invocation
time; combine with `--policy` for per-invocation scoping.

Note: Gemini CLI at this revision does **not** ship a top-level
`gemini mcp-server` subcommand (the "serve Gemini itself as an
MCP server" analog to Claude's `claude mcp serve` or Codex's
`codex mcp-server`). If a pilot needs Gemini-as-MCP-server,
route through `--acp` (below) or wrap the CLI behind an MCP
adapter.

## ACP mode

- `--acp` — start the agent in ACP (Agent Client Protocol)
mode. ACP is Google's pilot-bridge protocol analogous in role
to MCP-server modes on the other CLIs: it exposes Gemini as a
callable substrate for an orchestrating client.
- `--experimental-acp` — deprecated alias; prefer `--acp`.

## Output format + structured extraction

- `-o, --output-format <fmt>` — choices: `text` (default),
`json`, `stream-json`. The `stream-json` mode is distinctive
— pilots consuming Gemini's output incrementally (event-by-
event) can parse as events arrive rather than waiting for
completion. Claude's equivalent is `--output-format
stream-json`; Codex uses `--json` + schema for one-shot
extraction and does not surface an event-stream mode at this
revision.
- `--raw-output` — disable sanitization of model output (allow
ANSI escapes). Flagged by the CLI as a security risk because
untrusted model output could include terminal-escape-driven
attacks; pair with `--accept-raw-output-risk` to suppress the
warning (do this only when output consumer is known safe).

## Session management

- `-r, --resume <spec>` — resume a session. Accepts `latest`
(most recent) or an index (e.g. `--resume 5`).
- `--list-sessions` — print available sessions and exit.
- `--delete-session <index>` — remove a session.

Unlike Claude (`-c` continue / `-r` resume with session-id or
PR) and Codex (`codex resume [--last]` / `codex fork [--last]`
with fork-a-session semantics), Gemini does not expose a
session-fork primitive at this CLI surface. Pilots needing
forked branches should copy the session directory manually or
use the worktree flag to get logical isolation.

## Debug + accessibility

- `-d, --debug` — open debug console (F12). Useful for pilots
debugging interactive flows; irrelevant for headless pilot
orchestration.
- `--screen-reader` — accessibility mode.

## Calling patterns — orchestration playbook

**Pattern 1 — cross-substrate triangulation.** Same prompt to
all three substrates, diff the results:

```bash
PROMPT="summarize the top risk in $FILE"
claude -p "$PROMPT" > /tmp/claude.txt
codex exec "$PROMPT" > /tmp/codex.txt
gemini -p "$PROMPT" > /tmp/gemini.txt
diff3 /tmp/claude.txt /tmp/codex.txt /tmp/gemini.txt
```

**Pattern 2 — ARC3-DORA stepdown (Gemini side).** Hold the
task fixed, walk the model tier:

```bash
for M in gemini-2.5-pro gemini-2.5-flash; do
gemini -p -m "$M" --approval-mode plan \
"$TASK_PROMPT" > "/tmp/run-$M.txt"
Comment on lines +252 to +253

Copilot AI Apr 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the stepdown loop example, -p/--prompt appears before other flags and the prompt argument is provided later. Many CLIs require an option's value to immediately follow the option, so this can be parsed as -p receiving -m (or --approval-mode) as its value. Reorder so the prompt value immediately follows -p (or use --prompt "..." at the end).

Suggested change
gemini -p -m "$M" --approval-mode plan \
"$TASK_PROMPT" > "/tmp/run-$M.txt"
gemini -p "$TASK_PROMPT" -m "$M" --approval-mode plan \
> "/tmp/run-$M.txt"

Copilot uses AI. Check for mistakes.
done
```

`--approval-mode plan` keeps the run read-only, so DORA-MTTR
isn't contaminated by write-path variance between tiers.

**Pattern 3 — read-only exploration of unfamiliar surface.**
The distinctive Gemini pattern — use `plan` mode when pointing
the agent at a substrate the pilot doesn't want to risk
mutating:

```bash
gemini -p --approval-mode plan \
"map the public API of $REPO and list undocumented surfaces"
```
Comment on lines +266 to +268

Copilot AI Apr 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example has gemini -p --approval-mode plan ... with the prompt string coming later. If -p requires an immediate argument, this will misparse (--approval-mode becomes the prompt). Place the prompt right after -p/--prompt, or move -p to the end immediately before the quoted prompt.

Copilot uses AI. Check for mistakes.

**Pattern 4 — structured extraction via JSON stream.** For
event-by-event consumption:

```bash
gemini -p -o stream-json "$PROMPT" | \

Copilot AI Apr 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stream-json example uses gemini -p -o stream-json "$PROMPT". If -p expects its value immediately, -o may be consumed as the prompt value. Consider ordering flags as gemini -o stream-json -p "$PROMPT" (or --output-format ... --prompt ...) to avoid ambiguous parsing.

Suggested change
gemini -p -o stream-json "$PROMPT" | \
gemini -o stream-json -p "$PROMPT" | \

Copilot uses AI. Check for mistakes.
while read -r evt; do
echo "$evt" | jq -c '.type, .content'
done
```

**Pattern 5 — isolated worktree agent.** Run Gemini against
its own worktree so writes don't collide with the primary
working copy:

```bash
gemini -w agent-run-$(date +%s) -p --approval-mode auto_edit \
"implement the plan in PLAN.md"
Comment on lines +285 to +286

Copilot AI Apr 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The worktree example has -p before --approval-mode auto_edit, with the prompt argument on the next line. If -p requires an immediate value, this will likely treat --approval-mode as the prompt and fail. Reorder so -p is immediately followed by the quoted prompt (or use --prompt).

Suggested change
gemini -w agent-run-$(date +%s) -p --approval-mode auto_edit \
"implement the plan in PLAN.md"
gemini -w agent-run-$(date +%s) -p \
"implement the plan in PLAN.md" --approval-mode auto_edit

Copilot uses AI. Check for mistakes.
```

**Pattern 6 — hook migration from Claude Code.** Pilot
standardises hook authoring on Claude Code's schema, migrates
to Gemini when needed:

```bash
gemini hooks migrate
```

## Gemini vs Claude vs Codex — comparison

| Concern | Claude Code | OpenAI Codex | Google Gemini |
|-------------------------------|-----------------------------------------------|---------------------------------------------------|----------------------------------------------|
| Non-interactive | `--print` / `-p` (flag) | `codex exec` (subcommand) | `-p` / `--prompt` (flag) |
| Model/capability lever | `--effort low..max` (discrete tiers) | `-m` / `-c model="..."` + `--profile` | `-m` / `--model` |
Comment on lines +299 to +302

Copilot AI Apr 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comparison table also uses || at the start of each row, which typically creates an unintended empty first column in Markdown renderers. Switch to a single leading | per row so the table renders consistently with the other capability maps.

Copilot uses AI. Check for mistakes.
| Budget ceiling flag | `--max-budget-usd` | **None** | **None** |
| Sandbox levels | `--permission-mode ...` | `-s read-only / workspace-write / danger-full-access` | `-s` (binary) + `--approval-mode` |
| Read-only / plan mode | (not a top-level flag) | `-s read-only` | `--approval-mode plan` (distinctive) |
| YOLO / auto-all | `--dangerously-skip-permissions` | `--dangerously-bypass-approvals-and-sandbox` | `-y` / `--yolo` / `--approval-mode yolo` |
| Worktree isolation | Agent param `isolation: "worktree"` | (not a top-level flag) | `-w` / `--worktree` |
| Structured output | `--json-schema` + `--output-format=json` | `--output-schema` + `--json` | `-o text/json/stream-json` |
| Event-stream output | `--output-format stream-json` | (not surfaced) | `-o stream-json` |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mark Codex as supporting event-stream output

The comparison currently says Codex does not surface event-stream output, but the sibling Codex map explicitly documents codex exec --json as emitting JSONL events (docs/research/openai-codex-cli-capability-map.md lines 51-56). This mismatch can mislead orchestration pilots into skipping a supported streaming path and building unnecessary workarounds; the table should reflect that Codex streams events via --json even without a stream-json flag name.

Useful? React with 👍 / 👎.

| MCP serve (pilot bridge) | `claude mcp serve` | `codex mcp-server` | `--acp` (ACP protocol, not MCP) |
| MCP registry (as client) | `claude mcp add/list/...` | (via config.toml) | `gemini mcp add/list/enable/disable` |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Replace Codex MCP client row with actual CLI commands

This row says Codex MCP registry is only "via config.toml", but the existing Codex capability map documents dedicated codex mcp management commands (add/list/remove/...) as part of the CLI surface (docs/research/openai-codex-cli-capability-map.md lines 146-147). Keeping the row as-is pushes pilots toward direct config edits instead of the supported command interface, which makes automation more brittle when config structure changes.

Useful? React with 👍 / 👎.

| Skills mechanism | `.claude/skills/` | (not surfaced as "skills") | `gemini skills install/link/list` |
| Hooks mechanism | settings.json hooks | (not surfaced) | `gemini hooks` + `migrate` (Claude→Gemini) |
| Session resume | `-c` / `-r` | `codex resume [--last]` | `-r [latest\|N]` |
| Session fork | `--fork-session` | `codex fork [--last]` | (not surfaced) |

**Structural observation:** each CLI lands the
interactive/non-interactive split differently (Claude = flag,
Codex = subcommand, Gemini = flag with interactive-default).
The three ecosystems' extension models also diverge: Claude has
skills + plugins + hooks as distinct; Codex has a minimal
extension surface; Gemini splits into three parallel mechanisms
(extensions / skills / hooks) with `hooks migrate` explicitly
bridging to Claude Code. Pilots orchestrating all three should
treat the three calling conventions as distinct.

## What this map does NOT say

- **When to call Gemini vs Claude vs Codex.** Routing is a
separate decision; this map only describes surfaces.
- **How Google's pricing works.** Consult Google's Gemini /
Google Cloud billing docs; this map only notes that
`--max-budget-usd` equivalent is absent.
- **Which model to pick.** Consult Google's current model
roster; this map records the `-m` / `--model` lever only.
- **Prompt engineering specifics.** Per-task concern.
- **Historical flags.** Only the current `0.38.2` surface is
mapped; future pilots should re-run `gemini --help`.
- **Security analysis of `--raw-output` or `yolo`.** Treat as
externally-sandboxed-environment-only flags.
- **Consumer vs enterprise Google account semantics.** The
maintainer's CLI binds to a personal Gemini Ultra seat; how
that differs from a Google Cloud service-account binding in
terms of rate limits, retention, or audit-log surface is
out-of-scope here.

## How this doc composes with the factory

- [`docs/research/arc3-dora-benchmark.md`](./arc3-dora-benchmark.md)
— Gemini is a cross-provider substrate for the ARC3-DORA
stepdown experiment; the `--approval-mode plan` mode is a
free low-risk tier.
- [`docs/research/claude-cli-capability-map.md`](./claude-cli-capability-map.md)
— sibling map for the Claude CLI surface; the comparison
table above points back to it.
- [`docs/research/openai-codex-cli-capability-map.md`](./openai-codex-cli-capability-map.md)
— sibling map for the Codex CLI surface; the comparison
table above points back to it.
- **Three-substrate triangulation** (Claude + Codex + Gemini)
is now fully documented with this map landing. The
cross-substrate safety-check discipline (Amara-register)
extends to include Gemini as a first-class voice.
Comment on lines +361 to +362

Copilot AI Apr 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrase "Amara-register" is used without any local definition or link target, and it also repeats the use of the proper name "Amara" which conflicts with the repo’s “no name attribution in docs” standing rule (docs/AGENT-BEST-PRACTICES.md:284-292). Please either replace it with a role-reference / generic term, or add a concrete link to the register/spec that defines this discipline.

Suggested change
cross-substrate safety-check discipline (Amara-register)
extends to include Gemini as a first-class voice.
cross-substrate safety-check discipline extends to include
Gemini as a first-class voice.

Copilot uses AI. Check for mistakes.

## Revision notes

- 2026-04-22 — first map (auto-loop-26). Verified against
`gemini --version` 0.38.2 on `/opt/homebrew/bin/gemini`,
`gemini --help` top-level surface, plus `mcp` / `extensions`
/ `skills` / `hooks` subcommand help. Discovered skills
include `skill-creator` (built-in) and `microsoft-foundry`
(enabled). Deprecated alias `--experimental-acp` and
deprecated `--allowed-tools` flag documented with their
current replacements (`--acp` and `--policy`).
Loading