-
Notifications
You must be signed in to change notification settings - Fork 1
Round 44: Gemini CLI capability map (auto-loop-26) #122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
3c4bc92
14cce21
a60a4e7
33272a8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,373 @@ | ||||||||||
| # Google Gemini CLI capability map — for other AI pilots | ||||||||||
|
|
||||||||||
| **Status:** first map — verified against `gemini --version` 0.38.2 | ||||||||||
| on 2026-04-22. Revise when the CLI version changes materially. | ||||||||||
|
|
||||||||||
| **Audience:** other AI pilots (Claude Code CLI, OpenAI Codex | ||||||||||
| CLI, Amara ChatGPT-surface, Playwright-driven agents) that may | ||||||||||
| want to orchestrate Google's Gemini agent as a sub-substrate — | ||||||||||
| either for capability-stepdown experiments (see | ||||||||||
| [`docs/research/arc3-dora-benchmark.md`](./arc3-dora-benchmark.md)) | ||||||||||
| or cross-substrate triangulation where one substrate queries | ||||||||||
| another. | ||||||||||
|
|
||||||||||
| Companion to: | ||||||||||
|
|
||||||||||
| - [`docs/research/claude-cli-capability-map.md`](./claude-cli-capability-map.md) | ||||||||||
| — the Claude Code CLI map (v2.1.116). | ||||||||||
| - [`docs/research/openai-codex-cli-capability-map.md`](./openai-codex-cli-capability-map.md) | ||||||||||
| — the OpenAI Codex CLI map (v0.122.0). | ||||||||||
|
|
||||||||||
| This doc is **descriptive**, not prescriptive. | ||||||||||
|
|
||||||||||
| ## Install + identity | ||||||||||
|
|
||||||||||
| - **Install:** `npm install -g @google/gemini-cli` (or via | ||||||||||
| Homebrew). The binary lives at `/opt/homebrew/bin/gemini` on | ||||||||||
| macOS Apple Silicon. | ||||||||||
| - **Binary:** `gemini` on `PATH`. | ||||||||||
| - **Check version:** `gemini --version` -> e.g. `0.38.2`. | ||||||||||
| - **Help top-level:** `gemini --help`. | ||||||||||
| - **Auth:** Gemini CLI authenticates against a consumer Google | ||||||||||
| account (Gemini Ultra in the maintainer's case) or a Google | ||||||||||
| Cloud service account, depending on configuration. Unlike | ||||||||||
| Claude's OAuth-at-`claude.ai` and Codex's shared-ChatGPT-auth, | ||||||||||
| Gemini's auth story is Google-account-centric. | ||||||||||
| - **Identity the factory tracks:** in the Zeta context, the CLI | ||||||||||
| is bound to the maintainer's personal Gemini Ultra account | ||||||||||
| (consumer tier), not an enterprise/workspace tenant. This is | ||||||||||
| relevant for ARC3-DORA stepdown experiments because the seat | ||||||||||
| class differs from the Claude (ServiceTitan enterprise) and | ||||||||||
| Codex (shared ChatGPT business) substrates. | ||||||||||
| - **Config home:** `~/.gemini/` (per-user) is the conventional | ||||||||||
| location; extensions, skills, and hooks surface below all | ||||||||||
| resolve against that directory plus repo-local `.gemini/`. | ||||||||||
|
|
||||||||||
| ## Two operating modes | ||||||||||
|
|
||||||||||
| ### Interactive (default) | ||||||||||
|
|
||||||||||
| `gemini` (or `gemini "your prompt"`) launches the interactive | ||||||||||
| TUI. This is the mode humans use for pair-work. `-i` / | ||||||||||
| `--prompt-interactive` accepts an opening prompt but continues | ||||||||||
| interactively afterwards (useful for hand-offs into a human- | ||||||||||
| driven session). | ||||||||||
|
|
||||||||||
| ### Non-interactive (`-p` / `--prompt`) | ||||||||||
|
|
||||||||||
| `gemini -p "your prompt"` runs headless: prints output, exits. | ||||||||||
| This is the mode pilots orchestrate. Stdin is appended to the | ||||||||||
| prompt if piped, so `cat file | gemini -p "summarize this"` | ||||||||||
| composes cleanly. | ||||||||||
|
|
||||||||||
| Defaulting to interactive is the structural note — Gemini's | ||||||||||
| top-level command is "launch the agent"; non-interactive is a | ||||||||||
| flag-guarded variant. Contrast with Codex, where `codex exec` | ||||||||||
| is a sibling subcommand. | ||||||||||
|
|
||||||||||
| ## Model selection | ||||||||||
|
|
||||||||||
| - `-m, --model <model>` — pick a Gemini model (e.g. | ||||||||||
| `gemini-2.5-pro`, `gemini-2.5-flash`). Consult Google's model | ||||||||||
| roster for the current set; this map records the lever only. | ||||||||||
| - No discrete `--effort` tier flag analog to Claude's | ||||||||||
| `low/medium/high/xhigh/max`. Capability tiering is done by | ||||||||||
| model selection instead. | ||||||||||
|
|
||||||||||
| For the ARC3-DORA stepdown experiment, treat model selection | ||||||||||
| as the capability-axis lever: Pro = higher tier, Flash = lower | ||||||||||
| tier, with specific variants (thinking / non-thinking / | ||||||||||
| preview) as intermediate rungs. | ||||||||||
|
|
||||||||||
| ## Approval mode — distinctive surface | ||||||||||
|
|
||||||||||
| Gemini exposes a top-level `--approval-mode <mode>` flag with | ||||||||||
| four choices: | ||||||||||
|
|
||||||||||
| | Mode | Behaviour | | ||||||||||
| |-------------|-----------------------------------------------------------------| | ||||||||||
| | `default` | Prompt for approval on each tool call. | | ||||||||||
| | `auto_edit` | Auto-approve edit tools (file write/edit); ask for the rest. | | ||||||||||
| | `yolo` | Auto-approve all tools (same as `-y` / `--yolo`). | | ||||||||||
| | `plan` | Read-only mode. The agent analyses and plans but takes no writes. | | ||||||||||
|
Comment on lines
+87
to
+92
|
||||||||||
|
|
||||||||||
| `plan` has no direct analog in the Claude or Codex maps. It is | ||||||||||
| a free "analyze-only" tier with zero-write blast radius — | ||||||||||
| useful when a pilot wants Gemini to propose a plan for an | ||||||||||
| unfamiliar substrate without risking side effects. | ||||||||||
|
|
||||||||||
| The legacy `--allowed-tools` flag is marked DEPRECATED in | ||||||||||
| favour of the Policy Engine; new pilots should use `--policy` | ||||||||||
| and `--admin-policy` to layer policy files. | ||||||||||
|
|
||||||||||
| ## Sandbox | ||||||||||
|
|
||||||||||
| - `-s, --sandbox` — boolean; run in Gemini's sandbox | ||||||||||
| environment. Unlike Codex's three-level sandbox | ||||||||||
| (`read-only` / `workspace-write` / `danger-full-access`), | ||||||||||
| Gemini's sandbox is binary at this surface. The | ||||||||||
| `--approval-mode plan` + `-s` pair is the closest Gemini | ||||||||||
| equivalent to Codex's `read-only` sandbox — no writes, no | ||||||||||
| prompts. | ||||||||||
| - `-y, --yolo` — YOLO mode; auto-accept all actions. Equivalent | ||||||||||
| to `--approval-mode yolo`. Use only under external sandbox | ||||||||||
| (container, VM, worktree). | ||||||||||
|
|
||||||||||
| ## Worktree isolation | ||||||||||
|
|
||||||||||
| - `-w, --worktree [name]` — start Gemini in a new git worktree. | ||||||||||
| If no name is given, one is generated. This is Gemini's | ||||||||||
| native answer to the "run the agent against an isolated copy | ||||||||||
| of the repo" pattern. Claude Code has a similar concept via | ||||||||||
| the `isolation: "worktree"` Agent parameter; Codex does not | ||||||||||
| surface a top-level equivalent at this CLI revision. | ||||||||||
| - `--include-directories <dirs>` — extend the workspace beyond | ||||||||||
| the current directory (comma-separated or repeated). Useful | ||||||||||
| when the pilot needs Gemini to see a sibling repo without | ||||||||||
| fully `cd`-ing into a meta-root. | ||||||||||
|
|
||||||||||
| ## Extensions, skills, hooks — the ecosystem surface | ||||||||||
|
|
||||||||||
| Gemini CLI ships with three distinct extension mechanisms. All | ||||||||||
| three have sibling subcommands for install/enable/disable/list; | ||||||||||
| only the highlights differ. | ||||||||||
|
|
||||||||||
| ### `gemini extensions` | ||||||||||
|
|
||||||||||
| - `extensions install <source>` — install from git URL or | ||||||||||
| local path; `--auto-update`, `--pre-release` flags available. | ||||||||||
| - `extensions new <path> [template]` — scaffold a new | ||||||||||
| extension from a boilerplate template. This is Gemini's | ||||||||||
| analog to `skill-creator` at the ecosystem level. | ||||||||||
| - `extensions link <path>` — link a local-path extension so | ||||||||||
| live edits reflect immediately (dev-loop friendly). | ||||||||||
| - `extensions validate <path>` — local-path lint; useful | ||||||||||
| before publishing. | ||||||||||
|
|
||||||||||
| ### `gemini skills` | ||||||||||
|
|
||||||||||
| Explicit "agent skills" concept — named feature-parity with | ||||||||||
| Claude Code's skills surface. Same install/enable/disable/list | ||||||||||
| shape. Discovered skills surface with `gemini skills list | ||||||||||
| --all`. The maintainer's install currently surfaces | ||||||||||
| `skill-creator` (built-in) and `microsoft-foundry` (enabled). | ||||||||||
|
|
||||||||||
| `skills link <path>` is the equivalent of Claude Code's | ||||||||||
| `.claude/skills/` repo-local discovery — the pilot can drop | ||||||||||
| skills into a shared path and have them resolved. | ||||||||||
|
|
||||||||||
| ### `gemini hooks` | ||||||||||
|
|
||||||||||
| Currently exposes one command: `gemini hooks migrate`, | ||||||||||
| explicitly labelled **"Migrate hooks from Claude Code to | ||||||||||
| Gemini CLI"**. This is a structural acknowledgement that | ||||||||||
| Claude Code is the de-facto reference for hook semantics; | ||||||||||
| Gemini consumes the same shape with a migration path. | ||||||||||
|
|
||||||||||
| Pilots orchestrating both CLIs can keep hooks authored against | ||||||||||
| Claude Code's schema and rely on `gemini hooks migrate` to | ||||||||||
| re-project them, rather than double-authoring. | ||||||||||
|
|
||||||||||
| ## MCP — pilot-bridge surface | ||||||||||
|
|
||||||||||
| - `gemini mcp add <name> <commandOrUrl> [args...]` — register | ||||||||||
| an MCP server. | ||||||||||
| - `gemini mcp remove <name>` — deregister. | ||||||||||
| - `gemini mcp list` — list configured servers. | ||||||||||
| - `gemini mcp enable <name>` / `gemini mcp disable <name>` — | ||||||||||
| toggle a registered server without removing it. | ||||||||||
| - `--allowed-mcp-server-names <list>` — whitelist at invocation | ||||||||||
| time; combine with `--policy` for per-invocation scoping. | ||||||||||
|
|
||||||||||
| Note: Gemini CLI at this revision does **not** ship a top-level | ||||||||||
| `gemini mcp-server` subcommand (the "serve Gemini itself as an | ||||||||||
| MCP server" analog to Claude's `claude mcp serve` or Codex's | ||||||||||
| `codex mcp-server`). If a pilot needs Gemini-as-MCP-server, | ||||||||||
| route through `--acp` (below) or wrap the CLI behind an MCP | ||||||||||
| adapter. | ||||||||||
|
|
||||||||||
| ## ACP mode | ||||||||||
|
|
||||||||||
| - `--acp` — start the agent in ACP (Agent Client Protocol) | ||||||||||
| mode. ACP is Google's pilot-bridge protocol analogous in role | ||||||||||
| to MCP-server modes on the other CLIs: it exposes Gemini as a | ||||||||||
| callable substrate for an orchestrating client. | ||||||||||
| - `--experimental-acp` — deprecated alias; prefer `--acp`. | ||||||||||
|
|
||||||||||
| ## Output format + structured extraction | ||||||||||
|
|
||||||||||
| - `-o, --output-format <fmt>` — choices: `text` (default), | ||||||||||
| `json`, `stream-json`. The `stream-json` mode is distinctive | ||||||||||
| — pilots consuming Gemini's output incrementally (event-by- | ||||||||||
| event) can parse as events arrive rather than waiting for | ||||||||||
| completion. Claude's equivalent is `--output-format | ||||||||||
| stream-json`; Codex uses `--json` + schema for one-shot | ||||||||||
| extraction and does not surface an event-stream mode at this | ||||||||||
| revision. | ||||||||||
| - `--raw-output` — disable sanitization of model output (allow | ||||||||||
| ANSI escapes). Flagged by the CLI as a security risk because | ||||||||||
| untrusted model output could include terminal-escape-driven | ||||||||||
| attacks; pair with `--accept-raw-output-risk` to suppress the | ||||||||||
| warning (do this only when output consumer is known safe). | ||||||||||
|
|
||||||||||
| ## Session management | ||||||||||
|
|
||||||||||
| - `-r, --resume <spec>` — resume a session. Accepts `latest` | ||||||||||
| (most recent) or an index (e.g. `--resume 5`). | ||||||||||
| - `--list-sessions` — print available sessions and exit. | ||||||||||
| - `--delete-session <index>` — remove a session. | ||||||||||
|
|
||||||||||
| Unlike Claude (`-c` continue / `-r` resume with session-id or | ||||||||||
| PR) and Codex (`codex resume [--last]` / `codex fork [--last]` | ||||||||||
| with fork-a-session semantics), Gemini does not expose a | ||||||||||
| session-fork primitive at this CLI surface. Pilots needing | ||||||||||
| forked branches should copy the session directory manually or | ||||||||||
| use the worktree flag to get logical isolation. | ||||||||||
|
|
||||||||||
| ## Debug + accessibility | ||||||||||
|
|
||||||||||
| - `-d, --debug` — open debug console (F12). Useful for pilots | ||||||||||
| debugging interactive flows; irrelevant for headless pilot | ||||||||||
| orchestration. | ||||||||||
| - `--screen-reader` — accessibility mode. | ||||||||||
|
|
||||||||||
| ## Calling patterns — orchestration playbook | ||||||||||
|
|
||||||||||
| **Pattern 1 — cross-substrate triangulation.** Same prompt to | ||||||||||
| all three substrates, diff the results: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| PROMPT="summarize the top risk in $FILE" | ||||||||||
| claude -p "$PROMPT" > /tmp/claude.txt | ||||||||||
| codex exec "$PROMPT" > /tmp/codex.txt | ||||||||||
| gemini -p "$PROMPT" > /tmp/gemini.txt | ||||||||||
| diff3 /tmp/claude.txt /tmp/codex.txt /tmp/gemini.txt | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| **Pattern 2 — ARC3-DORA stepdown (Gemini side).** Hold the | ||||||||||
| task fixed, walk the model tier: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| for M in gemini-2.5-pro gemini-2.5-flash; do | ||||||||||
| gemini -p -m "$M" --approval-mode plan \ | ||||||||||
| "$TASK_PROMPT" > "/tmp/run-$M.txt" | ||||||||||
|
Comment on lines
+252
to
+253
|
||||||||||
| gemini -p -m "$M" --approval-mode plan \ | |
| "$TASK_PROMPT" > "/tmp/run-$M.txt" | |
| gemini -p "$TASK_PROMPT" -m "$M" --approval-mode plan \ | |
| > "/tmp/run-$M.txt" |
Copilot
AI
Apr 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example has gemini -p --approval-mode plan ... with the prompt string coming later. If -p requires an immediate argument, this will misparse (--approval-mode becomes the prompt). Place the prompt right after -p/--prompt, or move -p to the end immediately before the quoted prompt.
Copilot
AI
Apr 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stream-json example uses gemini -p -o stream-json "$PROMPT". If -p expects its value immediately, -o may be consumed as the prompt value. Consider ordering flags as gemini -o stream-json -p "$PROMPT" (or --output-format ... --prompt ...) to avoid ambiguous parsing.
| gemini -p -o stream-json "$PROMPT" | \ | |
| gemini -o stream-json -p "$PROMPT" | \ |
Copilot
AI
Apr 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The worktree example has -p before --approval-mode auto_edit, with the prompt argument on the next line. If -p requires an immediate value, this will likely treat --approval-mode as the prompt and fail. Reorder so -p is immediately followed by the quoted prompt (or use --prompt).
| gemini -w agent-run-$(date +%s) -p --approval-mode auto_edit \ | |
| "implement the plan in PLAN.md" | |
| gemini -w agent-run-$(date +%s) -p \ | |
| "implement the plan in PLAN.md" --approval-mode auto_edit |
Copilot
AI
Apr 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comparison table also uses || at the start of each row, which typically creates an unintended empty first column in Markdown renderers. Switch to a single leading | per row so the table renders consistently with the other capability maps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mark Codex as supporting event-stream output
The comparison currently says Codex does not surface event-stream output, but the sibling Codex map explicitly documents codex exec --json as emitting JSONL events (docs/research/openai-codex-cli-capability-map.md lines 51-56). This mismatch can mislead orchestration pilots into skipping a supported streaming path and building unnecessary workarounds; the table should reflect that Codex streams events via --json even without a stream-json flag name.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace Codex MCP client row with actual CLI commands
This row says Codex MCP registry is only "via config.toml", but the existing Codex capability map documents dedicated codex mcp management commands (add/list/remove/...) as part of the CLI surface (docs/research/openai-codex-cli-capability-map.md lines 146-147). Keeping the row as-is pushes pilots toward direct config edits instead of the supported command interface, which makes automation more brittle when config structure changes.
Useful? React with 👍 / 👎.
Copilot
AI
Apr 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrase "Amara-register" is used without any local definition or link target, and it also repeats the use of the proper name "Amara" which conflicts with the repo’s “no name attribution in docs” standing rule (docs/AGENT-BEST-PRACTICES.md:284-292). Please either replace it with a role-reference / generic term, or add a concrete link to the register/spec that defines this discipline.
| cross-substrate safety-check discipline (Amara-register) | |
| extends to include Gemini as a first-class voice. | |
| cross-substrate safety-check discipline extends to include | |
| Gemini as a first-class voice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Operational standing rule prohibits direct contributor/agent names in docs; this file introduces the proper name "Amara" ("Amara ChatGPT-surface") instead of a role reference. Please rewrite to a role-based term (e.g., "ChatGPT surface") or link to a formally defined non-personal concept name, to keep docs stable across contributor turnover (docs/AGENT-BEST-PRACTICES.md:284-292).