Skip to content
Merged
24 changes: 23 additions & 1 deletion .agent/AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,7 @@ See `subagent-index.toon` for complete listing of agents, subagents, workflows,
| Code quality | `tools/code-review/code-standards.md` |
| Git/PRs | `workflows/git-workflow.md`, `tools/git/github-cli.md` |
| Releases | `workflows/release.md`, `workflows/version-bump.md` |
| Browser | `tools/browser/stagehand.md` or `tools/browser/playwright.md` |
| Browser | `tools/browser/browser-automation.md` (decision tree, then tool-specific subagent) |
| WordPress | `tools/wordpress/wp-dev.md`, `tools/wordpress/mainwp.md` |
| SEO | `seo/dataforseo.md`, `seo/google-search-console.md` |
| Video | `tools/video/video-prompt-design.md`, `tools/video/remotion.md`, `tools/video/higgsfield.md` |
Expand Down Expand Up @@ -213,6 +213,28 @@ Import community skills: `aidevops skill add <source>` (→ `*-skill.md` suffix)
└── memory/ # Cross-session patterns (SQLite FTS5)
```

## Browser Automation

**When to use a browser** (proactively, without being asked):
- Verifying a dev server works after changes (navigate, check content, screenshot if errors)
- Testing forms, auth flows, or UI after code changes
- Logging into websites to submit content, manage accounts, or extract data
- Verifying deployments are live and rendering correctly
- Debugging frontend issues (check console errors, network requests)

**How to choose a tool**: Read `tools/browser/browser-automation.md` for the full decision tree. Quick defaults:
- **Dev testing** (your app): Playwright direct (fastest) or dev-browser (persistent login)
- **Website interaction** (login, submit, manage): dev-browser (persistent) or Playwriter (your extensions/passwords)
- **Data extraction**: Crawl4AI (bulk) or Playwright (interactive first)
- **Debugging**: Chrome DevTools MCP paired with dev-browser

**AI page understanding** (how to "see" the page without vision tokens):
- Use ARIA snapshots (~0.01s, 50-200 tokens) for forms, navigation, interactive elements
- Use text extraction (~0.002s) for reading content
- Use screenshots only for visual debugging or when layout matters

**Benchmarks**: Read `tools/browser/browser-benchmark.md` to re-run if tools are updated.

## Localhost Standards

- Check port first: `localhost-helper.sh check-port <port>`
Expand Down
12 changes: 11 additions & 1 deletion .agent/tools/browser/agent-browser.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,17 @@ agent-browser close
- **Ref-based selection**: Deterministic element targeting from snapshots
- **AI-optimized**: `--json` output for machine parsing
- **Session isolation**: Multiple browser instances with `--session`
- **Fast**: Native Rust CLI, persistent daemon between commands
- **No server needed**: Daemon starts automatically, persists between commands
- **Headless by default**: Use `--headed` only for visual debugging

**Performance** (warm daemon): Navigate+screenshot 1.9s, form fill 1.4s, reliability 0.6s avg.
Cold-start penalty ~3-5s on first command while daemon launches.

**Parallel**: `--session s1/s2/s3` for isolated sessions (tested: 3 parallel in 2.0s). Each session has its own browser context.

**AI Page Understanding**: `agent-browser snapshot -i` returns ARIA tree with interactive refs. Use refs (`@e1`, `@e2`) for deterministic element targeting. Faster than screenshots for AI decision-making.

**Limitations**: No proxy support, no browser extensions, no Chrome DevTools MCP pairing.

<!-- AI-CONTEXT-END -->

Expand Down
663 changes: 368 additions & 295 deletions .agent/tools/browser/browser-automation.md

Large diffs are not rendered by default.

Loading