
docs: update browser tool docs with benchmarks and add benchmark agent #163

Merged

marcusquinn merged 8 commits into main from chore/browser-docs-update on Jan 24, 2026

Conversation


@marcusquinn marcusquinn commented Jan 24, 2026

Summary

  • Rewrites browser-automation.md with task-based decision tree, real performance benchmarks, feature matrix (headless/proxy/persistence/extensions), and detailed usage examples for all 6 tools
  • Updates individual tool docs (agent-browser, dev-browser, playwright, playwriter, stagehand, crawl4ai) with performance stats, limitations, and setup notes from hands-on testing
  • Adds browser-benchmark.md - a reusable benchmarking agent with standardised test scripts for all tools, so benchmarks can be re-run as tools get updated

Key Changes

Decision tree now routes by task type (interactive vs extraction) rather than defaulting to one tool.

Benchmark table (median of 3 runs, macOS ARM64, headless):

| Test | Playwright | dev-browser | agent-browser | Crawl4AI | Playwriter | Stagehand |
|------|------------|-------------|---------------|----------|------------|-----------|
| Navigate + Screenshot | 1.43s | 1.39s | 1.90s | 2.78s | 2.95s | 7.72s |
| Form Fill | 0.90s | 1.34s | 1.37s | N/A | 2.24s | 2.58s |
| Data Extraction | 1.33s | 1.08s | 1.53s | 2.53s | 2.68s | 3.48s |
| Multi-step | 1.49s | 1.49s | 3.06s | N/A | 4.37s | 4.48s |

Feature matrix covers: headless, session persistence, proxy/SOCKS5, extensions, multi-session, form filling, screenshots, extraction, natural language, self-healing, AI output format.

Benchmark agent (browser-benchmark.md) includes ready-to-run scripts for Playwright, dev-browser, agent-browser, Crawl4AI, and Stagehand with consistent test methodology.

Summary by CodeRabbit

  • Documentation
    • Added detailed performance metrics and benchmarks (navigation, form fill, extraction, reliability) and a benchmarking framework with test scenarios and templates
    • Reworked guidance into a task-oriented tool selection and comparison matrix
    • Clarified defaults: headless-by-default and persistent daemon behavior (with cold-start note)
    • Updated proxy, extension, and session-persistence guidance per tool and recommended pairings


- Rewrite browser-automation.md with task-based decision tree, performance
  table, feature matrix (headless/proxy/persistence), and detailed usage
- Add benchmark data to individual tool docs (agent-browser, dev-browser,
  playwright, playwriter, stagehand, crawl4ai)
- Add browser-benchmark.md agent for reproducible re-benchmarking with
  standardised test scripts for all 6 tools

coderabbitai bot commented Jan 24, 2026

Warning

Rate limit exceeded

@marcusquinn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 24 minutes and 39 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Documentation reshapes browser tooling guidance: agent-browser is reframed as a daemon (headless-by-default) with performance notes and limitations; a new benchmarking suite is added; and tool-specific docs were updated with performance, proxy, and persistence details across multiple browser automation tools.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Agent-Browser docs**<br>`agent/tools/browser/agent-browser.md` | Replaced Rust/CLI phrasing with daemon-first messaging: "No server needed", "Headless by default"; added warm/cold performance metrics and a limitations note (no proxy, no extensions). |
| **Tool selection & guidance**<br>`agent/tools/browser/browser-automation.md` | Rewrote from an agent-browser-first guide to a task-oriented decision tree; reorganized into Performance Benchmarks, Feature Matrix, Detailed Usage, Visual Debugging, and Ethical Rules; replaced CLI-first workflows with multi-tool examples and snippets. |
| **Benchmarking framework (new)**<br>`agent/tools/browser/browser-benchmark.md` | Added comprehensive benchmarking doc: five standardized tests, per-tool mappings, runner scripts, templates for Playwright/dev-browser/agent-browser/Crawl4AI/Playwriter/Stagehand, results aggregation, and integration guidance. |
| **Crawl4AI**<br>`agent/tools/browser/crawl4ai.md` | Added key features (full proxy support, persistent contexts), install snippet, and performance extraction benchmarks; clarified extraction-only limitations (no clicks/forms). |
| **Dev-Browser**<br>`agent/tools/browser/dev-browser.md` | Updated performance bullets (navigate/form/extract timings), consistency metric, and headless-mode note; marketing/content-only changes. |
| **Playwright**<br>`agent/tools/browser/playwright.md` | Adjusted quick reference and install/MCP guidance, added Headless default, performance paragraph, and expanded key features (proxy, persistence, device emulation, throttling). |
| **Playwriter**<br>`agent/tools/browser/playwriter.md` | Added proxy notes, performance timings, updated icon states, and "When to use" guidance for sessions/extensions/AI collaboration. |
| **Stagehand**<br>`agent/tools/browser/stagehand.md` | Added performance subsection (timings, AI overhead), headless config guidance for benchmarks; documentation-only updates. |
| **Chrome DevTools MCP**<br>`agent/tools/browser/chrome-devtools.md` | Expanded Quick Reference, added detailed capability lists, connection methods, and pairing/usage guidance; reorganized examples and commands. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant CLI as CLI (user)
  participant Daemon as Agent Daemon
  participant Browser as Browser Engine
  participant Bench as Benchmark Runner

  CLI->>Daemon: send command (navigate / extract / screenshot)
  alt Daemon not running (cold start)
    CLI->>Daemon: start daemon
    Daemon-->>CLI: ready (3–5s cold penalty)
  end
  Daemon->>Browser: open persistent context (user_data_dir) / headless
  Browser-->>Daemon: context ready
  Daemon->>Bench: execute task / test case
  Bench->>Browser: perform actions (navigate, fill, screenshot, extract)
  Browser-->>Bench: results / artifacts
  Bench->>Daemon: aggregate results
  Daemon-->>CLI: return output (metrics, artifacts)
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🧭 Daemon wakes, headless windows sleep,
Benchmarks hum while metrics creep,
Tools aligned, a task-first tune,
Results arrive beneath the moon,
Docs refreshed — automation leaps ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely summarizes the main changes: documentation updates to browser tool docs with performance benchmarks and the addition of a new browser-benchmark agent. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the browser automation documentation by introducing a structured, task-oriented approach to tool selection, backed by concrete performance benchmarks and a detailed feature comparison. It also provides a dedicated benchmarking agent to ensure these performance metrics remain current, offering users a clearer understanding of each tool's strengths and optimal use cases.

Highlights

  • Documentation Overhaul: The browser-automation.md documentation has been completely rewritten to provide a task-based decision tree for selecting browser automation tools, replacing the previous default tool approach.
  • Performance Benchmarks: A new section with real performance benchmarks for all six browser tools (Playwright, dev-browser, agent-browser, Crawl4AI, Playwriter, Stagehand) has been added, including metrics for navigation, form filling, data extraction, multi-step actions, and reliability.
  • Feature Matrix: A comprehensive feature matrix has been introduced, detailing capabilities like headless mode, session persistence, proxy support, extensions, and AI optimization across all tools.
  • Benchmarking Agent: A new browser-benchmark.md file introduces a reusable benchmarking agent with standardized test scripts, enabling consistent re-evaluation of tool performance.
  • Individual Tool Updates: Each browser tool's documentation (agent-browser.md, dev-browser.md, playwright.md, playwriter.md, stagehand.md, crawl4ai.md) has been updated with specific performance statistics, limitations, and setup notes derived from hands-on testing.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 406 code smells

[INFO] Recent monitoring activity:
Sat Jan 24 04:03:37 UTC 2026: Code review monitoring started
Sat Jan 24 04:03:38 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 406
Sat Jan 24 04:03:38 UTC 2026: Qlty - 0 issues found, auto-formatting applied
Sat Jan 24 04:03:40 UTC 2026: Codacy analysis completed with auto-fixes

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 406
  • VULNERABILITIES: 0

Generated on: Sat Jan 24 04:04:56 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a fantastic and comprehensive update to the browser automation documentation. It shifts from a tool-centric to a task-centric approach, which is much more helpful for users. The inclusion of performance benchmarks, a feature matrix, and detailed examples for all tools is a huge improvement. The new browser-benchmark.md agent is also a great addition for ensuring the documentation can be kept up-to-date with reproducible metrics.

My review includes a few suggestions to further improve the clarity and correctness of the documentation and benchmark scripts. These include clarifying a benchmark metric, fixing a potentially confusing code example, and improving the robustness of the new benchmark scripts. Overall, this is excellent work.

Comment on lines +125 to +126

```js
// Later: restore state
const context = await browser.newContext({ storageState: 'state.json' });
```


high

The code example for restoring Playwright state is misleading. await browser.close() is called on line 123, which would make the subsequent call to browser.newContext() on line 126 fail because the browser object is disconnected. To make it clear that state restoration happens in a separate process, the browser needs to be re-launched.

Suggested change

```js
// Before:
// Later: restore state
const context = await browser.newContext({ storageState: 'state.json' });

// After:
// Later, in a new browser session:
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({ storageState: 'state.json' });
```
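For reference, a self-contained sketch of the save/restore flow this suggestion implies (standard Playwright APIs; the login step is elided):

```js
import { chromium } from 'playwright';

// Session 1: log in and persist cookies/localStorage to disk
let browser = await chromium.launch({ headless: true });
let context = await browser.newContext();
// ... perform login via context.newPage() ...
await context.storageState({ path: 'state.json' });
await browser.close();

// Session 2 (later, possibly a new process): relaunch and restore
browser = await chromium.launch({ headless: true });
context = await browser.newContext({ storageState: 'state.json' });
```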

Comment on lines +286 to +350
```bash
#!/bin/bash
# bench-agent-browser.sh

TESTS=("navigate" "formFill" "extract" "multiStep")
declare -A RESULTS

bench_navigate() {
  local start end
  start=$(python3 -c 'import time; print(time.time())')
  agent-browser open "https://the-internet.herokuapp.com/" 2>/dev/null
  agent-browser screenshot /tmp/bench-ab-nav.png 2>/dev/null
  end=$(python3 -c 'import time; print(time.time())')
  echo "$(python3 -c "print(f'{$end - $start:.2f}')")"
  agent-browser close 2>/dev/null
}

bench_formFill() {
  local start end
  start=$(python3 -c 'import time; print(time.time())')
  agent-browser open "https://the-internet.herokuapp.com/login" 2>/dev/null
  agent-browser snapshot -i 2>/dev/null
  agent-browser fill '#username' 'tomsmith' 2>/dev/null
  agent-browser fill '#password' 'SuperSecretPassword!' 2>/dev/null
  agent-browser click 'button[type="submit"]' 2>/dev/null
  agent-browser wait url '**/secure' 2>/dev/null
  end=$(python3 -c 'import time; print(time.time())')
  echo "$(python3 -c "print(f'{$end - $start:.2f}')")"
  agent-browser close 2>/dev/null
}

bench_extract() {
  local start end
  start=$(python3 -c 'import time; print(time.time())')
  agent-browser open "https://the-internet.herokuapp.com/challenging_dom" 2>/dev/null
  agent-browser eval "JSON.stringify([...document.querySelectorAll('table tbody tr')].slice(0,5).map(r=>r.textContent.trim()))" 2>/dev/null
  end=$(python3 -c 'import time; print(time.time())')
  echo "$(python3 -c "print(f'{$end - $start:.2f}')")"
  agent-browser close 2>/dev/null
}

bench_multiStep() {
  local start end
  start=$(python3 -c 'import time; print(time.time())')
  agent-browser open "https://the-internet.herokuapp.com/" 2>/dev/null
  agent-browser click 'a[href="/abtest"]' 2>/dev/null
  agent-browser wait url '**/abtest' 2>/dev/null
  agent-browser get url 2>/dev/null
  end=$(python3 -c 'import time; print(time.time())')
  echo "$(python3 -c "print(f'{$end - $start:.2f}')")"
  agent-browser close 2>/dev/null
}

echo "=== agent-browser Benchmark ==="
for test in "${TESTS[@]}"; do
  echo -n "$test: "
  times=()
  for i in 1 2 3; do
    t=$(bench_"$test")
    times+=("$t")
    echo -n "${t}s "
  done
  echo ""
done
```


high

The agent-browser benchmark script redirects stderr to /dev/null for all agent-browser commands. This suppresses all errors. If a command fails, the script will continue silently, leading to inaccurate timing measurements and making it very difficult to debug failures. Please remove the 2>/dev/null and add proper error handling. A simple way is to add set -e at the beginning of the script, which will cause it to exit immediately if a command fails.
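A minimal sketch of the pattern this comment suggests, applied to the script above (strict mode plus an error trap; `set -e` alone also works, as the comment notes):

```bash
#!/bin/bash
set -euo pipefail   # exit on command failure, unset variables, or pipeline errors
trap 'echo "benchmark step failed (line $LINENO)" >&2' ERR

# Failures now surface instead of being swallowed by 2>/dev/null:
agent-browser open "https://the-internet.herokuapp.com/login"
agent-browser fill '#username' 'tomsmith'
```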

| **Form Fill** (4 fields) | **0.90s** | 1.34s | 1.37s | N/A | 2.24s | 2.58s |
| **Data Extraction** (5 items) | 1.33s | **1.08s** | 1.53s | 2.53s | 2.68s | 3.48s |
| **Multi-step** (click + nav) | **1.49s** | 1.49s | 3.06s | N/A | 4.37s | 4.48s |
| **Reliability** (avg, 3 runs) | **0.64s** | 1.07s | 0.66s | 0.52s | 1.96s | 1.74s |


medium

The "Reliability" metric in the benchmark table could be clearer. The name suggests it measures consistency (e.g., standard deviation), but the benchmark agent defines it as the average time of three consecutive runs of the 'Navigate + Screenshot' test. To avoid ambiguity, consider renaming it to something like "Avg. Consecutive Nav" or adding a footnote to clarify what this metric represents.


# 2. Add to MCP config (OpenCode)
# "playwriter": { "type": "local", "command": ["npx", "playwriter@latest"] }
// Structured extraction with schema


medium

The Stagehand example uses zod for schema definition, but there's no mention that it's an external dependency that needs to be installed. Please add a note about this dependency to ensure the example is complete and runnable for users.

Suggested change

```js
// Before:
// Structured extraction with schema

// After:
// Structured extraction with schema (requires `npm install zod`)
```
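For completeness, a hedged sketch of the runnable form of that snippet; the import is the standard Zod entry point, and the schema fields are illustrative assumptions rather than values from the docs:

```typescript
import { z } from "zod"; // requires `npm install zod`

// Illustrative schema; the field names are assumptions for this example
const productSchema = z.object({
  name: z.string(),
  price: z.string(),
});

const data = await stagehand.extract("get product details", productSchema);
```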

Comment on lines +303 to +315
```bash
bench_formFill() {
  local start end
  start=$(python3 -c 'import time; print(time.time())')
  agent-browser open "https://the-internet.herokuapp.com/login" 2>/dev/null
  agent-browser snapshot -i 2>/dev/null
  agent-browser fill '#username' 'tomsmith' 2>/dev/null
  agent-browser fill '#password' 'SuperSecretPassword!' 2>/dev/null
  agent-browser click 'button[type="submit"]' 2>/dev/null
  agent-browser wait url '**/secure' 2>/dev/null
  end=$(python3 -c 'import time; print(time.time())')
  echo "$(python3 -c "print(f'{$end - $start:.2f}')")"
  agent-browser close 2>/dev/null
}
```


medium

The bench_formFill function for agent-browser uses CSS selectors (#username, #password) for filling the form. However, the main documentation for agent-browser strongly recommends using the snapshot -i and ref pattern for AI agent robustness. To maintain consistency with the documented best practices and to accurately benchmark the recommended workflow, please consider updating this benchmark to use element references (@e...) instead of CSS selectors.
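A hedged sketch of the ref-based flow this comment recommends; the @e numbers are placeholders, and the exact ref syntax is assumed from the snapshot -i pattern the docs describe:

```bash
agent-browser open "https://the-internet.herokuapp.com/login"
agent-browser snapshot -i    # prints interactive elements with refs (e.g. @e1, @e2, @e3)
agent-browser fill @e1 'tomsmith'             # @e1/@e2/@e3 are illustrative; use the
agent-browser fill @e2 'SuperSecretPassword!' # refs the snapshot actually prints
agent-browser click @e3
```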

Comment on lines 224 to 282
```typescript
// Run via: cd ~/.aidevops/dev-browser/skills/dev-browser && bun x tsx bench.ts
import { connect, waitForPageLoad } from "@/client.js";

const TESTS = {
  async navigate(page: any) {
    await page.goto('https://the-internet.herokuapp.com/');
    await waitForPageLoad(page);
    await page.screenshot({ path: '/tmp/bench-dev-nav.png' });
  },
  async formFill(page: any) {
    await page.goto('https://the-internet.herokuapp.com/login');
    await waitForPageLoad(page);
    await page.fill('#username', 'tomsmith');
    await page.fill('#password', 'SuperSecretPassword!');
    await page.click('button[type="submit"]');
    await page.waitForURL('**/secure');
  },
  async extract(page: any) {
    await page.goto('https://the-internet.herokuapp.com/challenging_dom');
    await waitForPageLoad(page);
    const rows = await page.$$eval('table tbody tr', (trs: any[]) =>
      trs.slice(0, 5).map(tr => tr.textContent.trim())
    );
    if (rows.length < 5) throw new Error('Expected 5+ rows');
  },
  async multiStep(page: any) {
    await page.goto('https://the-internet.herokuapp.com/');
    await waitForPageLoad(page);
    await page.click('a[href="/abtest"]');
    await page.waitForURL('**/abtest');
  }
};

async function run() {
  const client = await connect("http://localhost:9222");
  const results: Record<string, string[]> = {};

  for (const [name, fn] of Object.entries(TESTS)) {
    const times: string[] = [];
    for (let i = 0; i < 3; i++) {
      const page = await client.page("bench");
      const start = performance.now();
      try {
        await fn(page);
        times.push(((performance.now() - start) / 1000).toFixed(2));
      } catch (e: any) {
        times.push(`ERR: ${e.message}`);
      }
    }
    results[name] = times;
  }

  await client.disconnect();
  console.log(JSON.stringify(results, null, 2));
}

run();
```


medium

The dev-browser benchmark script uses the any type for the page object and for the results of $$eval. This undermines the benefits of using TypeScript, such as type safety and autocompletion. If type definitions are available (e.g., from @playwright/test), please consider using them (e.g., page: Page). This will make the code more robust and easier to maintain.

For example:

```typescript
import { Page } from 'playwright'; // or from a relevant import
// ...
async navigate(page: Page) {
// ...
const rows = await page.$$eval('table tbody tr', (trs: HTMLTableRowElement[]) =>
// ...
```


augmentcode bot commented Jan 24, 2026

🤖 Augment PR Summary

Summary: This PR refreshes the browser tooling documentation to help agents pick the right automation/extraction tool and to make performance comparisons reproducible.

Changes:

  • Rewrites .agent/tools/browser/browser-automation.md around task-based decision trees (interactive automation vs extraction) instead of a single “default tool”.
  • Adds a consolidated performance benchmark table (median of 3 runs) plus a feature matrix covering headless mode, persistence, proxies, extensions, and automation/extraction capabilities.
  • Expands per-tool usage examples for Playwright, dev-browser, agent-browser, Crawl4AI, Playwriter, and Stagehand, including notes on persistence and proxy setup.
  • Adds .agent/tools/browser/browser-benchmark.md, a reusable benchmarking subagent with standardized scripts and methodology to rerun benchmarks as tool versions evolve.
  • Updates individual tool docs with measured performance/limitations/install notes (e.g., proxy support, cold-start behavior, headed vs headless constraints).

Technical Notes: The docs highlight that most tools are wrappers over Playwright (overhead comes from wrapper/runtime/LLM calls), while Crawl4AI is purpose-built for extraction; benchmark scripts target the-internet.herokuapp.com for repeatability.



@augmentcode augmentcode bot left a comment


Review completed. 6 suggestions posted.


## Tool Selection: Choose by Task

**ALWAYS use agent-browser first** for any browser automation task. It's CLI-first, AI-optimized, and requires no server setup.
All tools run **headless by default** (no visible window, no mouse/keyboard competition).


The statement that "All tools run headless by default" appears inconsistent with Playwriter later being described as always headed (and the feature matrix listing Playwriter as non-headless), which may mislead tool selection/benchmark interpretation.


| **Form Fill** (4 fields) | **0.90s** | 1.34s | 1.37s | N/A | 2.24s | 2.58s |
| **Data Extraction** (5 items) | 1.33s | **1.08s** | 1.53s | 2.53s | 2.68s | 3.48s |
| **Multi-step** (click + nav) | **1.49s** | 1.49s | 3.06s | N/A | 4.37s | 4.48s |
| **Reliability** (avg, 3 runs) | **0.64s** | 1.07s | 0.66s | 0.52s | 1.96s | 1.74s |


In the benchmarks table, the Reliability row bolding looks incorrect: Crawl4AI is listed at 0.52s which is lower than the bolded 0.64s.


```js
// Save state for reuse
await page.context().storageState({ path: 'state.json' });
await browser.close();
```


This Playwright example closes browser and then immediately uses it again to create a new context (browser.newContext), so the “restore state” snippet can’t work as written.


# 2. Add to MCP config (OpenCode)
# "playwriter": { "type": "local", "command": ["npx", "playwriter@latest"] }
// Structured extraction with schema
const data = await stagehand.extract("get product details", z.object({


The Stagehand example uses z.object(...) but doesn’t show an import/definition for z, so readers copying this snippet may hit a runtime error.


- **Setup**: Auto-installed via `setup.sh` → `setup_browser_tools()`
- **Purpose**: Cross-browser testing and automation (fastest browser engine)
- **Install**: `npm install playwright && npx playwright install`
- **MCP**: `npx @playwright/mcp` (with `--proxy-server`, `--storage-state` options)


The Quick Reference says the MCP server is npx @playwright/mcp, but later sections still reference npx playwright-mcp@latest; consider aligning the docs to one command to avoid users starting the wrong server.


```bash
agent-browser fill '#username' 'tomsmith' 2>/dev/null
agent-browser fill '#password' 'SuperSecretPassword!' 2>/dev/null
agent-browser click 'button[type="submit"]' 2>/dev/null
agent-browser wait url '**/secure' 2>/dev/null
```


agent-browser wait url '**/secure' doesn’t match the agent-browser wait syntax documented elsewhere in this repo (it uses wait --url ...), so the benchmark script may not run as-is.

Other Locations
  • .agent/tools/browser/browser-benchmark.md:332
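A minimal before/after sketch of the two syntaxes this comment contrasts; the flag form is the one the repo's other docs use, per the comment:

```bash
# As written in the benchmark script (may not parse):
agent-browser wait url '**/secure'

# Flag form documented elsewhere in the repo:
agent-browser wait --url '**/secure'
```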



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @.agent/tools/browser/browser-automation.md:
- Around line 257-280: The example uses a Zod schema but never imports Zod; add
an import for z from "zod" at the top of the snippet so the call to
z.object(...) in stagehand.extract(...) works; specifically, update the module
imports so that z is available when calling Stagehand, stagehand.init, and
stagehand.extract (referencing Stagehand and the z.object schema in the
snippet).
- Around line 21-36: Update the header sentence that currently reads "All tools
run **headless by default**" to accurately scope Playwriter: change it to
something like "Most tools run **headless by default**; Playwriter is always
headed because it attaches to an existing browser session." Specifically mention
Playwriter by name to exclude it from the blanket statement and keep the rest of
the decision flow (Playwright, dev-browser, agent-browser, Stagehand) intact so
readers know which tools remain headless by default.

In @.agent/tools/browser/crawl4ai.md:
- Around line 39-40: Update the statement about Crawl4AI capabilities to say
that Crawl4AI (v0.8.0) does support form filling and click automation via
CrawlerRunConfig(js_code=...) for custom JS interactions and via the C4A-Script
DSL (commands: CLICK, TYPE, CLEAR, SET, PRESS) for built-in automation; also
mention that the old proxy="socks5://..." syntax is deprecated in favor of the
proxy_config dict format and add a brief note that use_persistent_context=True
can cause crashes when used with concurrent arun_many in some environments.

In @.agent/tools/browser/playwright.md:
- Around line 21-39: The docs reference an outdated package name; update the
Installation and MCP config sections to use the official package name
"@playwright/mcp" instead of "playwright-mcp@latest" (i.e., replace the
occurrences in the Installation line and the MCP args/config line so the Quick
Reference MCP entry and the install command match "@playwright/mcp").
🧹 Nitpick comments (3)
.agent/tools/browser/crawl4ai.md (1)

42-42: Add provenance for benchmark numbers (date, hardware, methodology).
These metrics will age quickly without explicit context. Please add benchmark date, environment, and a reference to the benchmark scripts so readers can re-run/validate.

♻️ Suggested doc tweak
-**Performance**: Structured extraction 2.5s (30 items), multi-page 3.8s (3 URLs), reliability 0.52s avg (fastest).
+**Performance**: Structured extraction 2.5s (30 items), multi-page 3.8s (3 URLs), reliability 0.52s avg (fastest).
+**Benchmarked**: 2026-01-24, macOS ARM64, headless, median of 3 runs. Reproduce via `browser-benchmark.md`.
.agent/tools/browser/browser-automation.md (2)

51-62: Add benchmark date + tool versions for reproducibility.

These numbers are great, but they’ll age quickly without a date and tool/version context. Consider adding a short “Benchmark date + tool versions” line (or link to the benchmark agent output) to preserve zero‑debt documentation quality.


291-296: Label networksetup as macOS-only.

The networksetup command is macOS-specific; consider annotating the method as “macOS” to prevent confusion on other platforms.

Comment on lines +39 to +40
- Full proxy support (HTTP, SOCKS5, residential)
- Persistent context with `user_data_dir`


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Crawl4AI documentation proxy SOCKS5 support user_data_dir persistent context

💡 Result:

  • SOCKS5 proxy support: Crawl4AI’s docs show you can pass a SOCKS proxy URL directly, e.g. proxy="socks5://proxy.example.com:1080" when constructing AsyncWebCrawler. [1]
  • Preferred (new) way to configure proxies: In current docs, proxy is deprecated in favor of proxy_config, e.g. {"server": "...", "username": "...", "password": "..."}. [2]
  • Persistent context / profile (keeps cookies & sessions across runs): Use BrowserConfig(use_persistent_context=True, user_data_dir="..."). Docs note use_persistent_context=True enables a persistent browser context and user_data_dir is the directory where profile/cookies are stored and “must be set” for permanent sessions. [2]
  • Potential gotcha with persistent context + concurrency: There’s a reported crash when use_persistent_context=True is used with concurrent crawling (arun_many) in some versions/environments. [3]

Minimal examples

```python
from crawl4ai import AsyncWebCrawler, BrowserConfig

# SOCKS5 (simple)
async with AsyncWebCrawler(proxy="socks5://proxy.example.com:1080") as crawler:
    r = await crawler.arun("https://example.com")

# Persistent login/session
browser_cfg = BrowserConfig(
    use_persistent_context=True,
    user_data_dir="./crawl4ai_profile",
)

async with AsyncWebCrawler(browser_config=browser_cfg) as crawler:
    r = await crawler.arun("https://example.com")
```

Sources: Proxy & Security docs (SOCKS example) [1]; API/parameters docs (proxy deprecation, proxy_config, use_persistent_context, user_data_dir) [2]; GitHub issue re: persistent context + concurrency [3].


🌐 Web query:

Crawl4AI form filling click automation capabilities current version

💡 Result:

Current version (stable)

  • Crawl4AI v0.8.0 (released Jan 16, 2026) is the latest tagged release on GitHub, with install instructions shown in the release notes (pip install crawl4ai==0.8.0). [1]

Form-filling & click automation capabilities (current)

Crawl4AI supports “form filling + click” style automation in two main ways:

  1. Programmatic page interaction via CrawlerRunConfig
  • You can run arbitrary JavaScript in-page using CrawlerRunConfig(js_code=...), including:
    • Clicking elements (e.g., document.querySelector(...).click()),
    • Filling inputs (setting .value),
    • Submitting forms (form.submit()),
    • Running multi-step flows by reusing a session (session_id) and continuing without full navigation (js_only=True), often paired with wait conditions (wait_for="css:..." or wait_for="js:..."). [2]
  1. C4A-Script DSL (built-in web automation scripting)
  • Crawl4AI includes C4A-Script, a human-readable automation DSL with explicit commands for interaction, including:
    • CLICK (also DOUBLE_CLICK, RIGHT_CLICK, DRAG, SCROLL),
    • TYPE, PRESS, CLEAR, SET (for input/value setting),
    • WAIT,
    • simple control flow like IF / REPEAT,
    • example “login flow” and “form automation” scripts showing fill + submit patterns. [3]

Sources: GitHub release v0.8.0 [1]; Crawl4AI “Page Interaction” docs [2]; Crawl4AI “C4A-Script” docs [3].


Line 43 is factually incorrect: Crawl4AI DOES support form filling and click automation.

Per current Crawl4AI v0.8.0 docs (Jan 16, 2026), form interaction is supported via:

  • JavaScript execution: CrawlerRunConfig(js_code=...) for custom click/fill logic
  • C4A-Script DSL: Built-in automation with CLICK, TYPE, CLEAR, SET, PRESS commands and session state management

Lines 39-40 are substantially correct (SOCKS5 proxy + persistent context confirmed), but note:

  • Proxy parameter deprecated: Old proxy="socks5://..." syntax works but is deprecated in favor of proxy_config dict format
  • Persistent context gotcha: Using use_persistent_context=True with concurrent crawling (arun_many) may cause crashes in some environments

Update line 43 to reflect actual capabilities and cite Crawl4AI v0.8.0 docs.
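A hedged sketch of the corrected usage described above, combining the proxy_config dict format with a js_code interaction; parameter names come from the query results, and the exact constructor keywords should be verified against the v0.8.0 docs:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    # proxy_config dict replaces the deprecated proxy="socks5://..." string
    browser_cfg = BrowserConfig(
        headless=True,
        proxy_config={"server": "socks5://127.0.0.1:1080"},
    )
    # js_code enables click/fill automation during the crawl
    run_cfg = CrawlerRunConfig(
        js_code="document.querySelector('#username').value = 'tomsmith';",
        wait_for="css:#username",
    )
    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        result = await crawler.arun("https://the-internet.herokuapp.com/login", config=run_cfg)
        print(result.success)

asyncio.run(main())
```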

🤖 Prompt for AI Agents
In @.agent/tools/browser/crawl4ai.md around lines 39 - 40, Update the statement
about Crawl4AI capabilities to say that Crawl4AI (v0.8.0) does support form
filling and click automation via CrawlerRunConfig(js_code=...) for custom JS
interactions and via the C4A-Script DSL (commands: CLICK, TYPE, CLEAR, SET,
PRESS) for built-in automation; also mention that the old proxy="socks5://..."
syntax is deprecated in favor of the proxy_config dict format and add a brief
note that use_persistent_context=True can cause crashes when used with
concurrent arun_many in some environments.

…ion benchmarks

- browser-automation.md: Add parallel instance table (Playwright 5 contexts
  in 2.1s, agent-browser 3 sessions in 2.0s, Crawl4AI 1.7x speedup),
  extension support matrix, Chrome DevTools MCP companion section
- browser-benchmark.md: Add parallel, extension loading, visual verification,
  and DevTools benchmark scripts
- chrome-devtools.md: Rewrite to clarify it's a companion tool (not a browser),
  add connection methods, category toggles, best pairings
- Fix Feature Matrix: Playwright supports extensions via persistent context

…idance

- ARIA snapshots (~0.01s, 50-200 tokens) beat screenshots (~0.05s, 1K tokens)
  for AI automation - screenshots only needed for visual debugging
- Add AI Page Understanding section with speed/token comparison table
- Add code example for fast page understanding without vision model

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.agent/tools/browser/chrome-devtools.md (1)

51-201: Remove non-existent throttleRequest() and throttleRequests() methods from the documentation.

The chrome-devtools-mcp package does not expose per-request throttling APIs. Only global network throttling is available via the emulate tool with networkConditions parameter. The documented code examples for throttleRequest() and throttleRequests() will fail—these methods do not exist in the package.

Additionally, if mentioning per-request throttling as a Chrome DevTools feature, update the Chrome version requirement from 136+ to Chrome 145+ (or note it's available in earlier Canary versions behind a flag). Per-request throttling is a DevTools UI capability, not an MCP tool method.

Replace the "Individual Request Throttling" section with accurate documentation of the emulate tool's global network throttling capabilities, or remove this section entirely if per-request throttling is out of scope for this MCP tool.

🤖 Fix all issues with AI agents
In @.agent/tools/browser/browser-automation.md:
- Around line 259-264: The BrowserConfig usage is using the deprecated proxy
parameter; update the BrowserConfig instantiation to use proxy_config instead
(replace proxy="socks5://127.0.0.1:1080" with
proxy_config={"server":"socks5://127.0.0.1:1080"}) or move proxy configuration
into a per-run CrawlerRunConfig (create run_config =
CrawlerRunConfig(proxy_config={"server":"socks5://127.0.0.1:1080"},
extraction_strategy=JsonCssExtractionStrategy(schema))) to get per-crawl
granularity, and also replace or remove the placeholder
user_data_dir="/path/to/profile" in BrowserConfig if you don’t need persistent
context or set it to a real directory; reference BrowserConfig, proxy_config,
CrawlerRunConfig, run_config, extraction_strategy, and JsonCssExtractionStrategy
when making the changes.
♻️ Duplicate comments (4)
.agent/tools/browser/browser-benchmark.md (4)

228-246: TypeScript any weakens benchmark reliability.
Line 228–246 uses page: any and (trs: any[]), which undermines TS safety. Consider using a proper Page type and typed DOM elements.


293-314: Don’t suppress errors in agent-browser benchmarks.
Lines 296–314 redirect stderr to /dev/null, which hides failures and can skew timing results. Prefer surfacing errors and failing fast (e.g., set -euo pipefail).


303-312: Align agent-browser form-fill with snapshot/ref best practices.
Line 307–310 uses CSS selectors after snapshot -i. To match recommended usage and ensure robustness, use element refs (@e...).


311-332: Agent-browser wait syntax looks inconsistent.
Lines 311 and 332 use agent-browser wait url ..., while other docs use wait --url .... This may fail if the CLI expects the flag form.

🧹 Nitpick comments (1)
.agent/tools/browser/browser-benchmark.md (1)

121-127: Add a timeout to the dev-browser health check.

curl -s http://localhost:9222/json/version can hang indefinitely if the port is filtered or a proxy stalls. Add a short timeout to keep the prereqs check snappy. Example: curl -s --max-time 2 ....
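A minimal sketch of the timeout-guarded health check the nitpick suggests, using the endpoint from the script above:

```bash
# Fail fast if dev-browser isn't listening on :9222
if ! curl -s --max-time 2 http://localhost:9222/json/version >/dev/null; then
  echo "dev-browser not reachable on localhost:9222" >&2
  exit 1
fi
```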


… all browser subagents

Ensure consistent coverage across all 6 browser tool subagents:
- playwright.md: parallel contexts, extension via persistent ctx, ARIA snapshots, DevTools pairing
- dev-browser.md: named pages (shared), extension install in profile, ARIA snapshots, DevTools on :9222
- agent-browser.md: --session parallel (2.0s for 3), snapshot refs, no extensions/DevTools
- playwriter.md: multi-tab (shared), full extension access, password managers work, DevTools via autoConnect
- stagehand.md: multiple instances (slow), possible extensions, built-in AI understanding, DevTools overhead warning
- crawl4ai.md: arun_many parallel (1.7x), LLM-ready output, no extensions/DevTools

Replace simple task-type tree with comprehensive decision tree covering:
- Extension/password manager routing (Playwriter > dev-browser > Playwright CLI)
- Parallel session routing (Playwright 5ctx/2.1s > agent-browser 3/2.0s > Crawl4AI)
- Persistent login routing (dev-browser profile > Playwright storageState)
- Proxy/VPN routing (Playwright/Crawl4AI direct > Playwriter extension > system)
- Dev testing routing (dev-browser persistent > Playwright parallel > CI/CD)
- AI page understanding tree (ARIA > text > elements > screenshot)
- Explicit preferences: fastest first, ARIA over screenshots, headless default
…triggers

Tell agents WHEN to use a browser (dev testing, website interaction,
deployment verification, debugging) and HOW to choose a tool (quick
defaults + pointer to full decision tree). Update progressive disclosure
table to point to browser-automation.md instead of individual tools.

…ature matrix

- Add dedicated section with performance table (6 tools, 5 test types)
- Add feature matrix (headless, proxy, extensions, password managers, parallel, persistence)
- Add tool selection guide (by need: speed, persistence, extensions, extraction, CI/CD)
- Add AI page understanding comparison (ARIA vs text vs screenshot token costs)
- Update MCP category list with all 7 browser tools and benchmark highlights
- Update Agent Guides link description


- Fix headless statement to exclude Playwriter (always headed)
- Add Zod import to Stagehand example
- Fix Playwright MCP package name to @playwright/mcp
- Update Crawl4AI: supports js_code/C4A-Script interactions, fix proxy
  to proxy_config dict format, note arun_many crash with persistent ctx
- Remove non-existent throttleRequest/throttleRequests methods from
  chrome-devtools.md, replace with accurate emulate tool documentation
- Add benchmark date/environment to performance tables
- Label networksetup as macOS-only in proxy table
- Fix TypeScript types in dev-browser benchmark (Page instead of any)
- Add curl timeout to dev-browser health check


@marcusquinn marcusquinn merged commit 8402512 into main Jan 24, 2026
9 checks passed
@marcusquinn marcusquinn deleted the chore/browser-docs-update branch January 24, 2026 04:43