
Genteki (Contributor) commented Oct 13, 2025

Original #156

A HUD remote-browser-based environment for the Online-Mind2Web dataset.


Note

Introduces a Dockerized HUD environment for Online-Mind2Web with persistent context, multiple cloud browser providers, Playwright-based executor, setup/evaluation hubs, telemetry, and a test task.

  • Environment (Dockerized MCP server):
    • Adds environments/online_mind2web/ with Dockerfile, pyproject.toml, and README.md to run a HUD remote-browser MCP server with persistent context (hud_controller.context) and main server (hud_controller.server).
    • Exposes live telemetry through the telemetry://live resource, reports progress during initialization, supports an initial URL, and shuts down gracefully.
  • Providers:
    • Implements AnchorBrowserProvider, BrowserBaseProvider, HyperBrowserProvider, and SteelProvider under src/hud_controller/providers/ with BrowserProvider base, status/telemetry, live view URLs, and proxy helper (helper/proxy.py).
    • Adds a provider registry and get_provider, which selects a provider via the BROWSER_PROVIDER environment variable.
  • Tools/Executor:
    • Adds BrowserExecutor, which drives a Playwright page to perform clicks, key presses, scrolling, and dragging, and to take screenshots.
    • Wraps computer-use tools with recording: AnthropicComputerToolWithRecord and OpenAIComputerToolWithRecord, saving screenshots to /screenshot and actions to /action_history.
  • Setup & Evaluation Hubs:
    • setup.navigate_to_url navigates to a URL via Playwright.
    • Evaluators in evaluate/: autonomous, webjudge, and overall_judge (an aggregating judge), all using OpenAI gpt-4o with the screenshot(s) and action history.
  • Dataset runner:
    • Adds test_task.json and README instructions for running a single task or the full HF dataset (Genteki/Online-Mind2Web).

Written by Cursor Bugbot for commit 4c96563.

promptless bot commented Oct 13, 2025

📝 Documentation updates detected!

New suggestion: Add comprehensive Online-Mind2Web environment documentation for PR #168
Updated existing suggestion: Add comprehensive Mind2Web evaluation documentation (updated for PR #156)

Genteki changed the title from "Online-Mind2Web Folder" to "New Env: Online-Mind2Web" on Oct 23, 2025

Parth220 (Contributor) left a comment

Looks great!

I'd love to eventually break out tools that record action history as a standard feature of the SDK, but this is a great implementation for the Online-Mind2Web environment.

# Note: Environment variables for browser providers should be set at runtime:
# - BROWSER_PROVIDER: anchorbrowser, steel, browserbase, hyperbrowser, kernel
# - Provider-specific API keys: ANCHOR_API_KEY, STEEL_API_KEY, etc.
# - GCP_CREDENTIALS_JSON: For Google Sheets functionality (if needed)

nit: remove this line; it's not relevant to OM2W.

Comment on lines +16 to +17
class AnthropicComputerToolWithRecord(AnthropicComputerTool):
    def __init__(

We should probably make this a first-class tool in the SDK, though it could also be handled case by case in each environment.

It seems super valuable for LLM/VLM-as-a-judge evaluation.

Parth220 merged commit 8a2485a into hud-evals:main on Nov 5, 2025
5 checks passed