New Env: Online-Mind2Web #168

Genteki · 2025-10-13T19:38:43Z

Original #156

A HUD remote-browser-based environment for the Online-Mind2Web dataset.

Note

Introduces a Dockerized HUD environment for Online-Mind2Web with persistent context, multiple cloud browser providers, Playwright-based executor, setup/evaluation hubs, telemetry, and a test task.

Environment (Dockerized MCP server):
- Adds environments/online_mind2web/ with Dockerfile, pyproject.toml, and README.md to run a HUD remote-browser MCP server with persistent context (hud_controller.context) and main server (hud_controller.server).
- Exposes telemetry via resource telemetry://live and progress-enabled initialization; supports initial URL and graceful shutdown.
Providers:
- Implements AnchorBrowserProvider, BrowserBaseProvider, HyperBrowserProvider, and SteelProvider under src/hud_controller/providers/ with BrowserProvider base, status/telemetry, live view URLs, and proxy helper (helper/proxy.py).
- Provider registry and get_provider for BROWSER_PROVIDER selection.
Tools/Executor:
- Adds BrowserExecutor to drive Playwright page for clicks/keys/scroll/drag and screenshots.
- Wraps computer-use tools with recording: AnthropicComputerToolWithRecord and OpenAIComputerToolWithRecord, saving screenshots to /screenshot and actions to /action_history.
Setup & Evaluation Hubs:
- setup.navigate_to_url for navigation via Playwright.
- Evaluators in evaluate/: autonomous, webjudge, and overall_judge (aggregates), leveraging OpenAI (gpt-4o) with screenshot(s) and action history.
Dataset runner:
- Adds test_task.json and README instructions for running single tasks or HF dataset (Genteki/Online-Mind2Web).

Written by Cursor Bugbot for commit 4c96563. This will update automatically on new commits.
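For readers unfamiliar with the registry pattern mentioned under Providers, here is a minimal sketch of how selection by `BROWSER_PROVIDER` could work. It is not the code from this PR: the `launch` method, the `PROVIDERS` dict, the placeholder URLs, and the exact `get_provider` signature are assumptions made for illustration. Only the class names and the provider keys (`steel`, `anchorbrowser`) come from the PR text.

```python
import os
from abc import ABC, abstractmethod
from typing import Optional, Type


class BrowserProvider(ABC):
    """Hypothetical minimal base class; the real one also handles status and telemetry."""

    @abstractmethod
    def launch(self) -> str:
        """Return a connection URL for a new browser session (assumed interface)."""


class SteelProvider(BrowserProvider):
    def launch(self) -> str:
        return "wss://example.invalid/steel"  # placeholder, not a real endpoint


class AnchorBrowserProvider(BrowserProvider):
    def launch(self) -> str:
        return "wss://example.invalid/anchorbrowser"  # placeholder, not a real endpoint


# Registry mapping the BROWSER_PROVIDER value to a provider class.
PROVIDERS = {
    "steel": SteelProvider,
    "anchorbrowser": AnchorBrowserProvider,
}


def get_provider(name: Optional[str] = None) -> Type[BrowserProvider]:
    """Look up a provider class by name, falling back to the BROWSER_PROVIDER env var."""
    key = (name or os.environ.get("BROWSER_PROVIDER", "")).strip().lower()
    try:
        return PROVIDERS[key]
    except KeyError:
        known = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"Unknown browser provider {key!r}; expected one of: {known}") from None
```

Resolving the name in one place keeps the provider-specific API keys and session logic out of the server entry point, and an unknown value fails early with a readable message.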

promptless · 2025-10-13T19:57:42Z

📝 Documentation updates detected!

New suggestion: Add comprehensive Online-Mind2Web environment documentation for PR #168
Updated existing suggestion: Add comprehensive Mind2Web evaluation documentation (updated for PR #156)

Parth220

Looks great!

I'd love to break out tools with action-history recording as a standard part of the SDK in the future, but this is a great implementation for the Online-Mind2Web environment.

Parth220 · 2025-11-04T19:45:39Z

environments/online_mind2web/Dockerfile

+# Note: Environment variables for browser providers should be set at runtime:
+# - BROWSER_PROVIDER: anchorbrowser, steel, browserbase, hyperbrowser, kernel
+# - Provider-specific API keys: ANCHOR_API_KEY, STEEL_API_KEY, etc.
+# - GCP_CREDENTIALS_JSON: For Google Sheets functionality (if needed)


nit: remove this line, not relevant in OM2W

Parth220 · 2025-11-05T01:22:04Z

environments/online_mind2web/src/hud_controller/tools/anthropic.py

+class AnthropicComputerToolWithRecord(AnthropicComputerTool):
+    def __init__(


We should likely make this a first-class tool in the SDK, though it could be handled on a case-by-case basis for each environment.

Seems super valuable for LLM/VLM as judge.
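To illustrate what a recording wrapper of this kind does, here is a rough, SDK-independent sketch. It assumes a synchronous tool that returns a base64 screenshot in a dict; `FakeComputerTool`, `ComputerToolWithRecord`, the `screenshot_b64` key, and the `actions.jsonl` file name are all hypothetical and are not taken from the PR or from the HUD SDK. The PR itself writes to `/screenshot` and `/action_history`; the directories here are parameters so the sketch can run anywhere.

```python
import base64
import json
import tempfile
from pathlib import Path


class FakeComputerTool:
    """Stand-in for an SDK computer-use tool (hypothetical interface)."""

    def __call__(self, action: str, **kwargs) -> dict:
        # A real tool would drive the browser; this returns fixed fake image bytes.
        return {"screenshot_b64": base64.b64encode(b"fake-png-bytes").decode()}


class ComputerToolWithRecord(FakeComputerTool):
    """Runs the wrapped action, then logs it and saves any returned screenshot."""

    def __init__(self, screenshot_dir: Path, history_dir: Path) -> None:
        self.screenshot_dir = Path(screenshot_dir)
        self.history_path = Path(history_dir) / "actions.jsonl"
        self.screenshot_dir.mkdir(parents=True, exist_ok=True)
        self.history_path.parent.mkdir(parents=True, exist_ok=True)
        self.step = 0

    def __call__(self, action: str, **kwargs) -> dict:
        result = super().__call__(action, **kwargs)
        self.step += 1
        # Append one JSON line per action so a judge can replay the trajectory.
        record = {"step": self.step, "action": action, "args": kwargs}
        with self.history_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        # Save the screenshot under a zero-padded step number to preserve ordering.
        shot = result.get("screenshot_b64")
        if shot:
            target = self.screenshot_dir / f"{self.step:04d}.png"
            target.write_bytes(base64.b64decode(shot))
        return result


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    tool = ComputerToolWithRecord(root / "screenshot", root / "action_history")
    tool("click", x=10, y=20)
    print(sorted(p.name for p in (root / "screenshot").iterdir()))
```

Because the recording lives in a subclass that only calls `super().__call__`, the same pattern could be applied to any tool with a compatible call signature, which is what would make it a candidate for a shared SDK component.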

Create Online-Mind2Web folder

c9666f9


reformat

96e5448


Add evaluate tool for choice

88a041b


Genteki changed the title from "Online-Mind2Web Folder" to "New Env: Online-Mind2Web" on Oct 23, 2025

Genteki and others added 2 commits October 24, 2025 03:27

Merge branch 'hud-evals:main' into Online-Mind2Web

f70d067

Deleted files and fixed bugs

4c96563

Parth220 approved these changes Nov 5, 2025


Parth220 merged commit 8a2485a into hud-evals:main Nov 5, 2025
5 checks passed
