New Env: Online-Mind2Web #168
📝 Documentation updates detected! New suggestion: Add comprehensive Online-Mind2Web environment documentation for PR #168
Looks great!
I'd love to break out tools with action-history records as a standard part of the SDK in the future, but this is a great implementation for the Online-Mind2Web environment.
    # Note: Environment variables for browser providers should be set at runtime:
    # - BROWSER_PROVIDER: anchorbrowser, steel, browserbase, hyperbrowser, kernel
    # - Provider-specific API keys: ANCHOR_API_KEY, STEEL_API_KEY, etc.
    # - GCP_CREDENTIALS_JSON: For Google Sheets functionality (if needed)
nit: remove this line, not relevant in OM2W
    class AnthropicComputerToolWithRecord(AnthropicComputerTool):
        def __init__(
We should likely make this a first-class tool in the SDK, though it could be handled case by case for each environment.
It seems very valuable for LLM/VLM-as-judge evaluation.
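The pattern discussed in this comment, a tool that records every action and screenshot so a judge can inspect them later, can be sketched roughly as follows. This is a minimal illustration and not the PR's implementation: `RecordingTool`, `ActionRecord`, `execute`, and `fake_execute` are hypothetical names, and the real `AnthropicComputerToolWithRecord` subclasses the SDK's `AnthropicComputerTool`, whose interface is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass
class ActionRecord:
    """One executed action; `step` is its position in the history."""
    step: int
    action: str
    arguments: dict[str, Any]


@dataclass
class RecordingTool:
    """Hypothetical wrapper: runs an action, then stores it and its screenshot."""
    # Assumed callable that performs the action and returns screenshot bytes.
    execute: Callable[[str, dict[str, Any]], bytes]
    history: list[ActionRecord] = field(default_factory=list)
    screenshots: list[bytes] = field(default_factory=list)

    def __call__(self, action: str, **arguments: Any) -> bytes:
        screenshot = self.execute(action, arguments)
        # Record only after the action succeeded, so history matches reality.
        self.history.append(ActionRecord(len(self.history), action, arguments))
        self.screenshots.append(screenshot)
        return screenshot

    def history_as_json(self) -> str:
        """Serialize the action history, e.g. to hand to an LLM judge."""
        return json.dumps([asdict(record) for record in self.history])


def fake_execute(action: str, arguments: dict[str, Any]) -> bytes:
    """Stand-in for a real browser executor; returns placeholder bytes."""
    return f"screenshot-after-{action}".encode()


if __name__ == "__main__":
    tool = RecordingTool(execute=fake_execute)
    tool("click", x=10, y=20)
    tool("type", text="hello")
    print(tool.history_as_json())
```

Keeping the recording in a thin wrapper like this is one way the behavior could later move into the SDK without changing each environment's executor.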
Original #156
A HUD remote-browser-based environment for the Online-Mind2Web dataset.
Note
Introduces a Dockerized HUD environment for Online-Mind2Web with persistent context, multiple cloud browser providers, Playwright-based executor, setup/evaluation hubs, telemetry, and a test task.
- environments/online_mind2web/ with Dockerfile, pyproject.toml, and README.md to run a HUD remote-browser MCP server with persistent context (hud_controller.context) and main server (hud_controller.server).
- resource telemetry://live and progress-enabled initialization; supports initial URL and graceful shutdown.
- AnchorBrowserProvider, BrowserBaseProvider, HyperBrowserProvider, and SteelProvider under src/hud_controller/providers/ with BrowserProvider base, status/telemetry, live view URLs, and proxy helper (helper/proxy.py).
- get_provider for BROWSER_PROVIDER selection.
- BrowserExecutor to drive Playwright page for clicks/keys/scroll/drag and screenshots.
- AnthropicComputerToolWithRecord and OpenAIComputerToolWithRecord, saving screenshots to /screenshot and actions to /action_history.
- setup.navigate_to_url for navigation via Playwright.
- evaluate/: autonomous, webjudge, and overall_judge (aggregates), leveraging OpenAI (gpt-4o) with screenshot(s) and action history.
- test_task.json and README instructions for running single tasks or HF dataset (Genteki/Online-Mind2Web).

Written by Cursor Bugbot for commit 4c96563. This will update automatically on new commits.
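The BROWSER_PROVIDER selection mentioned in the summary could look roughly like the lookup below. This is a sketch under stated assumptions: `PROVIDER_API_KEY_VARS` and `select_provider` are hypothetical names, only `ANCHOR_API_KEY` and `STEEL_API_KEY` are taken from the Dockerfile comment quoted in the review, and the PR's actual `get_provider` presumably returns provider classes rather than a name and key.

```python
import os
from typing import Mapping

# Provider name -> environment variable holding its API key.
# Only the two key names quoted in the review are listed; the variable
# names for the other providers are not given in the source.
PROVIDER_API_KEY_VARS = {
    "anchorbrowser": "ANCHOR_API_KEY",
    "steel": "STEEL_API_KEY",
}


def select_provider(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (provider_name, api_key) or raise ValueError with a clear message."""
    name = env.get("BROWSER_PROVIDER", "").strip().lower()
    if not name:
        raise ValueError("BROWSER_PROVIDER is not set")
    if name not in PROVIDER_API_KEY_VARS:
        known = ", ".join(sorted(PROVIDER_API_KEY_VARS))
        raise ValueError(f"Unknown provider {name!r}; known providers: {known}")
    key_var = PROVIDER_API_KEY_VARS[name]
    api_key = env.get(key_var)
    if not api_key:
        raise ValueError(f"{key_var} must be set for provider {name!r}")
    return name, api_key


if __name__ == "__main__":
    # Pass an explicit mapping instead of os.environ to keep the demo hermetic.
    demo_env = {"BROWSER_PROVIDER": "steel", "STEEL_API_KEY": "example-key"}
    print(select_provider(demo_env))
```

Failing fast at startup when the provider name or its key is missing keeps configuration errors out of the middle of a task run.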