Online-Mind2Web Example (Rebased) #156
📝 Documentation updates detected! Updated existing suggestion: Add comprehensive Mind2Web evaluation documentation (updated for PR #156)
    if openai_api_key:
        logging.info(
            f"DEBUG: Raw key repr: {repr(openai_api_key[:10])}"
        )  # Show first 50 chars with repr to see any weird characters
Bug: Logging Mismatch: Key Truncation
The logging statement for openai_api_key slices the key to its first 10 characters (openai_api_key[:10]), but the inline comment says it shows the first 50. This mismatch between the code's behavior and its description can be confusing during debugging.
Additional Locations (1)
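One way to keep the slice length and its description from drifting apart is to derive both from a single constant. The following is a minimal sketch, not the PR's code: `KEY_PREVIEW_CHARS`, `key_preview`, and `log_key_preview` are hypothetical names introduced here for illustration.

```python
import logging
from typing import Optional

# Hypothetical constant: the single source of truth for the preview length.
KEY_PREVIEW_CHARS = 10


def key_preview(key: str, n: int = KEY_PREVIEW_CHARS) -> str:
    """Return the repr of the first n characters, exposing stray whitespace or escapes."""
    return repr(key[:n])


def log_key_preview(openai_api_key: Optional[str]) -> None:
    """Log a short preview of the key; the message states the actual slice length."""
    if openai_api_key:
        logging.info(
            "DEBUG: Raw key repr (first %d chars): %s",
            KEY_PREVIEW_CHARS,
            key_preview(openai_api_key),
        )
```

Because the log message prints the constant itself, changing the preview length in one place updates both the behavior and its description.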
    if main_score >= score_threshold:
        # Include high-scoring screenshots in final evaluation
        final_images = []
        for screenshot_b64 in screenshot_history:  # All screenshots
Online-Mind2Web Evaluation
The file changes here have been rebased and modified for hud-python==0.4.50. Original PR: #145

Update
- AnthropicComputerToolWithRecord: Claude computer tool with screenshot recording, using a callback function
- OpenAIComputerToolWithRecord: OpenAI computer tool with screenshot recording, using a callback function
- webjudge_online_mind2web.py: Evaluation method using GPT-4o, based on the problem description, screenshot history, and action history. [Reference: Online-Mind2Web]
- autonomous_eval.py: Evaluation method using GPT-4o, based on the problem description, final screenshot, and action history. [Reference: Online-Mind2Web]
- Replace AnthropicComputerTool and OpenAIComputerTool with the versions that record history

Updates Oct 8, 2025
- Online-Mind2Web supported in remote-browser env

Note
Introduce Anthropic/OpenAI computer tools that auto-save screenshots and record actions, and add Online‑Mind2Web evaluators (autonomous_eval, webjudge, overall_judge) wired into both browser and remote-browser environments.
- evaluate/online_mind2web with autonomous_eval and webjudge using GPT‑4o, leveraging screenshot and action history.
- environments/browser/server/evaluate/__init__.py and include in server.
- evaluate/autonomous_eval.py, evaluate/webjudge.py, and evaluate/overall_judge.py; register via evaluate hub.
- AnthropicComputerToolWithRecord and OpenAIComputerToolWithRecord that:
  - auto-save screenshots to /screenshot on key actions.
  - record actions to /action_history/action_history.txt.
- browser/server/main.py and remote server wiring; export from local tool packages.
- hud-python pin to @main in browser/server/pyproject.toml.

Written by Cursor Bugbot for commit a2cfa28. This will update automatically on new commits.
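The recording behavior summarized above (screenshots saved on actions, actions appended to an action_history.txt file, both driven by a callback) could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation or the hud-python API: `ActionRecorder`, `run_with_record`, and `demo` are hypothetical names, the tool is modeled as a plain callable that returns an optional base64 screenshot, and the directories live under a caller-supplied root rather than at the filesystem root.

```python
import base64
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Tuple


class ActionRecorder:
    """Hypothetical callback target that persists screenshots and an action log."""

    def __init__(self, root: Path) -> None:
        self.screenshot_dir = root / "screenshot"
        self.history_file = root / "action_history" / "action_history.txt"
        self.screenshot_dir.mkdir(parents=True, exist_ok=True)
        self.history_file.parent.mkdir(parents=True, exist_ok=True)
        self._count = 0

    def __call__(self, action: str, screenshot_b64: Optional[str] = None) -> None:
        # Append one line per action so the evaluator can replay the history.
        with self.history_file.open("a", encoding="utf-8") as f:
            f.write(action + "\n")
        # Save the screenshot, if the action produced one, under a sequential name.
        if screenshot_b64 is not None:
            self._count += 1
            path = self.screenshot_dir / f"{self._count:04d}.png"
            path.write_bytes(base64.b64decode(screenshot_b64))


def run_with_record(
    tool: Callable[[str], Optional[str]],
    action: str,
    on_action: Callable[[str, Optional[str]], None],
) -> Optional[str]:
    """Run a tool call, then report the action and any screenshot to the callback."""
    screenshot_b64 = tool(action)
    on_action(action, screenshot_b64)
    return screenshot_b64


def demo() -> Tuple[List[str], List[str]]:
    """Record two fake actions in a temp directory; return (log lines, screenshot names)."""
    fake_png = base64.b64encode(b"not-a-real-png").decode("ascii")
    with tempfile.TemporaryDirectory() as tmp:
        recorder = ActionRecorder(Path(tmp))
        run_with_record(lambda a: fake_png, "click(10, 20)", recorder)
        run_with_record(lambda a: None, "type('hello')", recorder)
        lines = recorder.history_file.read_text(encoding="utf-8").splitlines()
        shots = sorted(p.name for p in recorder.screenshot_dir.iterdir())
    return lines, shots
```

Keeping the recorder as a separate callable means the same recording logic can be passed to either tool wrapper, which matches the callback-based design the description mentions.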