Releases: Farama-Foundation/miniwob-plusplus

MiniWoB++ v1.0: Gymnasium Integration and Expanded Actions

14 Aug 17:07
553daee

v1.0 Release notes

In this release, we fully integrated MiniWoB++ with Gymnasium, which provides a standardized API for reinforcement learning. All MiniWoB++ environments can be easily instantiated using gymnasium.make and can be transformed via wrappers. The observations and actions are now fully typed and can likewise be transformed.

Additionally, we expanded the set of possible actions on web pages to include mouse dragging, scrolling, and pressing key combinations. Many aspects of the action space, such as the allowed key combinations or scrolling duration, can be customized. Finally, we added 18 environments from the hidden test set.

Breaking Changes

The upgrade to Python 3 and integration with Gymnasium means that old code that uses MiniWoB++ is likely to break. We recommend updating the code as follows (see the Basic Usage page for an example):

  • Update the code to Python 3.8+.
  • Environment instantiation: env = MiniWoBEnvironment('click-test-2') → env = gymnasium.make('miniwob/click-test-2-v1')
  • Running multiple instances in parallel is no longer supported. The observation returned by env.reset() and env.step(action) is now a single observation instead of a list. Similarly, the action argument in env.step(action) should be a single action.
  • The env.reset(seed) method now returns (observation, info) instead of just observation. The observation is a Python dict defined by the observation space.
  • The action in env.step(action) should be a Python dict defined by the action space. The env.step(action) method now returns (observation, reward, terminated, truncated, info), though the added truncated is always False.
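
As a rough illustration of the call signatures listed above, the sketch below runs one episode using the (observation, info) return value of reset and the five-element return value of step. StubClickEnv, its "target" action key, and run_episode are hypothetical names made up for this example so that it runs without a browser; they are not part of MiniWoB++, and a real action must follow the environment's action space.

```python
class StubClickEnv:
    """Hypothetical stand-in that mimics the reset/step signatures only.

    It is NOT a MiniWoB++ environment; it lets the loop run without a browser.
    """

    def reset(self, seed=None):
        # New style: a single observation dict plus an info dict.
        observation = {"utterance": "Click button ONE."}
        return observation, {}

    def step(self, action):
        # "target" is an invented key, used only by this stub.
        reward = 1.0 if action.get("target") == "ONE" else -1.0
        observation = {"utterance": "Click button ONE."}
        # New style: (observation, reward, terminated, truncated, info).
        return observation, reward, True, False, {}

    def close(self):
        pass


def run_episode(env, choose_action, seed=None):
    """Run one episode with single (non-list) observations and actions."""
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    done = False
    while not done:
        action = choose_action(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        # Old code checked a list of done flags; now combine the two flags.
        done = terminated or truncated
    return total_reward


env = StubClickEnv()
try:
    episode_reward = run_episode(env, lambda obs: {"target": "ONE"}, seed=123)
finally:
    env.close()
```

With MiniWoB++ installed, the stub would be replaced by the environment returned from gymnasium.make('miniwob/click-test-2-v1'), and choose_action would build a dict that matches that environment's action space.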

A less recommended option is to use the Legacy Release of MiniWoB++, which is written in Python 2.

New Features and Improvements

  • Updated the code to Python 3.8+ and Selenium 4.5 (#7)
  • Integrated the code with Gymnasium. This includes registering the environments and redefining the action and observation spaces (#12, #20, #43)
  • Implemented more actions such as mouse dragging, scrolling, and pressing key combinations (#46)
  • Added ActionSpaceConfig, which allows selecting a subset of allowed actions and configuring each action type (#25)
  • Added presets for ActionSpaceConfig, which configure the action space following previous works on MiniWoB++ (#65)
  • Added 18 environments from the hidden test set (#69)
  • Added code standardization enforcement, including full type annotation (#9, #16)
  • Added integration tests, including a check for seed determinism (#13, #18)
  • Added the project to PyPI (#56, #61)

Bug Fixes

  • Made the environments fully deterministic for a given random seed unless impossible (#18)
  • Fixed issues with screenshots not being scaled correctly (#45)
  • Renamed environments to reduce confusion: drag-item → drag-circle and drag-shape → drag-single-shape (#21, #69)

Documentation Updates

  • Added a documentation site with guides on how to use the API and the description of each environment (#22, #23, #31, #47, #50)
  • Updated README.md, LICENSE, and CONTRIBUTING.md (#2, #47, #50)
  • Added templates for GitHub issues and pull requests (#8)
  • Added logo and banner (#35)
  • Added a project roadmap (#59)

Contributors

This release includes contributions from: @elliottower, @jjshoots, @jkterry1, @mgoulao, @ppasupat, @pseudo-rnd-thoughts, @rodrigodelazcano, @younik

MiniWoB++ v0.1 Release: Legacy code for reproducibility

27 Mar 08:04

MiniWoB++ (Mini World of Bits++) contains a collection of over 100 web interaction environments, ranging from simple button clicks to more complex forms and web apps. The benchmark was originally designed for reinforcement learning. The environments are self-contained and have a unified interface (similar to Atari environments). On the other hand, they pose several challenges such as sparse rewards, long episode horizons, and multimodal states.

This legacy release contains the original version of the code as released with the paper "Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration" (Liu et al., 2018). The code implements a Selenium-based Python interface for interacting with the environments.

While we are excited to bring standardization with the Gymnasium API, action space customization, and other new features to MiniWoB++, the updated Python interface is incompatible with older projects. This release allows such projects to be reproduced. New projects are encouraged to use the standardized Gymnasium interface.

Note that the environments themselves (HTML, CSS, and JavaScript files) have remained mostly the same since the initial release. Older projects that only use the environments can use the latest version of the code as well.

The next stable release (v1) will support Python 3.6+ and have full integration with the Gymnasium API.

Requirements

The code was tested on Python 2.7. Other requirements are listed in python/requirements.txt. Follow the instructions in python/README.md to set up the code.

Example code usage

The code below performs a deterministic action on the click-test-2 environment (Instruction: “Click button ONE.”). The equivalent code for the new Gymnasium interface is here.

import os
import time
from miniwob.action import MiniWoBElementClick
from miniwob.environment import MiniWoBEnvironment

env = MiniWoBEnvironment('click-test-2')

# Wrap the code in try-finally to ensure proper cleanup.
try:
  base_url = os.environ.get('MINIWOB_BASE_URL')
  env.configure(base_url=base_url, headless=False, seeds=[123])

  # Start a new episode.
  states = env.reset()
  time.sleep(2)       # Only here to let you look at the environment.
  
  # Find the HTML element with text "ONE".
  for element in states[0].dom_elements:
    if element.text == "ONE":
      break

  # Click on the element.
  actions = [MiniWoBElementClick(element)]
  states, rewards, dones, info = env.step(actions)

  # Check if the action was correct. 
  assert rewards[0] >= 0      # Should be around 0.8 since 2 seconds have passed.
  assert dones[0] is True
  time.sleep(2)

finally:
  env.close()
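
One caveat about the search loop in the example above: if no element has the text "ONE", the loop finishes without breaking and the last element is clicked silently. A minimal sketch of a stricter lookup follows. It assumes only that each DOM element exposes a text attribute, as in the example; find_element_by_text and FakeElement are hypothetical names for illustration and not part of the MiniWoB++ code.

```python
from collections import namedtuple


def find_element_by_text(dom_elements, text):
    """Return the first element whose text matches, or raise LookupError."""
    for element in dom_elements:
        if element.text == text:
            return element
    raise LookupError("no DOM element with text %r" % (text,))


# Stand-in for the DOM element objects, used only to exercise the helper.
FakeElement = namedtuple("FakeElement", ["text"])

elements = [FakeElement("TWO"), FakeElement("ONE")]
button = find_element_by_text(elements, "ONE")
```

In the legacy example, the helper would be called as find_element_by_text(states[0].dom_elements, "ONE") in place of the for loop.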

For more code usage, check out the following projects: