WebDriver API command support and mapping to AutoPy API

Desired Capabilities

These are capabilities that AutoPyDriverServer supports.

Standard WebDriver Desired Capabilities

takesScreenshot = true

Custom Desired Capabilities specific to AutoPyDriverServer

imageRecognitionToleranceValue = a floating-point (double) value that sets how much imprecision is tolerated when matching images. See https://github.com/msanders/autopy/issues/25 for details.
defaultImageFolder = the image folder that AutoPyDriverServer uses when looking up elements by name. Defaults to the image subfolder within the AutoPyDriverServer directory (if the server is started relative to that location).
defaultElementImageMapConfigFile = the mapping config file that maps element IDs to actual image file locations when looking up elements by ID. Defaults to the config file in the AutoPyDriverServer directory (if the server is started relative to that location).
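
As an illustration, the capabilities above could be collected on the client side roughly as follows. This is a minimal sketch that uses only a plain map so it runs without Selenium on the classpath. The class name AutoPyCapabilities, the tolerance value and both paths are hypothetical placeholders, not values taken from AutoPyDriverServer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AutoPyCapabilities {
    // Builds a capability map using the capability names listed above.
    // All argument values are supplied by the caller; none are server defaults.
    static Map<String, Object> build(double tolerance, String imageFolder, String mapFile) {
        Map<String, Object> caps = new LinkedHashMap<>();
        caps.put("takesScreenshot", true);
        caps.put("imageRecognitionToleranceValue", tolerance);
        caps.put("defaultImageFolder", imageFolder);
        caps.put("defaultElementImageMapConfigFile", mapFile);
        return caps;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        System.out.println(build(0.1, "/pathTo/images", "/pathTo/my_element_map.cfg"));
    }
}
```

With the Selenium Java client, a map like this can typically be passed to the DesiredCapabilities constructor that accepts a map, and the result used when creating the remote session.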

Element Location Strategy

AutoPy operates on bitmaps (i.e. PNG or BMP images) and colors. When interfacing with the WebDriver API, AutoPyDriverServer deals only with bitmaps in PNG format: no BMP, no colors. Element lookup therefore translates as follows:

Find Element By ID

The ID is an arbitrary name that you, the user, define in a config file of key/value pairs. The key is the ID, and the value is the (absolute) path to the PNG image that defines the element.

The config file looks like this (a typical INI/CFG file):

[Element Mapping]
2btn=C:\\pathTo\\calc_2_btn.png
3btn=/pathTo/calc_3_btn.png
4btn=\\server\shareName\pathTo\calc_4_btn.png

The Element Mapping section header is required. Use whatever file path convention applies to your platform (Windows, Linux/Mac). By default, the config file used is element_image_map under the AutoPyDriverServer directory. The file actually used can be set from the command line at server startup, or via Desired Capabilities when a WebDriver client establishes a session.

Example: driver.findElement(By.id("2btn")).click();
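
To make the structure of the mapping file concrete, here is a minimal sketch of how such a file could be read into an ID-to-path map. It is not the server's own parser: the class name ElementImageMap, the comment handling and the error raised for a missing section header are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ElementImageMap {
    static final String SECTION = "Element Mapping";

    // Returns the ID -> image path pairs found under the [Element Mapping] section.
    static Map<String, String> parse(String text) {
        Map<String, String> mapping = new LinkedHashMap<>();
        boolean inSection = false;
        boolean sectionSeen = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // skip blank lines and comments (assumed comment markers)
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // A section header: remember whether it is the one we need.
                inSection = line.substring(1, line.length() - 1).trim().equals(SECTION);
                sectionSeen |= inSection;
                continue;
            }
            int eq = line.indexOf('=');
            if (inSection && eq > 0) {
                // Split on the first '=' only, so paths containing '=' stay intact.
                mapping.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        if (!sectionSeen) {
            throw new IllegalArgumentException("Missing [" + SECTION + "] section header");
        }
        return mapping;
    }

    public static void main(String[] args) {
        String sample = "[Element Mapping]\n3btn=/pathTo/calc_3_btn.png\n";
        System.out.println(parse(sample).get("3btn"));
    }
}
```

In this sketch, the value looked up for an ID such as 3btn is the image path that the find-by-ID strategy would then search for on screen.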

Find Element By Name

The name is the filename of the PNG file that defines the element. Do not include the path to the file, just the filename. The absolute path is composed by appending the (file) name supplied by the WebDriver client to the image folder path used by AutoPyDriverServer. The image folder path can be set from the command line at server startup, or via Desired Capabilities when a WebDriver client establishes a session. By default, the image folder used is the image subfolder within the AutoPyDriverServer directory.

Example: driver.findElement(By.name("calc_2_btn.png")).click();
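
The path composition described above can be sketched as follows. This only illustrates the idea and is not the server's code: the class name ImageNameResolver and the folder /pathTo/images are hypothetical, and rejecting names that contain a path separator is an assumption added here to mirror the "filename only" rule.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ImageNameResolver {
    // Joins the configured image folder with the bare file name sent by the client.
    static Path resolve(String imageFolder, String fileName) {
        if (fileName.contains("/") || fileName.contains("\\")) {
            // Assumed guard: the name strategy expects a file name, not a path.
            throw new IllegalArgumentException("Pass only the file name, not a path: " + fileName);
        }
        return Paths.get(imageFolder).resolve(fileName);
    }

    public static void main(String[] args) {
        // Hypothetical image folder, for illustration only.
        System.out.println(resolve("/pathTo/images", "calc_2_btn.png"));
    }
}
```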

Find Element By XPath

This strategy does not take an actual XPath expression; it borrows only the "path" part of the find-by method's name. The value passed is the absolute path to the PNG image that defines the element. No path checking is performed, so an incorrect path results in an exception.

Example: driver.findElement(By.xpath("C:\\Temp\\demo1.png")).click();

Example: driver.findElement(By.xpath("/pathTo/demo2.png")).click();

Other find element strategies

The other strategies have no mapped equivalent in AutoPyDriverServer. Attempting to use them results in an exception.

I am, however, contemplating using one of the unused strategies to accept a base 64 encoded string of a PNG image as the find-by value. The server would decode the string to an image and try to find that image. That is a possible future enhancement, if it is implemented at all.

Supported/available WebDriver API/commands

Only the commands shown here are supported. Invoking any other command returns an exception. These commands interface with the AutoPy bitmap and mouse modules and, to a lesser extent, the key module.

Execute Script - gives you shell execute access and returns the standard output of the execution. Beware that this may block WebDriver execution until the script completes.
Taking screenshot - captures the entire desktop. Tested on a single-monitor desktop; untested with multiple monitors.
WebElement.click
WebElement.sendKeys - supports plain text only for now; modifier keys are not handled
Generic send keys (not tied to an element) - supports plain text only for now; modifier keys are not handled
WebElement.location - returns the x,y coordinates of the element on the desktop screen
WebElement.size - returns the width and height of the element
WebElement.isDisplayed() - returns whether the element exists or is displayed on screen

Mouse operations supported via JSON Wire Protocol methods and the Actions API:

drag & drop
mouse up
mouse down
mouse move to element
mouse move to element with offset
mouse move to offset (relative to mouse location)
right mouse click
mouse click

Limitations in WebDriver API implementation

Not yet Selenium Grid compatible: the server cannot connect to a Selenium Grid hub as a node.

Element referencing and WebDriver overall functionality may be impacted by length of image paths that identify elements

Image recognition does not involve DOM objects, and desktop GUI testing by image recognition differs both from web testing and from desktop GUI testing that identifies objects or components by ID, name, or properties. As a result, locating an element yields no real object reference to a WebElement.

Instead, the server tracks a located element by encoding the image path that was originally used to find it and returning the encoded value to the client, which sends it back in later requests that manipulate the same element. The path is base 64 encoded, and the result is then URL encoded. The final string becomes the WebElement reference ID/value. In the WebDriver JSON Wire Protocol, the WebElement ID is passed within the URL of the resource request rather than in a POST body, so unlike parameters sent in the body, it can be affected by limits on URL path length.

Therefore, if you experience issues with WebElements, check that the (absolute) image paths you use are not too long (e.g. longer than 256 characters).
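
The encoding described above, and the way it lengthens the path, can be sketched as follows. This is an illustration only, not the server's implementation: the class name ElementReference, the use of UTF-8 and the standard base 64 alphabet are assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class ElementReference {
    // Image path -> base 64 -> URL encoding, giving a string usable inside a URL.
    static String encode(String imagePath) {
        String b64 = Base64.getEncoder().encodeToString(imagePath.getBytes(StandardCharsets.UTF_8));
        try {
            return URLEncoder.encode(b64, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Reverses encode(): URL decoding -> base 64 decoding -> image path.
    static String decode(String elementId) {
        try {
            String b64 = URLDecoder.decode(elementId, "UTF-8");
            return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String path = "/pathTo/calc_3_btn.png";
        String id = encode(path);
        // The encoded ID is longer than the path it was built from.
        System.out.println(path.length() + " characters -> " + id.length() + " characters");
        System.out.println(decode(id).equals(path));
    }
}
```

Base 64 alone grows the input by roughly a third, and URL encoding expands any "+", "/" or "=" characters further, so the element ID placed in the URL is always noticeably longer than the original image path.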
