Skip to content

Commit ff44f0b

Browse files
committed
fix(browser_action): align tool definition with implementation
- Change coordinate from object to string format 'x,y@WIDTHxHEIGHT' - Change size from object to string format 'WIDTHxHEIGHT' - Update required params to only include 'action' (others are conditionally required) - Add missing 'press' action to enum - Update text parameter description to clarify usage for both type and press actions Fixes issue where assistant was correctly following schema but receiving 'Missing value for required parameter' errors.
1 parent 791544f commit ff44f0b

File tree

1 file changed

+10
-32
lines changed

1 file changed

+10
-32
lines changed

src/core/prompts/tools/native-tools/browser_action.ts

Lines changed: 10 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -5,59 +5,37 @@ export default {
55
function: {
66
name: "browser_action",
77
description:
8-
"Interact with a Puppeteer-controlled browser session. Always start by launching at a URL and always finish by closing the browser. While the browser is active, do not call any other tools. Use coordinates within the viewport to hover or click, provide text for typing, and ensure actions are grounded in the latest screenshot and console logs.",
8+
"Interact with a browser session. Always start by launching at a URL and always finish by closing the browser. While the browser is active, do not call any other tools. Use coordinates within the viewport to hover or click, provide text for typing, and ensure actions are grounded in the latest screenshot and console logs.",
99
strict: true,
1010
parameters: {
1111
type: "object",
1212
properties: {
1313
action: {
1414
type: "string",
1515
description: "Browser action to perform",
16-
enum: ["launch", "hover", "click", "type", "resize", "scroll_down", "scroll_up", "close"],
16+
enum: ["launch", "click", "hover", "type", "press", "scroll_down", "scroll_up", "resize", "close"],
1717
},
1818
url: {
1919
type: ["string", "null"],
2020
description: "URL to open when performing the launch action; must include protocol",
2121
},
2222
coordinate: {
23-
type: ["object", "null"],
23+
type: ["string", "null"],
2424
description:
25-
"Screen coordinate for hover or click actions; target the center of the desired element",
26-
properties: {
27-
x: {
28-
type: "number",
29-
description: "Horizontal pixel position within the current viewport",
30-
},
31-
y: {
32-
type: "number",
33-
description: "Vertical pixel position within the current viewport",
34-
},
35-
},
36-
required: ["x", "y"],
37-
additionalProperties: false,
25+
"Screen coordinate for hover or click actions in format 'x,y@WIDTHxHEIGHT' where x,y is the target position on the screenshot image and WIDTHxHEIGHT is the exact pixel dimensions of the screenshot image (not the browser viewport). Example: '450,203@900x600' means click at (450,203) on a 900x600 screenshot. The coordinates will be automatically scaled to match the actual viewport dimensions.",
3826
},
3927
size: {
40-
type: ["object", "null"],
41-
description: "Viewport dimensions to apply when performing the resize action",
42-
properties: {
43-
width: {
44-
type: "number",
45-
description: "Viewport width in pixels",
46-
},
47-
height: {
48-
type: "number",
49-
description: "Viewport height in pixels",
50-
},
51-
},
52-
required: ["width", "height"],
53-
additionalProperties: false,
28+
type: ["string", "null"],
29+
description:
30+
"Viewport dimensions for the resize action in format 'WIDTHxHEIGHT' or 'WIDTH,HEIGHT'. Example: '1280x800' or '1280,800'",
5431
},
5532
text: {
5633
type: ["string", "null"],
57-
description: "Text to type when performing the type action",
34+
description:
35+
"Text to type when performing the type action, or key name to press when performing the press action (e.g., 'Enter', 'Tab', 'Escape')",
5836
},
5937
},
60-
required: ["action", "url", "coordinate", "size", "text"],
38+
required: ["action"],
6139
additionalProperties: false,
6240
},
6341
},

0 commit comments

Comments
 (0)