Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Resize file in screen toolkit #81

Merged
merged 2 commits into from
Sep 24, 2024
Merged

Conversation

zakiali
Copy link
Collaborator

@zakiali zakiali commented Sep 19, 2024

Updating the screen toolkit to have the capability to resize the image.

This is a solution for errors in payload size when sending images, for example. We've been seeing this in some providers, but not all. The limits in OpenaAI for payload size is 20MB.

By tools default functionality doesn't change, only when prompted to resize the image if you run into errors.

edit: Additionally, wanted to note that the file size isn't the right number for the payload since the images are being base64 encoded, inflating the disk size by 37%.

@michaelneale michaelneale changed the title Resize file in screen toolkit bug: Resize file in screen toolkit Sep 19, 2024
@michaelneale
Copy link
Collaborator

@zakiali any thought on a ride for this PR: make it screenshot all screens?

@michaelneale michaelneale changed the title bug: Resize file in screen toolkit fix: Resize file in screen toolkit Sep 19, 2024
@zakiali zakiali force-pushed the zaki/allow-resize-screencapture branch from 0d5fa21 to 1804f72 Compare September 21, 2024 01:27
@zakiali
Copy link
Collaborator Author

zakiali commented Sep 21, 2024

@zakiali any thought on a ride for this PR: make it screenshot all screens?

Added options to take screenshots of diff screens instead of the main and it's a pretty easy extension to do all screens (pass in more file paths to the main command). I left it out though because it might be too much and maybe an extended toolkit that does that for power use cases?

@zakiali zakiali marked this pull request as ready for review September 21, 2024 01:32
@michaelneale
Copy link
Collaborator

hrm - was able to test it, but when asking it to take it of both screens and mention size, saw this error:

image

@michaelneale
Copy link
Collaborator

the parameter thing may be different, but I also saw this (thought I could try 2 screens):

G❯ can you take a look at both my screens and tell me what you see

screencapture -x -D 1 /tmp/goose_screenshot_4cc6811760b04cc7984317fef33c02b2.png  

screencapture -x -D 2 /tmp/goose_screenshot_aa42b9aabaf94fbb814259f669c2a5c8.png 
Traceback (most recent call last):
  File "/Users/micn/Documents/code/domestic-goose/.venv/lib/python3.12/site-packages/exchange/providers/utils.py", line 30, in raise_for_status
    response.raise_for_status()
  File "/Users/micn/Documents/code/domestic-goose/.venv/lib/python3.12/site-packages/httpx/_models.py", line 763, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://api.openai.com/v1/chat/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

it may be stuffing 2 images at once to openai - if you like I can take a look and try to push some fixes to this branch?

"""
Take a screenshot to assist the user in debugging or designing an app. They may need help with moving a screen element, or interacting in some way where you could do with seeing the screen.
Optionally, resize the image using the `sips` cli tool. You may do this fs there is potential for error in the payload size.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think at the moment, the error around the payload size is not surfaced to the model but instead exits the loop? we'll need an update in exchange i think for it to see the need to do this.

We might instead consider always resizing to 728x on the low side, given the openai approach. We can also switch from png to jpg.

detail: high images are first scaled to fit within a 2048 x 2048 square, maintaining their aspect ratio. Then, they are scaled such that the shortest side of the image is 768px long.

That'd make for maybe 200kb uploads, so always well below the limit without needing the model to consider when it needs to resize

@baxen baxen force-pushed the zaki/allow-resize-screencapture branch from 230ec67 to 77b2bf2 Compare September 24, 2024 04:58
@baxen baxen merged commit b9bb48e into main Sep 24, 2024
2 checks passed
lily-de pushed a commit that referenced this pull request Oct 7, 2024
@lamchau lamchau deleted the zaki/allow-resize-screencapture branch October 24, 2024 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants