2.3.30 Satellite: OmniParser
av edited this page Nov 17, 2024 · 1 revision
Handle: omniparser
URL: http://localhost:34271
📢 [Project Page] [Blog Post] [Models] [Huggingface demo]
OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
[!WARN] OmniParser currently requires CUDA and an NVIDIA GPU to run.
# [Optional] Pre-build the image
# This will take a while, as the image depends on CUDA
harbor build omniparser
# Start the service
harbor up omniparser
# [Optional] first run will take a while to download the models
# monitor the progress:
harbor logs omniparser
# [Optional] open in browser
harbor open omniparser
You can use either the original Gradio app or its API (via the Gradio Client).
Either way, the service produces an annotated image as well as textual output; use both together for the best results.
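As a rough illustration of the API route, here is a minimal Python sketch using the `gradio_client` package (`pip install gradio_client`). The URL is the one listed on this page. The endpoint name `/process`, the single image argument, and the two-output result shape (annotated image, then text) are assumptions rather than confirmed details of this service; run `Client("http://localhost:34271").view_api()` to see what the running app actually exposes. The helper names `split_result` and `parse_screenshot` are hypothetical.

```python
from typing import Any, Tuple

# Service URL as listed on this page.
OMNIPARSER_URL = "http://localhost:34271"


def split_result(result: Any) -> Tuple[Any, str]:
    """Split a two-output Gradio result into (annotated image, text).

    ASSUMPTION: the app returns exactly two outputs, the annotated image
    first and the textual description second.
    """
    if not isinstance(result, (list, tuple)) or len(result) != 2:
        raise ValueError("expected two outputs: annotated image and text")
    image, text = result
    return image, str(text)


def parse_screenshot(image_path: str, api_name: str = "/process") -> Tuple[Any, str]:
    """Send a screenshot to the service and return (annotated image, text).

    ASSUMPTION: the endpoint is named "/process" and accepts the image as its
    first argument, with any remaining inputs left at their defaults.
    Check Client(OMNIPARSER_URL).view_api() and adjust if it differs.
    """
    # Imported lazily so split_result stays usable without gradio_client.
    from gradio_client import Client, handle_file

    client = Client(OMNIPARSER_URL)
    result = client.predict(handle_file(image_path), api_name=api_name)
    return split_result(result)
```

With the service running, `annotated, text = parse_screenshot("screenshot.png")` would return both outputs in one call, so they can be passed on to a model together, as recommended above.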