2.3.30 Satellite: OmniParser

OmniParser

Handle: omniparser
URL: http://localhost:34271

📢 [Project Page] [Blog Post] [Models] [Huggingface demo]

[Screenshot: example of OmniParser with Harbor UI]

OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.

Starting

[!WARN] OmniParser currently requires CUDA and an NVIDIA GPU to run.

# [Optional] Pre-build the image
# This takes a while because the image depends on CUDA
harbor build omniparser

# Start the service
harbor up omniparser

# [Optional] The first run takes a while to download the models;
# monitor the progress with:
harbor logs omniparser

# [Optional] open in browser
harbor open omniparser

Usage

You can use either the original Gradio app or its API (via the Gradio Client).

Either way, the service produces an annotated image as well as textual output. Use both together for the best results.
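
As a rough illustration of the API route, here is a minimal Python sketch built on the `gradio_client` package (`pip install gradio_client`; keyword arguments to `predict` and the `handle_file` helper need a reasonably recent version). The URL is the one listed at the top of this page. Everything else is an assumption: the endpoint name `/process`, any extra parameters, and the two-part (image, text) result shape are placeholders, and the helper names `describe_api`, `parse_screenshot`, and `split_result` are hypothetical. Run `describe_api()` against your running instance to see the endpoints and parameters it actually exposes.

```python
# Sketch: calling OmniParser through the Gradio Client.
# Requires: pip install gradio_client

OMNIPARSER_URL = "http://localhost:34271"  # URL listed at the top of this page


def describe_api(url=OMNIPARSER_URL):
    """Print the endpoints and parameters the running app actually exposes."""
    # Imported lazily so this module loads even without gradio_client installed.
    from gradio_client import Client

    Client(url).view_api()


def parse_screenshot(image_path, api_name="/process", url=OMNIPARSER_URL, **params):
    """Send one screenshot to the service and return the raw result.

    ASSUMPTION: "/process" and any keyword params are placeholders;
    confirm the real names with describe_api() first.
    """
    from gradio_client import Client, handle_file

    client = Client(url)
    # handle_file wraps a local path (or URL) so the client uploads it.
    return client.predict(handle_file(image_path), api_name=api_name, **params)


def split_result(result):
    """Split a result into (annotated_image, parsed_text).

    ASSUMPTION: the endpoint returns exactly two outputs, the annotated
    image first and the textual description second.
    """
    annotated_image, parsed_text = result
    return annotated_image, parsed_text
```

With the service running, a call might look like `image, text = split_result(parse_screenshot("screenshot.png"))`. Keeping both values around makes it easier to pass the annotated image and the text together to a downstream model.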
