GetAI is a powerful API library and command-line tool for AI Models and Datasets. It simplifies the process of searching, downloading, and exploring AI models and datasets from various sources like Hugging Face and other platforms. With GetAI, you can easily find and download the models and datasets you need with a simple import statement and minimal lines of code, without the hassle of navigating through multiple websites and repositories.
-
Easy to Download and Use
- Many tools force you into their controlled ecosystem. GetAI liberates you and your AI agents.
- Install:
pip install getai
- Search Models:
getai search model <query>
- Search Datasets:
getai search dataset <query>
- Download a Model:
getai model author/model_name
Working Example: getai model meta-llama/Llama-2-7b-hf
-
Two Lines of Code to Add AI Model or Dataset Search (or Download) to Your Project
from getai import search_datasets, download_dataset; import asyncio
asyncio.run(search_datasets("sentiment analysis", hf_token=None, max_connections=5, output_dir="datasets"))
-
Interactive Search for Models and Datasets
- Powerful fully interactive console UX for search (showing sizes, branches, last updated, and more)
- Accelerates navigation and model discovery
-
Detailed Information
- Displays sizes and last modified dates
- Enables sorting results in various ways
-
Future-Ready Design
- Quickly find the most current models and datasets
- Designed to simplify integration with your AI agents in searching and downloading datasets and models for fully autonomous projects
- Why GetAI?
- Features of GetAI
- Installation
- Usage
- Configuration
- Contributing
- License
- Support and Feedback
- Sources of Inspiration
- Using GetAI as a Library
The advent of large language models has led many companies to release open-source versions of their pre-trained foundation models. This has enabled anyone to download and run AI models locally, rather than relying on third-party API services. However, the process of finding, downloading, and setting up these models can be cumbersome and not always straightforward, especially for new users.
GetAI aims to simplify this process, not only for human developers but also for AI agents that need a simple tool to search and download models and datasets easily. With GetAI, you can quickly find the models and datasets you need, download them asynchronously, and start exploring and using them in your projects.
- Asynchronous Downloads: GetAI allows you to download models asynchronously, making efficient use of network resources and saving you time.
- Searching for AI Models: You can easily search for AI models using various filters such as name, last updated date, and other attributes. GetAI provides a user-friendly interface to find the models that best suit your needs.
- Searching for Datasets: GetAI also enables you to search for datasets by name and download them easily. You can quickly find and access the datasets you require for training or evaluating your AI models.
- Multiple Sources: GetAI supports downloading models and datasets from multiple sources, including Hugging Face, TensorFlow Hub, and other platforms. You can access a wide range of resources from different providers through a single tool.
- Flexible Configuration: GetAI allows you to configure sources and authentication through a config.yaml file (default location: /home/.getai/config.yaml). You can easily set up your credentials and preferences to streamline your workflow.
- Interactive CLI: GetAI provides an easy-to-use command-line interface with interactive features such as branch selection and progress display. You can navigate through the available options and monitor the download progress seamlessly.
- API Access: Easily integrate GetAI functionalities into your own applications using the provided API.
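The asynchronous download feature listed above can be pictured with a small standard-library sketch. This is not GetAI's implementation: `fake_download` and `download_all` are hypothetical names, and the network transfer is simulated with `asyncio.sleep`. It only illustrates how a semaphore can cap the number of simultaneous transfers, which is the role a setting such as `max_connections` typically plays.

```python
import asyncio


async def fake_download(name: str, semaphore: asyncio.Semaphore, stats: dict) -> str:
    """Simulate one file download (hypothetical stand-in, no network I/O)."""
    async with semaphore:  # wait here if too many downloads are in flight
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stands in for the real transfer
        stats["active"] -= 1
    return name


async def download_all(names: list[str], max_connections: int = 5) -> tuple[list[str], int]:
    """Start every download concurrently, never exceeding max_connections at once."""
    semaphore = asyncio.Semaphore(max_connections)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(fake_download(name, semaphore, stats) for name in names)
    )
    return list(results), stats["peak"]


if __name__ == "__main__":
    files = [f"shard-{i}.bin" for i in range(12)]
    done, peak = asyncio.run(download_all(files, max_connections=5))
    print(f"downloaded {len(done)} files, peak concurrency {peak}")
```

Because all tasks are started up front and only the semaphore limits them, results come back in the same order as the input list while the number of in-flight transfers stays bounded.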
You can install getai using pip:
pip install getai
GetAI provides a simple and intuitive command-line interface. Here are some examples of how you can use GetAI:
getai search model <query> [--author <author>] [--filter <filter>] [--sort <sort>] [--direction <direction>] [--limit <limit>] [--full]
This command allows you to search for AI models based on the provided query. You can use various options to refine your search results:
- --author: Filter models by author or organization.
- --filter: Filter models based on tags.
- --sort: Property to use when sorting models.
- --direction: Direction in which to sort models.
- --limit: Limit the number of models fetched.
- --full: Fetch full model information.
Example:
getai search model "text-generation" --sort downloads --direction -1 --limit 10
Sample output:
Search results for 'text-generation' (Page 1 of 1, Total: 10):
1. gpt2 by OpenAI (openai/gpt2) (Size: 548.09 MB)
2. distilgpt2 by HuggingFace (distilgpt2) (Size: 353.75 MB)
3. gpt2-large by OpenAI (openai/gpt2-large) (Size: 1.50 GB)
...
Enter 'n' for the next page, 'p' for the previous page, 'f' to filter, 's' to sort, 'r' to return to previous search results, or the model number to download.
getai search dataset <query> [--author <author>] [--filter <filter>] [--sort <sort>] [--direction <direction>] [--limit <limit>] [--full]
This command enables you to search for datasets based on the provided query. You can use various options to refine your search results:
- --author: Filter datasets by author or organization.
- --filter: Filter datasets based on tags.
- --sort: Property to use when sorting datasets.
- --direction: Direction in which to sort datasets.
- --limit: Limit the number of datasets fetched.
- --full: Fetch full dataset information.
Example:
getai search dataset "sentiment analysis" --filter language:en --sort downloads --direction -1 --limit 5
Sample output:
Search results for 'sentiment analysis' (Page 1 of 1, Total: 5):
1. imdb by andrew-maas (andrew-maas/imdb) (Size: 80.23 MB)
2. twitter_sentiment by nlp-with-deeplearning (nlp-with-deeplearning/twitter_sentiment) (Size: 63.15 MB)
3. sst2 by glue (glue/sst2) (Size: 7.09 MB)
...
Enter 'n' for the next page, 'p' for the previous page, 'f' to filter, 's' to sort, 'r' to return to previous search results, or the dataset number to download.
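If you want to assemble these search commands from a script or an agent, a small helper can build the argument list. The sketch below is an illustration, not part of GetAI: `build_search_command` is a hypothetical name, and it emits only the flags shown in the two usage lines above.

```python
import shlex
from typing import Optional


def build_search_command(
    kind: str,
    query: str,
    author: Optional[str] = None,
    tag_filter: Optional[str] = None,
    sort: Optional[str] = None,
    direction: Optional[int] = None,
    limit: Optional[int] = None,
    full: bool = False,
) -> list[str]:
    """Build the argv list for `getai search model|dataset` (hypothetical helper)."""
    if kind not in ("model", "dataset"):
        raise ValueError("kind must be 'model' or 'dataset'")
    argv = ["getai", "search", kind, query]
    # Only flags that appear in the documented usage lines are emitted.
    options = [
        ("--author", author),
        ("--filter", tag_filter),
        ("--sort", sort),
        ("--direction", direction),
        ("--limit", limit),
    ]
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    if full:
        argv.append("--full")
    return argv


if __name__ == "__main__":
    cmd = build_search_command(
        "dataset", "sentiment analysis",
        tag_filter="language:en", sort="downloads", direction=-1, limit=5,
    )
    # shlex.join quotes arguments that contain spaces for safe display.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems with multi-word queries. Note that the search command is interactive, so it expects a terminal when it is actually run.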
getai model <identifier> [--branch <branch>] [--output-dir <output-dir>] [--max-retries <max-retries>] [--max-connections <max-connections>] [--clean] [--check]
This command allows you to download a specific model by providing its identifier. You can use various options to customize the download process:
- --branch: Specify a branch name or enable branch selection.
- --output-dir: Directory to save the model.
- --max-retries: Max retries for downloads.
- --max-connections: Max simultaneous connections for downloads.
- --clean: Start download from scratch.
- --check: Validate the checksums of files after download.
Example:
getai model meta-llama/Llama-2-7b-hf --branch main --output-dir models/Llama-2-7b-hf --max-retries 3 --max-connections 5
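To make the idea behind the --check option concrete, here is a minimal sketch of checksum validation using only the standard library. It assumes SHA-256 digests and is not GetAI's actual code; `sha256_of_file` and `verify_checksum` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the expected value (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "weights.bin"
        sample.write_bytes(b"example model bytes")
        expected = hashlib.sha256(b"example model bytes").hexdigest()
        print("intact file passes:", verify_checksum(sample, expected))
        sample.write_bytes(b"corrupted")
        print("corrupted file passes:", verify_checksum(sample, expected))
```

Reading in fixed-size chunks matters for model weights, which are often several gigabytes and should not be loaded into memory just to be hashed.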
getai dataset <identifier> [--revision <revision>] [--output-dir <output-dir>] [--max-retries <max-retries>] [--max-connections <max-connections>] [--full]
This command enables you to download a specific dataset by providing its identifier. You can use various options to customize the download process:
- --revision: Revision of the dataset.
- --output-dir: Directory to save the dataset.
- --max-retries: Max retries for downloads.
- --max-connections: Max simultaneous connections for downloads.
- --full: Fetch full dataset information.
Example:
getai dataset glue/sst2 --revision main --output-dir datasets/sst2 --max-retries 3 --max-connections 5
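The --max-retries option implies some retry policy for failed transfers. The sketch below shows one common approach, retrying with exponential backoff. It is an assumption about how such a policy could look, not a description of GetAI's internals; `with_retries` and `flaky_fetch` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 0.01,
) -> T:
    """Run operation, retrying up to max_retries times after the first failure."""
    attempt = 0
    while True:
        try:
            return await operation()
        except OSError:  # ConnectionError and similar network errors subclass OSError
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            await asyncio.sleep(base_delay * (2 ** attempt))  # exponential backoff
            attempt += 1


if __name__ == "__main__":
    calls = {"count": 0}

    async def flaky_fetch() -> str:
        """Hypothetical download that fails twice, then succeeds."""
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("simulated network drop")
        return "ok"

    result = asyncio.run(with_retries(flaky_fetch, max_retries=3))
    print(result, "after", calls["count"], "attempts")
```

Doubling the delay after each failure gives a struggling server time to recover instead of hitting it again immediately.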
For more detailed usage instructions and additional options, please refer to the GetAI documentation.
GetAI uses a config.yaml file to store configuration settings such as API tokens and other preferences. By default, the configuration file is located at /home/.getai/config.yaml. However, we recommend using the huggingface-cli login command to securely set up your Hugging Face token or setting the HF_TOKEN environment variable.
Example of setting the environment variable:
export HF_TOKEN=your_huggingface_token_here
Here's an example of a config.yaml file:
hf_token: your_huggingface_token_here
Replace your_huggingface_token_here with your actual Hugging Face token.
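In your own scripts, you may want to resolve the token from both places described above. The following sketch checks the HF_TOKEN environment variable first and then falls back to an `hf_token:` entry in a config file. The precedence order, the function names, and the minimal line-based parser (used here to avoid a YAML dependency) are all assumptions for illustration; GetAI's own lookup logic may differ.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def read_token_from_config(config_path: Path) -> Optional[str]:
    """Read a top-level `hf_token: value` entry (minimal parser, not full YAML)."""
    if not config_path.is_file():
        return None
    for line in config_path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "hf_token":
            token = value.strip().strip("'\"")  # drop optional surrounding quotes
            return token or None
    return None


def resolve_hf_token(config_path: Path, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer HF_TOKEN from the environment, then fall back to the config file."""
    return env.get("HF_TOKEN") or read_token_from_config(config_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "config.yaml"
        config.write_text("hf_token: token_from_file\n", encoding="utf-8")
        print(resolve_hf_token(config, env={}))
        print(resolve_hf_token(config, env={"HF_TOKEN": "token_from_env"}))
```

The resolved value can then be passed as the `hf_token` argument of the library functions, which keeps the token out of your source code.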
GetAI isn't just a command-line tool; it's also a powerful Python library for searching and downloading datasets and models from Hugging Face. This guide shows you how to leverage GetAI's capabilities programmatically in your Python applications.
First, install the GetAI package:
pip install getai
Below are examples of how to use the main functions provided by GetAI. These examples demonstrate how to search for and download datasets and models programmatically.
Let's say you're working on a project related to sentiment analysis and you want to find relevant datasets. Here's how you can do it:
from getai import search_datasets
import asyncio

async def search_datasets_example():
    await search_datasets(
        query="sentiment analysis",
        hf_token="your_huggingface_token",
        max_connections=5,
        output_dir="datasets"
    )

if __name__ == "__main__":
    asyncio.run(search_datasets_example())
Once you've found the dataset you need, downloading it is simple. For example, to download the IMDB dataset (stanfordnlp/imdb):
from getai import download_dataset
import asyncio

async def download_dataset_example():
    await download_dataset(
        identifier="stanfordnlp/imdb",
        hf_token=None,  # no token is passed here; use None, not the string "None"
        max_connections=5,
        output_dir="datasets/stanfordnlp/imdb"
    )

if __name__ == "__main__":
    asyncio.run(download_dataset_example())
Imagine you need a model for text generation. Here's how you can search for it:
from getai import search_models
import asyncio

async def search_models_example():
    await search_models(
        query="text-generation",
        hf_token="your_huggingface_token",
        max_connections=5
    )

if __name__ == "__main__":
    asyncio.run(search_models_example())
After finding the model, downloading it is straightforward. For instance, to download the GPT-2 model:
from getai import download_model
import asyncio

async def download_model_example():
    await download_model(
        identifier="openai/gpt2",
        branch="main",
        hf_token="your_huggingface_token",
        max_connections=5,
        output_dir="models/gpt2"
    )

if __name__ == "__main__":
    asyncio.run(download_model_example())
Replace your_huggingface_token with your actual Hugging Face token.
To give you a deeper understanding, here are the detailed descriptions and usages of the core functions provided by GetAI.
async def search_datasets(
    query, hf_token=None, max_connections=5, output_dir=None, **kwargs
):
    """
    Search datasets on Hugging Face based on a query.

    Args:
        query (str): The search query.
        hf_token (str): Hugging Face token.
        max_connections (int): Maximum number of concurrent connections.
        output_dir (Path): Directory to save search results.
        **kwargs: Additional keyword arguments for filtering search results.
    """
async def download_dataset(
    identifier, hf_token=None, max_connections=5, output_dir=None, **kwargs
):
    """
    Download a dataset from Hugging Face by its identifier.

    Args:
        identifier (str): The dataset identifier.
        hf_token (str): Hugging Face token.
        max_connections (int): Maximum number of concurrent connections.
        output_dir (Path): Directory to save the dataset.
        **kwargs: Additional keyword arguments for dataset download.
    """
async def search_models(
    query, hf_token=None, max_connections=5, **kwargs
):
    """
    Search models on Hugging Face based on a query.

    Args:
        query (str): The search query.
        hf_token (str): Hugging Face token.
        max_connections (int): Maximum number of concurrent connections.
        **kwargs: Additional keyword arguments for filtering search results.
    """
async def download_model(
    identifier, branch="main", hf_token=None, max_connections=5, output_dir=None, **kwargs
):
    """
    Download a model from Hugging Face by its identifier and branch.

    Args:
        identifier (str): The model identifier.
        branch (str): The branch name.
        hf_token (str): Hugging Face token.
        max_connections (int): Maximum number of concurrent connections.
        output_dir (Path): Directory to save the model.
        **kwargs: Additional keyword arguments for model download.
    """
Contributions to GetAI are welcome! If you would like to contribute to the project, please follow the guidelines outlined in the CONTRIBUTING.md file. You can help improve GetAI by reporting issues, suggesting new features, or submitting pull requests.
GetAI is released under the MIT License with attribution to the author, Ben Gorlick (github.com/bgorlick). Please see the LICENSE file for more details.
(c) 2023-2024 Ben Gorlick github.com/bgorlick
If you encounter any issues, have questions, or would like to provide feedback, please open an issue on the GetAI GitHub repository. We appreciate your input and will do our best to assist you.
Thank you for using GetAI! We hope it simplifies your workflow and enhances your experience with AI models and datasets.
This project started as an attempt to create a completely asynchronous port of oobabooga's text-generation-webui model downloading script. His script at the time used a multithreaded design, and I wanted to explore building an asynchronous version. Credit goes entirely to him for the initial approach, the pagination methods, and the parsing logic for a variety of file types.