Analyze unstructured data blazingly fast with machine learning. Connect your own ML models to your own data sources and query away!
To start using AIDB, all you need to do is install the requirements, specify a configuration, and query! Setting up the environment is as simple as:
git clone https://github.com/ddkang/aidb.git
cd aidb
pip install -r requirements.txt
# Optional if you'd like to run the examples below
gdown "https://drive.google.com/uc?id=1SyHRaJNvVa7V08mw-4_Vqj7tCynRRA3x"
unzip data.zip -d tests/
We've set up an example that analyzes product reviews with HuggingFace. Set your HuggingFace API key, then run:
python launch.py --config=config.sentiment --setup-blob-table --setup-output-table
As an example query, you can run
SELECT AVG(score)
FROM sentiment
WHERE label = '5 stars'
ERROR_TARGET 10%
CONFIDENCE 95%;
You can see the mappings here. We use the HuggingFace API to generate sentiments from the reviews.
We've also set up an example that detects whether user-generated content is adult content, so that it can be filtered. To run this example, all you need to do is run:
python launch.py --config=config.nsfw_detect --setup-blob-table --setup-output-table
As an example query, you can run
SELECT *
FROM nsfw
WHERE racy LIKE 'POSSIBLE';
You can see the mappings here. We use the Google Vision API to generate the safety labels.
AIDB focuses on keeping cost down and interoperability high.
We reduce costs with our optimizations:
- First-class support for approximate queries, reducing the cost of aggregations by up to 350x.
- Caching, which speeds up multiple queries over the same data.
We keep interoperability high by allowing you to bring your own data source, ML models, and vector databases!
One key feature of AIDB is first-class support for approximate queries.
Currently, we support approximate AVG, COUNT, and SUM.
We don't currently support GROUP BY or JOIN for approximate aggregations, but it's on our roadmap.
Please reach out if you'd like us to support your queries!
In order to execute an approximate aggregation query, simply append ERROR_TARGET <error percent>% CONFIDENCE <confidence>% to your normal aggregation.
As a full example, you can compute an approximate count by doing:
SELECT COUNT(xmin)
FROM objects
ERROR_TARGET 5%
CONFIDENCE 95%;
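If you build queries programmatically, the clause can be appended with a small helper. The sketch below is only an illustration: `with_error_target` is a hypothetical function, not part of AIDB, and it does nothing more than string formatting based on the syntax shown above.

```python
def with_error_target(query: str, error_percent: float, confidence_percent: float) -> str:
    """Append an ERROR_TARGET / CONFIDENCE clause to an aggregation query.

    Hypothetical helper (not an AIDB API): it only formats the query text.
    """
    if not 0 < error_percent < 100:
        raise ValueError("error_percent must be between 0 and 100")
    if not 0 < confidence_percent < 100:
        raise ValueError("confidence_percent must be between 0 and 100")
    # Drop a trailing semicolon so the clause lands inside the statement.
    base = query.strip().rstrip(";").rstrip()
    return (
        f"{base}\n"
        f"ERROR_TARGET {error_percent:g}%\n"
        f"CONFIDENCE {confidence_percent:g}%;"
    )


approx_query = with_error_target("SELECT COUNT(xmin)\nFROM objects;", 5, 95)
print(approx_query)
```

The printed query matches the example above and can be submitted the same way as any other query.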
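The ERROR_TARGET specifies the percent error compared to running the query exactly, and the CONFIDENCE specifies how often the answer falls within that error.
For example, with ERROR_TARGET 5% and CONFIDENCE 95%, if the true answer is 100, you will get answers between 95 and 105 (95% of the time).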
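To make the arithmetic concrete, here is a minimal sketch that reads ERROR_TARGET as a relative error around the exact answer, which is what the example above implies. The function names are hypothetical and are not part of AIDB.

```python
from typing import Tuple


def error_bounds(true_value: float, error_percent: float) -> Tuple[float, float]:
    """Return the (low, high) range allowed by a relative error target."""
    margin = abs(true_value) * error_percent / 100.0
    return true_value - margin, true_value + margin


def within_target(estimate: float, true_value: float, error_percent: float) -> bool:
    """Check whether an approximate answer falls inside the allowed range."""
    low, high = error_bounds(true_value, error_percent)
    return low <= estimate <= high


low, high = error_bounds(100, 5)
print(low, high)                   # 95.0 105.0
print(within_target(103, 100, 5))  # True
print(within_target(110, 100, 5))  # False
```

The CONFIDENCE value is not modeled here: it describes how often an approximate answer is expected to land inside this range, not the range itself.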
We have many improvements we'd like to implement. Please help us! For the time being, please email us if you'd like to contribute.
Need help setting up AIDB for your specific dataset, or want a new feature? Please fill out this form.