HeatWave is an integrated, massively parallel, high-performance, in-memory query accelerator for MySQL Database Service that speeds up MySQL by orders of magnitude for analytics and mixed workloads. It is the only service that lets you run OLTP and OLAP workloads simultaneously and directly from your MySQL database, without any changes to your applications. This eliminates the need for complex, time-consuming, and expensive data movement and integration with a separate analytics database. Your applications connect to the HeatWave cluster through standard MySQL protocols.
HeatWave users currently do not have an easy way to create machine learning models for the data in their database, or to generate predictions and explanations from those models. These users are often database experts who are relatively new to machine learning, so they benefit from products that streamline creating and using machine learning models. HeatWave AutoML is the product that addresses this need.
- Provision a MySQL Database Service instance and add a HeatWave cluster to it.
- Clone this repository and change into its directory:
  ```shell
  git clone https://github.com/oracle-samples/heatwave-ml.git
  cd heatwave-ml
  ```
- Create a Python virtual environment and activate it:
  ```shell
  python3.8 -m venv py_heatwaveml
  source py_heatwaveml/bin/activate
  ```
- Install the required Python packages into the virtual environment (`--user` is omitted because pip rejects user installs inside a virtual environment):
  ```shell
  pip install pandas numpy unlzw3 scikit-learn pyreadr
  ```
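To confirm that the environment is ready, a short check like the following can be run with the activated interpreter. It is a minimal sketch that only verifies each package from the install step is importable. Note that the import name differs from the pip name for scikit-learn (`sklearn`). The `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced here, not part of this repository.

```python
import importlib.util

# Hypothetical mapping from the pip package names installed above to the
# module names used in `import` statements (they differ for scikit-learn).
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "unlzw3": "unlzw3",
    "scikit-learn": "sklearn",
    "pyreadr": "pyreadr",
}


def missing_packages(required):
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported missing, rerun the `pip install` command with the virtual environment activated.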
To help customers get started with HeatWave ML and showcase its capabilities, we have prepared a set of Jupyter notebooks. Each notebook focuses on a simple, practical application of HeatWave ML components and walks you through a solution. Here is the list of existing notebooks and a screenshot of their rendered HTML.
Description | Link |
---|---|
Training a model to predict whether a bank customer will subscribe to a term deposit | Bank marketing |
Training a model to predict the price of a diamond | Diamonds |
SQL code to run training, prediction, and scoring on a variety of common machine learning classification and regression datasets.
Example | Description | #Rows (Training Set) | #Features |
---|---|---|---|
airlines | Predict Flight Delays | 377568 | 8 |
bank_marketing | Direct marketing – Banking Products | 31648 | 17 |
cnae-9 | Documents with free text business descriptions of Brazilian companies | 757 | 857 |
connect-4 | 8-ply positions in the game of connect-4 in which neither player has won yet – predict win/loss | 47290 | 161 |
fashion_mnist | Clothing classification problem | 60000 | 785 |
nomao | Active learning is used to efficiently detect data that refer to the same place, based on the Nomao browser | 24126 | 119 |
numerai | Cleaned, regularized, and encrypted global equity data | 67425 | 22 |
higgs | Monte Carlo Simulations | 10500000 | 29 |
census | Determine if a person makes > $50k | 32561 | 15 |
titanic | Survival Status of individuals | 917 | 14 |
creditcard | Identify fraudulent transactions | 199364 | 30 |
appetency | Predict the propensity of customers to buy new products | 35000 | 230 |
black_friday | Customer purchases on Black Friday | 116774 | 10 |
diamonds | Predict price of a diamond | 37758 | 10 |
mercedes | Predict the time a car takes to pass testing | 2946 | 377 |
news_popularity | Predict the number of times an article is shared on social networks (popularity) | 27750 | 60 |
nyc_taxi | Predict the tip amount for an NYC taxi cab ride | 407284 | 15 |
| | The popularity of a topic on social media | 408275 | 78 |
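Each dataset's SQL code follows the same HeatWave AutoML cycle: train a model, load it, generate predictions, and score it. The Python sketch below only assembles those statements as strings for one dataset, so the pattern is visible without a database connection. It assumes the early HeatWave ML routine signatures (`sys.ML_TRAIN`, `sys.ML_MODEL_LOAD`, `sys.ML_PREDICT_TABLE`, `sys.ML_SCORE`). Newer MySQL releases add a trailing `options` argument to some of these routines, so check the reference for your version. The helper `automl_script`, the metric choice per task, and the schema, table, and column names in the usage line (`ml_data`, `diamonds_train`, `diamonds_test`, `price`) are hypothetical, not taken from this repository's scripts.

```python
def automl_script(schema, train_table, test_table, target, task):
    """Build a HeatWave AutoML train/load/predict/score SQL script.

    Hypothetical helper: names and metric choices are illustrative only.
    """
    if task not in ("classification", "regression"):
        raise ValueError(f"unsupported task: {task!r}")
    # Assumed metric per task; ML_SCORE accepts several other metrics too.
    metric = "balanced_accuracy" if task == "classification" else "r2"
    train = f"{schema}.{train_table}"
    test = f"{schema}.{test_table}"
    return "\n".join([
        # Train a model; its handle is stored in the session variable @model.
        f"CALL sys.ML_TRAIN('{train}', '{target}', "
        f"JSON_OBJECT('task', '{task}'), @model);",
        # Load the trained model into HeatWave before using it.
        "CALL sys.ML_MODEL_LOAD(@model, NULL);",
        # Write predictions for the test table into a new output table.
        f"CALL sys.ML_PREDICT_TABLE('{test}', @model, "
        f"'{schema}.{test_table}_predictions');",
        # Score the model against the labeled test table.
        f"CALL sys.ML_SCORE('{test}', '{target}', @model, '{metric}', @score);",
        "SELECT @score;",
    ])


print(automl_script("ml_data", "diamonds_train", "diamonds_test", "price", "regression"))
```

The printed statements could be pasted into a MySQL client connected to the HeatWave-enabled instance. Because the helper does not quote or escape identifiers, it is only suitable for trusted, illustrative input.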
This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.
Please consult the security guide for our responsible security vulnerability disclosure process.
Copyright (c) 2025 Oracle and/or its affiliates.
Released under the Universal Permissive License v1.0 as shown at https://oss.oracle.com/licenses/upl/.