Authors: Aditya Shrey and Arnav Chahal
This repository contains code and experiments from our research on combining multiple kernels within a Sparse Gaussian Process (GP) framework for correlated web traffic forecasting. Our approach leverages a variety of kernels—Squared Exponential, Spectral Mixture, Matérn, Linear, and Sinusoidal—to capture complex temporal patterns. By optimizing the Evidence Lower Bound (ELBO), we tune kernel weights and hyperparameters, investigating how each kernel contributes to forecast accuracy and uncertainty estimation.
- `exp_inducing_points.ipynb`: experiment investigating how varying the number of inducing points affects the model's performance and efficiency.
- `exp_kernel_weights.ipynb`: experiment testing kernel-weight optimization on filtered datasets (Soccer, Politics, and Technology) to identify which kernels play a dominant role.
- `exp_step_size.ipynb`: experiment exploring the impact of step size in ELBO maximization on convergence, predictive performance, and uncertainty calibration.
- `kernels.py`: implementations of the kernels:
  - Squared Exponential (SE)
  - Spectral Mixture (SM)
  - Matérn
  - Linear
  - Sinusoidal
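As a reference for the simplest of these, a minimal Squared Exponential kernel can be written as follows. This is a sketch only; the function names and signatures in `kernels.py` may differ.

```python
import numpy as np

def squared_exponential(x1, x2, lengthscale=1.0, variance=1.0):
    """SE (RBF) kernel: k(x, x') = variance * exp(-|x - x'|^2 / (2 * lengthscale^2))."""
    # Pairwise squared distances between two 1-D input arrays
    sq_dists = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

x = np.linspace(0.0, 1.0, 5)
K = squared_exponential(x, x)  # 5 x 5 covariance matrix
```

The resulting matrix is symmetric, has the signal variance on its diagonal, and decays smoothly with distance between inputs.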
- `data.py`: data preprocessing and manipulation methods, including:
  - splitting input-output matrices
  - cleaning and filtering data (e.g., median filtering)
  - optional normalization
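As an illustration of the median-filtering step, here is a minimal sketch; the actual implementation in `data.py` may differ, and the function name and window size here are assumptions.

```python
import numpy as np

def median_filter(series, window=5):
    """Replace each point with the median of a centered window (edges use a shrunken window)."""
    half = window // 2
    out = np.empty_like(series, dtype=float)
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out[i] = np.median(series[lo:hi])
    return out

# A single traffic spike at index 5 is smoothed away by the filter
traffic = np.array([10, 12, 11, 13, 12, 500, 11, 12, 13, 11], dtype=float)
smoothed = median_filter(traffic, window=5)
```

Median filtering is attractive for web traffic because isolated spikes (viral events, bot bursts) are removed without blurring the surrounding level, unlike a moving average.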
- `plot.py`: visualization utilities to generate plots of time series, forecasts, ELBO curves, and kernel weight distributions.
- `test_kernels.ipynb`: preliminary tests ensuring correct kernel implementations.
- `test_simple2D.ipynb`: a toy experiment using a simple 2D dataset to verify the pipeline before applying it to more complex web traffic data.
- `sparse_gp.py`: implementation of the Sparse Gaussian Process and ELBO-based variational inference, including:
  - variational optimization of inducing points and kernel hyperparameters
  - computation of the ELBO, predictive distributions, and other components of the GP framework
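For reference, a common form of the sparse-GP objective is the collapsed variational bound of Titsias (2009); the exact ELBO implemented in `sparse_gp.py` may differ, but it typically takes the form

$$
\mathcal{L} = \log \mathcal{N}\left(\mathbf{y} \mid \mathbf{0},\; \mathbf{Q}_{nn} + \sigma^2 \mathbf{I}\right) - \frac{1}{2\sigma^2}\operatorname{tr}\left(\mathbf{K}_{nn} - \mathbf{Q}_{nn}\right), \qquad \mathbf{Q}_{nn} = \mathbf{K}_{nm}\mathbf{K}_{mm}^{-1}\mathbf{K}_{mn},
$$

where $\mathbf{K}_{mm}$ is the covariance among the $M$ inducing points and $\mathbf{K}_{nm}$ is the cross-covariance between the $N$ data points and the inducing points. The trace term penalizes information lost by the sparse approximation, which is what drives the inducing-point locations during optimization.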
We combine multiple kernels to capture different aspects of time series behavior. Some kernels model smooth variations, others handle periodicity or complex frequency structures. Below are covariance heatmaps for two illustrative kernels:
Sinusoidal Kernel Covariance Heatmap:
Captures periodic patterns in the data.

Spectral Mixture Kernel Covariance Heatmap:
Handles multi-periodic or complex frequency patterns through a Gaussian mixture in the spectral domain.
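To make the kernel-combination idea concrete: a non-negative weighted sum of valid kernels is itself a valid kernel. The sketch below combines an SE kernel with a sinusoidal (periodic) kernel; the function names and fixed weights are illustrative, not the repository's API.

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0):
    """Smooth, slowly varying structure."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale ** 2)

def sinusoidal_kernel(x1, x2, period=1.0, lengthscale=1.0):
    """Standard periodic kernel: exp(-2 sin^2(pi |x - x'| / period) / lengthscale^2)."""
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def combined_kernel(x1, x2, weights=(0.5, 0.5)):
    """A non-negative weighted sum of PSD kernels is itself a valid PSD kernel."""
    return weights[0] * se_kernel(x1, x2) + weights[1] * sinusoidal_kernel(x1, x2)

x = np.linspace(0.0, 4.0, 50)
K = combined_kernel(x, x)  # covariance mixing smooth and periodic structure
```

In the actual model the weights are learned jointly with the hyperparameters by maximizing the ELBO rather than fixed by hand.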

Traditional GPs scale poorly to large datasets, with training cost on the order of O(N^3) for N data points. Sparse GPs address this by introducing a set of M inducing points (with M << N), reducing the cost to approximately O(M^2 N).
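The complexity reduction can be seen directly in the linear algebra: the Nyström-style approximation underlying sparse GPs replaces operations on the full N x N covariance with factors built from an M x M matrix. A minimal sketch with illustrative names (not the repository's code):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared Exponential covariance between two 1-D input arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

N, M = 500, 20                        # many data points, few inducing points
x = np.linspace(0.0, 10.0, N)         # training inputs
z = np.linspace(0.0, 10.0, M)         # inducing-point locations

K_nm = rbf(x, z)                      # N x M cross-covariance
K_mm = rbf(z, z) + 1e-6 * np.eye(M)   # M x M, with jitter for numerical stability
# Q_nn = K_nm K_mm^{-1} K_mn approximates the full N x N covariance using
# O(M^2 N) work, instead of the O(N^3) needed to factor K_nn directly.
Q_nn = K_nm @ np.linalg.solve(K_mm, K_nm.T)
```

Because Q_nn is a low-rank surrogate for the true covariance, its diagonal never exceeds the true prior variance; the gap is exactly the trace term that the ELBO penalizes.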
Inducing Points Visualization:
This figure conceptually shows how inducing points represent a compressed summary of the data, balancing complexity and scalability.

We apply our methods to subsets of the Wikipedia Traffic Data Exploration dataset (2015-2017). Our focus is on correlated web traffic time series—e.g., English Premier League soccer clubs, political figures, and major technology companies.
To handle outliers or extreme behavior, we experimented with median filtering. While it improved performance in some datasets, certain datasets like the unfiltered soccer data still yielded strong forecasts, potentially because there were no extreme anomalies to remove.
- Toy Example (Simple 2D Data): before tackling real-world complexity, we validated our approach on a simple toy dataset, ensuring that our code and methods function correctly in a controlled scenario.

- Soccer Dataset (Unfiltered): on this raw dataset, without median filtering, our model could still identify underlying patterns, suggesting that when the data aren't plagued by severe outliers, filtering may not be necessary.

In contrast, for datasets like Politics or Technology (not shown here), median filtering helped stabilize the model due to their more erratic search-volume patterns. The experiments showed that:
- Some kernels never fully dropped to zero weight, indicating that even less dominant kernels still offered incremental improvements.
- Step size tuning affected how confidently (and how accurately) the model predicted.
- Varying the number of inducing points had less impact than anticipated, suggesting broad robustness in how the sparse GP leveraged them.
- Data Preparation: use `data.py` to preprocess your dataset into the required format.
- Running Experiments:
  - `exp_step_size.ipynb`: test different step sizes.
  - `exp_inducing_points.ipynb`: vary the number of inducing points.
  - `exp_kernel_weights.ipynb`: optimize kernel weights on filtered datasets.

  For initial checks:
  - `test_kernels.ipynb`: verify kernel implementations.
  - `test_simple2D.ipynb`: run a toy scenario.
- Visualization: use `plot.py` to generate plots; `data_imgs/` holds reference images and saved figures.