This repository contains the source code for a collection of efficient parallel in-memory Top-k selection algorithms. CPU-only versions are verified and tested for correctness. GPU-only implementations are still under development.
The following is a short description of each CPU algorithm:
-
cpu/TA.h
- Scalar implementation of the Threshold Algorithm using standard C++ library. -
cpu/TPAc.h
- Scalar, SIMD, SIMD + multi-threaded, and multi-query implementation of Full Table Evaluation (FTE) using column major order. -
cpu/TPAr.h
- Scalar, SIMD, and SIMD + multi-threaded implementation of Full Table Evaluation (FTE) using row major order. -
cpu/VTA.h
- Scalar, SIMD, SIMD + multi-threaded, and multi-query implementation of Vectorized Threshold Algorithm (VTA). -
cpu/SLA.h
- Scalar, SIMD, and SIMD + multi-threaded implementation of Skyline Layered Algorithm (SLA). -
cpu/PTA.h
- Scalar, SIMD, SIMD + multi-threaded, and multi-query implementation of Partitioned Threshold Algorithm (PTA).
To evaluate the above implementations, you need to configure debug.sh
. Following we provide a short description
of those parameters and a few example how to configure them.
-
START_N, END_N
- For synthetic data, choose number of objects to test. -
DIMS
- For synthetic data, choose number of attributes for each object. Specified also queries to be tested (i.e. DIMS=8, queries on 2,3,4,5,6,7,8 attributes). -
KKS, KKE
- Select range of k to test incrementing by a factor of 2 (i.e. KKS=16 KKE=128, 16,32,64,128). -
LD
- Specified synthetic or real data. LD=0 synthetic data on disk, LD=1 synthetic data generated in-memory, LD=3 real data. -
distr
- Choose data distribution (i.e. distr=c(orrelated), distr=i(ndependent), distr=a(nticorrelated) to generate on disk or in-memory. -
script
- script name used to generate synthetic data on disk. -
REAL_DATA_PATH
- Path to real data file. It needs to follow a specific naming convention __ -
device
- Choose between cpu or gpu evaluation. device=0, cpu-only for now. -
QM=0
- Query attributes starting from the last or first attribute, 0 or 1 respectively -
QD
- Query interval testing (i.e. QD=2 then 2,4,6,8). -
IMP
- Choose what implementation to test (i.e. Scalar=0, SIMD=1, Threads=2, Multi-query random dimension=3, Multi-query fixed dimension=4) -
ITER
- Choose number of iterations to benchmark a query. -
MQTHREADS
- Multi-query number of threads. -
STATS_EFF
- Gather statistics associated with number of objects evaluated. -
WORKLOAD
- Number of queries generated for multi-query evaluation. -
TA_B
- Enable TA benchmark. -
TPAc_B
- Enable FTE column major benchmark. -
TPAr_B
- Enable FTE row major benchmark. -
VTA_B
- Enable VTA benchmark. -
PTA_B
- Enable PTA benchmark. -
SLA_B
- Enable SLA benchmark.