TopK Algorithms Benchmark

Note: switch to the appropriate branch to run the algorithms (cpu_ks for CPUs, gpu_topk for GPUs)

This repository contains the source code for a collection of efficient parallel in-memory Top-k selection algorithms. CPU-only versions are verified and tested for correctness. GPU-only implementations are still under development.

The following is a short description of each CPU algorithm:

cpu/TA.h
- Scalar implementation of the Threshold Algorithm using standard C++ library.
cpu/TPAc.h
- Scalar, SIMD, SIMD + multi-threaded, and multi-query implementation of Full Table Evaluation (FTE) using column major order.
cpu/TPAr.h
- Scalar, SIMD, and SIMD + multi-threaded implementation of Full Table Evaluation (FTE) using row major order.
cpu/VTA.h
- Scalar, SIMD, SIMD + multi-threaded, and multi-query implementation of Vectorized Threshold Algorithm (VTA).
cpu/SLA.h
- Scalar, SIMD, and SIMD + multi-threaded implementation of Skyline Layered Algorithm (SLA).
cpu/PTA.h
- Scalar, SIMD, SIMD + multi-threaded, and multi-query implementation of Partitioned Threshold Algorithm (PTA).

Behnchmark Instructions

To evaluate the above implementations, you need to configure debug.sh. Following we provide a short description of those parameters and a few example how to configure them.

START_N, END_N
- For synthetic data, choose number of objects to test.
DIMS
- For synthetic data, choose number of attributes for each object. Specified also queries to be tested (i.e. DIMS=8, queries on 2,3,4,5,6,7,8 attributes).
KKS, KKE
- Select range of k to test incrementing by a factor of 2 (i.e. KKS=16 KKE=128, 16,32,64,128).
LD
- Specified synthetic or real data. LD=0 synthetic data on disk, LD=1 synthetic data generated in-memory, LD=3 real data.
distr
- Choose data distribution (i.e. distr=c(orrelated), distr=i(ndependent), distr=a(nticorrelated) to generate on disk or in-memory.
script
- script name used to generate synthetic data on disk.
REAL_DATA_PATH
- Path to real data file. It needs to follow a specific naming convention __
device
- Choose between cpu or gpu evaluation. device=0, cpu-only for now.
QM=0
- Query attributes starting from the last or first attribute, 0 or 1 respectively
QD
- Query interval testing (i.e. QD=2 then 2,4,6,8).
IMP
- Choose what implementation to test (i.e. Scalar=0, SIMD=1, Threads=2, Multi-query random dimension=3, Multi-query fixed dimension=4)
ITER
- Choose number of iterations to benchmark a query.
MQTHREADS
- Multi-query number of threads.
STATS_EFF
- Gather statistics associated with number of objects evaluated.
WORKLOAD
- Number of queries generated for multi-query evaluation.
TA_B
- Enable TA benchmark.
TPAc_B
- Enable FTE column major benchmark.
TPAr_B
- Enable FTE row major benchmark.
VTA_B
- Enable VTA benchmark.
PTA_B
- Enable PTA benchmark.
SLA_B
- Enable SLA benchmark.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

TopK Algorithms Benchmark

Note: switch to the appropriate branch to run the algorithms (cpu_ks for CPUs, gpu_topk for GPUs)

Behnchmark Instructions

Files

README.md

Latest commit

History

README.md

File metadata and controls

TopK Algorithms Benchmark

Note: switch to the appropriate branch to run the algorithms (cpu_ks for CPUs, gpu_topk for GPUs)

Behnchmark Instructions