StreamCPI is a framework for reducing the memory consumption of streaming graph partitioners by Compressing the array of block assignments (Partition Indices) used by such partitioners. In particular, StreamCPI uses run-length compression to encode, on the fly, runs of repeated block assignments generated by the streaming partitioner. As part of this framework, we offer a novel (semi-)dynamic compression vector that functions as a drop-in replacement for standard arrays, such as std::vector in C++, used to store block assignments in streaming graph partitioners.
Can we use the (semi-)dynamic compression vector to reduce memory consumption in our streaming algorithm?
Yes. If your streaming algorithm stores arrays with repeating values, you can benefit greatly from our compression vector, which supports both append and access operations and is easy to integrate. The code and further details on how to use the compression vector are provided in a separate GitHub repository: https://github.com/kurpicz/cpi.
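As a rough illustration of the idea (not the actual API of the cpi library, whose data structure is more sophisticated and supports faster access), the following minimal sketch shows a run-length-compressed vector of block assignments with append and access operations:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch only: a minimal run-length-compressed vector of block
// assignments. Each maximal run of equal block IDs is stored once, together
// with its length, so memory grows with the number of runs, not the number
// of nodes.
class RlePartitionVector {
 public:
  // Append one block assignment; extends the last run if the value repeats.
  void push_back(uint32_t block) {
    if (!runs_.empty() && runs_.back().block == block) {
      ++runs_.back().length;
    } else {
      runs_.push_back({block, 1});
    }
    ++size_;
  }

  // Random access by linear scan over the runs (kept simple for the sketch;
  // the real structure answers such queries much faster).
  uint32_t operator[](std::size_t i) const {
    for (const auto& r : runs_) {
      if (i < r.length) return r.block;
      i -= r.length;
    }
    return 0;  // out of range; error handling omitted in this sketch
  }

  std::size_t size() const { return size_; }
  std::size_t run_count() const { return runs_.size(); }  // memory ~ #runs

 private:
  struct Run {
    uint32_t block;
    std::size_t length;
  };
  std::vector<Run> runs_;
  std::size_t size_ = 0;
};
```

A partitioner that assigns many consecutive nodes to the same block produces long runs, so the vector above stores far fewer entries than a plain std::vector of size n.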
In this repository, we build StreamCPI with a modified Fennel partitioning scoring function. However, StreamCPI can be inserted as a subroutine to reduce the memory footprint of any streaming graph partitioning tool that needs to store block IDs in a vector whose size is proportional to the number of nodes.
To build the software, the following is required:
- C++14-ready compiler (g++ version 10+)
- CMake
- Scons (http://www.scons.org/)
- Argtable (http://argtable.sourceforge.net/)
To build the software, run
./compile.sh
Alternatively, you can use the standard CMake build process.
The resulting binaries are located at deploy/stream_cpi and deploy/stream_cpi_generated.
To partition a graph in METIS format using StreamCPI, run
./stream_cpi <graph filename> --k=<number of blocks>
By default, the partitioner stores the resulting block assignments in a file named graph_k.txt. To obtain more information about the quality of the partition, such as edge cut, running time, and memory consumption, pass the flag --write_results.
To partition a graph in METIS format using StreamCPI with complete run-length compression, run
./stream_cpi <graph filename> --k=<number of blocks> --rle_length=<mode, e.g., 0; refer to table below>
The --rle_length flag can be set to one of several values depending on the mode you wish to select; refer to the following table.
rle_length | mode |
---|---|
0 | complete run-length compression (recommended) |
-1 | std::vector (fastest) |
-2 | external-memory priority queue (using STXXL; easily configurable in lib/data_structure/ExternalPQ.h) |
100 or more | batch-wise compression: each compression vector is responsible for rle_length nodes |
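The batch-wise mode can be pictured with the following index arithmetic; this is an illustrative sketch, not StreamCPI's actual internals:

```cpp
#include <cstddef>
#include <utility>

// Sketch of the batch-wise mode (rle_length >= 100): instead of one
// compression vector over all n nodes, one small compression vector is kept
// per batch of rle_length consecutive nodes. Node i is then served by
// batch i / rle_length at local offset i % rle_length, so each individual
// structure stays short and cheap to access.
std::pair<std::size_t, std::size_t> locate(std::size_t node,
                                           std::size_t rle_length) {
  return {node / rle_length, node % rle_length};  // {batch index, offset}
}
```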
To further reduce memory consumption and improve the runtime, pass the flag --kappa to encourage repeated block assignments.
./stream_cpi <graph filename> --k=<number of blocks> --rle_length=<mode, e.g., 0> --kappa=<scale factor, e.g., 20>
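One way such a kappa factor can encourage runs is sketched below, under the assumption (for illustration only) that kappa multiplicatively boosts the score of the block the previous node was assigned to; StreamCPI's actual modified Fennel scoring function may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative Fennel-style gain for assigning a node to a block:
// neighbors already in the block minus a size penalty.
double fennel_gain(std::size_t neighbors_in_block, std::size_t block_size,
                   double alpha, double gamma) {
  return static_cast<double>(neighbors_in_block) -
         alpha * gamma * std::pow(static_cast<double>(block_size), gamma - 1.0);
}

// Pick the best block; the previously assigned block gets its (positive)
// gain multiplied by kappa, making back-to-back identical assignments
// (i.e., long runs) more likely. All names here are illustrative.
std::size_t pick_block(const std::vector<std::size_t>& neighbors_in_block,
                       const std::vector<std::size_t>& block_sizes,
                       std::size_t prev_block, double kappa,
                       double alpha = 0.5, double gamma = 1.5) {
  std::size_t best = 0;
  double best_gain = -1e300;
  for (std::size_t b = 0; b < block_sizes.size(); ++b) {
    double g = fennel_gain(neighbors_in_block[b], block_sizes[b], alpha, gamma);
    if (b == prev_block && g > 0) g *= kappa;  // bias toward repeating
    if (g > best_gain) { best_gain = g; best = b; }
  }
  return best;
}
```

Longer runs of repeated assignments directly translate into fewer runs stored by the compression vector, which is why a kappa bias can reduce both memory and runtime.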
For a complete list of parameters along with descriptions, run:
./stream_cpi --help
Note:
- The program stores the results of the executed command in a flatbuffer .bin file named graph_k_kappa.bin if you pass the flag --write_results.
- To partition graphs in StreamCPI with 64-bit vertex IDs, edit the CMakeLists.txt file to change line 70 from option(64BITVERTEXMODE "64 bit mode" OFF) to option(64BITVERTEXMODE "64 bit mode" ON), and then run ./compile.sh. By default, 64-bit vertex IDs are enabled.
- For a description of the METIS graph format, please have a look at the KaHIP manual.
In our work, we performed experiments with graphs sourced from the following repositories:
- SNAP Dataset: https://snap.stanford.edu/data/
- 10th Dimacs Challenge: https://sites.cc.gatech.edu/dimacs10/downloads.shtml
- Laboratory for Web Algorithmics: https://law.di.unimi.it/datasets.php
- Network Repository Website: https://networkrepository.com/
For our experiments, we converted these graphs to the METIS format, removing parallel edges, self-loops, and edge directions, and assigning unit weight to all nodes and edges.
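The conversion can be sketched as follows; the function name and the in-memory string output are illustrative, assuming the standard unweighted METIS layout (a header line "n m", then one 1-based adjacency line per node):

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Sketch of the conversion described above: drop self-loops, deduplicate
// parallel edges, symmetrize directed edges, and emit the unweighted METIS
// format. File I/O is replaced by a returned string for brevity.
std::string to_metis(std::size_t n,
                     const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
  std::vector<std::set<std::size_t>> adj(n);
  for (const auto& e : edges) {
    auto u = e.first, v = e.second;
    if (u == v) continue;  // drop self-loops
    adj[u].insert(v);      // std::set removes parallel edges
    adj[v].insert(u);      // store both directions -> undirected
  }
  std::size_t m = 0;
  for (const auto& s : adj) m += s.size();
  m /= 2;  // each undirected edge is counted twice
  std::ostringstream out;
  out << n << ' ' << m << '\n';
  for (const auto& s : adj) {
    std::string sep;
    for (std::size_t v : s) { out << sep << (v + 1); sep = " "; }  // 1-based
    out << '\n';
  }
  return out.str();
}
```

Unit node and edge weights need no explicit fields in this layout, since METIS treats a graph without weight entries as uniformly weighted.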
This repository includes another program, deploy/stream_cpi_generated, with which a user can partition a graph generated on the fly by a novel streaming graph generator. The streaming graph generator is also made available open-source in the following GitHub repository: https://github.com/adilchhabra/KaGen, which includes instructions on how to experiment with various graph generation models in a streaming setting. This is broadly useful for experimenting with and testing streaming algorithms under development. Soon, this streaming generator will be integrated into the popular KaGen graph generation repository: https://github.com/KarlsruheGraphGeneration/KaGen.
To partition a generated Barabási-Albert graph using StreamCPI, run
./stream_cpi_generated <partition_output_filename> --k=<number of blocks> --rle_length=<mode> --kappa=<scaling factor> --ba --nodes_to_generate=<n> --kagen_d_ba=<avg. deg. of BA graph generation> --kagen_chunk_count=<num. of chunks within which to generate graph>
To partition a generated RGG2D graph using StreamCPI, run
./stream_cpi_generated <partition_output_filename> --k=<number of blocks> --rle_length=<mode> --kappa=<scaling factor> --rgg2d --nodes_to_generate=<n> --kagen_r=<radius of RGG graph generation> --kagen_chunk_count=<num. of chunks within which to generate graph>
Please refer to https://github.com/adilchhabra/KaGen to learn more about the graph generation models and their corresponding parameters.