
StreamCPI: Framework for Streaming Graph Partitioning

License: MIT

What is StreamCPI?

StreamCPI is a framework for reducing the memory consumption of streaming graph partitioners by Compressing the array of block assignments (Partition Indices) used by such partitioners. In particular, StreamCPI utilizes run-length data compression to encode, on the fly, runs of repeating block assignments generated by the streaming partitioner. For example, the assignment sequence 0 0 0 1 1 2 2 2 2 is stored as the three runs (0,3), (1,2), (2,4). In this framework, we offer a novel (semi-)dynamic compression vector that functions as a drop-in replacement for standard arrays, such as std::vector in C++, used to store block assignments in streaming graph partitioners.

Can we use the (semi-)dynamic compression vector to reduce memory consumption in our streaming algorithm?

Yes. If your streaming algorithm stores arrays with repeating values, you can benefit greatly from our compression vector, which supports both append and access operations and is easy to integrate. The code and further details on how to use the compression vector are provided in a separate GitHub repository: https://github.com/kurpicz/cpi.
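
To illustrate the underlying idea, the following minimal C++ sketch shows how block assignments can be stored as (value, length) runs while still supporting append and random access. The class and method names are purely illustrative and do not reflect the actual interface of the cpi library; please refer to the repository above for the real data structure.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Illustrative run-length vector: stores (value, length, start) runs
    // instead of one entry per node. Names and interface are hypothetical.
    class RunLengthVector {
    public:
      // Append a block assignment; extends the last run if the value repeats.
      void push_back(uint32_t block) {
        if (!runs_.empty() && runs_.back().value == block) {
          ++runs_.back().length;
        } else {
          runs_.push_back({block, 1, size_});
        }
        ++size_;
      }

      // Random access: binary search for the run containing position pos.
      uint32_t operator[](uint64_t pos) const {
        uint64_t lo = 0, hi = runs_.size() - 1;
        while (lo < hi) {
          uint64_t mid = (lo + hi + 1) / 2;
          if (runs_[mid].start <= pos) lo = mid; else hi = mid - 1;
        }
        return runs_[lo].value;
      }

      uint64_t size() const { return size_; }
      uint64_t num_runs() const { return runs_.size(); }

    private:
      struct Run { uint32_t value; uint64_t length; uint64_t start; };
      std::vector<Run> runs_;
      uint64_t size_ = 0;
    };

    int main() {
      RunLengthVector assignments;
      for (uint32_t b : {0u, 0u, 0u, 1u, 1u, 2u, 2u, 2u, 2u}) assignments.push_back(b);
      // Prints: node 5 is in block 2; 9 assignments stored as 3 runs
      std::cout << "node 5 is in block " << assignments[5] << "; "
                << assignments.size() << " assignments stored as "
                << assignments.num_runs() << " runs\n";
    }

With this layout, the memory footprint grows with the number of runs rather than the number of nodes, which is exactly where streaming partitioners that produce many repeated assignments benefit.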

Which streaming partitioner does StreamCPI use?

In this repository, we build StreamCPI with a modified Fennel partitioning scoring function. However, StreamCPI can be used as a subroutine to reduce the memory footprint of any streaming graph partitioning tool that needs to store block IDs in a vector of size $\Theta(n)$.
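
For context, the classical Fennel heuristic greedily assigns each incoming vertex $v$ to the block $V_i$ that, among the blocks still within the balance constraint, maximizes a gain of the form

$$f(v, V_i) = |N(v) \cap V_i| - \alpha \, \gamma \, |V_i|^{\gamma - 1}$$

where $N(v)$ denotes the already-streamed neighbors of $v$, and typical parameter choices are $\gamma = 3/2$ and $\alpha = \sqrt{k} \, m / n^{3/2}$. The scoring function used by StreamCPI is a modification of this classical form; the formula above is shown only as background.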

Installation Notes

Requirements

Building StreamCPI

To build the software, run

./compile.sh

Alternatively, you can use the standard CMake build process.

The resulting binaries are located at deploy/stream_cpi and deploy/stream_cpi_generated.

Running StreamCPI

To partition a graph in METIS format using StreamCPI, run

./stream_cpi <graph filename> --k=<number of blocks> 

By default, the partitioner stores the resulting block assignments in a file named graph_k.txt, where graph is the input graph name and k is the number of blocks. To obtain more information about the quality of the partition, such as the edge cut, running time, and memory consumption, pass the flag --write_results.

To partition a graph in METIS format using StreamCPI with complete run-length compression, run

./stream_cpi <graph filename> --k=<number of blocks> --rle_length=<mode, e.g., 0; refer to the table below>

The --rle_length flag selects the compression mode, as summarized in the following table.

rle_length     mode
0              complete run-length compression (recommended)
-1             std::vector (fastest)
-2             external-memory PQ (using stxxl; easily configurable in lib/data_structure/ExternalPQ.h)
100 or more    batch-wise compression: each compression vector is responsible for rle_length nodes

To further reduce memory consumption and speed up the runtime, pass the flag --kappa to encourage repeated block assignments.

./stream_cpi <graph filename> --k=<number of blocks> --rle_length=<mode, e.g., 0> --kappa=<scale factor, e.g., 20>

For a complete list of parameters along with their descriptions, run:

./stream_cpi --help

Note:

  • If the flag --write_results is passed, the program stores the results of the executed command in a flatbuffer .bin file named graph_k_kappa.bin.
  • To partition graphs with 64 bit vertex IDs in StreamCPI, edit Line 70 of the CMakeLists.txt file to change option(64BITVERTEXMODE "64 bit mode" OFF) to option(64BITVERTEXMODE "64 bit mode" ON), and then run ./compile.sh. By default, 64 bit vertex IDs are enabled.
  • For a description of the METIS graph format, please have a look at the KaHIP manual.

Data References

In our work, we performed experiments with graphs sourced from the following repositories:

For our experiments, we converted these graphs to the METIS format, removing parallel edges, self-loops, and edge directions, and assigning unit weight to all nodes and edges.

Additional Information

This repository includes a second program, deploy/stream_cpi_generated, which lets a user partition a graph that is generated on the fly by a novel streaming graph generator. The streaming graph generator is available open source in the following GitHub repository: https://github.com/adilchhabra/KaGen, which includes instructions on experimenting with various graph generation models in a streaming setting. This makes it broadly useful for developing and testing other streaming algorithms. Soon, this streaming generator will be integrated into the popular KaGen graph generation repository: https://github.com/KarlsruheGraphGeneration/KaGen.

To partition a generated Barabási-Albert graph using StreamCPI, run

./stream_cpi_generated <partition_output_filename> --k=<number of blocks> --rle_length=<mode> --kappa=<scaling factor> --ba --nodes_to_generate=<n> --kagen_d_ba=<avg. deg. of BA graph generation> --kagen_chunk_count=<num. of chunks within which to generate graph>

To partition a generated RGG2D graph using StreamCPI, run

./stream_cpi_generated <partition_output_filename> --k=<number of blocks> --rle_length=<mode> --kappa=<scaling factor> --rgg2d --nodes_to_generate=<n> --kagen_r=<radius of RGG graph generation> --kagen_chunk_count=<num. of chunks within which to generate graph>

Please refer to https://github.com/adilchhabra/KaGen to learn more about the graph generation models and their corresponding parameters.
