A SNAP dataset http://snap.stanford.edu/ is a text file that each line is an edge with source and destination vertex IDs, such as
11 25
A CSR data generally contains two lists, the begin position array which represents the offsets of the neighbor lists by source vertex ID, and the adjacency list which stores destination vertices of all edges.
adjacent.bin
begin.bin
head.bin
edge
The codes are in the folder below.
TC/graph_converter/undirected_csr/
./tuple_to_undirected_csr.bin input
It automatically generates CSR files adjacent.bin
, head.bin
and begin.bin
to current path. Each line in SNAP file (e.g., 0,1
) generates two edges (0,1) and (1,0). The converted files are bit files. The begin position file begin.bin
represents the offsets of each neighbor lists in both head.bin
and adjacent.bin
. The neighbor lists are not sorted.
The codes are in the folder below.
TC/graph_cleaner/
./graph_cleane.bin input_path
The cleaner sorts all neighbor lists, removes self-loop and multiple edges. After this step, the graph is an undirected graph in CSR format (both directions of each edge is stored in the CSR). The output files are written to current folder, so to copy the executable to target folder is suggested.
./rank_by_degree.bin input_path
The rank_by_degree orientation tool removes half of all edges. It only keeps the edges from lower degree vertex to higher degree vertex. After this step, the graph is an undirected graph in CSR format. The output files are written to current folder, so to copy the executable to target folder is suggested.
The codes are in the folder below.
TC/TC-EXT/partitioner-x/
./partition-x input_path
The partitioner tool partition the graph in a 2-D fashion. The rule is described in paper It only keeps the edges from lower degree vertex to higher degree vertex. After this step, the graph is an undirected graph in CSR format. The output files are written to current folder, so to copy the executable to target folder is suggested.
The path of in-memory GPU triangle counting is TC/TC-GPU/work-steal/
.
Run the code with command ./tc <input_path>
.
The path of in-memory CPU triangle counting is TC/TC-CPU/tc-ne-cpu/
.
Run the code with command ./tc <input_path>
.
The dataset path is TC/example_dataset/amazon0302/
, the _adj.mmio is the original data format downloaded from https://graphchallenge.mit.edu/data-sets. Converter tool source code is in TC/gConv/
, copy the executable to dataset path, line 8 of the bash script converter.sh
needs to be modified (set the number to (#vertex+1), the vertex count can be found by less <input>_adj.mmio
). Run the bash script with command ./converter.sh <input>_adj.mmio
(replace the input name, for example, amazon0302).